rank24.pl/.paul/phases/01-dataforseo-cost-optimization/01-01-PLAN.md
2026-05-05 20:31:55 +02:00


| phase | plan | type | wave | depends_on | files_modified | autonomous | delegation |
|---|---|---|---|---|---|---|---|
| 01-dataforseo-cost-optimization | 01 | execute | 1 | | autoload/class.Cron.php | true | off |
## Goal

Reduce DataForSEO spending for Google rank checks by making cron scheduling and SERP depth adaptive, with a hard maximum of top 50 results.

Purpose

DataForSEO Organic SERP pricing changed on 2025-09-19 so that the base price covers only the first page of 10 organic results; larger `depth` values now cost more. The current cron logic uses depth values of 30, 50, and 100, which makes many checks substantially more expensive. This plan keeps rank tracking useful while cutting avoidable API spend.

Output

Backend-only changes in autoload/class.Cron.php:

  • manual days_offset remains an explicit override
  • phrases without manual interval get an adaptive interval based on recent rank stability
  • DataForSEO requests never use depth above 50
  • deep checks are reduced for stable or low-priority cases
- **Depth policy** - How aggressively should the DataForSEO check depth be cut? -> Answer: The user needed the options spelled out. An adaptive cost policy was adopted, with a hard limit: no checks beyond position 50.
- **Intervals** - Can the check frequency of phrases be changed through `days_offset`? -> Answer: Yes, provided a mechanism is built that automatically checks stable phrases less often and unstable ones more often, while respecting phrases that have a hard-coded interval.
- **Scope** - Should the plan cover the admin panel? -> Answer: It can be backend-only.

Project Context

@.paul/PROJECT.md @.paul/ROADMAP.md @.paul/STATE.md @.paul/codebase/architecture.md @.paul/codebase/db_schema.md @.paul/codebase/concerns.md

Source Files

@autoload/class.Cron.php @cron.php @dsf.php @autoload/factory/class.Ranker.php

External Pricing Context

<acceptance_criteria>

AC-1: Manual Intervals Stay Authoritative

Given a phrase has `days_offset` set to a positive value
When `Cron::post_phrases_positions_dfs3()` selects phrases for DataForSEO
Then the phrase is eligible only when its manual interval is due
And the adaptive interval does not shorten or override that manual setting

AC-2: Stable Phrases Are Checked Less Often

Given a phrase has no manual `days_offset`
And its recent recorded positions are stable
When the DataForSEO posting cron runs
Then the phrase is not sent every day
And its next eligibility is delayed according to the adaptive stability rule

AC-3: Unstable Or New Phrases Stay Fresh

Given a phrase has no manual `days_offset`
And it is new, missing recent history, or has volatile recent positions
When the DataForSEO posting cron runs
Then the phrase remains eligible more frequently than stable phrases
And rank tracking does not become stale for unstable keywords

AC-4: DataForSEO Depth Is Capped At 50

Given any active phrase selected for a DataForSEO v3 Google Organic task
When the request payload is built
Then the request `depth` is never greater than 50
And no request attempts to check positions beyond top 50

AC-5: Current Data Flow Remains Compatible

Given DataForSEO returns a completed task through the existing postback flow
When `Cron::get_phrases_positions_dfs3()` processes the result
Then position rows continue to be inserted or updated as before
And `last_checked`, `ds_id`, `ds_ready`, and `filled_missing_positions` continue to be maintained

</acceptance_criteria>

Task 1: Add DataForSEO cost policy helpers

Files: autoload/class.Cron.php

Add small private/static helper methods inside `Cron` for the DataForSEO v3 flow:
- `getDfsRecentPositions($phrase_id, $limit)` to read the latest non-empty positions from `pro_rr_phrases_positions`.
- `getDfsPositionVolatility($positions)` to classify recent movement using simple absolute deltas.
- `getDfsAdaptiveIntervalDays($row, $positions)` to return the automatic interval for phrases where `days_offset` is empty.
- `getDfsDepth($last_position, $positions)` to return a capped depth.

Policy to implement:
- Hard cap: `depth <= 50` always.
- If no previous position exists: use `depth = 50`, because first discovery still needs a reasonable search window but must not exceed top 50.
- Last position 1-10: use `depth = 10` for stable phrases, `depth = 20` for volatile phrases.
- Last position 11-20: use `depth = 20` for stable phrases, `depth = 30` for volatile phrases.
- Last position 21-50: use `depth = 50`.
- Last position >50: use `depth = 50`; if not found again, store/handle as not found according to the existing result behavior.
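
The depth ladder above can be sketched as a single pure function. This is a minimal illustration, not the final `getDfsDepth()`: the function and constant names are hypothetical, volatility is passed in as a boolean instead of being derived from `$positions`, and a last position of `null` or `0` is assumed to mean "no previous position". It avoids scalar type declarations because the plan does not state the PHP version.

```php
<?php
// Hypothetical name for the hard cap described in the policy.
const DFS_MAX_DEPTH = 50;

/**
 * @param int|null $lastPosition Last known position; null or 0 is assumed to mean "none".
 * @param bool     $isVolatile   Outcome of the volatility classification.
 * @return int Depth to request, never above DFS_MAX_DEPTH.
 */
function dfsDepthSketch($lastPosition, $isVolatile)
{
    if (empty($lastPosition)) {
        // No previous position: first discovery uses the widest allowed window.
        $depth = 50;
    } elseif ($lastPosition <= 10) {
        $depth = $isVolatile ? 20 : 10;
    } elseif ($lastPosition <= 20) {
        $depth = $isVolatile ? 30 : 20;
    } else {
        // Positions 21-50 and anything beyond 50 both use the maximum.
        $depth = 50;
    }

    // Final clamp so later edits to the ladder cannot exceed the hard cap.
    return min($depth, DFS_MAX_DEPTH);
}
```

Keeping the clamp as the last statement means the cap holds even if someone later adds a deeper rung to the ladder.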

Adaptive interval policy:
- Missing or short history: 1 day.
- Volatile phrase, e.g. max movement in recent checks greater than 5 positions: 1 day.
- Mild movement, e.g. max movement 2-5 positions: 2 days.
- Stable top 10, e.g. max movement 0-1 position across at least 5 recent checks: 3 days.
- Stable positions 11-50 across at least 5 recent checks: 5 days.
- Keep thresholds as named local constants or clearly named helper variables, not scattered magic numbers.
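
One possible reading of the interval policy, as a sketch only. The names are hypothetical, and several details are assumptions the plan leaves open: "short history" is taken to mean fewer than 5 checks, movement is measured as the largest absolute change between consecutive checks, positions arrive newest first as a zero-indexed list, and the same 0-1 stability threshold is applied to positions 11-50.

```php
<?php
// Hypothetical named thresholds, taken from the example values in the policy list.
const DFS_MIN_HISTORY = 5;    // checks required before a phrase can count as stable
const DFS_VOLATILE_ABOVE = 5; // movement above this many positions is volatile
const DFS_STABLE_UP_TO = 1;   // movement up to this many positions is stable

/** Largest absolute change between consecutive recorded positions. */
function dfsMaxMovementSketch(array $positions)
{
    $max = 0;
    for ($i = 1; $i < count($positions); $i++) {
        $max = max($max, abs($positions[$i] - $positions[$i - 1]));
    }
    return $max;
}

/**
 * @param array $positions Recent non-empty positions, newest first (assumed order).
 * @return int Number of days to wait before the next automatic check.
 */
function dfsAdaptiveIntervalDaysSketch(array $positions)
{
    if (count($positions) < DFS_MIN_HISTORY) {
        return 1; // missing or short history
    }

    $movement = dfsMaxMovementSketch($positions);
    if ($movement > DFS_VOLATILE_ABOVE) {
        return 1; // volatile
    }
    if ($movement > DFS_STABLE_UP_TO) {
        return 2; // mild movement, 2-5 positions
    }

    // Stable: top 10 is rechecked sooner than positions 11-50.
    return $positions[0] <= 10 ? 3 : 5;
}
```

For example, `dfsAdaptiveIntervalDaysSketch([3, 3, 4, 3, 3])` yields 3 days, while the same pattern around position 25 yields 5.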

Avoid:
- Adding a new database column in this plan; use existing `days_offset`, `last_checked`, and position history.
- Introducing Composer or a new framework.
- Changing credentials handling in this plan; it is known debt but separate from cost control.
Verify: `php -l autoload/class.Cron.php`

Done: AC-2, AC-3, and AC-4 have clear helper logic available for the cron selection and payload code.

Task 2: Apply adaptive eligibility in phrase selection

Files: autoload/class.Cron.php

Update `Cron::post_phrases_positions_dfs3()` so it no longer blindly selects the first active unchecked phrase and sends it daily.

Required behavior:
- Keep all existing active date filters for phrase/site `date_start` and `date_end`.
- Keep `ds_id IS NULL` so already posted tasks are not duplicated.
- Preserve manual `days_offset`: if present, the phrase is due only when `DATE_ADD(last_checked, INTERVAL days_offset DAY) <= CURRENT_DATE`, plus the existing `last_checked = '2012-01-01'` refresh behavior.
- For phrases with empty `days_offset`, compute automatic due status from recent positions and `last_checked`.
- Ensure an ineligible stable phrase cannot block later eligible phrases. If SQL cannot express the adaptive rule cleanly, select a bounded candidate pool ordered by site/name/phrase and iterate in PHP until the first due phrase is found.
- If no due phrase exists, return `[ 'status' => 'empty' ]` as before.
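
The due rule for a single phrase could look roughly like the sketch below, which mirrors the `DATE_ADD(last_checked, INTERVAL ... DAY) <= CURRENT_DATE` comparison in PHP. The function name is hypothetical. Treating an empty `last_checked` as due and treating `2012-01-01` as an always-due marker are assumptions about the existing behavior, and `last_checked` is assumed to be a plain `Y-m-d` date.

```php
<?php
/**
 * @param string|null $lastChecked  'Y-m-d' value of last_checked.
 * @param int|null    $daysOffset   Manual interval; null or 0 when not set.
 * @param int         $adaptiveDays Interval produced by the adaptive policy.
 * @param string      $today        'Y-m-d', injected so the rule can be tested.
 * @return bool True when the phrase should be sent now.
 */
function dfsIsDueSketch($lastChecked, $daysOffset, $adaptiveDays, $today)
{
    // Never checked, or carrying the refresh marker: always due (assumed behavior).
    if (empty($lastChecked) || $lastChecked === '2012-01-01') {
        return true;
    }

    // A positive manual days_offset is authoritative; the adaptive value is ignored.
    $interval = ($daysOffset > 0) ? (int) $daysOffset : (int) $adaptiveDays;

    $dueOn = new DateTime($lastChecked);
    $dueOn->modify('+' . $interval . ' day');

    return $dueOn <= new DateTime($today);
}
```

With a manual offset of 7 and a last check four days ago, the phrase stays ineligible even if the adaptive policy would have returned 1 day, which is the behavior AC-1 asks for.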

The candidate-pool approach is acceptable and preferred over risky MySQL 5 window-function assumptions. Keep the pool bounded, e.g. 100-300 active candidates, so a cron hit remains cheap.
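
A minimal sketch of the candidate-pool loop, assuming the rows have already been fetched in site/name/phrase order. The function name, the `row` key in the success result, and the default pool size of 200 are illustrative; in the real query the bound would more likely be a SQL `LIMIT`.

```php
<?php
/**
 * @param array    $candidates Candidate rows, already ordered by the query.
 * @param callable $isDue      Receives one row and returns true when it is due.
 * @param int      $poolLimit  Upper bound on rows inspected per cron hit.
 * @return array The first due row, or the existing "empty" status.
 */
function dfsPickFirstDueSketch(array $candidates, $isDue, $poolLimit = 200)
{
    foreach (array_slice($candidates, 0, $poolLimit) as $row) {
        if (call_user_func($isDue, $row)) {
            return ['status' => 'ok', 'row' => $row];
        }
        // Not due yet: keep scanning so this row cannot block later ones.
    }

    return ['status' => 'empty'];
}
```

Because the loop continues past ineligible rows instead of stopping at the first candidate, a stable phrase at the head of the ordering no longer starves the phrases behind it.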
Verify: `php -l autoload/class.Cron.php`

Done: AC-1, AC-2, and AC-3 satisfied: manual intervals are respected, stable automatic phrases are skipped until due, and eligible later phrases are not blocked.

Task 3: Build cheaper DataForSEO payloads

Files: autoload/class.Cron.php

Replace the current `depth` calculation in `Cron::post_phrases_positions_dfs3()`:
- Remove the current 30/50/100 ladder.
- Use `getDfsDepth()` from Task 1.
- Ensure both localization branches use the same payload policy.
- Keep `priority => 1`, `language_code => "pl"`, `postback_data => "advanced"`, and the existing postback URL flow unchanged.
- Include the computed interval/depth in the returned cron message so operations can see why a phrase was sent, e.g. depth and whether interval is manual or adaptive.

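
One way to keep both localization branches on the same policy is to route them through a single builder, sketched below. The function name and the `$locationPart` parameter are hypothetical, since the plan does not show how the two branches differ. The location value in the usage note is a placeholder, not a real DataForSEO code.

```php
<?php
// Hypothetical name for the hard cap applied at payload-build time.
const DFS_PAYLOAD_MAX_DEPTH = 50;

/**
 * @param string $keyword      Phrase text to check.
 * @param array  $locationPart Branch-specific location fields (not shown in the plan).
 * @param int    $depth        Depth returned by the policy helper.
 * @param string $postbackUrl  Existing postback URL, passed through unchanged.
 * @return array One task entry for the request body.
 */
function dfsBuildTaskSketch($keyword, array $locationPart, $depth, $postbackUrl)
{
    $task = [
        'keyword'       => $keyword,
        'language_code' => 'pl',
        'priority'      => 1,
        'depth'         => min((int) $depth, DFS_PAYLOAD_MAX_DEPTH),
        'postback_url'  => $postbackUrl,
        'postback_data' => 'advanced',
    ];

    // Fixed fields are merged last, so branch-specific input cannot override depth.
    return array_merge($locationPart, $task);
}
```

Calling `dfsBuildTaskSketch('test phrase', ['location_code' => 1234], 100, $url)` produces a task with `depth` clamped to 50, regardless of which branch supplied the location fields.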
Also harden result processing only where it directly protects the new policy:
- Initialize `$phrase_position` and `$site_url` before looping through result items.
- If the domain is not found in returned top 50, store position `0` or the existing "not found" representation used by the app, without PHP notices.
- Do not expand result retrieval beyond the posted task result.
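
The hardening described above might look like the following sketch. The item shape (`url` and `rank_group` keys), the host-based domain match, and the use of `0` for "not found" are assumptions; the existing matching logic in `Cron::get_phrases_positions_dfs3()` may differ and should take precedence.

```php
<?php
/**
 * @param array  $items      Result items; each is assumed to carry 'url' and 'rank_group'.
 * @param string $siteDomain Tracked domain without "www.", e.g. "example.com".
 * @return array [position, url]; position 0 stands for "not found" in this sketch.
 */
function dfsFindPositionSketch(array $items, $siteDomain)
{
    // Initialized up front so a miss never reads an undefined variable.
    $phrase_position = 0;
    $site_url = '';

    foreach ($items as $item) {
        if (!isset($item['url'], $item['rank_group'])) {
            continue; // skip incomplete or non-organic items
        }
        $host = parse_url($item['url'], PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // malformed URL
        }
        $host = preg_replace('/^www\./i', '', $host);
        if (strcasecmp($host, $siteDomain) === 0) {
            $phrase_position = (int) $item['rank_group'];
            $site_url = $item['url'];
            break; // first match is the best-ranked one
        }
    }

    return [$phrase_position, $site_url];
}
```

When the domain is absent from the returned items, the function falls through with the initial values, so the caller can store the not-found representation without triggering PHP notices.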
Verify: `php -l autoload/class.Cron.php`

Done: AC-4 and AC-5 satisfied: no request exceeds depth 50 and the existing DataForSEO post/get lifecycle still writes positions correctly.

DO NOT CHANGE

  • Do not add or run database migrations in this plan.
  • Do not modify dsf.php unless implementation discovers the current postback marker is incompatible with the unchanged flow.
  • Do not change DataForSEO credentials or move secrets in this plan.
  • Do not replace DataForSEO with another provider in this plan.
  • Do not change UI templates or admin panels.

SCOPE LIMITS

  • This is backend cost control only.
  • The maximum tracked position becomes top 50 for DataForSEO checks.
  • Existing historical positions beyond top 50 are not rewritten.
  • Security debt listed in .paul/codebase/concerns.md remains deferred unless it directly blocks this plan.

Before declaring plan complete:
- [ ] `php -l autoload/class.Cron.php`
- [ ] Review `Cron::post_phrases_positions_dfs3()` and confirm `depth` cannot exceed 50.
- [ ] Review manual `days_offset` path and confirm it remains authoritative.
- [ ] Review automatic interval path and confirm stable phrases cannot block other due candidates.
- [ ] Review `Cron::get_phrases_positions_dfs3()` and confirm no undefined-position notices are introduced for not-found-in-top-50 results.

<success_criteria>

  • DataForSEO v3 Google Organic task payloads never request more than top 50.
  • Phrases with manual days_offset keep their configured schedule.
  • Phrases without manual days_offset get adaptive scheduling based on recent stability.
  • Cron still returns ok when a task is posted and empty when nothing is due.
  • Existing DataForSEO postback/get result flow continues to update ranking tables.

</success_criteria>

After completion, create `.paul/phases/01-dataforseo-cost-optimization/01-01-SUMMARY.md`.