---
phase: 01-dataforseo-cost-optimization
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - autoload/class.Cron.php
autonomous: true
delegation: off
---

## Goal

Reduce DataForSEO spending for Google rank checks by making cron scheduling and SERP depth adaptive, with a hard maximum of top 50 results.

## Purpose

DataForSEO Organic SERP pricing changed on 2025-09-19 so the base price covers only the first page of 10 organic results; deeper `depth` values now cost more. The current cron logic uses `depth` values 30, 50, and 100, which makes many checks substantially more expensive. This plan keeps rank tracking useful while cutting avoidable API spend.

## Output

Backend-only changes in `autoload/class.Cron.php`:

- manual `days_offset` remains an explicit override
- phrases without manual interval get an adaptive interval based on recent rank stability
- DataForSEO requests never use `depth` above 50
- deep checks are reduced for stable or low-priority cases

- **Depth policy** - How aggressively should the DataForSEO check depth be cut? -> Answer: The user needed this spelled out. An adaptive cost policy was adopted together with a hard limit: nothing beyond position 50 is checked.
- **Intervals** - Can we change how often phrases are checked via `days_offset`? -> Answer: Yes, provided a mechanism is built that automatically checks stable phrases less often and unstable phrases more often, while respecting phrases that have a hard-coded interval.
- **Scope** - Should the plan cover the admin panel? -> Answer: It can be backend-only.
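
To make the Purpose section concrete, the following minimal sketch counts how many 10-result pages a given `depth` spans. It assumes that cost grows with the number of pages requested; actual per-page prices are not modelled and should be taken from the DataForSEO pricing FAQ. The function name `dfsResultPages` is hypothetical and does not exist in `class.Cron.php`.

```php
<?php
// Hypothetical helper, for illustration only.
// Assumption: a request is billed by how many 10-result pages its depth spans.
// No prices are modelled here; only the page count is computed.

/**
 * Number of 10-result pages needed to cover $depth organic results.
 */
function dfsResultPages($depth)
{
    // Treat anything below 1 as a single-page request.
    $depth = max(1, (int) $depth);

    return (int) ceil($depth / 10);
}
```

Under this assumption, the old `depth` values 30, 50, and 100 span 3, 5, and 10 pages, while a request capped at 50 never spans more than 5.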
## Project Context

@.paul/PROJECT.md
@.paul/ROADMAP.md
@.paul/STATE.md
@.paul/codebase/architecture.md
@.paul/codebase/db_schema.md
@.paul/codebase/concerns.md

## Source Files

@autoload/class.Cron.php
@cron.php
@dsf.php
@autoload/factory/class.Ranker.php

## External Pricing Context

- DataForSEO Organic SERP pricing/depth update FAQ: https://dataforseo.com/help-center/serp-api-pricing-depth-update-faq
- DataForSEO SERP additional cost explanation: https://dataforseo.com/help-center/serp-api-cost-explained
- DataForSEO Google Organic Task POST docs: https://docs.dataforseo.com/v3/serp/google/organic/task_post/

## AC-1: Manual Intervals Stay Authoritative

```gherkin
Given a phrase has `days_offset` set to a positive value
When `Cron::post_phrases_positions_dfs3()` selects phrases for DataForSEO
Then the phrase is eligible only when its manual interval is due
And the adaptive interval does not shorten or override that manual setting
```

## AC-2: Stable Phrases Are Checked Less Often

```gherkin
Given a phrase has no manual `days_offset`
And its recent recorded positions are stable
When the DataForSEO posting cron runs
Then the phrase is not sent every day
And its next eligibility is delayed according to the adaptive stability rule
```

## AC-3: Unstable Or New Phrases Stay Fresh

```gherkin
Given a phrase has no manual `days_offset`
And it is new, missing recent history, or has volatile recent positions
When the DataForSEO posting cron runs
Then the phrase remains eligible more frequently than stable phrases
And rank tracking does not become stale for unstable keywords
```

## AC-4: DataForSEO Depth Is Capped At 50

```gherkin
Given any active phrase selected for a DataForSEO v3 Google Organic task
When the request payload is built
Then the request `depth` is never greater than 50
And no request attempts to check positions beyond top 50
```

## AC-5: Current Data Flow Remains Compatible

```gherkin
Given DataForSEO returns a completed task through the existing postback flow
When `Cron::get_phrases_positions_dfs3()` processes the result
Then position rows continue to be inserted or updated as before
And `last_checked`, `ds_id`, `ds_ready`, and `filled_missing_positions` continue to be maintained
```

## Task 1: Add DataForSEO cost policy helpers

Files: `autoload/class.Cron.php`

Add small private/static helper methods inside `Cron` for the DataForSEO v3 flow:

- `getDfsRecentPositions($phrase_id, $limit)` to read the latest non-empty positions from `pro_rr_phrases_positions`.
- `getDfsPositionVolatility($positions)` to classify recent movement using simple absolute deltas.
- `getDfsAdaptiveIntervalDays($row, $positions)` to return the automatic interval for phrases where `days_offset` is empty.
- `getDfsDepth($last_position, $positions)` to return a capped depth.

Policy to implement:

- Hard cap: `depth <= 50` always.
- If no previous position exists: use `depth = 50`, because first discovery still needs a reasonable search window but must not exceed top 50.
- Last position 1-10: use `depth = 10` for stable phrases, `depth = 20` for volatile phrases.
- Last position 11-20: use `depth = 20` for stable phrases, `depth = 30` for volatile phrases.
- Last position 21-50: use `depth = 50`.
- Last position >50: use `depth = 50`; if not found again, store/handle as not found according to the existing result behavior.

Adaptive interval policy:

- Missing or short history: 1 day.
- Volatile phrase, e.g. max movement in recent checks greater than 5 positions: 1 day.
- Mild movement, e.g. max movement 2-5 positions: 2 days.
- Stable top 10, e.g. max movement 0-1 position across at least 5 recent checks: 3 days.
- Stable positions 11-50 across at least 5 recent checks: 5 days.
- Keep thresholds as named local constants or clearly named helper variables, not scattered magic numbers.

Avoid:

- Adding a new database column in this plan; use existing `days_offset`, `last_checked`, and position history.
- Introducing Composer or a new framework.
- Changing credentials handling in this plan; it is known debt but separate from cost control.

Verify: `php -l autoload/class.Cron.php`

Done: AC-2, AC-3, and AC-4 have clear helper logic available for the cron selection and payload code.

## Task 2: Apply adaptive eligibility in phrase selection

Files: `autoload/class.Cron.php`

Update `Cron::post_phrases_positions_dfs3()` so it no longer blindly selects the first active unchecked phrase and sends it daily.

Required behavior:

- Keep all existing active date filters for phrase/site `date_start` and `date_end`.
- Keep `ds_id IS NULL` so already posted tasks are not duplicated.
- Preserve manual `days_offset`: if present, the phrase is due only when `DATE_ADD(last_checked, INTERVAL days_offset DAY) <= CURRENT_DATE`, plus the existing `last_checked = '2012-01-01'` refresh behavior.
- For phrases with empty `days_offset`, compute automatic due status from recent positions and `last_checked`.
- Ensure an ineligible stable phrase cannot block later eligible phrases. If SQL cannot express the adaptive rule cleanly, select a bounded candidate pool ordered by site/name/phrase and iterate in PHP until the first due phrase is found.
- If no due phrase exists, return `[ 'status' => 'empty' ]` as before.

The candidate-pool approach is acceptable and preferred over risky MySQL 5 window-function assumptions. Keep the pool bounded, e.g. 100-300 active candidates, so a cron hit remains cheap.

Verify: `php -l autoload/class.Cron.php`

Done: AC-1, AC-2, and AC-3 satisfied: manual intervals are respected, stable automatic phrases are skipped until due, and eligible later phrases are not blocked.

## Task 3: Build cheaper DataForSEO payloads

Files: `autoload/class.Cron.php`

Replace the current `depth` calculation in `Cron::post_phrases_positions_dfs3()`:

- Remove the current 30/50/100 ladder.
- Use `getDfsDepth()` from Task 1.
- Ensure both localization branches use the same payload policy.
- Keep `priority => 1`, `language_code => "pl"`, `postback_data => "advanced"`, and the existing postback URL flow unchanged.
- Include the computed interval/depth in the returned cron message so operations can see why a phrase was sent, e.g. depth and whether interval is manual or adaptive.

Also harden result processing only where it directly protects the new policy:

- Initialize `$phrase_position` and `$site_url` before looping through result items.
- If the domain is not found in returned top 50, store position `0` or the existing "not found" representation used by the app, without PHP notices.
- Do not expand result retrieval beyond the posted task result.

Verify: `php -l autoload/class.Cron.php`

Done: AC-4 and AC-5 satisfied: no request exceeds depth 50 and the existing DataForSEO post/get lifecycle still writes positions correctly.

## DO NOT CHANGE

- Do not add or run database migrations in this plan.
- Do not modify `dsf.php` unless implementation discovers the current postback marker is incompatible with the unchanged flow.
- Do not change DataForSEO credentials or move secrets in this plan.
- Do not replace DataForSEO with another provider in this plan.
- Do not change UI templates or admin panels.

## SCOPE LIMITS

- This is backend cost control only.
- The maximum tracked position becomes top 50 for DataForSEO checks.
- Existing historical positions beyond top 50 are not rewritten.
- Security debt listed in `.paul/codebase/concerns.md` remains deferred unless it directly blocks this plan.

Before declaring plan complete:

- [ ] `php -l autoload/class.Cron.php`
- [ ] Review `Cron::post_phrases_positions_dfs3()` and confirm `depth` cannot exceed 50.
- [ ] Review manual `days_offset` path and confirm it remains authoritative.
- [ ] Review automatic interval path and confirm stable phrases cannot block other due candidates.
- [ ] Review `Cron::get_phrases_positions_dfs3()` and confirm no undefined-position notices are introduced for not-found-in-top-50 results.
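
The depth and interval rules listed under Task 1 can be sketched as plain functions. This is an illustration under stated assumptions, not the contents of `class.Cron.php`: the constant names, the `'unknown'`/`'stable'`/`'mild'`/`'volatile'` labels, and the newest-first ordering of `$positions` are hypothetical. "Short history" is assumed to mean fewer than 5 recorded positions, any phrase that is not stable is assumed to get the wider depth, and the `$row` argument from the plan's signature is omitted because the sketch only needs positions. Type declarations are left out because the project's PHP version is not stated.

```php
<?php
// Illustrative sketch of the Task 1 policy. All names below are assumptions.
// $positions is assumed to hold non-empty positions, newest first.

const DFS_MAX_DEPTH = 50;      // hard cap from the plan
const DFS_MIN_HISTORY = 5;     // assumed meaning of "short history"
const DFS_VOLATILE_ABOVE = 5;  // max movement > 5 positions => volatile
const DFS_STABLE_UP_TO = 1;    // max movement 0-1 positions => stable

/** Classify recent movement from absolute deltas between consecutive checks. */
function getDfsPositionVolatility($positions)
{
    if (count($positions) < DFS_MIN_HISTORY) {
        return 'unknown';
    }
    $maxMove = 0;
    for ($i = 1, $n = count($positions); $i < $n; $i++) {
        $maxMove = max($maxMove, abs((int) $positions[$i] - (int) $positions[$i - 1]));
    }
    if ($maxMove > DFS_VOLATILE_ABOVE) {
        return 'volatile';
    }
    return $maxMove > DFS_STABLE_UP_TO ? 'mild' : 'stable';
}

/** Automatic interval in days for a phrase without manual days_offset. */
function getDfsAdaptiveIntervalDays($positions)
{
    $volatility = getDfsPositionVolatility($positions);
    if ($volatility === 'unknown' || $volatility === 'volatile') {
        return 1;
    }
    if ($volatility === 'mild') {
        return 2;
    }
    $last = (int) $positions[0];
    if ($last >= 1 && $last <= 10) {
        return 3;
    }
    if ($last >= 11 && $last <= DFS_MAX_DEPTH) {
        return 5;
    }
    // Assumption: the plan does not cover stable phrases outside top 50.
    return 1;
}

/** Request depth for the next check; never above DFS_MAX_DEPTH. */
function getDfsDepth($lastPosition, $positions)
{
    if ($lastPosition === null || (int) $lastPosition < 1) {
        return DFS_MAX_DEPTH; // first discovery or not found
    }
    $stable = getDfsPositionVolatility($positions) === 'stable';
    if ($lastPosition <= 10) {
        $depth = $stable ? 10 : 20;
    } elseif ($lastPosition <= 20) {
        $depth = $stable ? 20 : 30;
    } else {
        $depth = DFS_MAX_DEPTH;
    }
    return min($depth, DFS_MAX_DEPTH);
}
```

Keeping the final `min()` in `getDfsDepth()` is a deliberate redundancy: even if a later edit adds a wider branch, the returned value still cannot exceed the cap required by AC-4.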

Success criteria:

- DataForSEO v3 Google Organic task payloads never request more than top 50.
- Phrases with manual `days_offset` keep their configured schedule.
- Phrases without manual `days_offset` get adaptive scheduling based on recent stability.
- Cron still returns `ok` when a task is posted and `empty` when nothing is due.
- Existing DataForSEO postback/get result flow continues to update ranking tables.

After completion, create `.paul/phases/01-dataforseo-cost-optimization/01-01-SUMMARY.md`.
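
The candidate-pool selection described in Task 2 could look roughly like the sketch below. It assumes a hypothetical row shape (`id`, `days_offset`, `last_checked` as `Y-m-d`) and takes the adaptive interval as a callback so the example stays self-contained. The function names `dfsIsDue` and `dfsPickFirstDue` are invented for illustration, and treating `2012-01-01` as a force-refresh marker is an assumption about the existing behavior.

```php
<?php
// Illustrative sketch only. Row keys and function names are assumptions.

/** True when a phrase last checked on $row['last_checked'] is due again. */
function dfsIsDue(array $row, $intervalDays, DateTimeImmutable $today)
{
    // Never checked, or carrying the refresh marker: treat as due.
    // Assumption: '2012-01-01' forces a recheck in the existing flow.
    if (empty($row['last_checked']) || $row['last_checked'] === '2012-01-01') {
        return true;
    }
    $lastChecked = new DateTimeImmutable($row['last_checked']);
    $dueOn = $lastChecked->modify('+' . (int) $intervalDays . ' days');

    return $dueOn <= $today;
}

/**
 * Walk a bounded, pre-ordered candidate pool and return the first due phrase.
 * Ineligible rows are skipped, so a stable phrase cannot block later ones.
 * Returns null when nothing in the pool is due.
 */
function dfsPickFirstDue(array $candidates, callable $adaptiveDaysFor, DateTimeImmutable $today)
{
    foreach ($candidates as $row) {
        // A positive days_offset is a manual override and always wins.
        $manual = !empty($row['days_offset']) && (int) $row['days_offset'] > 0;
        $interval = $manual ? (int) $row['days_offset'] : (int) $adaptiveDaysFor($row);

        if (dfsIsDue($row, $interval, $today)) {
            return [
                'row' => $row,
                'interval' => $interval,
                'mode' => $manual ? 'manual' : 'adaptive',
            ];
        }
    }

    return null;
}
```

A caller would map a `null` result to the existing `[ 'status' => 'empty' ]` response, and could include the returned `interval` and `mode` in the cron message so operations can see why a phrase was sent.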