| phase | plan | type | wave | depends_on | files_modified | autonomous | delegation |
|---|---|---|---|---|---|---|---|
| 01-dataforseo-cost-optimization | 01 | execute | 1 | | | true | off |
Purpose
DataForSEO Organic SERP pricing changed on 2025-09-19 so the base price covers only the first page of 10 organic results; deeper depth values now cost more. The current cron logic uses depth values 30, 50, and 100, which makes many checks substantially more expensive. This plan keeps rank tracking useful while cutting avoidable API spend.
Output
Backend-only changes in autoload/class.Cron.php:
- manual `days_offset` remains an explicit override
- phrases without manual interval get an adaptive interval based on recent rank stability
- DataForSEO requests never use `depth` above 50
- deep checks are reduced for stable or low-priority cases
Project Context
@.paul/PROJECT.md @.paul/ROADMAP.md @.paul/STATE.md @.paul/codebase/architecture.md @.paul/codebase/db_schema.md @.paul/codebase/concerns.md
Source Files
@autoload/class.Cron.php @cron.php @dsf.php @autoload/factory/class.Ranker.php
External Pricing Context
- DataForSEO Organic SERP pricing/depth update FAQ: https://dataforseo.com/help-center/serp-api-pricing-depth-update-faq
- DataForSEO SERP additional cost explanation: https://dataforseo.com/help-center/serp-api-cost-explained
- DataForSEO Google Organic Task POST docs: https://docs.dataforseo.com/v3/serp/google/organic/task_post/
<acceptance_criteria>
AC-1: Manual Intervals Stay Authoritative
Given a phrase has `days_offset` set to a positive value
When `Cron::post_phrases_positions_dfs3()` selects phrases for DataForSEO
Then the phrase is eligible only when its manual interval is due
And the adaptive interval does not shorten or override that manual setting
AC-2: Stable Phrases Are Checked Less Often
Given a phrase has no manual `days_offset`
And its recent recorded positions are stable
When the DataForSEO posting cron runs
Then the phrase is not sent every day
And its next eligibility is delayed according to the adaptive stability rule
AC-3: Unstable Or New Phrases Stay Fresh
Given a phrase has no manual `days_offset`
And it is new, missing recent history, or has volatile recent positions
When the DataForSEO posting cron runs
Then the phrase remains eligible more frequently than stable phrases
And rank tracking does not become stale for unstable keywords
AC-4: DataForSEO Depth Is Capped At 50
Given any active phrase selected for a DataForSEO v3 Google Organic task
When the request payload is built
Then the request `depth` is never greater than 50
And no request attempts to check positions beyond top 50
AC-5: Current Data Flow Remains Compatible
Given DataForSEO returns a completed task through the existing postback flow
When `Cron::get_phrases_positions_dfs3()` processes the result
Then position rows continue to be inserted or updated as before
And `last_checked`, `ds_id`, `ds_ready`, and `filled_missing_positions` continue to be maintained
</acceptance_criteria>
Task 1: Add DataForSEO cost policy helpers
autoload/class.Cron.php
Add small private/static helper methods inside `Cron` for the DataForSEO v3 flow:
- `getDfsRecentPositions($phrase_id, $limit)` to read the latest non-empty positions from `pro_rr_phrases_positions`.
- `getDfsPositionVolatility($positions)` to classify recent movement using simple absolute deltas.
- `getDfsAdaptiveIntervalDays($row, $positions)` to return the automatic interval for phrases where `days_offset` is empty.
- `getDfsDepth($last_position, $positions)` to return a capped depth.
Policy to implement:
- Hard cap: `depth <= 50` always.
- If no previous position exists: use `depth = 50`, because first discovery still needs a reasonable search window but must not exceed top 50.
- Last position 1-10: use `depth = 10` for stable phrases, `depth = 20` for volatile phrases.
- Last position 11-20: use `depth = 20` for stable phrases, `depth = 30` for volatile phrases.
- Last position 21-50: use `depth = 50`.
- Last position >50: use `depth = 50`; if not found again, store/handle as not found according to the existing result behavior.
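The depth ladder above can be sketched as a single pure function. This is a minimal illustration, not the final `getDfsDepth()`: the name `dfsDepthSketch`, the constant `DFS_MAX_DEPTH`, and the boolean `$isVolatile` argument (standing in for the `$positions` history) are hypothetical, and it assumes "no previous position" is represented as `null` or `0`.

```php
<?php
// Hypothetical sketch of the depth policy; values mirror the list above.
const DFS_MAX_DEPTH = 50;

/**
 * @param int|null $lastPosition Last recorded position; null or 0 is assumed to mean "unknown / not found".
 * @param bool     $isVolatile   Volatility flag, assumed to be computed elsewhere from recent positions.
 * @return int Depth to request, never above DFS_MAX_DEPTH.
 */
function dfsDepthSketch($lastPosition, $isVolatile)
{
    if ($lastPosition === null || (int) $lastPosition <= 0) {
        $depth = 50;                        // first discovery or previously not found
    } elseif ($lastPosition <= 10) {
        $depth = $isVolatile ? 20 : 10;     // top 10
    } elseif ($lastPosition <= 20) {
        $depth = $isVolatile ? 30 : 20;     // positions 11-20
    } else {
        $depth = 50;                        // positions 21-50 and anything deeper
    }

    // Hard cap applied last, so a future change to the ladder cannot exceed it.
    return min($depth, DFS_MAX_DEPTH);
}
```

Applying the cap as the final step keeps the `depth <= 50` rule true even if a branch above is edited later.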
Adaptive interval policy:
- Missing or short history: 1 day.
- Volatile phrase, e.g. max movement in recent checks greater than 5 positions: 1 day.
- Mild movement, e.g. max movement 2-5 positions: 2 days.
- Stable top 10, e.g. max movement 0-1 position across at least 5 recent checks: 3 days.
- Stable positions 11-50 across at least 5 recent checks: 5 days.
- Keep thresholds as named local constants or clearly named helper variables, not scattered magic numbers.
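A rough sketch of how the interval rules could map to code, using the example thresholds from the list. The function and constant names are hypothetical. It also makes three assumptions the plan does not state: `$positions` is ordered newest first, "movement" means the largest absolute change between consecutive checks, and "short history" means fewer than 5 checks. Stable phrases outside 1-50 fall back to 1 day, which is a conservative assumption rather than part of the policy.

```php
<?php
// Hypothetical named thresholds; values are the examples from the policy list.
const DFS_MIN_HISTORY    = 5; // assumed meaning of "short history": fewer checks than this
const DFS_VOLATILE_DELTA = 5; // movement greater than this is volatile
const DFS_STABLE_DELTA   = 1; // movement up to this is stable

/** Largest absolute change between consecutive recorded positions. */
function dfsMaxMovementSketch(array $positions)
{
    $positions = array_values($positions);
    $max = 0;
    for ($i = 1; $i < count($positions); $i++) {
        $max = max($max, abs($positions[$i] - $positions[$i - 1]));
    }
    return $max;
}

/** Automatic interval in days for a phrase without manual days_offset. */
function dfsAdaptiveIntervalSketch(array $positions)
{
    $positions = array_values($positions);
    if (count($positions) < DFS_MIN_HISTORY) {
        return 1; // missing or short history
    }

    $movement = dfsMaxMovementSketch($positions);
    if ($movement > DFS_VOLATILE_DELTA) {
        return 1; // volatile
    }
    if ($movement > DFS_STABLE_DELTA) {
        return 2; // mild movement
    }

    $latest = (int) $positions[0]; // assumes newest first
    if ($latest >= 1 && $latest <= 10) {
        return 3; // stable top 10
    }
    if ($latest >= 11 && $latest <= 50) {
        return 5; // stable 11-50
    }
    return 1; // assumption: anything else stays on the daily schedule
}
```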
Avoid:
- Adding a new database column in this plan; use existing `days_offset`, `last_checked`, and position history.
- Introducing Composer or a new framework.
- Changing credentials handling in this plan; it is known debt but separate from cost control.
php -l autoload/class.Cron.php
AC-2, AC-3, and AC-4 have clear helper logic available for the cron selection and payload code.
Task 2: Apply adaptive eligibility in phrase selection
autoload/class.Cron.php
Update `Cron::post_phrases_positions_dfs3()` so it no longer blindly selects the first active unchecked phrase and sends it daily.
Required behavior:
- Keep all existing active date filters for phrase/site `date_start` and `date_end`.
- Keep `ds_id IS NULL` so already posted tasks are not duplicated.
- Preserve manual `days_offset`: if present, the phrase is due only when `DATE_ADD(last_checked, INTERVAL days_offset DAY) <= CURRENT_DATE`, plus the existing `last_checked = '2012-01-01'` refresh behavior.
- For phrases with empty `days_offset`, compute automatic due status from recent positions and `last_checked`.
- Ensure an ineligible stable phrase cannot block later eligible phrases. If SQL cannot express the adaptive rule cleanly, select a bounded candidate pool ordered by site/name/phrase and iterate in PHP until the first due phrase is found.
- If no due phrase exists, return `[ 'status' => 'empty' ]` as before.
The candidate-pool approach is acceptable and preferred over risky MySQL 5 window-function assumptions. Keep the pool bounded, e.g. 100-300 active candidates, so a cron hit remains cheap.
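One possible shape for the PHP-side iteration over the candidate pool is sketched below. It assumes the pool has already been fetched in the intended order as associative rows with `days_offset` and `last_checked` (`Y-m-d`). The function name, the `$autoInterval` callback (a stand-in for the adaptive interval helper), and the added `interval_days` / `interval_source` keys are hypothetical. Treating an empty `last_checked` as due is also an assumption.

```php
<?php
/**
 * Returns the first due candidate (with interval details attached) or null.
 *
 * @param array    $candidates   Pre-ordered rows with 'days_offset' and 'last_checked'.
 * @param callable $autoInterval Hypothetical callback: function (array $row) returning days.
 * @param string   $today        Current date as Y-m-d.
 */
function dfsFirstDuePhraseSketch(array $candidates, $autoInterval, $today)
{
    $todayTs = strtotime($today);

    foreach ($candidates as $row) {
        $manual   = (int) $row['days_offset'];
        $isManual = $manual > 0;
        // A manual interval always wins; the adaptive rule is only a fallback.
        $interval = $isManual ? $manual : (int) call_user_func($autoInterval, $row);

        if (empty($row['last_checked'])) {
            $isDue = true; // assumption: never-checked phrases are due
        } else {
            // A sentinel such as 2012-01-01 lands far in the past, so it is due here too.
            $dueTs = strtotime($row['last_checked'] . ' +' . $interval . ' days');
            $isDue = $dueTs <= $todayTs;
        }

        if ($isDue) {
            $row['interval_days']   = $interval;
            $row['interval_source'] = $isManual ? 'manual' : 'adaptive';
            return $row;
        }
        // Not due: keep scanning so this phrase cannot block later ones.
    }

    return null; // caller would map this to [ 'status' => 'empty' ]
}
```

Because the loop continues past rows that are not due, a stable phrase at the head of the ordering delays only itself.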
php -l autoload/class.Cron.php
AC-1, AC-2, and AC-3 satisfied: manual intervals are respected, stable automatic phrases are skipped until due, and eligible later phrases are not blocked.
Task 3: Build cheaper DataForSEO payloads
autoload/class.Cron.php
Replace the current `depth` calculation in `Cron::post_phrases_positions_dfs3()`:
- Remove the current 30/50/100 ladder.
- Use `getDfsDepth()` from Task 1.
- Ensure both localization branches use the same payload policy.
- Keep `priority => 1`, `language_code => "pl"`, `postback_data => "advanced"`, and the existing postback URL flow unchanged.
- Include the computed interval/depth in the returned cron message so operations can see why a phrase was sent, e.g. depth and whether interval is manual or adaptive.
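To keep both localization branches on the same policy, one option is to route each branch's task array through a shared helper that sets `depth`, and to build the cron message in one place. The sketch below is illustrative only: the function names, the `message` key, and the message wording are hypothetical, and the existing task fields are passed through untouched rather than reconstructed here.

```php
<?php
// Hypothetical cap constant, matching the hard cap in the policy.
const DFS_DEPTH_CAP = 50;

/**
 * Applies the depth policy to an already built task array.
 * All other fields (priority, language_code, postback settings) are left as they are.
 */
function dfsBuildTaskSketch(array $baseTask, $depth)
{
    $baseTask['depth'] = min((int) $depth, DFS_DEPTH_CAP);
    return $baseTask;
}

/** Builds a cron result that shows why the phrase was sent. */
function dfsCronResultSketch($phrase, $depth, $intervalDays, $isManual)
{
    $source = $isManual ? 'manual' : 'adaptive';
    return [
        'status'  => 'ok',
        'message' => sprintf(
            'Posted "%s": depth=%d, interval=%dd (%s)',
            $phrase,
            $depth,
            $intervalDays,
            $source
        ),
    ];
}
```

Each branch would then differ only in the location-related fields it puts into `$baseTask` before calling the helper.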
Also harden result processing only where it directly protects the new policy:
- Initialize `$phrase_position` and `$site_url` before looping through result items.
- If the domain is not found in returned top 50, store position `0` or the existing "not found" representation used by the app, without PHP notices.
- Do not expand result retrieval beyond the posted task result.
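The hardening above might look roughly like the following. This sketch assumes each result item is an associative array with `type`, `domain`, `rank_group`, and `url` keys, and that "not found" is stored as position `0`. The function name and the `www.`-insensitive comparison are hypothetical and should be checked against what `get_phrases_positions_dfs3()` does today.

```php
<?php
/**
 * Finds the tracked domain in the returned items without raising PHP notices.
 *
 * @param array  $items  Result items from the posted task (assumed shape, see above).
 * @param string $domain Tracked site domain.
 * @return array ['position' => int, 'url' => string]; position 0 means not found.
 */
function dfsFindDomainPositionSketch(array $items, $domain)
{
    // Initialized before the loop so they are always defined afterwards.
    $phrase_position = 0;
    $site_url = '';

    $needle = preg_replace('/^www\./i', '', strtolower($domain));

    foreach ($items as $item) {
        if (!isset($item['type'], $item['domain'], $item['rank_group'])) {
            continue; // skip items that lack the fields this sketch relies on
        }
        if ($item['type'] !== 'organic') {
            continue; // only organic results count as positions
        }
        $host = preg_replace('/^www\./i', '', strtolower($item['domain']));
        if ($host === $needle) {
            $phrase_position = (int) $item['rank_group'];
            $site_url = isset($item['url']) ? $item['url'] : '';
            break; // first match is the best position
        }
    }

    return ['position' => $phrase_position, 'url' => $site_url];
}
```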
php -l autoload/class.Cron.php
AC-4 and AC-5 satisfied: no request exceeds depth 50 and the existing DataForSEO post/get lifecycle still writes positions correctly.
DO NOT CHANGE
- Do not add or run database migrations in this plan.
- Do not modify `dsf.php` unless implementation discovers the current postback marker is incompatible with the unchanged flow.
- Do not change DataForSEO credentials or move secrets in this plan.
- Do not replace DataForSEO with another provider in this plan.
- Do not change UI templates or admin panels.
SCOPE LIMITS
- This is backend cost control only.
- The maximum tracked position becomes top 50 for DataForSEO checks.
- Existing historical positions beyond top 50 are not rewritten.
- Security debt listed in `.paul/codebase/concerns.md` remains deferred unless it directly blocks this plan.
<success_criteria>
- DataForSEO v3 Google Organic task payloads never request more than top 50.
- Phrases with manual `days_offset` keep their configured schedule.
- Phrases without manual `days_offset` get adaptive scheduling based on recent stability.
- Cron still returns `ok` when a task is posted and `empty` when nothing is due.
- Existing DataForSEO postback/get result flow continues to update ranking tables.
</success_criteria>