---
phase: 01-dataforseo-cost-optimization
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - autoload/class.Cron.php
autonomous: true
delegation: off
---

<objective>
## Goal
Reduce DataForSEO spending for Google rank checks by making cron scheduling and SERP depth adaptive, with a hard maximum of top 50 results.

## Purpose
DataForSEO Organic SERP pricing changed on 2025-09-19 so the base price covers only the first page of 10 organic results; deeper `depth` values now cost more. The current cron logic uses `depth` values 30, 50, and 100, which makes many checks substantially more expensive. This plan keeps rank tracking useful while cutting avoidable API spend.

## Output
Backend-only changes in `autoload/class.Cron.php`:
- manual `days_offset` remains an explicit override
- phrases without a manual interval get an adaptive interval based on recent rank stability
- DataForSEO requests never use `depth` above 50
- deep checks are reduced for stable or low-priority cases
</objective>

<context>
<clarifications>
- **Depth policy** - How aggressively should the DataForSEO check depth be cut?
  -> Answer: The user needed the options spelled out. An adaptive cost policy was adopted together with a hard limit: nothing is checked beyond position 50.
- **Intervals** - Can the phrase check frequency be changed through `days_offset`?
  -> Answer: Yes, provided a mechanism is built that automatically checks stable phrases less often and unstable phrases more often, while respecting phrases that have a hard-coded interval.
- **Scope** - Should the plan cover the admin panel?
  -> Answer: It can be backend-only.
</clarifications>

## Project Context
@.paul/PROJECT.md
@.paul/ROADMAP.md
@.paul/STATE.md
@.paul/codebase/architecture.md
@.paul/codebase/db_schema.md
@.paul/codebase/concerns.md

## Source Files
@autoload/class.Cron.php
@cron.php
@dsf.php
@autoload/factory/class.Ranker.php

## External Pricing Context
- DataForSEO Organic SERP pricing/depth update FAQ: https://dataforseo.com/help-center/serp-api-pricing-depth-update-faq
- DataForSEO SERP additional cost explanation: https://dataforseo.com/help-center/serp-api-cost-explained
- DataForSEO Google Organic Task POST docs: https://docs.dataforseo.com/v3/serp/google/organic/task_post/
</context>

<acceptance_criteria>

## AC-1: Manual Intervals Stay Authoritative
```gherkin
Given a phrase has `days_offset` set to a positive value
When `Cron::post_phrases_positions_dfs3()` selects phrases for DataForSEO
Then the phrase is eligible only when its manual interval is due
And the adaptive interval does not shorten or override that manual setting
```

## AC-2: Stable Phrases Are Checked Less Often
```gherkin
Given a phrase has no manual `days_offset`
And its recent recorded positions are stable
When the DataForSEO posting cron runs
Then the phrase is not sent every day
And its next eligibility is delayed according to the adaptive stability rule
```

## AC-3: Unstable Or New Phrases Stay Fresh
```gherkin
Given a phrase has no manual `days_offset`
And it is new, missing recent history, or has volatile recent positions
When the DataForSEO posting cron runs
Then the phrase remains eligible more frequently than stable phrases
And rank tracking does not become stale for unstable keywords
```

## AC-4: DataForSEO Depth Is Capped At 50
```gherkin
Given any active phrase selected for a DataForSEO v3 Google Organic task
When the request payload is built
Then the request `depth` is never greater than 50
And no request attempts to check positions beyond top 50
```

## AC-5: Current Data Flow Remains Compatible
```gherkin
Given DataForSEO returns a completed task through the existing postback flow
When `Cron::get_phrases_positions_dfs3()` processes the result
Then position rows continue to be inserted or updated as before
And `last_checked`, `ds_id`, `ds_ready`, and `filled_missing_positions` continue to be maintained
```

</acceptance_criteria>

<tasks>

<task type="auto">
<name>Task 1: Add DataForSEO cost policy helpers</name>
<files>autoload/class.Cron.php</files>
<action>
Add small private/static helper methods inside `Cron` for the DataForSEO v3 flow:
- `getDfsRecentPositions($phrase_id, $limit)` to read the latest non-empty positions from `pro_rr_phrases_positions`.
- `getDfsPositionVolatility($positions)` to classify recent movement using simple absolute deltas.
- `getDfsAdaptiveIntervalDays($row, $positions)` to return the automatic interval for phrases where `days_offset` is empty.
- `getDfsDepth($last_position, $positions)` to return a capped depth.

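As a rough illustration of the volatility helper, the sketch below classifies movement from the largest absolute change between consecutive checks. It is a standalone function rather than the final `Cron` method; the name `dfsPositionVolatility`, the returned labels, and the reading of "movement" as consecutive deltas (rather than max minus min) are assumptions, and positions are assumed to be positive integers ordered newest first.

```php
<?php
// Illustrative only: a standalone stand-in for getDfsPositionVolatility().
// $positions holds recent positions, newest first, e.g. [4, 5, 4, 9].
function dfsPositionVolatility(array $positions)
{
    // Placeholder threshold names; the values follow the interval policy in this task.
    $stableMaxDelta = 1; // movement of 0-1 positions counts as stable
    $mildMaxDelta   = 5; // movement of 2-5 positions counts as mild

    $positions = array_values($positions);
    $count = count($positions);
    if ($count < 2) {
        return 'unknown'; // not enough history to measure movement
    }

    // Largest absolute change between two consecutive checks.
    $maxDelta = 0;
    for ($i = 1; $i < $count; $i++) {
        $delta = abs((int) $positions[$i] - (int) $positions[$i - 1]);
        $maxDelta = max($maxDelta, $delta);
    }

    if ($maxDelta <= $stableMaxDelta) {
        return 'stable';
    }
    return $maxDelta <= $mildMaxDelta ? 'mild' : 'volatile';
}
```

Returning a label instead of a raw number keeps the thresholds in one place, so the interval and depth helpers can branch on the same classification.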
Policy to implement:
- Hard cap: `depth <= 50` always.
- If no previous position exists: use `depth = 50`, because first discovery still needs a reasonable search window but must not exceed top 50.
- Last position 1-10: use `depth = 10` for stable phrases, `depth = 20` for volatile phrases.
- Last position 11-20: use `depth = 20` for stable phrases, `depth = 30` for volatile phrases.
- Last position 21-50: use `depth = 50`.
- Last position >50: use `depth = 50`; if the phrase is again not found, store it as not found, following the existing result behavior.

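The depth ladder above could be sketched roughly as follows. This is a hypothetical standalone function, not the final `getDfsDepth()`: it takes a ready-made `$isStable` flag instead of `$positions`, leaving open how stability is derived, and it assumes that `null` or `0` means "no previous position".

```php
<?php
// Hypothetical sketch of the depth ladder; not the final Cron method.
// $lastPosition: last known position, or null/0 when none exists (assumption).
// $isStable: whether recent movement was classified as stable.
function dfsDepth($lastPosition, $isStable)
{
    $maxDepth = 50; // hard cap from the plan

    if ($lastPosition === null || (int) $lastPosition <= 0) {
        // No previous position: first discovery uses the full capped window.
        $depth = 50;
    } elseif ($lastPosition <= 10) {
        $depth = $isStable ? 10 : 20;
    } elseif ($lastPosition <= 20) {
        $depth = $isStable ? 20 : 30;
    } else {
        // Positions 21-50 and anything previously recorded beyond 50.
        $depth = 50;
    }

    // The final min() keeps the cap in force even if the ladder is edited later.
    return min($depth, $maxDepth);
}
```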
Adaptive interval policy:
- Missing or short history: 1 day.
- Volatile phrase, e.g. max movement in recent checks greater than 5 positions: 1 day.
- Mild movement, e.g. max movement 2-5 positions: 2 days.
- Stable top 10, e.g. max movement 0-1 positions across at least 5 recent checks: 3 days.
- Stable positions 11-50 across at least 5 recent checks: 5 days.
- Keep thresholds as named local constants or clearly named helper variables, not scattered magic numbers.

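One possible shape for the interval rule is sketched below, using the example thresholds from the list above. The function name and single-argument signature are placeholders. The sketch also assumes that "short history" means fewer than 5 checks, that the newest position (index 0) decides the top-10 versus 11-50 tier, and that a stable phrase outside 1-50 falls back to 1 day, a case the policy does not cover.

```php
<?php
// Hypothetical sketch of the adaptive interval rule; not the final Cron method.
// $positions: recent positions, newest first.
function dfsAdaptiveIntervalDays(array $positions)
{
    $minHistory    = 5; // checks required before a phrase may be slowed down (assumption)
    $volatileAbove = 5; // max movement > 5 positions => volatile
    $stableUpTo    = 1; // max movement 0-1 positions => stable

    $positions = array_values($positions);
    $count = count($positions);
    if ($count < $minHistory) {
        return 1; // missing or short history
    }

    // Largest absolute change between two consecutive checks.
    $maxDelta = 0;
    for ($i = 1; $i < $count; $i++) {
        $maxDelta = max($maxDelta, abs((int) $positions[$i] - (int) $positions[$i - 1]));
    }

    if ($maxDelta > $volatileAbove) {
        return 1; // volatile
    }
    if ($maxDelta > $stableUpTo) {
        return 2; // mild movement
    }

    $latest = (int) $positions[0];
    if ($latest >= 1 && $latest <= 10) {
        return 3; // stable top 10
    }
    if ($latest >= 11 && $latest <= 50) {
        return 5; // stable 11-50
    }
    return 1; // outside the policy; conservative default (assumption)
}
```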
Avoid:
- Adding a new database column in this plan; use existing `days_offset`, `last_checked`, and position history.
- Introducing Composer or a new framework.
- Changing credentials handling in this plan; it is known debt but separate from cost control.
</action>
<verify>php -l autoload/class.Cron.php</verify>
<done>AC-2, AC-3, and AC-4 have clear helper logic available for the cron selection and payload code.</done>
</task>

<task type="auto">
<name>Task 2: Apply adaptive eligibility in phrase selection</name>
<files>autoload/class.Cron.php</files>
<action>
Update `Cron::post_phrases_positions_dfs3()` so it no longer blindly selects the first active unchecked phrase and sends it daily.

Required behavior:
- Keep all existing active date filters for phrase/site `date_start` and `date_end`.
- Keep `ds_id IS NULL` so already posted tasks are not duplicated.
- Preserve manual `days_offset`: if present, the phrase is due only when `DATE_ADD(last_checked, INTERVAL days_offset DAY) <= CURRENT_DATE`, plus the existing `last_checked = '2012-01-01'` refresh behavior.
- For phrases with empty `days_offset`, compute automatic due status from recent positions and `last_checked`.
- Ensure an ineligible stable phrase cannot block later eligible phrases. If SQL cannot express the adaptive rule cleanly, select a bounded candidate pool ordered by site/name/phrase and iterate in PHP until the first due phrase is found.
- If no due phrase exists, return `[ 'status' => 'empty' ]` as before.

The candidate-pool approach is acceptable and preferred over SQL that assumes window functions, which MySQL 5 does not provide. Keep the pool bounded, e.g. 100-300 active candidates, so a cron hit remains cheap.
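The candidate-pool iteration could look roughly like the sketch below, which walks an already fetched pool and returns the first due phrase. Everything here is illustrative: the function names, the `adaptive_days` key (standing in for the value the Task 1 interval helper would compute), and the returned array shape are assumptions, and the SQL that fetches the pool is left out.

```php
<?php
// Hypothetical sketch of candidate-pool selection; not the final Cron code.

// PHP counterpart of DATE_ADD(last_checked, INTERVAL n DAY) <= CURRENT_DATE.
function dfsIsDue($lastChecked, $intervalDays, $today)
{
    $due = new DateTime($lastChecked);
    $due->modify('+' . (int) $intervalDays . ' day');
    return $due <= new DateTime($today);
}

// $candidates: bounded, pre-ordered pool of rows with 'days_offset',
// 'last_checked' and a precomputed 'adaptive_days' (assumed key).
function dfsPickFirstDue(array $candidates, $today)
{
    foreach ($candidates as $row) {
        // A positive days_offset is a manual override and always wins.
        $isManual = !empty($row['days_offset']);
        $interval = $isManual ? (int) $row['days_offset'] : (int) $row['adaptive_days'];

        if (dfsIsDue($row['last_checked'], $interval, $today)) {
            return [
                'row'      => $row,
                'interval' => $interval,
                'mode'     => $isManual ? 'manual' : 'adaptive',
            ];
        }
        // Not due: keep scanning so this phrase cannot block later candidates.
    }
    return null; // caller would return [ 'status' => 'empty' ]
}
```

Because the loop skips phrases that are not yet due instead of stopping at them, a stable phrase at the front of the ordering delays only itself.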
</action>
<verify>php -l autoload/class.Cron.php</verify>
<done>AC-1, AC-2, and AC-3 satisfied: manual intervals are respected, stable automatic phrases are skipped until due, and eligible later phrases are not blocked.</done>
</task>

<task type="auto">
<name>Task 3: Build cheaper DataForSEO payloads</name>
<files>autoload/class.Cron.php</files>
<action>
Replace the current `depth` calculation in `Cron::post_phrases_positions_dfs3()`:
- Remove the current 30/50/100 ladder.
- Use `getDfsDepth()` from Task 1.
- Ensure both localization branches use the same payload policy.
- Keep `priority => 1`, `language_code => "pl"`, `postback_data => "advanced"`, and the existing postback URL flow unchanged.
- Include the computed interval/depth in the returned cron message so operations can see why a phrase was sent, e.g. depth and whether interval is manual or adaptive.

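A minimal sketch of a shared payload builder and cron message follows, so both localization branches could go through one place. The field names are those of the DataForSEO task POST docs linked in this plan; the function names, the use of `location_code` (the existing branches may pass a different location field), and the `message` key are assumptions made for illustration.

```php
<?php
// Hypothetical helpers; not the existing Cron code.
function dfsBuildTask($keyword, $locationCode, $depth, $postbackUrl)
{
    $maxDepth = 50; // hard cap, applied again here as a last line of defence

    return [
        'keyword'       => $keyword,
        'location_code' => (int) $locationCode, // assumed location field
        'language_code' => 'pl',
        'priority'      => 1,
        'depth'         => min((int) $depth, $maxDepth),
        'postback_url'  => $postbackUrl,
        'postback_data' => 'advanced',
    ];
}

// Builds the cron response so operations can see why a phrase was sent.
function dfsCronMessage($phrase, $depth, $intervalDays, $isManual)
{
    $mode = $isManual ? 'manual' : 'adaptive';
    return [
        'status'  => 'ok',
        'message' => sprintf(
            'Posted "%s" (depth %d, %s interval %d d)',
            $phrase,
            $depth,
            $mode,
            $intervalDays
        ),
    ];
}
```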
Also harden result processing only where it directly protects the new policy:
- Initialize `$phrase_position` and `$site_url` before looping through result items.
- If the domain is not found in the returned top 50, store position `0` or the existing "not found" representation used by the app, without PHP notices.
- Do not expand result retrieval beyond the posted task result.
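The hardening described above might look like the sketch below, which initializes both values before the loop so a missing domain never triggers an undefined-variable notice. It is not the existing `Cron::get_phrases_positions_dfs3()` logic: the function name, the `www.` stripping, the choice of `rank_group` over `rank_absolute`, counting only `organic` items, and `0` as the "not found" value are all assumptions to be checked against the current code.

```php
<?php
// Hypothetical sketch of notice-free result scanning; not the existing Cron code.
// $items: the result items of one posted task; $siteDomain: bare domain, lower case.
function dfsFindPosition(array $items, $siteDomain)
{
    // Initialized up front so "not found in top 50" needs no special casing.
    $phrasePosition = 0; // assumed "not found" representation
    $siteUrl = '';

    foreach ($items as $item) {
        if (!isset($item['type']) || $item['type'] !== 'organic') {
            continue; // skip non-organic SERP elements
        }

        $domain = isset($item['domain']) ? strtolower($item['domain']) : '';
        if (strpos($domain, 'www.') === 0) {
            $domain = substr($domain, 4);
        }

        if ($domain === $siteDomain) {
            $phrasePosition = isset($item['rank_group']) ? (int) $item['rank_group'] : 0;
            $siteUrl = isset($item['url']) ? $item['url'] : '';
            break; // first organic match wins
        }
    }

    return ['position' => $phrasePosition, 'url' => $siteUrl];
}
```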
</action>
<verify>php -l autoload/class.Cron.php</verify>
<done>AC-4 and AC-5 satisfied: no request exceeds depth 50 and the existing DataForSEO post/get lifecycle still writes positions correctly.</done>
</task>

</tasks>

<boundaries>

## DO NOT CHANGE
- Do not add or run database migrations in this plan.
- Do not modify `dsf.php` unless implementation discovers the current postback marker is incompatible with the unchanged flow.
- Do not change DataForSEO credentials or move secrets in this plan.
- Do not replace DataForSEO with another provider in this plan.
- Do not change UI templates or admin panels.

## SCOPE LIMITS
- This is backend cost control only.
- The deepest position tracked by DataForSEO checks becomes 50.
- Existing historical positions beyond top 50 are not rewritten.
- Security debt listed in `.paul/codebase/concerns.md` remains deferred unless it directly blocks this plan.

</boundaries>

<verification>
Before declaring plan complete:
- [ ] `php -l autoload/class.Cron.php`
- [ ] Review `Cron::post_phrases_positions_dfs3()` and confirm `depth` cannot exceed 50.
- [ ] Review manual `days_offset` path and confirm it remains authoritative.
- [ ] Review automatic interval path and confirm stable phrases cannot block other due candidates.
- [ ] Review `Cron::get_phrases_positions_dfs3()` and confirm no undefined-position notices are introduced for not-found-in-top-50 results.
</verification>

<success_criteria>
- DataForSEO v3 Google Organic task payloads never request more than top 50.
- Phrases with manual `days_offset` keep their configured schedule.
- Phrases without manual `days_offset` get adaptive scheduling based on recent stability.
- Cron still returns `ok` when a task is posted and `empty` when nothing is due.
- Existing DataForSEO postback/get result flow continues to update ranking tables.
</success_criteria>

<output>
After completion, create `.paul/phases/01-dataforseo-cost-optimization/01-01-SUMMARY.md`.
</output>