update
219
.paul/phases/01-dataforseo-cost-optimization/01-01-PLAN.md
Normal file
@@ -0,0 +1,219 @@
---
phase: 01-dataforseo-cost-optimization
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
- autoload/class.Cron.php
autonomous: true
delegation: off
---

<objective>
## Goal
Reduce DataForSEO spending for Google rank checks by making cron scheduling and SERP depth adaptive, with a hard maximum of top 50 results.

## Purpose
DataForSEO Organic SERP pricing changed on 2025-09-19 so the base price covers only the first page of 10 organic results; deeper `depth` values now cost more. The current cron logic uses `depth` values 30, 50, and 100, which makes many checks substantially more expensive. This plan keeps rank tracking useful while cutting avoidable API spend.

## Output
Backend-only changes in `autoload/class.Cron.php`:
- manual `days_offset` remains an explicit override
- phrases without manual interval get an adaptive interval based on recent rank stability
- DataForSEO requests never use `depth` above 50
- deep checks are reduced for stable or low-priority cases
</objective>

<context>
<clarifications>
- **Depth policy** - How aggressively should the DataForSEO check depth be cut?
-> Answer: The user needed the options spelled out. An adaptive cost policy was adopted, with a hard limit: no check goes beyond position 50.
- **Intervals** - Can we change how often phrases are checked via `days_offset`?
-> Answer: Yes, provided there is a mechanism that automatically checks stable phrases less often and unstable phrases more often, while respecting phrases that have a hard-coded interval.
- **Scope** - Should the plan cover the admin panel?
-> Answer: It can be backend-only.
</clarifications>

## Project Context
@.paul/PROJECT.md
@.paul/ROADMAP.md
@.paul/STATE.md
@.paul/codebase/architecture.md
@.paul/codebase/db_schema.md
@.paul/codebase/concerns.md

## Source Files
@autoload/class.Cron.php
@cron.php
@dsf.php
@autoload/factory/class.Ranker.php

## External Pricing Context
- DataForSEO Organic SERP pricing/depth update FAQ: https://dataforseo.com/help-center/serp-api-pricing-depth-update-faq
- DataForSEO SERP additional cost explanation: https://dataforseo.com/help-center/serp-api-cost-explained
- DataForSEO Google Organic Task POST docs: https://docs.dataforseo.com/v3/serp/google/organic/task_post/
</context>

<acceptance_criteria>

## AC-1: Manual Intervals Stay Authoritative
```gherkin
Given a phrase has `days_offset` set to a positive value
When `Cron::post_phrases_positions_dfs3()` selects phrases for DataForSEO
Then the phrase is eligible only when its manual interval is due
And the adaptive interval does not shorten or override that manual setting
```

## AC-2: Stable Phrases Are Checked Less Often
```gherkin
Given a phrase has no manual `days_offset`
And its recent recorded positions are stable
When the DataForSEO posting cron runs
Then the phrase is not sent every day
And its next eligibility is delayed according to the adaptive stability rule
```

## AC-3: Unstable Or New Phrases Stay Fresh
```gherkin
Given a phrase has no manual `days_offset`
And it is new, missing recent history, or has volatile recent positions
When the DataForSEO posting cron runs
Then the phrase remains eligible more frequently than stable phrases
And rank tracking does not become stale for unstable keywords
```

## AC-4: DataForSEO Depth Is Capped At 50
```gherkin
Given any active phrase selected for a DataForSEO v3 Google Organic task
When the request payload is built
Then the request `depth` is never greater than 50
And no request attempts to check positions beyond top 50
```

## AC-5: Current Data Flow Remains Compatible
```gherkin
Given DataForSEO returns a completed task through the existing postback flow
When `Cron::get_phrases_positions_dfs3()` processes the result
Then position rows continue to be inserted or updated as before
And `last_checked`, `ds_id`, `ds_ready`, and `filled_missing_positions` continue to be maintained
```

</acceptance_criteria>

<tasks>

<task type="auto">
<name>Task 1: Add DataForSEO cost policy helpers</name>
<files>autoload/class.Cron.php</files>
<action>
Add small private/static helper methods inside `Cron` for the DataForSEO v3 flow:
- `getDfsRecentPositions($phrase_id, $limit)` to read the latest non-empty positions from `pro_rr_phrases_positions`.
- `getDfsPositionVolatility($positions)` to classify recent movement using simple absolute deltas.
- `getDfsAdaptiveIntervalDays($row, $positions)` to return the automatic interval for phrases where `days_offset` is empty.
- `getDfsDepth($last_position, $positions)` to return a capped depth.

Policy to implement:
- Hard cap: `depth <= 50` always.
- If no previous position exists: use `depth = 50`, because first discovery still needs a reasonable search window but must not exceed top 50.
- Last position 1-10: use `depth = 10` for stable phrases, `depth = 20` for volatile phrases.
- Last position 11-20: use `depth = 20` for stable phrases, `depth = 30` for volatile phrases.
- Last position 21-50: use `depth = 50`.
- Last position >50: use `depth = 50`; if not found again, store/handle as not found according to the existing result behavior.
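
The depth ladder above could be sketched as a standalone PHP function. This is only an illustration under stated assumptions: the names `dfsDepthSketch` and `DFS_MAX_DEPTH` are hypothetical, volatility is passed in as a boolean instead of being derived from `$positions` as in the planned `getDfsDepth($last_position, $positions)`, and a last position of `null` or `0` is assumed to mean "no previous position".

```php
<?php
// Hypothetical constant name; the plan only requires that depth never exceeds 50.
const DFS_MAX_DEPTH = 50;

/**
 * Sketch of the depth policy: small windows for phrases already near the top,
 * the full capped window for everything else.
 */
function dfsDepthSketch(?int $last_position, bool $is_volatile): int
{
    if ($last_position === null || $last_position <= 0) {
        // Assumption: null/0 means the phrase has no usable previous position.
        $depth = DFS_MAX_DEPTH;
    } elseif ($last_position <= 10) {
        $depth = $is_volatile ? 20 : 10;
    } elseif ($last_position <= 20) {
        $depth = $is_volatile ? 30 : 20;
    } else {
        // Positions 21-50 and anything previously seen beyond 50.
        $depth = DFS_MAX_DEPTH;
    }

    // Final guard so later edits to the ladder cannot break the hard cap.
    return min($depth, DFS_MAX_DEPTH);
}
```

The closing `min()` is redundant for the ladder as written; it is kept so the cap holds even if the thresholds are tuned later.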

Adaptive interval policy:
- Missing or short history: 1 day.
- Volatile phrase, e.g. max movement in recent checks greater than 5 positions: 1 day.
- Mild movement, e.g. max movement 2-5 positions: 2 days.
- Stable top 10, e.g. max movement 0-1 position across at least 5 recent checks: 3 days.
- Stable positions 11-50 across at least 5 recent checks: 5 days.
- Keep thresholds as named local constants or clearly named helper variables, not scattered magic numbers.
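
A minimal PHP sketch of the interval rules above follows. The function and constant names are hypothetical. It assumes that `$positions` is ordered newest first, that "max movement" means the largest absolute difference between consecutive checks, that "short history" means fewer than 5 checks, and that a stable phrase outside positions 1-50 falls back to a 1-day interval (the plan does not specify that case).

```php
<?php
// Hypothetical named thresholds, mirroring the example values in the policy.
const DFS_MIN_HISTORY = 5;    // checks needed before a phrase can count as stable
const DFS_VOLATILE_DELTA = 5; // movement above this is volatile
const DFS_STABLE_DELTA = 1;   // movement up to this is stable

/** Largest absolute change between consecutive recorded positions. */
function dfsMaxMovementSketch(array $positions): int
{
    $max = 0;
    for ($i = 1, $n = count($positions); $i < $n; $i++) {
        $max = max($max, abs($positions[$i] - $positions[$i - 1]));
    }
    return $max;
}

/** Automatic interval in days for a phrase without a manual days_offset. */
function dfsAdaptiveIntervalDaysSketch(array $positions): int
{
    if (count($positions) < DFS_MIN_HISTORY) {
        return 1; // missing or short history
    }

    $movement = dfsMaxMovementSketch($positions);
    if ($movement > DFS_VOLATILE_DELTA) {
        return 1; // volatile
    }
    if ($movement > DFS_STABLE_DELTA) {
        return 2; // mild movement
    }

    $latest = $positions[0]; // assumption: newest position comes first
    if ($latest >= 1 && $latest <= 10) {
        return 3; // stable top 10
    }
    if ($latest >= 11 && $latest <= 50) {
        return 5; // stable 11-50
    }
    return 1; // assumption: anything else stays on the daily schedule
}
```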

Avoid:
- Adding a new database column in this plan; use existing `days_offset`, `last_checked`, and position history.
- Introducing Composer or a new framework.
- Changing credentials handling in this plan; it is known debt but separate from cost control.
</action>
<verify>php -l autoload/class.Cron.php</verify>
<done>AC-2, AC-3, and AC-4 have clear helper logic available for the cron selection and payload code.</done>
</task>

<task type="auto">
<name>Task 2: Apply adaptive eligibility in phrase selection</name>
<files>autoload/class.Cron.php</files>
<action>
Update `Cron::post_phrases_positions_dfs3()` so it no longer blindly selects the first active unchecked phrase and sends it daily.

Required behavior:
- Keep all existing active date filters for phrase/site `date_start` and `date_end`.
- Keep `ds_id IS NULL` so already posted tasks are not duplicated.
- Preserve manual `days_offset`: if present, the phrase is due only when `DATE_ADD(last_checked, INTERVAL days_offset DAY) <= CURRENT_DATE`, plus the existing `last_checked = '2012-01-01'` refresh behavior.
- For phrases with empty `days_offset`, compute automatic due status from recent positions and `last_checked`.
- Ensure an ineligible stable phrase cannot block later eligible phrases. If SQL cannot express the adaptive rule cleanly, select a bounded candidate pool ordered by site/name/phrase and iterate in PHP until the first due phrase is found.
- If no due phrase exists, return `[ 'status' => 'empty' ]` as before.
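
The due-date rule from the list above might look like this when evaluated in PHP instead of SQL. The helper names are hypothetical, and the sketch assumes that `last_checked = '2012-01-01'` acts as a sentinel that forces a recheck, and that `last_checked` is a `Y-m-d` date or a datetime starting with one.

```php
<?php
// Sentinel date taken from the plan; its "force a recheck" meaning is an assumption.
const DFS_FORCE_REFRESH_DATE = '2012-01-01';

/** A positive manual days_offset always wins over the adaptive interval. */
function dfsEffectiveIntervalSketch(?int $days_offset, int $adaptive_days): array
{
    if ($days_offset !== null && $days_offset > 0) {
        return ['days' => $days_offset, 'source' => 'manual'];
    }
    return ['days' => $adaptive_days, 'source' => 'adaptive'];
}

/** PHP counterpart of DATE_ADD(last_checked, INTERVAL n DAY) <= CURRENT_DATE. */
function dfsIsDueSketch(string $last_checked, int $interval_days, DateTimeImmutable $today): bool
{
    $last_date = substr($last_checked, 0, 10); // keep only the date part
    if ($last_date === DFS_FORCE_REFRESH_DATE) {
        return true;
    }

    $due = (new DateTimeImmutable($last_date))->modify('+' . $interval_days . ' days');
    return $due <= $today->setTime(0, 0);
}
```

Passing `$today` in explicitly keeps the check deterministic, which makes it easy to exercise with fixed dates.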

The candidate-pool approach is acceptable and preferred over SQL window functions, which cannot be assumed on MySQL 5 (they are available only from MySQL 8.0). Keep the pool bounded, e.g. 100-300 active candidates, so a cron hit remains cheap.
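
One way to picture the candidate-pool approach is the sketch below: walk a bounded list of rows and return the first one that is due, so a stable phrase at the front cannot block later phrases. All names are hypothetical, the row keys `id` and `due` exist only for the illustration, and in the real cron the bound would more likely be an SQL `LIMIT` than an `array_slice()`. The shape of the `ok` result is also an assumption; only `[ 'status' => 'empty' ]` comes from the plan.

```php
<?php
// Hypothetical pool size inside the 100-300 range suggested by the plan.
const DFS_POOL_LIMIT = 200;

/** Return the first due candidate, or null when nothing in the pool is due. */
function dfsFirstDuePhraseSketch(array $candidates, callable $is_due): ?array
{
    foreach (array_slice($candidates, 0, DFS_POOL_LIMIT) as $row) {
        if ($is_due($row)) {
            return $row;
        }
    }
    return null;
}

/** Map the selection to a cron-style status array. */
function dfsSelectionResultSketch(?array $row): array
{
    if ($row === null) {
        return ['status' => 'empty'];
    }
    return ['status' => 'ok', 'phrase_id' => $row['id']];
}

// Usage: the first row is stable and not due, so the second one is picked.
$pool = [
    ['id' => 11, 'due' => false],
    ['id' => 12, 'due' => true],
    ['id' => 13, 'due' => true],
];
$picked = dfsFirstDuePhraseSketch($pool, fn (array $row): bool => $row['due']);
```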
</action>
<verify>php -l autoload/class.Cron.php</verify>
<done>AC-1, AC-2, and AC-3 satisfied: manual intervals are respected, stable automatic phrases are skipped until due, and eligible later phrases are not blocked.</done>
</task>

<task type="auto">
<name>Task 3: Build cheaper DataForSEO payloads</name>
<files>autoload/class.Cron.php</files>
<action>
Replace the current `depth` calculation in `Cron::post_phrases_positions_dfs3()`:
- Remove the current 30/50/100 ladder.
- Use `getDfsDepth()` from Task 1.
- Ensure both localization branches use the same payload policy.
- Keep `priority => 1`, `language_code => "pl"`, `postback_data => "advanced"`, and the existing postback URL flow unchanged.
- Include the computed interval/depth in the returned cron message so operations can see why a phrase was sent, e.g. depth and whether interval is manual or adaptive.
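
As a rough illustration of the payload rules above, a task array with a capped depth could be built as follows. The helper names, the message format, the example keyword, the placeholder postback URL, and the use of `location_name` for localization are all assumptions; the existing code may select the location differently in its two localization branches. The field names themselves (`keyword`, `language_code`, `location_name`, `depth`, `priority`, `postback_url`, `postback_data`) are taken from the DataForSEO Google Organic Task POST docs.

```php
<?php
// Hypothetical constant; the plan only fixes the value 50.
const DFS_DEPTH_CAP = 50;

/** Build one task entry; depth is clamped no matter what the caller passes. */
function dfsBuildTaskSketch(string $keyword, int $depth, string $postback_url, string $location_name): array
{
    return [
        'keyword' => $keyword,
        'language_code' => 'pl',
        'location_name' => $location_name, // assumption: branches differ only here
        'depth' => max(1, min($depth, DFS_DEPTH_CAP)),
        'priority' => 1,
        'postback_url' => $postback_url,
        'postback_data' => 'advanced',
    ];
}

/** Hypothetical cron message that exposes depth and interval source. */
function dfsCronMessageSketch(string $phrase, int $depth, int $interval_days, string $interval_source): string
{
    return sprintf(
        'DFS task posted: "%s" (depth=%d, interval=%dd, %s)',
        $phrase,
        $depth,
        $interval_days,
        $interval_source
    );
}

// Usage: a legacy depth of 100 is clamped to 50 before the request is encoded.
// The URL is a placeholder; single quotes keep $id literal for DataForSEO to fill in.
$task = dfsBuildTaskSketch('buty trekkingowe', 100, 'https://example.com/dsf.php?id=$id', 'Poland');
$body = json_encode([$task]); // the v3 endpoint expects a JSON array of tasks
```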

Also harden result processing only where it directly protects the new policy:
- Initialize `$phrase_position` and `$site_url` before looping through result items.
- If the domain is not found in returned top 50, store position `0` or the existing "not found" representation used by the app, without PHP notices.
- Do not expand result retrieval beyond the posted task result.
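
The hardening described above can be sketched as a small lookup that initializes both variables before the loop. The function name is hypothetical, and the sketch assumes that the app matches on the item `domain` (ignoring a leading `www.`), stores `rank_group` as the position, and uses `0` for "not found"; the existing code may use different fields.

```php
<?php
/**
 * Find the tracked domain among DataForSEO result items.
 * Returns position 0 and an empty URL when the domain is absent.
 */
function dfsFindPositionSketch(array $items, string $site_domain): array
{
    // Initialized up front so a miss never triggers undefined-variable notices.
    $phrase_position = 0;
    $site_url = '';

    $wanted = preg_replace('/^www\./', '', strtolower($site_domain));

    foreach ($items as $item) {
        if (($item['type'] ?? '') !== 'organic') {
            continue; // skip ads and other non-organic blocks
        }
        $domain = preg_replace('/^www\./', '', strtolower((string) ($item['domain'] ?? '')));
        if ($domain === $wanted) {
            $phrase_position = (int) ($item['rank_group'] ?? 0);
            $site_url = (string) ($item['url'] ?? '');
            break; // first organic hit is the best position
        }
    }

    return ['phrase_position' => $phrase_position, 'site_url' => $site_url];
}
```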
</action>
<verify>php -l autoload/class.Cron.php</verify>
<done>AC-4 and AC-5 satisfied: no request exceeds depth 50 and the existing DataForSEO post/get lifecycle still writes positions correctly.</done>
</task>

</tasks>

<boundaries>

## DO NOT CHANGE
- Do not add or run database migrations in this plan.
- Do not modify `dsf.php` unless implementation discovers the current postback marker is incompatible with the unchanged flow.
- Do not change DataForSEO credentials or move secrets in this plan.
- Do not replace DataForSEO with another provider in this plan.
- Do not change UI templates or admin panels.

## SCOPE LIMITS
- This is backend cost control only.
- The maximum tracked position becomes top 50 for DataForSEO checks.
- Existing historical positions beyond top 50 are not rewritten.
- Security debt listed in `.paul/codebase/concerns.md` remains deferred unless it directly blocks this plan.

</boundaries>

<verification>
Before declaring plan complete:
- [ ] `php -l autoload/class.Cron.php`
- [ ] Review `Cron::post_phrases_positions_dfs3()` and confirm `depth` cannot exceed 50.
- [ ] Review manual `days_offset` path and confirm it remains authoritative.
- [ ] Review automatic interval path and confirm stable phrases cannot block other due candidates.
- [ ] Review `Cron::get_phrases_positions_dfs3()` and confirm no undefined-position notices are introduced for not-found-in-top-50 results.
</verification>

<success_criteria>
- DataForSEO v3 Google Organic task payloads never request more than top 50.
- Phrases with manual `days_offset` keep their configured schedule.
- Phrases without manual `days_offset` get adaptive scheduling based on recent stability.
- Cron still returns `ok` when a task is posted and `empty` when nothing is due.
- Existing DataForSEO postback/get result flow continues to update ranking tables.
</success_criteria>

<output>
After completion, create `.paul/phases/01-dataforseo-cost-optimization/01-01-SUMMARY.md`.
</output>
@@ -0,0 +1,38 @@
# Summary: 01-01 DataForSEO Cost Optimization

## Plan

`.paul/phases/01-dataforseo-cost-optimization/01-01-PLAN.md`

## Completed

- Added DataForSEO cost-policy helpers in `autoload/class.Cron.php`.
- Added adaptive interval calculation for phrases without manual `days_offset`.
- Preserved manual `days_offset` as the authoritative schedule override.
- Changed DataForSEO candidate selection from a single SQL-selected phrase to a bounded candidate pool ordered by oldest `last_checked`, then filtered in PHP.
- Removed the old `30/50/100` depth ladder.
- Capped all DataForSEO v3 Google Organic requests at top 50.
- Hardened result processing so "not found in returned top 50" stores position `0` without relying on an undefined `$site`.
- Added depth and interval metadata to the cron success message.

## Verification

- `php -l autoload\class.Cron.php` - passed.
- Confirmed no `100` literal remains in `autoload/class.Cron.php`.
- Confirmed DataForSEO payloads use computed `$depth` from `getDfsDepth()`.
- Confirmed both localization branches use the same computed depth.

## Deviations

- No SonarQube scan was run because only one source file was modified.
- No codebase docs were updated because this plan changed one existing backend file and did not introduce schema, dependency, API, or multi-file architecture changes.

## Files Modified

- `autoload/class.Cron.php`
- `.paul/STATE.md`
- `.paul/phases/01-dataforseo-cost-optimization/01-01-SUMMARY.md`

## Next

Run `$paul-unify .paul/phases/01-dataforseo-cost-optimization/01-01-PLAN.md` to reconcile the implementation against the plan and close the loop.