| phase | plan | subsystem | tags | requires | provides | affects | tech-stack | key-files | key-decisions | patterns-established | duration | started | completed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 01-statlink-autolinking | 01 | seo | | | | | | | | | ~4h (initial build) + 2h (bugfix session 2026-04-09) | 2026-04-08 | 2026-04-09T11:15:00Z |
Phase 1 Plan 01: StatLink Auto-Linking Summary
Automated StatLink.pl link management: login, add links for published articles, track lifecycle, remove after 60 days
Performance
| Metric | Value |
|---|---|
| Duration | ~6h total (build + bugfix) |
| Started | 2026-04-08 |
| Completed | 2026-04-09 |
| Tasks | 3 completed |
| Files created | 4 |
| Files modified | 2 |
Acceptance Criteria Results
| Criterion | Status | Notes |
|---|---|---|
| AC-1: Login to StatLink | Pass | Guzzle CookieJar, GET homepage + POST login, verified on production |
| AC-2: Adding a link | Pass | Form POST with CSRF, anchor sanitization, ID extraction from response |
| AC-3: Removing expired links | Pass | POST with statlink_id + usun (Polish for "remove"), status tracking |
| AC-4: Cron endpoint | Pass | /statlink/token-run with SEO_TRIGGER_TOKEN, also cron/statlink.php |
| AC-5: Tracking in the database | Pass | statlink_links table with full lifecycle tracking |
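
The 60-day lifecycle behind AC-3 and AC-5 can be illustrated with a short sketch. It is written in Python purely for illustration (the project itself is PHP), and the `status` and `added_at` keys are hypothetical names, not the actual statlink_links columns.

```python
from datetime import datetime, timedelta

# From the summary: links are removed after 60 days.
LINK_LIFETIME = timedelta(days=60)


def is_expired(added_at: datetime, now: datetime) -> bool:
    """Return True once a link has been live for the full lifetime."""
    return now - added_at >= LINK_LIFETIME


def select_expired(links: list[dict], now: datetime) -> list[dict]:
    """Pick active links whose lifetime has elapsed.

    'status' and 'added_at' are hypothetical keys used for illustration.
    """
    return [
        link for link in links
        if link["status"] == "active" and is_expired(link["added_at"], now)
    ]
```

Under these assumptions, a link added on 2026-04-09 becomes eligible for removal on 2026-06-08.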
Accomplishments
- StatLink service logs in, adds links, removes expired links end-to-end
- Robust diagnostic logging — every step tracked, errors surfaced in JSON response
- Retry mechanism for failed links with error tracking in database
- Token-secured HTTP endpoint + standalone cron script
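
How the token check on the endpoint is implemented is not stated here. One common approach, sketched below in Python for illustration only, is a constant-time comparison against the configured secret; `token_is_valid` is a hypothetical helper name. In PHP the analogous built-in is `hash_equals()`.

```python
import hmac


def token_is_valid(provided: str, expected: str) -> bool:
    """Compare tokens in constant time; an unset expected token never matches."""
    if not expected:
        # Refuse to run when no secret is configured, so an empty token cannot pass.
        return False
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```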
Files Created/Modified
| File | Change | Purpose |
|---|---|---|
| src/Services/StatLinkService.php | Created | Core service: login, addLink, removeLink, processNewArticles, retryFailedLinks, removeExpiredLinks |
| src/Controllers/StatLinkController.php | Created | HTTP endpoints: index (admin view), runByToken (cron trigger) |
| cron/statlink.php | Created | Standalone cron script with lock file |
| migrations/013_statlink_tracking.sql | Created | statlink_links table schema |
| config/routes.php | Modified | Added /statlink routes |
| src/Core/Controller.php | Modified | json_encode with JSON_INVALID_UTF8_SUBSTITUTE |
Decisions Made
| Decision | Rationale | Impact |
|---|---|---|
| ASCII-only anchors (transliteration) | StatLink rejects Polish diacritics in anchor field | All anchors auto-sanitized ą→a, ś→s etc. |
| MAX_LINKS_PER_RUN=1 | Avoid StatLink rate limiting | 1 link per cron run, predictable load |
| Timeout 120s per request | StatLink is slow | connect_timeout=60s, timeout=120s |
| set_time_limit(300) | PHP default 30s insufficient | Both controller and cron script |
| JSON_INVALID_UTF8_SUBSTITUTE | Scraped StatLink HTML contains broken UTF-8 | Prevents empty JSON responses |
| findLinkIdInHtml before search | Response HTML already contains new link | Reduces requests, more reliable ID detection |
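
The ASCII-only anchor decision can be sketched as follows. This is an illustrative Python version, not the project's PHP sanitizeAnchor(); the whitelist is copied as-is from the validation rule quoted under Auto-fixed Issues, and dropping (rather than replacing) disallowed characters is an assumption.

```python
import re

# Polish diacritics mapped to their ASCII base letters.
POLISH_TO_ASCII = str.maketrans({
    "ą": "a", "ć": "c", "ę": "e", "ł": "l", "ń": "n",
    "ó": "o", "ś": "s", "ź": "z", "ż": "z",
    "Ą": "A", "Ć": "C", "Ę": "E", "Ł": "L", "Ń": "N",
    "Ó": "O", "Ś": "S", "Ź": "Z", "Ż": "Z",
})

# Assumed whitelist: ASCII letters, digits, space and the punctuation
# .,+-_?!&\:= taken verbatim from the summary's validation note.
DISALLOWED = re.compile(r"[^A-Za-z0-9 .,+\-_?!&\\:=]")


def sanitize_anchor(anchor: str) -> str:
    """Transliterate Polish letters, drop disallowed characters, tidy spaces."""
    text = anchor.translate(POLISH_TO_ASCII)
    text = DISALLOWED.sub("", text)
    # Removing characters can leave runs of spaces behind; collapse them.
    return re.sub(r" {2,}", " ", text).strip()
```

For example, `sanitize_anchor("Zażółć gęślą jaźń")` yields `"Zazolc gesla jazn"`.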
Deviations from Plan
Summary
| Type | Count | Impact |
|---|---|---|
| Auto-fixed | 3 | Critical — without these fixes, no links were being added |
| Scope additions | 1 | Retry mechanism (not in original plan) |
| Deferred | 1 | No max retry limit |
Auto-fixed Issues
1. Anchor encoding: StatLink rejects Polish characters
   - Found during: Production testing
   - Issue: StatLink form validation requires ASCII-only anchors (alphanumeric + .,+-_?!&\:=+ space)
   - Fix: Added sanitizeAnchor() with Polish→ASCII transliteration map
   - Files: src/Services/StatLinkService.php
   - Verification: Links now added successfully with sanitized anchors
2. Empty JSON responses from scraped HTML
   - Found during: Production debugging
   - Issue: json_encode() returns false (output: nothing) when data contains invalid UTF-8 from StatLink HTML
   - Fix: Added JSON_INVALID_UTF8_SUBSTITUTE | JSON_UNESCAPED_UNICODE flags
   - Files: src/Core/Controller.php
   - Verification: All endpoints return valid JSON
3. StatLink ID not detected after successful add
   - Found during: Production testing
   - Issue: findLinkIdBySearch made a separate request, and URL matching was too narrow (no protocol variants, small region)
   - Fix: New findLinkIdInHtml() extracts the ID directly from the form response HTML with a wider region and URL variants
   - Files: src/Services/StatLinkService.php
   - Verification: statlink_id=2673465 correctly detected
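
The idea behind the third fix (match several URL variants, then look for the ID in a wide region around the hit) can be sketched like this. It is an illustrative Python version, not the PHP findLinkIdInHtml(); the `statlink_id=<digits>` marker, the 2000-character window and the helper names are assumptions, since the real StatLink markup is not shown here.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumed marker: the ID appears in the HTML as statlink_id=<digits>.
ID_PATTERN = re.compile(r"statlink_id=(\d+)")


def url_variants(url: str) -> list[str]:
    """Build http/https and www/non-www variants of one URL.

    The trailing slash is stripped so a substring search matches both forms.
    Query strings are ignored in this sketch.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare_host = host[4:] if host.startswith("www.") else host
    path = parts.path.rstrip("/")
    return [
        f"{scheme}://{h}{path}"
        for scheme in ("https", "http")
        for h in (bare_host, "www." + bare_host)
    ]


def find_link_id_in_html(html: str, url: str, window: int = 2000) -> Optional[str]:
    """Return the ID nearest to the first URL variant found in html, if any."""
    for variant in url_variants(url):
        pos = html.find(variant)
        if pos == -1:
            continue
        start = max(0, pos - window)
        region = html[start:pos + len(variant) + window]
        matches = list(ID_PATTERN.finditer(region))
        if matches:
            url_offset = pos - start
            # Prefer the ID closest to the URL, not simply the first in the region.
            nearest = min(matches, key=lambda m: abs(m.start() - url_offset))
            return nearest.group(1)
    return None
```

A plain substring search also matches longer URLs that share the same prefix, so a production version would likely need a stricter boundary check.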
Deferred Items
- No max retry count for permanently failing links (could block queue)
- StatLink cron not integrated into main publish cron — needs separate cron job setup on server
Issues Encountered
| Issue | Resolution |
|---|---|
| OPcache serving stale files after FTP upload | Manual opcache_reset() via test script; documented in patterns |
| PHP max_execution_time killing script | Added set_time_limit(300) in controller and cron |
| Login diagnostics missing on failure | Added loginDiagnostic in all error paths (empty credentials, exceptions) |
Next Phase Readiness
Ready:
- StatLink service fully operational, links being added and tracked
- 37 failed links queued for retry (will auto-process via cron)
- Admin panel view exists at /statlink
Concerns:
- No max retry limit — a permanently failing link blocks the queue
- Cron not yet configured on server (only manual token URL trigger)
Blockers:
- None
Phase: 01-statlink-autolinking, Plan: 01
Completed: 2026-04-09