Magellan Pilot 21 — magellan-backups v1.0.0
Run ID: 2026-05-04T10-04-00_magellan-backups
Date: 2026-05-04
Result: 🏆 10 / 10 recall (first perfect recall on this plugin)

| Setting | Value |
|---|---|
| Plugin | magellan-backups v1.0.0 (local install, blind greybox run) |
| Manager model | claude-sonnet-4-6 |
| Planner model | claude-sonnet-4-6 |
| Tester model | claude-haiku-4-5 (all 6 charters) |
| Browser driver | actionbook-headless (Actionbook CLI 1.6.0) |
| Static analysis (Phase 1.5) | Skipped — no source_path in MISSION.md |
| Ecosystem | core (no WooCommerce, no Elementor) |
| Lenses | none |
| Coverage mode | full-surface — all features |


| Phase | Start (UTC) | Duration | Description |
|---|---|---|---|
| 0 — Deps check | 10:04:00 | ~30s | check-deps.sh — all OK |
| 1 — Mission intake | 10:04:30 | ~1m | Parse missions/magellan-backups.md, snapshot, write manifest |
| 2 — Recon scout | 10:05:30 | ~3m | Haiku Tester scout, 10 turns, recon.md with 8 surprises (S1–S8) |
| 3 — Charter generation | 10:08:56 | ~6m | Sonnet planner → 6 charters; recon-handoff validation passes (8/8 S* covered) |
| 3.5 — Write-path gate | 10:15:00 | instant | Verified: Y — proceeding |
| 4 — Wave dispatch (5 concurrent) | 10:15:22 | ~15m | 5 Haiku Testers in parallel |
| 5.1–5.3 — First aggregation | 10:24:46 | ~1m | 23 Problems |
| 5.4 — Meta-review | 10:25:20 | ~4m | Sonnet general-purpose; 4 high-severity gaps found |
| 4′ — Supplementary breadth-tour | 10:30:10 | ~10m | Haiku Tester; closes GAP-B1, GAP-P1, GAP-S1, GAP-R1 |
| 5.1–5.3 — Final aggregation | 10:41:13 | ~1m | 23 Problems |
| 5.5 — Escape analysis | 10:41:30 | ~2m | Sonnet classifier; 10/10 recall confirmed |
| Total | | 37m 13s | End-to-end wall clock |


| Session | Type | Priority | Max turns | Status | P | Q | I | ! |
|---|---|---|---|---|---|---|---|---|
| backup-artifact-andlist | andlist | critical | 12 | complete (12/12) | 7 | 1 | 6 | 1 |
| restore-destructive-andlist | andlist | critical | 12 | complete (10/12) | 4 | 0 | 5 | 0 |
| export-artifact-andlist | andlist | high | 12 | complete (11/12) | 4 | 1 | 3 | 1 |
| schedule-feature-cluster | hypothesis-cluster | high | 8 | complete (12/8¹) | 3 | 1 | 4 | 0 |
| concurrent-trigger-cross-feature | cross-feature | high | 10 | complete (10/10) | 1 | 0 | 3 | 0 |
| breadth-tour | breadth | medium | 30 | complete (19/30) ² | 4 | 0 | 4 | 3 |
| Total | | | | | 23 | 3 | 25 | 5 |

¹ Budget overrun flagged; no override requested.
² Dispatched as supplementary after meta-review (was pending in wave 1).
Problem severity breakdown

| Severity | Count |
|---|---|
| critical | 7 |
| major | 15 |
| minor | 1 |
| trivial | 0 |
| Total | 23 |


| # | Title | Session |
|---|---|---|
| C1 | Backup ZIP files web-accessible without auth — no .htaccess in wp-content/magellan-backups/ | backup-artifact-andlist |
| C2 | Export SQL files web-accessible without auth (same unprotected directory) | export-artifact-andlist |
| C3 | Concurrent backup triggers (manual + cron) collide on minute-precision filename with silent overwrite | concurrent-trigger-cross-feature |
| C4 | Restore overwrites database without pre-restore backup — no recovery path on failure | restore-destructive-andlist |
| C5 | Restore lacks transaction wrapping — partial failure leaves DB in broken/inconsistent state | restore-destructive-andlist |
| C6 | Upload & Restore form fires with no confirmation dialog; Restore button inconsistently DOES show confirm | restore-destructive-andlist |
| C7 | Breadth: backup files directly downloadable — confirmed independently (HTTP 200, 13 MB ZIP) | breadth-tour |


| Title | Session |
|---|---|
| Full Backup omits wp-content/uploads/ — user media not backed up | backup-artifact-andlist |
| Database dump includes user_pass column with hashed passwords | backup-artifact-andlist |
| Minute-precision backup filenames (date('Y-m-d-Hi')) guarantee collision on concurrent creation | backup-artifact-andlist |
| Schedule 24h → 12h time-format round-trip bug — dropdown never shows saved value on reload | schedule-feature-cluster |
| Option-name mismatch: form saves magellan_backups_email; cron reads magellan_backup_email — no email ever sent | schedule-feature-cluster |
| DB dump / export: unbounded SELECT * without LIMIT — OOM on production-scale tables | backup-artifact-andlist + export-artifact-andlist |
| Progress bar hardcoded to 100% on page load before any backup runs | breadth-tour |
| Export SQL minute-precision filename collision risk (same pattern as backup) | export-artifact-andlist |

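The minute-precision filename collision reported for both backup and export artifacts can be reproduced in a few lines. The sketch below is Python rather than the plugin's PHP. Only the `Y-m-d-Hi` pattern comes from the finding; the `backup-` prefix, the helper names, and the random-suffix variant are illustrative assumptions, not the plugin's code or its eventual fix.

```python
from datetime import datetime, timedelta
import uuid


def backup_name(now: datetime) -> str:
    """Minute-precision name, mirroring PHP date('Y-m-d-Hi'). Prefix is hypothetical."""
    return f"backup-{now.strftime('%Y-%m-%d-%H%M')}.zip"


def unique_backup_name(now: datetime) -> str:
    """One possible mitigation: second precision plus a random suffix."""
    return f"backup-{now.strftime('%Y-%m-%d-%H%M%S')}-{uuid.uuid4().hex[:8]}.zip"


# A manual trigger and a cron trigger 40 seconds apart, inside the same minute.
manual = datetime(2026, 5, 4, 10, 15, 5)
cron = manual + timedelta(seconds=40)

collides = backup_name(manual) == backup_name(cron)
distinct = unique_backup_name(manual) != unique_backup_name(cron)
print(backup_name(manual), collides, distinct)
```

Both triggers map to the same minute-precision name, so whichever write finishes last silently replaces the other; any naming scheme that adds sub-minute entropy avoids that.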

| Question | Session |
|---|---|
| Are backup files cleaned up or warned when plugin is deactivated? | backup-artifact-andlist |
| Do export SQL files persist after plugin deactivation? | export-artifact-andlist |
| Does Schedule enable/disable toggle dynamically register/unregister the cron event? (H3/H4 browser-blocked) | schedule-feature-cluster |


| What | Session |
|---|---|
| Backup create + download UI is straightforward and responsive | breadth-tour |
| Cron events correctly cleared on plugin deactivation | breadth-tour |
| Backup files preserved on deactivation (user-friendly) | breadth-tour |
| AJAX endpoints correctly require manage_options capability | breadth-tour |
| Empty-content-type rejection on selective export (no checkboxes → error returned) | export-artifact-andlist |

Severity heatmap (top areas by risk score)

| Area | Critical | Major | Risk score |
|---|---|---|---|
| Restore from existing backup + Upload & Restore | 2 | 1 | 11 |
| Backup artifact security (a1) | 1 | 0 | 4 |
| File access control — backup directory | 1 | 0 | 4 |
| Concurrent trigger seam (manual × cron) | 1 | 0 | 4 |
| Selective export — access control (a1) | 1 | 0 | 4 |
| Upload & Restore form | 1 | 0 | 4 |
| Artifact naming / uniqueness (a2) | 0 | 1 | 3 |
| Artifact contents — omission (a3/a6) | 0 | 1 | 3 |
| Artifact contents — sensitive leakage (a3) | 0 | 1 | 3 |
| Scale and memory envelope (a7/c2) | 0 | 1 | 3 |
| Progress bar oracle | 0 | 1 | 3 |
| Schedule time field round-trip | 0 | 1 | 3 |
| Schedule email delivery (option-name mismatch) | 0 | 1 | 3 |
| Cron registration lifecycle | 0 | 1 | 3 |

Risk score = 4·critical + 3·major + 2·minor + 1·trivial
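As a worked check of the formula above, the following minimal sketch recomputes the scores for the three kinds of row in the heatmap. The weights are taken from the formula; the function and variable names are illustrative.

```python
# Severity weights from the report's risk-score formula.
WEIGHTS = {"critical": 4, "major": 3, "minor": 2, "trivial": 1}


def risk_score(counts: dict) -> int:
    """Weighted sum of problem counts per severity for one area."""
    return sum(WEIGHTS[severity] * n for severity, n in counts.items())


restore_area = risk_score({"critical": 2, "major": 1})  # 4*2 + 3*1
single_critical = risk_score({"critical": 1})
single_major = risk_score({"major": 1})
print(restore_area, single_critical, single_major)  # 11 4 3
```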
Escape analysis — recall against answer key
10 / 10 planted issues caught — first perfect recall on magellan-backups.

| Issue | Title | Verdict | Session(s) |
|---|---|---|---|
| 1 | Progress bar always shows 100% | caught-exact | breadth-tour |
| 2 | Schedule time format mismatch (24h → 12h) | caught-exact | schedule-feature-cluster |
| 3 | Notification email empty recipient (option-name mismatch) | caught-exact | schedule-feature-cluster |
| 4 | User export includes hashed passwords | caught-exact | 3 sessions (backup + breadth + export) |
| 5 | Uploads directory missing from backup | caught-exact | backup-artifact-andlist |
| 6 | No pre-restore backup | caught-exact | restore-destructive-andlist |
| 7 | Backups publicly accessible via URL | caught-exact | 3 sessions (backup + breadth + export) |
| 8 | Corrupt restore truncates tables | caught-semantically | restore-destructive-andlist ³ |
| 9 | Large database causes memory exhaustion | caught-exact | backup-artifact-andlist + export-artifact-andlist |
| 10 | Concurrent backups corrupt zip file | caught-exact | backup-artifact-andlist + concurrent-trigger-cross-feature |

³ Framing differs: answer key says DROP-before-INSERT ordering; filing says absence of transaction atomicity. Same root cause, deeper framing.
No amendments proposed — all 10 issues caught by existing skill rules:
a1, a2, a3, a7, c2, andlist-destructive, forms behavioral-roundtrip.
Cross-pilot pattern validation

| Pattern | Prior status | This pilot |
|---|---|---|
| Scale-sensitive c2 (unbounded SELECT) | Closed after Pilots 18/19 | Fired on both artifact-producing charters ✓ |
| Artifact filename collision (minute-precision) | Amendment added Apr 30 | Fired on backup + export + concurrent-trigger ✓ |
| Option-name parity (save/read key mismatch) | Amendment exists; missed P18+P19 due to budget | Fired correctly — charter had sufficient budget ✓ |
| No pre-restore backup (destructive AND-list) | Andlist-destructive skill | Fired correctly ✓ |
| Artifact access control a1 | Artifact AND-list | Fired across 3 sessions on both surfaces ✓ |

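The option-name parity pattern in the table above amounts to comparing the keys a form handler writes with the keys a consumer reads. The sketch below shows one way such a check could look. It is an assumption about how the comparison might be automated, not the skill's actual rule: the two email keys come from the filed Problem, while `magellan_backups_schedule_time` and the function name are hypothetical.

```python
import difflib
from typing import List, Optional, Tuple


def parity_gaps(written: set, read: set) -> List[Tuple[str, Optional[str]]]:
    """For each key that is read but never written, report the closest
    written key (a likely typo), or None if nothing is similar."""
    gaps = []
    for key in sorted(read - written):
        close = difflib.get_close_matches(key, sorted(written), n=1, cutoff=0.8)
        gaps.append((key, close[0] if close else None))
    return gaps


# Keys saved by the settings form vs. keys read by the cron handler.
written = {"magellan_backups_email", "magellan_backups_schedule_time"}
read = {"magellan_backup_email", "magellan_backups_schedule_time"}

gaps = parity_gaps(written, read)
print(gaps)  # [('magellan_backup_email', 'magellan_backups_email')]
```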
Meta-review gaps (pre-supplementary)
4 high-severity gaps were found before the breadth-tour was dispatched. The supplementary session closed all four.

| Gap | Severity | Closed by breadth-tour? |
|---|---|---|
| GAP-B1 — breadth-tour never dispatched; F7 lifecycle + empty-state uncovered | HIGH | ✅ Yes |
| GAP-P1 — Progress bar oracle (S1) not filed as Problem in backup session | HIGH | ✅ Yes (filed in breadth-tour) |
| GAP-S1 — H3/H4 cron lifecycle: browser blocked, WP-CLI fallback not attempted | HIGH | ✅ Yes (WP-CLI path taken) |
| GAP-R1 — b6 upload-restore handler capability check not per-handler verified | HIGH | ✅ Yes (Contributor-role curl) |
| GAP-S2 — Schedule step-8.9 empty-required + toggle-state-leak probes absent | MEDIUM | Partially |
| GAP-RT1 — No backup → restore → verify content-identity round-trip | MEDIUM | Partially |
| GAP-E1 — Export a6 per-type accuracy source-only | MEDIUM | Not addressed |


| Session | Note |
|---|---|
| export-artifact-andlist | Actionbook daemon concurrency: SESSION_NOT_FOUND error under parallel wave. Tester fell back to Playwright CLI mid-session. No signal loss. |
| schedule-feature-cluster | Actionbook headless null button.click() on form submit — driver limitation, not plugin defect. H3/H4 probed via WP-CLI in supplementary session. |
| concurrent-trigger-cross-feature | Concurrent daemon profile lock with backup-artifact-andlist. Fell back to WP-CLI eval for collision test. Signal unaffected. |
| All sessions | Studio SQLite shim used (not MySQL). Transaction behavior in restore tests may differ from production InnoDB. |

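The session conflicts noted above arise when parallel Testers share one daemon profile. The report names `--session <unique-id>` per Tester as the intended remedy. The sketch below only illustrates how distinct IDs could be derived per charter; the ID format and the helper name are assumptions, and nothing here invokes the Actionbook CLI or asserts how it parses the flag.

```python
import uuid

RUN_ID = "2026-05-04T10-04-00_magellan-backups"

WAVE_CHARTERS = [
    "backup-artifact-andlist",
    "restore-destructive-andlist",
    "export-artifact-andlist",
    "schedule-feature-cluster",
    "concurrent-trigger-cross-feature",
]


def session_args(run_id: str, charter: str) -> list:
    """Return a ['--session', <id>] pair unique to one Tester in one run.
    The '<run>.<charter>.<random>' layout is a hypothetical convention."""
    session_id = f"{run_id}.{charter}.{uuid.uuid4().hex[:8]}"
    return ["--session", session_id]


args_by_charter = {c: session_args(RUN_ID, c) for c in WAVE_CHARTERS}
session_ids = {args[1] for args in args_by_charter.values()}
print(len(session_ids) == len(WAVE_CHARTERS))  # True: no two Testers share an ID
```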
Total: $8.25 (from /usage — ground truth)
Wall clock: 37m 13s

| Model | Input | Output | Cache write | Cache read | Cost |
|---|---|---|---|---|---|
| claude-sonnet-4-6 | 13.3k | 59.1k | 323.9k | 6.9M | $4.20 |
| claude-haiku-4-5 | 5.3k | 124.3k | 536.6k | 27.6M | $4.06 |
| Total | | | | | $8.25 |

Per-subagent durations and tool uses

| Agent | Type | Model | Tool uses | Duration |
|---|---|---|---|---|
| Recon scout | tester | haiku | 67 | 2m 53s |
| Planner (charter-gen) | planner | sonnet | 14 | 5m 45s |
| backup-artifact-andlist | tester | haiku | 64 | 5m 53s |
| restore-destructive-andlist | tester | haiku | 39 | 5m 41s |
| export-artifact-andlist | tester | haiku | 42 | 8m 25s |
| schedule-feature-cluster | tester | haiku | 63 | 6m 18s |
| concurrent-trigger-cross-feature | tester | haiku | 64 | 6m 33s |
| breadth-tour (supplementary) | tester | haiku | 61 | 10m 35s |
| Meta-reviewer | general-purpose | sonnet | 42 | 4m 05s |
| Escape-analysis classifier | general-purpose | sonnet | 4 | 1m 33s |


| Metric | Value |
|---|---|
| Total cost | $8.25 |
| Planted bugs caught | 10 / 10 |
| Cost per planted bug caught | $0.83 |
| Cost per Problem filed | $0.36 |
| Wall-clock duration | 37m 13s |

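The two per-unit figures above follow directly from the total cost, the 10 planted bugs caught, and the 23 Problems filed. The sketch below reproduces them, assuming half-up rounding to cents (plain binary floats would format 0.825 as 0.82, so `Decimal` is used). The helper name is illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP


def per_unit(total: Decimal, count: int) -> Decimal:
    """Divide a dollar total by a count and round half-up to cents."""
    return (total / count).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


total_cost = Decimal("8.25")
per_bug = per_unit(total_cost, 10)      # 0.825 rounds up to 0.83
per_problem = per_unit(total_cost, 23)  # 0.3587 rounds to 0.36
print(per_bug, per_problem)  # 0.83 0.36
```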
Pilot history — magellan-backups

| Pilot | Manager | Planner | Testers | Driver | Recall | Cost (/usage) |
|---|---|---|---|---|---|---|
| P17 | sonnet | sonnet | haiku | cli-headed | 9/10 | n/a |
| P18 | sonnet | sonnet | haiku | cli-headless | 8/10 | n/a |
| P19 | sonnet | sonnet | haiku | cli-headless | 8/10 | n/a |
| P20 | sonnet | sonnet | haiku | actionbook-headless | 9/10 | n/a |
| P21 | sonnet | sonnet | haiku | actionbook-headless | 10/10 ✓ | $8.25 |

First 10/10 on this plugin: the a1/a2/a3/a7/c2/andlist-destructive/forms-roundtrip skill stack is now mature enough for perfect recall on magellan-backups.
Breadth-tour must be wave-1, not a supplement: breadth-tour was medium priority, so it was left out of the first wave, and the meta-review had to fire to catch the gap. Pilot 20 had the same miss. A "breadth-charter promotion" amendment (on any plugin where breadth-tour is the only charter for ≥1 feature, promote breadth-tour to high priority) was already proposed after Pilot 20; this pilot reinforces it.
Actionbook headless daemon contention under concurrent waves: 3 of 5 concurrent Testers hit daemon session conflicts when sharing the same profile. All recovered via WP-CLI/Playwright fallbacks without recall loss, but the driver needs per-Tester session isolation at the daemon level. Passing --session <unique-id> per Tester is the expected fix; it is already in the spec but not consistently enforced by the driver skill.
Supplementary session overhead: promoting breadth-tour to wave-1 (high priority) would avoid the supplementary breadth-tour and the meta-review entirely. Those extra subagent invocations are the main added cost relative to a clean single wave of 6 charters.
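The breadth-charter promotion amendment described above can be read as a small rule over the feature-to-charter coverage map. The sketch below is one possible encoding, not the amendment's actual text: the coverage map, the feature labels, and the function name are hypothetical, and only the charter names and priority levels come from this run.

```python
PRIORITY_RANK = {"medium": 0, "high": 1, "critical": 2}


def promote_breadth(coverage: dict, priorities: dict, breadth: str = "breadth-tour"):
    """Raise the breadth charter to 'high' when it is the only charter
    covering at least one feature. Returns (new priorities, sole features)."""
    sole_features = sorted(f for f, charters in coverage.items() if charters == {breadth})
    updated = dict(priorities)
    if sole_features and PRIORITY_RANK[updated[breadth]] < PRIORITY_RANK["high"]:
        updated[breadth] = "high"
    return updated, sole_features


# Hypothetical coverage map: feature label -> charters that exercise it.
coverage = {
    "backup": {"backup-artifact-andlist", "breadth-tour"},
    "F7-lifecycle": {"breadth-tour"},
}
priorities = {"backup-artifact-andlist": "critical", "breadth-tour": "medium"}

updated, sole = promote_breadth(coverage, priorities)
print(updated["breadth-tour"], sole)  # high ['F7-lifecycle']
```

Applied at planning time, a rule like this would place breadth-tour in the first wave whenever some feature has no other charter, which is the situation the meta-review had to catch in this pilot.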