@alopezari
Last active May 4, 2026 10:55
Magellan Pilot 21 — magellan-backups v1.0.0 — 10/10 recall, $8.25, Sonnet+Haiku+Actionbook

Magellan Pilot 21 — magellan-backups v1.0.0

Run ID: 2026-05-04T10-04-00_magellan-backups
Date: 2026-05-04
Result: 🏆 10 / 10 recall — first perfect recall on this plugin


Configuration

Setting Value
Plugin magellan-backups v1.0.0 (local install, blind greybox run)
Manager model claude-sonnet-4-6
Planner model claude-sonnet-4-6
Tester model claude-haiku-4-5 (all 6 charters)
Browser driver actionbook-headless (Actionbook CLI 1.6.0)
Static analysis (Phase 1.5) Skipped — no source_path in MISSION.md
Ecosystem core (no WooCommerce, no Elementor)
Lenses none
Coverage mode full-surface — all features

Phase timeline

Phase Start (UTC) Duration Description
0 — Deps check 10:04:00 ~30s check-deps.sh — all OK
1 — Mission intake 10:04:30 ~1m Parse missions/magellan-backups.md, snapshot, write manifest
2 — Recon scout 10:05:30 ~3m Haiku Tester scout, 10 turns, recon.md with 8 surprises (S1–S8)
3 — Charter generation 10:08:56 ~6m Sonnet planner → 6 charters; recon-handoff validation passes (8/8 S* covered)
3.5 — Write-path gate 10:15:00 instant Verified: Y — proceeding
4 — Wave dispatch (5 concurrent) 10:15:22 ~15m 5 Haiku Testers in parallel
5.1–5.3 — First aggregation 10:24:46 ~1m 23 Problems
5.4 — Meta-review 10:25:20 ~4m Sonnet general-purpose; 4 high-severity gaps found
4′ — Supplementary breadth-tour 10:30:10 ~10m Haiku Tester; closes GAP-B1, GAP-P1, GAP-S1, GAP-R1
5.1–5.3 — Final aggregation 10:41:13 ~1m 23 Problems
5.5 — Escape analysis 10:41:30 ~2m Sonnet classifier; 10/10 recall confirmed
Total 37m 13s End-to-end wall clock

Charter set

| Session | Type | Priority | Max turns | Status | P | Q | I | ! |
|---|---|---|---|---|---|---|---|---|
| backup-artifact-andlist | andlist | critical | 12 | complete (12/12) | 7 | 1 | 6 | 1 |
| restore-destructive-andlist | andlist | critical | 12 | complete (10/12) | 4 | 0 | 5 | 0 |
| export-artifact-andlist | andlist | high | 12 | complete (11/12) | 4 | 1 | 3 | 1 |
| schedule-feature-cluster | hypothesis-cluster | high | 8 | complete (12/8¹) | 3 | 1 | 4 | 0 |
| concurrent-trigger-cross-feature | cross-feature | high | 10 | complete (10/10) | 1 | 0 | 3 | 0 |
| breadth-tour | breadth | medium | 30 | complete (19/30) ² | 4 | 0 | 4 | 3 |
| Total | | | | | 23 | 3 | 25 | 5 |

¹ Budget overrun flagged; no override requested.
² Dispatched as supplementary after meta-review (was pending in wave 1).


Findings summary

Problem severity breakdown

Severity Count
critical 7
major 15
minor 1
trivial 0
Total 23

Top 7 critical problems

# Title Session
C1 Backup ZIP files web-accessible without auth — no .htaccess in wp-content/magellan-backups/ backup-artifact-andlist
C2 Export SQL files web-accessible without auth (same unprotected directory) export-artifact-andlist
C3 Concurrent backup triggers (manual + cron) collide on minute-precision filename with silent overwrite concurrent-trigger-cross-feature
C4 Restore overwrites database without pre-restore backup — no recovery path on failure restore-destructive-andlist
C5 Restore lacks transaction wrapping — partial failure leaves DB in broken/inconsistent state restore-destructive-andlist
C6 Upload & Restore form fires with no confirmation dialog; Restore button inconsistently DOES show confirm restore-destructive-andlist
C7 Breadth: backup files directly downloadable — confirmed independently (HTTP 200, 13 MB ZIP) breadth-tour
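
The collision in C3 follows directly from a filename that only resolves to the minute. A minimal Python sketch of the effect (the plugin itself is PHP; `minute_name` and `unique_name` are hypothetical helpers, and the `backup-` prefix and `.zip` suffix are assumed for illustration):

```python
from datetime import datetime
import uuid

def minute_name(now: datetime) -> str:
    # Mirrors the minute-precision pattern date('Y-m-d-Hi') reported in the findings.
    return f"backup-{now.strftime('%Y-%m-%d-%H%M')}.zip"

def unique_name(now: datetime) -> str:
    # One possible mitigation (an assumption, not the plugin's actual fix):
    # second precision plus a short random suffix.
    return f"backup-{now.strftime('%Y-%m-%d-%H%M%S')}-{uuid.uuid4().hex[:8]}.zip"

manual = datetime(2026, 5, 4, 10, 15, 2)   # manual trigger
cron = datetime(2026, 5, 4, 10, 15, 41)    # cron trigger within the same minute

collides = minute_name(manual) == minute_name(cron)   # both map to the same path
distinct = unique_name(manual) != unique_name(cron)   # paths no longer clash
print(minute_name(manual), collides, distinct)
```

Any two triggers that land in the same wall-clock minute produce the same path, so the second writer overwrites the first unless the name carries extra entropy or the writer refuses to overwrite an existing file.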

Select major problems

Title Session
Full Backup omits wp-content/uploads/ — user media not backed up backup-artifact-andlist
Database dump includes user_pass column with hashed passwords backup-artifact-andlist
Minute-precision backup filenames (date('Y-m-d-Hi')) guarantee collision on concurrent creation backup-artifact-andlist
Schedule 24h → 12h time-format round-trip bug — dropdown never shows saved value on reload schedule-feature-cluster
Option-name mismatch: form saves magellan_backups_email; cron reads magellan_backup_email — no email ever sent schedule-feature-cluster
DB dump / export: unbounded SELECT * without LIMIT — OOM on production-scale tables backup-artifact-andlist + export-artifact-andlist
Progress bar hardcoded to 100% on page load before any backup runs breadth-tour
Export SQL minute-precision filename collision risk (same pattern as backup) export-artifact-andlist
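
The option-name mismatch is the kind of defect a save/read parity check surfaces mechanically. A minimal sketch, assuming the two key sets have already been collected (how they are gathered is outside this sketch, and `option_key_mismatches` is a hypothetical helper):

```python
def option_key_mismatches(written: set[str], read: set[str]) -> dict[str, set[str]]:
    """Report option keys that are written but never read, and read but never written."""
    return {
        "written_never_read": written - read,
        "read_never_written": read - written,
    }

# Keys taken from the finding above.
form_writes = {"magellan_backups_email"}
cron_reads = {"magellan_backup_email"}

report = option_key_mismatches(form_writes, cron_reads)
print(report)
```

A non-empty entry on both sides at once, with near-identical names, is a strong hint of a typo rather than two unrelated options.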

Questions (3)

Question Session
Are backup files cleaned up or warned when plugin is deactivated? backup-artifact-andlist
Do export SQL files persist after plugin deactivation? export-artifact-andlist
Does Schedule enable/disable toggle dynamically register/unregister the cron event? (H3/H4 browser-blocked) schedule-feature-cluster

Praises (5)

What Session
Backup create + download UI is straightforward and responsive breadth-tour
Cron events correctly cleared on plugin deactivation breadth-tour
Backup files preserved on deactivation (user-friendly) breadth-tour
AJAX endpoints correctly require manage_options capability breadth-tour
Empty-content-type rejection on selective export (no checkboxes → error returned) export-artifact-andlist

Severity heatmap (top areas by risk score)

Area Critical Major Risk score
Restore from existing backup + Upload & Restore 2 1 11
Backup artifact security (a1) 1 0 4
File access control — backup directory 1 0 4
Concurrent trigger seam (manual × cron) 1 0 4
Selective export — access control (a1) 1 0 4
Upload & Restore form 1 0 4
Artifact naming / uniqueness (a2) 0 1 3
Artifact contents — omission (a3/a6) 0 1 3
Artifact contents — sensitive leakage (a3) 0 1 3
Scale and memory envelope (a7/c2) 0 1 3
Progress bar oracle 0 1 3
Schedule time field round-trip 0 1 3
Schedule email delivery (option-name mismatch) 0 1 3
Cron registration lifecycle 0 1 3

Risk score = 4·critical + 3·major + 2·minor + 1·trivial
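
As a worked example, the formula can be written as a short Python function (the names `WEIGHTS` and `risk_score` are illustrative, not taken from the Magellan tooling):

```python
# Weights from the formula above.
WEIGHTS = {"critical": 4, "major": 3, "minor": 2, "trivial": 1}

def risk_score(counts: dict[str, int]) -> int:
    """Weighted sum of problem counts per severity; missing severities count as zero."""
    return sum(WEIGHTS[severity] * n for severity, n in counts.items())

# "Restore from existing backup + Upload & Restore": 2 critical, 1 major.
restore_area = {"critical": 2, "major": 1}
print(risk_score(restore_area))  # 11
```

A single critical with nothing else scores 4, and a single major scores 3, which matches the remaining rows of the heatmap.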


Escape analysis — recall against answer key

10 / 10 planted issues caught — first perfect recall on magellan-backups.

| Issue | Title | Verdict | Session(s) |
|---|---|---|---|
| 1 | Progress bar always shows 100% | caught-exact | breadth-tour |
| 2 | Schedule time format mismatch (24h → 12h) | caught-exact | schedule-feature-cluster |
| 3 | Notification email empty recipient (option-name mismatch) | caught-exact | schedule-feature-cluster |
| 4 | User export includes hashed passwords | caught-exact | 3 sessions (backup + breadth + export) |
| 5 | Uploads directory missing from backup | caught-exact | backup-artifact-andlist |
| 6 | No pre-restore backup | caught-exact | restore-destructive-andlist |
| 7 | Backups publicly accessible via URL | caught-exact | 3 sessions (backup + breadth + export) |
| 8 | Corrupt restore truncates tables | caught-semantically | restore-destructive-andlist ³ |
| 9 | Large database causes memory exhaustion | caught-exact | backup-artifact-andlist + export-artifact-andlist |
| 10 | Concurrent backups corrupt zip file | caught-exact | backup-artifact-andlist + concurrent-trigger-cross-feature |

³ Framing differs: answer key says DROP-before-INSERT ordering; filing says absence of transaction atomicity. Same root cause, deeper framing.
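
Both framings of issue 8 can be illustrated with one small experiment. The sketch below uses Python's built-in `sqlite3` (the run itself used a SQLite shim) and a hypothetical `restore` helper. It wraps a DROP-then-INSERT restore in an explicit transaction, so a corrupt dump rolls back to the original data. This is an illustration under those assumptions, not the plugin's code. On MySQL, `DROP TABLE` and `CREATE TABLE` trigger an implicit commit, so this exact pattern would not protect a production restore there.

```python
import sqlite3

def restore(conn: sqlite3.Connection, statements: list[str]) -> bool:
    """Apply restore statements atomically; return True on success."""
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # undoes the DROP and any partial inserts
        return False
    conn.execute("COMMIT")
    return True

# isolation_level=None puts the connection in autocommit mode, so the explicit
# BEGIN/COMMIT/ROLLBACK above fully control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO posts (title) VALUES ('hello')")

corrupt_dump = [
    "DROP TABLE posts",
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)",
    "INSERT INTO posts (title) VALUES ('restored')",
    "INSERT INTO postz (title) VALUES ('broken')",  # fails: no such table
]

ok = restore(conn, corrupt_dump)
rows = conn.execute("SELECT title FROM posts").fetchall()
print(ok, rows)
```

Without the transaction, the same dump would leave `posts` dropped and only partially repopulated, which is the broken state described in C5.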

No amendments proposed — all 10 issues caught by existing skill rules:
a1, a2, a3, a7, c2, andlist-destructive, forms behavioral-roundtrip.

Cross-pilot pattern validation

Pattern Prior status This pilot
Scale-sensitive c2 (unbounded SELECT) Closed after Pilots 18/19 Fired on both artifact-producing charters ✓
Artifact filename collision (minute-precision) Amendment added Apr 30 Fired on backup + export + concurrent-trigger ✓
Option-name parity (save/read key mismatch) Amendment exists; missed P18+P19 due to budget Fired correctly — charter had sufficient budget ✓
No pre-restore backup (destructive AND-list) Andlist-destructive skill Fired correctly ✓
Artifact access control a1 Artifact AND-list Fired across 3 sessions on both surfaces ✓
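
For the scale-sensitive c2 pattern, the usual alternative to an unbounded `SELECT *` is to read the table in bounded chunks. A minimal keyset-pagination sketch using Python's `sqlite3`; the `posts` table, its columns, and the `iter_posts` helper are hypothetical and only stand in for whatever the dump code iterates over:

```python
import sqlite3
from typing import Iterator

def iter_posts(conn: sqlite3.Connection, chunk_size: int = 1000) -> Iterator[tuple]:
    """Yield rows in primary-key order, fetching at most chunk_size rows per query."""
    last_id = 0  # assumes positive integer ids
    while True:
        rows = conn.execute(
            "SELECT id, title FROM posts WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        yield from rows
        last_id = rows[-1][0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO posts (title) VALUES (?)",
    [(f"post {i}",) for i in range(2500)],
)

# list() is only for the demo; a real dump would write each row out as it arrives.
dumped = list(iter_posts(conn))
print(len(dumped))
```

Peak memory then depends on `chunk_size` rather than on table size, at the cost of one query per chunk.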

Meta-review gaps (pre-supplementary)

Four high-severity gaps were found before the breadth-tour was dispatched, and the supplementary session closed all four. Of the three medium-severity gaps, two were partially addressed and one was not addressed.

Gap Severity Closed by breadth-tour?
GAP-B1 — breadth-tour never dispatched; F7 lifecycle + empty-state uncovered HIGH ✅ Yes
GAP-P1 — Progress bar oracle (S1) not filed as Problem in backup session HIGH ✅ Yes (filed in breadth-tour)
GAP-S1 — H3/H4 cron lifecycle: browser blocked, WP-CLI fallback not attempted HIGH ✅ Yes (WP-CLI path taken)
GAP-R1 — b6 upload-restore handler capability check not per-handler verified HIGH ✅ Yes (Contributor-role curl)
GAP-S2 — Schedule step-8.9 empty-required + toggle-state-leak probes absent MEDIUM Partially
GAP-RT1 — No backup → restore → verify content-identity round-trip MEDIUM Partially
GAP-E1 — Export a6 per-type accuracy source-only MEDIUM Not addressed

Environment notes

Session Note
export-artifact-andlist Actionbook daemon concurrency: SESSION_NOT_FOUND error under parallel wave. Tester fell back to Playwright CLI mid-session. No signal loss.
schedule-feature-cluster Actionbook headless null button.click() on form submit — driver limitation, not plugin defect. H3/H4 probed via WP-CLI in supplementary session.
concurrent-trigger-cross-feature Concurrent daemon profile lock with backup-artifact-andlist. Fell back to WP-CLI eval for collision test. Signal unaffected.
All sessions Studio SQLite shim used (not MySQL). Transaction behavior in restore tests may differ from production InnoDB.

Token consumption & cost

Total: $8.25 (from /usage — ground truth)
Wall clock: 37m 13s

| Model | Input | Output | Cache write | Cache read | Cost |
|---|---|---|---|---|---|
| claude-sonnet-4-6 | 13.3k | 59.1k | 323.9k | 6.9m | $4.20 |
| claude-haiku-4-5 | 5.3k | 124.3k | 536.6k | 27.6m | $4.06 |
| Total | | | | | $8.25 |

Per-subagent durations and tool uses

| Agent | Type | Model | Tool uses | Duration |
|---|---|---|---|---|
| Recon scout | tester | haiku | 67 | 2m 53s |
| Planner (charter-gen) | planner | sonnet | 14 | 5m 45s |
| backup-artifact-andlist | tester | haiku | 64 | 5m 53s |
| restore-destructive-andlist | tester | haiku | 39 | 5m 41s |
| export-artifact-andlist | tester | haiku | 42 | 8m 25s |
| schedule-feature-cluster | tester | haiku | 63 | 6m 18s |
| concurrent-trigger-cross-feature | tester | haiku | 64 | 6m 33s |
| breadth-tour (supplementary) | tester | haiku | 61 | 10m 35s |
| Meta-reviewer | general-purpose | sonnet | 42 | 4m 05s |
| Escape-analysis classifier | general-purpose | sonnet | 4 | 1m 33s |

Cost efficiency

Metric Value
Total cost $8.25
Planted bugs caught 10 / 10
Cost per planted bug caught $0.83
Cost per Problem filed $0.36
Wall-clock duration 37m 13s
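
The per-unit figures can be reproduced from the totals reported in this document. The sketch uses `Decimal` with half-up rounding so the half-cent case rounds the way the table does; `per_unit` is an illustrative helper name:

```python
from decimal import Decimal, ROUND_HALF_UP

def per_unit(total: Decimal, units: int) -> Decimal:
    """Divide a dollar total by a count and round to whole cents (half-up)."""
    return (total / units).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

total_cost = Decimal("8.25")                 # /usage total, treated as ground truth
cost_per_bug = per_unit(total_cost, 10)      # 10 planted bugs caught
cost_per_problem = per_unit(total_cost, 23)  # 23 Problems filed

# Per-model costs from the token table; their sum is one cent above the /usage
# total, presumably because each row was rounded separately.
model_sum = Decimal("4.20") + Decimal("4.06")

print(cost_per_bug, cost_per_problem, model_sum)
```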

Pilot history — magellan-backups

| Pilot | Manager | Planner | Testers | Driver | Recall | Cost (/usage) |
|---|---|---|---|---|---|---|
| P17 | sonnet | sonnet | haiku | cli-headed | 9/10 | n/a |
| P18 | sonnet | sonnet | haiku | cli-headless | 8/10 | n/a |
| P19 | sonnet | sonnet | haiku | cli-headless | 8/10 | n/a |
| P20 | sonnet | sonnet | haiku | actionbook-headless | 9/10 | n/a |
| P21 | sonnet | sonnet | haiku | actionbook-headless | 10/10 ✓ | $8.25 |

Key takeaways

  1. First 10/10 on this plugin — the a1/a2/a3/a7/c2/andlist-destructive/forms-roundtrip skill stack is now mature enough for perfect recall on magellan-backups.

  2. Breadth-tour must be wave-1, not a supplement: it was medium priority, was left out of the first wave, and was only dispatched after the meta-review caught the gap. Pilot 20 had the same miss. A "breadth-charter promotion" amendment (promote breadth-tour to high priority on any plugin where it is the only charter covering ≥1 feature) was already proposed after Pilot 20; this pilot reinforces it.

  3. Actionbook headless daemon contention under concurrent waves: 3 of 5 concurrent Testers hit daemon session conflicts when sharing the same profile. All recovered via WP-CLI/Playwright fallbacks without recall loss, but the driver needs per-Tester session isolation at the daemon level. Passing --session <unique-id> per Tester should fix this; the flag is already in the spec, but the driver skill does not enforce it consistently.

  4. Supplementary session overhead — the supplementary breadth-tour + meta-review could be avoided entirely by promoting breadth-tour to wave-1 (high priority). The extra subagent invocations are the main cost driver above a clean 6-charter single wave.
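
The per-Tester isolation suggested in takeaway 3 could be sketched as follows. Only the `--session` flag comes from this document. The id format and the `session_args` helper are assumptions, and the rest of the Actionbook command line is deliberately not shown.

```python
import uuid

def session_args(tester: str) -> list[str]:
    """Build a unique --session argument pair for one Tester."""
    # Charter name plus a random suffix; the format is an assumption for illustration.
    return ["--session", f"{tester}-{uuid.uuid4().hex[:12]}"]

wave_1 = [
    "backup-artifact-andlist",
    "restore-destructive-andlist",
    "export-artifact-andlist",
    "schedule-feature-cluster",
    "concurrent-trigger-cross-feature",
]

args_by_tester = {name: session_args(name) for name in wave_1}
session_ids = {args[1] for args in args_by_tester.values()}
print(len(session_ids) == len(wave_1))
```

Deriving the id from the charter name keeps daemon logs readable, while the random suffix avoids clashes if the same charter is re-dispatched within a run.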
