Alex López (alopezari)

Magellan — What's new (April 28 → May 6)

Magellan simulates a team of experienced testers working autonomously on your WordPress plugin or theme. It was designed by seasoned QA engineers to translate real exploratory testing techniques — SBTM, PQIP, heuristics like SFDPOT and FEW HICCUPPS — into a team of AI agents that run without human supervision. You point it at a plugin, trigger the run, and wait for the report.

Unlike scripted test automation, Magellan is non-deterministic: agents put themselves in the shoes of real users and explore the system the way a human tester would, following their curiosity, chasing anomalies, and adapting to what they find. There are no test scripts to write and no selectors to maintain. It's not another automation project; it's like having a team of experienced testers working for you.

Real test runs

Testing Report: WP desktop-mode, grey-box testing

Run ID: 2026-05-06T13-23-40_desktop-mode
Generated: 2026-05-06T14:28:49.519Z
Plugin version: 0.7.1
Sessions processed: 12
Sessions with errors: 2

Feature coverage matrix — 2026-05-06T13-23-40_desktop-mode

Coverage mode: full-surface

No FOCUS_OVERRIDE set; MISSION.md declares full-surface coverage with every major subsystem getting ≥1 probe.

ID	Feature	Surface	Traits	Depth	Covered by
F1	Admin-bar toggle	Admin bar node `desktop-mode-toggle` (all admin pages)	`AJAX-exposed, settings-form, output-rendering, user-facing-text, mode-toggle`	depth	portal-session-cluster, usability-first-use, breadth-tour-admin
F2	AJAX save-desktop-mode	wp-admin/admin-ajax.php?action=save-desktop-mode (POST)	`AJAX-exposed, DB-writing, settings-form`	depth	portal-session-cluster, breadth-tour-admin


Coverage gaps — desktop-mode 2026-05-06T13-23-40_desktop-mode

Summary

0 hypotheses silently skipped (automated check passed)
0 surfaces from recon/static-analysis unaddressed
0 AND-list items scored on aggregate when per-path was needed
0 round-trip probes missing on critical pairs
0 Questions filed only from source inspection (no empirical attempt)
1 forcing-function string missing (MEDIUM — probe ran, string absent)

Cross-charter intelligence — desktop-mode 2026-05-06T13-23-40_desktop-mode

Bug pattern digest

Lifecycle cleanup absent across all resource types: no uninstall.php and no register_deactivation_hook for cron or data cleanup. The plugin creates user meta (desktop_mode_mode, desktop_mode_os_settings), a presence option (_desktop_mode_presence), and a daily cron event (desktop_mode_presence_daily_prune) on activation; none of them are cleaned up on deactivation or deletion. Confirmed in: lifecycle-cluster (H7, H8). Confidence: high.
Recycle Bin REST layer has compounding defects, not just one bug: (1) the list endpoint returns no items even though items exist in the DB (a filtering bug; root cause unknown, but likely a query or capability-guard issue); (2) per_page has no upper bound and accepts arbitrary values such as 99999 (an OOM vector); (3) the Empty operation is hardcoded to 200 items per batch with no pagination loop (silent truncation: items beyond the first batch are not emptied). Each defect is independent; fixing one doesn't fix the others. Confirmed in: …
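The following is a minimal sketch of how defects (2) and (3) could be addressed. It is written in Python purely to illustrate the logic; the plugin itself is PHP. The sketch clamps per_page to a fixed maximum and turns the single 200-item Empty pass into a loop that runs until a batch comes back empty. `MAX_PER_PAGE`, `clamp_per_page`, `empty_bin`, and the `fetch_batch`/`delete_items` callables are hypothetical names. The cap of 100 is an assumption that mirrors the maximum WordPress core applies to `per_page` on its own REST collection endpoints. Defect (1) is not covered, since its root cause is still unknown.

```python
"""Illustrative fixes for Recycle Bin REST defects (2) and (3).

Hypothetical model in Python; the real plugin is PHP and these names
(MAX_PER_PAGE, clamp_per_page, empty_bin) are not taken from its code.
"""
from typing import Callable, List

MAX_PER_PAGE = 100  # assumed cap, mirroring WordPress core REST collections
BATCH_SIZE = 200    # batch size the report says is hardcoded in Empty


def clamp_per_page(requested: int, default: int = 10) -> int:
    """Defect (2): bound per_page instead of accepting values like 99999."""
    if requested < 1:
        return default
    return min(requested, MAX_PER_PAGE)


def empty_bin(
    fetch_batch: Callable[[int], List[int]],
    delete_items: Callable[[List[int]], None],
) -> int:
    """Defect (3): delete batches until the bin is empty; return the count.

    A single fetch_batch(BATCH_SIZE) pass leaves everything past the first
    200 items behind. Assumes delete_items removes each batch from storage.
    """
    deleted = 0
    while True:
        batch = fetch_batch(BATCH_SIZE)
        if not batch:
            return deleted
        delete_items(batch)
        deleted += len(batch)


if __name__ == "__main__":
    # Toy bin holding 450 trashed item IDs (illustrative data only).
    bin_items = list(range(450))

    def fetch(n: int) -> List[int]:
        return bin_items[:n]

    def delete(ids: List[int]) -> None:
        removed = set(ids)
        bin_items[:] = [i for i in bin_items if i not in removed]

    print(clamp_per_page(99999))     # 100
    print(empty_bin(fetch, delete))  # 450
```

The loop assumes `delete_items` actually removes each batch. If a deletion can fail silently, a real implementation would need a progress check to avoid spinning forever.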

Magellan Pilot 21 — magellan-backups v1.0.0

Run ID: 2026-05-04T10-04-00_magellan-backups
Date: 2026-05-04
Result: 🏆 10 / 10 recall — first perfect recall on this plugin

Configuration

Magellan Pilot Report — magellan-backups v1.0.0

Run ID: 2026-04-30T16-07-43_magellan-backups
Date: 2026-04-30
Duration: 31m 26s (16:07:43Z → 16:39:09Z)
Model stack: Manager Sonnet 4.6 · Planner Sonnet 4.6 · Testers Haiku 4.5 · Meta-reviewer Sonnet 4.6
Browser driver: playwright-cli-headless
Plugin: Magellan Backups v1.0.0 (local, blind greybox; ISSUES.md stripped)
Phase 1.5 static analysis: skipped (no source_path)

Coverage Gaps — 2026-04-30T11-27-21_magellan-backups

Generated: 2026-04-30 (meta-review, pre-aggregation)
Run: 2026-04-30T11-27-21_magellan-backups
Charters reviewed: 6 (create-backup-broken-flow, backup-artifact-andlist, restore-destructive-andlist, schedule-feature-cluster, selective-export-cluster, breadth-tour)

Check 1: Hypothesis coverage

Magellan — Token Efficiency Optimizations

Summary of all token-cost optimizations shipped, with the evidence that motivated each and the measured or projected impact. Intended for task reporting (RSM / Linear).

Baseline

Pilot 11 (first instrumented run): ~$102.90 total. The Manager ran on Opus for the entire session, including mechanical phases (file IO, jq calls, idle wave-wait). Token capture was manual and inaccurate: the Tester model was assumed to be Sonnet, but transcript analysis later showed that ~50% of Tester calls went to Opus.
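A per-role tally over transcript records is one way to surface misrouting like this, where Tester calls were assumed to go to Sonnet but about half were actually served by Opus. The minimal Python sketch below assumes each transcript entry has already been reduced to a record with a `role` and the `model` that served it. That record shape, the function names, and the lowercase model labels are all hypothetical. The toy records are invented to mirror the ~50% split and are not taken from Pilot 11.

```python
"""Illustrative audit of which model actually served each agent role.

Assumes transcript entries were reduced to {"role": ..., "model": ...}
records; this shape is hypothetical, not Magellan's transcript format.
"""
from collections import Counter
from typing import Dict, Iterable, Mapping


def model_share(records: Iterable[Mapping[str, str]], role: str) -> Dict[str, float]:
    """Fraction of calls for `role` served by each model (empty if none)."""
    counts = Counter(r["model"] for r in records if r["role"] == role)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {model: n / total for model, n in counts.items()}


def misrouted_share(
    records: Iterable[Mapping[str, str]], role: str, expected_model: str
) -> float:
    """Fraction of `role` calls that went to a model other than expected."""
    shares = model_share(records, role)
    return sum((s for m, s in shares.items() if m != expected_model), 0.0)


if __name__ == "__main__":
    # Toy records, invented to mirror the ~50% split; not real Pilot 11 data.
    records = [
        {"role": "tester", "model": "sonnet"},
        {"role": "tester", "model": "opus"},
        {"role": "tester", "model": "opus"},
        {"role": "tester", "model": "sonnet"},
        {"role": "manager", "model": "opus"},
    ]
    print(model_share(records, "tester"))                # {'sonnet': 0.5, 'opus': 0.5}
    print(misrouted_share(records, "tester", "sonnet"))  # 0.5
```

Running the audit per role against the model configured for that role flags any gap between the assumed and actual stack. A non-zero misrouted share means cost estimates based on the configured stack will be off.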

	Evidence:
	- Flow 1: Verified test data present before deactivation (user meta and options)
	- Flow 2: Confirmed desktop_mode_presence_daily_prune cron was present before deactivation
	- Flow 3: After deactivation, cron event desktop_mode_presence_daily_prune remained scheduled (H8 confirmed)
	- Flow 4: After plugin deletion, orphaned user meta (desktop_mode_mode, desktop_mode_os_settings) and options (_desktop_mode_presence) persisted (H7 confirmed)
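The four flows amount to a lifecycle round-trip: snapshot the plugin's resources, deactivate, snapshot again, delete, snapshot again, and compare. The minimal Python sketch below covers only that comparison step and assumes each snapshot is a set of (kind, name) pairs. How snapshots are collected is left open; WP-CLI commands such as `wp cron event list`, `wp option get`, and `wp user meta list` are one possibility. The resource names come from the bug digest. The `orphans` helper and the expected-cleanup policy are assumptions for illustration. The policy (scheduled hooks cleared on deactivation, stored data removed on deletion) follows common WordPress plugin practice.

```python
"""Illustrative deactivate/delete round-trip check behind H7 and H8.

Assumes each snapshot is a set of (kind, name) pairs; the snapshots in the
demo below are hand-built toy data, not captured from a real site.
"""
from typing import Dict, FrozenSet, Set, Tuple

Resource = Tuple[str, str]  # (kind, name)

# Resources the bug digest says desktop-mode creates.
CREATED: FrozenSet[Resource] = frozenset({
    ("user_meta", "desktop_mode_mode"),
    ("user_meta", "desktop_mode_os_settings"),
    ("option", "_desktop_mode_presence"),
    ("cron", "desktop_mode_presence_daily_prune"),
})

# Assumed cleanup policy: cron cleared on deactivation, everything on deletion.
EXPECTED_GONE_AFTER: Dict[str, Set[str]] = {
    "deactivate": {"cron"},
    "delete": {"cron", "option", "user_meta"},
}


def orphans(step: str, snapshot: Set[Resource]) -> Set[Resource]:
    """Plugin-created resources still in `snapshot` that `step` should have removed."""
    kinds = EXPECTED_GONE_AFTER[step]
    return {r for r in CREATED & snapshot if r[0] in kinds}


if __name__ == "__main__":
    # Flow 3: nothing was removed on deactivation, so the cron event lingers.
    after_deactivate = set(CREATED)
    # Flow 4 checked stored data only; toy snapshot with meta and option left.
    after_delete = {r for r in CREATED if r[0] != "cron"}
    print(sorted(orphans("deactivate", after_deactivate)))
    # [('cron', 'desktop_mode_presence_daily_prune')]
    print(len(orphans("delete", after_delete)))  # 3
```

An empty result at each step is the pass condition. In this toy run both steps report orphans, matching H8 (cron left after deactivation) and H7 (data left after deletion).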