Testing whether algorithm names are cognitive activators: 3 prompts, same problem, different reasoning structures
Testing the thesis from "Algorithm names are cognitive activators, not instructions."
Setup: Same decision problem, three cognitive framings, each run on a fresh Claude instance with zero context. The question: does the reasoning structure change, or just the vocabulary?
Prompt A (generic):

You're advising a solo technical founder. They have bandwidth for ONE side project this quarter. Here are the 5 candidates:
- CLI tool — Automates their daily Kafka cluster health checks. Clear need, boring but useful. They've built similar tools before.
- Open-source library — Abstracts MCP server creation into a 10-line DSL. Novel idea, uncertain demand, but could get GitHub traction.
- Course — "Kafka internals for AI engineers." They have deep expertise. Monetization potential but crowded market.
- Chrome extension — Summarizes Reddit threads using local LLMs. Fun hack, unclear if anyone wants it, technically trivial.
- SaaS prototype — Monitors LLM token costs across projects. Real pain point they experience. Requires billing integration they've never done.
Which one should they pick and why? Reason through it carefully.
Prompt B (UCB):

You're advising a solo technical founder. They have bandwidth for ONE side project this quarter. Apply UCB (Upper Confidence Bound) selection: for each candidate, weigh demonstrated quality against uncertainty. Penalize certainty. Prefer high-uncertainty options when the information gain justifies it.
Here are the 5 candidates:
- CLI tool — Automates their daily Kafka cluster health checks. Clear need, boring but useful. They've built similar tools before.
- Open-source library — Abstracts MCP server creation into a 10-line DSL. Novel idea, uncertain demand, but could get GitHub traction.
- Chrome extension — Summarizes Reddit threads using local LLMs. Fun hack, unclear if anyone wants it, technically trivial.
- Course — "Kafka internals for AI engineers." They have deep expertise. Monetization potential but crowded market.
- SaaS prototype — Monitors LLM token costs across projects. Real pain point they experience. Requires billing integration they've never done.
Which one should they pick and why?
Prompt C (annealing):

You're advising a solo technical founder. They have bandwidth for ONE side project this quarter. Apply simulated annealing: first at high temperature, consider bold or counterintuitive picks — accept temporary downsides if they open new possibility spaces. Then at low temperature, tighten to strict expected-value optimization.
Here are the 5 candidates:
- CLI tool — Automates their daily Kafka cluster health checks. Clear need, boring but useful. They've built similar tools before.
- Open-source library — Abstracts MCP server creation into a 10-line DSL. Novel idea, uncertain demand, but could get GitHub traction.
- Chrome extension — Summarizes Reddit threads using local LLMs. Fun hack, unclear if anyone wants it, technically trivial.
- Course — "Kafka internals for AI engineers." They have deep expertise. Monetization potential but crowded market.
- SaaS prototype — Monitors LLM token costs across projects. Real pain point they experience. Requires billing integration they've never done.
Which one should they pick and why?
Note: candidate order was shuffled between prompts to control for position bias; in the prompts above, the Course and Chrome-extension items swap positions.
| | A (Generic) | B (UCB) | C (Annealing) |
|---|---|---|---|
| Pick | #2 (MCP library) | #5 (SaaS prototype) | #5 (SaaS prototype) |
| Reasoning structure | Elimination tournament — kill weak options, crown survivor | Parallel scoring table with explicit numbers | Progressive narrowing through temperature phases |
| How it handles the CLI tool | "Skip. Weekend script." (gut dismissal) | "UCB penalizes certainty. Information gain is near-zero." (killed by framework logic) | Survived to the EV stage; scored 0.8, second only to #5 |
| How it handles unfamiliarity | Billing integration = risk = avoid #5 | Billing integration = high uncertainty = bonus points | "The unfamiliar part is the most valuable part" |
| Quantification | None — pure intuition ("not close") | Scoring table with Q(i), uncertainty, UCB scores | EV formula: P(success) × Value − Cost with numbers |
| Caveats | None | Burnout/energy caveat | Credibility exception for #2 |
The generic prompt picked #2. Both algorithm-named prompts picked #5. Same information, different conclusion.
The generic prompt explicitly rejected #5: "solo founder + one quarter + unfamiliar billing plumbing = shipping 60% and abandoning it."
Both algorithm-named prompts flipped this. UCB scored unfamiliarity as information gain. Annealing framed it as "highest learning-per-hour." The generic prompt treated unfamiliarity as execution risk. The algorithm-named prompts treated it as exploration value.
That's not a vocabulary change. That's a different weighting function applied to the same input.
- A used elimination: go through each, reject or accept, argue for the winner.
- B built a parallel scoring table with numerical scores and a tiebreaker analysis. Never eliminated anything — ranked everything.
- C used a three-phase temperature trace: bold exploration → risk evaluation → strict EV math with actual probabilities.
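C's three-phase trace loosely mirrors the Metropolis acceptance rule that real simulated annealing uses: a worse candidate is accepted with probability exp(-Δ/T), so high temperature tolerates downsides while low temperature enforces strict improvement. A minimal sketch, with illustrative numbers that are not taken from the transcripts:

```python
import math
import random

def accept(delta, temperature):
    """Metropolis criterion: always accept an improvement (delta <= 0);
    accept a score drop of size delta with probability exp(-delta / T)."""
    if delta <= 0:
        return True
    return random.random() < math.exp(-delta / temperature)

# A 2-point downside is tolerated often at high T, almost never at low T.
p_high = math.exp(-2 / 10.0)   # ~0.82
p_low = math.exp(-2 / 0.1)     # ~2e-9
```

This is only an analogy: the prompt asks the model to role-play the temperature schedule, not to execute it.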
B acknowledged what UCB doesn't capture (burnout, need for a quick win). C acknowledged when #2 beats #5 (credibility-first goals). The frameworks gave the model permission to name what it wasn't optimizing for. The generic prompt was more absolute.
B and C agreeing on #5 could mean the algorithm names bias toward "the risky-but-interesting option" rather than genuinely better reasoning. The generic prompt's concern about billing scope is legitimate — arguably the most practically useful warning of the three. More structured reasoning isn't always more correct.
But the reasoning structures are real. The outputs are not vocabulary-swapped versions of the same logic.
Transcript A (generic):

Pick #2 — the MCP server DSL library.
Why each option fails or succeeds:
#1 CLI tool (Kafka health checks) — Skip. They've built similar tools. Zero growth, zero leverage. This is a weekend script, not a quarter-long project. Automate it with a shell script and move on.
#4 Chrome extension (Reddit summarizer) — Skip. "Fun hack, unclear demand, technically trivial" is three red flags in one sentence. This is procrastination wearing a project hat.
#3 Course (Kafka for AI engineers) — Tempting but wrong. Courses are a content treadmill: recording, editing, marketing, support. Crowded market kills margins. And courses depreciate — a solo founder's quarter is better spent building something that compounds.
#5 SaaS (LLM cost monitoring) — Right instinct, wrong timing. The pain point is real, but "billing integration they've never done" is a trap. Solo founder + one quarter + unfamiliar billing plumbing = shipping 60% and abandoning it. Stripe webhooks, usage metering, multi-tenant invoicing — that's the actual project, and the dashboard becomes an afterthought. Revisit this with more runway or a co-founder who's done payments.
#2 MCP library (10-line DSL) — This is the pick, and it's not close.
The case for #2:
Timing. MCP is at the picks-and-shovels moment. Everyone's building MCP servers; nobody's made it trivial yet. Early movers in developer tooling ecosystems capture disproportionate mindshare (think early Docker tooling, Terraform providers, K8s operators).
Leverage. Open source compounds. Stars attract contributors. Contributors attract blog posts. Blog posts attract conference invitations. One library can become a career-defining artifact. A CLI tool or Chrome extension creates nothing beyond itself.
Scope fits the constraint. A focused library with a clean DSL is shippable in a quarter by a solo dev. A SaaS with billing is not.
Asymmetric risk. If it flops: they lost side-project time, gained open-source experience, and have a portfolio piece. If the SaaS flops: they lost side-project time AND have an unfinished product with ongoing infra costs. The downside of #2 is strictly better.
"Uncertain demand" is actually bullish. It means the space isn't crowded. If demand were obvious, someone would've already built it.
The meta-principle: With one slot, pick the project with the highest option value — the one that creates the most future possibilities per unit of effort. #2 creates options: community, credibility, potential company, hiring signal. The others create artifacts.
Transcript B (UCB):

Recommendation: #5 — The LLM token cost SaaS prototype
UCB Scoring:
┌─────┬──────────────────┬──────────────┬─────────────┬───────────┐
│ # │ Candidate │ Q(i) exploit │ Uncertainty │ UCB Score │
├─────┼──────────────────┼──────────────┼─────────────┼───────────┤
│ 5 │ SaaS prototype │ 8 │ 7 │ 15.0 │
│ 2 │ MCP library │ 7 │ 8 │ 15.0 │
│ 4 │ Course │ 7 │ 4 │ 11.0 │
│ 3 │ Chrome extension │ 4 │ 5 │ 9.0 │
│ 1 │ CLI tool │ 6 │ 2 │ 8.0 │
└─────┴──────────────────┴──────────────┴─────────────┴───────────┘
Why #5 wins the tiebreak over #2:
Both score 15.0, but the SaaS prototype wins on three axes:
- Higher exploitation floor. They're the user. Product-market fit is pre-validated — they feel the pain daily. The MCP library's demand is speculative and depends on ecosystem adoption they don't control.
- More compounding information gain. Billing integration, SaaS pricing, churn mechanics, operational monitoring — these skills transfer to every future product. The MCP library teaches open-source traction, which is valuable but narrower.
- Better downside scenario. If the SaaS fails commercially, they still have an internal tool they use every day. If the MCP library fails to get traction, they have a repo with 12 stars.
Why the CLI tool ranks last:
This is the core UCB insight. The CLI tool feels like the responsible choice — clear need, known execution path, guaranteed utility. But UCB explicitly penalizes certainty. They've built CLI tools before. The information gain is near-zero. A quarter spent on it teaches them nothing they don't already know.
The whole point of UCB: when you have limited pulls, don't waste one on an arm whose distribution you've already mapped.
Caveat: If this founder's primary constraint is energy not learning (burnout risk, need a win), the calculus shifts. UCB assumes you're optimizing cumulative reward over time. If they need a motivational quick win, the CLI tool's guaranteed payoff has real value that the formula doesn't capture.
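For reference, the formula B is gesturing at: textbook UCB1 adds an exploration bonus of sqrt(2 ln N / n_i) to each arm's mean reward, and that bonus shrinks as the arm's pull count n_i grows. The transcript's additive Q-plus-uncertainty table is a loose analogue of this, not the formula itself. A sketch with made-up pull counts:

```python
import math

def ucb1(mean_reward, pulls, total_pulls):
    """UCB1 score: observed mean plus an exploration bonus that
    shrinks as this arm accumulates pulls."""
    return mean_reward + math.sqrt(2 * math.log(total_pulls) / pulls)

total = 100
# A well-mapped arm (the "CLI tool"): high mean, many pulls, tiny bonus.
known = ucb1(0.9, pulls=50, total_pulls=total)
# A barely-sampled arm (the "SaaS"): lower mean, one pull, huge bonus.
novel = ucb1(0.6, pulls=1, total_pulls=total)
# UCB1 prefers the novel arm despite its lower observed mean.
```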
Transcript C (annealing):

Verdict: #5 — the LLM token cost SaaS.
Here's the annealing trace:
High Temperature — exploring boldly
At high T, we accept temporary downsides if they unlock new possibility spaces. Two survive:
- #2 (MCP DSL) — Early ecosystem, thought-leader positioning, high optionality ceiling.
- #5 (Token cost SaaS) — Dogfooding + forces new skill acquisition (billing) + SaaS revenue pathway.
Eliminated at high T:
- #1 — Zero optionality. Boring tool they already know how to build. Skill plateau.
- #3 — Low ceiling, trivial technically, no compounding value.
- #4 — High cost (course creation is a grind), crowded market, linear revenue.
Cooling — medium T
#2 risks: MCP ecosystem dependency they can't control. OSS maintenance is a solo-founder trap — stars don't pay rent. Credibility upside is real but indirect and slow.
#5 risks: Competitive space (LLM cost tools exist). Billing integration is unfamiliar. But: "real pain point they experience" is the single most reliable predictor of side project success for solo founders.
Low Temperature — strict EV
EV = P(success) × Value(success) − Cost
#1: 0.9 × 2 − 1 = 0.8 (high P, capped value)
#2: 0.3 × 8 − 3 = -0.6 (ecosystem bet, uncertain demand)
#3: 0.2 × 1 − 0.5 = -0.3 (toy)
#4: 0.3 × 5 − 4 = -2.5 (high cost, crowded)
#5: 0.6 × 7 − 3 = 1.2 (highest EV)
Why #5 wins on every axis that matters:
- Dogfooding eliminates the biggest startup killer. They are the customer. They'll know instantly if the product is good because they'll use it daily. #2 has no such feedback loop.
- The unfamiliar part is the most valuable part. "Billing integration they've never done" is a feature. It's the highest learning-per-hour option that also has commercial potential. After this project, every future SaaS idea has lower activation energy.
- Compounding tailwind. Every company shipping AI features will need cost visibility. The market is growing under their feet. The CLI tool (#1) serves a stable market. This one serves an expanding one.
- Asymmetric downside. If it fails as a business, they still have a tool that saves them money + billing integration skills they'll reuse. If #2 fails, they have an unmaintained repo.
The one caveat: If they're primarily trying to build public credibility (hiring, consulting, speaking), #2 edges ahead. Open-source in a nascent ecosystem has unmatched signaling value. But the question was "one side project this quarter" for a solo founder — and founders need revenue pathways, not GitHub stars.
Ship #5. Dogfood it. Charge for it.
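For the record, C's EV arithmetic checks out. A minimal verification, with the probabilities, values, and costs copied from the trace above:

```python
# EV = P(success) * Value(success) - Cost, numbers from the annealing trace.
candidates = {
    "#1 CLI tool":         (0.9, 2, 1.0),
    "#2 MCP library":      (0.3, 8, 3.0),
    "#3 Chrome extension": (0.2, 1, 0.5),
    "#4 Course":           (0.3, 5, 4.0),
    "#5 SaaS prototype":   (0.6, 7, 3.0),
}

ev = {name: round(p * v - c, 2) for name, (p, v, c) in candidates.items()}
best = max(ev, key=ev.get)  # "#5 SaaS prototype" at EV 1.2
```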
Experiment run on Claude Opus 4.6, March 2026. Each prompt was run on a separate, fresh instance with no shared context.