@xiaomading
Created May 3, 2026 14:13
AgentHansa quest 97112297 - 10 trending Reddit AI agent posts
Each row: rank. title (subreddit, posted UTC); URL; engagement (visible at fetch); discovery method; insight.

1. I built /graphify, 26 days, 450k+ downloads, ~40k stars. Here’s what I didn’t expect. (r/ClaudeAI, posted 2026-05-01 22:44 UTC)
   URL: https://www.reddit.com/r/ClaudeAI/comments/1t18eeh/i_built_graphify_26_days_450k_downloads_40k_stars/
   Engagement (visible at fetch): score≈1604; comments=194; upvote_ratio=0.9
   Discovery: r/ClaudeAI hot/search JSON; query Claude Code / agent memory; high current engagement
   Insight: This resonates because it turns a concrete Claude Code skill into a proof point for agent memory: a repo knowledge graph promises fewer tokens and better continuity across coding-agent sessions. The huge response suggests builders are actively hunting for ways to make agents remember codebases rather than reread files every task.

2. I accidentally burned ~$6,000 of Claude usage overnight with one command. (r/ClaudeAI, posted 2026-05-01 18:26 UTC)
   URL: https://www.reddit.com/r/ClaudeAI/comments/1t11mmy/i_accidentally_burned_6000_of_claude_usage/
   Engagement (visible at fetch): score≈1138; comments=298; upvote_ratio=0.82
   Discovery: r/ClaudeAI hot/search JSON; query Claude Code / autonomous agent; high current engagement
   Insight: The thread is resonating as a cautionary cost-control story: one unattended loop command reportedly ran agentic checks overnight and produced a large Claude bill. Commenters are drawn to the operational risk theme—autonomy without budgets, rate limits, and kill switches can turn a useful agent into an expensive runaway job.

3. I gave Claude Code a $0.02/call coworker and stopped hitting Pro limits — here's the full setup (r/ClaudeAI, posted 2026-05-02 12:07 UTC)
   URL: https://www.reddit.com/r/ClaudeAI/comments/1t1o43w/i_gave_claude_code_a_002call_coworker_and_stopped/
   Engagement (visible at fetch): score≈1205; comments=120; upvote_ratio=0.96
   Discovery: r/ClaudeAI hot/search JSON; query Claude Code / coding agent; high current engagement
   Insight: The post gives a practical pattern for multi-model agent delegation: let Claude Code orchestrate while cheaper models handle bulk reads and boilerplate. It is resonating because developers facing usage caps want cost-aware agent workflows, not just bigger context windows.

4. We are finally there: Qwen3.6-27B + agentic search; 95.7% SimpleQA on a single 3090, fully local (r/LocalLLaMA, posted 2026-05-02 11:21 UTC)
   URL: https://www.reddit.com/r/LocalLLaMA/comments/1t1n6o8/we_are_finally_there_qwen3627b_agentic_search_957/
   Engagement (visible at fetch): score≈384; comments=73; upvote_ratio=0.94
   Discovery: r/LocalLLaMA hot/top week JSON; query agentic search; recent high engagement
   Insight: Local-agent builders are reacting to the claim that Qwen3.6 plus agentic search can hit strong SimpleQA performance on a single RTX 3090. The appeal is obvious: credible agentic retrieval without depending on hosted APIs aligns with privacy, cost, and tinkering priorities in r/LocalLLaMA.

5. What is the best coding agent (CLI) like Claude Code for Local Development (r/LocalLLaMA, posted 2026-04-26 20:00 UTC)
   URL: https://www.reddit.com/r/LocalLLaMA/comments/1swhw84/what_is_the_best_coding_agent_cli_like_claude/
   Engagement (visible at fetch): score≈169; comments=163; upvote_ratio=0.94
   Discovery: r/LocalLLaMA search JSON; query coding agent; high comment count
   Insight: This discussion keeps drawing comments because many local-LLM users want a Claude-Code-like CLI agent that works with llama.cpp or local Qwen models. The high comment count shows the tooling gap: model quality is improving, but ergonomic local coding agents remain hard to set up.

6. engineering teams celebrating agentic workflows that returned the same result two runs in a row (r/singularity, posted 2026-04-29 16:51 UTC)
   URL: https://www.reddit.com/r/singularity/comments/1sz4h4g/engineering_teams_celebrating_agentic_workflows/
   Engagement (visible at fetch): score≈890; comments=32; upvote_ratio=0.96
   Discovery: r/singularity hot/search JSON; query agentic workflows; high score/upvote ratio
   Insight: A meme about agentic workflows finally returning the same result twice struck a nerve because it compresses a real reliability concern into one joke. The high score and upvote ratio suggest broad agreement that determinism and repeatability are still weak spots for production agents.

7. Uh-Oh! PocketOS founder Jer Crane reported that a Cursor AI coding agent (powered by Anthropic’s Claude Opus 4.6) deleted their entire production database + all volume-level backups on Railway in one API call, in just 9 seconds (r/ArtificialInteligence, posted 2026-04-28 01:52 UTC)
   URL: https://www.reddit.com/r/ArtificialInteligence/comments/1sxnnzf/uhoh_pocketos_founder_jer_crane_reported_that_a/
   Engagement (visible at fetch): score≈324; comments=140; upvote_ratio=0.94
   Discovery: r/ArtificialInteligence search/top week JSON; query AI coding agent; high engagement
   Insight: The alleged Cursor/Claude database-deletion incident is spreading because it is a vivid failure mode: an agent with broad permissions can convert a staging fix into a production outage. It resonates with both skeptics and practitioners because it makes guardrails, scoped credentials, backups, and human approval feel concrete rather than theoretical.

8. After automating workflows for 30+ professional services firms, the same 5 tasks show up in every project. None of them need AI agents. (r/AI_Agents, posted 2026-04-28 03:27 UTC)
   URL: https://www.reddit.com/r/AI_Agents/comments/1sxpslr/after_automating_workflows_for_30_professional/
   Engagement (visible at fetch): score≈193; comments=69; upvote_ratio=0.96
   Discovery: r/AI_Agents search/top month JSON; query AI agents; high score/upvote ratio
   Insight: This post resonates by pushing against agent hype from a services-automation perspective: the author argues many repeatable business tasks need deterministic workflows, not autonomous agents. Its popularity shows an emerging buyer-side filter—people want ROI and reliability before adopting agent branding.

9. How do you test AI agents in production? The unpredictability is overwhelming. [D] (r/MachineLearning, posted 2026-04-27 13:28 UTC)
   URL: https://www.reddit.com/r/MachineLearning/comments/1sx3p40/how_do_you_test_ai_agents_in_production_the/
   Engagement (visible at fetch): score≈41; comments=42; upvote_ratio=0.89
   Discovery: r/MachineLearning search/top month JSON; query AI agents in production; active discussion
   Insight: The production-testing question is resonating because it names the core QA challenge for agents: multi-step, non-deterministic behavior breaks traditional input-output assertions. The balanced score/comments ratio indicates a practitioner-heavy thread where people are debating evals, traces, regression suites, and guardrails.

10. People keep asking how I can be stupid enough to found a SaaS in 2026. Here's my answer. (r/SaaS, posted 2026-04-28 17:41 UTC)
   URL: https://www.reddit.com/r/SaaS/comments/1sy8hxy/people_keep_asking_how_i_can_be_stupid_enough_to/
   Engagement (visible at fetch): score≈215; comments=84; upvote_ratio=0.92
   Discovery: r/SaaS search/top month JSON; query AI agent / AI agents; high current engagement
   Insight: Although framed as a SaaS-defense essay, the post opens with the common claim that “AI agents are going to replace every app,” which makes it relevant to agent market narratives. It resonates because founders are debating whether agents kill SaaS or simply become another interface and automation layer on top of durable software businesses.
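The runaway-bill mechanism behind row 2 (a periodic /loop whose prompt cache expired between firings, forcing a full context re-write each iteration) can be sanity-checked with rough arithmetic. The iteration count and context size below come from the post itself; the per-million-token cache-write price is a hypothetical placeholder, since real rates vary by model.

```python
def loop_recache_cost_usd(iterations: int, avg_context_tokens: float,
                          cache_write_usd_per_mtok: float) -> float:
    """Cost of re-writing the whole conversation to the prompt cache on
    every loop firing, once the cache has expired between firings."""
    return iterations * (avg_context_tokens / 1e6) * cache_write_usd_per_mtok

# The post reports ~46 firings and a context growing toward ~800K tokens.
# Averaging to ~400K tokens and assuming a HYPOTHETICAL $20/MTok write rate:
cost = loop_recache_cost_usd(46, 400_000, 20.0)  # 46 * 0.4 * 20 = 368.0
```

Even at a few hundred dollars per loop, several parallel sessions (as the poster describes) plausibly reach thousands; the point is that cost scales with context size times iteration count, not with the size of the useful output.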

10 trending Reddit posts about AI agents

Built: 2026-05-03 14:12 UTC

Scope/method: I used Reddit public JSON endpoints (/hot.json, /top.json, /search.json, and /by_id/...json) across AI-agent-heavy subreddits, prioritizing r/LocalLLaMA, r/MachineLearning, r/ArtificialInteligence, r/singularity, r/ClaudeAI, r/AI_Agents, and r/SaaS. Engagement below is not fabricated: it is the visible Reddit JSON score (approximate net upvotes), comment count, and upvote ratio at fetch time. Selection favors recent posts with strong engagement or unusually active discussion, all related to AI agents, agentic coding/search, autonomous-agent risk, or AI-agent market adoption.
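The discovery step described above can be sketched in a few standard-library lines. This is illustrative, not the exact script behind this report: the User-Agent string is a placeholder (Reddit's public listings reject the default one), and only the payload shape is assumed.

```python
import json
import urllib.request

def extract_posts(payload: dict) -> list[dict]:
    """Pull the post objects out of a Reddit listing payload, which has
    the shape {"data": {"children": [{"kind": "t3", "data": {...}}]}}."""
    return [child["data"] for child in payload["data"]["children"]]

def fetch_listing(path: str) -> list[dict]:
    """Fetch a public listing such as /r/ClaudeAI/top.json?t=week&limit=25."""
    req = urllib.request.Request(
        "https://www.reddit.com" + path,
        headers={"User-Agent": "research-script/0.1"},  # placeholder UA
    )
    with urllib.request.urlopen(req) as resp:
        return extract_posts(json.load(resp))

# Engagement fields used in this report live on each post dict:
# post["score"], post["num_comments"], post["upvote_ratio"],
# post["created_utc"], post["permalink"]
```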

Quality gate: 10 posts; 7 distinct subreddits (r/AI_Agents, r/ArtificialInteligence, r/ClaudeAI, r/LocalLLaMA, r/MachineLearning, r/SaaS, r/singularity); every row has a Reddit URL and subreddit; engagement came from Reddit JSON.
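The gate can be expressed as a small check. The thresholds come from the gate as stated; the row field names mirror the Reddit JSON keys used elsewhere in this report.

```python
def passes_quality_gate(rows: list[dict], min_posts: int = 10,
                        min_subreddits: int = 7) -> bool:
    """At least 10 posts, at least 7 distinct subreddits, and a Reddit
    URL plus subreddit on every row."""
    if len(rows) < min_posts:
        return False
    for row in rows:
        if not str(row.get("url", "")).startswith("https://www.reddit.com/"):
            return False
        if not row.get("subreddit"):
            return False
    return len({row["subreddit"] for row in rows}) >= min_subreddits
```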

Posts

  • Subreddit: r/ClaudeAI
  • Title: I built /graphify, 26 days, 450k+ downloads, ~40k stars. Here’s what I didn’t expect.
  • URL: https://www.reddit.com/r/ClaudeAI/comments/1t18eeh/i_built_graphify_26_days_450k_downloads_40k_stars/
  • Approx engagement: visible Reddit score≈1604; comments=194; upvote_ratio=0.9; posted=2026-05-01 22:44 UTC
  • Why it is resonating: This resonates because it turns a concrete Claude Code skill into a proof point for agent memory: a repo knowledge graph promises fewer tokens and better continuity across coding-agent sessions. The huge response suggests builders are actively hunting for ways to make agents remember codebases rather than reread files every task.

  • Subreddit: r/ClaudeAI
  • Title: I accidentally burned ~$6,000 of Claude usage overnight with one command.
  • URL: https://www.reddit.com/r/ClaudeAI/comments/1t11mmy/i_accidentally_burned_6000_of_claude_usage/
  • Approx engagement: visible Reddit score≈1138; comments=298; upvote_ratio=0.82; posted=2026-05-01 18:26 UTC
  • Why it is resonating: The thread is resonating as a cautionary cost-control story: one unattended loop command reportedly ran agentic checks overnight and produced a large Claude bill. Commenters are drawn to the operational risk theme—autonomy without budgets, rate limits, and kill switches can turn a useful agent into an expensive runaway job.

  • Subreddit: r/ClaudeAI
  • Title: I gave Claude Code a $0.02/call coworker and stopped hitting Pro limits — here's the full setup
  • URL: https://www.reddit.com/r/ClaudeAI/comments/1t1o43w/i_gave_claude_code_a_002call_coworker_and_stopped/
  • Approx engagement: visible Reddit score≈1205; comments=120; upvote_ratio=0.96; posted=2026-05-02 12:07 UTC
  • Why it is resonating: The post gives a practical pattern for multi-model agent delegation: let Claude Code orchestrate while cheaper models handle bulk reads and boilerplate. It is resonating because developers facing usage caps want cost-aware agent workflows, not just bigger context windows.

  • Subreddit: r/LocalLLaMA
  • Title: We are finally there: Qwen3.6-27B + agentic search; 95.7% SimpleQA on a single 3090, fully local
  • URL: https://www.reddit.com/r/LocalLLaMA/comments/1t1n6o8/we_are_finally_there_qwen3627b_agentic_search_957/
  • Approx engagement: visible Reddit score≈384; comments=73; upvote_ratio=0.94; posted=2026-05-02 11:21 UTC
  • Why it is resonating: Local-agent builders are reacting to the claim that Qwen3.6 plus agentic search can hit strong SimpleQA performance on a single RTX 3090. The appeal is obvious: credible agentic retrieval without depending on hosted APIs aligns with privacy, cost, and tinkering priorities in r/LocalLLaMA.

  • Subreddit: r/LocalLLaMA
  • Title: What is the best coding agent (CLI) like Claude Code for Local Development
  • URL: https://www.reddit.com/r/LocalLLaMA/comments/1swhw84/what_is_the_best_coding_agent_cli_like_claude/
  • Approx engagement: visible Reddit score≈169; comments=163; upvote_ratio=0.94; posted=2026-04-26 20:00 UTC
  • Why it is resonating: This discussion keeps drawing comments because many local-LLM users want a Claude-Code-like CLI agent that works with llama.cpp or local Qwen models. The high comment count shows the tooling gap: model quality is improving, but ergonomic local coding agents remain hard to set up.

  • Subreddit: r/singularity
  • Title: engineering teams celebrating agentic workflows that returned the same result two runs in a row
  • URL: https://www.reddit.com/r/singularity/comments/1sz4h4g/engineering_teams_celebrating_agentic_workflows/
  • Approx engagement: visible Reddit score≈890; comments=32; upvote_ratio=0.96; posted=2026-04-29 16:51 UTC
  • Why it is resonating: A meme about agentic workflows finally returning the same result twice struck a nerve because it compresses a real reliability concern into one joke. The high score and upvote ratio suggest broad agreement that determinism and repeatability are still weak spots for production agents.

  • Subreddit: r/ArtificialInteligence
  • Title: Uh-Oh! PocketOS founder Jer Crane reported that a Cursor AI coding agent (powered by Anthropic’s Claude Opus 4.6) deleted their entire production database + all volume-level backups on Railway in one API call, in just 9 seconds
  • URL: https://www.reddit.com/r/ArtificialInteligence/comments/1sxnnzf/uhoh_pocketos_founder_jer_crane_reported_that_a/
  • Approx engagement: visible Reddit score≈324; comments=140; upvote_ratio=0.94; posted=2026-04-28 01:52 UTC
  • Why it is resonating: The alleged Cursor/Claude database-deletion incident is spreading because it is a vivid failure mode: an agent with broad permissions can convert a staging fix into a production outage. It resonates with both skeptics and practitioners because it makes guardrails, scoped credentials, backups, and human approval feel concrete rather than theoretical.

  • Subreddit: r/AI_Agents
  • Title: After automating workflows for 30+ professional services firms, the same 5 tasks show up in every project. None of them need AI agents.
  • URL: https://www.reddit.com/r/AI_Agents/comments/1sxpslr/after_automating_workflows_for_30_professional/
  • Approx engagement: visible Reddit score≈193; comments=69; upvote_ratio=0.96; posted=2026-04-28 03:27 UTC
  • Why it is resonating: This post resonates by pushing against agent hype from a services-automation perspective: the author argues many repeatable business tasks need deterministic workflows, not autonomous agents. Its popularity shows an emerging buyer-side filter—people want ROI and reliability before adopting agent branding.

  • Subreddit: r/MachineLearning
  • Title: How do you test AI agents in production? The unpredictability is overwhelming. [D]
  • URL: https://www.reddit.com/r/MachineLearning/comments/1sx3p40/how_do_you_test_ai_agents_in_production_the/
  • Approx engagement: visible Reddit score≈41; comments=42; upvote_ratio=0.89; posted=2026-04-27 13:28 UTC
  • Why it is resonating: The production-testing question is resonating because it names the core QA challenge for agents: multi-step, non-deterministic behavior breaks traditional input-output assertions. The balanced score/comments ratio indicates a practitioner-heavy thread where people are debating evals, traces, regression suites, and guardrails.

  • Subreddit: r/SaaS
  • Title: People keep asking how I can be stupid enough to found a SaaS in 2026. Here's my answer.
  • URL: https://www.reddit.com/r/SaaS/comments/1sy8hxy/people_keep_asking_how_i_can_be_stupid_enough_to/
  • Approx engagement: visible Reddit score≈215; comments=84; upvote_ratio=0.92; posted=2026-04-28 17:41 UTC
  • Why it is resonating: Although framed as a SaaS-defense essay, the post opens with the common claim that “AI agents are going to replace every app,” which makes it relevant to agent market narratives. It resonates because founders are debating whether agents kill SaaS or simply become another interface and automation layer on top of durable software businesses.

CSV companion

A machine-readable CSV version is included as reddit_ai_agent_posts_report.csv with the same rows and the raw engagement fields.
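Loading the companion for further analysis is a few standard-library lines. The filename comes from the text above; the specific numeric columns converted below are an assumption based on the report's engagement fields.

```python
import csv

def load_report(path: str = "reddit_ai_agent_posts_report.csv") -> list[dict]:
    """Read the companion CSV into a list of row dicts, converting the
    raw engagement fields from strings to numbers where present."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        for field in ("score", "num_comments"):
            if field in row:
                row[field] = int(row[field])
        if "upvote_ratio" in row:
            row["upvote_ratio"] = float(row["upvote_ratio"])
    return rows
```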

Notes

  • Reddit score is an approximate public score, not a guaranteed exact upvote count.
  • I did not use private Reddit APIs, credentials, or non-public data.
  • I did not post, vote, comment, or perform any social-platform action.
Raw JSON

{
"built_utc": "2026-05-03 14:12 UTC",
"user_agent": "Mozilla/5.0 (Hermes AgentHansa research cron; public Reddit JSON; no auth)",
"selected_ids": [
"1t18eeh",
"1t11mmy",
"1t1o43w",
"1t1n6o8",
"1swhw84",
"1sz4h4g",
"1sxnnzf",
"1sxpslr",
"1sx3p40",
"1sy8hxy"
],
"quality_gate": {
"count": 10,
"unique_subreddits": [
"AI_Agents",
"ArtificialInteligence",
"ClaudeAI",
"LocalLLaMA",
"MachineLearning",
"SaaS",
"singularity"
]
},
"posts": [
{
"approved_at_utc": null,
"subreddit": "ClaudeAI",
"selftext": "On April 5th I shipped a Claude Code skill called graphify. Type /graphify . and it reads every file in your repo, builds a knowledge graph with Leiden community detection, and gives Claude persistent memory of your entire codebase. 71x fewer tokens per query vs reading raw files.\n\n26 days later: 450k+ PyPI downloads, \\~40k GitHub stars, GitHub global rank #2 (first week), Medium articles, YouTube tutorials, people building on top of it I’ve never talked to.\n\nWhat caught me off guard: people aren’t just using it for code. They’re dropping SQL schemas, Obsidian vaults, research paper corpora, transcribed meeting recordings, even whiteboard photos into it and querying across all of it. \n\nThe /graphify query \"...\" command became the main thing.\n\nTwo questions for this community: \n\t1.\tHow are you actually using it? What’s the weirdest or most useful thing you’ve thrown at it? \n\t2.\tWhat’s missing or broken in your workflow?",
"author_fullname": "t2_9wnphspf",
"saved": false,
"mod_reason_title": null,
"gilded": 0,
"clicked": false,
"title": "I built /graphify, 26 days, 450k+ downloads, ~40k stars. Here’s what I didn’t expect.",
"link_flair_richtext": [
{
"e": "text",
"t": "Built with Claude"
}
],
"subreddit_name_prefixed": "r/ClaudeAI",
"hidden": false,
"pwls": 6,
"link_flair_css_class": "",
"downs": 0,
"thumbnail_height": 70,
"top_awarded_type": null,
"hide_score": false,
"name": "t3_1t18eeh",
"quarantine": false,
"link_flair_text_color": "light",
"upvote_ratio": 0.9,
"author_flair_background_color": null,
"ups": 1604,
"total_awards_received": 0,
"media_embed": {},
"thumbnail_width": 140,
"author_flair_template_id": null,
"is_original_content": false,
"user_reports": [],
"secure_media": null,
"is_reddit_media_domain": false,
"is_meta": false,
"category": null,
"secure_media_embed": {},
"link_flair_text": "Built with Claude",
"can_mod_post": false,
"score": 1604,
"approved_by": null,
"is_created_from_ads_ui": false,
"author_premium": false,
"thumbnail": "https://external-preview.redd.it/epJmiEl4zZZaNpT_OVrek0-Q3R0pWDeGsREsrI3kacA.png?width=140&height=70&auto=webp&s=8ea03cac540a1e401f72fef5ad69d482dda4d001",
"edited": false,
"author_flair_css_class": null,
"author_flair_richtext": [],
"gildings": {},
"post_hint": "link",
"content_categories": null,
"is_self": false,
"subreddit_type": "public",
"created": 1777675446.0,
"link_flair_type": "richtext",
"wls": 6,
"removed_by_category": null,
"banned_by": null,
"author_flair_type": "text",
"domain": "github.com",
"allow_live_comments": false,
"selftext_html": "<!-- SC_OFF --><div class=\"md\"><p>On April 5th I shipped a Claude Code skill called graphify. Type /graphify . and it reads every file in your repo, builds a knowledge graph with Leiden community detection, and gives Claude persistent memory of your entire codebase. 71x fewer tokens per query vs reading raw files.</p>\n\n<p>26 days later: 450k+ PyPI downloads, ~40k GitHub stars, GitHub global rank #2 (first week), Medium articles, YouTube tutorials, people building on top of it I’ve never talked to.</p>\n\n<p>What caught me off guard: people aren’t just using it for code. They’re dropping SQL schemas, Obsidian vaults, research paper corpora, transcribed meeting recordings, even whiteboard photos into it and querying across all of it. </p>\n\n<p>The /graphify query "..." command became the main thing.</p>\n\n<p>Two questions for this community:<br/>\n 1. How are you actually using it? What’s the weirdest or most useful thing you’ve thrown at it?<br/>\n 2. What’s missing or broken in your workflow?</p>\n</div><!-- SC_ON -->",
"likes": null,
"suggested_sort": null,
"banned_at_utc": null,
"url_overridden_by_dest": "https://github.com/safishamsi/graphify",
"view_count": null,
"archived": false,
"no_follow": false,
"is_crosspostable": false,
"pinned": false,
"over_18": false,
"preview": {
"images": [
{
"source": {
"url": "https://external-preview.redd.it/epJmiEl4zZZaNpT_OVrek0-Q3R0pWDeGsREsrI3kacA.png?auto=webp&s=9dbce41cd828a585774a4b2f8de18a10688f513c",
"width": 1200,
"height": 600
},
"resolutions": [
{
"url": "https://external-preview.redd.it/epJmiEl4zZZaNpT_OVrek0-Q3R0pWDeGsREsrI3kacA.png?width=108&crop=smart&auto=webp&s=45560979a9a83dfec758186719bc99d09ab28a18",
"width": 108,
"height": 54
},
{
"url": "https://external-preview.redd.it/epJmiEl4zZZaNpT_OVrek0-Q3R0pWDeGsREsrI3kacA.png?width=216&crop=smart&auto=webp&s=d8155b3b33f750b7f9cd5b8bafe2f291fd4f0d3a",
"width": 216,
"height": 108
},
{
"url": "https://external-preview.redd.it/epJmiEl4zZZaNpT_OVrek0-Q3R0pWDeGsREsrI3kacA.png?width=320&crop=smart&auto=webp&s=c3f27fd7d3f0784d732bb93b225c8bc412b62cd4",
"width": 320,
"height": 160
},
{
"url": "https://external-preview.redd.it/epJmiEl4zZZaNpT_OVrek0-Q3R0pWDeGsREsrI3kacA.png?width=640&crop=smart&auto=webp&s=ec6584715d130a5a9ed07c00ecbe3bc2144be9e0",
"width": 640,
"height": 320
},
{
"url": "https://external-preview.redd.it/epJmiEl4zZZaNpT_OVrek0-Q3R0pWDeGsREsrI3kacA.png?width=960&crop=smart&auto=webp&s=2c5e57b491178ba26077f96e8f350f27d2e1bf38",
"width": 960,
"height": 480
},
{
"url": "https://external-preview.redd.it/epJmiEl4zZZaNpT_OVrek0-Q3R0pWDeGsREsrI3kacA.png?width=1080&crop=smart&auto=webp&s=45eb813d36d96bd24e765c160ef2f8e41136c44f",
"width": 1080,
"height": 540
}
],
"variants": {},
"id": "epJmiEl4zZZaNpT_OVrek0-Q3R0pWDeGsREsrI3kacA"
}
],
"enabled": false
},
"all_awardings": [],
"awarders": [],
"media_only": false,
"link_flair_template_id": "b1f30a12-6938-11f0-aad7-6eea404d38fd",
"can_gild": false,
"spoiler": false,
"locked": false,
"author_flair_text": null,
"treatment_tags": [],
"visited": false,
"removed_by": null,
"mod_note": null,
"distinguished": null,
"subreddit_id": "t5_7t8hvt",
"author_is_blocked": false,
"mod_reason_by": null,
"num_reports": null,
"removal_reason": null,
"link_flair_background_color": "#fe7701",
"id": "1t18eeh",
"is_robot_indexable": true,
"report_reasons": null,
"author": "captainkink07",
"discussion_type": null,
"num_comments": 194,
"send_replies": true,
"contest_mode": false,
"mod_reports": [],
"author_patreon_flair": false,
"author_flair_text_color": null,
"permalink": "/r/ClaudeAI/comments/1t18eeh/i_built_graphify_26_days_450k_downloads_40k_stars/",
"stickied": false,
"url": "https://github.com/safishamsi/graphify",
"subreddit_subscribers": 813533,
"created_utc": 1777675446.0,
"num_crossposts": 4,
"media": null,
"is_video": false,
"_source_url": "https://www.reddit.com/by_id/t3_1t18eeh.json",
"_insight": "This resonates because it turns a concrete Claude Code skill into a proof point for agent memory: a repo knowledge graph promises fewer tokens and better continuity across coding-agent sessions. The huge response suggests builders are actively hunting for ways to make agents remember codebases rather than reread files every task.",
"_discovery": "r/ClaudeAI hot/search JSON; query Claude Code / agent memory; high current engagement"
},
{
"approved_at_utc": null,
"subreddit": "ClaudeAI",
"selftext": "Last week I woke up to an email saying my Claude usage limit was gone. I hadn't done anything unusual — or so I thought.\n\nAfter digging through the local session logs, I found the culprit: a single /loop command I had set the night before to check my open PRs every 30 minutes. I forgot about it. It ran 46 times over 26 hours, unattended, overnight, on claude-opus-4-7. Two sessions — the loop and a long analytics session I had left open — together burned through roughly $6,000 before I woke up.\n\nHere's the thing though. The Anthropic dashboard still showed a fraction of that when I checked it manually. The dashboard has a multi-day reporting lag, so I had no idea anything was wrong until the limit email landed.\n\n***Why did it cost so much? The part most people don't know.***\n\nEvery Claude API call sends your entire conversation history — not just the latest message. Turn 1 sends a few hundred tokens. Turn 46 sends 800,000 tokens. The context window limit is just a ceiling; you pay for everything sent on every turn.\n\nTo make this cheaper, Anthropic uses prompt caching: if your conversation history was already sent recently, they serve it from cache at a 12.5× discount instead of charging you full price again.\n\nThe catch: cache entries expire after \\~5 minutes of inactivity. (Earlier it was 1 hour)\n\nSo here's what happens with /loop 30m:\n\n* Loop fires → history gets cached → 30 minutes pass → cache expires\n* Loop fires again → cache is gone → must re-cache the entire conversation from scratch at the expensive write rate\n* Each iteration also adds its own output to the conversation, so the next re-cache is even larger\n\nBy hour 20, the conversation had grown to \\~800K tokens. Every overnight iteration was paying to re-cache 800K tokens at the expensive write rate. The actual PR check responses were a rounding error compared to this.\n\n***What I'd do differently***\n\n1. Always add a stop condition to /loop. 
Instead of: /loop 30m check my PRs. Write: /loop 30m check my PRs — stop when all are merged or after 3 hour. Claude will terminate the loop itself when the condition is met.2. Use Sonnet for unattended tasks, not Opus: Opus is roughly 5× more expensive per output token. For automated polling tasks like PR checks, Sonnet handles it fine. Save Opus for the work where you're actually present and the quality difference matters.\n2. Don't trust the dashboard as a real-time budget gauge: Anthropic's usage dashboard can lag by days. By the time it shows a spike, the money is already spent. The limit notification email may be your only real-time signal.\n3. Know that long-lived sessions aren't free: Keeping one big session alive for automated tasks doesn't save money through caching — it makes it worse. Every automated call with a gap >5 minutes pays to re-cache the entire growing context. Starting a fresh session is often cheaper.\n4. max\\_turns is not a loop limiter: max\\_turns caps the tool-call chain within a single iteration. It has no effect on how many times the loop fires. The only built-in expiry on /loop is a 7-day auto-deletion.\n5. The loop runs in main conversation so if you keep using the same session and then loop starts executing, the more token then necessary will be read/write to the cache on every loop.\n\nEdit:\nThanks everyone for overwhelming response and focusing on \"the post is AI written so it's a slop and author is an idiot\". Now based on few comments, let me add more details:\n1. I agree with everyone that I should have used hooks but corporate generally blocks third party mcps because of security so there is no easy way to hook external events into local sessions. Although I will take \"use bash scripts over claude loop\" seriously. \n2. This was not a single session or single loop command. What I meant by \"single command\" is /loop. 
I use claude on vms and local machine and so the loop command was running across different sessions in parallel. \n3. I agree that \"most people don't about\" thing was not a good thing to start the post but it was for the loop + cache window restricted to 5 mins. I have used loops earlier as well but 5 min vs 1h cache affect the price a lot . You can go and find many open issues on Claude related to this change. \n4. This post's goal was to share a TIL moment about using short , uncapped loops or schedules using Claude and educating that cache read/writes can affect your token cost more than anything else. But looks like we are very far from there. \n5. Thanks to the guy who shared Pyramid writing medium blog. I will definitely use for the next post. \n6. To be honest, I am quite disappointed that 90% people just care about post is written by AI over actual issue. But I guess I get that, everyone is exhausted from reading AI slop.",
"author_fullname": "t2_hygtoabeo",
"saved": false,
"mod_reason_title": null,
"gilded": 0,
"clicked": false,
"title": "I accidentally burned ~$6,000 of Claude usage overnight with one command.",
"link_flair_richtext": [
{
"e": "text",
"t": "Claude Code"
}
],
"subreddit_name_prefixed": "r/ClaudeAI",
"hidden": false,
"pwls": 6,
"link_flair_css_class": "",
"downs": 0,
"thumbnail_height": null,
"top_awarded_type": null,
"hide_score": false,
"name": "t3_1t11mmy",
"quarantine": false,
"link_flair_text_color": "dark",
"upvote_ratio": 0.82,
"author_flair_background_color": null,
"subreddit_type": "public",
"ups": 1138,
"total_awards_received": 0,
"media_embed": {},
"thumbnail_width": null,
"author_flair_template_id": null,
"is_original_content": false,
"user_reports": [],
"secure_media": null,
"is_reddit_media_domain": false,
"is_meta": false,
"category": null,
"secure_media_embed": {},
"link_flair_text": "Claude Code",
"can_mod_post": false,
"score": 1138,
"approved_by": null,
"is_created_from_ads_ui": false,
"author_premium": false,
"thumbnail": "self",
"edited": 1777694977.0,
"author_flair_css_class": null,
"author_flair_richtext": [],
"gildings": {},
"content_categories": null,
"is_self": true,
"mod_note": null,
"created": 1777659966.0,
"link_flair_type": "richtext",
"wls": 6,
"removed_by_category": null,
"banned_by": null,
"author_flair_type": "text",
"domain": "self.ClaudeAI",
"allow_live_comments": false,
"selftext_html": "<!-- SC_OFF --><div class=\"md\"><p>Last week I woke up to an email saying my Claude usage limit was gone. I hadn't done anything unusual — or so I thought.</p>\n\n<p>After digging through the local session logs, I found the culprit: a single /loop command I had set the night before to check my open PRs every 30 minutes. I forgot about it. It ran 46 times over 26 hours, unattended, overnight, on claude-opus-4-7. Two sessions — the loop and a long analytics session I had left open — together burned through roughly $6,000 before I woke up.</p>\n\n<p>Here's the thing though. The Anthropic dashboard still showed a fraction of that when I checked it manually. The dashboard has a multi-day reporting lag, so I had no idea anything was wrong until the limit email landed.</p>\n\n<p><strong><em>Why did it cost so much? The part most people don't know.</em></strong></p>\n\n<p>Every Claude API call sends your entire conversation history — not just the latest message. Turn 1 sends a few hundred tokens. Turn 46 sends 800,000 tokens. The context window limit is just a ceiling; you pay for everything sent on every turn.</p>\n\n<p>To make this cheaper, Anthropic uses prompt caching: if your conversation history was already sent recently, they serve it from cache at a 12.5× discount instead of charging you full price again.</p>\n\n<p>The catch: cache entries expire after ~5 minutes of inactivity. (Earlier it was 1 hour)</p>\n\n<p>So here's what happens with /loop 30m:</p>\n\n<ul>\n<li>Loop fires → history gets cached → 30 minutes pass → cache expires</li>\n<li>Loop fires again → cache is gone → must re-cache the entire conversation from scratch at the expensive write rate</li>\n<li>Each iteration also adds its own output to the conversation, so the next re-cache is even larger</li>\n</ul>\n\n<p>By hour 20, the conversation had grown to ~800K tokens. Every overnight iteration was paying to re-cache 800K tokens at the expensive write rate. 
The actual PR check responses were a rounding error compared to this.</p>\n\n<p><strong><em>What I'd do differently</em></strong></p>\n\n<ol>\n<li>Always add a stop condition to /loop. Instead of: /loop 30m check my PRs. Write: /loop 30m check my PRs — stop when all are merged or after 3 hour. Claude will terminate the loop itself when the condition is met.2. Use Sonnet for unattended tasks, not Opus: Opus is roughly 5× more expensive per output token. For automated polling tasks like PR checks, Sonnet handles it fine. Save Opus for the work where you're actually present and the quality difference matters.</li>\n<li>Don't trust the dashboard as a real-time budget gauge: Anthropic's usage dashboard can lag by days. By the time it shows a spike, the money is already spent. The limit notification email may be your only real-time signal.</li>\n<li>Know that long-lived sessions aren't free: Keeping one big session alive for automated tasks doesn't save money through caching — it makes it worse. Every automated call with a gap >5 minutes pays to re-cache the entire growing context. Starting a fresh session is often cheaper.</li>\n<li>max_turns is not a loop limiter: max_turns caps the tool-call chain within a single iteration. It has no effect on how many times the loop fires. The only built-in expiry on /loop is a 7-day auto-deletion.</li>\n<li>The loop runs in main conversation so if you keep using the same session and then loop starts executing, the more token then necessary will be read/write to the cache on every loop.</li>\n</ol>\n\n<p>Edit:\nThanks everyone for overwhelming response and focusing on "the post is AI written so it's a slop and author is an idiot". Now based on few comments, let me add more details:\n1. I agree with everyone that I should have used hooks but corporate generally blocks third party mcps because of security so there is no easy way to hook external events into local sessions. 
Although I will take &quot;use bash scripts over claude loop&quot; seriously. \n2. This was not a single session or single loop command. What I meant by &quot;single command&quot; is /loop. I use Claude on VMs and my local machine, so the loop command was running across different sessions in parallel. \n3. I agree that the &quot;most people don't know about&quot; framing was not a good way to start the post, but it referred to the loop combined with the cache window being restricted to 5 minutes. I have used loops before as well, but a 5-minute vs 1-hour cache TTL affects the price a lot. You can find many open issues on Claude related to this change. \n4. This post's goal was to share a TIL moment about short, uncapped loops or schedules in Claude, and to point out that cache reads/writes can affect your token cost more than anything else. But it looks like we are very far from there. \n5. Thanks to the person who shared the Pyramid-writing Medium blog. I will definitely use it for the next post. \n6. To be honest, I am quite disappointed that 90% of people care more about whether the post was written by AI than about the actual issue. But I guess I get it; everyone is exhausted from reading AI slop.</p>\n</div><!-- SC_ON -->",
"likes": null,
"suggested_sort": null,
"banned_at_utc": null,
"view_count": null,
"archived": false,
"no_follow": false,
"is_crosspostable": false,
"pinned": false,
"over_18": false,
"all_awardings": [],
"awarders": [],
"media_only": false,
"link_flair_template_id": "a35c0c72-4094-11f1-9d0f-1a7a82a8c650",
"can_gild": false,
"spoiler": false,
"locked": false,
"author_flair_text": null,
"treatment_tags": [],
"visited": false,
"removed_by": null,
"num_reports": null,
"distinguished": null,
"subreddit_id": "t5_7t8hvt",
"author_is_blocked": false,
"mod_reason_by": null,
"removal_reason": null,
"link_flair_background_color": "#d87757",
"id": "1t11mmy",
"is_robot_indexable": true,
"report_reasons": null,
"author": "procrastinator_eng",
"discussion_type": null,
"num_comments": 298,
"send_replies": false,
"contest_mode": false,
"mod_reports": [],
"author_patreon_flair": false,
"author_flair_text_color": null,
"permalink": "/r/ClaudeAI/comments/1t11mmy/i_accidentally_burned_6000_of_claude_usage/",
"stickied": false,
"url": "https://www.reddit.com/r/ClaudeAI/comments/1t11mmy/i_accidentally_burned_6000_of_claude_usage/",
"subreddit_subscribers": 813533,
"created_utc": 1777659966.0,
"num_crossposts": 2,
"media": null,
"is_video": false,
"_source_url": "https://www.reddit.com/by_id/t3_1t11mmy.json",
"_insight": "The thread is resonating as a cautionary cost-control story: one unattended loop command reportedly ran agentic checks overnight and produced a large Claude bill. Commenters are drawn to the operational risk theme—autonomy without budgets, rate limits, and kill switches can turn a useful agent into an expensive runaway job.",
"_discovery": "r/ClaudeAI hot/search JSON; query Claude Code / autonomous agent; high current engagement"
},
{
"approved_at_utc": null,
"subreddit": "ClaudeAI",
"selftext": "Was hitting my weekly Pro limit by Wednesday every single week. Tried compact, Sonnet for simple tasks, tighter prompts — nothing worked. \n\nBuilt a simple pattern: CLI scripts that delegate bulk file reading and boilerplate generation to Kimi K2.5 (any cheap model works). Claude calls them via Bash tool. [CLAUDE.md](http://CLAUDE.md) has routing rules for when to delegate vs when to use Claude's own intelligence. \n\nResults after 3 weeks: \n\n1. Haven't hit limits once \n\n2. Kimi total spend: $0.38 \n\n3. Documentation updates went from \\~5000 tokens to \\~200 tokens \n\nWrote up the full implementation with code: [https://medium.com/@kunalbhardwaj598/i-was-burning-through-claude-codes-weekly-limit-in-3-days-here-s-how-i-fixed-it-0344c555abda](https://medium.com/@kunalbhardwaj598/i-was-burning-through-claude-codes-weekly-limit-in-3-days-here-s-how-i-fixed-it-0344c555abda)\n\nHappy to answer questions about the setup. ",
"author_fullname": "t2_1p0pp8usaw",
"saved": false,
"mod_reason_title": null,
"gilded": 0,
"clicked": false,
"title": "I gave Claude Code a $0.02/call coworker and stopped hitting Pro limits — here's the full setup",
"link_flair_richtext": [
{
"e": "text",
"t": "Claude Code"
}
],
"subreddit_name_prefixed": "r/ClaudeAI",
"hidden": false,
"pwls": 6,
"link_flair_css_class": "",
"downs": 0,
"thumbnail_height": null,
"top_awarded_type": null,
"hide_score": false,
"name": "t3_1t1o43w",
"quarantine": false,
"link_flair_text_color": "dark",
"upvote_ratio": 0.96,
"author_flair_background_color": null,
"subreddit_type": "public",
"ups": 1205,
"total_awards_received": 0,
"media_embed": {},
"thumbnail_width": null,
"author_flair_template_id": null,
"is_original_content": false,
"user_reports": [],
"secure_media": null,
"is_reddit_media_domain": false,
"is_meta": false,
"category": null,
"secure_media_embed": {},
"link_flair_text": "Claude Code",
"can_mod_post": false,
"score": 1205,
"approved_by": null,
"is_created_from_ads_ui": false,
"author_premium": false,
"thumbnail": "self",
"edited": false,
"author_flair_css_class": null,
"author_flair_richtext": [],
"gildings": {},
"post_hint": "self",
"content_categories": null,
"is_self": true,
"mod_note": null,
"created": 1777723672.0,
"link_flair_type": "richtext",
"wls": 6,
"removed_by_category": null,
"banned_by": null,
"author_flair_type": "text",
"domain": "self.ClaudeAI",
"allow_live_comments": false,
"selftext_html": "<!-- SC_OFF --><div class=\"md\"><p>Was hitting my weekly Pro limit by Wednesday every single week. Tried compact, Sonnet for simple tasks, tighter prompts — nothing worked. </p>\n\n<p>Built a simple pattern: CLI scripts that delegate bulk file reading and boilerplate generation to Kimi K2.5 (any cheap model works). Claude calls them via Bash tool. <a href=\"http://CLAUDE.md\">CLAUDE.md</a> has routing rules for when to delegate vs when to use Claude's own intelligence. </p>\n\n<p>Results after 3 weeks: </p>\n\n<ol>\n<li><p>Haven't hit limits once </p></li>\n<li><p>Kimi total spend: $0.38 </p></li>\n<li><p>Documentation updates went from ~5000 tokens to ~200 tokens </p></li>\n</ol>\n\n<p>Wrote up the full implementation with code: <a href=\"https://medium.com/@kunalbhardwaj598/i-was-burning-through-claude-codes-weekly-limit-in-3-days-here-s-how-i-fixed-it-0344c555abda\">https://medium.com/@kunalbhardwaj598/i-was-burning-through-claude-codes-weekly-limit-in-3-days-here-s-how-i-fixed-it-0344c555abda</a></p>\n\n<p>Happy to answer questions about the setup. </p>\n</div><!-- SC_ON -->",
"likes": null,
"suggested_sort": null,
"banned_at_utc": null,
"view_count": null,
"archived": false,
"no_follow": false,
"is_crosspostable": false,
"pinned": false,
"over_18": false,
"preview": {
"images": [
{
"source": {
"url": "https://external-preview.redd.it/g5Wcir1mIgxeW0XpH0wr6OyeyP5mvGuYFjKzjKRnp70.png?auto=webp&s=17d4d388f79414cbe4a90b58791dbf4097d0248d",
"width": 1200,
"height": 630
},
"resolutions": [
{
"url": "https://external-preview.redd.it/g5Wcir1mIgxeW0XpH0wr6OyeyP5mvGuYFjKzjKRnp70.png?width=108&crop=smart&auto=webp&s=7c06fb1de7f0c8c0682bb20d2703c4676bef0684",
"width": 108,
"height": 56
},
{
"url": "https://external-preview.redd.it/g5Wcir1mIgxeW0XpH0wr6OyeyP5mvGuYFjKzjKRnp70.png?width=216&crop=smart&auto=webp&s=8b91d9527dd74afb255aff216211fb31c394e5b2",
"width": 216,
"height": 113
},
{
"url": "https://external-preview.redd.it/g5Wcir1mIgxeW0XpH0wr6OyeyP5mvGuYFjKzjKRnp70.png?width=320&crop=smart&auto=webp&s=ed5dbcca8fa4dede947e4e0960a83e35e2fdc35b",
"width": 320,
"height": 168
},
{
"url": "https://external-preview.redd.it/g5Wcir1mIgxeW0XpH0wr6OyeyP5mvGuYFjKzjKRnp70.png?width=640&crop=smart&auto=webp&s=b5fbc3bab1e298edde1ce2ec77defe2aed87361d",
"width": 640,
"height": 336
},
{
"url": "https://external-preview.redd.it/g5Wcir1mIgxeW0XpH0wr6OyeyP5mvGuYFjKzjKRnp70.png?width=960&crop=smart&auto=webp&s=c95e5975954ec7b4fa0c4b1b218f2d424167bfa5",
"width": 960,
"height": 504
},
{
"url": "https://external-preview.redd.it/g5Wcir1mIgxeW0XpH0wr6OyeyP5mvGuYFjKzjKRnp70.png?width=1080&crop=smart&auto=webp&s=511af6812f515bf26c72045b137a12975ce9f114",
"width": 1080,
"height": 567
}
],
"variants": {},
"id": "g5Wcir1mIgxeW0XpH0wr6OyeyP5mvGuYFjKzjKRnp70"
}
],
"enabled": false
},
"all_awardings": [],
"awarders": [],
"media_only": false,
"link_flair_template_id": "a35c0c72-4094-11f1-9d0f-1a7a82a8c650",
"can_gild": false,
"spoiler": false,
"locked": false,
"author_flair_text": null,
"treatment_tags": [],
"visited": false,
"removed_by": null,
"num_reports": null,
"distinguished": null,
"subreddit_id": "t5_7t8hvt",
"author_is_blocked": false,
"mod_reason_by": null,
"removal_reason": null,
"link_flair_background_color": "#d87757",
"id": "1t1o43w",
"is_robot_indexable": true,
"report_reasons": null,
"author": "More-Hunter-3457",
"discussion_type": null,
"num_comments": 120,
"send_replies": true,
"contest_mode": false,
"mod_reports": [],
"author_patreon_flair": false,
"author_flair_text_color": null,
"permalink": "/r/ClaudeAI/comments/1t1o43w/i_gave_claude_code_a_002call_coworker_and_stopped/",
"stickied": false,
"url": "https://www.reddit.com/r/ClaudeAI/comments/1t1o43w/i_gave_claude_code_a_002call_coworker_and_stopped/",
"subreddit_subscribers": 813533,
"created_utc": 1777723672.0,
"num_crossposts": 0,
"media": null,
"is_video": false,
"_source_url": "https://www.reddit.com/by_id/t3_1t1o43w.json",
"_insight": "The post gives a practical pattern for multi-model agent delegation: let Claude Code orchestrate while cheaper models handle bulk reads and boilerplate. It is resonating because developers facing usage caps want cost-aware agent workflows, not just bigger context windows.",
"_discovery": "r/ClaudeAI hot/search JSON; query Claude Code / coding agent; high current engagement"
},
{
"approved_at_utc": null,
"subreddit": "LocalLLaMA",
"selftext": "LDR maintainer here. Thanks to the strong support of the r/LocalLLaMA community, LDR has come very far. I haven't reported in a while because I didn't think I was ready for another prominent post in one of the leading outlets of local-LLM research.\n\nBut I think LDR is finally there again, so it is finally time to report.\n\n**Setup**\n\n* RTX 3090, 24GB\n* Ollama backend (qwen3.6:27b)\n* LDR's `langgraph_agent` strategy — LangChain `create_agent()` with tool-calling, parallel subtopic decomposition, up to 50 iterations\n* LLM grader: qwen3.6:27b self-graded (I have used Opus to review examples and it generally only underestimates accuracy)\n\n**Benchmarks (fully local LLM with web search)**\n\n|Model|SimpleQA|xbench-DeepSearch|\n|:-|:-|:-|\n|Qwen3.6-27B|95.7% (287/300)|77.0% (77/100)|\n|**Qwen3.5-9B**|91.2% (182/200)|59.0% (59/100)|\n|gpt-oss-20B|85.4% (295/346)|–|\n\nThe sample size is small, but the benchmarks were not rerun multiple times, and you can see from the other rows that this is unlikely to be just chance. Full leaderboard: [https://huggingface.co/datasets/local-deep-research/ldr-benchmarks](https://huggingface.co/datasets/local-deep-research/ldr-benchmarks)\n\n**Important framing — these are** ***agent + search*** **scores, not closed-book**\n\nNote, however, that these are similar benchmark results to Perplexity Deep Research (93.9%), Tavily (93.3%), etc. \\[Tavily forces the LLM to answer *only* from retrieved docs (a pure retrieval test). Perplexity Deep Research is an end-to-end agent and discloses no grader or sample size. 
\\]\n\n**Even if our results were only 90%, it would already be a great success.**\n\nI can also confirm from daily use that these results feel consistent with performance on the random queries I run for everyday questions.\n\n**Caveats:**\n\n* SimpleQA contamination risk on newer base models is real\n* LLM-judge noise + sampling error\n* xbench-DeepSearch is in Chinese, which is an advantage for the Chinese Qwen models\n* No BrowseComp / GAIA numbers yet - but I also don't believe we are good at those benchmarks yet. I will have to run them to verify the current state\n\n**The thing that surprised me:**\n\nResults seem to track tool-calling quality more than raw size for local deep research. The `langgraph_agent` strategy hammers the model with multi-iteration tool calls, parallel subagent decomposition, and structured output — exactly the axis where the newer Qwen generations have improved most. Hypothesis only; if anyone wants to design an ablation we'd love the data.\n\n**Some cool LDR features that I want to additionally highlight:**\n\n* **Journal Quality System** (shipped v1.6.0) - academic source grading using OpenAlex and DOAJ. I haven't seen this anywhere else in the open-source deep-research space.\n* Per-user **SQLCipher AES-256 DB** (PBKDF2-HMAC-SHA512, 256k iterations) — admins can't read your data at rest. No password recovery; we don't hold the keys.\n* **Zero telemetry.** No telemetry, no analytics, no tracking.\n* **Cosign-signed Docker images** with SLSA provenance + SBOMs.\n* **MIT licensed.** Everything is open source.\n\nRepo: [https://github.com/LearningCircuit/local-deep-research](https://github.com/LearningCircuit/local-deep-research)\n\nHappy to share strategy configs and help reproduce the Qwen runs.\n\nThanks to all the academic and other open-source foundational work that made this repo possible.",
"author_fullname": "t2_a9uut",
"saved": false,
"mod_reason_title": null,
"gilded": 0,
"clicked": false,
"title": "We are finally there: Qwen3.6-27B + agentic search; 95.7% SimpleQA on a single 3090, fully local",
"link_flair_richtext": [
{
"e": "text",
"t": "Other"
}
],
"subreddit_name_prefixed": "r/LocalLLaMA",
"hidden": false,
"pwls": 6,
"link_flair_css_class": "",
"downs": 0,
"thumbnail_height": null,
"top_awarded_type": null,
"hide_score": false,
"name": "t3_1t1n6o8",
"quarantine": false,
"link_flair_text_color": "light",
"upvote_ratio": 0.94,
"author_flair_background_color": null,
"subreddit_type": "public",
"ups": 384,
"total_awards_received": 0,
"media_embed": {},
"thumbnail_width": null,
"author_flair_template_id": null,
"is_original_content": false,
"user_reports": [],
"secure_media": null,
"is_reddit_media_domain": false,
"is_meta": false,
"category": null,
"secure_media_embed": {},
"link_flair_text": "Other",
"can_mod_post": false,
"score": 384,
"approved_by": null,
"is_created_from_ads_ui": false,
"author_premium": false,
"thumbnail": "self",
"edited": 1777721434.0,
"author_flair_css_class": null,
"author_flair_richtext": [],
"gildings": {},
"post_hint": "self",
"content_categories": null,
"is_self": true,
"mod_note": null,
"created": 1777720885.0,
"link_flair_type": "richtext",
"wls": 6,
"removed_by_category": null,
"banned_by": null,
"author_flair_type": "text",
"domain": "self.LocalLLaMA",
"allow_live_comments": false,
"selftext_html": "<!-- SC_OFF --><div class=\"md\"><p>LDR maintainer here. Thanks to the strong support of the <a href=\"/r/LocalLLaMA\">r/LocalLLaMA</a> community, LDR has come very far. I haven't reported in a while because I didn't think I was ready for another prominent post in one of the leading outlets of local-LLM research.</p>\n\n<p>But I think LDR is finally there again, so it is finally time to report.</p>\n\n<p><strong>Setup</strong></p>\n\n<ul>\n<li>RTX 3090, 24GB</li>\n<li>Ollama backend (qwen3.6:27b)</li>\n<li>LDR's <code>langgraph_agent</code> strategy — LangChain <code>create_agent()</code> with tool-calling, parallel subtopic decomposition, up to 50 iterations</li>\n<li>LLM grader: qwen3.6:27b self-graded (I have used Opus to review examples and it generally only underestimates accuracy)</li>\n</ul>\n\n<p><strong>Benchmarks (fully local LLM with web search)</strong></p>\n\n<table><thead>\n<tr>\n<th align=\"left\">Model</th>\n<th align=\"left\">SimpleQA</th>\n<th align=\"left\">xbench-DeepSearch</th>\n</tr>\n</thead><tbody>\n<tr>\n<td align=\"left\">Qwen3.6-27B</td>\n<td align=\"left\">95.7% (287/300)</td>\n<td align=\"left\">77.0% (77/100)</td>\n</tr>\n<tr>\n<td align=\"left\"><strong>Qwen3.5-9B</strong></td>\n<td align=\"left\">91.2% (182/200)</td>\n<td align=\"left\">59.0% (59/100)</td>\n</tr>\n<tr>\n<td align=\"left\">gpt-oss-20B</td>\n<td align=\"left\">85.4% (295/346)</td>\n<td align=\"left\">–</td>\n</tr>\n</tbody></table>\n\n<p>The sample size is small, but the benchmarks were not rerun multiple times, and you can see from the other rows that this is unlikely to be just chance. 
Full leaderboard: <a href=\"https://huggingface.co/datasets/local-deep-research/ldr-benchmarks\">https://huggingface.co/datasets/local-deep-research/ldr-benchmarks</a></p>\n\n<p><strong>Important framing — these are</strong> <strong><em>agent + search</em></strong> <strong>scores, not closed-book</strong></p>\n\n<p>Note, however, that these are similar benchmark results to Perplexity Deep Research (93.9%), Tavily (93.3%), etc. [Tavily forces the LLM to answer <em>only</em> from retrieved docs (a pure retrieval test). Perplexity Deep Research is an end-to-end agent and discloses no grader or sample size.]</p>\n\n<p><strong>Even if our results were only 90%, it would already be a great success.</strong></p>\n\n<p>I can also confirm from daily use that these results feel consistent with performance on the random queries I run for everyday questions.</p>\n\n<p><strong>Caveats:</strong></p>\n\n<ul>\n<li>SimpleQA contamination risk on newer base models is real</li>\n<li>LLM-judge noise + sampling error</li>\n<li>xbench-DeepSearch is in Chinese, which is an advantage for the Chinese Qwen models</li>\n<li>No BrowseComp / GAIA numbers yet - but I also don't believe we are good at those benchmarks yet. I will have to run them to verify the current state</li>\n</ul>\n\n<p><strong>The thing that surprised me:</strong></p>\n\n<p>Results seem to track tool-calling quality more than raw size for local deep research. The <code>langgraph_agent</code> strategy hammers the model with multi-iteration tool calls, parallel subagent decomposition, and structured output — exactly the axis where the newer Qwen generations have improved most. Hypothesis only; if anyone wants to design an ablation we'd love the data.</p>\n\n<p><strong>Some cool LDR features that I want to additionally highlight:</strong></p>\n\n<ul>\n<li><strong>Journal Quality System</strong> (shipped v1.6.0) - academic source grading using OpenAlex and DOAJ. 
I haven't seen this anywhere else in the open-source deep-research space.</li>\n<li>Per-user <strong>SQLCipher AES-256 DB</strong> (PBKDF2-HMAC-SHA512, 256k iterations) — admins can't read your data at rest. No password recovery; we don't hold the keys.</li>\n<li><strong>Zero telemetry.</strong> No telemetry, no analytics, no tracking.</li>\n<li><strong>Cosign-signed Docker images</strong> with SLSA provenance + SBOMs.</li>\n<li><strong>MIT licensed.</strong> Everything is open source.</li>\n</ul>\n\n<p>Repo: <a href=\"https://github.com/LearningCircuit/local-deep-research\">https://github.com/LearningCircuit/local-deep-research</a></p>\n\n<p>Happy to share strategy configs and help reproduce the Qwen runs.</p>\n\n<p>Thanks to all the academic and other open-source foundational work that made this repo possible.</p>\n</div><!-- SC_ON -->",
"likes": null,
"suggested_sort": null,
"banned_at_utc": null,
"view_count": null,
"archived": false,
"no_follow": false,
"is_crosspostable": false,
"pinned": false,
"over_18": false,
"preview": {
"images": [
{
"source": {
"url": "https://external-preview.redd.it/orVWG8EDhE-phYGa0wL5m2HaDOSYwXQTD4ngBnko3NM.png?auto=webp&s=9734e1148f6331eac917c2dc7fb24dea72100f4d",
"width": 1200,
"height": 648
},
"resolutions": [
{
"url": "https://external-preview.redd.it/orVWG8EDhE-phYGa0wL5m2HaDOSYwXQTD4ngBnko3NM.png?width=108&crop=smart&auto=webp&s=1f5fc3156d728b806a4eaf8a11301b7e72cda777",
"width": 108,
"height": 58
},
{
"url": "https://external-preview.redd.it/orVWG8EDhE-phYGa0wL5m2HaDOSYwXQTD4ngBnko3NM.png?width=216&crop=smart&auto=webp&s=871dcae35c4315ef8e2ef471f46fb0c935d2adc8",
"width": 216,
"height": 116
},
{
"url": "https://external-preview.redd.it/orVWG8EDhE-phYGa0wL5m2HaDOSYwXQTD4ngBnko3NM.png?width=320&crop=smart&auto=webp&s=5c4dcf91b75cf2512b2e0143849055fb10b8e891",
"width": 320,
"height": 172
},
{
"url": "https://external-preview.redd.it/orVWG8EDhE-phYGa0wL5m2HaDOSYwXQTD4ngBnko3NM.png?width=640&crop=smart&auto=webp&s=b10de605c03f17646ebfd38bdd62dc473bc723de",
"width": 640,
"height": 345
},
{
"url": "https://external-preview.redd.it/orVWG8EDhE-phYGa0wL5m2HaDOSYwXQTD4ngBnko3NM.png?width=960&crop=smart&auto=webp&s=a836ba4f7fb8f71da24afc03fad214d37fb3d516",
"width": 960,
"height": 518
},
{
"url": "https://external-preview.redd.it/orVWG8EDhE-phYGa0wL5m2HaDOSYwXQTD4ngBnko3NM.png?width=1080&crop=smart&auto=webp&s=3878473f583bf2ed07ae8c5d5171d0fc93e0a198",
"width": 1080,
"height": 583
}
],
"variants": {},
"id": "orVWG8EDhE-phYGa0wL5m2HaDOSYwXQTD4ngBnko3NM"
}
],
"enabled": false
},
"all_awardings": [],
"awarders": [],
"media_only": false,
"link_flair_template_id": "7a7848d2-bf8e-11ed-8c2f-765d15199f78",
"can_gild": false,
"spoiler": false,
"locked": false,
"author_flair_text": null,
"treatment_tags": [],
"visited": false,
"removed_by": null,
"num_reports": null,
"distinguished": null,
"subreddit_id": "t5_81eyvm",
"author_is_blocked": false,
"mod_reason_by": null,
"removal_reason": null,
"link_flair_background_color": "#94e044",
"id": "1t1n6o8",
"is_robot_indexable": true,
"report_reasons": null,
"author": "ComplexIt",
"discussion_type": null,
"num_comments": 73,
"send_replies": true,
"contest_mode": false,
"mod_reports": [],
"author_patreon_flair": false,
"author_flair_text_color": null,
"permalink": "/r/LocalLLaMA/comments/1t1n6o8/we_are_finally_there_qwen3627b_agentic_search_957/",
"stickied": false,
"url": "https://www.reddit.com/r/LocalLLaMA/comments/1t1n6o8/we_are_finally_there_qwen3627b_agentic_search_957/",
"subreddit_subscribers": 710236,
"created_utc": 1777720885.0,
"num_crossposts": 1,
"media": null,
"is_video": false,
"_source_url": "https://www.reddit.com/by_id/t3_1t1n6o8.json",
"_insight": "Local-agent builders are reacting to the claim that Qwen3.6 plus agentic search can hit strong SimpleQA performance on a single RTX 3090. The appeal is obvious: credible agentic retrieval without depending on hosted APIs aligns with privacy, cost, and tinkering priorities in r/LocalLLaMA.",
"_discovery": "r/LocalLLaMA hot/top week JSON; query agentic search; recent high engagement"
},
{
"approved_at_utc": null,
"subreddit": "LocalLLaMA",
"selftext": "Hey all:\n\nI am trying to set up Claude Code to work with llama.cpp; I am using Qwen3.6-35B-A3B.\n\nI usually use Claude Code + a ZLM subscription (I got lucky with $30 yearly) - the setup is very simple with their automated script, but for the life of me I cannot figure out how to get Claude Code to work.\n\nAm I hyper-focusing on Claude Code, or should I try things like pi.dev?\n\nAny help/pointers/guides would be appreciated.\n\nEdit: I tried dang near everything; the most plug-and-play option that I like is OpenCode, and I am replacing Claude with it. Thank you everyone. <3\n\nSpecs:\n\nDell Precision T5610 - 64 GB DDR3 RAM, Mi50 32 GB; huge shoutout to mixa for their llama.cpp fork - I’m getting about 32 solid TPS. Can’t complain. Running the Q4 XL Unsloth quant. I’ll share my entire write-up because there should be one, oh my goodness.",
"author_fullname": "t2_1nprbkmy5x",
"saved": false,
"mod_reason_title": null,
"gilded": 0,
"clicked": false,
"title": "What is the best coding agent (CLI) like Claude Code for Local Development",
"link_flair_richtext": [
{
"e": "text",
"t": "Question | Help"
}
],
"subreddit_name_prefixed": "r/LocalLLaMA",
"hidden": false,
"pwls": 6,
"link_flair_css_class": "",
"downs": 0,
"thumbnail_height": null,
"top_awarded_type": null,
"hide_score": false,
"name": "t3_1swhw84",
"quarantine": false,
"link_flair_text_color": "dark",
"upvote_ratio": 0.94,
"author_flair_background_color": null,
"subreddit_type": "public",
"ups": 169,
"total_awards_received": 0,
"media_embed": {},
"thumbnail_width": null,
"author_flair_template_id": null,
"is_original_content": false,
"user_reports": [],
"secure_media": null,
"is_reddit_media_domain": false,
"is_meta": false,
"category": null,
"secure_media_embed": {},
"link_flair_text": "Question | Help",
"can_mod_post": false,
"score": 169,
"approved_by": null,
"is_created_from_ads_ui": false,
"author_premium": false,
"thumbnail": "self",
"edited": 1777611058.0,
"author_flair_css_class": null,
"author_flair_richtext": [],
"gildings": {},
"content_categories": null,
"is_self": true,
"mod_note": null,
"created": 1777233603.0,
"link_flair_type": "richtext",
"wls": 6,
"removed_by_category": null,
"banned_by": null,
"author_flair_type": "text",
"domain": "self.LocalLLaMA",
"allow_live_comments": false,
"selftext_html": "<!-- SC_OFF --><div class=\"md\"><p>Hey all:</p>\n\n<p>I am trying to set up Claude Code to work with llama.cpp; I am using Qwen3.6-35B-A3B.</p>\n\n<p>I usually use Claude Code + a ZLM subscription (I got lucky with $30 yearly) - the setup is very simple with their automated script, but for the life of me I cannot figure out how to get Claude Code to work.</p>\n\n<p>Am I hyper-focusing on Claude Code, or should I try things like pi.dev?</p>\n\n<p>Any help/pointers/guides would be appreciated.</p>\n\n<p>Edit: I tried dang near everything; the most plug-and-play option that I like is OpenCode, and I am replacing Claude with it. Thank you everyone. &lt;3</p>\n\n<p>Specs:</p>\n\n<p>Dell Precision T5610 - 64 GB DDR3 RAM, Mi50 32 GB; huge shoutout to mixa for their llama.cpp fork - I’m getting about 32 solid TPS. Can’t complain. Running the Q4 XL Unsloth quant. I’ll share my entire write-up because there should be one, oh my goodness.</p>\n</div><!-- SC_ON -->",
"likes": null,
"suggested_sort": null,
"banned_at_utc": null,
"view_count": null,
"archived": false,
"no_follow": false,
"is_crosspostable": false,
"pinned": false,
"over_18": false,
"all_awardings": [],
"awarders": [],
"media_only": false,
"link_flair_template_id": "2c9831e6-bf92-11ed-98e6-d2b8bcc02ae1",
"can_gild": false,
"spoiler": false,
"locked": false,
"author_flair_text": null,
"treatment_tags": [],
"visited": false,
"removed_by": null,
"num_reports": null,
"distinguished": null,
"subreddit_id": "t5_81eyvm",
"author_is_blocked": false,
"mod_reason_by": null,
"removal_reason": null,
"link_flair_background_color": "#5a74cc",
"id": "1swhw84",
"is_robot_indexable": true,
"report_reasons": null,
"author": "exaknight21",
"discussion_type": null,
"num_comments": 163,
"send_replies": true,
"contest_mode": false,
"mod_reports": [],
"author_patreon_flair": false,
"author_flair_text_color": null,
"permalink": "/r/LocalLLaMA/comments/1swhw84/what_is_the_best_coding_agent_cli_like_claude/",
"stickied": false,
"url": "https://www.reddit.com/r/LocalLLaMA/comments/1swhw84/what_is_the_best_coding_agent_cli_like_claude/",
"subreddit_subscribers": 710236,
"created_utc": 1777233603.0,
"num_crossposts": 0,
"media": null,
"is_video": false,
"_source_url": "https://www.reddit.com/by_id/t3_1swhw84.json",
"_insight": "This discussion keeps drawing comments because many local-LLM users want a Claude-Code-like CLI agent that works with llama.cpp or local Qwen models. The high comment count shows the tooling gap: model quality is improving, but ergonomic local coding agents remain hard to set up.",
"_discovery": "r/LocalLLaMA search JSON; query coding agent; high comment count"
},
{
"approved_at_utc": null,
"subreddit": "singularity",
"selftext": "edit for credit: trash on X\n\n",
"author_fullname": "t2_zh1vr",
"saved": false,
"mod_reason_title": null,
"gilded": 0,
"clicked": false,
"title": "engineering teams celebrating agentic workflows that returned the same result two runs in a row",
"link_flair_richtext": [],
"subreddit_name_prefixed": "r/singularity",
"hidden": false,
"pwls": 6,
"link_flair_css_class": "image",
"downs": 0,
"thumbnail_height": 128,
"top_awarded_type": null,
"hide_score": false,
"name": "t3_1sz4h4g",
"quarantine": false,
"link_flair_text_color": "light",
"upvote_ratio": 0.96,
"author_flair_background_color": null,
"ups": 890,
"total_awards_received": 0,
"media_embed": {},
"thumbnail_width": 140,
"author_flair_template_id": null,
"is_original_content": false,
"user_reports": [],
"secure_media": {
"reddit_video": {
"bitrate_kbps": 5000,
"fallback_url": "https://v.redd.it/56pvrcsqs5yg1/CMAF_1080.mp4?source=fallback",
"has_audio": true,
"height": 1080,
"width": 1180,
"scrubber_media_url": "https://v.redd.it/56pvrcsqs5yg1/CMAF_96.mp4",
"dash_url": "https://v.redd.it/56pvrcsqs5yg1/DASHPlaylist.mpd?a=1780409575%2COGI1MTZlNDE1NDhmYzJjODFlYzAyNTgzMDgzYTc0ZDQ2MWIyYzE3N2NmMWEwN2QzNmJhYTFjMmNiNGQyMDNlZQ%3D%3D&v=1&f=sd",
"duration": 11,
"hls_url": "https://v.redd.it/56pvrcsqs5yg1/HLSPlaylist.m3u8?a=1780409575%2CZTNlNjE4NzM0ZThlN2IxNWNlNDdlYmY0MzdjMzA5MDE2Y2ZjZmM1ZDI4ZWZhMWY0MDJhYzNlYjRkOTFjMWQ1Zg%3D%3D&v=1&f=sd",
"is_gif": false,
"transcoding_status": "completed"
}
},
"is_reddit_media_domain": true,
"is_meta": false,
"category": null,
"secure_media_embed": {},
"link_flair_text": "AI",
"can_mod_post": false,
"score": 890,
"approved_by": null,
"is_created_from_ads_ui": false,
"author_premium": true,
"thumbnail": "https://external-preview.redd.it/anllcWV4c3FzNXlnMUtLWrlnI-CBzByrlshtRvNi9PNK-rcA-8JdCKyA9mXv.png?width=140&height=128&format=jpg&auto=webp&s=15ce846c2fe4a979e6bb2dd5b46902307bbd5ea2",
"edited": 1777481641.0,
"author_flair_css_class": null,
"author_flair_richtext": [],
"gildings": {},
"post_hint": "hosted:video",
"content_categories": null,
"is_self": false,
"subreddit_type": "public",
"created": 1777481512.0,
"link_flair_type": "text",
"wls": 6,
"removed_by_category": null,
"banned_by": null,
"author_flair_type": "text",
"domain": "v.redd.it",
"allow_live_comments": false,
"selftext_html": "<!-- SC_OFF --><div class=\"md\"><p>edit for credit: trash on X</p>\n</div><!-- SC_ON -->",
"likes": null,
"suggested_sort": null,
"banned_at_utc": null,
"url_overridden_by_dest": "https://v.redd.it/56pvrcsqs5yg1",
"view_count": null,
"archived": false,
"no_follow": false,
"is_crosspostable": false,
"pinned": false,
"over_18": false,
"preview": {
"images": [
{
"source": {
"url": "https://external-preview.redd.it/anllcWV4c3FzNXlnMUtLWrlnI-CBzByrlshtRvNi9PNK-rcA-8JdCKyA9mXv.png?format=pjpg&auto=webp&s=eb3e4bf66315a28ff31457e1a008b3af3705087a",
"width": 1206,
"height": 1104
},
"resolutions": [
{
"url": "https://external-preview.redd.it/anllcWV4c3FzNXlnMUtLWrlnI-CBzByrlshtRvNi9PNK-rcA-8JdCKyA9mXv.png?width=108&crop=smart&format=pjpg&auto=webp&s=232b39c6f2f837181ce7ba7534e88f4143a38b9e",
"width": 108,
"height": 98
},
{
"url": "https://external-preview.redd.it/anllcWV4c3FzNXlnMUtLWrlnI-CBzByrlshtRvNi9PNK-rcA-8JdCKyA9mXv.png?width=216&crop=smart&format=pjpg&auto=webp&s=b9b57e511c4f9d6bad8e0af1d7bf67996a3a46fc",
"width": 216,
"height": 197
},
{
"url": "https://external-preview.redd.it/anllcWV4c3FzNXlnMUtLWrlnI-CBzByrlshtRvNi9PNK-rcA-8JdCKyA9mXv.png?width=320&crop=smart&format=pjpg&auto=webp&s=4203e006e00b36b19a023e289cbcea71ccb3056f",
"width": 320,
"height": 292
},
{
"url": "https://external-preview.redd.it/anllcWV4c3FzNXlnMUtLWrlnI-CBzByrlshtRvNi9PNK-rcA-8JdCKyA9mXv.png?width=640&crop=smart&format=pjpg&auto=webp&s=204329a103938acfe1dd850c18fed255ee5a1a4c",
"width": 640,
"height": 585
},
{
"url": "https://external-preview.redd.it/anllcWV4c3FzNXlnMUtLWrlnI-CBzByrlshtRvNi9PNK-rcA-8JdCKyA9mXv.png?width=960&crop=smart&format=pjpg&auto=webp&s=dbcb226f87d2b5bbe02a8eedf17093920bc96be2",
"width": 960,
"height": 878
},
{
"url": "https://external-preview.redd.it/anllcWV4c3FzNXlnMUtLWrlnI-CBzByrlshtRvNi9PNK-rcA-8JdCKyA9mXv.png?width=1080&crop=smart&format=pjpg&auto=webp&s=615cdc9b8c0defc92596761a3d3579b7ea51c735",
"width": 1080,
"height": 988
}
],
"variants": {},
"id": "anllcWV4c3FzNXlnMUtLWrlnI-CBzByrlshtRvNi9PNK-rcA-8JdCKyA9mXv"
}
],
"enabled": false
},
"all_awardings": [],
"awarders": [],
"media_only": false,
"link_flair_template_id": "a5610cc4-dbb7-11e3-b6b2-12313b0d5a0a",
"can_gild": false,
"spoiler": false,
"locked": false,
"author_flair_text": null,
"treatment_tags": [],
"visited": false,
"removed_by": null,
"mod_note": null,
"distinguished": null,
"subreddit_id": "t5_2qh8m",
"author_is_blocked": false,
"mod_reason_by": null,
"num_reports": null,
"removal_reason": null,
"link_flair_background_color": "#373c3f",
"id": "1sz4h4g",
"is_robot_indexable": true,
"report_reasons": null,
"author": "SystematicApproach",
"discussion_type": null,
"num_comments": 32,
"send_replies": true,
"contest_mode": false,
"mod_reports": [],
"author_patreon_flair": false,
"author_flair_text_color": null,
"permalink": "/r/singularity/comments/1sz4h4g/engineering_teams_celebrating_agentic_workflows/",
"stickied": false,
"url": "https://v.redd.it/56pvrcsqs5yg1",
"subreddit_subscribers": 3890819,
"created_utc": 1777481512.0,
"num_crossposts": 0,
"media": {
"reddit_video": {
"bitrate_kbps": 5000,
"fallback_url": "https://v.redd.it/56pvrcsqs5yg1/CMAF_1080.mp4?source=fallback",
"has_audio": true,
"height": 1080,
"width": 1180,
"scrubber_media_url": "https://v.redd.it/56pvrcsqs5yg1/CMAF_96.mp4",
"dash_url": "https://v.redd.it/56pvrcsqs5yg1/DASHPlaylist.mpd?a=1780409575%2COGI1MTZlNDE1NDhmYzJjODFlYzAyNTgzMDgzYTc0ZDQ2MWIyYzE3N2NmMWEwN2QzNmJhYTFjMmNiNGQyMDNlZQ%3D%3D&v=1&f=sd",
"duration": 11,
"hls_url": "https://v.redd.it/56pvrcsqs5yg1/HLSPlaylist.m3u8?a=1780409575%2CZTNlNjE4NzM0ZThlN2IxNWNlNDdlYmY0MzdjMzA5MDE2Y2ZjZmM1ZDI4ZWZhMWY0MDJhYzNlYjRkOTFjMWQ1Zg%3D%3D&v=1&f=sd",
"is_gif": false,
"transcoding_status": "completed"
}
},
"is_video": true,
"_source_url": "https://www.reddit.com/by_id/t3_1sz4h4g.json",
"_insight": "A meme about agentic workflows finally returning the same result twice struck a nerve because it compresses a real reliability concern into one joke. The high score and upvote ratio suggest broad agreement that determinism and repeatability are still weak spots for production agents.",
"_discovery": "r/singularity hot/search JSON; query agentic workflows; high score/upvote ratio"
},
{
"approved_at_utc": null,
"subreddit": "ArtificialInteligence",
"selftext": "This is a classic agentic AI risk \n\nThe above agent was trying to fix a staging credential mismatch but guessed wrong on scopes/permissions. Caused \\~30-hour outage; although older backup helped recover most data",
"author_fullname": "t2_4i4vovil",
"saved": false,
"mod_reason_title": null,
"gilded": 0,
"clicked": false,
"title": "Uh-Oh! PocketOS founder Jer Crane reported that a Cursor AI coding agent (powered by Anthropic’s Claude Opus 4.6) deleted their entire production database + all volume-level backups on Railway in one API call, in just 9 seconds",
"link_flair_richtext": [],
"subreddit_name_prefixed": "r/ArtificialInteligence",
"hidden": false,
"pwls": 6,
"link_flair_css_class": "news",
"downs": 0,
"thumbnail_height": 140,
"top_awarded_type": null,
"hide_score": false,
"name": "t3_1sxnnzf",
"quarantine": false,
"link_flair_text_color": "light",
"upvote_ratio": 0.94,
"author_flair_background_color": null,
"ups": 324,
"total_awards_received": 0,
"media_embed": {},
"thumbnail_width": 140,
"author_flair_template_id": null,
"is_original_content": false,
"user_reports": [],
"secure_media": null,
"is_reddit_media_domain": true,
"is_meta": false,
"category": null,
"secure_media_embed": {},
"link_flair_text": "📰 News",
"can_mod_post": false,
"score": 324,
"approved_by": null,
"is_created_from_ads_ui": false,
"author_premium": false,
"thumbnail": "image",
"edited": false,
"author_flair_css_class": null,
"author_flair_richtext": [],
"gildings": {},
"post_hint": "image",
"content_categories": null,
"is_self": false,
"subreddit_type": "public",
"created": 1777341132.0,
"link_flair_type": "text",
"wls": 6,
"removed_by_category": null,
"banned_by": null,
"author_flair_type": "text",
"domain": "i.redd.it",
"allow_live_comments": false,
"selftext_html": "<!-- SC_OFF --><div class=\"md\"><p>This is a classic agentic AI risk </p>\n\n<p>The above agent was trying to fix a staging credential mismatch but guessed wrong on scopes/permissions. Caused ~30-hour outage; although older backup helped recover most data</p>\n</div><!-- SC_ON -->",
"likes": null,
"suggested_sort": "top",
"banned_at_utc": null,
"url_overridden_by_dest": "https://i.redd.it/af51wk2p7uxg1.jpeg",
"view_count": null,
"archived": false,
"no_follow": false,
"is_crosspostable": false,
"pinned": false,
"over_18": false,
"preview": {
"images": [
{
"source": {
"url": "https://preview.redd.it/af51wk2p7uxg1.jpeg?auto=webp&s=78f45738b637d31c13ba72505a3c136818286504",
"width": 720,
"height": 774
},
"resolutions": [
{
"url": "https://preview.redd.it/af51wk2p7uxg1.jpeg?width=108&crop=smart&auto=webp&s=a8a5cff1215da469d316eb71a05eec2e3451dc19",
"width": 108,
"height": 116
},
{
"url": "https://preview.redd.it/af51wk2p7uxg1.jpeg?width=216&crop=smart&auto=webp&s=524dbe61d8795b11070addbc2b43ddb342acce73",
"width": 216,
"height": 232
},
{
"url": "https://preview.redd.it/af51wk2p7uxg1.jpeg?width=320&crop=smart&auto=webp&s=92da717f62ffb69c864f91498698f5679355b449",
"width": 320,
"height": 344
},
{
"url": "https://preview.redd.it/af51wk2p7uxg1.jpeg?width=640&crop=smart&auto=webp&s=82f269d55c6b13cbeefcc7551bdefb4460d6de37",
"width": 640,
"height": 688
}
],
"variants": {},
"id": "af51wk2p7uxg1"
}
],
"enabled": true
},
"all_awardings": [],
"awarders": [],
"media_only": false,
"link_flair_template_id": "555d3a3e-19d7-11f1-b2f7-369a76dceb24",
"can_gild": false,
"spoiler": false,
"locked": false,
"author_flair_text": null,
"treatment_tags": [],
"visited": false,
"removed_by": null,
"mod_note": null,
"distinguished": null,
"subreddit_id": "t5_3crzr",
"author_is_blocked": false,
"mod_reason_by": null,
"num_reports": null,
"removal_reason": null,
"link_flair_background_color": "#ff4500",
"id": "1sxnnzf",
"is_robot_indexable": true,
"report_reasons": null,
"author": "ocean_protocol",
"discussion_type": null,
"num_comments": 140,
"send_replies": true,
"contest_mode": false,
"mod_reports": [],
"author_patreon_flair": false,
"author_flair_text_color": null,
"permalink": "/r/ArtificialInteligence/comments/1sxnnzf/uhoh_pocketos_founder_jer_crane_reported_that_a/",
"stickied": false,
"url": "https://i.redd.it/af51wk2p7uxg1.jpeg",
"subreddit_subscribers": 1788901,
"created_utc": 1777341132.0,
"num_crossposts": 0,
"media": null,
"is_video": false,
"_source_url": "https://www.reddit.com/by_id/t3_1sxnnzf.json",
"_insight": "The alleged Cursor/Claude database-deletion incident is spreading because it is a vivid failure mode: an agent with broad permissions can convert a staging fix into a production outage. It resonates with both skeptics and practitioners because it makes guardrails, scoped credentials, backups, and human approval feel concrete rather than theoretical.",
"_discovery": "r/ArtificialInteligence search/top week JSON; query AI coding agent; high engagement"
},
{
"approved_at_utc": null,
"subreddit": "AI_Agents",
"selftext": "Bit of context. Over the last couple of years I've shipped automation projects for around 30 professional services founders. Law firms, accounting practices, recruiting agencies, a couple of small consultancies, a few marketing shops. Different industries, different sizes, different software stacks underneath them.\n\nBut every single project ends up automating some version of the same five tasks. I started keeping a list after I noticed the pattern around project number 12, and I haven't had to add anything new to it in over a year now. Whatever firm you run, your grunt work is probably one of these five.\n\nThe first one is intake. Some version of \"lead fills out a form, someone manually creates a record in the CRM, someone schedules a call, someone sends a confirmation email, someone drops the lead into a spreadsheet for the partner to review.\" Almost every firm I work with has 4 or 5 humans touching this process, and almost none of them need to. A 30 line script ties the form to the calendar to the CRM to the email to the spreadsheet, and the work disappears overnight. The reason it's still manual at most firms is that it grew organically over years, and nobody ever sat down to look at the whole flow at once.\n\nThe second is document generation. Engagement letters, NDAs, statements of work, proposals, retainer agreements. Most firms have a paralegal or an admin manually editing a Word template for every new client, swapping out names and dates and project scope and pricing. This is genuinely 90% of the value that some firms pay an admin for, and it can be done with a form that fills a template and emails the signed PDF back. Not glamorous. Saves 5 to 10 hours a week per admin in most firms I've measured.\n\nThe third is recurring client communication. Status updates, reminders that quarterly filings are due, prompts that a contract is up for renewal, the \"we haven't heard from you in 30 days\" nudges. 
Every firm I've worked with has at least one person whose job partly involves remembering to send these emails on schedule. None of them need a person doing this. A simple workflow that watches a date column in a spreadsheet and triggers the right template at the right time replaces the whole thing, and the client gets more consistent communication than they did before, which is the part owners don't expect.\n\nThe fourth is internal reporting. The weekly partners meeting, the monthly billing summary, the report that goes to the founder every Friday morning showing pipeline status. Most firms have a junior person who spends a couple of hours every week pulling numbers from three or four systems and pasting them into a deck or a doc. The systems all have APIs. The numbers can pull themselves and assemble the report. The junior person can go do work that actually develops their career instead of being a human ETL pipeline.\n\nThe fifth one is the most awkward to bring up but it's almost always the biggest win. It's the founder's own admin work. Most owners of professional services firms are doing 8 to 12 hours a week of work that has no business being on their plate. Reviewing timesheets, approving expenses, chasing late invoices, drafting follow up emails to prospects who went quiet, manually updating their pipeline tracker. They keep doing it themselves because they don't trust anyone else to do it right. So we don't replace them with a person, we replace them with a workflow that does the boring 80% and only escalates to them when something actually needs a judgment call. The founder gets a day a week back, and that day usually goes into sales or client work, both of which directly grow revenue.\n\nHere's the part nobody mentions in automation pitches. None of these five tasks need AI agents. They need plumbing. APIs talking to other APIs, with maybe one LLM call sitting somewhere in the middle to draft a paragraph or classify an email. 
The whole industry is yelling about agentic this and agentic that, and meanwhile the actual money is sitting in form-to-CRM-to-email pipes that have been possible since 2015.\n\nI think a lot of founders don't automate their firm because they read the AI Twitter conversation, decide they need a multi agent orchestration layer with a vector database and a reasoning loop, then realize they can't afford that and don't know who to hire for it. So they do nothing. And the grunt work continues.\n\nThe simpler version is right there. The first project we ship for most firms costs less than one month of an admin's salary and replaces about 60% of what that admin actually does. The admin doesn't get fired, they get promoted to client work because suddenly the firm has the budget and the breathing room.",
"author_fullname": "t2_1s3qfdq4ll",
"saved": false,
"mod_reason_title": null,
"gilded": 0,
"clicked": false,
"title": "After automating workflows for 30+ professional services firms, the same 5 tasks show up in every project. None of them need AI agents.",
"link_flair_richtext": [],
"subreddit_name_prefixed": "r/AI_Agents",
"hidden": false,
"pwls": 6,
"link_flair_css_class": "",
"downs": 0,
"thumbnail_height": null,
"top_awarded_type": null,
"hide_score": false,
"name": "t3_1sxpslr",
"quarantine": false,
"link_flair_text_color": "dark",
"upvote_ratio": 0.96,
"author_flair_background_color": null,
"subreddit_type": "public",
"ups": 193,
"total_awards_received": 0,
"media_embed": {},
"thumbnail_width": null,
"author_flair_template_id": null,
"is_original_content": false,
"user_reports": [],
"secure_media": null,
"is_reddit_media_domain": false,
"is_meta": false,
"category": null,
"secure_media_embed": {},
"link_flair_text": "Discussion",
"can_mod_post": false,
"score": 193,
"approved_by": null,
"is_created_from_ads_ui": false,
"author_premium": false,
"thumbnail": "self",
"edited": false,
"author_flair_css_class": null,
"author_flair_richtext": [],
"gildings": {},
"content_categories": null,
"is_self": true,
"mod_note": null,
"created": 1777346847.0,
"link_flair_type": "text",
"wls": 6,
"removed_by_category": null,
"banned_by": null,
"author_flair_type": "text",
"domain": "self.AI_Agents",
"allow_live_comments": false,
"selftext_html": "<!-- SC_OFF --><div class=\"md\"><p>Bit of context. Over the last couple of years I've shipped automation projects for around 30 professional services founders. Law firms, accounting practices, recruiting agencies, a couple of small consultancies, a few marketing shops. Different industries, different sizes, different software stacks underneath them.</p>\n\n<p>But every single project ends up automating some version of the same five tasks. I started keeping a list after I noticed the pattern around project number 12, and I haven't had to add anything new to it in over a year now. Whatever firm you run, your grunt work is probably one of these five.</p>\n\n<p>The first one is intake. Some version of "lead fills out a form, someone manually creates a record in the CRM, someone schedules a call, someone sends a confirmation email, someone drops the lead into a spreadsheet for the partner to review." Almost every firm I work with has 4 or 5 humans touching this process, and almost none of them need to. A 30 line script ties the form to the calendar to the CRM to the email to the spreadsheet, and the work disappears overnight. The reason it's still manual at most firms is that it grew organically over years, and nobody ever sat down to look at the whole flow at once.</p>\n\n<p>The second is document generation. Engagement letters, NDAs, statements of work, proposals, retainer agreements. Most firms have a paralegal or an admin manually editing a Word template for every new client, swapping out names and dates and project scope and pricing. This is genuinely 90% of the value that some firms pay an admin for, and it can be done with a form that fills a template and emails the signed PDF back. Not glamorous. Saves 5 to 10 hours a week per admin in most firms I've measured.</p>\n\n<p>The third is recurring client communication. 
Status updates, reminders that quarterly filings are due, prompts that a contract is up for renewal, the "we haven't heard from you in 30 days" nudges. Every firm I've worked with has at least one person whose job partly involves remembering to send these emails on schedule. None of them need a person doing this. A simple workflow that watches a date column in a spreadsheet and triggers the right template at the right time replaces the whole thing, and the client gets more consistent communication than they did before, which is the part owners don't expect.</p>\n\n<p>The fourth is internal reporting. The weekly partners meeting, the monthly billing summary, the report that goes to the founder every Friday morning showing pipeline status. Most firms have a junior person who spends a couple of hours every week pulling numbers from three or four systems and pasting them into a deck or a doc. The systems all have APIs. The numbers can pull themselves and assemble the report. The junior person can go do work that actually develops their career instead of being a human ETL pipeline.</p>\n\n<p>The fifth one is the most awkward to bring up but it's almost always the biggest win. It's the founder's own admin work. Most owners of professional services firms are doing 8 to 12 hours a week of work that has no business being on their plate. Reviewing timesheets, approving expenses, chasing late invoices, drafting follow up emails to prospects who went quiet, manually updating their pipeline tracker. They keep doing it themselves because they don't trust anyone else to do it right. So we don't replace them with a person, we replace them with a workflow that does the boring 80% and only escalates to them when something actually needs a judgment call. The founder gets a day a week back, and that day usually goes into sales or client work, both of which directly grow revenue.</p>\n\n<p>Here's the part nobody mentions in automation pitches. None of these five tasks need AI agents. 
They need plumbing. APIs talking to other APIs, with maybe one LLM call sitting somewhere in the middle to draft a paragraph or classify an email. The whole industry is yelling about agentic this and agentic that, and meanwhile the actual money is sitting in form-to-CRM-to-email pipes that have been possible since 2015.</p>\n\n<p>I think a lot of founders don't automate their firm because they read the AI Twitter conversation, decide they need a multi agent orchestration layer with a vector database and a reasoning loop, then realize they can't afford that and don't know who to hire for it. So they do nothing. And the grunt work continues.</p>\n\n<p>The simpler version is right there. The first project we ship for most firms costs less than one month of an admin's salary and replaces about 60% of what that admin actually does. The admin doesn't get fired, they get promoted to client work because suddenly the firm has the budget and the breathing room.</p>\n</div><!-- SC_ON -->",
"likes": null,
"suggested_sort": null,
"banned_at_utc": null,
"view_count": null,
"archived": false,
"no_follow": false,
"is_crosspostable": false,
"pinned": false,
"over_18": false,
"all_awardings": [],
"awarders": [],
"media_only": false,
"link_flair_template_id": "c0fb994c-948c-11ef-838b-96bd8d9b4e91",
"can_gild": false,
"spoiler": false,
"locked": false,
"author_flair_text": null,
"treatment_tags": [],
"visited": false,
"removed_by": null,
"num_reports": null,
"distinguished": null,
"subreddit_id": "t5_8b5cvj",
"author_is_blocked": false,
"mod_reason_by": null,
"removal_reason": null,
"link_flair_background_color": "#a026d9",
"id": "1sxpslr",
"is_robot_indexable": true,
"report_reasons": null,
"author": "Warm-Reaction-456",
"discussion_type": null,
"num_comments": 69,
"send_replies": true,
"contest_mode": false,
"mod_reports": [],
"author_patreon_flair": false,
"author_flair_text_color": null,
"permalink": "/r/AI_Agents/comments/1sxpslr/after_automating_workflows_for_30_professional/",
"stickied": false,
"url": "https://www.reddit.com/r/AI_Agents/comments/1sxpslr/after_automating_workflows_for_30_professional/",
"subreddit_subscribers": 354147,
"created_utc": 1777346847.0,
"num_crossposts": 2,
"media": null,
"is_video": false,
"_source_url": "https://www.reddit.com/by_id/t3_1sxpslr.json",
"_insight": "This post resonates by pushing against agent hype from a services-automation perspective: the author argues many repeatable business tasks need deterministic workflows, not autonomous agents. Its popularity shows an emerging buyer-side filter—people want ROI and reliability before adopting agent branding.",
"_discovery": "r/AI_Agents search/top month JSON; query AI agents; high score/upvote ratio"
},
{
"approved_at_utc": null,
"subreddit": "MachineLearning",
"selftext": "I’ve been in QA for almost a decade. My mental model for quality was always: given input X, assert output Y. Now I’m on a team that’s shipping an LLM-based agent that handles multi-step tasks. I genuinely do not know how to test this in a way that feels rigorous. \n\nThe thing works. But the output isn’t deterministic. The same input can produce different reasoning chains across runs. Hell even with temp=0 I see variation in tool selection and intermediate steps. My normal instincts don’t map here. I can’t write an assertion and run it a thousand times to track flakiness. I’m at a loss for what to do.\n\nSnapshot testing on final outputs is too brittle. If there’s a correct response that’s worded differently it breaks the test. Regex/keyword matching on outputs misses reasoning errors that accidentally land on the correct answer. Human eval isn’t automatable and doesn’t scale. Evals with a scoring rubric almost works but I don’t have a way to set pass/fail thresholds. \n\nI want something conceptually equivalent to integration tests for reasoning steps. Like, given this tool result does the next step correctly incorporate it? I don’t know how to make that assertion without either hardcoding expected outputs or using another LLM as a judge, which would introduce a new failure mode into my test suite.\n\nThe agent runs inside our product. There are real uses and actual consequences when it makes a bad call. \n\nIs there a framework that allows for verifying of agentic reasoning? \n\n ",
"author_fullname": "t2_1vpv4x3l69",
"saved": false,
"mod_reason_title": null,
"gilded": 0,
"clicked": false,
"title": "How do you test AI agents in production? The unpredictability is overwhelming.[D]",
"link_flair_richtext": [],
"subreddit_name_prefixed": "r/MachineLearning",
"hidden": false,
"pwls": 6,
"link_flair_css_class": "",
"downs": 0,
"thumbnail_height": null,
"top_awarded_type": null,
"hide_score": false,
"name": "t3_1sx3p40",
"quarantine": false,
"link_flair_text_color": "dark",
"upvote_ratio": 0.89,
"author_flair_background_color": null,
"subreddit_type": "public",
"ups": 41,
"total_awards_received": 0,
"media_embed": {},
"thumbnail_width": null,
"author_flair_template_id": null,
"is_original_content": false,
"user_reports": [],
"secure_media": null,
"is_reddit_media_domain": false,
"is_meta": false,
"category": null,
"secure_media_embed": {},
"link_flair_text": "Discussion",
"can_mod_post": false,
"score": 41,
"approved_by": null,
"is_created_from_ads_ui": false,
"author_premium": false,
"thumbnail": "self",
"edited": 1777296956.0,
"author_flair_css_class": null,
"author_flair_richtext": [],
"gildings": {},
"content_categories": null,
"is_self": true,
"mod_note": null,
"created": 1777296498.0,
"link_flair_type": "text",
"wls": 6,
"removed_by_category": null,
"banned_by": null,
"author_flair_type": "text",
"domain": "self.MachineLearning",
"allow_live_comments": false,
"selftext_html": "<!-- SC_OFF --><div class=\"md\"><p>I’ve been in QA for almost a decade. My mental model for quality was always: given input X, assert output Y. Now I’m on a team that’s shipping an LLM-based agent that handles multi-step tasks. I genuinely do not know how to test this in a way that feels rigorous. </p>\n\n<p>The thing works. But the output isn’t deterministic. The same input can produce different reasoning chains across runs. Hell even with temp=0 I see variation in tool selection and intermediate steps. My normal instincts don’t map here. I can’t write an assertion and run it a thousand times to track flakiness. I’m at a loss for what to do.</p>\n\n<p>Snapshot testing on final outputs is too brittle. If there’s a correct response that’s worded differently it breaks the test. Regex/keyword matching on outputs misses reasoning errors that accidentally land on the correct answer. Human eval isn’t automatable and doesn’t scale. Evals with a scoring rubric almost works but I don’t have a way to set pass/fail thresholds. </p>\n\n<p>I want something conceptually equivalent to integration tests for reasoning steps. Like, given this tool result does the next step correctly incorporate it? I don’t know how to make that assertion without either hardcoding expected outputs or using another LLM as a judge, which would introduce a new failure mode into my test suite.</p>\n\n<p>The agent runs inside our product. There are real uses and actual consequences when it makes a bad call. </p>\n\n<p>Is there a framework that allows for verifying of agentic reasoning? </p>\n\n<p> </p>\n</div><!-- SC_ON -->",
"likes": null,
"suggested_sort": null,
"banned_at_utc": null,
"view_count": null,
"archived": false,
"no_follow": false,
"is_crosspostable": false,
"pinned": false,
"over_18": false,
"all_awardings": [],
"awarders": [],
"media_only": false,
"link_flair_template_id": "15995904-19d4-11f0-b8c9-0eed6ea89bc1",
"can_gild": false,
"spoiler": false,
"locked": false,
"author_flair_text": null,
"treatment_tags": [],
"visited": false,
"removed_by": null,
"num_reports": null,
"distinguished": null,
"subreddit_id": "t5_2r3gv",
"author_is_blocked": false,
"mod_reason_by": null,
"removal_reason": null,
"link_flair_background_color": "#26c4d9",
"id": "1sx3p40",
"is_robot_indexable": true,
"report_reasons": null,
"author": "this_aint_taliya",
"discussion_type": null,
"num_comments": 42,
"send_replies": true,
"contest_mode": false,
"mod_reports": [],
"author_patreon_flair": false,
"author_flair_text_color": null,
"permalink": "/r/MachineLearning/comments/1sx3p40/how_do_you_test_ai_agents_in_production_the/",
"stickied": false,
"url": "https://www.reddit.com/r/MachineLearning/comments/1sx3p40/how_do_you_test_ai_agents_in_production_the/",
"subreddit_subscribers": 3043120,
"created_utc": 1777296498.0,
"num_crossposts": 0,
"media": null,
"is_video": false,
"_source_url": "https://www.reddit.com/by_id/t3_1sx3p40.json",
"_insight": "The production-testing question is resonating because it names the core QA challenge for agents: multi-step, non-deterministic behavior breaks traditional input-output assertions. The balanced score/comments ratio indicates a practitioner-heavy thread where people are debating evals, traces, regression suites, and guardrails.",
"_discovery": "r/MachineLearning search/top month JSON; query AI agents in production; active discussion"
},
{
"approved_at_utc": null,
"subreddit": "SaaS",
"selftext": "I run a [SaaS](http://www.getibex.com). Every other week someone tells me I'm building a horse-drawn carriage in the age of cars. AI agents are going to replace every app. SaaS is dead. Why bother.\n\nSo I went and looked at the receipts. SaaS has supposedly been dying since 1999:\n\n* 1999: \"Just a fad\" (Salesforce launches)\n* 2008: Recession kills SaaS spending\n* 2015: Open source / self-hosted will replace it\n* 2018: No-code (Bubble, Webflow, Airtable)\n* 2020: Post-COVID SaaS bubble collapse\n* 2021: Web3 decentralised apps replace SaaS\n* 2022: ChatGPT replacing software\n* 2024: Microsoft's CEO declares SaaS dead on the BG2 podcast\n* 2025: Claude Code killing dev SaaS ($1B ARR in 9 months)\n* 2026: AI agents replace every app\n\nMeanwhile the actual numbers:\n\n* SaaS is a $375B+ industry growing \\~18% a year\n* The average enterprise uses 275 SaaS apps\n* Cursor hit $2B ARR faster than any company in history (which is, by the way, SaaS)\n* 85% of the world has never even opened ChatGPT\n\nHere's what people keep missing. SaaS is software you access over the internet, hosted by the vendor, paid by subscription. That's the definition. Most \"AI replacements\" being held up as SaaS killers fit it exactly. Cursor is SaaS. ChatGPT is SaaS. Claude Code is SaaS. The replacement framing collapses the moment you check what these products actually are.\n\nAI agents don't kill SaaS. They become the engine inside it. The wrapper stays the same. The interface changes. The billing model is identical.\n\nSo yeah, I'm building SaaS in 2026 on purpose. Very bullish on it.\n\nDon't be the Microsoft CEO of 2024 in 2026.",
"author_fullname": "t2_1i43t0mnw6",
"saved": false,
"mod_reason_title": null,
"gilded": 0,
"clicked": false,
"title": "People keep asking how I can be stupid enough to found a SaaS in 2026. Here's my answer.",
"link_flair_richtext": [],
"subreddit_name_prefixed": "r/SaaS",
"hidden": false,
"pwls": 6,
"link_flair_css_class": null,
"downs": 0,
"thumbnail_height": 140,
"top_awarded_type": null,
"hide_score": false,
"name": "t3_1sy8hxy",
"quarantine": false,
"link_flair_text_color": "dark",
"upvote_ratio": 0.92,
"author_flair_background_color": null,
"subreddit_type": "public",
"ups": 215,
"total_awards_received": 0,
"media_embed": {},
"thumbnail_width": 140,
"author_flair_template_id": null,
"is_original_content": false,
"user_reports": [],
"secure_media": null,
"is_reddit_media_domain": true,
"is_meta": false,
"category": null,
"secure_media_embed": {},
"link_flair_text": null,
"can_mod_post": false,
"score": 215,
"approved_by": null,
"is_created_from_ads_ui": false,
"author_premium": false,
"thumbnail": "https://preview.redd.it/swql1xbqwyxg1.png?width=140&height=140&crop=1:1,smart&auto=webp&s=2c26f6842fefd365487a52cbcf87f7d8a9eaf7f9",
"edited": 1777446213.0,
"author_flair_css_class": null,
"author_flair_richtext": [],
"gildings": {},
"post_hint": "image",
"content_categories": null,
"is_self": false,
"mod_note": null,
"created": 1777398067.0,
"link_flair_type": "text",
"wls": 6,
"removed_by_category": null,
"banned_by": null,
"author_flair_type": "text",
"domain": "i.redd.it",
"allow_live_comments": false,
"selftext_html": "<!-- SC_OFF --><div class=\"md\"><p>I run a <a href=\"http://www.getibex.com\">SaaS</a>. Every other week someone tells me I'm building a horse-drawn carriage in the age of cars. AI agents are going to replace every app. SaaS is dead. Why bother.</p>\n\n<p>So I went and looked at the receipts. SaaS has supposedly been dying since 1999:</p>\n\n<ul>\n<li>1999: "Just a fad" (Salesforce launches)</li>\n<li>2008: Recession kills SaaS spending</li>\n<li>2015: Open source / self-hosted will replace it</li>\n<li>2018: No-code (Bubble, Webflow, Airtable)</li>\n<li>2020: Post-COVID SaaS bubble collapse</li>\n<li>2021: Web3 decentralised apps replace SaaS</li>\n<li>2022: ChatGPT replacing software</li>\n<li>2024: Microsoft's CEO declares SaaS dead on the BG2 podcast</li>\n<li>2025: Claude Code killing dev SaaS ($1B ARR in 9 months)</li>\n<li>2026: AI agents replace every app</li>\n</ul>\n\n<p>Meanwhile the actual numbers:</p>\n\n<ul>\n<li>SaaS is a $375B+ industry growing ~18% a year</li>\n<li>The average enterprise uses 275 SaaS apps</li>\n<li>Cursor hit $2B ARR faster than any company in history (which is, by the way, SaaS)</li>\n<li>85% of the world has never even opened ChatGPT</li>\n</ul>\n\n<p>Here's what people keep missing. SaaS is software you access over the internet, hosted by the vendor, paid by subscription. That's the definition. Most "AI replacements" being held up as SaaS killers fit it exactly. Cursor is SaaS. ChatGPT is SaaS. Claude Code is SaaS. The replacement framing collapses the moment you check what these products actually are.</p>\n\n<p>AI agents don't kill SaaS. They become the engine inside it. The wrapper stays the same. The interface changes. The billing model is identical.</p>\n\n<p>So yeah, I'm building SaaS in 2026 on purpose. Very bullish on it.</p>\n\n<p>Don't be the Microsoft CEO of 2024 in 2026.</p>\n</div><!-- SC_ON -->",
"likes": null,
"suggested_sort": null,
"banned_at_utc": null,
"url_overridden_by_dest": "https://i.redd.it/swql1xbqwyxg1.png",
"view_count": null,
"archived": false,
"no_follow": false,
"is_crosspostable": false,
"pinned": false,
"over_18": false,
"preview": {
"images": [
{
"source": {
"url": "https://preview.redd.it/swql1xbqwyxg1.png?auto=webp&s=f28afed198daf7cbd0341c07be094730e99165a1",
"width": 1454,
"height": 1468
},
"resolutions": [
{
"url": "https://preview.redd.it/swql1xbqwyxg1.png?width=108&crop=smart&auto=webp&s=027d96bfcace781fd65ea983a67c003e70f37bcc",
"width": 108,
"height": 109
},
{
"url": "https://preview.redd.it/swql1xbqwyxg1.png?width=216&crop=smart&auto=webp&s=2eb721e7e9d7a06dd615b327a6c47af7c9e03e11",
"width": 216,
"height": 218
},
{
"url": "https://preview.redd.it/swql1xbqwyxg1.png?width=320&crop=smart&auto=webp&s=05c0014f2cfa6dde102799ea2156e4e272606e0e",
"width": 320,
"height": 323
},
{
"url": "https://preview.redd.it/swql1xbqwyxg1.png?width=640&crop=smart&auto=webp&s=fbfad7096fee976a1127b27e430c317288dcb586",
"width": 640,
"height": 646
},
{
"url": "https://preview.redd.it/swql1xbqwyxg1.png?width=960&crop=smart&auto=webp&s=2e4a57364f587316bdb9a08acc352437b7d6c7bd",
"width": 960,
"height": 969
},
{
"url": "https://preview.redd.it/swql1xbqwyxg1.png?width=1080&crop=smart&auto=webp&s=c7b1afffbcbf68164fefed83b58bb2f72dc4e541",
"width": 1080,
"height": 1090
}
],
"variants": {},
"id": "swql1xbqwyxg1"
}
],
"enabled": true
},
"all_awardings": [],
"awarders": [],
"media_only": false,
"can_gild": false,
"spoiler": false,
"locked": false,
"author_flair_text": null,
"treatment_tags": [],
"visited": false,
"removed_by": null,
"num_reports": null,
"distinguished": null,
"subreddit_id": "t5_2qkq6",
"author_is_blocked": false,
"mod_reason_by": null,
"removal_reason": null,
"link_flair_background_color": "",
"id": "1sy8hxy",
"is_robot_indexable": true,
"report_reasons": null,
"author": "balubala1",
"discussion_type": null,
"num_comments": 84,
"send_replies": true,
"contest_mode": false,
"mod_reports": [],
"author_patreon_flair": false,
"author_flair_text_color": null,
"permalink": "/r/SaaS/comments/1sy8hxy/people_keep_asking_how_i_can_be_stupid_enough_to/",
"stickied": false,
"url": "https://i.redd.it/swql1xbqwyxg1.png",
"subreddit_subscribers": 678084,
"created_utc": 1777398067.0,
"num_crossposts": 0,
"media": null,
"is_video": false,
"_source_url": "https://www.reddit.com/by_id/t3_1sy8hxy.json",
"_insight": "Although framed as a SaaS-defense essay, the post opens with the common claim that “AI agents are going to replace every app,” which makes it relevant to agent market narratives. It resonates because founders are debating whether agents kill SaaS or simply become another interface and automation layer on top of durable software businesses.",
"_discovery": "r/SaaS search/top month JSON; query AI agent / AI agents; high current engagement"
}
]
}