On April 12, 2023 — thirteen days after its first commit — AutoGPT had 30,000 GitHub stars. A few weeks later, it crossed 100,000. It was, by most accounts, the fastest-growing open-source project in GitHub’s history.
Today it has 183,000 stars. The thing they point to no longer exists.
AutoGPT appeared on March 30, 2023, two weeks after OpenAI released GPT-4. Created by Toran Bruce Richards — founder of Significant Gravitas Ltd., originally a video game company — it offered a seductive premise: give an AI agent a goal, walk away, come back to completed work. No supervision. No hand-holding. Fully autonomous.
The timing was electric. GPT-4 had just demonstrated a leap in reasoning capability. If you could chain those calls together — let the model set its own sub-goals, evaluate its own outputs, retry on failure — maybe you’d have something genuinely new. Not a chatbot. An employee.
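That "chain those calls together" idea reduces to a surprisingly small loop. A minimal sketch, assuming a generic `call_llm` function standing in for any chat-completion API (it is not a real library call): act, self-critique, then feed the critique back in as the next task.

```python
def autonomous_agent(goal: str, call_llm, max_steps: int = 10) -> list[str]:
    """Minimal sketch of the 2023-era agent loop. `call_llm` is a
    stand-in for any chat-completion call, not a real library function."""
    history: list[str] = []
    task = f"Goal: {goal}. Propose the first concrete step."
    for _ in range(max_steps):
        result = call_llm(task)   # act on the current task
        history.append(result)
        critique = call_llm(      # let the model judge its own output
            f"Goal: {goal}\nLast result: {result}\n"
            "Reply DONE if the goal is met, otherwise give the next step."
        )
        if critique.strip() == "DONE":
            break
        task = critique           # recurse on the model's own output
    return history
```

Everything interesting happens inside `call_llm`; the loop itself adds no intelligence, which is why swapping the underlying model changed the agent's character entirely.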
Within days, AutoGPT was the top trending repository on GitHub. BabyAGI, built by venture capitalist Yohei Nakajima in the same window, took a different angle on the same idea. Dozens of frameworks followed. The agent gold rush was on.
The agents got stuck in loops. They hallucinated information and presented it as fact. They burned through API credits at GPT-4's launch rates of $0.03–$0.06 per thousand tokens — a cost that compounded with every recursive call. They couldn't correct themselves mid-task or accept human intervention. And as computer scientist Arvind Narayanan observed, improving them was a matter of “a couple of person-days of effort” through prompt engineering and guardrails — not any kind of algorithmic breakthrough.
The performance was also entirely dependent on the underlying model. Switch from GPT-4 to GPT-3.5, and the “autonomous agent” became an expensive random walk. The autonomy, it turned out, was just the model’s quality wrapped in a for-loop.
This shouldn’t have been surprising. The gap between “can do a task in a demo” and “can do a task reliably, repeatedly, without supervision” is the same gap that kills every automation prediction. Self-driving cars. Fully automated factories. Now fully autonomous agents.
The most telling detail in AutoGPT’s evolution isn’t about what was added. It’s about what was removed.
In 2023, the AI infrastructure ecosystem was building furiously around vector databases. Pinecone, Weaviate, Chroma, Qdrant — billions in combined valuation predicated on the assumption that AI agents would need sophisticated semantic memory retrieval at scale. AutoGPT dutifully supported multiple vector database backends.
Then the team ripped them all out and replaced them with a JSON file and NumPy’s dot product.
The reasoning was mathematical. Benchmarks documented by developer Dariusz Semba showed that even with 100,000 stored embeddings, NumPy’s np.dot() completes a similarity search in 70 milliseconds. A single LLM inference call takes roughly 10 seconds. The memory retrieval was never the bottleneck — not by two orders of magnitude. A vector database wouldn’t break even on its own overhead until after approximately one month of continuous operation and over $81,000 in API costs.
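The arithmetic is easy to reproduce. Below is a minimal sketch of the brute-force approach with illustrative sizes: 100,000 stored vectors at 256 dimensions (real embeddings are larger, e.g. 1,536-dimensional, but the computation is identical in shape).

```python
import numpy as np

# Illustrative store: 100,000 embeddings, 256 dimensions each.
rng = np.random.default_rng(0)
store = rng.standard_normal((100_000, 256)).astype(np.float32)
store /= np.linalg.norm(store, axis=1, keepdims=True)  # unit-normalize once

def top_k(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Brute-force retrieval: one matrix-vector dot product, then top-k."""
    scores = store @ query  # cosine similarity, since rows are unit vectors
    return np.argpartition(scores, -k)[-k:]
```

On ordinary hardware the whole scan takes milliseconds, which is why it sits so far below the roughly 10-second latency of the LLM call it serves.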
But the real insight was simpler than the math: the agents didn’t generate enough distinct facts to need a vector database. Their memory requirements fit comfortably in a local file. The entire retrieval-augmented-everything industry was building the supply chain before the factory existed.
This isn’t a failure of vector databases. It’s a reality check about what these agents actually did. You don’t need a warehouse for inventory you haven’t manufactured.
Five major agent frameworks launched in that spring window. Three years later, their fates trace a familiar arc.
AutoGPT raised $12 million in October 2023 from GitHub Ventures and Redpoint. By July 2024, the team had completed a major rewrite, introducing a visual workflow builder and modular backend. Agents were restructured as composable “blocks” with inputs, outputs, and transformation functions. It was — quietly — a completely different product from the one that earned those 100,000 stars. The recursive-LLM-calls model gave way to low-code visual composition. Multiple 2026 framework comparisons note that competing frameworks like AutoGen and CrewAI have “largely eclipsed” it.
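The "block" idea can be sketched as a named transform from a dictionary of inputs to a dictionary of outputs, wired into a pipeline. The names below are hypothetical illustrations of the pattern, not AutoGPT's actual API:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Block:
    """Hypothetical composable block: a name plus a transform from a dict
    of inputs to a dict of outputs. Not AutoGPT's real API, just the shape."""
    name: str
    transform: Callable[[dict[str, Any]], dict[str, Any]]

def run_pipeline(blocks: list[Block], inputs: dict[str, Any]) -> dict[str, Any]:
    """Run blocks in order, feeding each block's outputs to the next."""
    data = dict(inputs)
    for block in blocks:
        data.update(block.transform(data))
    return data
```

The shift matters: a pipeline of typed, inspectable steps can be drawn, paused, and debugged, while a recursive self-prompting loop cannot.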
BabyAGI was archived in September 2024. Before it went dormant, it had spawned what one analysis counted as at least 42 academic papers. Nakajima relaunched it as a “self-building autonomous agent” with a new framework — but he describes it explicitly as “a research tool and sandbox,” not production software. Its purpose is educational: sharing ideas and sparking discussion for experienced developers.
SuperAGI accumulated thousands of stars and over 2,000 forks in its first months. By 2025, development activity showed — in the words of one platform review — “a sharp spike in mid-2023 followed by minimal activity afterward.” Multiple security vulnerabilities went unaddressed. The company pivoted to commercial products; no visible roadmap remained for the open-source framework.
AgentGPT still runs in the browser with no setup required. Reviewers in 2025 called it “a playground, not a production tool — agents get confused, go in loops, and fail ungracefully.”
CrewAI survived. It did so by being less ambitious: structured multi-agent workflows with defined roles, clear handoffs, and human-in-the-loop design. Not “fully autonomous” anything. The winner of the autonomy race was the one that stopped racing.
The pattern is familiar. One pivoted to a different product entirely. One became an educational sandbox. One stalled with unpatched vulnerabilities. One remains a toy. And one survived by narrowing its promises.
AutoGPT’s 183,000 stars mean something. They just don’t mean what most people assume.
A 2024 study presented at the NDSS MADWeb workshop — titled, delightfully, “The Fault in Our Stars” — measured the correlation between GitHub stars and actual package downloads across major ecosystems. For PHP projects, the correlation was 0.47. For Ruby, 0.33. For JavaScript, it was 0.14. Essentially random.
Stars measure curiosity. Downloads measure intent. The gap between them is the gap between “that’s interesting” and “I’m going to use this.” Fewer than 5% of the top 10,000 GitHub projects ever exceeded 250 monthly contributors, and only 2% sustained that level across six months.
Venture firm Bessemer Venture Partners calls stars “vanity metrics” and tracks unique monthly contributor activity instead. One industry analysis put it more colorfully: stars are “bookmarks with delusions of grandeur — gestures of vague future interest that rarely translate into actual usage.”
Meanwhile, a growing “reputation-as-a-service” industry has developed around the platform: an estimated six million or more fake stars exist across GitHub. The metric is simultaneously uninformative and gameable — a combination that should make anyone skeptical of using it as a signal for anything at all.
There’s a pattern here that extends well beyond software.
In June 2017, initial coin offering funding surpassed traditional venture capital for the first time — $550 million via ICOs against $300 million through angel and seed-stage VC. By July 2018, cumulative ICO funding had reached $17.8 billion. The money was real. The technology underneath — Ethereum’s smart contracts — was genuinely innovative. The claims about what it would enable were premature by years.
The structural match to the 2023 agent hype isn’t a casual metaphor. Explosive interest driven by a single breakthrough (Ethereum smart contracts / GPT-4). Rapid experimentation (thousands of ICO tokens / dozens of agent frameworks). Capital flooding in ahead of demonstrated utility ($17.8 billion in token sales / $12 million to AutoGPT alone in its first year). Quiet consolidation to a small number of survivors (Ethereum itself / structured-workflow frameworks). And, eventually, the underlying technology delivering value in forms nobody originally predicted.
A peer-reviewed paper in Springer’s Philosophy & Technology journal made this argument formally in 2024: AI hype follows the same bubble dynamics as prior technology cycles. And Gartner’s 2025 hype cycle data bears it out — generative AI entered the Trough of Disillusionment by 2025, while AI agents sit at the Peak of Inflated Expectations, two to three years behind on the same curve.
We’re watching the same movie at two different speeds.
There’s a mathematical explanation for why the autonomous dream keeps failing, and it rarely appears in the discourse.
Consider an agent that completes each individual step with 85% reliability — a generous estimate for any complex task. A two-step workflow succeeds 72% of the time. Five steps: 44%. Ten steps: 20%. This is exponential decay applied to sequential task completion: 0.85 raised to the power of the number of steps.
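The decay is one line of arithmetic:

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability a sequential workflow succeeds when every step must."""
    return per_step ** steps

for n in (2, 5, 10, 20):
    print(f"{n:>2} steps at 85% per step: {chain_success(0.85, n):.0%}")
# prints:
#  2 steps at 85% per step: 72%
#  5 steps at 85% per step: 44%
# 10 steps at 85% per step: 20%
# 20 steps at 85% per step: 4%
```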
It’s the same problem self-driving cars face. Each individual driving decision might be 99.9% correct, but a cross-country trip involves millions of decisions. The compound probability erodes certainty faster than intuition suggests.
The 2023 agent frameworks were promising ten-step, twenty-step autonomous workflows. The math was never on their side. And the fundamental constraint isn’t about better models — it’s about the relationship between reliability per step and the number of steps in sequence. Even a 95%-reliable step produces only 60% success over ten steps. The only architectures that survive this math are the ones that introduce checkpoints, human review, or graceful degradation — the features that make an agent less autonomous, not more.
CrewAI’s survival makes more sense through this lens. By structuring agents into defined roles with handoff points, it introduced natural checkpoints that reset the compound probability. Less ambitious in promise. More robust in practice.
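A toy model makes the effect concrete. Suppose a checkpoint after every few steps catches failure, and the failed segment can be retried. The model below is illustrative, not CrewAI's actual mechanics, and it assumes the step count divides evenly into segments:

```python
def straight_through(p: float, steps: int) -> float:
    """All steps in one unchecked run: success = p ** steps."""
    return p ** steps

def with_checkpoints(p: float, steps: int, segment: int, retries: int) -> float:
    """Toy model: a checkpoint after every `segment` steps catches failure,
    and the failed segment is retried up to `retries` extra times.
    Assumes `steps` divides evenly into segments."""
    seg = p ** segment                          # one segment, one attempt
    seg_retry = 1 - (1 - seg) ** (retries + 1)  # segment passes within retries
    return seg_retry ** (steps // segment)

# Ten steps at 85% per step, checkpoint every 5 steps, one retry per segment:
print(f"straight through: {straight_through(0.85, 10):.0%}")        # 20%
print(f"with checkpoints: {with_checkpoints(0.85, 10, 5, 1):.0%}")  # 48%
```

Same model, same per-step reliability: the checkpointed workflow more than doubles its odds, at the cost of the retries and oversight that autonomy was supposed to eliminate.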
The things that survived the 2023 agent spring were never the things that were promised.
AutoGPT didn’t deliver autonomous AI employees. It delivered a visual workflow builder — a useful tool for composing modular AI steps, but a fundamentally different product from the one that attracted 100,000 stars in its first weeks. BabyAGI didn’t deliver autonomous task completion. It delivered 42 academic papers and a sandbox for researchers — arguably more valuable as an educational catalyst than it ever would have been as production software. CrewAI didn’t deliver autonomy. It delivered structured collaboration — defined roles, human checkpoints, predictable handoffs.
The pattern repeats across every technology hype cycle: the useful thing that emerges is never the thing that was promised. The blockchain didn’t eliminate banks — it enabled programmable contracts. The internet didn’t create a paperless office — it created social media. Self-driving cars didn’t replace human drivers — they produced lane-keeping and adaptive cruise control.
The technology was real each time. The imagination about what it would do was wrong each time. And the early enthusiasm — measured in stars, tokens, headlines, or stock prices — captured the imagination, not the reality.
AutoGPT has 183,000 GitHub stars today. The count is still growing.
The project it points to has been rewritten from the ground up. The autonomous recursive agent is gone, replaced by visual blocks and modular composition. The vector databases have been replaced by a JSON file. The promise of “set a goal, walk away, come back to results” has been replaced by structured workflows with human oversight.
But the stars remain. They don’t decay. They don’t update to reflect what the project became. They are a permanent record of a moment — spring 2023, when a hundred thousand developers clicked a button that said: I believe this changes everything.
It did change things. Just not the way the stars suggested.
The trust layer starts with receipts
Stars measured curiosity. Press releases measured ambition. Neither measured what agents actually did. The compound reliability problem demands checkpoints — and checkpoints only matter if they produce verifiable evidence. Chain of Consciousness builds that audit trail: every agent action gets a signed, timestamped, tamper-evident record. Not stars. Receipts.
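What a tamper-evident receipt chain can look like, sketched with Python's standard library. This illustrates the general technique (hash-linked, HMAC-signed, timestamped records), not Chain of Consciousness's actual record format:

```python
import hashlib
import hmac
import json
import time

def append_receipt(chain: list[dict], action: str, key: bytes) -> list[dict]:
    """Append a signed, timestamped receipt linked to the previous record.
    Illustrative sketch, not Chain of Consciousness's actual format."""
    prev_sig = chain[-1]["sig"] if chain else ""
    body = {"action": action, "ts": time.time(), "prev": prev_sig}
    payload = json.dumps(body, sort_keys=True).encode()
    body["sig"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    chain.append(body)
    return chain

def verify(chain: list[dict], key: bytes) -> bool:
    """Recompute every signature and link; any edited record breaks the chain."""
    prev_sig = ""
    for rec in chain:
        body = {"action": rec["action"], "ts": rec["ts"], "prev": rec["prev"]}
        payload = json.dumps(body, sort_keys=True).encode()
        expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
        if rec["prev"] != prev_sig or not hmac.compare_digest(rec["sig"], expected):
            return False
        prev_sig = rec["sig"]
    return True
```

Because each record signs the previous record's signature, altering any one action invalidates everything after it: the chain either verifies end to end or it does not.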
Try Hosted CoC · pip install chain-of-consciousness · npm install chain-of-consciousness