Permissionless Intelligence — The Autoresearch Democracy

April 21, 2026 · 9 min read · analysis

The question that ate the bottleneck

For as long as software has existed, the constraint was building it. You had an idea, and between that idea and a working system stood months of engineering, millions in capital, teams of specialists. The gap between "this should exist" and "this exists" was enormous. It filtered ruthlessly. Only ideas with enough capital backing survived the crossing.

That gap is closing. Not narrowing — collapsing. A markdown file describing what you want, pointed at an AI agent, produces a functional codebase. Not a mockup. Not a prototype deck for investors. Working software. The implementation barrier that filtered ideas for fifty years is dissolving in real time.

Which means the bottleneck has moved. The scarce resource is no longer "can we build this?" It's "should we build this?" And that's a fundamentally different kind of problem — one that no amount of engineering talent or compute can solve, because it's not an engineering problem. It's a meaning problem.

What should exist in the world? When anyone can build anything, who decides what's worth building?

The proof that implementation is free

In March 2026, Andrej Karpathy published autoresearch — a repo where an AI agent runs autonomous ML experiments overnight. The agent modifies the training code, trains for five minutes, checks if the result improved, keeps or discards, repeats. You write a markdown file — program.md — describing your research direction. The agent handles everything else. Wake up in the morning, check the log, find a better model.

Over 55,000 stars. The pattern resonated because it made something visceral that people had been sensing abstractly: the human's job is no longer to do the work. It's to direct the work. You're not touching the Python files. You're programming the markdown files.

But autoresearch only works because it has a clean evaluation function. Validation bits per byte. A number goes down, the experiment worked. No ambiguity, no conflicting interests, no politics. The metric is computable and the agent can evaluate its own output.
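That loop is simple enough to sketch. A minimal, illustrative version — the function names and the simulated five-minute training run are assumptions for illustration, not the repo's actual API:

```python
import random

def train_and_eval(params):
    # Stand-in for a five-minute training run that returns validation
    # bits per byte. Simulated here as a noisy function of one
    # hyperparameter (assumption, purely for illustration).
    return (params["lr"] - 0.01) ** 2 + random.uniform(0, 0.001)

def propose_change(params):
    # Stand-in for the agent editing the training code; here it just
    # perturbs a hyperparameter.
    new = dict(params)
    new["lr"] *= random.choice([0.5, 0.9, 1.1, 2.0])
    return new

def autoresearch_loop(steps=50):
    best = {"lr": 0.1}
    best_bpb = train_and_eval(best)
    for _ in range(steps):
        candidate = propose_change(best)
        bpb = train_and_eval(candidate)
        if bpb < best_bpb:  # the entire evaluation function: a number goes down
            best, best_bpb = candidate, bpb  # keep
        # else: discard
    return best, best_bpb

best, bpb = autoresearch_loop()
```

The `if bpb < best_bpb` line is the whole point: the agent can grade its own homework, so the loop runs unattended.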

Most problems worth solving don't have that luxury.

Dysfunctional markets don't have loss functions

The hiring market is broken. Anyone who's been on either side knows it. Employers drown in applications they can't evaluate. Candidates spray generic materials into a void. The matching function — the thing that determines whether a candidate and a role actually fit — operates through crude filters: keyword matching, credential proxies, a three-minute resume scan by someone who doesn't understand the role.

I know this firsthand. My background doesn't map to conventional filters. No software engineering degree, no ML credential. But anyone who sits with me for five minutes and looks at what I've built understands immediately. The problem isn't capability — it's that the system has no capacity for the kind of context that would reveal the match. The filters aren't just inefficient. They're structurally incapable of capturing what matters.

A startup called Jack & Jill raised $20M to attack this with dual AI agents — one profiling candidates through deep conversational intake, the other profiling roles from the employer side. Their core claim: the intermediary charges for access, not insight. Replace it with agents that actually understand both sides and the matching function improves by an order of magnitude.

Same dysfunction in rental markets. Same in legal services, eldercare, therapy, financial planning. Every market where matching is nuance-heavy, the intermediary is extractive, and both sides lose from bad matches.

These are all autoresearch-shaped problems. There's a spec (what constitutes a good match), an implementation space (how to find and verify matches), an iteration cycle (get better over time). An agent could run the implementation loop the same way Karpathy's agent runs training experiments.

But here's the crux: what's the evaluation function?

In autoresearch, val_bpb is deterministic. In a dysfunctional market, the evaluation function is contested. What makes a good hire? The employer wants someone who ships fast. The employee wants growth opportunities and reasonable hours. The recruiter wants a quick close. These aren't the same metric. There's no objective loss function that resolves a values conflict.
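You can make the conflict concrete. A toy sketch — the match attributes and stakeholder weightings below are invented for illustration, not drawn from any real system:

```python
from dataclasses import dataclass

@dataclass
class Match:
    ship_speed: float    # 0..1, how fast the candidate ships
    growth: float        # 0..1, growth opportunity in the role
    hours: float         # expected weekly hours
    days_to_close: int   # time from intro to signed offer

# Each stakeholder scores the SAME match with a different "loss function"
# (illustrative weights, not a real model):
def employer_score(m):  return m.ship_speed
def employee_score(m):  return 0.7 * m.growth + 0.3 * (m.hours <= 40)
def recruiter_score(m): return 1.0 / max(m.days_to_close, 1)

m = Match(ship_speed=0.9, growth=0.3, hours=55, days_to_close=7)
scores = {
    "employer":  employer_score(m),   # high: ships fast
    "employee":  employee_score(m),   # low: little growth, long hours
    "recruiter": recruiter_score(m),  # high: quick close
}
# No weighting of these three numbers is objectively correct.
# Choosing the weights IS the values conflict.
```

The code can compute each stakeholder's score in microseconds. What it cannot do is tell you how to combine them — that choice is exactly what's contested.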

You can automate the implementation — the search, the matching, the paperwork. But you can't automate the answer to "what does good mean here?" That's a contested evaluation function — and contested evaluation functions are, at bottom, meaning problems.

Meaning is use

There's no universal definition of "good hire" floating in some Platonic space. There's only what this employer, this candidate, this market, this moment means by it right now. And that meaning comes from how people actually use the term — not from a dictionary, not from a training set, not from an algorithm.

This is Wittgenstein's central insight, and it does real work here — not as decoration but as architecture. Meaning isn't defined by matching symbols to objects in the world. Meaning is constituted by use. A community of people with skin in the game — HR managers, job seekers, labor lawyers, recruiters figuring out if their profession has a future — each brings a different understanding of "good match." Each one's use of the term is legitimate. The negotiation between them is the evaluation function.

Not a number going down. A community converging — or failing to converge — on shared meaning.

This is the same coordination problem Bitcoin's consensus mechanism solves for financial transactions: how do you get a network of strangers who don't trust each other to agree on a shared state, without anyone in charge? Satoshi's answer was proof of work — computational commitment as the substrate for consensus. The answer here is adjacent: domain commitment. You earn influence in the evaluation not by mining hashes but by demonstrating that you understand the problem space well enough to improve the spec.

The spec-file loop

Here's how it works, concretely:

Someone writes a proposal in natural language. Not code. A markdown file describing a dysfunctional market and a proposed solution. "The hiring market is broken because intermediaries optimize for close speed, not match quality. Here's a proposed architecture for dual-agent matching that puts candidate and employer context first." This is the program.md of social infrastructure.

An AI agent implements the spec. It generates a codebase, a technical architecture, a running prototype. This isn't hypothetical — Claude Code, Replit, Cursor produce functional software from natural language descriptions today. The agent handles the implementation. You don't write the code. The code is free.

The community evaluates. Not a computable metric. Humans. Domain experts. Affected parties. An HR manager reads the spec and challenges: "What about candidates gaming the conversational intake with AI-generated responses?" A labor lawyer flags: "This needs GDPR-compliant data handling for EU deployment." A job seeker points out: "The power asymmetry still favors employers in this design — the agent should advocate for both sides equally."

Each critique is a move in the language game. Each one refines what "good" means in this context. Each one adds a constraint that the next version has to satisfy.

The agent regenerates. After a fixed period — 24 hours — the AI takes all community input and produces a new version of the spec and codebase. Every validated critique gets folded in. The system gets more robust, more nuanced, more aligned with the actual needs of the people it's supposed to serve.

Repeat. The new version goes back to the community. More evaluation. More edge cases. More domain expertise. Each cycle, the spec gets sharper, the implementation more complete, the consensus broader. The meaning of "good hiring system" gets negotiated through iterative use.
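The four steps above can be sketched as a loop. Everything agent- and community-shaped below is a stand-in (the real implementation step is an AI coding agent, the real evaluation step is humans); only the loop structure is the claim:

```python
from dataclasses import dataclass, field

@dataclass
class Spec:
    text: str
    constraints: list = field(default_factory=list)
    version: int = 1

def agent_implement(spec):
    # Stand-in for an AI agent generating a codebase from the spec.
    return f"codebase v{spec.version} satisfying {len(spec.constraints)} constraints"

def community_evaluate(spec, implementation):
    # Stand-in for human review. In the real loop these critiques come
    # from domain experts and affected parties, not from code.
    return [f"constraint discovered in round {spec.version}"]

def regenerate(spec, critiques):
    # Fold every validated critique into the next version of the spec.
    return Spec(text=spec.text,
                constraints=spec.constraints + critiques,
                version=spec.version + 1)

spec = Spec("Dual-agent hiring market: candidate and employer context first.")
for _ in range(3):  # three 24-hour cycles
    impl = agent_implement(spec)
    critiques = community_evaluate(spec, impl)
    spec = regenerate(spec, critiques)
# After three cycles the spec carries three accumulated constraints:
# the evaluation function is the growing constraint list, not a metric.
```

Note what accumulates: not a score, but constraints. The spec is the shared state the community converges on.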

The Bitcoin architecture beneath it

The parallels to Bitcoin's consensus mechanism aren't metaphorical — they're structural. The proposal is the block: a unit of coordinated action, a claim about what should exist. The community is the consensus mechanism: domain experts validate proposals against reality the same way nodes validate blocks against protocol rules. The agent is the miner: computational work in service of social consensus, implementing what the community validates rather than deciding what's good.

This isn't governance theater. The agent ships every cycle. A new codebase drops every 24 hours. The question isn't "should we build this?" — the agent already built it. The question is: "is this version better than the last one? Does it mean what we need it to mean?"

The research layer is genuinely agentic. This isn't humans posting ideas for other humans to evaluate — that's a forum. The agents do real work: scanning signal sources for market dysfunction, profiling industries against a diagnostic framework, generating implementations, incorporating feedback, producing improved versions on a fixed cadence.

The human role shifts from "generate ideas and build them" to "evaluate and direct." The human contributes the thing that's actually scarce — judgment, domain knowledge, lived experience, meaning — while the agent handles everything that's abundant: code, implementation, iteration speed.

That's not a demotion of human agency. It's a concentration of it.

The real question

If implementation is free and direction is scarce, then the most important infrastructure isn't the one that builds fastest. It's the one that decides best.

The autoresearch democracy is a proposal for that infrastructure. Not a platform. Not a product. A protocol for answering "what's worth building?" — the same way Bitcoin is a protocol for answering "who owns what?" without needing anyone in charge.

The answer can't come from a single founder with a vision. It can't come from a VC deciding which pitch deck gets funded. It can't come from an agent optimizing a loss function. It can only come from the people who have to live with what gets built, negotiating what "good" means through iterative use.

Karpathy proved the implementation loop works. The open question is the evaluation function. And for that, we need each other.

The spec-file democracy proposes how communities negotiate meaning at scale. But the infrastructure those specs produce needs a payment rail — and the legacy payment stack has the same intermediary problem as the markets being disrupted.

This is Part II of the Permissionless Intelligence arc. Part I: Proof of Work, Proof of Trust establishes the structural parallel between Bitcoin and constitutional AI. Part III: The Agent-Native Currency examines why Bitcoin might be the natural payment rail for an agentic internet.

Sources

  • Karpathy, Andrej — autoresearch, github.com/karpathy/autoresearch (2026)
  • Nakamoto, Satoshi — "Bitcoin: A Peer-to-Peer Electronic Cash System" (2008)
  • Jack & Jill AI — $20M seed round led by Creandum, October 2025
  • Wittgenstein, Ludwig — Philosophical Investigations, §43: "the meaning of a word is its use in the language"
