Proactive intelligence

Almost all AI in business right now is reactive. You ask it something, it answers, you decide what to do. Chatbots, copilots, RAG pipelines, whatever the flavor of the month is. The human always starts the conversation, and the AI always waits politely until spoken to, like a very expensive butler who's not allowed to notice the house is on fire unless you specifically ask "hey, is the house on fire?"

I think that's a local maximum. The thing I keep seeing people circle around on Twitter and in engineering orgs, the thing that seems to genuinely excite people who've been jaded about AI for a while now, is what I'd call proactive intelligence. Agents with no UI that just run. They watch what's happening across your business, they reason about it, and they do stuff without anyone asking them to.

Karpathy's "autoresearch" project captures the vibe really well. He's got an agent running LLM training experiments in a loop, committing improvements to a git branch, forever, with nobody steering it.

That's ML research, but the same idea maps directly to running a business, and honestly the business version might be easier because the feedback loops are more concrete. You don't need to evaluate whether a loss curve is meaningfully better. You can just look at whether the inventory got restocked before you ran out.

Paperclip is another one that's been going around. It's an open-source framework for running entire companies with AI agents.

You define a goal like "build the #1 AI note-taking app to $1M MRR," spin up a team of agents (CEO, CTO, engineers, designers, marketers), give them budgets, and hit go. They delegate work to each other, wake up on schedules to check on things, and coordinate autonomously. You watch from a dashboard and jump in when you feel like it, or don't.
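
If you want a feel for the shape of that, here's a rough sketch in Python. To be clear: none of this is Paperclip's actual API, the names are all stand-ins, but the moving parts (a goal, a team, per-agent budgets, wake schedules) are the ones that matter.

```python
# Hypothetical sketch of a Paperclip-style setup. None of these names
# are Paperclip's real API; they're stand-ins for the moving parts:
# a goal, a team of agents, per-agent budgets, and wake schedules.

from dataclasses import dataclass, field

@dataclass
class Agent:
    role: str
    monthly_budget_usd: float
    wake_schedule: str                       # cron-style check-in cadence
    can_delegate_to: list[str] = field(default_factory=list)

company = {
    "goal": "build the #1 AI note-taking app to $1M MRR",
    "team": [
        Agent("CEO", 5_000, "0 9 * * *", can_delegate_to=["CTO", "marketer"]),
        Agent("CTO", 2_000, "0 */4 * * *", can_delegate_to=["engineer", "designer"]),
        Agent("engineer", 500, "0 * * * *"),
        Agent("marketer", 1_000, "0 9 * * 1"),
    ],
}

# A runtime would wake each agent on its schedule, hand it the goal plus
# whatever its teammates did since last time, and let it act or delegate.
```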

It's early and rough around the edges, obviously. But the fact that this exists as an open-source repo right now, that someone built it and other people are forking it and running it, tells you something about where the gravity is pulling.

What this looks like day to day

Say you run an e-commerce company. You set up a few agents and go about your day:

  • Inventory agent watches stock levels and supplier lead times. A TikTok goes viral on Tuesday and demand spikes 400% on one SKU. The agent notices, sees that the primary supplier's port is backed up and that marketing has another push scheduled for next week, and reorders from a backup supplier before you even hear about the TikTok.
  • Support agent reads every incoming ticket in real time. It spots that 30 customers in the last hour all mentioned the same defect on the same product batch. Flags it to the quality team, pauses the listing, starts drafting responses.
  • Pricing agent tracks competitor prices across a dozen SKUs. One competitor drops their price by 15% on a Friday night. The agent adjusts your prices within the margin bounds you set, logs its reasoning, and moves on.

Nobody kicked any of this off. Nobody's checking dashboards. The agents just run.

And I want to be specific about why this isn't Zapier, because I think that's the first place people's minds go and it's the wrong comparison. These aren't if-then rules. Each agent is reasoning about context. The inventory agent doesn't fire because "stock < 100." It fires because it connected a viral moment to a port delay to an upcoming campaign and decided the combination warranted action. That's a judgment call, not a pattern match.
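
Here's that contrast as code. It's a minimal sketch, the `llm` call and all the context fields are hypothetical stand-ins, but it shows why one of these is a pattern match and the other is a judgment:

```python
# Minimal sketch: a threshold rule vs. a context-based judgment.
# `llm` stands in for whatever model call you use; the context fields
# and helper names are hypothetical.

def rule_fires(stock: int) -> bool:
    # Zapier-style: one number in, one decision out, blind to everything else.
    return stock < 100

def agent_decides(llm, context: dict) -> dict:
    # Agent-style: the whole picture goes in, a judgment comes out.
    prompt = f"""You manage inventory. Decide whether to reorder, from whom,
and how much. Reply as JSON with keys: action, supplier, quantity, reasoning.

Stock level: {context['stock']}
7-day demand trend: {context['demand_trend']}
Primary supplier status: {context['supplier_status']}
Upcoming campaigns: {context['campaigns']}
Social signals: {context['social_signals']}"""
    return llm(prompt)

# The rule fires at stock == 99 even if demand is dead. The agent can fire
# at stock == 500 because a viral spike, a jammed port, and next week's
# campaign all point the same direction.
```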

The world model is the interesting part

[Diagram: InventoryAgent, SupportAgent, and PricingAgent read/write a shared world model (inventory + demand + support + pricing + campaigns) and act on it: reorder stock, route ticket, adjust margin.]

Individual agents doing individual tasks is, honestly, not that interesting by itself. You could build most of those examples with some well-written cron jobs and a decent alerting system. The part that gets genuinely exciting is when the agents share a world model, when the sales agent closes a big deal and the supply chain agent already knows about it, when support starts getting complaints about a specific batch and the quality agent picks it up and the comms agent starts drafting a response, all of this happening before anyone opens a dashboard or sends a Slack message or schedules a meeting to discuss what to do.
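
One way a shared world model might be structured, sketched loosely (the class, fields, and topics here are all made up for illustration, not any particular framework's API):

```python
# One possible shape for a shared world model: a single store that every
# agent reads from and writes to, with notifications on write. All names
# and topics are illustrative.

import time

class WorldModel:
    def __init__(self):
        self.facts = []               # append-only log of observations
        self.subscriptions = {}       # topic -> list of callbacks

    def subscribe(self, topic: str, callback):
        self.subscriptions.setdefault(topic, []).append(callback)

    def write(self, source: str, topic: str, payload: dict):
        fact = {"ts": time.time(), "source": source,
                "topic": topic, "payload": payload}
        self.facts.append(fact)
        for callback in self.subscriptions.get(topic, []):
            callback(fact)            # supply chain hears sales' news now

    def read(self, topic: str, since: float = 0.0) -> list[dict]:
        return [f for f in self.facts
                if f["topic"] == topic and f["ts"] >= since]

world = WorldModel()
world.subscribe("deals", lambda f: print("supply chain saw:", f["payload"]))
world.write("sales_agent", "deals", {"sku": "A-17", "units": 10_000})
```

A pub-sub shape is one choice among many here. The property that matters is that a write by one agent is immediately visible to every other agent that cares, with no human relaying it.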

For decades companies have been chasing the "single pane of glass" dashboard, this mythical surface where all the data comes together and a smart human stares at it and makes good decisions. And it kind of works, in the sense that the data does end up on a screen somewhere. But dashboards just put data in front of people and hope they look at it. A world model puts data in front of agents that actually act on it. Your job shifts from staring at charts to setting goals and guardrails, which is a fundamentally different kind of work.

Why now

A year ago, having an LLM monitor your entire support queue around the clock would've been stupidly expensive. Tens of thousands of dollars a month just for inference, and that's before you even get to the part where you'd need someone to babysit it. Now it costs almost nothing. But cheap inference alone wouldn't matter if the models couldn't reason about messy real-world data or reliably call tools and APIs without hallucinating their way into catastrophe. Both of those capabilities crossed the "good enough" line in roughly the same window, which is one of those lucky timing coincidences that tends to produce entirely new categories of software.
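
Here's the back-of-envelope version. Every number is an assumption (ticket volume, tokens per ticket, and especially the prices), but the two orders of magnitude are the point:

```python
# Every number here is an assumption, picked for round arithmetic.

tickets_per_day = 10_000
tokens_per_ticket = 3_000            # ticket text + context + response
tokens_per_month = tickets_per_day * tokens_per_ticket * 30   # 900M tokens

price_then = 30.00 / 1_000_000       # ~$30/M tokens, frontier model, 2023-ish
price_now = 0.15 / 1_000_000         # ~$0.15/M tokens, small model today

print(f"then: ${tokens_per_month * price_then:,.0f}/month")   # $27,000
print(f"now:  ${tokens_per_month * price_now:,.0f}/month")    # $135
```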

The scary part

How much autonomy do you actually give these things? That's the question everyone asks and nobody has a great answer for yet.

It's easy to say "the agent can issue refunds under $50" until it issues 2,000 of them in one hour because it found a shipping problem you didn't know about. It was technically right about every single one of those refunds. The customers were owed the money. But that's the kind of thing that gets someone fired when a human does it without asking first, and I don't think most organizations have figured out what the equivalent of "getting fired" looks like for an agent that was technically correct but wildly out of scope.

You can't just wrap everything in approval flows though, because at that point you've rebuilt the bottleneck that the agents were supposed to eliminate. What you actually need is a real framework for autonomy: spending caps, rate limits, escalation rules, clear boundaries between "handle this yourself" and "flag this for a human." Not a binary switch between "fully autonomous" and "ask permission for everything," but a gradient that you can tune as you build trust. The teams that figure this out early are going to move at a noticeably different speed than everyone else, and I mean noticeably in the way that people on the outside look at it and go "how are they doing that."
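
What might that gradient look like in practice? Here's a sketch, with every threshold made up, of autonomy policy as tunable data rather than a binary switch:

```python
# Sketch of an autonomy policy: caps, rate limits, and escalation rules
# as tunable data rather than a binary switch. Every threshold is made up.

import time
from collections import deque

POLICY = {
    "refund": {
        "max_per_action_usd": 50,     # the classic per-action cap...
        "max_per_hour_usd": 500,      # ...plus the aggregate cap that
        "max_actions_per_hour": 20,   # catches the 2,000-refund hour
    },
}

class Guardrail:
    def __init__(self, policy):
        self.policy = policy
        self.recent = deque()         # (timestamp, action, amount)

    def check(self, action: str, amount: float) -> str:
        rules = self.policy[action]
        now = time.time()
        while self.recent and now - self.recent[0][0] > 3600:
            self.recent.popleft()     # keep a sliding one-hour window

        hour_total = sum(amt for _, act, amt in self.recent if act == action)
        hour_count = sum(1 for _, act, _ in self.recent if act == action)

        if amount > rules["max_per_action_usd"]:
            return "escalate"
        if hour_total + amount > rules["max_per_hour_usd"]:
            return "escalate"         # technically correct, wildly out of scope
        if hour_count + 1 > rules["max_actions_per_hour"]:
            return "escalate"

        self.recent.append((now, action, amount))
        return "allow"
```

The point isn't these particular numbers. It's that the thresholds are data you can loosen, per action type, as the agent earns trust.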

Where I think this goes

Give it a couple years and I think the best-run companies won't be the ones with the best analytics teams or the fanciest dashboards. They'll be the ones whose agents have the richest world model and the right amount of autonomy. The CEO job starts looking less like "make decisions" and more like "tune the system that makes them," which is a weird thing to say but I genuinely believe it.

We went from "ask AI a question" to "give AI a task" to something that looks a lot like "AI just runs parts of your business." Most orgs aren't ready for that last step, and to be fair, the tooling isn't quite there either. But some teams are already doing it, and I find myself increasingly curious about what happens when the rest of the industry notices.