Tim Lee Is Right About Hunches. Here’s What Comes Next.

In a recent piece, Tim Lee offers one of the cleaner arguments I’ve read for why today’s agent architectures are unlikely to produce “AI scientists” anytime soon. His central observation: the implicit knowledge that knowledge workers carry (the hunches, the half-formed associations, the things on the tip of the tongue) doesn’t survive the handoffs that agentic systems require. He borrows Marc Andreessen’s framing that “your agent is just its files,” then turns it against the optimistic reading. If the agent is just its files, then whatever the language model can’t articulate gets left behind every time the context window resets. The temp-worker analogy that follows (a different person each Monday, however well-trained, however meticulous the predecessor’s notes) is the most legible version of this argument I’ve seen in popular tech writing. Lee has translated something the field has been gesturing at for two years into language a general reader can hold.

But here’s where I want to push further. Lee describes hunches as compressed pattern recognition, knowledge the brain holds but can’t articulate. That’s true, and it’s part of the story. What it leaves out is why a seasoned practitioner’s hunch is trustworthy in a way that a fresh one isn’t. The seasoned hunch isn’t just a denser pattern; it has carried a consequence. The practitioner has made calls, watched them play out, paid for the wrong ones, and adjusted. That loop (judgment, action, cost, revision) is what gives the hunch its weight. An LLM mid-session might develop a hunch-like feeling, but nothing in its loop ever bears a cost, and nothing carries forward. This isn’t a context-window problem that a longer window or smarter compaction will solve. It’s a stake problem. The files can hold what the model said. They cannot hold what the model would have lost by being wrong.

Lee ends where he can: we’ll still need human workers to do our deep thinking for us. That’s the right conclusion if you stop at the inspection level, at what the agent produces. But it leaves the harder question untouched. If the consequence-carrying loop is what makes judgment trustworthy, and if that loop can’t survive the handoffs an agentic system requires, then institutions can’t govern AI use by reviewing outputs. Outputs are exactly the explicit residue Lee has just told us is insufficient. They are the files. Reviewing them well will not tell you whether the practice that produced them was a practice at all, or only its performance. Lee has named the gap. The next move is figuring out what kind of institutional architecture closes it, or learns to live within it.

Originally published on Substack.