We’re past forty people now, and since the start of the year we’ve been turning the way our best people work into reusable Skills, so an AI can run them for the whole team. It looks like a writing problem. Write down what the expert does, ship it.
It isn’t. The reason why is older than the tools.
The thing you copy is judgment, not steps
The Skill that made this obvious looked dull on paper: check whether a document actually follows our brand. The rules are the easy part. Spacing, colours, wording. The hard part is the reaction, the small “that’s not right” that fires before anyone can explain it. Packaging that reaction, not the rules, is the real job. O’Reilly calls it turning judgment into artifacts.
And the reaction is almost impossible to write down. The reaction is real enough, but the reason underneath it, the part that could be handed to a colleague, is mostly out of reach.
A forty-year-old problem
The difficulty of saying what you know is old. Michael Polanyi put it in one line: “we can know more than we can tell.” The economist David Autor later named it Polanyi’s Paradox. The first wave of AI ran straight into it in the 1980s and called it the knowledge acquisition bottleneck: the hardest part of building an expert system was getting the rules out of the expert’s head, because experts can’t fully say what they know.
Here is the new part. Pulling the knowledge out used to be a human job, the “knowledge engineer,” and it barely worked. Now the AI does it. It asks the scoping questions, and it reads everything else: the Slack threads, the Notion pages, the Jira tickets, the inbox. The systems where white-collar work, and most of the thinking behind it, already lives. The extraction that used to be the bottleneck is mostly solved. The problem should be gone.
It is not.
Why it isn’t gone
Judgment sits on three layers. One is what someone would say if asked. Another is what their behaviour shows, even when they never said it. The third is the reaction in the moment, plus the reason underneath it, the layer that never reaches words.
The AI drains the first two easily. Given access to everything a person has ever typed, it builds a sharp picture of what they did. The why is mostly missing. The systems it reads record decisions, not the evaluation behind them.
And the obvious fix, just ask, is weaker than it sounds. Nisbett and Wilson showed in 1977 that people have little direct access to their own reasons; we narrate a plausible story after the fact. That strong claim has been qualified since: what we report in the moment holds up far better than what we reconstruct later. So the AI gets the after-the-fact story, then fills the gap with a confident “because.” That is not judgment. It is a fluent guess.
Some go further. Marc Andreessen has said he runs on as little introspection as possible. The psychologist Nick Chater argues the inner world is flatter than it feels: reasons and beliefs are not stored states we read off, but stories the mind improvises on demand. Take that seriously and the problem gets worse, not better. Judgment this situational has no fixed rule waiting to be extracted. Reproducing it faithfully would mean capturing the whole sequence of thought that produced each call, every prior reaction that shaped the next one. A rubric cannot hold that. A snapshot of criteria is the wrong shape for something that gets rebuilt fresh every time it runs.
The bottleneck didn’t close. It moved.
It moved from extraction, which is mostly solved, to capture, which is not. Call it the capture gap: the distance between the judgment that fires in someone’s head and anything written down where a tool could find it.
That changes the work. The problem is not weak articulation. The reaction is simply gone before it lands anywhere, in a meeting or halfway through a message, and no interview, human or AI, can retrieve a thought that was never recorded.
What to do about it
Two moves, and both have a ceiling worth naming.
The first is recognition over recall. Listing criteria cold produces platitudes like “ownership” and “proactivity.” The better move is to let the AI draft the rubric and have the expert react to it: “No. The real tell is whether the weekly update arrives without being chased.” Reacting is far easier than recalling. The ceiling: only the cases the draft already imagined ever get a reaction.
The second is in-the-moment capture. When the reaction fires, write the trigger down right then. One line is enough: what was expected, and what showed up instead. This is the only move that reaches the third layer, because it catches the judgment before the narrator rewrites it. The ceiling: it takes discipline, and half of it gets forgotten anyway. A thin log of real reactions still beats a tidy set of invented ones.
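For anyone who wants that one-line habit in a tool rather than a notebook, here is a minimal sketch in Python. It is an illustration, not how our Skills work: the `Reaction` record, the `capture` and `read_log` helpers, the `context` field and the `reactions.jsonl` file name are all assumptions made up for this example. Each reaction is appended as one line of JSON holding what was expected and what showed up instead, plus a timestamp.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class Reaction:
    """One captured reaction: what was expected, and what showed up instead."""
    expected: str
    observed: str
    context: str = ""  # hypothetical optional field, e.g. "brand review"
    # Timestamp is set at capture time, in UTC, as an ISO 8601 string.
    at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def capture(log_path: Path, expected: str, observed: str, context: str = "") -> Reaction:
    """Append a single reaction to a JSON Lines file, one line per reaction."""
    entry = Reaction(expected=expected, observed=observed, context=context)
    # Append-only: earlier entries are never rewritten.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(entry), ensure_ascii=False) + "\n")
    return entry


def read_log(log_path: Path) -> list[Reaction]:
    """Read every captured reaction back, oldest first."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as f:
        return [Reaction(**json.loads(line)) for line in f if line.strip()]
```

Calling `capture(Path("reactions.jsonl"), expected="weekly update arrives without being chased", observed="had to ask for it")` appends one line. Append-only is deliberate in this sketch: the entry stays as it was written in the moment, before the narrator gets a chance to rewrite it.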
The harder truth sits underneath both. The help AI gives here has a floor, and the floor is human attention. The machine can run the interview. It cannot notice what no one ever said out loud.
So the real work was never prompt engineering. It is the unglamorous habit of catching a reaction the moment it fires and leaving a trace of it. And if judgment really is rebuilt fresh each time, that running log is not a lossy copy of some inner rulebook. It is the closest thing to the judgment that exists at all.

