Your Model Is Not the Product: The Delivery Layer Is
A correct model or a sharp analysis is necessary but not sufficient. What gets internal AI actually used is the last mile almost no one designs for - routing each insight to its owner, defaulting it into their workflow, and making it actionable enough to act on without a meeting.
flowchart LR Mo["model / analysis"] --> D["delivery layer
route, subscribe, audit"] D -->|"actionable"| OW["the right owner acts"] classDef key fill:#e8f1fb,stroke:#1e1e1e,color:#1e1e1e classDef good fill:#eaf6ec,stroke:#1e1e1e,color:#1e1e1e class D key class OW good linkStyle 1 stroke:#2f9e44,color:#2f9e44
The model is the easy half
There is a particular kind of disappointment in shipping something that works and watching no one use it. You build a model or an analysis that is genuinely good - it finds the real risks, it is calibrated, it survives the eval - and then it lands in a dashboard that three people open twice and forget. The output was correct. The adoption was zero. It took me longer than it should have to internalize that these are two separate problems, and the second one is usually the harder.
The trap is treating the model as the product. The model is an input. The product is whatever causes an engineer to do something differently because of that input, and most of the work that determines whether that happens lives downstream of the model entirely. I have come to think of this as the delivery layer: routing, defaults, triage, review, and actionability. None of it is the part anyone demos, none of it shows up in a benchmark, and all of it decides whether the model matters. A correct insight that never reaches the one person who can act on it, in a form they can act on, has the same impact as no insight at all.
Route to the owner, do not publish to a wall
The most common delivery failure is broadcasting. You produce N findings and put them somewhere central - a dashboard, a report, a channel - and assume the right people will come look. They will not. A central surface puts the burden of discovery on the reader, and the reader has a job that is not reading your surface. Findings on a wall are findings nobody owns.
The fix is to invert it: resolve each insight to a specific owner and deliver it to them, rather than waiting for them to find it. That means the system has to know who owns the code, the service, or the decision each finding touches - which is real plumbing, attribution and ownership lookup, not an afterthought. Then layer subscriptions and sensible defaults on top. The right default is the one that reaches the likely owner without anyone having to configure it, with subscriptions available for the people who want to widen or narrow what they see. Opt-out beats opt-in for anything that matters, because the people who most need a finding are exactly the ones who will never go set up an alert for it. When delivery is addressed to an owner with a default that already includes them, the insight arrives as part of their work instead of as homework.
Make each insight actionable, or it is just trivia
An insight that tells someone a fact is far weaker than one that tells them what to do about it. "This dependency looks risky" produces a shrug. "This dependency looks risky, here is the specific call site, here is the evidence the system used to decide, and here is the concrete next step" produces an action. The difference is not the quality of the model. It is whether the delivery layer packaged the output as a decision the recipient can make rather than an observation they have to interpret.
Three things make an insight actionable. First, evidence: the finding has to carry its supporting artifacts with it - the exact location, the inputs the conclusion was drawn from, a trace of how it was reached - so the recipient can verify it in seconds instead of either trusting blindly or relaunching the investigation themselves. This is far easier when retrieval happens through explicit, auditable tool calls - structured steps you can follow - rather than an opaque or embedding-based process, because every claim traces back to the artifact it came from. Second, a next step: the output should name the action, not just the problem, so the path from reading to doing is a single short hop. Third, scoping: one clear finding with one clear action beats ten findings that each need triage, because the cost of figuring out what to do is itself a tax that suppresses adoption.
Keep a human in the loop, and give them a triage workflow
Internal AI tooling that acts without review gets switched off the first time it is confidently wrong about something expensive, and it will eventually be confidently wrong. The honest framing is that the model reduces the volume of things a human has to look at and sharpens what they look at first; it does not remove the human. So the delivery layer needs a review and triage workflow as a first-class surface, not a bolt-on - a place where findings land with clear states, where a person can accept, dismiss, or escalate, and where a dismissal is captured rather than lost.
That workflow earns trust in both directions. The recipient trusts the system more because nothing acts behind their back and every finding has an off switch. And the system gets better because the triage actions are signal: a dismissal with a reason is a corrected example of where the model overreached, an acceptance confirms a boundary, and an escalation flags where the stakes were higher than the model assumed. These captured decisions feed back the same way good evals do - as corrected reference examples and adjustments to prompting and decision thresholds, not as training data - so the boundary stays calibrated as the system moves underneath it. Lowering the model's sampling temperature for these graded, repeatable paths helps too, because a reviewer building a mental model of the tool needs the same input to behave the same way each time. That is a reproducibility lever, not a way to make the model correct. A triage workflow that captures decisions turns review from pure overhead into the mechanism that keeps the system honest over time.
Meet engineers where they already work
The last principle is the one most often skipped: deliver the insight into the place the recipient already is. Engineers live in their code review, their incident tooling, their ticketing system, their chat - and every context switch you ask of them is a tax that compounds against adoption. A finding that surfaces inside the code review someone is already doing gets handled. The same finding sitting in a separate portal they would have to remember to visit does not, no matter how correct it is.
This is unglamorous integration work - wiring into existing tools, matching their conventions, respecting their notification budgets - and it is precisely the work that separates a tool people use from a tool people were told to use. The deeper lesson is that adoption is a design problem, not a quality problem, and it has to be designed from the recipient backward. Start from the person who will act, the moment they will act, and the surface they are already looking at, then run the delivery layer back from there to the model. Build it in that direction and a good model becomes a used one. Build it the other way, model first and delivery as an afterthought, and you get the thing I started with: a correct answer nobody acts on.
Takeaways
- The model is an input; the product is what makes someone act differently because of it. Adoption is a separate, usually harder problem than correctness, and it lives entirely in the delivery layer.
- Route each insight to its specific owner instead of publishing to a central wall. Combine ownership attribution with subscriptions and opt-out defaults, because the people who most need a finding will never go configure an alert for it.
- Make every insight actionable: ship it with its evidence (exact location, the inputs behind the conclusion, a traceable path) and a concrete next step. One clear finding with one clear action beats ten that each need triage.
- Keep a human in the loop and give them a real triage workflow. Captured accept/dismiss/escalate decisions build trust and feed back as corrected reference examples and threshold tuning - not training data - keeping the boundary calibrated. The model reduces and sharpens the review load; it does not remove the reviewer.
- Meet engineers inside the tools they already use - code review, incident tooling, tickets, chat. Design the delivery layer backward from the person who will act and the surface they are already looking at, not forward from the model.