
Notes.

Thirteen notes on distributed systems, reliability, and LLMs in production - drawn as a graph, because that is how the ideas connect. Start anywhere and follow the edges.


The Service Graph Is a Lower Bound (start here)

The dependency graph you draw from RPC traffic is real but incomplete. The dependencies that take you down are the ones no edge shows.

distributed systems, observability

5 min · 2026

Streaming Ingest in Rust

When you fuse many noisy event streams into one model, the hard problems are not throughput - they are identity and time.

rust, streaming, distributed systems

7 min · 2026

Build the Substrate First

Most platforms that reason about a system are really stacks of queries against a model of that system. Platform quality is capped by the model underneath.

platform, reliability, ml

6 min · 2026

SLO-Driven Risk

Reliability effort gets spent on whatever feels scary in the room. Here is how to replace that gut feel with a number.

slo/sli, reliability, risk

5 min · 2026

Scaling a Live Stream to a Billion Viewers

A live broadcast turns one source into millions of simultaneous viewers in seconds. The hard part is not the video.

distributed systems, streaming, scale

4 min · 2026

Closing the Loop

When an LLM makes a judgment at scale, do not let it grade itself. Pair a cheap predictor with an expensive ground-truth engine.

fault injection, ml labels, chaos eng

6 min · 2026

Evals Before Features

Before you wire an LLM into a real workflow, decide how you will know it is good enough - because the eval is the product spec.

evals, llm

5 min · 2026

LLM as a Judge

A separate model can score outputs you cannot label by hand - but only if you treat it like a measuring instrument.

evals, llm, quality gates

6 min · 2026

Cost-Aware LLM Pipelines

Most items in a large workload are easy, and a few are genuinely hard. The cheapest reliable pipeline routes compute by criticality.

llm, cost, inference

6 min · 2026

Agent Harness vs RAG

Two ways to feed a model context: retrieve fuzzy passages by similarity, or call purpose-built tools that return exact answers.

llm agents, rag, tool calls

6 min · 2026

Designing APIs for Agents, Not Humans

When an LLM is the caller, the interface is the prompt. Typed responses, idempotent writes, granular composable calls.

api design, llm agents

6 min · 2026

Your Model Is Not the Product

A correct model or a sharp analysis is necessary but not sufficient. What gets internal AI actually used is the last mile.

llm, product

6 min · 2026

Natural Language to SQL, Then and Now

In 2018 I helped build an RNN model that turned English questions into SQL. What that project looks like from the LLM era.

nlp, llm, research

3 min · 2026