On Building With LLMs
Two years into building production systems on top of language models, here's what I wish someone had told me on day one.
Building with language models is unlike building with any other infrastructure component. The failure modes are different. The testing methodology is different. The intuitions you've built up from years of deterministic software engineering will mislead you in specific, predictable ways.
Here's what I've learned the hard way:
Evals first, always. You cannot improve what you cannot measure. The teams that ship reliable LLM systems invest in evaluation infrastructure before they optimize the system. Not after. Before.
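A minimal sketch of what "evals first" can mean in practice: a golden set of cases and a scoring loop that runs before any optimization work. The names here (run_model, CASES, run_evals) are hypothetical stand-ins, and run_model is a canned stub rather than a real model call.

```python
def run_model(prompt: str) -> str:
    # Stand-in for a real LLM call; returns canned answers for illustration.
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "")

# A small golden dataset: (input, expected answer) pairs.
CASES = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("capital of Japan", "Tokyo"),
]

def run_evals(model, cases):
    """Score a model against a golden set; return pass rate and failures."""
    failures = []
    for question, expected in cases:
        got = model(question)
        if got != expected:
            failures.append((question, expected, got))
    return 1 - len(failures) / len(cases), failures

score, failures = run_evals(run_model, CASES)
print(f"pass rate: {score:.0%}")          # 67% here: the Japan case fails
for question, expected, got in failures:
    print(f"  FAIL {question!r}: expected {expected!r}, got {got!r}")
```

Even a harness this small gives you a number to move, which is the point: every prompt or pipeline change gets judged against the same set.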
Prompts are code. Treat them with the same rigor: version control, review, testing, documentation. A prompt change that ships without review is a code change that ships without review. The blast radius can be just as large.
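One concrete way to give a prompt the same rigor as code is to check it into the repo and guard its contract with a unit test, so an edit that silently drops a constraint fails CI. Everything here (PROMPT_V2, render, the support-assistant wording) is a hypothetical example, not a prescribed format.

```python
# The prompt lives in version control like any other source file.
PROMPT_V2 = """You are a support assistant for {product}.
Answer in at most {max_sentences} sentences.
Question: {question}"""

def render(product: str, question: str, max_sentences: int = 3) -> str:
    return PROMPT_V2.format(
        product=product, question=question, max_sentences=max_sentences
    )

def test_prompt_contract():
    # Guard the parts reviewers care about: the length constraint and the
    # question slot. Removing either fails the build, like any regression.
    out = render("Acme", "How do I reset my password?")
    assert "at most 3 sentences" in out
    assert "How do I reset my password?" in out

test_prompt_contract()
print("prompt contract tests passed")
```

The test is deliberately coarse; it doesn't judge quality, only that the prompt's reviewed contract survives each edit.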
The model is not the product. The model is a component. The product is the system around it — the retrieval, the context management, the output validation, the fallback logic, the monitoring. Teams that focus exclusively on model selection miss 80% of the engineering work.
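A sketch of two pieces of that surrounding system, output validation and fallback logic. call_model here is a fake that sometimes returns malformed JSON, which is the assumption the example is built on; the schema and fallback value are illustrative.

```python
import json

def call_model(prompt: str) -> str:
    # Fake model: returns truncated JSON when the prompt contains "broken",
    # simulating the occasional malformed output a real model produces.
    if "broken" in prompt:
        return '{"sentiment": "positive"'
    return '{"sentiment": "positive"}'

FALLBACK = {"sentiment": "unknown"}

def classify(prompt: str) -> dict:
    """Parse and validate model output; bad JSON never reaches the user."""
    raw = call_model(prompt)
    try:
        parsed = json.loads(raw)
        if parsed.get("sentiment") in {"positive", "negative", "neutral"}:
            return parsed
    except json.JSONDecodeError:
        pass
    # In a real system this path would also emit a metric for monitoring.
    return FALLBACK

print(classify("great product"))   # {'sentiment': 'positive'}
print(classify("broken case"))     # {'sentiment': 'unknown'}
```

Note that validation checks both syntax (valid JSON) and semantics (an allowed label); a model can return well-formed JSON that is still out of contract.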
Latency compounds. Every sequential LLM call adds latency: a pipeline of five sequential calls at 800 ms each is a four-second response. Multi-step pipelines that feel fast in development feel slow in production. Design for parallelism from the start, not as a retrofit.
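The difference is easy to demonstrate with asyncio. fake_llm_call is a stand-in that sleeps for 0.2 s to simulate a model round trip; independent calls issued with asyncio.gather overlap, while the sequential version pays the full sum.

```python
import asyncio
import time

async def fake_llm_call(prompt: str) -> str:
    await asyncio.sleep(0.2)   # stands in for real model latency
    return f"answer to {prompt!r}"

async def sequential(prompts):
    # Each call waits for the previous one: total time is the sum.
    return [await fake_llm_call(p) for p in prompts]

async def parallel(prompts):
    # Independent calls run concurrently: total time is roughly the max.
    return await asyncio.gather(*(fake_llm_call(p) for p in prompts))

prompts = ["a", "b", "c"]

t0 = time.perf_counter()
asyncio.run(sequential(prompts))
seq_time = time.perf_counter() - t0

t0 = time.perf_counter()
asyncio.run(parallel(prompts))
par_time = time.perf_counter() - t0

print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

This only applies to calls that are genuinely independent; steps that consume a previous step's output still serialize, which is why pipeline shape matters more than any single call's speed.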
Users are more forgiving than you expect, in the places you don't expect. They'll tolerate a slightly wrong answer. They won't tolerate a slow one, or a confusing one, or one that makes them feel talked down to. Optimize accordingly.