Tom Moulton · 3 June 2026 · Engineering · 8 min read

Four things we build into every AI system

Observability, safety, cost control, durability. Here's why they're not optional, and what happens when they get bolted on later.

A working demo is easy. A system you can run, trust and afford for two years is a different thing entirely, and the difference is mostly craft. The gap is made of four things that teams under pressure to ship tend to treat as afterthoughts: observability, safety, cost control and durability. We build all four in from the very first commit, because each is cheap to design in and painful to bolt on later. It's a large part of what we mean when we talk about precision and substance.

The demo is the easy 20%

Getting a model to produce an impressive answer in a controlled setting takes an afternoon. The hard 80% is everything that happens when real users, real data and real load arrive, and when the answer is occasionally wrong, slow or expensive. The four properties below are what quietly turn a demo into something you can actually operate.

1. Observability: you can't fix what you can't see

AI systems are non-deterministic: the same input can produce different output on two consecutive runs, which means the debugging habits built over decades of conventional software don't transfer cleanly. If you aren't recording every prompt, every response, the latency and the token counts, you aren't debugging an incident, you're guessing about one. So we instrument from day one: every model call traced, inputs and outputs captured, answer quality tracked over time, so that degradation shows up as a trend on a chart rather than as a customer complaint.

If you can't replay exactly what the model saw and said, you're not debugging: you're guessing.
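As a rough illustration of what "every model call traced" can mean in practice, here is a minimal Python sketch. Everything named in it is an assumption made for the example: `traced_call`, `TraceRecord` and `fake_model` are hypothetical, the trace sink is just an in-memory list, and the whitespace token count is a crude stand-in for the usage figures a real provider would report.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class TraceRecord:
    trace_id: str
    model: str
    prompt: str
    response: str
    latency_ms: float
    prompt_tokens: int
    completion_tokens: int


def rough_token_count(text: str) -> int:
    # Crude stand-in: a real system should record the provider's reported usage.
    return len(text.split())


def traced_call(model: str, prompt: str,
                call_model: Callable[[str, str], str],
                sink: List[str]) -> str:
    """Call the model and append one JSON line describing the call to `sink`."""
    start = time.perf_counter()
    response = call_model(model, prompt)
    latency_ms = (time.perf_counter() - start) * 1000.0
    record = TraceRecord(
        trace_id=str(uuid.uuid4()),
        model=model,
        prompt=prompt,
        response=response,
        latency_ms=latency_ms,
        prompt_tokens=rough_token_count(prompt),
        completion_tokens=rough_token_count(response),
    )
    sink.append(json.dumps(asdict(record)))
    return response


def fake_model(model: str, prompt: str) -> str:
    # Hypothetical stand-in for a real provider client.
    return "Paris is the capital of France."


trace_log: List[str] = []
answer = traced_call("small-model", "What is the capital of France?",
                     fake_model, trace_log)
```

Because each record holds the exact prompt and response, an incident can be replayed from the log rather than reconstructed from memory. In a real deployment the list would be replaced by whatever log store or tracing backend the team already runs.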

2. Safety: assume it will say the wrong thing

Generation is probabilistic. Hallucination (confidently stating something false) isn't a bug awaiting a patch; it's a property of how these models produce text. So we design as though the model will occasionally be wrong, because it will. That means validating outputs against a known schema before they're trusted, grounding answers in your real data through retrieval rather than the model's memory, constraining what the system is allowed to do and say, and keeping a person in the loop wherever the cost of an error is high. Safety isn't a filter you add at the end. It's the shape of the whole system.
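Two of those ideas, schema validation and a human in the loop, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular production system: the refund scenario, the field names, the allowed actions and the review threshold are all hypothetical.

```python
import json
import math
from typing import Any, Dict, Optional

# Hypothetical schema: field name -> required Python type.
SCHEMA = {"action": str, "amount": float, "reason": str}
ALLOWED_ACTIONS = {"refund", "decline", "escalate"}
REVIEW_THRESHOLD = 100.0  # assumed: refunds above this amount go to a person


def parse_decision(raw: str) -> Optional[Dict[str, Any]]:
    """Return the parsed decision if it matches the schema, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or set(data) != set(SCHEMA):
        return None
    # Accept whole-number amounts such as 20, but not booleans.
    amount = data["amount"]
    if isinstance(amount, int) and not isinstance(amount, bool):
        data["amount"] = float(amount)
    for field, expected in SCHEMA.items():
        if not isinstance(data[field], expected):
            return None
    if data["action"] not in ALLOWED_ACTIONS:
        return None
    if not math.isfinite(data["amount"]) or data["amount"] < 0:
        return None
    return data


def route(raw: str) -> str:
    """Decide what happens to a raw model output before anything acts on it."""
    decision = parse_decision(raw)
    if decision is None:
        return "rejected"
    if decision["action"] == "refund" and decision["amount"] > REVIEW_THRESHOLD:
        return "needs_human_review"
    return "accepted"
```

The point of the shape is that nothing downstream ever sees raw model text: an output is either well-formed and within limits, held for a person, or dropped.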

3. Cost control: the bill that surprises you

Token-based pricing means cost scales with usage in a way fixed-licence software never did, and the price per token varies by orders of magnitude between the cheapest and the most capable models. Without budgets, caching and deliberate model routing, a feature that succeeds can quietly become a feature that's too expensive to keep. So we cap spend, cache repeated work, and route by difficulty: the cheap, fast model wherever it's good enough, the expensive one reserved for the cases that genuinely need it.
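A minimal Python sketch of the three levers together (a spend cap, a cache and routing by difficulty) might look like the following. The prices, the word-count difficulty heuristic and the `CostAwareClient` class are all assumptions made for illustration; real routing would use a better signal than prompt length, and real accounting would use the provider's reported token usage.

```python
from typing import Callable, Dict

# Assumed prices per 1,000 tokens, for illustration only.
PRICE_PER_1K = {"small": 0.001, "large": 0.03}


class BudgetExceeded(Exception):
    """Raised when a call would push spend past the configured cap."""


class CostAwareClient:
    def __init__(self, call_model: Callable[[str, str], str], budget: float):
        self.call_model = call_model  # hypothetical provider call: (model, prompt) -> text
        self.budget = budget
        self.spent = 0.0
        self.cache: Dict[str, str] = {}

    def pick_model(self, prompt: str) -> str:
        # Hypothetical difficulty heuristic: long prompts go to the larger model.
        return "large" if len(prompt.split()) > 50 else "small"

    def ask(self, prompt: str) -> str:
        # Repeated work is served from the cache and costs nothing.
        if prompt in self.cache:
            return self.cache[prompt]
        model = self.pick_model(prompt)
        # Estimate the prompt cost up front and refuse if it breaks the cap.
        estimated = len(prompt.split()) / 1000 * PRICE_PER_1K[model]
        if self.spent + estimated > self.budget:
            raise BudgetExceeded(f"budget of {self.budget} would be exceeded")
        answer = self.call_model(model, prompt)
        # Word counts stand in for tokens here.
        tokens = len(prompt.split()) + len(answer.split())
        self.spent += tokens / 1000 * PRICE_PER_1K[model]
        self.cache[prompt] = answer
        return answer
```

Putting all three behind one client means the feature code never chooses a model or checks a budget itself, so the policy can change without touching the callers.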

4. Durability: the model you launch on isn't the one you'll run on

Model providers deprecate and replace models on their own schedule, not yours, and prompts tuned for one version drift when the version changes underneath them. If a system is welded to the quirks of a single model, every upgrade becomes a rewrite and every deprecation notice becomes a fire drill. So we put the model behind an interface, keep a suite of evaluations so a swap can be verified rather than hoped for, and treat prompts as versioned, tested assets rather than strings buried in the code.
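In Python, that arrangement could be sketched roughly as below. The names are hypothetical throughout: `TextModel` is an assumed interface, `KeywordModel` and `AlwaysPositiveModel` are toy stand-ins for real provider clients, and the two-case evaluation suite is far smaller than anything you would rely on in practice.

```python
from typing import Dict, List, Protocol, Tuple


class TextModel(Protocol):
    """The only surface the rest of the system is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


# Prompts as versioned assets, keyed by (name, version). Names are hypothetical.
PROMPTS: Dict[Tuple[str, str], str] = {
    ("classify_sentiment", "v1"):
        "Answer with one word, positive or negative: {text}",
}

# A tiny evaluation suite: (input text, expected label).
EVAL_CASES: List[Tuple[str, str]] = [
    ("I love this product", "positive"),
    ("This is terrible", "negative"),
]


def run_evals(model: TextModel, prompt_key: Tuple[str, str]) -> float:
    """Return the fraction of evaluation cases the model gets right."""
    template = PROMPTS[prompt_key]
    passed = 0
    for text, expected in EVAL_CASES:
        output = model.complete(template.format(text=text))
        if output.strip().lower() == expected:
            passed += 1
    return passed / len(EVAL_CASES)


class KeywordModel:
    """Toy stand-in for one provider; any object with complete() fits."""

    def complete(self, prompt: str) -> str:
        return "positive" if "love" in prompt else "negative"


class AlwaysPositiveModel:
    """Toy stand-in for a replacement model that behaves differently."""

    def complete(self, prompt: str) -> str:
        return "Positive"


current_score = run_evals(KeywordModel(), ("classify_sentiment", "v1"))
candidate_score = run_evals(AlwaysPositiveModel(), ("classify_sentiment", "v1"))
```

Comparing the two scores before a swap is what turns a migration from a hope into a check: if the candidate scores lower on the same prompt version, that is known before any user sees it.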

Why bolting them on later costs more

Each of these is inexpensive to design in at the start and expensive to add after the fact. Observability retrofitted after an outage means you already flew blind through the incident that mattered. Safety added after a bad answer reaches a customer is added after the damage is done. Cost control introduced after the invoice is reactive by definition. Durability bolted on during a forced migration is the most expensive version of all. None of the four is glamorous, and none of them shows up in a demo. They're simply the difference between an AI system that impresses once and one you can depend on for years. We'd rather build the second kind.
