Anatomy of an AI product
When people picture "an AI product," they picture the model — the clever bit that writes the answer. The model is maybe a fifth of it; the product is the stack of layers around that call, and getting the layers right is the actual engineering.
The trap is treating it as one thing. It's two planes: a live request path that answers a user in real time, and an offline batch plane that runs on a clock to prepare what the live path needs. Confuse the two and you end up doing slow work inside the request and serving stale data because nothing refreshed it in the background. Here's the whole anatomy.
The live request path
This is the part that answers a user while they wait, top to bottom:
- Product surface. The web app, API, or chat the user actually touches, where the request comes in and the answer goes out.
- Runtime orchestration. The agent loop or app code that handles that live request: what to call, in what order, when to stop. This is runtime control, second by second.
- Retrieval (RAG), "the legs." It reads a prebuilt index and reranks the hits to fetch the facts the model needs. Note the word reads: the index was built earlier, off to the side.
- Cognition (DSPy), "the brain." The step that turns the question and the retrieved context into an answer. This is where DSPy lives: typed signatures and modules instead of hand-written prompts, so the thinking is testable and improvable.
- Serving. The model runtime under cognition: LiteLLM routing to Ollama, vLLM, or a hosted model. DSPy decides what to ask; serving is what runs it.
- Guardrails → response. Validate the output against a schema, repair or fall back if it's malformed, then hand a clean answer back to the surface.
Every box here is on the clock with the user. The whole job of this plane is to be fast and reliable.
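To make the order of those boxes concrete, here is a minimal sketch of the live path in plain Python. Every name in it (`INDEX`, `retrieve`, `answer`, `guard`, `handle_request`) is a hypothetical stand-in, not an API from any library: a real retrieval step would read a prebuilt vector index, and a real cognition step would be a DSPy module calling a served model. The guardrail here only validates and falls back; a repair step is left out to keep it short.

```python
import json
from dataclasses import dataclass

# Hypothetical prebuilt index. In a real product the batch plane writes this
# and the request path only reads it.
INDEX = {
    "refund": "Refunds are issued within 5 days.",
    "shipping": "Orders ship in 2 business days.",
}

FALLBACK_TEXT = "Sorry, I can't answer that right now."


@dataclass
class Answer:
    answer: str
    sources: list


def retrieve(question: str, k: int = 2) -> list:
    """Retrieval: read the prebuilt index using a naive keyword match."""
    words = set(question.lower().split())
    hits = [(key, text) for key, text in INDEX.items() if key in words]
    return hits[:k]


def answer(question: str, context: list) -> str:
    """Cognition stand-in: returns raw, model-style JSON text."""
    if not context:
        return "not json"  # simulate a malformed model output
    key, text = context[0]
    return json.dumps({"answer": text, "sources": [key]})


def guard(raw: str) -> Answer:
    """Guardrails: validate the output shape, fall back if it is malformed."""
    try:
        data = json.loads(raw)
        if not isinstance(data["answer"], str) or not isinstance(data["sources"], list):
            raise ValueError("wrong field types")
        return Answer(data["answer"], data["sources"])
    except (ValueError, KeyError, TypeError):
        return Answer(FALLBACK_TEXT, [])


def handle_request(question: str) -> Answer:
    """Runtime orchestration: decide what to call and in what order."""
    context = retrieve(question)
    raw = answer(question, context)
    return guard(raw)
```

Calling `handle_request("what is your refund policy")` returns the refund text with `["refund"]` as its source, while a question that matches nothing in the index ends up with the fallback answer instead of an exception reaching the user.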
The offline batch plane — where Airflow actually lives
Here's the part people put in the wrong place, and I did too at first. Airflow is not in the request path. It's a batch orchestrator — it runs on a clock, off to the side, doing the slow work that the live path can't afford to do per request:
- ingest and embed documents → this is what builds the retrieval index the live path reads
- refresh data that the product depends on
- run evals against the current system
- compile or tune the DSPy program and save the artifact the live path loads
- batch scoring and other heavy precomputation
The rule that keeps the two planes honest is one line: Airflow writes; the request path only reads. No slow ingestion inside a user request, no stale index because nothing refreshed it. I lay out exactly this split for one product in the Airflow DAGs that run paiddaily.io — Airflow keeps Postgres fresh, the API serves pure reads.
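The write/read rule can be sketched without Airflow at all. Below, `build_index` stands in for the body of a scheduled batch task and `load_index` for what the request path does; both names, the JSON artifact format, and the toy `embed` function are assumptions for illustration. In a real deployment the artifact would more likely be a vector store or a Postgres table, but the shape is the same: one side writes, the other only reads.

```python
import json
import os
import tempfile
from pathlib import Path


def embed(text: str) -> list:
    """Toy embedding (assumption): two character counts, not a real model."""
    return [len(text), text.count(" ")]


def build_index(docs: dict, artifact: Path) -> None:
    """Batch plane: do the slow work, then publish the result in one step."""
    index = {
        doc_id: {"text": text, "vector": embed(text)}
        for doc_id, text in docs.items()
    }
    # Write to a temp file in the same directory, then rename it over the
    # target, so a reader never sees a half-written index.
    fd, tmp_name = tempfile.mkstemp(dir=artifact.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(index, tmp)
    os.replace(tmp_name, artifact)


def load_index(artifact: Path) -> dict:
    """Request path: read-only. No ingestion or embedding happens here."""
    with artifact.open() as f:
        return json.load(f)
```

The temp-file-then-rename step is one simple way to let the batch job refresh the artifact while requests keep reading the previous version.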
If you've thought of Airflow as "a data engineering tool," that instinct is right — and it's why it belongs here, not in the live loop. Data-engineering orchestration is exactly what an AI product needs on its offline plane.
Evals and observability wrap everything
Down the right side of the diagram is the layer that isn't a step at all — it's cross-cutting. Evals and observability sit across every box: golden sets and metrics that say whether quality is holding, plus traces, token counts, latency, cost per feature, and drift over time. This is how you know the product works instead of hoping it does. It also closes the loop with the brain — the eval is the metric the DSPy compile step on the batch plane optimizes toward.
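As a rough illustration of a golden set plus a metric, the sketch below scores a system against expected answers and records latency per call. `GOLDEN_SET`, `exact_match`, `run_eval`, and `toy_system` are hypothetical names, and exact match is only the simplest possible metric; a real eval would use a richer score and send the traces to an observability backend.

```python
import time
from statistics import mean
from typing import Callable

# Hypothetical golden set: questions with known-good answers.
GOLDEN_SET = [
    {"question": "2+2", "expected": "4"},
    {"question": "capital of France", "expected": "Paris"},
    {"question": "3*3", "expected": "9"},
]


def exact_match(predicted: str, expected: str) -> float:
    """Metric: 1.0 if the answers match ignoring case and whitespace."""
    return 1.0 if predicted.strip().lower() == expected.strip().lower() else 0.0


def run_eval(system: Callable[[str], str], golden: list) -> dict:
    """Run every golden example through the system and aggregate the results."""
    scores, latencies = [], []
    for example in golden:
        start = time.perf_counter()
        predicted = system(example["question"])
        latencies.append(time.perf_counter() - start)
        scores.append(exact_match(predicted, example["expected"]))
    return {
        "accuracy": mean(scores),
        "mean_latency_s": mean(latencies),
        "n": len(golden),
    }


def toy_system(question: str) -> str:
    """Stand-in for the whole product; it knows two of the three answers."""
    answers = {"2+2": "4", "capital of France": "paris"}
    return answers.get(question, "unknown")
```

Running `run_eval(toy_system, GOLDEN_SET)` gives an accuracy of two out of three. The same kind of scoring function is what a batch-plane compile step would optimize toward.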
Where DSPy fits — and why the boxes matter
DSPy is one layer: cognition. It doesn't retrieve, serve, schedule, or store — it makes the thinking typed and improvable, and lets the rest of the stack be what it's good at. That's the real point of drawing the anatomy this way: every box is a separate, swappable part. Change the model and only serving moves. Swap the vector store and only retrieval and the batch ingest job move. Improve the brain and only the DSPy layer and its compile step move. Nothing else has to know.
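One way to keep the boxes swappable in code is to have each layer depend on a small interface rather than on a concrete implementation. The sketch below uses `typing.Protocol` for that; the `Retriever` and `Serving` interfaces and the classes that implement them are hypothetical, not part of DSPy or LiteLLM.

```python
from typing import Protocol


class Retriever(Protocol):
    def search(self, query: str) -> list: ...


class Serving(Protocol):
    def complete(self, prompt: str) -> str: ...


class KeywordRetriever:
    """Toy retrieval layer over an in-memory list of documents."""

    def __init__(self, docs: list):
        self.docs = docs

    def search(self, query: str) -> list:
        terms = query.lower().split()
        return [d for d in self.docs if any(t in d.lower() for t in terms)]


class LocalModel:
    """Stand-in for a locally served model."""

    def complete(self, prompt: str) -> str:
        return f"[local] {prompt[:40]}"


class HostedModel:
    """Stand-in for a hosted model behind an API."""

    def complete(self, prompt: str) -> str:
        return f"[hosted] {prompt[:40]}"


class Cognition:
    """Knows only the two interfaces, not which classes implement them."""

    def __init__(self, retriever: Retriever, serving: Serving):
        self.retriever = retriever
        self.serving = serving

    def answer(self, question: str) -> str:
        context = " | ".join(self.retriever.search(question))
        return self.serving.complete(f"Q: {question} C: {context}")
```

Swapping `LocalModel()` for `HostedModel()` when constructing `Cognition` changes which runtime answers, and nothing in the retrieval or cognition code has to be edited.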
An AI product isn't a model with some glue. It's two planes of clean layers, each doing one job, with evals across the whole thing telling you the truth. Build it that way and the model becomes the easy part to change — which, given how fast models change, is exactly where you want your flexibility.
