Eric Roby published an engineering-focused breakdown of the AI agent infrastructure stack on his Brain Bytes Substack Monday, mapping six discrete layers and two vertical rails that define how production agent systems are built in 2026. The piece updates Letta’s 2024 agent-stack diagram and Paolo Perrone’s 2026 O’Reilly revision with a backend engineer’s perspective on where teams should and should not invest.
The Six Layers
Roby’s stack is listed here bottom to top, from the foundation (Layer 6) to the user-facing surface (Layer 1):
Layer 6: Models and Inference. The foundation. Roby notes that production agents in 2026 rarely use a single model. The pattern is model routing, sending each request to the cheapest model that can handle it: small models for classification and triage, frontier models for hard reasoning, dedicated models for embedding and evaluation.
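The routing pattern can be illustrated with a minimal sketch. The tier names, relative costs, and task labels below are hypothetical placeholders, not taken from Roby's article or any provider's catalog; the only idea carried over is "pick the cheapest model that can handle the request."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                 # hypothetical model identifier
    cost_per_call: float      # illustrative relative cost, not real pricing
    capabilities: frozenset   # task types this tier is assumed to handle


# Assumed tiers for illustration only.
TIERS = [
    ModelTier("small-model", 1.0, frozenset({"classification", "triage"})),
    ModelTier("embedding-model", 0.5, frozenset({"embedding"})),
    ModelTier(
        "frontier-model",
        20.0,
        frozenset({"classification", "triage", "reasoning"}),
    ),
]


def route(task_type: str) -> ModelTier:
    """Return the cheapest tier whose capabilities include task_type."""
    candidates = [t for t in TIERS if task_type in t.capabilities]
    if not candidates:
        raise ValueError(f"no model can handle task type: {task_type}")
    return min(candidates, key=lambda t: t.cost_per_call)


if __name__ == "__main__":
    for task in ("triage", "reasoning", "embedding"):
        print(task, "->", route(task).name)
```

A real router would more likely classify each request dynamically than rely on a fixed task label, but the selection rule stays the same.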
Layer 5: Tools and MCP. How agents act on the outside world. This layer did not exist as a distinct category 14 months ago. Anthropic launched MCP (Model Context Protocol) on November 25, 2024, and it has since become the standard protocol for tool connectivity. Roby highlights the math it fixes: N agents and M tools previously required N times M custom connectors. With a shared protocol, the requirement drops to N plus M.
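The connector arithmetic is easy to check with a few lines of code. The function names and the example counts (5 agents, 20 tools) are illustrative assumptions; only the N times M versus N plus M relationship comes from the article.

```python
def custom_connectors(n_agents: int, m_tools: int) -> int:
    """Point-to-point integrations: every agent needs its own connector per tool."""
    return n_agents * m_tools


def shared_protocol_adapters(n_agents: int, m_tools: int) -> int:
    """Shared protocol: one client per agent plus one server per tool."""
    return n_agents + m_tools


if __name__ == "__main__":
    n, m = 5, 20  # example sizes chosen for illustration
    print(custom_connectors(n, m))         # 100
    print(shared_protocol_adapters(n, m))  # 25
```

The gap widens as either side grows: adding one more tool costs N new connectors in the first model and a single adapter in the second.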
Layer 4: Knowledge (RAG and Retrieval). External information the agent retrieves at query time. Roby covers the convergence toward hybrid search using Reciprocal Rank Fusion to merge sparse (BM25) and dense (embedding) retrieval results.
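Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over every ranked list in which it appears, so it needs only rank positions and not comparable raw scores. The sketch below is a minimal version under stated assumptions: the document IDs are made up, and k = 60 is the constant commonly used as a default rather than a value given in the article.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: iterable of lists, each ordered best-first.
    k: smoothing constant; 60 is a common default (an assumption here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]    # hypothetical sparse (BM25) results
    dense_hits = ["b", "d", "a"]   # hypothetical dense (embedding) results
    print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
    # ['b', 'a', 'd', 'c']
```

Documents that both retrievers rank highly ("b" and "a" here) rise above documents found by only one of them.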
Layer 3: Memory. What the agent remembers across steps, sessions, and users. This layer used to be lumped in with knowledge/RAG but has split into its own concern in 2026.
Layer 2: Orchestration and Runtime. The control plane that runs the agent loop: think, act, observe. LangChain, provider-native SDKs, and custom loop implementations live here.
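A custom loop implementation can be very small. The sketch below shows the think, act, observe cycle with a stubbed decision function standing in for a model call; the function names, the decision dictionary shape, and the `lookup` tool are all hypothetical and do not reflect any particular framework's API.

```python
def run_agent(goal, think, tools, max_steps=5):
    """Run a think/act/observe loop until the agent finishes or the budget runs out."""
    observations = []
    for _ in range(max_steps):
        decision = think(goal, observations)      # think: choose the next step
        if decision["action"] == "finish":
            return decision["answer"]
        tool = tools[decision["action"]]
        result = tool(decision["input"])          # act: call the chosen tool
        observations.append(result)               # observe: record the result
    raise RuntimeError("step budget exhausted before the agent finished")


def stub_think(goal, observations):
    """Stand-in for a model call: look up once, then answer from the result."""
    if not observations:
        return {"action": "lookup", "input": "42"}
    return {"action": "finish", "answer": f"Customer: {observations[-1]}"}


def lookup(customer_id):
    """Hypothetical tool that returns a fake customer record."""
    return f"record-{customer_id}"


if __name__ == "__main__":
    print(run_agent("find customer 42", stub_think, {"lookup": lookup}))
    # Customer: record-42
```

The `max_steps` budget is the one piece worth keeping even in a toy version, since it stops a confused agent from looping indefinitely.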
Layer 1: Agent Surface. Where the agent appears to the user, whether that is a chat interface, a Slack integration, or an autonomous background process.
Two vertical rails cut across every layer: observability/eval (traces, metrics, quality measurement) and governance/security (permissions, audit, human-in-the-loop controls).
Four Changes Since 2024
Roby identifies four structural shifts since Letta’s original diagram. MCP emerged as a distinct tool-connectivity standard. Memory separated from knowledge into its own layer. Eval became a first-class concern that was not on the original map. And provider-native SDKs absorbed several layers into single APIs, collapsing what used to require multiple libraries into one integration point.
The Anti-Pattern
The article’s central argument is practical: agent projects fail not because teams choose bad models but because they add infrastructure layers before they can articulate the specific problem each layer solves. A backend team building a support agent that answers questions, looks up a customer record, and calls one refund endpoint does not need a graph runtime, persistent state, retries, custom tool wrappers, a vector database, memory, tracing, and dashboards from day one.
“The stack needs to be treated like a map, not a checklist,” Roby writes. “You do not need every layer just because the layer exists. You need the layer when you can point to a specific failure and say, ‘This is the thing that fixes it.’”
The diagnostic framework is concrete. If users keep repeating preferences, add memory. If one model call cannot handle the workflow, add orchestration. If tool calls affect production data, add governance. If prompt changes ship without verification, add evals. If none of those conditions hold, the three-layer stack (model, tools, retrieval) handles the job.
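The diagnostic can be written down as a simple lookup from observed failure to the layer that addresses it. This is a sketch of the framework as summarized here, not code from Roby's article; the symptom keys are invented labels for the four conditions.

```python
# Baseline stack used when none of the listed failures has been observed.
BASELINE = ["model", "tools", "retrieval"]

# Hypothetical symptom labels mapped to the layer that addresses each one.
SYMPTOM_TO_LAYER = {
    "users_repeat_preferences": "memory",
    "single_call_insufficient": "orchestration",
    "tools_touch_production_data": "governance",
    "prompt_changes_unverified": "evals",
}


def layers_needed(symptoms):
    """Return the baseline stack plus one layer per observed symptom."""
    symptoms = set(symptoms)
    unknown = symptoms - set(SYMPTOM_TO_LAYER)
    if unknown:
        raise ValueError(f"unrecognized symptoms: {sorted(unknown)}")
    extra = [
        layer
        for symptom, layer in SYMPTOM_TO_LAYER.items()
        if symptom in symptoms
    ]
    return BASELINE + extra


if __name__ == "__main__":
    print(layers_needed([]))
    # ['model', 'tools', 'retrieval']
    print(layers_needed(["prompt_changes_unverified", "users_repeat_preferences"]))
    # ['model', 'tools', 'retrieval', 'memory', 'evals']
```

Rejecting unrecognized symptoms mirrors the article's point: a layer is added only when a named, specific failure justifies it.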