Paolo Perrone published an updated AI agent stack framework through The AI Engineer Substack on June 8, redrawn from scratch to replace Letta’s November 2024 reference diagram that had become the default architecture visual for engineering teams building agents. The 2024 stack had four layers. The 2026 version has six. Three of the new layers didn’t exist as recognized infrastructure categories when the original was published.
The Six Layers
The updated stack maps every component between an LLM and a production agent across six tiers: models and inference, protocols and tools, memory and knowledge, frameworks and SDKs, eval and observability, and guardrails and safety.
The bottom layer, models and inference, is the most stable and increasingly commoditized. Perrone notes that open-weight models like Llama 3.3, DeepSeek V3, and Qwen 2.5 closed the quality gap with closed-source alternatives, making “always use the biggest closed model” outdated advice. The emerging default: prototype on closed-source APIs, deploy on open-weight.
The protocols and tools layer is entirely new. In 2024, every framework defined its own JSON schema for tool calls. MCP (Model Context Protocol) has since consolidated that fragmentation, reaching 97 million monthly SDK downloads with adoption from OpenAI, Google, and Microsoft, plus a donation to the Linux Foundation. Browser Use hit 78,000 GitHub stars in under a year. Agent-to-agent communication protocols (IBM ACP, Google A2A) exist but neither has reached critical mass.
Security remains the open problem in this layer. Endor Labs analyzed 2,614 MCP server implementations and found 82% prone to path traversal and 67% to code injection.
Memory as Architecture, Not Afterthought
Memory moved from “pick a vector database and do RAG” to a first-class architectural primitive with three distinct tiers: in-context state, vector search, and persistent cross-session memory. Perrone identifies “context engineering” as the discipline that replaced prompt engineering. Instead of writing better prompts, teams now architect what information the agent sees on every call through named, structured memory blocks the agent can read and overwrite each turn.
pgvector became the default for teams that don’t need dedicated vector infrastructure. GraphRAG, led by Neo4j, added relationship-based retrieval as a second option alongside embedding similarity search.
The Eval Gap
Eval and observability, the fifth layer, is the one Perrone identifies as the widest gap between demo and production. LangChain’s State of Agent Engineering survey found 89% of teams running production agents have implemented observability, according to LangChain’s own reporting. Only 52% have evals. That 37-point gap is where, as Perrone puts it, “production quality dies.”
The framework identifies three emerging tiers of evaluation: fast checks on every PR (did the agent call the right tools?), nightly regression suites using LLM-as-judge, and continuous production monitoring for performance drift. New agent-specific benchmarks have appeared, including Context-Bench for memory management, Recovery-Bench for error recovery, and Terminal-Bench for coding agents.
The Framework Lock-in Problem
At the framework layer, the landscape fractured from LangChain-or-nothing into three camps: provider SDKs (OpenAI Agents SDK, Google ADK, Microsoft Semantic Kernel), graph-based orchestration (LangGraph, which hit v1.0 in October 2025 with production deployments at Uber, JPMorgan, LinkedIn, and Klarna), and teams writing thin wrappers over provider APIs and MCP with no framework at all.
Perrone flags this layer as carrying the highest vendor lock-in risk in the entire stack. Orchestration code doesn’t port. A LangGraph agent rewritten for CrewAI is a new codebase. MCP is the only layer that transfers across all three camps.
Five Times More Complex
The overall takeaway: the 2026 agent stack is roughly five times more complex than the 2024 version, driven by production deployments breaking under approaches designed for demos. The framework’s default advice is to start simple and add complexity only when something specific breaks, not before. But production deployments now require governance at inference time, memory consistency to prevent hallucination reuse, and multi-agent orchestration that no single tool fully solves.