Carnegie Mellon University database researcher Andy Pavlo has identified query optimization as the hardest unsolved challenge for AI coding agents working on database systems. According to The New Stack, Pavlo observed that agents can reliably build most standard database components but hit a wall on the problem that determines whether a database actually performs well under production workloads.
Where Agents Succeed
Pavlo noted that CMU student submissions for database projects saw a “massive spike in lines of code” once LLMs were permitted. The pattern is consistent: agents reproduce known implementations effectively.
“The coding agents are very good at building almost every part of a database: B+ trees, hash tables, buffer managers, because they can regurgitate standard implementations found in textbooks and open-source repos,” Pavlo told The New Stack.
These components have well-documented reference implementations across academic papers, open-source projects, and textbook code. Agents pattern-match against that corpus and generate functional versions.
The Query Optimizer Wall
The one component agents cannot build is the query optimizer, which Pavlo called the “double black diamond” challenge, according to The New Stack. A B+ tree follows a fixed algorithmic structure. A query optimizer, by contrast, must enumerate the possible execution plans for a given SQL query, estimate the cost of each plan from data statistics, and select the plan that minimizes I/O, CPU, and memory usage for a specific dataset and workload pattern.
This requires reasoning about cardinality estimation (how many rows will pass through each stage of the query), join ordering (which tables to combine first when multiple joins are involved), and index selection (which access paths to use). These decisions interact with each other, and the optimal combination depends on the actual data distribution, which changes over time.
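To make the interaction between cardinality estimation and join ordering concrete, here is a minimal sketch of a toy cost-based planner. It is not how PostgreSQL, MySQL, or Oracle work internally. The table names, row counts, and selectivities are hypothetical, and the cost model (the sum of estimated intermediate result sizes over left-deep plans) is a deliberate simplification chosen for illustration.

```python
from itertools import permutations

# Hypothetical table statistics (assumed for illustration only).
ROWS = {"orders": 1_000_000, "customers": 50_000, "regions": 20}

# Hypothetical join selectivities: the fraction of the cross product
# that survives the join predicate between two tables.
SELECTIVITY = {
    frozenset({"orders", "customers"}): 1 / 50_000,
    frozenset({"customers", "regions"}): 1 / 20,
}


def join_selectivity(joined, new_table):
    """Combine the selectivities of every predicate linking new_table
    to a table that is already part of the plan. A missing predicate
    means a cross product (selectivity 1.0)."""
    sel = 1.0
    for table in joined:
        sel *= SELECTIVITY.get(frozenset({table, new_table}), 1.0)
    return sel


def plan_cost(order):
    """Estimate the cost of a left-deep plan as the sum of the
    estimated cardinalities of its intermediate results."""
    joined = [order[0]]
    cardinality = ROWS[order[0]]
    cost = 0.0
    for table in order[1:]:
        cardinality = cardinality * ROWS[table] * join_selectivity(joined, table)
        cost += cardinality
        joined.append(table)
    return cost


def best_plan(tables):
    """Exhaustively enumerate join orders and keep the cheapest one.
    Feasible only for a handful of tables: n tables give n! orders."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    for order in permutations(ROWS):
        print(order, round(plan_cost(order)))
    print("chosen:", best_plan(list(ROWS)))
```

Even in this toy, the cheapest plan joins the two small tables first and brings in the large one last, while any order that pairs two tables with no shared predicate pays for a cross product. A real optimizer has to make the same call with estimated, often stale, statistics and a far larger plan space, which is where the difficulty described above comes from.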
No textbook provides a single canonical implementation. Production query optimizers in systems like PostgreSQL, MySQL, and Oracle have been refined over decades through empirical tuning, and their behavior reflects accumulated engineering judgment that does not exist in a training corpus in reproducible form.
Why This Matters for Agent Infrastructure
The finding carries direct implications for teams deploying long-running autonomous agents. Every persistent agent, whether managing customer support tickets, executing trades, or orchestrating supply chains, depends on a data layer for state management, decision history, and context retrieval.
According to RisingWave’s analysis of agent data infrastructure, production agents in 2026 are “buying advertising inventory, routing support tickets, adjusting dynamic pricing, filing expense reports, approving small transactions, and managing cloud resource allocation.” These workloads generate read and write patterns that differ fundamentally from traditional analytics queries. Agent data access tends to be high-frequency, low-latency, temporally ordered, and state-dependent.
As agent deployments scale past thousands of concurrent decision cycles, database performance becomes a binding constraint. If the data layer cannot optimize queries for agent-specific access patterns, latency compounds through every observe-think-act loop the agent executes.
The Infrastructure Gap
Pavlo’s analysis points to a structural gap in the agent stack. The model layer (LLMs, reasoning engines) and the orchestration layer (agent frameworks, tool registries) have both seen rapid maturation in 2026. The data layer has not kept pace with the specific demands of agentic workloads.
This creates an opening for database vendors and startups building agent-optimized data infrastructure. The first systems that solve query optimization for agent-specific workloads (temporal state queries, audit trail retrieval, high-frequency context lookups) will hold a structural advantage similar to what Kafka provided when microservices architectures hit message queue bottlenecks in the 2010s.
For builders, the immediate takeaway is concrete: before scaling an agent deployment past the pilot stage, benchmark the database layer under realistic agent query loads. At scale, the optimizer is the bottleneck, and it is far cheaper to discover its limits early.
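As a starting point for that kind of benchmark, the sketch below times a "latest state for this agent" lookup and reports latency percentiles, then asks the database which access path its optimizer chose. It uses Python's built-in sqlite3 module purely as a stand-in. The schema, the index name `idx_agent_ts`, the query, and the workload sizes are all assumptions for illustration, not a description of any production agent system.

```python
import sqlite3
import statistics
import time

# Hypothetical agent-style query: fetch the most recent state for one agent.
LATEST_STATE = (
    "SELECT state FROM agent_state "
    "WHERE agent_id = ? ORDER BY ts DESC LIMIT 1"
)


def build_db(n_agents=50, events_per_agent=200):
    """Create an in-memory table of synthetic, temporally ordered agent events."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE agent_state (agent_id INTEGER, ts INTEGER, state TEXT)"
    )
    rows = [
        (agent, ts, f"s{ts}")
        for agent in range(n_agents)
        for ts in range(events_per_agent)
    ]
    conn.executemany("INSERT INTO agent_state VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_agent_ts ON agent_state (agent_id, ts)")
    conn.commit()
    return conn


def benchmark(conn, n_agents=50, iterations=2000):
    """Run the lookup repeatedly and return p50/p99 latency in seconds."""
    latencies = []
    for i in range(iterations):
        start = time.perf_counter()
        conn.execute(LATEST_STATE, (i % n_agents,)).fetchone()
        latencies.append(time.perf_counter() - start)
    cuts = statistics.quantiles(latencies, n=100)  # 99 cut points
    return {"p50": cuts[49], "p99": cuts[98]}


def chosen_plan(conn):
    """Return the optimizer's plan description for the lookup query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + LATEST_STATE, (0,)).fetchall()
    return [row[-1] for row in rows]  # last column is the readable detail


if __name__ == "__main__":
    conn = build_db()
    print(benchmark(conn))
    print(chosen_plan(conn))
```

Tail latency (p99) matters more than the average here, because a slow lookup stalls the whole decision loop that issued it. A meaningful benchmark would replace the synthetic table with the real schema, query mix, and concurrency level of the deployment under test.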