Samuel Colvin, founder and CEO of Pydantic, told Business Insider that OpenAI and Anthropic are building “databases of coding intent” through their coding agent products. The play, according to Colvin: lock developers into platforms not through better models, but through non-exportable records of why every line of AI-generated code exists.

Pydantic, the widely used Python data validation library, works directly with both frontier labs on data validation for AI systems. Sequoia Capital led the company’s $12.5 million Series A. That proximity gives Colvin a specific vantage point on how labs are evolving their business strategies as both prepare for potential IPOs.

The Lock-in Mechanism

Colvin’s argument starts with economics. Both labs offer $200/month coding subscriptions (OpenAI’s Codex and Anthropic’s Claude Code) for which the actual inference cost per user can reach thousands of dollars. The subsidy is deliberate.

“They’re trying to grow market share,” Colvin told Business Insider. “There’s perhaps a more profound thing they may be trying to do, though. Once customers have these enormous code bases, which would be basically written [by] AI, you get to a point where you can’t maintain them as a human.”

A developer who uses AI to generate 20,000 lines of code overnight can use a model to fix that code, Colvin explained, but cannot maintain it manually. The codebase becomes dependent on the platform that created it.

From Code Generation to Intent Storage

The next step, Colvin predicts, is that labs will offer to store the full traces of every coding session: the prompts, the model’s reasoning, the developer’s intent, the full exchange that produced each line of code.

“Imagine you have a software bug,” Colvin told Business Insider. “Now you can click on that line of code and see the full exchange that my colleague had with the AI model to write that line of code, along with all of the reasoning, including the reasoning from the model, the input from the human, therefore a full explanation.”

The value is real. Intent databases would make debugging faster and reduce the risk of modifying unfamiliar code. But Colvin expects the data will not be exportable. “We give you that for free, but you can’t export it,” he predicted the labs will say. “So now you’re locked into whoever you’re using for that across the whole business.”
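To make the idea concrete, here is a minimal sketch of what an intent record and an export path could look like. It is an illustration only: `IntentTrace`, `IntentStore`, and every field name are hypothetical, and nothing here reflects how OpenAI or Anthropic actually store session data.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class IntentTrace:
    """One coding session (hypothetical schema, not any vendor's format)."""
    session_id: str
    author: str
    prompt: str           # the input from the human
    model_reasoning: str  # the reasoning the model produced
    model_name: str


@dataclass
class IntentStore:
    """Maps (file path, line number) to the session that produced that line."""
    traces: dict[str, IntentTrace] = field(default_factory=dict)
    line_index: dict[tuple[str, int], str] = field(default_factory=dict)

    def record(self, trace: IntentTrace, path: str, lines: list[int]) -> None:
        """Store a trace and attribute the given lines of `path` to it."""
        self.traces[trace.session_id] = trace
        for line in lines:
            self.line_index[(path, line)] = trace.session_id

    def explain(self, path: str, line: int) -> IntentTrace | None:
        """The "click on a line" lookup: return the trace behind it, if any."""
        session_id = self.line_index.get((path, line))
        return self.traces.get(session_id) if session_id is not None else None

    def export_json(self) -> str:
        """Serialize everything to plain JSON, the step Colvin expects to be withheld."""
        return json.dumps(
            {
                "traces": [asdict(t) for t in self.traces.values()],
                "lines": [
                    {"path": p, "line": n, "session_id": sid}
                    for (p, n), sid in sorted(self.line_index.items())
                ],
            },
            indent=2,
        )
```

In this sketch the lookup and the export are equally simple to write. The lock-in Colvin describes would come from a vendor choosing not to offer the second method, not from any technical difficulty in building it.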

Coding Agents as Platform Infrastructure

The shift Colvin describes reframes what coding agents are. On this view, Codex and Claude Code are not standalone copilot products competing on suggestion quality; they are data collection mechanisms that accumulate behavioral information about how organizations write software. Each coding session feeds a proprietary dataset that becomes more valuable, and harder to leave, over time.

For teams running autonomous coding agents in CI/CD pipelines or development workflows, the implication is structural. The agent that writes your code also becomes the only system that understands your code. Switching providers does not just mean adapting to a new model’s style. It means losing the institutional memory of why your codebase exists in its current form.

As Let’s Data Science noted, the pattern is familiar from cloud infrastructure: offer a useful service at a loss, accumulate switching costs through proprietary data formats, then raise prices once migration is impractical.