Trajectory, the continual-learning startup founded by researchers from Google DeepMind, Apple, and Meta Superintelligence Labs, has open-sourced a concurrent multi-LoRA reinforcement learning (RL) training stack that lets production AI agents learn from real user interactions. The system was built with UC Berkeley Sky Lab and Anyscale, and the team reports a 2.81× gain in end-to-end experiment throughput over single-tenant RL training, with no accuracy regression.

How C-LoRA Works

The approach, called Continuous Multi-LoRA Training (C-LoRA), maps each learning experiment to a dedicated LoRA adapter on a shared, always-running inference engine, according to MarkTechPost. Instead of the standard cycle of collecting data, training a new version, and shipping months later, C-LoRA lets agents update from live feedback as it arrives.
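A minimal sketch of the bookkeeping this implies, assuming a simple in-process model: one shared base model, one adapter slot per experiment, and live feedback buffered per experiment until there is enough to trigger an update. `SharedEngine`, `Experiment`, and `update_every` are hypothetical names, not SkyRL APIs. The "update" is only a placeholder counter standing in for an actual RL step.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """One learning experiment (hypothetical structure for illustration)."""
    name: str
    adapter_id: int                                 # slot of its dedicated LoRA adapter
    feedback: list = field(default_factory=list)    # live feedback awaiting training
    updates: int = 0                                # times this adapter was updated


class SharedEngine:
    """Hypothetical stand-in for one always-running engine: a single frozen
    base model plus one LoRA adapter per experiment."""

    def __init__(self, base_model: str, update_every: int = 4):
        self.base_model = base_model        # loaded once, shared by all experiments
        self.update_every = update_every    # feedback items needed per update
        self.experiments = {}               # experiment name -> Experiment

    def register(self, name: str) -> Experiment:
        # A new experiment gets a new adapter slot; the base model is not copied.
        exp = Experiment(name=name, adapter_id=len(self.experiments) + 1)
        self.experiments[name] = exp
        return exp

    def record_feedback(self, name: str, item: str) -> bool:
        """Buffer one feedback item; return True if it triggered an update."""
        exp = self.experiments[name]
        exp.feedback.append(item)
        if len(exp.feedback) < self.update_every:
            return False
        exp.updates += 1        # placeholder for a training step on this adapter only
        exp.feedback.clear()    # the buffered feedback has been consumed
        return True


engine = SharedEngine("Qwen3-4B-Instruct-2507", update_every=2)
engine.register("coding-agent")
engine.register("support-agent")
engine.record_feedback("coding-agent", "developer rewrote the retry loop")
engine.record_feedback("coding-agent", "developer renamed a variable")
print({n: e.updates for n, e in engine.experiments.items()})
# {'coding-agent': 1, 'support-agent': 0}
```

The point of the design is that experiments differ only in their small adapter and their feedback buffer, so adding one does not mean standing up another copy of the model.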

A coding agent can learn engineering patterns when developers correct its output. A support agent can improve ticket resolution when operators intervene on hard cases. The training code is fully open-sourced in the NovaSky-AI/SkyRL GitHub repository.

The throughput gains come primarily from inference. Using vLLM’s multi-LoRA support and the SGMV decode kernel, multiple adapters stay hot in GPU memory and process tokens from different experiments in the same batch. Training remains serialized across tenants for now, with one adapter active on the GPU at a time while others wait in pinned CPU memory.
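The math behind serving several adapters in one batch can be sketched in plain Python. Every token shares the base weight `W`, and only a small low-rank pair `(A, B)` differs per experiment. This is an illustrative, naive loop with hypothetical function names, not vLLM's implementation. Fused GPU kernels such as the SGMV kernel mentioned above compute the same result without a per-row Python loop.

```python
def matvec(x, m):
    """Row vector x (length n) times matrix m (n x k), as a list of length k."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


def multi_lora_forward(xs, adapter_ids, W, adapters, scale=1.0):
    """One linear layer applied to a batch that mixes several experiments.

    xs[t]          hidden state of token t (length d)
    adapter_ids[t] which experiment's adapter token t belongs to
    W              shared base weight, d x k (identical for every token)
    adapters[a]    (A, B) low-rank pair for adapter a: A is d x r, B is r x k
    """
    out = []
    for x, a in zip(xs, adapter_ids):
        base = matvec(x, W)                    # shared base-model projection
        A, B = adapters[a]
        delta = matvec(matvec(x, A), B)        # this experiment's LoRA update
        out.append([b + scale * d for b, d in zip(base, delta)])
    return out


# Two tokens with the same input, routed to two different adapters in one batch.
W = [[1, 0], [0, 1]]
adapters = {
    "exp-a": ([[1], [0]], [[0, 1]]),   # rank-1 adapter for experiment A
    "exp-b": ([[0], [1]], [[1, 0]]),   # rank-1 adapter for experiment B
}
print(multi_lora_forward([[1, 2], [1, 2]], ["exp-a", "exp-b"], W, adapters))
# [[1.0, 3.0], [3.0, 2.0]]
```

Because the base projection is the same for every row, each additional experiment only adds another rank-r pair. That is why many adapters can stay resident and share one batch.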

The Numbers

Trajectory tested the stack on a single H200 node running Qwen3-4B-Instruct-2507, using GSM8K reframed as an agentic tool-use task in which the model decides when to call a Calculator tool and a Final Answer tool, as MarkTechPost reported. With eight concurrent experiments, the last one finished after 5,433 seconds, a 2.81× speedup over single-tenant training. All eight concurrent experiments finished before three serial runs could complete back-to-back. Every concurrency level exceeded 90% reward accuracy by step 9.
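The article does not describe the environment's interface, so the following is a hypothetical sketch of how a GSM8K-style word problem might be wrapped as a tool-use episode. The policy emits actions, a `calculator` tool evaluates arithmetic, and a `final_answer` action ends the episode with a binary reward. The action dictionary format, the tool names, and the exact-match reward are all assumptions.

```python
import ast
import operator

# Hypothetical tool set: arithmetic operators the calculator tool allows.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.USub: operator.neg}


def calculator(expression: str) -> float:
    """Safely evaluate + - * / arithmetic (no names, no function calls)."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")
    return ev(ast.parse(expression, mode="eval"))


def run_episode(policy, question: str, gold: float, max_turns: int = 5) -> float:
    """Let the policy call tools until it submits a final answer.

    Returns reward 1.0 for a correct final answer, otherwise 0.0.
    """
    history = [("user", question)]
    for _ in range(max_turns):
        action = policy(history)
        if action["tool"] == "calculator":
            history.append(("calculator", calculator(action["expression"])))
        elif action["tool"] == "final_answer":
            return 1.0 if abs(float(action["answer"]) - gold) < 1e-6 else 0.0
        else:
            raise ValueError(f"unknown tool: {action['tool']!r}")
    return 0.0  # ran out of turns without answering


def scripted_policy(history):
    # Stand-in for the model: call the calculator once, then answer with its result.
    if len(history) == 1:
        return {"tool": "calculator", "expression": "48 / 2 + 48"}
    return {"tool": "final_answer", "answer": history[-1][1]}


print(run_episode(scripted_policy,
                  "A shop sold 48 items in April and half as many in May. "
                  "How many did it sell in total?", gold=72))
# 1.0
```

In an RL setup, the model's policy would replace `scripted_policy`, and the returned reward would feed the adapter's update.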

The tradeoff is per-step latency. At eight concurrent adapters, mean step time rises from 191 seconds to 500 seconds, and roughly 77% of that increase comes from rollout time. At two concurrent experiments, by contrast, doubling the load increases rollout time by only 15%.
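A back-of-envelope check shows how the reported figures fit together. It assumes the 2.81× speedup is measured against running the same eight experiments back-to-back and that steps are comparable across settings. The derived values are estimates, not numbers Trajectory published.

```python
# Figures reported in the article
concurrent_wall_s = 5433      # time for the last of 8 concurrent experiments
speedup = 2.81                # end-to-end, vs. single-tenant (serial) training
n_experiments = 8
step_serial_s = 191           # mean step time, one experiment at a time
step_concurrent_s = 500       # mean step time, 8 concurrent adapters

# Implied serial baseline (assumes 2.81x is relative to 8 back-to-back runs)
serial_total_s = concurrent_wall_s * speedup
per_run_s = serial_total_s / n_experiments
three_serial_s = 3 * per_run_s

# Per step: 8 experiments advance every 500 s vs. 1 experiment every 191 s
step_throughput_gain = n_experiments * step_serial_s / step_concurrent_s
step_latency_ratio = step_concurrent_s / step_serial_s

print(f"implied serial total: {serial_total_s:,.0f} s")
print(f"implied per-run time: {per_run_s:,.0f} s")
print(f"three serial runs:    {three_serial_s:,.0f} s (vs {concurrent_wall_s:,} s)")
print(f"per-step throughput:  {step_throughput_gain:.2f}x")
print(f"per-step latency:     {step_latency_ratio:.2f}x slower")
```

Under those assumptions, the implied serial baseline is about 15,267 seconds, or roughly 1,908 seconds per run. Three serial runs would therefore take about 5,725 seconds, which is consistent with eight concurrent experiments finishing first. Per step, eight concurrent adapters deliver about 3.06× the throughput of a single experiment, while each step runs about 2.62× slower.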

Why Continual Learning Matters Now

Trajectory raised a $15 million seed round at a $115 million post-money valuation, led by Conviction with participation from Bessemer Venture Partners, Radical VC, and BoxGroup. Individual investors include Google DeepMind Chief Scientist Jeff Dean and Stanford professor Fei-Fei Li.

CEO Ronak Malde, who worked as an AI researcher at Windsurf before joining Google DeepMind as part of Google's $2.4 billion licensing and hiring deal with Windsurf, told WIRED that coding products like Cursor are already doing early versions of continual learning, using real interaction data to post-train and ship regular model improvements. He attributes coding AI’s rapid adoption partly to this feedback-driven iteration cycle. The other cofounders are Arjun Karanam, formerly an Apple Vision Pro AI researcher, and Michael Elabd, previously in Google DeepMind’s robotics division.

At the NeurIPS conference in December 2025, Turing Award winner Richard Sutton argued that continual learning is essential for building superintelligent agents, as KeepingUpWith.AI noted. His argument lends high-profile support to a problem that has largely remained a lab exercise.

The Infrastructure Gap for Agent Teams

For teams running production agents, the gap between “model was trained” and “model learns from what just happened” is a major source of repeated failures: users correct the same mistakes across sessions because the model never sees the correction. C-LoRA’s open-source release gives agent builders a concrete starting point: swap LoRA adapters on a warm engine, run multiple learning experiments concurrently, and feed production interactions back into training without taking the service offline. The constraint is hardware. The stack requires an 8× H100/H200 node and a Megatron build, which limits immediate adoption to teams with GPU access. But for those who have it, the question shifts from “can agents learn in production?” to “how fast?”
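Swapping adapters without taking the service offline amounts to publishing a new adapter version while requests keep reading a consistent snapshot. The sketch below shows that pattern with a lock-protected pointer. `AdapterStore` and `serve` are hypothetical, and a real engine would load weights into GPU memory rather than hold a Python object.

```python
import threading


class AdapterStore:
    """Hypothetical versioned adapter slot shared by a server and a trainer."""

    def __init__(self, weights):
        self._lock = threading.Lock()
        self._version = 0
        self._weights = weights

    def snapshot(self):
        # A request takes one consistent (version, weights) pair and keeps it,
        # even if the trainer publishes a newer adapter mid-request.
        with self._lock:
            return self._version, self._weights

    def publish(self, new_weights):
        # The trainer swaps in updated weights; serving never stops.
        with self._lock:
            self._version += 1
            self._weights = new_weights
            return self._version


def serve(store, request):
    version, weights = store.snapshot()
    return f"{request} answered with adapter v{version} ({weights})"


store = AdapterStore("weights-v0")
print(serve(store, "req-1"))   # req-1 answered with adapter v0 (weights-v0)
store.publish("weights-v1")    # e.g. after training on new production feedback
print(serve(store, "req-2"))   # req-2 answered with adapter v1 (weights-v1)
```

The lock only guards a pointer swap, so readers are blocked for a negligible moment rather than for the duration of a training step.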