Tencent Cloud’s database team open-sourced TencentDB Agent Memory on Wednesday, releasing the full hierarchical memory engine under the MIT License on GitHub. The engine addresses a specific, expensive problem: agents running continuous multi-hour tasks accumulate so much context that token costs grow linearly with conversation length and model performance degrades from information overload.
The benchmarks are concrete. When integrated as an OpenClaw plugin, the memory engine reduced token consumption by 61.38% on the WideSearch benchmark while improving task pass rates by 51.52%, according to BigGo Finance. On SWE-bench, running 50 consecutive tasks per session to simulate real production pressure, tokens dropped 33% and pass rates climbed nearly 10%. Personalized memory accuracy on the PersonaMem benchmark jumped from 48% to 76%.
How the Four-Tier Architecture Works
The engine replaces flat vector stores with a structured pyramid. L0 (Raw Dialogue) retains the complete record of every interaction. L1 (Atomic Memory) automatically extracts facts, preferences, constraints, and interim conclusions. L2 (Scene Induction) aggregates related memories by task type. L3 (User Profile) continuously distills information into stable long-term profiles.
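The repository does not publish its internal data model, but the four tiers can be sketched as a single structure whose lower layers feed the upper ones. In this illustrative sketch, the class and field names (`MemoryStore`, `record_turn`, and so on) are assumptions, not Tencent's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Hypothetical sketch of the four-tier pyramid."""
    l0_raw_dialogue: list = field(default_factory=list)   # complete interaction log
    l1_atomic: list = field(default_factory=list)         # extracted facts, prefs, constraints
    l2_scenes: dict = field(default_factory=dict)         # task type -> indices into L1
    l3_profile: dict = field(default_factory=dict)        # stable long-term traits

    def record_turn(self, turn: str, facts: list, task_type: str) -> None:
        """Append the raw turn, store its extracted atoms, group them by scene."""
        self.l0_raw_dialogue.append(turn)
        start = len(self.l1_atomic)
        self.l1_atomic.extend(facts)
        # L2 holds references (indices), not copies, so nothing is duplicated.
        self.l2_scenes.setdefault(task_type, []).extend(
            range(start, len(self.l1_atomic))
        )
```

The key design point the sketch tries to capture is that L2 and L3 hold references and distillations rather than copies, so the upper tiers stay small while L0 remains the source of truth.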
The practical effect: an agent processing a current task pulls only from the refined upper layers instead of traversing the entire conversation history. According to Tencent’s repository documentation, the system uses a dual-layer storage strategy where bottom layers (facts, logs, traces) persist in databases for full-text retrieval, while top layers (personas, scenes, canvases) are stored as human-readable Markdown files for inspection.
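The dual-layer split described in the repository docs could look like the following sketch: facts land in a database that supports text retrieval, while the profile is written out as inspectable Markdown. The function name, schema, and file layout here are assumptions for illustration:

```python
import pathlib
import sqlite3

def persist(facts: list, profile: dict, out_dir: str) -> str:
    """Illustrative dual-layer persistence: DB for facts, Markdown for the profile."""
    out = pathlib.Path(out_dir)
    # Bottom layer: facts/logs go into SQLite for programmatic retrieval.
    con = sqlite3.connect(str(out / "memory.db"))
    con.execute("CREATE TABLE IF NOT EXISTS facts (body TEXT)")
    con.executemany("INSERT INTO facts (body) VALUES (?)", [(f,) for f in facts])
    con.commit()
    con.close()
    # Top layer: the profile is plain Markdown a human can open and audit.
    md = "# User Profile\n" + "\n".join(f"- **{k}**: {v}" for k, v in profile.items())
    (out / "profile.md").write_text(md)
    return md
```

Keeping the top layers as Markdown means the most-consulted memories are also the easiest for an operator to inspect and hand-correct, without touching the database.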
Context Offloading and the Mermaid Canvas
Two mechanisms enable the compression. Context offloading moves heavyweight data, such as complete tool call results and intermediate files, to external storage while retaining only summaries and indexes in the active context window. The Mermaid canvas collapses complex task execution structures into navigable visual flowcharts, letting agents trace back and restore original information for any node on demand using indexed references.
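The canvas idea can be approximated in a few lines: each execution step gets a node in a Mermaid flowchart, and the node id doubles as a key into an index of full records, so any node can be expanded back on demand. This generator (`to_mermaid`, the node-id scheme) is a hypothetical sketch, not the plugin's implementation:

```python
def to_mermaid(steps: list) -> tuple:
    """steps: (short_label, full_detail) pairs.
    Returns a Mermaid flowchart string plus an index from node id to full detail."""
    index = {f"n{i}": detail for i, (_, detail) in enumerate(steps)}
    lines = ["flowchart TD"]
    for i, (label, _) in enumerate(steps):
        lines.append(f'    n{i}["{label}"]')   # node shows only the short label
        if i > 0:
            lines.append(f"    n{i-1} --> n{i}")  # linear chain of execution steps
    return "\n".join(lines), index
```

The agent's active context carries only the compact flowchart; the index stays in external storage and is consulted only when a node needs to be expanded.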
This combination means original data is never lost. The system avoids irreversible summarization by maintaining deterministic paths from high-level abstractions back to ground-truth evidence, as documented in the GitHub README.
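A minimal offload/restore round trip shows why nothing is lost: the heavy payload leaves the context, a stub with a reference key and short summary remains, and the reference deterministically recovers the original. The names (`offload`, `restore`, `external_store`) are assumed for illustration:

```python
external_store = {}  # stand-in for the engine's external storage layer

def offload(tool_result: str, key: str, summary_len: int = 80) -> dict:
    """Move the full payload out of context; keep a summary and a reference key."""
    external_store[key] = tool_result
    return {"ref": key, "summary": tool_result[:summary_len]}

def restore(stub: dict) -> str:
    """Deterministic path from the in-context stub back to ground truth."""
    return external_store[stub["ref"]]
```

Because `restore(offload(x, k)) == x`, the compression is reversible by construction, unlike one-way summarization.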
Token Economics at Production Scale
The timing matters. As agents move from experimental demos to production workloads spanning hours or days, token consumption becomes the dominant cost driver. With the traditional approach of stuffing all historical interactions into the context window, per-request context scales linearly with task length, so cumulative token spend grows even faster. For enterprise deployments running hundreds of concurrent agent sessions, that scaling translates directly to infrastructure spend.
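A back-of-envelope model makes the gap concrete. The numbers below (500 tokens per turn, a 2,000-token compressed memory budget) are arbitrary assumptions chosen only to show the shape of the curves:

```python
def total_tokens(turns: int, avg_turn: int, compressed: bool,
                 memory_budget: int = 2_000) -> int:
    """Cumulative tokens over a session.
    Full-history: turn t carries t * avg_turn tokens of context (quadratic total).
    Compressed: context stays near a fixed memory budget (linear total)."""
    total = 0
    for t in range(1, turns + 1):
        context = memory_budget if compressed else t * avg_turn
        total += context + avg_turn  # prompt context plus the new turn itself
    return total

full = total_tokens(200, 500, compressed=False)   # 10,150,000 tokens
compact = total_tokens(200, 500, compressed=True) # 500,000 tokens
```

Under these toy assumptions, a 200-turn session costs roughly 20x more tokens with full history than with a bounded memory layer, and the ratio keeps widening as sessions lengthen.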
Tencent’s contribution joins a broader infrastructure maturation pattern. BrowserAct shipped web access skills with 93% token reduction last week. TestMu launched agent-native testing frameworks. Now Tencent adds the memory layer. Each addresses a distinct bottleneck in making agents viable for sustained, complex workflows rather than single-turn interactions.
The MIT License and zero external API dependencies lower the adoption barrier. Any team running OpenClaw agents on long-horizon tasks can integrate the plugin without vendor lock-in or additional service costs.