LinkedIn published the technical architecture behind its Cognitive Memory Agent (CMA), a platform layer designed to solve the core limitation of LLM-based agent systems: they forget everything between sessions. CMA gives agents persistent, structured memory that survives context windows and accumulates over time, according to LinkedIn’s engineering blog.
The system already powers LinkedIn’s Hiring Assistant, the company’s first production AI agent for recruiters.
Four Memory Layers
CMA organizes knowledge into four distinct tiers, each with different semantics and retrieval characteristics, as detailed in LinkedIn’s blog post:
Conversational memory stores compressed in-session state. Messages are indexed both chronologically and by embedding in a vector store, creating two parallel representations: a time-ordered log and a semantic index. Past dialogues are summarized periodically for context compression.
Episodic memory records timestamped interaction traces. When a recruiter archives a candidate for lacking specific machine learning skills, that signal persists. The next time the recruiter hires for a similar role, the agent adjusts outreach strategy based on those historical decisions.
Semantic memory aggregates structured knowledge and inferred preferences across sessions. Rather than requiring explicit user instructions (“I prefer concise InMails”), the system dynamically extracts implicit preferences from interaction patterns using developer-configured memory prompts.
Procedural memory captures invocation plans and reasoning traces, encoding learned workflows so agents improve task execution over time.
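The four tiers above can be sketched as a minimal store with per-tier reads and writes. This is an illustrative sketch, not LinkedIn's actual API: the class names, fields, and retrieval logic are our assumptions, and real CMA retrieval is semantic rather than a simple filter.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

# Hypothetical names for CMA's four tiers; illustrative only.
class MemoryTier(Enum):
    CONVERSATIONAL = "conversational"  # compressed in-session state
    EPISODIC = "episodic"              # timestamped interaction traces
    SEMANTIC = "semantic"              # aggregated knowledge and preferences
    PROCEDURAL = "procedural"          # invocation plans and reasoning traces

@dataclass
class MemoryRecord:
    tier: MemoryTier
    content: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

class MemoryStore:
    """Toy store; each tier would have its own retrieval semantics."""

    def __init__(self) -> None:
        self._records: list[MemoryRecord] = []

    def write(self, record: MemoryRecord) -> None:
        self._records.append(record)

    def read(self, tier: MemoryTier) -> list[MemoryRecord]:
        # Real retrieval is semantic; here we just filter by tier.
        return [r for r in self._records if r.tier == tier]
```

In this framing, the recruiter example above becomes an episodic write at archive time and an episodic read when the next similar search starts.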
Shared Memory for Multi-Agent Coordination
In multi-agent systems, CMA functions as a coordination substrate rather than a per-agent context store. Multiple specialized agents (planning, reasoning, execution) access the same memory layer, which reduces state duplication and prevents conflicting actions across distributed workflows, according to InfoQ’s coverage.
“Good agentic AI isn’t stateless: it remembers, adapts, and compounds,” Karthik Ramgopal, Distinguished Engineer at LinkedIn, stated. “One of the key capabilities enabling this is memory that lives beyond context windows.”
Engineering Trade-offs
LinkedIn’s blog acknowledges that persistent memory introduces distributed systems challenges. Determining what to store, when to retrieve it, and how to handle staleness becomes central to correctness. The ingestion layer must interpret unstructured inputs, decide what information to extract, and manage storage timing. The retrieval orchestration layer infers user intent, dynamically retrieves across memory layers, and synthesizes responses using reasoning and planning rather than simple embedding-based recall.
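The two layers described above can be caricatured in a few lines: an ingestion-side gate that decides whether an input is worth persisting, and a retrieval side that fans out across memory layers and merges hits. Both functions are hypothetical stand-ins; production systems would use an LLM or classifier for the gate and embedding search with reranking for retrieval.

```python
def should_store(message: str) -> bool:
    # Stand-in for the ingestion layer's extraction decision; a keyword
    # heuristic substitutes for a learned classifier.
    signals = ("prefer", "always", "never", "archived")
    return any(s in message.lower() for s in signals)

def retrieve(query: str, layers: dict[str, list[str]]) -> list[str]:
    # Fan out over every memory layer and keep entries sharing a token
    # with the query; real retrieval would be semantic, not lexical.
    terms = set(query.lower().split())
    hits = []
    for layer_name, entries in layers.items():
        for entry in entries:
            if terms & set(entry.lower().split()):
                hits.append(f"[{layer_name}] {entry}")
    return hits
```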
Subhojit Banerjee, an MLOps data engineer, noted on LinkedIn that “cache invalidation is one of the hardest problems in computer science,” highlighting that correctly identifying episode boundaries, staleness, and conflict resolution remains the central challenge.
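Staleness and conflict resolution admit simple baseline policies, which the toy functions below illustrate. The 90-day horizon and the last-write-wins rule are arbitrary assumptions for the sketch, not CMA's actual behavior.

```python
from datetime import datetime, timedelta

def is_stale(recorded_at: datetime, now: datetime,
             max_age: timedelta = timedelta(days=90)) -> bool:
    """Treat a memory older than max_age as no longer trustworthy."""
    return now - recorded_at > max_age

def resolve(observations: list[tuple[datetime, str]]) -> str:
    """Last-write-wins: the most recent observation overrides earlier ones."""
    return max(observations, key=lambda o: o[0])[1]
```

The hard part the comment points at is exactly what these baselines dodge: choosing a horizon per memory type, and deciding when two observations genuinely conflict rather than coexist.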
The system is built on Couchbase and LinkedIn’s internal Espresso database for durable storage, with vector embeddings for semantic retrieval.
Memory as Infrastructure Primitive
CMA represents a specific architectural bet: memory should be a first-class infrastructure primitive with explicit read/write semantics and lifecycle management, not incidental context stuffed into prompts. LinkedIn’s earlier approach, a simple hierarchical key-value store, required every application to handle preference extraction and indexing manually, preventing broad adoption. CMA replaces that with an intelligent layer that handles extraction, storage, and retrieval without per-application custom integration.
For teams building multi-session agent systems, the takeaway is concrete: treating memory as prompt engineering (stuffing the context window with history) does not scale. LinkedIn’s answer is to externalize it into dedicated infrastructure with its own retrieval, compaction, and consistency guarantees.
Published April 21, 2026.