THE COMPLETE GUIDE TO LLM MEMORY
March 2026
I have been thinking about memory for large language models for over a year now. Not the kind of thinking where you read a few papers and move on. The kind where you build something, break it, rebuild it, read neuroscience textbooks at 2 AM, and still feel like you are scratching the surface.
I use Claude the most in my day-to-day work, but I also test with ChatGPT, Gemini, DeepSeek, and Perplexity regularly. I need to understand how each of them handles memory because I am building a memory layer that sits underneath all of them. The more I tested, the more I realized: most people do not understand what "memory" even means in this context. And the LLM providers are not helping clarify it.
This article is my attempt to lay it all out. The core concepts, how the major players handle memory today, what memory layers actually are, and where I think this whole thing is heading. I will also explain why I built widemem and what makes it different. Fair warning: I have opinions.
1. WHAT IS MEMORY FOR AN LLM, REALLY?
When people say "memory" for an LLM, they usually mean one of four very different things. Cognitive science has been studying these distinctions in the human brain for decades, and the taxonomy maps surprisingly well to what we are trying to build for AI.
Working memory (the context window)
This is what every LLM has by default. The conversation you are having right now sits in the context window, and the model can reference anything in it. Think of it as the mental scratchpad you use when someone is talking to you. It is fast, it is always there, and it disappears the moment the conversation ends.
The problem is obvious: context windows have limits. GPT-4o gives you 128k tokens. Claude offers 200k. DeepSeek recently pushed to 1 million. But even a million tokens is not memory. It is a very long scratchpad. You still lose everything when the session ends.
Episodic memory (what happened)
This is the ability to recall specific events. "Last Tuesday, the user asked me to refactor the auth module." "Two weeks ago, they mentioned they were switching from PostgreSQL to SQLite." Episodic memory is time-stamped and contextual. It captures not just what was said, but when and in what situation.
Most LLM memory implementations today are primarily episodic. They store conversation snippets or extracted facts with timestamps. This is useful, but it is only one piece of the puzzle.
Semantic memory (what you know)
Facts and knowledge that are true independent of when you learned them. "The user is a software engineer." "They prefer TypeScript over JavaScript." "Their project uses Next.js 16." Semantic memory does not care about the conversation where these facts came up. It just knows them.
This is what most people actually want when they ask for "AI memory." They want the AI to know things about them without being reminded every session. But building a reliable semantic memory store is surprisingly hard. You need extraction (pulling facts from conversations), deduplication (not storing the same fact twice), contradiction resolution (what happens when the user moves cities?), and decay (when is a fact too old to be relevant?).
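To make those four requirements concrete, here is a toy sketch of a slot-based fact store. Every name here is mine, not widemem's API, and the contradiction rule (newest value per slot wins) is deliberately naive:

```python
import time

# Hypothetical sketch of a semantic fact store. Facts are keyed by a
# "slot" (what the fact is about), so a new value for the same slot
# replaces the old one -- the simplest possible contradiction resolution.
class FactStore:
    def __init__(self):
        self.facts = {}  # slot -> (value, stored_at)

    def add(self, slot: str, value: str) -> str:
        now = time.time()
        if slot in self.facts:
            if self.facts[slot][0] == value:
                return "skip"          # deduplication: fact already stored
            self.facts[slot] = (value, now)
            return "update"            # contradiction: newest value wins
        self.facts[slot] = (value, now)
        return "add"                   # extraction output lands here

    def fresh(self, slot: str, max_age_s: float) -> bool:
        # decay stub: a fact older than max_age_s is treated as stale
        return slot in self.facts and time.time() - self.facts[slot][1] <= max_age_s

store = FactStore()
store.add("home_city", "Boston")         # -> "add"
store.add("home_city", "Boston")         # -> "skip"
store.add("home_city", "San Francisco")  # -> "update"
```

A real system replaces the exact-match checks with semantic comparison, which is exactly where it gets hard.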
Procedural memory (how to do things)
Skills, habits, and learned patterns. "This user likes concise responses." "They always want tests before implementation." "When they say 'deploy,' they mean push to the staging branch first." Procedural memory is about behavior adaptation, not fact storage.
Almost nobody in the LLM memory space is working on procedural memory explicitly. It gets folded into "preferences" or "user profile," but the distinction matters. Knowing that someone is an engineer (semantic) is different from knowing how they like to work (procedural). The first informs what you say. The second informs how you say it.
2. HOW THE BIG PLAYERS HANDLE MEMORY
I have tested memory features across every major LLM I can get my hands on. Here is what each of them actually does, stripped of marketing language.
[Chart: Estimated monthly active users (millions), early 2026. Claude and Perplexity do not publish MAU; figures are rough estimates.]
OpenAI (ChatGPT)
ChatGPT has the most mature consumer memory feature. It works in two ways: you can explicitly tell it to remember something ("remember that I prefer dark mode"), or it automatically extracts facts from conversations over time. You can view, edit, and delete individual memories.
I think ChatGPT's memory is good for casual use. It remembers my name, my projects, my preferred coding style. But it has real limits. It summarizes rather than storing exact details. It has no concept of importance, so it treats "I like pizza" the same as "I have a peanut allergy." And there is no API. If you are building an application, you cannot use ChatGPT's memory. You are on your own.
Anthropic (Claude)
I use Claude daily, so I know its memory well. Claude has an editable memory summary per user, plus a Projects feature that gives you separate memory per workspace. Claude Code (the CLI tool) has a different approach: CLAUDE.md files and an auto-memory system that saves learnings to disk.
What I like about Claude's approach is that it is transparent. You can see and edit what it remembers. What I find limiting is that the memory is summarized and compressed. It loses detail over time. And like ChatGPT, there is no API for memory. The Projects feature helps, but it is manual. You are curating the memory yourself.
xAI (Grok)
Grok added memory more recently, and it is still in beta. The interesting thing about Grok is transparency: you can see exactly what it remembers, no summarization. But it is basic. No automatic extraction, no importance scoring, and not available in the EU/UK due to GDPR compliance gaps. I test with Grok occasionally but it is not part of my regular workflow.
DeepSeek
DeepSeek took a fundamentally different approach: instead of building a memory feature, they expanded the context window to 1 million tokens. Their bet is that if the scratchpad is big enough, you do not need a separate memory system.
I think this is a reasonable bet for certain use cases. If you are processing a single long document or having a marathon coding session, a massive context window works great. But it does not solve the cross-session problem. Close the tab and everything is gone. And research has shown that LLMs do not reliably process all evidence in very long contexts: the well-documented "lost in the middle" effect, where models attend to the beginning and end of the context and miss things buried in the middle.
Perplexity
Perplexity launched memory in beta around April 2025. It remembers details across conversations for personalized answers. I use Perplexity mainly for research and search, and the memory is helpful there. It remembers what topics I have been exploring, what I care about. But it is search-focused, not general-purpose. Different tool, different memory needs.
Google Gemini
Gemini is the interesting outlier. It has the deepest integration (spanning Gmail, Docs, Sheets, the whole Workspace suite), and it is the only major provider that actually offers memory through an API via Vertex AI Memory Bank. If you are building on GCP, this is the only native option where you can programmatically access the memory layer.
[Table legend: Explicit = user-triggered saves. Auto = background extraction. API = developer access. Decay = time-based forgetting. YMYL = safety tier for critical facts.]
The pattern is clear: every provider has added some form of memory, but almost none of them let developers build on it. They treat memory as a consumer feature, not infrastructure. This is the gap that memory layers exist to fill.
3. WHAT ARE MEMORY LAYERS AND WHY DO THEY EXIST?
A memory layer is a separate system that sits between your application and the LLM. It intercepts conversations, extracts important information, stores it persistently, and injects relevant context back into future conversations. Think of it as giving your LLM a hard drive. The context window is RAM. The memory layer is the disk.
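The intercept-extract-inject loop looks roughly like this. Everything below is a hypothetical sketch: `retrieve`, `extract`, and the stub store are stand-ins, not any real library's API.

```python
# Sketch of the memory-layer loop: retrieve from disk, inject into RAM
# (the prompt), then extract new facts from the exchange.
def chat_with_memory(user_msg, store, llm_call):
    # 1. retrieve: pull relevant facts from persistent storage ("disk")
    relevant = store.retrieve(user_msg)
    # 2. inject: load them into the context window ("RAM")
    prompt = "Known facts:\n" + "\n".join(relevant) + "\n\nUser: " + user_msg
    reply = llm_call(prompt)
    # 3. extract: mine the new exchange for facts worth keeping
    for fact in store.extract(user_msg, reply):
        store.add(fact)
    return reply

# Minimal stub so the loop runs end to end (illustrative only).
class StubStore:
    def __init__(self): self.saved = []
    def retrieve(self, query): return ["user prefers TypeScript"]
    def extract(self, msg, reply): return [msg] if "I use" in msg else []
    def add(self, fact): self.saved.append(fact)

store = StubStore()
reply = chat_with_memory("I use Next.js 16", store, lambda p: "noted")
# store.saved now contains "I use Next.js 16"
```

The key property: the LLM itself is a plain function call here, which is why a memory layer can sit underneath any provider.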
The reason memory layers exist is simple: LLM providers do not solve the developer problem. If you are building a customer support bot, a health assistant, a coding agent, or any application that needs to remember things across sessions, you cannot rely on ChatGPT's built-in memory. You need something you control, something with an API, something that works regardless of which LLM you use underneath.
I thought about this a lot before building widemem. The question was not "should memory layers exist?" That was obvious. The question was "what should a memory layer actually do?" And here, the industry is far from consensus.
4. THE FIVE DIRECTIONS FOR MEMORY LAYERS
After cataloging every memory project I could find (see my complete landscape post), the approaches cluster into five distinct directions. Each one makes a different bet on what matters most.
[Chart: Approximate distribution of architectural approaches across active memory layer projects, 2026.]
Direction 1: Vector + similarity search
The most common approach. Convert facts into embeddings, store them in a vector database, and retrieve the most similar ones when needed. Fast, well-understood, and plenty of tooling available. The downside: similarity is not relevance. Two facts can be semantically similar without both being relevant to the current query. And vector search alone cannot detect contradictions. "I live in Boston" and "I live in San Francisco" will have high similarity (both are about where someone lives) but incompatible meanings.
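To see the failure mode, here is a toy example with hand-picked four-dimensional "embeddings." Real models produce hundreds of dimensions; these numbers are made up purely to illustrate that cosine similarity measures topical closeness, not factual compatibility.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Both "lives in" sentences point in nearly the same direction, because
# they are about the same topic -- even though the facts are incompatible.
boston = [0.9, 0.8, 0.1, 0.0]   # "I live in Boston"
sf     = [0.9, 0.7, 0.0, 0.1]   # "I live in San Francisco"
pizza  = [0.0, 0.1, 0.9, 0.8]   # "I like deep-dish pizza"

cosine(boston, sf)     # high (~0.99): retrieval treats them as near-duplicates
cosine(boston, pizza)  # low (~0.12): different topic
```

Pure vector retrieval happily returns both "lives in" facts for the same query; something else has to notice they cannot both be true.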
Direction 2: Knowledge graphs
Store memories as entities and relationships in a graph. "Alice works at Google" becomes an edge connecting the Alice node to the Google node. Powerful for capturing relationships between facts, and graph traversal can find connections that vector search misses. The tradeoff is complexity. You need a graph database, entity resolution, and relationship extraction. Zep/Graphiti and Cognee lead here.
Direction 3: Hybrid (vector + graph)
Combine both. Use vectors for fast similarity retrieval and a graph for relationship modeling. This is where the top projects (Mem0, Cognee) have converged. It is the most capable approach but also the heaviest. You are running two storage systems and need to keep them in sync.
Direction 4: Self-editing memory (LLM-as-memory-manager)
Give the LLM tools to read and write its own memory. The model decides what to store, what to update, and what to forget. Letta/MemGPT pioneered this. It is the most flexible approach because the LLM can handle nuance that rule-based systems miss. But every memory operation costs tokens, and the model can drift over time as it edits its own state. It is also hard to audit: you cannot easily explain why the model chose to remember X but forget Y.
Direction 5: SQL / structured storage
The contrarian bet: skip vectors entirely and use regular SQL databases. Memori (Gibson AI) is the main project here. If your infrastructure is already SQL-first and you do not want to add a vector database, this meets you where you are. The tradeoff is that you lose semantic similarity search, which is arguably the core capability of a memory layer.
5. WHAT WIDEMEM DOES AND WHY I BUILT IT
I built widemem because none of the existing options solved the problems I cared about most. I kept running into the same issues across every memory system I tried: they treat all facts as equally important, they do not know when a fact is stale, they cannot handle contradictions without multiple expensive API calls, and they have no concept of "this fact could kill someone if it is wrong."
Importance scoring and time decay
Every fact in widemem gets an importance score from 1 to 10. A peanut allergy gets a 9. "Had pizza for lunch" gets a 2. Low-importance facts decay over time and eventually fall below the retrieval threshold. High-importance facts persist. This seems obvious in hindsight, but most memory systems still treat "favorite color is blue" the same as "takes blood thinners daily."
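One way to read that scheme in code. The constants, the half-life formula, and the immunity cutoff below are my guesses for illustration, not widemem's actual internals:

```python
# Sketch of importance-weighted time decay: low-importance facts fade
# below the retrieval threshold, high-importance facts never decay.
def effective_score(importance: float, age_days: float,
                    base_half_life_days: float = 15.0) -> float:
    if importance >= 8:
        return importance                 # critical facts: no decay
    half_life = base_half_life_days * importance  # important facts fade slower
    return importance * 0.5 ** (age_days / half_life)

RETRIEVAL_THRESHOLD = 1.0  # facts below this stop being retrieved

effective_score(9, age_days=180)  # 9.0 -- the allergy persists
effective_score(2, age_days=180)  # ~0.03 -- the pizza lunch is gone
```

The exact curve matters less than the two properties: decay rate depends on importance, and above some bar decay stops entirely.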
Batch conflict resolution
When new facts come in, widemem checks them against existing memories for contradictions. But instead of making one API call per fact (which is what most systems do), it batches them. One LLM call resolves N facts against the existing store. The model decides for each: add, update, delete, or skip. This is not just a cost optimization. It also means the model sees the full picture, all the new facts together with all the relevant existing ones, and can make better decisions.
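The batching idea, sketched as a prompt builder. The prompt wording and the JSON schema here are my own illustration, not widemem's actual prompt:

```python
import json

# One LLM call resolves all N new facts against the relevant existing
# ones, instead of N separate calls.
def build_resolution_prompt(new_facts, existing_facts):
    return (
        "You manage a fact store. For EACH new fact, choose one action:\n"
        '"add", "update" (give the id to replace), "delete", or "skip".\n'
        "Return a JSON list of {fact, action, target_id} objects.\n\n"
        f"Existing facts: {json.dumps(existing_facts)}\n"
        f"New facts: {json.dumps(new_facts)}"
    )

existing = [{"id": 1, "text": "Lives in Boston"}]
new = ["Moved to San Francisco last month", "Prefers TypeScript"]
prompt = build_resolution_prompt(new, existing)
# A single call over this prompt resolves both facts; a per-fact loop
# would cost two calls and would hide the full picture from the model.
```

Because the model sees every new fact alongside every relevant stored fact, it can catch cross-fact interactions that one-at-a-time resolution misses.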
YMYL safety tier
This one I am most proud of and most worried about. YMYL stands for "Your Money or Your Life," borrowed from Google's search quality guidelines. Health, financial, and legal facts get special treatment: higher minimum importance (floor of 8.0), immunity from time decay, forced contradiction detection, and two-tier confidence scoring to avoid false positives.
I built this because I kept thinking about the use case where someone tells their AI assistant about a medication, and six months later the AI has "forgotten" because it decayed. That cannot happen. Not every fact is created equal, and a memory system that does not understand that is not safe for production.
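In code, the tier is roughly a floor plus flags. The keyword check below is a placeholder I made up; real classification would need an LLM or a trained classifier, and the field names are mine:

```python
# Sketch of applying a YMYL safety tier to a scored fact.
YMYL_KEYWORDS = {"allergy", "medication", "blood thinner", "mortgage", "lawsuit"}
YMYL_FLOOR = 8.0

def score_fact(text: str, base_importance: float) -> dict:
    is_ymyl = any(w in text.lower() for w in YMYL_KEYWORDS)
    importance = max(base_importance, YMYL_FLOOR) if is_ymyl else base_importance
    return {
        "text": text,
        "importance": importance,      # floored at 8.0 for YMYL facts
        "decays": not is_ymyl,         # YMYL facts are immune to time decay
        "check_conflicts": is_ymyl,    # forced contradiction detection
    }

score_fact("Severe peanut allergy", base_importance=5.0)
# -> importance 8.0, decays False, check_conflicts True
```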
Hierarchical retrieval
widemem has three retrieval tiers: individual facts, summaries (groups of related facts), and themes (high-level patterns). When a query comes in, the system automatically routes it to the right tier. "What is their email?" hits the fact tier. "Tell me about their work history" hits the summary tier. "What kind of person is this?" hits the theme tier. No second API call needed.
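The routing decision, as a toy. The keyword heuristic below is only an illustration of the idea, not how widemem actually classifies queries:

```python
# Route a query to one of three retrieval tiers: fact, summary, or theme.
def route_query(query: str) -> str:
    q = query.lower()
    if any(w in q for w in ("what kind of", "personality", "overall")):
        return "theme"      # high-level patterns
    if any(w in q for w in ("tell me about", "history", "summarize")):
        return "summary"    # groups of related facts
    return "fact"           # default: point lookups

route_query("What is their email?")              # -> "fact"
route_query("Tell me about their work history")  # -> "summary"
route_query("What kind of person is this?")      # -> "theme"
```

The payoff is that routing happens before retrieval, so no second API call is needed to pick the right granularity.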
Honest uncertainty
Every search returns a confidence level: high, low, or none. If widemem does not have enough information to answer confidently, it says so. Three modes let you configure the behavior: strict (refuse to answer if unsure), helpful ("I do not have that, but here is what I do know"), or creative ("I can guess, but fair warning it might be wrong"). I think this is essential. A memory system that always returns an answer, even when it is not confident, is a hallucination factory.
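The three modes boil down to a small policy function. The names and wording below are mine, paraphrasing the behavior described above:

```python
# Sketch of confidence-aware answering: only high confidence passes
# through unmodified; otherwise the mode decides what to do.
def answer(memory_hit, confidence: str, mode: str = "strict"):
    if confidence == "high":
        return memory_hit
    if mode == "strict":
        return None                                   # refuse rather than guess
    if mode == "helpful":
        partial = memory_hit or "nothing on that topic"
        return f"I am not sure, but here is what I have: {partial}"
    if mode == "creative":
        return f"Best guess (may be wrong): {memory_hit or 'no stored facts'}"
    raise ValueError(f"unknown mode: {mode}")

answer("works at Acme", "high")                 # -> "works at Acme"
answer("works at Acme", "low", mode="strict")   # -> None
```

Making the caller choose a mode up front forces the "what do we do when unsure?" decision to happen at design time, not at runtime.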
Local-first, no lock-in
SQLite + FAISS out of the box. No API keys needed for storage. No cloud dependency. Your data stays on your machine by default. You can plug in OpenAI, Anthropic, Ollama, or Qdrant if you want, but none of them are required. I use Claude for the LLM calls in my own setup, but widemem works with any provider. This is deliberate. Memory is too personal to be locked into a vendor.
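For flavor, the storage half needs nothing beyond Python's standard library. The schema below is illustrative, not widemem's actual schema; FAISS would sit alongside this for the vector index:

```python
import sqlite3

# Minimal local-first fact table: no API keys, no cloud, one file on disk.
conn = sqlite3.connect(":memory:")  # use a file path for real persistence
conn.execute("""CREATE TABLE facts (
    id INTEGER PRIMARY KEY,
    text TEXT NOT NULL,
    importance REAL NOT NULL,
    created_at TEXT DEFAULT (datetime('now')))""")
conn.execute("INSERT INTO facts (text, importance) VALUES (?, ?)",
             ("Prefers TypeScript", 4.0))
conn.commit()

rows = conn.execute("SELECT text, importance FROM facts").fetchall()
# rows == [('Prefers TypeScript', 4.0)]
```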
6. LOOKING INTO THE FUTURE
I spend a lot of time thinking about where this is all heading. Here are my honest predictions, some confident, some speculative.
Memory becomes infrastructure, not a feature
Right now, memory is a "nice to have" that some LLM providers include and most developers ignore. I think within two years, memory will be as fundamental as a database. Every serious AI application will have a memory layer, the same way every serious web application has a database. The question will not be "do we need memory?" but "which memory system do we use?"
Context windows will keep growing, but they are not the answer
DeepSeek hit 1 million tokens. Others will follow. Eventually we will see 10 million or even unlimited context windows. But context is not memory. Shoving everything into the prompt is expensive, slow, and unreliable for retrieval. The future is intelligent selection: a memory layer that knows what to inject into a 128k context window, not a 10M context window stuffed with everything.
Forgetting will become a first-class feature
The brain does not store everything. It actively forgets to stay functional. AI memory systems will need the same capability. Not just "delete old things" but intelligent pruning: understanding which facts are outdated, which ones contradict newer information, which ones were true once but are not anymore. I think the projects that solve forgetting well will win the space. This is a core bet I am making with widemem.
Personal AI agents will drive adoption
The biggest near-term use case for memory layers is personal AI agents. Not enterprise chatbots, not customer support bots, but the AI assistant that knows you. Knows your projects, your preferences, your health conditions, your schedule. This is what people actually want from AI, and it requires robust, private, local memory. Cloud-only memory solutions will not work here. People will not send their medical history to a third-party API.
Multi-modal memory is coming
Today, memory is mostly text. But LLMs are becoming multi-modal. Memory layers will need to store and retrieve images, audio, video, and structured data alongside text. "Remember this screenshot" or "remember what that chart looked like" will be real use cases. Nobody is doing this well yet. The infrastructure is not ready.
Regulation will force transparency
The EU is already pushing for AI transparency. I think we will see regulation specifically around AI memory: what is stored, how long it is kept, who can access it, and how to delete it. Systems that are opaque about what they remember (looking at most LLM providers) will face pressure to open up. Open-source and local-first approaches will have a natural advantage here.
WHERE I STAND
I think we are in the early innings of AI memory. What we have today, including widemem, is primitive compared to what will exist in three years. But the direction is clear: memory layers are becoming essential infrastructure, forgetting is as important as remembering, safety cannot be an afterthought, and the best memory system is one you can control and inspect.
I built widemem because I needed it for my own work and nobody was solving the problems I cared about. Importance scoring, batch conflict resolution, YMYL safety, honest uncertainty. These are not theoretical features. They come from real problems I hit while testing memory across ChatGPT, Claude, Gemini, and the open-source alternatives.
If you are building something that needs to remember, I think you should try at least two or three memory systems with your actual data. See what comes back. See what gets lost. See what gets confused. The results will surprise you, and they will inform your architecture decisions better than any benchmark.
The future of AI is not just smarter models. It is models that know you. And that requires memory that actually works.
READ RELATED
EVERY LLM MEMORY PROJECT, RATED
A complete directory of 25+ memory systems. Open-source, commercial, and built-in. Star counts, approaches, pricing, and honest notes.
I BUILT A MEMORY LAYER THAT FORGETS ONLY WHAT DOESN'T MATTER
Why forgetting is harder than remembering, and how importance scoring, decay, and batch conflict resolution work under the hood.
THE CONTRADICTION PROBLEM IN AI MEMORY
What happens when AI agents accumulate conflicting facts, and why vector similarity alone cannot detect it.