THE REAL COST OF AI MEMORY: EVERY PROVIDER, COMPARED
AI memory costs range from $0/mo (widemem, LangMem, Cognee self-hosted) to $249/mo (Mem0 Pro) to $475/mo (Zep Flex Plus) before you count LLM API calls, embeddings, or infrastructure. To be fair: paid providers give you real value for that money. Mem0 and Zep handle infrastructure, scaling, uptime, and graph construction so you never think about it. You are paying for managed ops, not just features. The biggest hidden cost is LLM calls during ingestion: Mem0 with graph makes 5+ calls per memory add; Zep burns 600K+ tokens per conversation on graph construction; widemem uses one batched call. This post breaks down every cost layer with real numbers as of April 2026.
| Provider | Plan Cost | Infra | LLM/Embed | Total |
|---|---|---|---|---|
| Mem0 Cloud (Pro) | $249 | $0 | included | ~$249 |
| Zep Cloud (Flex+) | $475 | $0 | included | ~$475 |
| Zep Cloud (Flex) | $25 | $0 | included | ~$25 |
| Letta Pro | $20 | $0 | BYOK | ~$20 + API |
| Cognee Dev | $35 | $0 | included | ~$35 |
| Self-hosted + Neo4j | $0 | ~$90+ | ~$1 | ~$91 |
| Self-hosted + pgvector | $0 | ~$25 | ~$1 | ~$26 |
| LangMem + Supabase | $0 | $0-25 | ~$1 | ~$1-26 |
| widemem (local) | $0 | $0 | $0* | $0 |
| widemem (cloud LLM) | $0 | $0 | ~$0.15 | ~$0.15 |
*$0 assumes a local Ollama model for extraction and local sentence-transformers embeddings.
1. THE FOUR COST LAYERS OF AI MEMORY
Every memory system has four cost layers: (1) the platform subscription, (2) infrastructure, (3) LLM API calls during ingestion, and (4) embeddings. Some providers bundle them into a single price. Others leave you to assemble the pieces. Understanding the layers is the only way to compare honestly.
Cloud providers (Mem0, Zep, Letta, Cognee) bundle layers 1-2 and sometimes 3-4 into the subscription. Self-hosted options strip away layer 1 but hand you layers 2-4. Fully local setups (widemem with Ollama) eliminate all four.
The trap is layer 3. LLM API calls during memory ingestion are where costs hide. A system that makes 5+ LLM calls per memory add can, at scale, run up API fees that rival the platform subscription itself.
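If you want to keep yourself honest while comparing, a tiny spreadsheet-style sketch is enough. The numbers below are illustrative placeholders, not quotes from any provider:

```python
# Back-of-the-envelope monthly total across the four layers.
# Every input is an assumption you supply, not a provider's actual bill.
def monthly_memory_cost(platform: float, infra: float,
                        ingestion_llm: float, embeddings: float) -> float:
    return platform + infra + ingestion_llm + embeddings

# Example: self-hosted stack on a $25/mo managed Postgres with a cheap cloud LLM.
print(monthly_memory_cost(platform=0, infra=25, ingestion_llm=1.00, embeddings=0.05))
# -> 26.05
```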
2. PROVIDER PRICING: WHAT EACH CHARGES
MEM0
| Plan | Price | Memories | Retrievals/mo |
|---|---|---|---|
| Hobby (Free) | $0 | 10,000 | 1,000 |
| Starter | $19/mo | 50,000 | 5,000 |
| Pro | $249/mo | Unlimited | 50,000 |
| Enterprise | Custom | Unlimited | Unlimited |
The critical detail: graph memory, Mem0's strongest feature (and the one that scores highest on benchmarks), requires the $249/mo Pro tier. The free and starter plans use flat memory only. If you want what Mem0 actually advertises in their benchmark results, you pay $249/mo minimum.
ZEP
| Plan | Price | Credits/mo | Rate Limit |
|---|---|---|---|
| Free | $0 | 1,000 episodes | Lower priority |
| Flex | $25/mo | 20,000 credits | 600 req/min |
| Flex Plus | $475/mo | 300,000 credits | 1,000 req/min |
| Enterprise | Custom | Custom | Guaranteed |
Zep is more accessible than Mem0 at the low end. $25/mo gets you full feature access including graph capabilities. But the Flex Plus tier at $475/mo is the most expensive option on this list. Unused credits roll over for 60 days, which helps if usage is uneven.
One catch: Zep deprecated their self-hosted Community Edition. If you want to self-host, you use the Graphiti library directly with your own Neo4j instance. That means paying for Neo4j.
LETTA (FORMERLY MEMGPT)
| Plan | Price | Agents | Notes |
|---|---|---|---|
| Free | $0 | Limited | Rotating free models |
| Pro | $20/mo | 20 agents | BYOK supported |
| Max Lite | $100/mo | 50 agents | 5x higher limits |
| Max | $200/mo | Higher | 20x higher limits |
| API Plan | $20/mo base | Per-agent | $0.10/active agent/mo |
Letta is the most flexible on pricing. BYOK (bring your own keys) on every plan means you control LLM costs directly. The API plan at $0.10 per active agent per month plus $0.00015 per second of tool execution is genuinely pay-as-you-go. Self-hosted is free and deploys to Railway for about $5-10/mo.
COGNEE
| Plan | Price | Documents | API Calls |
|---|---|---|---|
| Free (self-hosted) | $0 | Unlimited | Unlimited |
| Developer | $35/mo | 1,000 docs / 1 GB | 10,000 |
| Cloud (Team) | $200/mo | 2,500 docs / 2 GB | 10,000 |
| On-Prem | Custom | Custom | Custom |
Cognee stands out because graph memory is available at every tier including free. The self-hosted stack uses SQLite + LanceDB + Kuzu, all open source with no external dependencies. Add-on pricing is clear: $35 per extra 1,000 documents.
LANGMEM
Free. MIT license. No API keys, no accounts, no monthly bills. It is a library, not a service. You pay for your own infrastructure (Postgres, embedding APIs, compute). Deep LangGraph integration. The lowest cost option if you are already in the LangChain ecosystem.
WIDEMEM
Free. Apache 2.0 license. SQLite + FAISS locally. No accounts, no API keys for storage, no cloud dependency. With Ollama for LLM extraction and sentence-transformers for embeddings, the total infrastructure cost is $0. Use cloud LLMs (GPT-4o-mini) and the API cost is about $0.15/mo at 10K memories.
3. THE HIDDEN COST: LLM CALLS PER MEMORY ADD
This is where most comparisons stop. They show the subscription price and move on. But the real cost of memory is in the LLM API calls that happen every time you add a memory.
| System | LLM Calls | What They Do |
|---|---|---|
| widemem | 1 (batched) | Extract facts + resolve conflicts in single call |
| Mem0 (flat) | 2+ | Extraction + per-fact update decision |
| Mem0 (graph) | 5+ | Extraction + update + entity extraction + relationship generation + contradiction check |
| Zep/Graphiti | Multiple | Entity extraction + edge comparison + contradiction detection |
| LangMem | Varies | User configures pipeline |
| Cognee | 1-2 | Extraction + optional graph |
At 10,000 memory adds per month with GPT-4o-mini ($0.15/1M input, $0.60/1M output), each call averaging 500 tokens in and 200 tokens out:
| System | Calls | Est. API Cost |
|---|---|---|
| widemem (1 call) | 10,000 | ~$0.15 |
| Mem0 flat (2 calls) | 20,000 | ~$0.30 |
| Mem0 graph (5 calls) | 50,000 | ~$0.75 |
| Zep/Graphiti | 40,000+ | ~$0.60+ |
| widemem (Ollama) | 10,000 | $0.00 |
At 10K memories, the API costs look small. At 100K or 1M memories, they multiply linearly. Mem0 with graph at 1M adds/month: roughly $75 in API calls alone. widemem with Ollama at 1M adds/month: still $0.
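If you want to sanity-check these estimates against your own traffic, the arithmetic is easy to script. A minimal sketch, assuming GPT-4o-mini list prices; the token averages passed in below are placeholders to replace with what you actually observe:

```python
# Monthly ingestion cost driven by LLM calls per memory add.
# Prices are $ per 1M tokens (GPT-4o-mini); token counts are placeholders.
def ingestion_cost(adds_per_month: int, calls_per_add: float,
                   avg_tokens_in: int, avg_tokens_out: int,
                   price_in: float = 0.15, price_out: float = 0.60) -> float:
    calls = adds_per_month * calls_per_add
    return (calls * avg_tokens_in * price_in +
            calls * avg_tokens_out * price_out) / 1_000_000

# The point: cost scales linearly with calls_per_add, so a 5-call
# pipeline always costs ~5x what a single batched call does.
one_call = ingestion_cost(50_000, calls_per_add=1, avg_tokens_in=300, avg_tokens_out=100)
five_calls = ingestion_cost(50_000, calls_per_add=5, avg_tokens_in=300, avg_tokens_out=100)
```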
Published analysis of Zep's graph construction puts it at over 600,000 tokens per conversation. That is thorough. It is also expensive. The quality-cost tradeoff is real, and most comparisons ignore it.
widemem batches all fact extraction and conflict resolution into a single LLM call. If a message contains 4 new facts and 2 contradict existing memories, that is still 1 API call. Mem0 with graph would make 5+ calls for the same operation. At scale, this compounds.
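To illustrate the batched pattern, here is a generic sketch using the OpenAI client; it is not widemem's actual prompt or internal API. One request asks the model to return every extracted fact plus any conflicts with known memories as structured JSON, so the number of facts in a message never changes the call count.

```python
import json
from openai import OpenAI  # pip install openai

client = OpenAI()

def extract_batched(message: str, existing_memories: list[str]) -> dict:
    """One LLM call: extract all new facts and flag all conflicts at once."""
    prompt = (
        "Existing memories:\n" + "\n".join(f"- {m}" for m in existing_memories) +
        f"\n\nNew message:\n{message}\n\n"
        "Return JSON with two keys: 'facts' (new facts worth storing) and "
        "'conflicts' (existing memories the message contradicts)."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

# Four new facts and two contradictions in one message still cost one call.
result = extract_batched(
    "I moved from Berlin to Lisbon last month and switched from Python to Go.",
    ["User lives in Berlin", "User's main language is Python"],
)
```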
4. EMBEDDING COSTS: THE OTHER HIDDEN LAYER
Every memory system needs to convert text into vectors. Some do this via cloud APIs. Others run embeddings locally.
| Model | Standard ($/1M tokens) | Batch ($/1M tokens) | Dimensions |
|---|---|---|---|
| OpenAI text-embedding-3-small | $0.02 | $0.01 | 1536 |
| OpenAI text-embedding-3-large | $0.13 | $0.065 | 3072 |
| Voyage AI voyage-3.5 | $0.06 | ~$0.04 | 1024 |
| Cohere Embed v4 | $0.12 | -- | 1536 |
| Cohere Embed v3 | $0.10 | -- | 1024 |
| sentence-transformers (local) | $0.00 | $0.00 | 384-1024 |
At 10K memories averaging 200 tokens each (2M tokens total), OpenAI's cheapest embedding costs $0.04. Cohere costs $0.24. Local sentence-transformers cost nothing and run on any machine with Python installed.
For most use cases, embedding cost is negligible. It only matters at extreme scale (millions of memories) or when using expensive models like text-embedding-3-large.
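The $0 row is easy to verify yourself. A minimal sketch with sentence-transformers; the model choice here is just an example:

```python
# pip install sentence-transformers
# Runs entirely on local CPU; no API key, no per-token cost.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings

memories = [
    "User prefers dark mode in every app.",
    "User is allergic to peanuts.",
]
vectors = model.encode(memories, normalize_embeddings=True)
print(vectors.shape)  # (2, 384)
```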
5. INFRASTRUCTURE: THE COST NOBODY MENTIONS
Cloud memory providers bundle infrastructure into their subscription. Self-hosted options require you to provision and pay for it yourself.
VECTOR DATABASES
| Provider | Free Tier | Paid Starting At |
|---|---|---|
| FAISS (local) | Unlimited | $0 (runs in-process) |
| ChromaDB | 1M embeddings | Usage-based |
| Supabase pgvector | 500 MB | $25/mo |
| Pinecone Serverless | 2 GB | ~$8/1M reads |
| Qdrant Cloud | 1 GB RAM | ~$150/mo (8 GB) |
| Weaviate Cloud | 14-day trial | $45/mo minimum |
GRAPH DATABASES (REQUIRED BY ZEP SELF-HOSTED)
| Provider | Free Tier | Paid |
|---|---|---|
| Neo4j Aura Free | 50K nodes, 175K relationships | $0 |
| Neo4j Aura Pro | -- | $65/GB/mo |
| Neo4j Aura Business | -- | $146/GB/mo |
If you self-host Zep (now via the Graphiti library), you need Neo4j. The free tier works for prototyping (50K nodes). Production use starts at $65/GB/month. For a memory system with 100K+ entities and relationships, expect $65-200/mo for Neo4j alone.
THE ZERO-INFRASTRUCTURE OPTION
widemem, LangMem, and Cognee (self-hosted) can run with zero external infrastructure. widemem uses SQLite for metadata and FAISS for vectors, both running in-process. No database server, no cloud account, no connection string. pip install and go.
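To show what zero external infrastructure means in practice, here is an in-process store in the same spirit, built directly on sqlite3 and FAISS. It is an illustrative sketch, not widemem's actual schema or API:

```python
# pip install faiss-cpu numpy
# Everything lives in one process: a SQLite file for text, a FAISS index for vectors.
import sqlite3
import faiss
import numpy as np

DIM = 384  # match your embedding model

db = sqlite3.connect("memories.db")
db.execute("CREATE TABLE IF NOT EXISTS memories (id INTEGER PRIMARY KEY, text TEXT)")
index = faiss.IndexFlatIP(DIM)  # inner-product index; pass normalized vectors

def add_memory(text: str, vector: np.ndarray) -> None:
    # SQLite ids start at 1 and FAISS positions at 0; they stay aligned
    # because every add inserts exactly one row and one vector.
    db.execute("INSERT INTO memories (text) VALUES (?)", (text,))
    db.commit()
    index.add(vector.reshape(1, DIM).astype("float32"))

def search(query_vector: np.ndarray, k: int = 5) -> list[str]:
    _, ids = index.search(query_vector.reshape(1, DIM).astype("float32"), k)
    results = []
    for i in ids[0]:
        if i == -1:  # fewer than k vectors stored
            continue
        row = db.execute("SELECT text FROM memories WHERE id = ?", (int(i) + 1,)).fetchone()
        if row:
            results.append(row[0])
    return results
```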
6. TOTAL COST: THREE REAL SCENARIOS
SCENARIO A: SOLO DEVELOPER / SIDE PROJECT
1,000 memories/month, 5,000 retrievals. Building a chatbot with persistent memory.
| Option | Platform | Infra | API | Total |
|---|---|---|---|---|
| Mem0 Free | $0 | $0 | $0 | $0 (1K retrieval limit) |
| Zep Free | $0 | $0 | $0 | $0 (1K episode limit) |
| Letta Free | $0 | $0 | $0 | $0 (limited models) |
| widemem + Ollama | $0 | $0 | $0 | $0 (no limits) |
| LangMem + Supabase Free | $0 | $0 | ~$0.05 | ~$0.05 |
At this scale, most options are free. The difference is limits. Mem0 caps you at 1,000 retrievals. Zep caps at 1,000 episodes. widemem and LangMem have no caps.
SCENARIO B: STARTUP / PRODUCTION APP
50,000 memories/month, 200,000 retrievals. Multi-user app with real traffic.
| Option | Platform | Infra | API | Total |
|---|---|---|---|---|
| Mem0 Pro | $249 | $0 | incl. | ~$249 |
| Zep Flex Plus | $475 | $0 | incl. | ~$475 |
| Zep Flex + overage | $25+ | $0 | incl. | ~$100 |
| Letta Max Lite | $100 | $0 | BYOK ~$5 | ~$105 |
| Cognee Team | $200 | $0 | incl. | ~$200 |
| Self-hosted + Supabase | $0 | $25 | ~$3 | ~$28 |
| widemem + GPT-4o-mini | $0 | $0 | ~$0.75 | ~$1 |
| widemem + Ollama | $0 | $0 | $0 | $0 |
The spread is enormous. Zep Flex Plus at $475/mo vs widemem at $0. Even widemem with cloud LLM (GPT-4o-mini) runs under $1/mo because batched extraction keeps API calls low.
The catch: widemem at $0 means running Ollama on your own hardware. You need a machine with at least 8 GB RAM. A $5/mo VPS or your existing server works. Mem0 at $249 means zero infrastructure management. The tradeoff is cost vs convenience.
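Running that $0 path means pointing extraction at a local Ollama server instead of a cloud API. A minimal sketch; the model name is just an example:

```python
# pip install ollama  (and run `ollama pull llama3.1` once)
# All inference stays on your own hardware; no per-token charges.
import ollama

response = ollama.chat(
    model="llama3.1",
    messages=[{
        "role": "user",
        "content": "Extract the facts worth remembering from: "
                   "'I just switched teams and now report to Dana.'",
    }],
)
print(response["message"]["content"])
```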
SCENARIO C: ENTERPRISE / HIGH VOLUME
500,000 memories/month, 2,000,000 retrievals. Enterprise deployment with strict requirements.
| Option | Platform | Infra | API | Total |
|---|---|---|---|---|
| Mem0 Enterprise | Custom | $0 | incl. | $1,000+ (est.) |
| Zep Enterprise | Custom | $0 | incl. | $2,000+ (est.) |
| Self-hosted + Neo4j | $0 | $250+ | ~$40 | ~$290 |
| Self-hosted + pgvector | $0 | $60+ | ~$40 | ~$100 |
| widemem + dedicated GPU | $0 | $60 | $0 | ~$60 |
At enterprise scale, self-hosted always wins on cost. The question is whether you have the team to operate it. A Hetzner dedicated server with 64 GB RAM runs about $60/mo and can handle Ollama, widemem, and all your vector storage locally.
7. WHAT DO YOU GET FOR THE MONEY?
Cost alone is meaningless without knowing what each system delivers. Here is what your money buys at each tier.
| Feature | Mem0 | Zep | Letta | Cognee | widemem |
|---|---|---|---|---|---|
| Graph memory | $249/mo | $25/mo | No | Free | Planned |
| Flat memory | Free | Free | Free | Free | Free |
| Importance scoring | No | No | No | No | Yes |
| Temporal decay | No | No | No | No | Yes |
| Contradiction detection | Yes | Yes | No | Yes | Yes |
| YMYL safety | No | No | No | No | Yes |
| Confidence scoring | No | No | No | No | Yes |
| Hierarchical memory | No | No | Yes | Yes | Yes |
| Fully local option | No | No | Yes | Yes | Yes |
| MCP server | Yes | No | No | No | Yes |
Mem0 and Zep lead on graph memory quality. Their benchmark scores are higher overall. But their highest-scoring features sit behind expensive tiers ($249/mo for Mem0 graph).
widemem leads on features that no one else offers: importance scoring, temporal decay, YMYL safety, and confidence modes. It also leads on multi-hop benchmark performance (56.54%, beating Mem0 at 51.15%). And it costs $0.
The honest answer: if you need the best possible overall accuracy and can afford $249/mo, Mem0 Pro with graph scores highest on benchmarks. If you need specific capabilities (importance, decay, YMYL, confidence) or cannot justify $249/mo, widemem gives you more features for less money.
8. HOW TO CHOOSE
CHOOSE MEM0 IF:
You need the highest overall benchmark accuracy, can afford $249/mo for graph, and want a managed service with no infrastructure to maintain. Best for teams that value accuracy over cost.
CHOOSE ZEP IF:
You want graph memory at a lower entry point ($25/mo), need credit-based billing that scales with usage, and value the Graphiti library for self-hosted options.
CHOOSE LETTA IF:
You are building agent-based applications, want BYOK control over LLM costs, and need per-agent billing. The most agent-native option.
CHOOSE WIDEMEM IF:
You need features no one else has (importance scoring, decay, YMYL, confidence), want $0 infrastructure cost, care about privacy (local-first), or need multi-hop reasoning (best-in-class at 56.54%). Best for developers who want control and cannot justify $249/mo for a managed service.
CHOOSE LANGMEM IF:
You are already in the LangChain/LangGraph ecosystem and want a library, not a service. Lowest friction if you use LangChain for everything else.
9. THE REAL QUESTION
"How much does AI memory cost?" is the wrong question. The right question is: "How much does AI memory cost per unit of value it delivers?"
A memory system that costs $249/mo but saves your users from repeating themselves every session might pay for itself in retention alone. A system that costs $0 but requires 40 hours of setup might cost more in engineering time than a year of Mem0 Pro.
The numbers in this post are the starting point. The real cost depends on your scale, your team, your tolerance for infrastructure management, and which features actually matter for your use case.
What is clear: the cost spread is 1000x between the cheapest and most expensive options, and many teams are paying for features they do not use. Know the layers. Do the math. Pick the system that fits.
SOURCES AND METHODOLOGY
All pricing confirmed from official pricing pages as of April 2026:
mem0.ai/pricing | getzep.com/pricing | docs.letta.com | cognee.ai/pricing
pinecone.io/pricing | qdrant.tech/pricing | neo4j.com (Aura pricing) | supabase.com/pricing
OpenAI embedding pricing | Voyage AI pricing | Cohere pricing
LLM call counts per operation: estimated from source code analysis and published documentation. Zep 600K+ token figure: vectorize.io comparison analysis.
widemem benchmark data: LoCoMo v2 results. Full methodology at widemem.ai/blog/context-windows.