THE REAL COST OF AI MEMORY: EVERY PROVIDER, COMPARED
AI memory costs range from $0/mo (widemem, LangMem, Cognee self-hosted) to $249/mo (Mem0 Pro) to $475/mo (Zep Flex Plus) before you count LLM API calls, embeddings, or infrastructure. To be fair: paid providers give you real value for that money. Mem0 and Zep handle infrastructure, scaling, uptime, and graph construction so you never think about it. You are paying for managed ops, not just features. The biggest hidden cost is LLM calls during ingestion: Mem0 with graph makes 5+ calls per memory add; Zep burns 600K+ tokens per conversation on graph construction; widemem uses one batched call. This post breaks down every cost layer with real numbers as of April 2026.
| Provider | Plan Cost | Infra | LLM/Embed | Total |
|---|---|---|---|---|
| Mem0 Cloud (Pro) | $249 | $0 | included | ~$249 |
| Zep Cloud (Flex+) | $475 | $0 | included | ~$475 |
| Zep Cloud (Flex) | $25 | $0 | included | ~$25 |
| Letta Pro | $20 | $0 | BYOK | ~$20 + API |
| Cognee Dev | $35 | $0 | included | ~$35 |
| Self-hosted + Neo4j | $0 | ~$90+ | ~$1 | ~$91 |
| Self-hosted + pgvector | $0 | ~$25 | ~$1 | ~$26 |
| LangMem + Supabase | $0 | $0-25 | ~$1 | ~$1-26 |
| widemem (local) | $0 | $0 | $0* | $0 |
| widemem (cloud LLM) | $0 | $0 | ~$0.15 | ~$0.15 |
*$0 assumes a local Ollama model for extraction and local sentence-transformers embeddings.
1. THE FOUR COST LAYERS OF AI MEMORY
Every memory system has four cost layers: (1) the platform subscription, (2) infrastructure, (3) LLM API calls during ingestion, and (4) embeddings. Some providers bundle them into a single price. Others leave you to assemble the pieces. Understanding the layers is the only way to compare honestly.
Cloud providers (Mem0, Zep, Letta, Cognee) bundle layers 1-2 and sometimes 3-4 into the subscription. Self-hosted options strip away layer 1 but hand you layers 2-4. Fully local setups (widemem with Ollama) eliminate all four.
The trap is layer 3. LLM API calls during memory ingestion are where costs hide. A system that makes 5+ LLM calls per memory add can, at scale, run up API fees that rival the platform subscription itself.
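If you want to keep yourself honest while comparing, a tiny spreadsheet-style sketch is enough. The numbers below are illustrative placeholders, not quotes from any provider:

```python
# Back-of-the-envelope monthly total across the four layers.
# Every input is an assumption you supply, not a provider's actual bill.
def monthly_memory_cost(platform: float, infra: float,
                        ingestion_llm: float, embeddings: float) -> float:
    return platform + infra + ingestion_llm + embeddings

# Example: self-hosted stack on a $25/mo managed Postgres with a cheap cloud LLM.
print(monthly_memory_cost(platform=0, infra=25, ingestion_llm=1.00, embeddings=0.05))
# -> 26.05
```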
2. PROVIDER PRICING: WHAT EACH CHARGES
MEM0
| Plan | Price | Memories | Retrievals/mo |
|---|---|---|---|
| Hobby (Free) | $0 | 10,000 | 1,000 |
| Starter | $19/mo | 50,000 | 5,000 |
| Pro | $249/mo | Unlimited | 50,000 |
| Enterprise | Custom | Unlimited | Unlimited |
The critical detail: graph memory, Mem0's strongest feature (and the one that scores highest on benchmarks), requires the $249/mo Pro tier. The free and starter plans use flat memory only. If you want what Mem0 actually advertises in their benchmark results, you pay $249/mo minimum.
ZEP
| Plan | Price | Credits/mo | Rate Limit |
|---|---|---|---|
| Free | $0 | 1,000 episodes | Lower priority |
| Flex | $25/mo | 20,000 credits | 600 req/min |
| Flex Plus | $475/mo | 300,000 credits | 1,000 req/min |
| Enterprise | Custom | Custom | Guaranteed |
Zep is more accessible than Mem0 at the low end. $25/mo gets you full feature access including graph capabilities. But the Flex Plus tier at $475/mo is the most expensive option on this list. Unused credits roll over for 60 days, which helps if usage is uneven.
One catch: Zep deprecated their self-hosted Community Edition. If you want to self-host, you use the Graphiti library directly with your own Neo4j instance. That means paying for Neo4j.
LETTA (FORMERLY MEMGPT)
| Plan | Price | Agents | Notes |
|---|---|---|---|
| Free | $0 | Limited | Rotating free models |
| Pro | $20/mo | 20 agents | BYOK supported |
| Max Lite | $100/mo | 50 agents | 5x higher limits |
| Max | $200/mo | Higher | 20x higher limits |
| API Plan | $20/mo base | Per-agent | $0.10/active agent/mo |
Letta is the most flexible on pricing. BYOK (bring your own keys) on every plan means you control LLM costs directly. The API plan at $0.10 per active agent per month plus $0.00015 per second of tool execution is genuinely pay-as-you-go. Self-hosted is free and deploys to Railway for about $5-10/mo.
COGNEE
| Plan | Price | Documents | API Calls |
|---|---|---|---|
| Free (self-hosted) | $0 | Unlimited | Unlimited |
| Developer | $35/mo | 1,000 docs / 1 GB | 10,000 |
| Cloud (Team) | $200/mo | 2,500 docs / 2 GB | 10,000 |
| On-Prem | Custom | Custom | Custom |
Cognee stands out because graph memory is available at every tier including free. The self-hosted stack uses SQLite + LanceDB + Kuzu, all open source with no external dependencies. Add-on pricing is clear: $35 per extra 1,000 documents.
LANGMEM
Free. MIT license. No API keys, no accounts, no monthly bills. It is a library, not a service. You pay for your own infrastructure (Postgres, embedding APIs, compute). Deep LangGraph integration. The lowest cost option if you are already in the LangChain ecosystem.
WIDEMEM
Free. Apache 2.0 license. SQLite + FAISS locally. No accounts, no API keys for storage, no cloud dependency. With Ollama for LLM extraction and sentence-transformers for embeddings, the total infrastructure cost is $0. Use cloud LLMs (GPT-4o-mini) and the API cost is about $0.15/mo at 10K memories.
3. THE HIDDEN COST: LLM CALLS PER MEMORY ADD
This is where most comparisons stop. They show the subscription price and move on. But the real cost of memory is in the LLM API calls that happen every time you add a memory.
| System | LLM Calls | What They Do |
|---|---|---|
| widemem | 1 (batched) | Extract facts + resolve conflicts in single call |
| Mem0 (flat) | 2+ | Extraction + per-fact update decision |
| Mem0 (graph) | 5+ | Extraction + update + entity extraction + relationship generation + contradiction check |
| Zep/Graphiti | Multiple | Entity extraction + edge comparison + contradiction detection |
| LangMem | Varies | User configures pipeline |
| Cognee | 1-2 | Extraction + optional graph |
At 10,000 memory adds per month with GPT-4o-mini ($0.15/1M input, $0.60/1M output), each call averaging 500 tokens in and 200 tokens out:
| System | Calls | Est. API Cost |
|---|---|---|
| widemem (1 call) | 10,000 | ~$0.15 |
| Mem0 flat (2 calls) | 20,000 | ~$0.30 |
| Mem0 graph (5 calls) | 50,000 | ~$0.75 |
| Zep/Graphiti | 40,000+ | ~$0.60+ |
| widemem (Ollama) | 10,000 | $0.00 |
At 10K memories, the API costs look small. At 100K or 1M memories, they multiply linearly. Mem0 with graph at 1M adds/month: roughly $75 in API calls alone. widemem with Ollama at 1M adds/month: still $0.
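If you want to sanity-check these estimates against your own traffic, the arithmetic is easy to script. A minimal sketch, assuming GPT-4o-mini list prices; the token averages passed in below are placeholders to replace with what you actually observe:

```python
# Monthly ingestion cost driven by LLM calls per memory add.
# Prices are $ per 1M tokens (GPT-4o-mini); token counts are placeholders.
def ingestion_cost(adds_per_month: int, calls_per_add: float,
                   avg_tokens_in: int, avg_tokens_out: int,
                   price_in: float = 0.15, price_out: float = 0.60) -> float:
    calls = adds_per_month * calls_per_add
    return (calls * avg_tokens_in * price_in +
            calls * avg_tokens_out * price_out) / 1_000_000

# The point: cost scales linearly with calls_per_add, so a 5-call
# pipeline always costs ~5x what a single batched call does.
one_call = ingestion_cost(50_000, calls_per_add=1, avg_tokens_in=300, avg_tokens_out=100)
five_calls = ingestion_cost(50_000, calls_per_add=5, avg_tokens_in=300, avg_tokens_out=100)
```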
Published analysis of Zep's graph construction puts it at over 600,000 tokens per conversation. That is thorough. It is also expensive. The quality-cost tradeoff is real, and most comparisons ignore it.
widemem batches all fact extraction and conflict resolution into a single LLM call. If a message contains 4 new facts and 2 contradict existing memories, that is still 1 API call. Mem0 with graph would make 5+ calls for the same operation. At scale, this compounds.
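To illustrate the batched pattern, here is a generic sketch using the OpenAI client; it is not widemem's actual prompt or internal API. One request asks the model to return every extracted fact plus any conflicts with known memories as structured JSON, so the number of facts in a message never changes the call count.

```python
import json
from openai import OpenAI  # pip install openai

client = OpenAI()

def extract_batched(message: str, existing_memories: list[str]) -> dict:
    """One LLM call: extract all new facts and flag all conflicts at once."""
    prompt = (
        "Existing memories:\n" + "\n".join(f"- {m}" for m in existing_memories) +
        f"\n\nNew message:\n{message}\n\n"
        "Return JSON with two keys: 'facts' (new facts worth storing) and "
        "'conflicts' (existing memories the message contradicts)."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

# Four new facts and two contradictions in one message still cost one call.
result = extract_batched(
    "I moved from Berlin to Lisbon last month and switched from Python to Go.",
    ["User lives in Berlin", "User's main language is Python"],
)
```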
4. EMBEDDING COSTS: THE OTHER HIDDEN LAYER
Every memory system needs to convert text into vectors. Some do this via cloud APIs. Others run embeddings locally.
| Model | Standard ($/1M tokens) | Batch ($/1M tokens) | Dimensions |
|---|---|---|---|
| OpenAI text-embedding-3-small | $0.02 | $0.01 | 1536 |
| OpenAI text-embedding-3-large | $0.13 | $0.065 | 3072 |
| Voyage AI voyage-3.5 | $0.06 | ~$0.04 | 1024 |
| Cohere Embed v4 | $0.12 | -- | 1536 |
| Cohere Embed v3 | $0.10 | -- | 1024 |
| sentence-transformers (local) | $0.00 | $0.00 | 384-1024 |
At 10K memories averaging 200 tokens each (2M tokens total), OpenAI's cheapest embedding costs $0.04. Cohere costs $0.24. Local sentence-transformers cost nothing and run on any machine with Python installed.
For most use cases, embedding cost is negligible. It only matters at extreme scale (millions of memories) or when using expensive models like text-embedding-3-large.
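The $0 row is easy to verify yourself. A minimal sketch with sentence-transformers; the model choice here is just an example:

```python
# pip install sentence-transformers
# Runs entirely on local CPU; no API key, no per-token cost.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings

memories = [
    "User prefers dark mode in every app.",
    "User is allergic to peanuts.",
]
vectors = model.encode(memories, normalize_embeddings=True)
print(vectors.shape)  # (2, 384)
```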
5. INFRASTRUCTURE: THE COST NOBODY MENTIONS
Cloud memory providers bundle infrastructure into their subscription. Self-hosted options require you to provision and pay for it yourself.
VECTOR DATABASES
| Provider | Free Tier | Paid Starting At |
|---|---|---|
| FAISS (local) | Unlimited | $0 (runs in-process) |
| ChromaDB | 1M embeddings | Usage-based |
| Supabase pgvector | 500 MB | $25/mo |
| Pinecone Serverless | 2 GB | ~$8/1M reads |
| Qdrant Cloud | 1 GB RAM | ~$150/mo (8 GB) |
| Weaviate Cloud | 14-day trial | $45/mo minimum |
GRAPH DATABASES (REQUIRED BY ZEP SELF-HOSTED)
| Provider | Free Tier | Paid |
|---|---|---|
| Neo4j Aura Free | 50K nodes, 175K relationships | $0 |
| Neo4j Aura Pro | -- | $65/GB/mo |
| Neo4j Aura Business | -- | $146/GB/mo |
If you self-host Zep (now via the Graphiti library), you need Neo4j. The free tier works for prototyping (50K nodes). Production use starts at $65/GB/month. For a memory system with 100K+ entities and relationships, expect $65-200/mo for Neo4j alone.
THE ZERO-INFRASTRUCTURE OPTION
widemem, LangMem, and Cognee (self-hosted) can run with zero external infrastructure. widemem uses SQLite for metadata and FAISS for vectors, both running in-process. No database server, no cloud account, no connection string. pip install and go.
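To show what zero external infrastructure means in practice, here is an in-process store in the same spirit, built directly on sqlite3 and FAISS. It is an illustrative sketch, not widemem's actual schema or API:

```python
# pip install faiss-cpu numpy
# Everything lives in one process: a SQLite file for text, a FAISS index for vectors.
import sqlite3
import faiss
import numpy as np

DIM = 384  # match your embedding model

db = sqlite3.connect("memories.db")
db.execute("CREATE TABLE IF NOT EXISTS memories (id INTEGER PRIMARY KEY, text TEXT)")
index = faiss.IndexFlatIP(DIM)  # inner-product index; pass normalized vectors

def add_memory(text: str, vector: np.ndarray) -> None:
    # SQLite ids start at 1 and FAISS positions at 0; they stay aligned
    # because every add inserts exactly one row and one vector.
    db.execute("INSERT INTO memories (text) VALUES (?)", (text,))
    db.commit()
    index.add(vector.reshape(1, DIM).astype("float32"))

def search(query_vector: np.ndarray, k: int = 5) -> list[str]:
    _, ids = index.search(query_vector.reshape(1, DIM).astype("float32"), k)
    results = []
    for i in ids[0]:
        if i == -1:  # fewer than k vectors stored
            continue
        row = db.execute("SELECT text FROM memories WHERE id = ?", (int(i) + 1,)).fetchone()
        if row:
            results.append(row[0])
    return results
```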
6. TOTAL COST: THREE REAL SCENARIOS
SCENARIO A: SOLO DEVELOPER / SIDE PROJECT
1,000 memories/month, 5,000 retrievals. Building a chatbot with persistent memory.
| Option | Platform | Infra | API | Total |
|---|---|---|---|---|
| Mem0 Free | $0 | $0 | $0 | $0 (1K retrieval limit) |
| Zep Free | $0 | $0 | $0 | $0 (1K episode limit) |
| Letta Free | $0 | $0 | $0 | $0 (limited models) |
| widemem + Ollama | $0 | $0 | $0 | $0 (no limits) |
| LangMem + Supabase Free | $0 | $0 | ~$0.05 | ~$0.05 |
At this scale, most options are free. The difference is limits. Mem0 caps you at 1,000 retrievals. Zep caps at 1,000 episodes. widemem and LangMem have no caps.
SCENARIO B: STARTUP / PRODUCTION APP
50,000 memories/month, 200,000 retrievals. Multi-user app with real traffic.
| Option | Platform | Infra | API | Total |
|---|---|---|---|---|
| Mem0 Pro | $249 | $0 | incl. | ~$249 |
| Zep Flex Plus | $475 | $0 | incl. | ~$475 |
| Zep Flex + overage | $25+ | $0 | incl. | ~$100 |
| Letta Max Lite | $100 | $0 | BYOK ~$5 | ~$105 |
| Cognee Team | $200 | $0 | incl. | ~$200 |
| Self-hosted + Supabase | $0 | $25 | ~$3 | ~$28 |
| widemem + GPT-4o-mini | $0 | $0 | ~$0.75 | ~$1 |
| widemem + Ollama | $0 | $0 | $0 | $0 |
The spread is enormous. Zep Flex Plus at $475/mo vs widemem at $0. Even widemem with cloud LLM (GPT-4o-mini) runs under $1/mo because batched extraction keeps API calls low.
The catch: widemem at $0 means running Ollama on your own hardware. You need a machine with at least 8 GB RAM. A $5/mo VPS or your existing server works. Mem0 at $249 means zero infrastructure management. The tradeoff is cost vs convenience.
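Running that $0 path means pointing extraction at a local Ollama server instead of a cloud API. A minimal sketch; the model name is just an example:

```python
# pip install ollama  (and run `ollama pull llama3.1` once)
# All inference stays on your own hardware; no per-token charges.
import ollama

response = ollama.chat(
    model="llama3.1",
    messages=[{
        "role": "user",
        "content": "Extract the facts worth remembering from: "
                   "'I just switched teams and now report to Dana.'",
    }],
)
print(response["message"]["content"])
```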
SCENARIO C: ENTERPRISE / HIGH VOLUME
500,000 memories/month, 2,000,000 retrievals. Enterprise deployment with strict requirements.
| Option | Platform | Infra | API | Total |
|---|---|---|---|---|
| Mem0 Enterprise | Custom | $0 | incl. | $1,000+ (est.) |
| Zep Enterprise | Custom | $0 | incl. | $2,000+ (est.) |
| Self-hosted + Neo4j | $0 | $250+ | ~$40 | ~$290 |
| Self-hosted + pgvector | $0 | $60+ | ~$40 | ~$100 |
| widemem + dedicated GPU | $0 | $60 | $0 | ~$60 |
At enterprise scale, self-hosted always wins on cost. The question is whether you have the team to operate it. A Hetzner dedicated server with 64 GB RAM runs about $60/mo and can handle Ollama, widemem, and all your vector storage locally.
7. WHAT DO YOU GET FOR THE MONEY?
Cost alone is meaningless without knowing what each system delivers. Here is what your money buys at each tier.
| Feature | Mem0 | Zep | Letta | Cognee | widemem |
|---|---|---|---|---|---|
| Graph memory | $249/mo | $25/mo | No | Free | Planned |
| Flat memory | Free | Free | Free | Free | Free |
| Importance scoring | No | No | No | No | Yes |
| Temporal decay | No | No | No | No | Yes |
| Contradiction detection | Yes | Yes | No | Yes | Yes |
| YMYL safety | No | No | No | No | Yes |
| Confidence scoring | No | No | No | No | Yes |
| Hierarchical memory | No | No | Yes | Yes | Yes |
| Fully local option | No | No | Yes | Yes | Yes |
| MCP server | Yes | No | No | No | Yes |
Mem0 and Zep lead on graph memory quality. Their benchmark scores are higher overall. But their highest-scoring features sit behind expensive tiers ($249/mo for Mem0 graph).
widemem leads on features that no one else offers: importance scoring, temporal decay, YMYL safety, and confidence modes. It also leads on multi-hop benchmark performance (56.54%, beating Mem0 at 51.15%). And it costs $0.
The honest answer: if you need the best possible overall accuracy and can afford $249/mo, Mem0 Pro with graph scores highest on benchmarks. If you need specific capabilities (importance, decay, YMYL, confidence) or cannot justify $249/mo, widemem gives you more features for less money.
8. HOW TO CHOOSE
CHOOSE MEM0 IF:
You need the highest overall benchmark accuracy, can afford $249/mo for graph, and want a managed service with no infrastructure to maintain. Best for teams that value accuracy over cost.
CHOOSE ZEP IF:
You want graph memory at a lower entry point ($25/mo), need credit-based billing that scales with usage, and value the Graphiti library for self-hosted options.
CHOOSE LETTA IF:
You are building agent-based applications, want BYOK control over LLM costs, and need per-agent billing. The most agent-native option.
CHOOSE WIDEMEM IF:
You need features no one else has (importance scoring, decay, YMYL, confidence), want $0 infrastructure cost, care about privacy (local-first), or need multi-hop reasoning (best-in-class at 56.54%). Best for developers who want control and cannot justify $249/mo for a managed service.
CHOOSE LANGMEM IF:
You are already in the LangChain/LangGraph ecosystem and want a library, not a service. Lowest friction if you use LangChain for everything else.
9. THE REAL QUESTION
"How much does AI memory cost?" is the wrong question. The right question is: "How much does AI memory cost per unit of value it delivers?"
A memory system that costs $249/mo but saves your users from repeating themselves every session might pay for itself in retention alone. A system that costs $0 but requires 40 hours of setup might cost more in engineering time than a year of Mem0 Pro.
The numbers in this post are the starting point. The real cost depends on your scale, your team, your tolerance for infrastructure management, and which features actually matter for your use case.
What is clear: the cost spread is 1000x between the cheapest and most expensive options, and many teams are paying for features they do not use. Know the layers. Do the math. Pick the system that fits.
SOURCES AND METHODOLOGY
All pricing confirmed from official pricing pages as of April 2026:
mem0.ai/pricing | getzep.com/pricing | docs.letta.com | cognee.ai/pricing
pinecone.io/pricing | qdrant.tech/pricing | neo4j.com (Aura pricing) | supabase.com/pricing
OpenAI embedding pricing | Voyage AI pricing | Cohere pricing
LLM call counts per operation: estimated from source code analysis and published documentation. Zep 600K+ token figure: vectorize.io comparison analysis.
widemem benchmark data: LoCoMo v2 results. Full methodology at widemem.ai/blog/context-windows.