THE REAL COST OF AI MEMORY: EVERY PROVIDER, COMPARED

TLDR

AI memory costs range from $0/mo (widemem, LangMem, Cognee self-hosted) to $249/mo (Mem0 Pro) to $475/mo (Zep Flex Plus) before you count LLM API calls, embeddings, or infrastructure. To be fair: paid providers give you real value for that money. Mem0 and Zep handle infrastructure, scaling, uptime, and graph construction so you never think about it. You are paying for managed ops, not just features. The biggest hidden cost is LLM calls during ingestion: Mem0 with graph makes 5+ calls per memory add, Zep burns 600K+ tokens per conversation on graph construction. widemem uses 1 batched call. This post breaks down every cost layer with real numbers as of April 2026.

QUICK COMPARISON: MONTHLY COST AT 10K MEMORIES
Provider | Plan Cost | Infra | LLM/Embed | Total
Mem0 Cloud (Pro) | $249 | $0 | included | ~$249
Zep Cloud (Flex+) | $475 | $0 | included | ~$475
Zep Cloud (Flex) | $25 | $0 | included | ~$25
Letta Pro | $20 | $0 | BYOK | ~$20 + API
Cognee Dev | $35 | $0 | included | ~$35
Self-hosted + Neo4j | $0 | ~$90+ | ~$1 | ~$91
Self-hosted + pgvector | $0 | ~$25 | ~$1 | ~$26
LangMem + Supabase | $0 | $0-25 | ~$1 | ~$1-26
widemem (local) | $0 | $0 | $0* | $0
widemem (cloud LLM) | $0 | $0 | ~$0.15 | ~$0.15
* Using Ollama + sentence-transformers locally. Cloud LLM row uses GPT-4o-mini.

1. THE FOUR COST LAYERS OF AI MEMORY

Every memory system has four cost layers. Some providers bundle them into a single price. Others leave you to assemble the pieces. Understanding the layers is the only way to compare honestly.

1. Platform fee: monthly subscription to the provider
2. Infrastructure: database, vector store, hosting
3. LLM API calls: extraction, conflict resolution, graph construction
4. Embedding calls: converting text to vectors

Cloud providers (Mem0, Zep, Letta, Cognee) bundle layers 1-2 and sometimes 3-4 into the subscription. Self-hosted options strip away layer 1 but hand you layers 2-4. Fully local setups (widemem with Ollama) eliminate all four.

The trap is layer 3. LLM API calls during memory ingestion are where costs hide. A system that makes 5 LLM calls per memory add will, at scale, cost more in API fees than the platform subscription itself.
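
The four layers sum directly; here is a minimal sketch in Python, using the article's own estimates for the self-hosted + Neo4j row (all figures USD/month):

```python
def total_monthly_cost(platform=0.0, infra=0.0, llm_api=0.0, embeddings=0.0):
    """Sum the four cost layers of a memory system (USD/month)."""
    return platform + infra + llm_api + embeddings

# Self-hosted + Neo4j, per the quick-comparison table:
# no platform fee, ~$90 of infrastructure, ~$1 of LLM/embedding calls
self_hosted = total_monthly_cost(platform=0, infra=90, llm_api=1)  # ~$91
```

Trivial, but it forces every comparison to account for all four layers instead of stopping at the subscription price.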


2. PROVIDER PRICING: WHAT EACH CHARGES

MEM0

Plan | Price | Memories | Retrievals/mo
Hobby (Free) | $0 | 10,000 | 1,000
Starter | $19/mo | 50,000 | 5,000
Pro | $249/mo | Unlimited | 50,000
Enterprise | Custom | Unlimited | Unlimited

The critical detail: graph memory, Mem0's strongest feature (and the one that scores highest on benchmarks), requires the $249/mo Pro tier. The free and starter plans use flat memory only. If you want what Mem0 actually advertises in their benchmark results, you pay $249/mo minimum.

ZEP

Plan | Price | Credits/mo | Rate Limit
Free | $0 | 1,000 episodes | Lower priority
Flex | $25/mo | 20,000 credits | 600 req/min
Flex Plus | $475/mo | 300,000 credits | 1,000 req/min
Enterprise | Custom | Custom | Guaranteed

Zep is more accessible than Mem0 at the low end. $25/mo gets you full feature access including graph capabilities. But the Flex Plus tier at $475/mo is the most expensive option on this list. Unused credits roll over for 60 days, which helps if usage is uneven.

One catch: Zep deprecated their self-hosted Community Edition. If you want to self-host, you use the Graphiti library directly with your own Neo4j instance. That means paying for Neo4j.

LETTA (FORMERLY MEMGPT)

Plan | Price | Agents | Notes
Free | $0 | Limited | Rotating free models
Pro | $20/mo | 20 agents | BYOK supported
Max Lite | $100/mo | 50 agents | 5x higher limits
Max | $200/mo | Higher | 20x higher limits
API Plan | $20/mo base | Per-agent | $0.10/active agent/mo

Letta is the most flexible on pricing. BYOK (bring your own keys) on every plan means you control LLM costs directly. The API plan at $0.10 per active agent per month plus $0.00015 per second of tool execution is genuinely pay-as-you-go. Self-hosted is free and deploys to Railway for about $5-10/mo.
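
The per-agent math is easy to model from the published rates ($20/mo base, $0.10 per active agent, $0.00015 per second of tool execution). A hedged sketch; `letta_api_cost` is an illustrative helper, not part of Letta's SDK:

```python
def letta_api_cost(active_agents, tool_seconds=0, base=20.0):
    """Estimate Letta API-plan spend (USD/month) from published rates:
    $20 base + $0.10 per active agent + $0.00015 per tool-execution second."""
    return base + 0.10 * active_agents + 0.00015 * tool_seconds

# 50 active agents, ~10 hours of tool execution over the month:
letta_api_cost(50, tool_seconds=10 * 3600)  # ~$30.40
```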

COGNEE

Plan | Price | Documents | API Calls
Free (self-hosted) | $0 | Unlimited | Unlimited
Developer | $35/mo | 1,000 docs / 1 GB | 10,000
Cloud (Team) | $200/mo | 2,500 docs / 2 GB | 10,000
On-Prem | Custom | Custom | Custom

Cognee stands out because graph memory is available at every tier including free. The self-hosted stack uses SQLite + LanceDB + Kuzu, all open source with no external dependencies. Add-on pricing is clear: $35 per extra 1,000 documents.

LANGMEM

Free. MIT license. No API keys, no accounts, no monthly bills. It is a library, not a service. You pay for your own infrastructure (Postgres, embedding APIs, compute). Deep LangGraph integration. The lowest cost option if you are already in the LangChain ecosystem.

WIDEMEM

Free. Apache 2.0 license. SQLite + FAISS locally. No accounts, no API keys for storage, no cloud dependency. With Ollama for LLM extraction and sentence-transformers for embeddings, the total infrastructure cost is $0. Use cloud LLMs (GPT-4o-mini) and the API cost is about $0.15/mo at 10K memories.

MONTHLY PLATFORM COST COMPARISON
At moderate usage (10K memories, 50K retrievals):
Zep Flex Plus: $475/mo
Mem0 Pro: $249/mo
Cognee Team: $200/mo
Letta Max: $200/mo
Cognee Dev: $35/mo
Zep Flex: $25/mo
Letta Pro: $20/mo
Mem0 Starter: $19/mo
LangMem: $0
widemem: $0

3. THE HIDDEN COST: LLM CALLS PER MEMORY ADD

This is where most comparisons stop. They show the subscription price and move on. But the real cost of memory is in the LLM API calls that happen every time you add a memory.

LLM CALLS PER MEMORY ADD OPERATION
System | LLM Calls | What They Do
widemem | 1 (batched) | Extract facts + resolve conflicts in a single call
Mem0 (flat) | 2+ | Extraction + per-fact update decision
Mem0 (graph) | 5+ | Extraction + update + entity extraction + relationship generation + contradiction check
Zep/Graphiti | Multiple | Entity extraction + edge comparison + contradiction detection
LangMem | Varies | User configures pipeline
Cognee | 1-2 | Extraction + optional graph

At 10,000 memory adds per month with GPT-4o-mini ($0.15/1M input, $0.60/1M output), assuming compact extraction calls of roughly 60 input and 10 output tokens billed each (larger prompts scale these estimates proportionally):

ESTIMATED LLM API COST AT 10K ADDS/MONTH
System | Calls | Est. API Cost
widemem (1 call) | 10,000 | ~$0.15
Mem0 flat (2 calls) | 20,000 | ~$0.30
Mem0 graph (5 calls) | 50,000 | ~$0.75
Zep/Graphiti | 40,000+ | ~$0.60+
widemem (Ollama) | 10,000 | $0.00

At 10K memories, the API costs look small. At 100K or 1M adds per month, they scale linearly. Mem0 with graph at 1M adds/month: roughly $75 in API calls alone. widemem with Ollama at 1M adds/month: still $0.
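
The arithmetic generalizes to any volume. A minimal calculator; the default token counts are assumptions chosen to reproduce the per-call cost implied by the table above (about $0.000015 per call), and heavier prompts scale the result linearly:

```python
def ingest_llm_cost(adds, calls_per_add, in_tokens=60, out_tokens=10,
                    in_rate=0.15, out_rate=0.60):
    """Monthly LLM spend (USD) for memory ingestion. Rates are USD per
    1M tokens (GPT-4o-mini); token counts per call are rough assumptions."""
    calls = adds * calls_per_add
    return calls * (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000

ingest_llm_cost(10_000, 1)     # single batched call per add:  ~$0.15
ingest_llm_cost(10_000, 5)     # 5 calls per add (graph):      ~$0.75
ingest_llm_cost(1_000_000, 5)  # graph-style at 1M adds/month: ~$75
```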

Zep published data showing their graph construction consumes over 600,000 tokens per conversation. That is thorough. It is also expensive. The quality-cost tradeoff is real, and most comparisons ignore it.

THE 1-CALL ADVANTAGE

widemem batches all fact extraction and conflict resolution into a single LLM call. If a message contains 4 new facts and 2 contradict existing memories, that is still 1 API call. Mem0 with graph would make 5+ calls for the same operation. At scale, this compounds.


4. EMBEDDING COSTS: THE OTHER HIDDEN LAYER

Every memory system needs to convert text into vectors. Some do this via cloud APIs. Others run embeddings locally.

EMBEDDING MODEL PRICING (PER 1M TOKENS)
Model | Standard | Batch | Dims
OpenAI text-embedding-3-small | $0.02 | $0.01 | 1536
OpenAI text-embedding-3-large | $0.13 | $0.065 | 3072
Voyage AI voyage-3.5 | $0.06 | ~$0.04 | 1024
Cohere Embed v4 | $0.12 | -- | 1536
Cohere Embed v3 | $0.10 | -- | 1024
sentence-transformers (local) | $0.00 | $0.00 | 384-1024

At 10K memories averaging 200 tokens each (2M tokens total), OpenAI's cheapest embedding costs $0.04. Cohere costs $0.24. Local sentence-transformers cost nothing and run on any machine with Python installed.
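
The same back-of-envelope math, as a sketch:

```python
def embedding_cost(memories, avg_tokens, rate_per_1m):
    """Embedding spend in USD: total token volume times the per-1M-token rate."""
    return memories * avg_tokens * rate_per_1m / 1_000_000

embedding_cost(10_000, 200, 0.02)  # OpenAI text-embedding-3-small: ~$0.04
embedding_cost(10_000, 200, 0.12)  # Cohere Embed v4:               ~$0.24
```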

For most use cases, embedding cost is negligible. It only matters at extreme scale (millions of memories) or when using expensive models like text-embedding-3-large.


5. INFRASTRUCTURE: THE COST NOBODY MENTIONS

Cloud memory providers bundle infrastructure into their subscription. Self-hosted options require you to provision and pay for it yourself.

VECTOR DATABASES

Provider | Free Tier | Paid Starting At
FAISS (local) | Unlimited | $0 (runs in-process)
ChromaDB | 1M embeddings | Usage-based
Supabase pgvector | 500 MB | $25/mo
Pinecone Serverless | 2 GB | ~$8/1M reads
Qdrant Cloud | 1 GB RAM | ~$150/mo (8 GB)
Weaviate Cloud | 14-day trial | $45/mo minimum

GRAPH DATABASES (REQUIRED BY ZEP SELF-HOSTED)

Provider | Free Tier | Paid
Neo4j Aura Free | 50K nodes, 175K rels | $0
Neo4j Aura Pro | -- | $65/GB/mo
Neo4j Aura Business | -- | $146/GB/mo

If you self-host Zep (now via the Graphiti library), you need Neo4j. The free tier works for prototyping (50K nodes). Production use starts at $65/GB/month. For a memory system with 100K+ entities and relationships, expect $65-200/mo for Neo4j alone.

THE ZERO-INFRASTRUCTURE OPTION

widemem, LangMem, and Cognee (self-hosted) can run with zero external infrastructure. widemem uses SQLite for metadata and FAISS for vectors, both running in-process. No database server, no cloud account, no connection string. pip install and go.
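
To make "no database server" concrete, here is a hypothetical sketch of the in-process pattern: SQLite (Python stdlib) for text and metadata, with a brute-force cosine search standing in for a FAISS index. This is not widemem's actual API, just the shape of a zero-infrastructure store:

```python
import math
import sqlite3

class LocalMemory:
    """Sketch of an in-process memory store: SQLite for metadata,
    brute-force cosine similarity in place of a FAISS index.
    No server, no cloud account, no connection string."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memories (id INTEGER PRIMARY KEY, text TEXT)"
        )
        self.vectors = {}  # memory id -> embedding (list of floats)

    def add(self, text, embedding):
        cur = self.db.execute("INSERT INTO memories (text) VALUES (?)", (text,))
        self.vectors[cur.lastrowid] = embedding

    def search(self, query_vec, k=3):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm

        ranked = sorted(self.vectors, key=lambda i: cosine(self.vectors[i], query_vec),
                        reverse=True)
        return [self.db.execute("SELECT text FROM memories WHERE id = ?",
                                (i,)).fetchone()[0] for i in ranked[:k]]
```

In a real deployment the linear scan would be swapped for a FAISS index (pip install faiss-cpu), but the operational point stands: everything runs inside the Python process.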


6. TOTAL COST: THREE REAL SCENARIOS

SCENARIO A: SOLO DEVELOPER / SIDE PROJECT

1,000 memories/month, 5,000 retrievals. Building a chatbot with persistent memory.

Option | Platform | Infra | API | Total
Mem0 Free | $0 | $0 | $0 | $0 (1K retrieval limit)
Zep Free | $0 | $0 | $0 | $0 (1K episode limit)
Letta Free | $0 | $0 | $0 | $0 (limited models)
widemem + Ollama | $0 | $0 | $0 | $0 (no limits)
LangMem + Supabase Free | $0 | $0 | ~$0.05 | ~$0.05

At this scale, most options are free. The difference is limits. Mem0 caps you at 1,000 retrievals. Zep caps at 1,000 episodes. widemem and LangMem have no caps.

SCENARIO B: STARTUP / PRODUCTION APP

50,000 memories/month, 200,000 retrievals. Multi-user app with real traffic.

Option | Platform | Infra | API | Total
Mem0 Pro | $249 | $0 | incl. | ~$249
Zep Flex Plus | $475 | $0 | incl. | ~$475
Zep Flex + overage | $25+ | $0 | incl. | ~$100
Letta Max Lite | $100 | $0 | BYOK ~$5 | ~$105
Cognee Team | $200 | $0 | incl. | ~$200
Self-hosted + Supabase | $0 | $25 | ~$3 | ~$28
widemem + GPT-4o-mini | $0 | $0 | ~$0.75 | ~$1
widemem + Ollama | $0 | $0 | $0 | $0

The spread is enormous. Zep Flex Plus at $475/mo vs widemem at $0. Even widemem with cloud LLM (GPT-4o-mini) runs under $1/mo because batched extraction keeps API calls low.

The catch: widemem at $0 means running Ollama on your own hardware. You need a machine with at least 8 GB RAM. A $5/mo VPS or your existing server works. Mem0 at $249 means zero infrastructure management. The tradeoff is cost vs convenience.

SCENARIO C: ENTERPRISE / HIGH VOLUME

500,000 memories/month, 2,000,000 retrievals. Enterprise deployment with strict requirements.

Option | Platform | Infra | API | Total
Mem0 Enterprise | Custom | $0 | incl. | $1,000+ (est.)
Zep Enterprise | Custom | $0 | incl. | $2,000+ (est.)
Self-hosted + Neo4j | $0 | $250+ | ~$40 | ~$290
Self-hosted + pgvector | $0 | $60+ | ~$40 | ~$100
widemem + dedicated GPU | $0 | $60 | $0 | ~$60

At enterprise scale, self-hosted always wins on cost. The question is whether you have the team to operate it. A Hetzner dedicated server with 64 GB RAM runs about $60/mo and can handle Ollama, widemem, and all your vector storage locally.

TOTAL MONTHLY COST: STARTUP SCENARIO (50K MEMORIES)
Platform + infrastructure + API costs combined:
Zep Flex Plus: $475
Mem0 Pro: $249
Cognee Team: $200
Letta Max Lite: $105
Zep Flex + overage: $100
Self-hosted + Supabase: $28
widemem + GPT-4o-mini: $0.75
widemem + Ollama: $0

7. WHAT DO YOU GET FOR THE MONEY?

Cost alone is meaningless without knowing what each system delivers. Here is what your money buys at each tier.

FEATURE COMPARISON BY PROVIDER
Feature | Mem0 | Zep | Letta | Cognee | widemem
Graph memory | $249/mo | $25/mo | No | Free | Planned
Flat memory | Free | Free | Free | Free | Free
Importance scoring | No | No | No | No | Yes
Temporal decay | No | No | No | No | Yes
Contradiction detection | Yes | Yes | No | Yes | Yes
YMYL safety | No | No | No | No | Yes
Confidence scoring | No | No | No | No | Yes
Hierarchical memory | No | No | Yes | Yes | Yes
Fully local option | No | No | Yes | Yes | Yes
MCP server | Yes | No | No | No | Yes

Mem0 and Zep lead on graph memory quality. Their benchmark scores are higher overall. But their highest-scoring features sit behind expensive tiers ($249/mo for Mem0 graph).

widemem leads on features that no one else offers: importance scoring, temporal decay, YMYL safety, and confidence modes. It also leads on multi-hop benchmark performance (56.54%, beating Mem0 at 51.15%). And it costs $0.

The honest answer: if you need the best possible overall accuracy and can afford $249/mo, Mem0 Pro with graph scores highest on benchmarks. If you need specific capabilities (importance, decay, YMYL, confidence) or cannot justify $249/mo, widemem gives you more features for less money.


8. HOW TO CHOOSE

CHOOSE MEM0 IF:

You need the highest overall benchmark accuracy, can afford $249/mo for graph, and want a managed service with no infrastructure to maintain. Best for teams that value accuracy over cost.

CHOOSE ZEP IF:

You want graph memory at a lower entry point ($25/mo), need credit-based billing that scales with usage, and value the Graphiti library for self-hosted options.

CHOOSE LETTA IF:

You are building agent-based applications, want BYOK control over LLM costs, and need per-agent billing. The most agent-native option.

CHOOSE WIDEMEM IF:

You need features no one else has (importance scoring, decay, YMYL, confidence), want $0 infrastructure cost, care about privacy (local-first), or need multi-hop reasoning (best-in-class at 56.54%). Best for developers who want control and cannot justify $249/mo for a managed service.

CHOOSE LANGMEM IF:

You are already in the LangChain/LangGraph ecosystem and want a library, not a service. Lowest friction if you use LangChain for everything else.


9. THE REAL QUESTION

"How much does AI memory cost?" is the wrong question. The right question is: "How much does AI memory cost per unit of value it delivers?"

A memory system that costs $249/mo but saves your users from repeating themselves every session might pay for itself in retention alone. A system that costs $0 but requires 40 hours of setup might cost more in engineering time than a year of Mem0 Pro.
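
That engineering-time tradeoff is easy to quantify. A hedged back-of-envelope, where the 40 hours and the $100/hour rate are illustrative assumptions, not measurements:

```python
def breakeven_months(setup_hours, hourly_rate, managed_price_per_month):
    """How many months of a managed plan one-time self-hosting effort equals."""
    return setup_hours * hourly_rate / managed_price_per_month

# 40 hours of setup at $100/hour vs Mem0 Pro at $249/mo:
breakeven_months(40, 100, 249)  # ~16 months
```

If your self-hosted setup takes a weekend instead of a week, the break-even drops accordingly; run your own numbers.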

The numbers in this post are the starting point. The real cost depends on your scale, your team, your tolerance for infrastructure management, and which features actually matter for your use case.

What is clear: the cost spread is 1000x between the cheapest and most expensive options, and many teams are paying for features they do not use. Know the layers. Do the math. Pick the system that fits.


SOURCES AND METHODOLOGY

All pricing confirmed from official pricing pages as of April 2026:

mem0.ai/pricing | getzep.com/pricing | docs.letta.com | cognee.ai/pricing

pinecone.io/pricing | qdrant.tech/pricing | neo4j.com (Aura pricing) | supabase.com/pricing

OpenAI embedding pricing | Voyage AI pricing | Cohere pricing

LLM call counts per operation: estimated from source code analysis and published documentation. Zep 600K+ token figure: vectorize.io comparison analysis.

widemem benchmark data: LoCoMo v2 results. Full methodology at widemem.ai/blog/context-windows.