SELF-HOSTING

Production deployment guide. widemem is built local-first, which makes it easy to run inside a VPC, an air-gapped network, or on regulated infrastructure.

Architecture you are deploying

A default widemem install is one Python process plus two on-disk artifacts: a SQLite history database (history.db) and a vector index (a FAISS index by default).

There are zero background services, zero external API calls beyond your configured LLM provider, and zero telemetry. widemem does not phone home. If you run it with Ollama and sentence-transformers for the LLM and embedding sides, the whole stack is air-gap capable.
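For the fully-local stack, both the LLM and the embedder point at local backends. The sketch below shows the shape of such a configuration; note that the `llm` and `embedder` fields and their value shapes are illustrative assumptions, not confirmed widemem API (only `history_db_path` and `vector_store` appear in the examples in this guide), so check your installed version's MemoryConfig for the exact names.

```python
# Sketch of an air-gapped configuration. The `llm` and `embedder` fields
# below are HYPOTHETICAL; verify against your version's MemoryConfig.
from widemem import WideMemory, MemoryConfig

config = MemoryConfig(
    history_db_path="/var/lib/widemem/history.db",
    # hypothetical: route fact extraction to a local Ollama server
    llm={"provider": "ollama", "model": "llama3.1", "base_url": "http://localhost:11434"},
    # hypothetical: embed locally with sentence-transformers
    embedder={"provider": "sentence-transformers", "model": "all-MiniLM-L6-v2"},
)

memory = WideMemory(config=config)
```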

Production setup

1. Install into a pinned virtual environment

python3.10 -m venv /opt/widemem
source /opt/widemem/bin/activate
pip install widemem-ai==1.4.0

Pin the exact version. widemem follows semantic versioning but production should not chase minor releases automatically.
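To make the pin reproducible across hosts, freeze the full transitive dependency set after a verified install rather than pinning only the top-level package:

```shell
# Capture the exact dependency set of a known-good environment
pip freeze > requirements.lock
# Rebuild the identical environment on another host:
#   python3.10 -m venv /opt/widemem && /opt/widemem/bin/pip install -r requirements.lock
```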

2. Choose a data directory you control

from widemem import WideMemory, MemoryConfig
from widemem.core.types import VectorStoreConfig

config = MemoryConfig(
    history_db_path="/var/lib/widemem/history.db",
    vector_store=VectorStoreConfig(
        provider="faiss",
        path="/var/lib/widemem/faiss_index",
    ),
)

memory = WideMemory(config=config)

Put the data directory on a volume you back up. Set filesystem permissions so only your service user can read or write it. See the security page for the full scope statement.
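One way to provision that, sketched for a Debian-style host (run as root; the widemem user matches the systemd unit in step 4):

```shell
# Dedicated, non-login service user for the widemem process
useradd --system --home /var/lib/widemem --shell /usr/sbin/nologin widemem
# Data directory readable and writable by the service user only
install -d -o widemem -g widemem -m 0700 /var/lib/widemem
```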

3. Run widemem as a service

widemem is a library, not a daemon. You embed it inside whatever service holds your agent logic (FastAPI, Flask, a job worker, your own stdio bridge). The simplest production pattern is to wrap it in a FastAPI app and run that under a process supervisor (systemd, Supervisor, Kubernetes).

# /opt/widemem/service.py
from contextlib import asynccontextmanager
from fastapi import FastAPI
from widemem import WideMemory, MemoryConfig

memory: WideMemory

@asynccontextmanager
async def lifespan(app: FastAPI):
    global memory
    memory = WideMemory(config=MemoryConfig(
        history_db_path="/var/lib/widemem/history.db",
    ))
    yield
    memory.close()

app = FastAPI(lifespan=lifespan)

@app.post("/add")
def add(text: str, user_id: str):
    return memory.add(text, user_id=user_id)

@app.get("/search")
def search(q: str, user_id: str):
    return memory.search(q, user_id=user_id)
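Once the service is up, the two endpoints can be exercised with plain HTTP; in this sketch both parameters travel as query strings:

```shell
# Store a memory for user "alice", then search it back
curl -X POST 'http://127.0.0.1:9000/add?text=prefers%20dark%20mode&user_id=alice'
curl 'http://127.0.0.1:9000/search?q=preferences&user_id=alice'
```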

4. systemd unit (example)

[Unit]
Description=widemem service
After=network.target

[Service]
Type=simple
User=widemem
Group=widemem
WorkingDirectory=/opt/widemem
# Prefer EnvironmentFile=/etc/widemem/env (owned by root, mode 0600) over an inline secret
Environment="OPENAI_API_KEY=..."
ExecStart=/opt/widemem/bin/uvicorn service:app --host 127.0.0.1 --port 9000
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
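Assuming the unit above is saved as /etc/systemd/system/widemem.service, the standard enable-and-start sequence applies:

```shell
systemctl daemon-reload          # pick up the new unit file
systemctl enable --now widemem   # start now and on every boot
journalctl -u widemem -f         # follow logs to confirm a clean startup
```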

Backup and restore

widemem ships with export_json() and import_json() for portable backup and restore.

# Nightly backup script (invoke from cron or a systemd timer)
python - <<'PY'
from datetime import date
from widemem import WideMemory, MemoryConfig

# Point at the same data directory as the service; a bare WideMemory()
# would open the default location instead.
config = MemoryConfig(history_db_path="/var/lib/widemem/history.db")
with WideMemory(config=config) as m:
    with open(f"/backup/widemem-{date.today():%Y%m%d}.json", "w") as f:
        f.write(m.export_json())
PY
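Restore is the mirror image. A sketch, assuming import_json() accepts the JSON string that export_json() produced (check the API reference for the exact signature), run against an empty data directory:

```python
from widemem import WideMemory, MemoryConfig

# Restore into the service's data directory; the filename is illustrative
config = MemoryConfig(history_db_path="/var/lib/widemem/history.db")
with WideMemory(config=config) as m:
    with open("/backup/widemem-20240101.json") as f:
        m.import_json(f.read())
```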

For a full point-in-time snapshot, back up the data directory (/var/lib/widemem/) with your existing filesystem backup tool. The SQLite files are WAL-safe if you use a consistent snapshot (LVM, ZFS, or stopping the service briefly).
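If neither stopping the service nor filesystem snapshots is an option, SQLite's online backup API gives a consistent database-level copy while the service runs. A cron fragment (note that cron requires % to be escaped):

```shell
# /etc/cron.d/widemem-db-backup -- nightly at 03:15, as the service user
15 3 * * * widemem sqlite3 /var/lib/widemem/history.db ".backup '/backup/history-$(date +\%Y\%m\%d).db'"
```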

Scaling guidance

FAISS in the default configuration handles roughly 100k to 1M memories per process before latency becomes noticeable on commodity hardware. Beyond that, swap the vector store to Qdrant.

from widemem import MemoryConfig
from widemem.core.types import VectorStoreConfig

config = MemoryConfig(
    vector_store=VectorStoreConfig(
        provider="qdrant",
        host="qdrant.internal",
        port=6333,
        collection_name="widemem",
    ),
)

Qdrant can run in your VPC or as a dedicated cluster. widemem speaks to it over gRPC or HTTP. Nothing else in the pipeline changes.

Operations

Monitoring

The library does not ship a Prometheus exporter yet. In the meantime, wrap your add and search calls with your service's existing metrics instrumentation. Latency, error rate, and memory count are the three signals that matter.
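A minimal sketch of that instrumentation: a decorator that reports latency and error status to whatever metrics backend you already run. The record callback here is a stand-in for your Prometheus client, StatsD call, or log line; nothing in it is widemem API.

```python
import time
from functools import wraps

def timed(metric_name, record):
    """Report (name, latency_seconds, error flag) for every call via `record`."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            error = False
            try:
                return fn(*args, **kwargs)
            except Exception:
                error = True
                raise
            finally:
                record(metric_name, time.perf_counter() - start, error=error)
        return wrapper
    return decorator

# Usage: wrap the calls your service exposes, e.g.
#   memory.search = timed("widemem.search", my_record_fn)(memory.search)
```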

Cost control

LLM calls happen during add() for fact extraction and conflict resolution. A rough budget: ingesting 1,000 turns of conversation with GPT-4o-mini costs about $0.40 to $0.60. Cache embeddings when you can. Use the add_batch() call when ingesting bulk data.
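That budget translates into a simple back-of-envelope helper (a sketch built on the figures quoted above, not widemem API):

```python
def ingest_cost_estimate(turns: int,
                         per_1k_low: float = 0.40,
                         per_1k_high: float = 0.60) -> tuple[float, float]:
    """USD range for LLM extraction cost when ingesting `turns` conversation turns."""
    return (turns / 1000 * per_1k_low, turns / 1000 * per_1k_high)

low, high = ingest_cost_estimate(50_000)  # e.g. backfilling 50k historical turns
print(f"estimated extraction cost: ${low:.2f}-${high:.2f}")  # prints $20.00-$30.00
```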

Upgrading

Read the CHANGELOG before bumping versions. The v1.x line maintains storage compatibility. A future v2 will ship a migration script when the schema needs to change.

DOCKER IMAGE (COMING)

A tested Docker image and docker-compose.yml (one service, or widemem plus Ollama for a fully-local stack) is next on the roadmap. Until then, a pip install into a venv plus a systemd unit is the production path.

Compliance stance

widemem is Apache 2.0 licensed, local-first, and sends no telemetry; the full source is available. It is not itself SOC 2 or HIPAA certified because it is a library, not a service: the compliance posture of the deployment is yours. The library gives you the building blocks: local storage, configurable retention via ttl_days, a full audit trail via get_history(), and the ability to run the LLM and embedding sides locally so no data ever leaves your perimeter.

For teams deploying into regulated environments (healthcare, finance, government) that want a support contract and dedicated help with the compliance review, the enterprise page is where to start.