
llama-nest
local-first memory sidecar for ollama
The Story
A few months ago I picked up an NVIDIA Jetson Orin to experiment with local inference. I wanted to run Ollama, play with different models, and see what local AI could actually do outside of cloud APIs.
The hardware was great. The problem was everything else.
Conversations were ephemeral. Context disappeared between sessions. When I switched from Llama to Mistral to Qwen, all the accumulated understanding vanished. Each model's semantic memory was completely isolated from the others. There was no elegant way to persist knowledge across workflows or models.
Local AI systems lacked continuity. And that felt like a solvable engineering problem.
What llama-nest Is
llama-nest is a lightweight memory and semantic graph sidecar for Ollama. It runs alongside your local models and provides persistent, structured context that survives sessions, model switches, and system restarts.
Think of it as a local database for AI memory. Not just RAG. Not just vector search. A graph-based persistence layer that maintains relationships, timelines, and entity state across any model you run.
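To make "relationships, timelines, and entity state" concrete, here is a purely illustrative sketch of the kind of records a memory graph holds. This is not llama-nest's actual schema, just the shape of the idea:

from dataclasses import dataclass, field
from datetime import datetime

# Illustrative only -- not llama-nest's actual schema.
@dataclass
class Entity:
    name: str                                   # e.g. "jetson-orin"
    state: dict = field(default_factory=dict)   # current known attributes

@dataclass
class Relation:
    source: str        # entity the fact is about
    target: str        # entity it relates to
    kind: str          # e.g. "runs_on", "replaced_by"
    learned_at: datetime = field(default_factory=datetime.now)  # when the fact entered memory

The point of the graph shape, versus a bag of embeddings, is that facts stay attached to entities and timestamps, so they remain queryable no matter which model is asking.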
Architecture
┌─────────────────────────────────────────────────┐
│                    Your App                     │
└────────────────────────┬────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│               llama-nest sidecar                │
│  ┌───────────┐  ┌───────────┐  ┌─────────────┐  │
│  │  Memory   │  │ Semantic  │  │  Timeline   │  │
│  │  Store    │  │  Graph    │  │   State     │  │
│  └───────────┘  └───────────┘  └─────────────┘  │
│                                                 │
│  ┌───────────────────────────────────────────┐  │
│  │            Retrieval Policies             │  │
│  └───────────────────────────────────────────┘  │
└────────────────────────┬────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│               Ollama (any model)                │
│          llama3 │ mistral │ qwen │ ...          │
└─────────────────────────────────────────────────┘
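To trace that flow from an application's perspective, here is a minimal sketch using Ollama's standard /api/generate endpoint. The only llama-nest-specific detail is the port (11435, per the Quick Start below); the payload is plain Ollama:

import requests

# Same request you would send to Ollama directly, aimed at the sidecar's
# port instead. llama-nest enriches the prompt with stored context,
# forwards it to Ollama, and persists the exchange.
resp = requests.post(
    "http://localhost:11435/api/generate",
    json={"model": "llama3", "prompt": "What did we decide yesterday?", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])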
Project Goals
- Persistent semantic memory — Conversations and context that survive restarts
- Local vector and graph storage — SQLite + embedded vector store, no external dependencies
- Model-agnostic — Switch between Llama, Mistral, and Qwen without losing context
- Inspectable memory timelines — See what your AI remembers and when it learned it (sketched below)
- Privacy-first — Everything local. No cloud. No telemetry. Your data stays yours.
- Simple deployment — Single binary or Docker container. No infrastructure.
- MCP integration — Expose memory as an MCP server for tool-using agents
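To make the inspectable-timelines goal concrete, here is a sketch of what querying a memory timeline could look like. The /memory/timeline endpoint and the entry fields are hypothetical, invented purely for illustration; they are not a shipped API:

import requests

# Hypothetical inspection endpoint -- the real API may differ or not exist yet.
timeline = requests.get("http://localhost:11435/memory/timeline", timeout=10).json()

# Imagined entry shape: what was learned, about which entity, and when.
for entry in timeline:
    print(f'{entry["learned_at"]}  {entry["entity"]}: {entry["fact"]}')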
Quick Start
# with docker
docker run -d -p 11435:11435 -v llama-nest:/data ghcr.io/neuralsurferbro/llama-nest

# or from source
cargo install llama-nest && llama-nest serve

Point your Ollama client at localhost:11435 instead of localhost:11434. llama-nest proxies requests, enriches context, and persists memory automatically.
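If you use the official ollama Python package, redirecting it is one line. Client(host=...) is that library's documented parameter; everything else here is standard usage:

from ollama import Client

# Point the client at the llama-nest proxy instead of Ollama's default 11434.
client = Client(host="http://localhost:11435")

reply = client.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Pick up where we left off."}],
)
print(reply["message"]["content"])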
Status
This is an experiment in local-first AI infrastructure. The project is being built openly as I learn. Expect breaking changes, incomplete features, and occasional moments of "why did I think that would work?"
If you're interested in local AI memory systems, context portability, or just want to see how this evolves, follow along on GitHub.