
llama-nest
local-first memory sidecar for ollama
The Story
A few months ago I picked up an NVIDIA Jetson Orin to experiment with local inference. I wanted to run Ollama, play with different models, and see what local AI could actually do outside of cloud APIs.
The hardware was great. The problem was everything else.
Conversations were ephemeral. Context disappeared between sessions. When I switched from Llama to Mistral to Qwen, all the accumulated understanding vanished. Each model's semantic memory was completely isolated from the others. There was no elegant way to persist knowledge across workflows or models.
Local AI systems lacked continuity. And that felt like a solvable engineering problem.
What llama-nest Is
llama-nest is a lightweight memory and semantic graph sidecar for Ollama. It runs alongside your local models and provides persistent, structured context that survives sessions, model switches, and system restarts.
Think of it as a local database for AI memory. Not just RAG. Not just vector search. A graph-based persistence layer that maintains relationships, timelines, and entity state across any model you run.
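To make "relationships, timelines, and entity state" concrete, here is a purely illustrative sketch of the kind of records a memory graph holds. This is not llama-nest's actual schema, just the shape of the idea:

from dataclasses import dataclass, field
from datetime import datetime

# Illustrative only -- not llama-nest's actual schema.
@dataclass
class Entity:
    name: str                                   # e.g. "jetson-orin"
    state: dict = field(default_factory=dict)   # current known attributes

@dataclass
class Relation:
    source: str        # entity the fact is about
    target: str        # entity it relates to
    kind: str          # e.g. "runs_on", "replaced_by"
    learned_at: datetime = field(default_factory=datetime.now)  # when the fact entered memory

The point of the graph shape, versus a bag of embeddings, is that facts stay attached to entities and timestamps, so they remain queryable no matter which model is asking.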
Architecture
┌─────────────────────────────────────────────────┐
│                    Your App                     │
└────────────────────────┬────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│               llama-nest sidecar                │
│  ┌───────────┐  ┌───────────┐  ┌─────────────┐  │
│  │  Memory   │  │ Semantic  │  │  Timeline   │  │
│  │  Store    │  │  Graph    │  │   State     │  │
│  └───────────┘  └───────────┘  └─────────────┘  │
│                                                 │
│  ┌───────────────────────────────────────────┐  │
│  │            Retrieval Policies             │  │
│  └───────────────────────────────────────────┘  │
└────────────────────────┬────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│               Ollama (any model)                │
│          llama3 │ mistral │ qwen │ ...          │
└─────────────────────────────────────────────────┘
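To trace that flow from an application's perspective, here is a minimal sketch using Ollama's standard /api/generate endpoint. The only llama-nest-specific detail is the port (11435, per the Quick Start below); the payload is plain Ollama:

import requests

# Same request you would send to Ollama directly, aimed at the sidecar's
# port instead. llama-nest enriches the prompt with stored context,
# forwards it to Ollama, and persists the exchange.
resp = requests.post(
    "http://localhost:11435/api/generate",
    json={"model": "llama3", "prompt": "What did we decide yesterday?", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])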
Project Goals
- Persistent semantic memory — Conversations and context that survive restarts
- Local vector and graph storage — SQLite + embedded vector store, no external dependencies
- Model-agnostic — Switch between Llama, Mistral, and Qwen without losing context
- Inspectable memory timelines — See what your AI remembers and when it learned it (sketched below)
- Privacy-first — Everything local. No cloud. No telemetry. Your data stays yours.
- Simple deployment — Single binary or Docker container. No infrastructure.
- MCP integration — Expose memory as an MCP server for tool-using agents
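To make the inspectable-timelines goal concrete, here is a sketch of what querying a memory timeline could look like. The /memory/timeline endpoint and the entry fields are hypothetical, invented purely for illustration; they are not a shipped API:

import requests

# Hypothetical inspection endpoint -- the real API may differ or not exist yet.
timeline = requests.get("http://localhost:11435/memory/timeline", timeout=10).json()

# Imagined entry shape: what was learned, about which entity, and when.
for entry in timeline:
    print(f'{entry["learned_at"]}  {entry["entity"]}: {entry["fact"]}')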
Quick Start
# with docker
docker run -d -p 11435:11435 -v llama-nest:/data ghcr.io/neuralsurferbro/llama-nest

# or from source
cargo install llama-nest && llama-nest serve

Point your Ollama client at localhost:11435 instead of localhost:11434. llama-nest proxies requests, enriches context, and persists memory automatically.
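If you use the official ollama Python package, redirecting it is one line. Client(host=...) is that library's documented parameter; everything else here is standard usage:

from ollama import Client

# Point the client at the llama-nest proxy instead of Ollama's default 11434.
client = Client(host="http://localhost:11435")

reply = client.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Pick up where we left off."}],
)
print(reply["message"]["content"])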
Status
This is an experiment in local-first AI infrastructure. The project is being built openly as I learn. Expect breaking changes, incomplete features, and occasional moments of "why did I think that would work?"
If you're interested in local AI memory systems, context portability, or just want to see how this evolves, follow along on GitHub.