Set up ClawMem as persistent memory for AI coding agents in under 5 minutes. By the end you’ll have hooks injecting context on every prompt and an MCP server for agent-initiated retrieval.
Note: if you need Bun, install it with curl -fsSL https://bun.sh/install | bash, not snap (snap Bun has stdin restrictions that break hooks).
# Via npm (recommended)
npm install -g clawmem
# If you use Bun as your package manager:
# bun add -g clawmem
# From source
git clone https://github.com/yoloshii/clawmem.git ~/clawmem
cd ~/clawmem && bun install
ln -sf ~/clawmem/bin/clawmem ~/.bun/bin/clawmem
The fastest path — one command to init, index, embed, set up hooks, and register MCP:
clawmem bootstrap ~/notes --name notes
This creates a vault at ~/.cache/clawmem/index.sqlite, indexes all .md files under ~/notes, embeds them for vector search, installs Claude Code hooks, and registers the MCP server.
# 1. Initialize the vault
clawmem init
# 2. Add a collection (directory of markdown files)
clawmem collection add ~/notes --name notes
# 3. Index and embed
clawmem update --embed
# 4. Set up Claude Code hooks (automatic context injection)
clawmem setup hooks
# 5. Register the MCP server (agent-initiated tools)
clawmem setup mcp
ClawMem uses three llama-server instances for best performance. All three models also auto-download and run locally via node-llama-cpp if no server is running, using Metal on Apple Silicon, Vulkan where available, or CPU as a last resort. With GPU acceleration (Metal/Vulkan), in-process inference is fast for these small models; on CPU-only systems it is significantly slower. If you're using GPU servers, run them via systemd services so a crashed server doesn't silently fall back to in-process inference.
# Embedding (recommended for performance — falls back to in-process if no server)
llama-server -m embeddinggemma-300M-Q8_0.gguf \
--embeddings --port 8088 --host 0.0.0.0 -ngl 99 -c 2048 --batch-size 2048
# LLM — query expansion (falls back to in-process if unavailable)
llama-server -m qmd-query-expansion-1.7B-q4_k_m.gguf \
--port 8089 --host 0.0.0.0 -ngl 99 -c 4096
# Reranker (falls back to in-process if unavailable)
llama-server -m Qwen3-Reranker-0.6B-Q8_0.gguf \
--reranking --port 8090 --host 0.0.0.0 -ngl 99 -c 2048 --batch-size 512
SOTA upgrade (12GB+ GPU): replace the embedding model with zembed-1-Q4_K_M (2560d, -b 2048 -ub 2048) and the reranker with zerank-2-Q4_K_M (-b 2048 -ub 2048).
See the GPU services guide for systemd setup, remote GPU configuration, and details on the upgraded models.
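As an illustration of the systemd approach, here is a minimal unit sketch for the embedding server. The unit name, model path, and binary location are assumptions; adapt them to your install. The key line is Restart=on-failure, which restarts a crashed server instead of leaving ClawMem to silently fall back to in-process inference.

```ini
# /etc/systemd/system/llama-embed.service (hypothetical paths; adjust to yours)
[Unit]
Description=llama-server embedding endpoint for ClawMem
After=network.target

[Service]
ExecStart=/usr/local/bin/llama-server -m /opt/models/embeddinggemma-300M-Q8_0.gguf \
  --embeddings --port 8088 --host 0.0.0.0 -ngl 99 -c 2048 --batch-size 2048
Restart=on-failure
RestartSec=2

[Install]
WantedBy=multi-user.target
```

Enable it with sudo systemctl enable --now llama-embed, and repeat the pattern for the query-expansion and reranker servers on ports 8089 and 8090.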
No GPU? See cloud embedding for OpenAI, Voyage, Jina, or Cohere alternatives.
clawmem doctor # Full health check
clawmem status # Quick index status
bun test # Run test suite (from a source checkout)
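You can also sanity-check that each llama-server port is actually listening; a closed port means ClawMem will quietly use in-process inference for that model. A minimal sketch using bash's built-in /dev/tcp, with ports matching the example commands above:

```shell
#!/usr/bin/env bash
# Probe each llama-server port; a closed port means ClawMem will fall
# back to (slower) in-process inference for that model.
check_server() {
  local name=$1 port=$2
  if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
    echo "$name: up on :$port"
  else
    echo "$name: DOWN on :$port (in-process fallback will be used)"
  fi
}
check_server embedding 8088
check_server expansion 8089
check_server reranker  8090
```

Note that /dev/tcp is a bash feature, so run this with bash rather than sh.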
The bootstrap command indexed one directory. ClawMem gets more useful as you add more content — the retrieval pipeline surfaces better results from a richer corpus.
# Add your project docs
clawmem collection add ~/projects/myapp --name myapp
# Add research notes, decision records, domain references
clawmem collection add ~/research --name research
# Re-index and embed the new collections
clawmem update --embed
A practical starting point: index every .md file in each project you regularly work on with agents. Include memory files, research outputs, decision records, learnings, project notes, and domain references. The more relevant context in the vault, the more the context-surfacing hook has to work with on each prompt.
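If you keep projects under one parent directory, the per-collection commands above can be scripted. A sketch, assuming a layout of ~/projects/<name>/ with markdown inside each project (the dry-run fallback is only so the sketch is safe to run when clawmem isn't on PATH):

```shell
#!/usr/bin/env bash
# Register each project directory as its own collection, then re-index.
# Dry-run stub so the sketch is harmless without clawmem installed.
command -v clawmem >/dev/null 2>&1 || clawmem() { echo "[dry-run] clawmem $*"; }

for dir in "$HOME"/projects/*/; do
  [ -d "$dir" ] || continue                  # skip if ~/projects is empty
  clawmem collection add "$dir" --name "$(basename "$dir")"
done
clawmem update --embed
```

Collection names are derived from directory names here; rename any collisions before running.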
Each collection has a pattern field that controls which files get indexed (default: **/*.md). Edit ~/.config/clawmem/config.yaml to customize:
collections:
  notes:
    path: ~/notes
    pattern: "**/*.md"       # default — all markdown recursively
  project:
    path: ~/projects/myapp
    pattern: "**/*.md"       # all markdown in the project
  research:
    path: ~/research
    pattern: "**/*.{md,txt}" # markdown and text files
  obsidian:
    path: ~/vault
    pattern: "**/*.md"       # works with Obsidian, Logseq, Foam, Dendron
After editing the config, re-index and embed:
clawmem update --embed
The watcher service picks up new files automatically, but adding or changing collections requires a manual update. Certain directories are always excluded regardless of pattern: .git, node_modules, dist, build, vendor, and others listed in architecture.
ClawMem indexes prose, not code. Source files (.ts, .py, .go, etc.) are excluded by design — BM25 and embedding models trained on natural language perform poorly on code syntax. Capture technical decisions and architecture rationale in markdown instead. Use a dedicated code search tool for code retrieval.
Documents with headings, lists, and decision keywords score higher in retrieval. Frontmatter adds a 0.2 quality score bonus. The quality multiplier ranges from 0.7x (penalty for flat text) to 1.3x (boost for well-structured docs) — so structure directly affects how often your content gets surfaced.
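To make the arithmetic concrete, a tiny illustration (hypothetical numbers, not ClawMem's actual scoring code) of how the documented 0.7x to 1.3x multiplier range shifts a base relevance score:

```shell
# Illustrative only: how a quality multiplier scales a retrieval score.
score() { awk -v base="$1" -v mult="$2" 'BEGIN { printf "%.2f\n", base * mult }'; }
score 0.80 0.7   # flat wall of text, 0.7x penalty:       0.56
score 0.80 1.3   # structured doc with frontmatter, 1.3x: 1.04
```

The same document content can nearly double its effective score purely through structure.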
Once set up, ClawMem works automatically:
- The context-surfacing hook searches your vault and injects relevant context as <vault-context> XML on each prompt.
- The decision-extractor and handoff-generator hooks capture decisions and session summaries.
- The agent can call memory_retrieve, query, or intent_search when hooks don't surface enough.

No agent configuration needed. The hooks are invisible to the agent — it just sees richer context.