Kiri is a local AI gateway that intercepts LLM prompts and strips protected source code before requests reach cloud models.
AI coding tools are rapidly becoming standard in software development. But every time a developer asks Claude or Cursor to explain, refactor, or debug proprietary code, that implementation is sent to an external LLM provider.
Kiri intercepts those prompts locally and removes protected source code before they leave your machine. Your tools work exactly as before. Your implementation never reaches the cloud.
Kiri runs on localhost and intercepts every outgoing LLM call. No changes to your tools. No cloud component. No data leaving your machine.
All processing runs inside Docker on localhost. The real API key lives only inside the container as a Docker secret.
A developer asks Claude about a protected function. Kiri intercepts, strips the implementation, and forwards a stub. Claude still gives useful advice — the implementation never left the machine.
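To make the "strip the implementation, forward a stub" step concrete, here is a minimal sketch for Python sources. It is not Kiri's actual filter: it uses the standard-library `ast` module, and the function name `stub_function` and the wording in `STUB_NOTE` are hypothetical. It keeps each protected function's signature and replaces the body with a short note.

```python
import ast

# Hypothetical placeholder text; the real stub description is user-editable.
STUB_NOTE = "[implementation withheld by local filter]"


def stub_function(source: str, protected: set) -> str:
    """Return `source` with the bodies of protected functions replaced by a stub."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_func and node.name in protected:
            # Keep the signature, drop every statement of the original body.
            node.body = [
                ast.Expr(value=ast.Constant(value=STUB_NOTE)),  # becomes a docstring
                ast.Expr(value=ast.Constant(value=...)),        # literal `...`
            ]
    # ast.unparse requires Python 3.9 or newer.
    return ast.unparse(ast.fix_missing_locations(tree))
```

Calling `stub_function(code, {"score"})` leaves unprotected functions untouched, so the model still sees names, parameters, and surrounding context.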
kiri add scorer.py indexes the file and registers its symbols locally. kiri explain shows exactly what Claude received and what stayed on your machine. Edit the stub description that replaces your implementation. Audit every filtered request with a complete local log.
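The idea of registering a file's symbols and later matching them in prompts can be sketched as follows. This is an illustration under assumptions, not Kiri's indexer: it uses Python's `ast` module as a stand-in for tree-sitter, handles Python sources only, and the helper names `extract_symbols` and `mentioned_symbols` are hypothetical.

```python
import ast
import re


def extract_symbols(source: str) -> set:
    """Collect the names of functions and classes defined in a Python source string."""
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }


def mentioned_symbols(prompt: str, symbols: set) -> set:
    """Return the registered symbols that appear as whole identifiers in a prompt."""
    words = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", prompt))
    return symbols & words
```

Matching whole identifiers rather than substrings avoids flagging a prompt just because a short symbol name happens to occur inside a longer word.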
We document the threat model explicitly. Kiri is not a complete DLP solution — it is a focused, local-first tool for a specific class of risk.
l3_classifier_enabled: true) Fail-open is intentional: if Kiri's embedding service or classifier is unavailable, requests pass through rather than blocking work. Symbol matching (L1) always remains active as the safety net. See ADR-004.
63 cases across 7 languages, split into easy (symbol-level) and hard (semantic, renamed code, prose). Two documented misses are openly included. Structural analogs in the grace zone (0.75–0.90) can produce a false REDACT; see known limits.
| Project | Language | Easy (L1+L2) | Hard (L1+L2) | Notes |
|---|---|---|---|---|
| flask | Python | 4/4 | 3/3 | |
| requests | Python | 4/4 | 2/2 | |
| fastapi | Python | 4/4 | 2/2 via L2 | Renamed class still calls APIRouter() |
| express | JavaScript | 4/4 | 2/2 | |
| nestjs | TypeScript | 4/4 | 3/3 | |
| spring-boot | Java | 4/4 | 2/2 | |
| kafka | Java | 4/4 | 2/3 | Prose-only description — sim 0.747, below threshold |
| gin | Go | 4/4 | 2/2 via L2 | Renamed struct still embeds RouterGroup |
| actix-web | Rust | 4/4 | 1/2 | Design-rationale question — sim 0.714, below threshold |
| aspnetcore | C# | 4/4 | 2/2 via L2 | Renamed class still calls new WebApplication() |

Code is chunked with tree-sitter and embedded with all-MiniLM-L6-v2 (384-dim).
Kiri works with any tool that accepts a custom base URL and a Bearer API key. Tools with hardcoded endpoints cannot be redirected.

| Tool | Supported | Notes |
|---|---|---|
| Claude Code (API key) | ✅ | Set ANTHROPIC_BASE_URL |
| Claude Code (Pro/Max) | ✅ | OAuth passthrough mode |
| Cursor | ✅ | Base URL in Settings |
| Windsurf | ✅ | Base URL in Settings |
| Cline / Continue.dev | ✅ | Provider URL in config |
| OpenCode / Codex CLI | ✅ | Set OPENAI_BASE_URL |
| GitHub Copilot | ❌ | Hardcoded endpoint |
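
For tools configured through environment variables, redirecting them amounts to pointing the base URL at the local gateway. The sketch below builds such an environment in Python. The address in `GATEWAY_URL` is a placeholder (the port Kiri listens on is not stated here), and `gateway_env` is a hypothetical helper; some tools may also expect a path suffix on the URL.

```python
import os
from typing import Optional

# Placeholder address: substitute the host and port your gateway actually uses.
GATEWAY_URL = "http://localhost:8765"


def gateway_env(url: str = GATEWAY_URL, base: Optional[dict] = None) -> dict:
    """Return a copy of the environment with both base-URL variables redirected."""
    env = dict(os.environ if base is None else base)
    env["ANTHROPIC_BASE_URL"] = url  # read by Claude Code (API key mode)
    env["OPENAI_BASE_URL"] = url     # read by OpenAI-compatible CLIs
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching a tool, which keeps the redirect scoped to that process instead of the whole shell.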
We document the gaps honestly. The benchmark results above reflect these limits in practice.
A developer describing code behaviour in plain English, without naming symbols or pasting code, often scores below the embedding similarity threshold because the text carries too little signal. This is the primary documented gap (see the benchmark false-negative cases above).
Method bodies shorter than ~4 lines, or code using only common patterns (trivial getters, constructors with no distinctive logic), may not produce a vector distinctive enough to cross the threshold.
Copilot uses a hardcoded endpoint that cannot be redirected to a custom gateway. There is no client-side workaround.
Codebases that share vocabulary with a protected library (e.g. an internal HTTP client using the same connection pool patterns as requests.HTTPAdapter) can score 0.75–0.90 and receive an unwanted REDACT. The worst case is a REDACT, not a BLOCK — the developer gets a useful response but the implementation is replaced with a stub. This tradeoff is documented as acceptable; it can be reduced by narrowing what is protected.
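The grace-zone behaviour can be pictured as a simple mapping from similarity score to action. This is a sketch under assumptions, not Kiri's scoring code: the 0.75 and 0.90 bounds come from the text, but treating scores at or above 0.90 as BLOCK, and which side of each boundary is inclusive, are guesses made for illustration.

```python
from enum import Enum


class Action(Enum):
    PASS = "pass"
    REDACT = "redact"
    BLOCK = "block"


GRACE_LOW = 0.75   # lower bound of the grace zone (from the text)
GRACE_HIGH = 0.90  # upper bound of the grace zone (from the text)


def decide(similarity: float) -> Action:
    """Map an embedding similarity score to an action."""
    if similarity < GRACE_LOW:
        return Action.PASS    # below threshold: forwarded unchanged
    if similarity < GRACE_HIGH:
        return Action.REDACT  # grace zone: implementation replaced with a stub
    return Action.BLOCK       # assumption: scores above the grace zone block
```

Under this mapping, the two benchmark misses (similarity 0.747 and 0.714) fall just below the threshold and pass through, while a structural analog scoring 0.80 is redacted rather than blocked.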
If the embedding service or Ollama classifier is unavailable, requests are passed through rather than blocked. L1 symbol matching always remains active as the safety net. This is an explicit design decision — documented in ADR-004.
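A minimal sketch of the fail-open behaviour described above, assuming the semantic layers are reached through a callable that raises `OSError` (which covers `ConnectionError` and `TimeoutError`) when the service is down. The function name `should_filter` and the `semantic_check` parameter are hypothetical, and the substring test is a simplified stand-in for symbol matching.

```python
from typing import Callable, Iterable


def should_filter(
    prompt: str,
    symbols: Iterable[str],
    semantic_check: Callable[[str], bool],
) -> bool:
    """Return True if the prompt should be filtered before it is forwarded."""
    # Symbol matching is purely local, so it runs regardless of service health.
    if any(symbol in prompt for symbol in symbols):
        return True
    try:
        # Embedding or classifier layer; may depend on an external service.
        return semantic_check(prompt)
    except OSError:
        # Fail-open: an unavailable service must not block the developer.
        return False
```

The ordering matters: because the local check runs first, an outage in the semantic layer only removes the extra coverage and never disables the symbol-level safety net.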
Kiri is not a DLP system. It protects developer workflows without breaking them.