Protect proprietary code when using Claude, Cursor, and AI coding tools.

Kiri is a local AI gateway that intercepts LLM prompts and strips protected source code before requests reach cloud models.

$ git clone https://github.com/PaoloMassignan/kiri
$ cd kiri && ./install/macos/install.sh

AI coding tools send your code to the cloud

AI coding tools are rapidly becoming standard in software development. But every time a developer asks Claude or Cursor to explain, refactor, or debug proprietary code, that implementation is sent to an external LLM provider.

Kiri intercepts those prompts locally and removes protected source code before they leave your machine. Your tools work exactly as before. Your implementation never reaches the cloud.

A local gateway, invisible to your workflow

Kiri runs on localhost and intercepts every outgoing LLM call. No changes to your tools. No cloud component. No data leaving your machine.

Kiri architecture — AI tools → Kiri local gateway → cloud LLM

All processing runs inside Docker on localhost. The real API key lives only inside the container as a Docker secret. See the filter pipeline spec.

See it in action

A developer asks Claude about a protected function. Kiri intercepts, strips the implementation, and forwards a stub. Claude still gives useful advice — the implementation never left the machine.

Kiri demo — kiri add, claude prompt intercepted, kiri explain
kiri add scorer.py indexes the file and registers its symbols locally.
Claude Code sends a prompt naming a protected symbol — Kiri intercepts and strips the body.
kiri explain shows exactly what Claude received and what stayed on your machine.

Full control over what the LLM sees

Edit the stub description that replaces your implementation. Audit every filtered request with a complete local log.

Control what the LLM sees
When Kiri redacts a function, it replaces the body with a generated summary. You can inspect and override it — giving the LLM context without exposing your implementation.
kiri summary
$ kiri summary show calculate_risk_score

Symbol: calculate_risk_score
Summary: Computes weighted risk score
with adaptive normalisation.
Source: auto-generated

$ kiri summary set calculate_risk_score
"Computes a blended risk score —
no thresholds or weights visible."

Summary updated.
Claude sees this instead of your implementation.
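
The stub replacement described above can be sketched in a few lines of Python. This is an illustration only, not Kiri's implementation: it uses Python's standard-library ast module as a stand-in for the tree-sitter chunking Kiri mentions, handles Python source only, and the function name redact_function is hypothetical.

```python
import ast


def redact_function(source: str, symbol: str, summary: str) -> str:
    """Replace the body of `symbol` with a summary docstring and `...`.

    Hypothetical helper: every other function in `source` is left untouched.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name == symbol:
            # Keep the signature, drop the implementation.
            node.body = [
                ast.Expr(value=ast.Constant(value=summary)),
                ast.Expr(value=ast.Constant(value=...)),
            ]
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


if __name__ == "__main__":
    protected = (
        "def calculate_risk_score(x):\n"
        "    weights = [0.2, 0.8]\n"
        "    return x * weights[1]\n"
    )
    print(redact_function(protected, "calculate_risk_score",
                          "Computes a blended risk score."))
```

The model still sees the function name, its parameters, and the summary, which is usually enough context for advice about call sites or surrounding code.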
Audit every request
Every filtered request is logged locally — decision, filter level, matched symbol, source file, timestamp. Full traceability, no cloud dependency.
kiri log
$ kiri log --tail 5

timestamp decision lvl matched
2026-05-11T20:08:27 REDACT L1 calculate_risk_score
2026-05-11T20:07:11 PASS — —
2026-05-11T20:06:43 REDACT L2 sim=0.94 · risk_engine.py
2026-05-11T20:05:22 BLOCK L3 extraction intent ← l3_classifier_enabled: true
2026-05-11T20:04:18 PASS — —

$ kiri explain

Why: symbol match → calculate_risk_score
Forwarded: function stub + summary
Kept local: full implementation
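
For scripted audits, log lines like the ones shown above could be parsed into records. The sketch below assumes the whitespace-separated column layout displayed here (timestamp, decision, level, matched); the real output format of kiri log may differ, and the LogEntry and parse_log_line names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass
class LogEntry:
    timestamp: datetime
    decision: str  # PASS, REDACT, or BLOCK as shown in the sample output
    level: str     # filter level column, e.g. "L1"
    matched: str   # remainder of the line (symbol, similarity, file)


def parse_log_line(line: str) -> LogEntry:
    """Split one log line into its four displayed columns (assumed layout)."""
    timestamp, decision, level, matched = line.split(maxsplit=3)
    return LogEntry(datetime.fromisoformat(timestamp), decision, level, matched)


def count_decisions(lines: Iterable[str]) -> Counter:
    """Tally how many requests were passed, redacted, or blocked."""
    return Counter(parse_log_line(line).decision for line in lines if line.strip())
```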

What Kiri protects — and what it does not

We document the threat model explicitly. Kiri is not a complete DLP solution — it is a focused, local-first tool for a specific class of risk.

Kiri protects against
  • Accidental code leakage — proprietary source pasted in prompts
  • Direct extraction — "explain / refactor this function" with full body
  • Semantic similarity leakage — renamed classes, partial snippets
  • Symbol-name exposure — protected class and function names in prompts
  • Explicit extraction intent — detectable and blockable by optional local classifier (requires l3_classifier_enabled: true)
Not fully covered
  • Manual prose rewriting — describing behaviour in plain English without naming symbols or pasting code
  • Screenshots or images of source code
  • Intentional malicious insiders with direct filesystem access
  • GitHub Copilot — hardcoded endpoint, cannot be redirected

Fail-open is intentional: if Kiri's embedding service or classifier is unavailable, requests pass through rather than blocking work. Symbol matching (L1) always remains active as the safety net. See ADR-004.
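
The fail-open behaviour can be illustrated with a short Python sketch. It is a simplified model, not Kiri's code: the 0.75 threshold is assumed from the lower edge of the grace zone mentioned on this page, the order in which L2 and L3 run is an assumption, and decide, Decision, and the callback parameters are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Set

SIM_THRESHOLD = 0.75  # assumption: lower edge of the documented grace zone


@dataclass
class Decision:
    action: str            # "PASS", "REDACT", or "BLOCK"
    level: Optional[str]   # "L1", "L2", "L3", or None for PASS
    reason: str


def decide(
    prompt: str,
    protected_symbols: Set[str],
    similarity: Optional[Callable[[str], float]] = None,
    intent_classifier: Optional[Callable[[str], bool]] = None,
) -> Decision:
    # L1: symbol matching has no external dependency, so it always runs.
    identifiers = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", prompt))
    hits = sorted(identifiers & protected_symbols)
    if hits:
        return Decision("REDACT", "L1", hits[0])

    # L2: embedding similarity. Any failure means "no signal" (fail-open).
    if similarity is not None:
        try:
            score = similarity(prompt)
        except Exception:
            score = None
        if score is not None and score >= SIM_THRESHOLD:
            return Decision("REDACT", "L2", f"sim={score:.2f}")

    # L3: optional intent classifier. Failures also fall through.
    if intent_classifier is not None:
        try:
            flagged = intent_classifier(prompt)
        except Exception:
            flagged = False
        if flagged:
            return Decision("BLOCK", "L3", "extraction intent")

    return Decision("PASS", None, "")
```

In this model an outage of the embedding service degrades detection to L1 only, which matches the safety-net role described above.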

Tested against 10 real open-source projects

63 cases across 6 languages, in two tiers: easy (symbol-level) and hard (semantic, renamed code, prose). Two documented misses, openly included. Structural analogs in the grace zone (0.75–0.90) can produce a false REDACT; see known limits.

F1, easy tier (L1+L2): 1.000 (40 cases, 0 false negatives)
F1, hard tier (L1+L2): 0.955 (23 cases, 2 documented misses)

Full results: 10 projects, 6 languages

Project      Language    Easy (L1+L2)  Hard (L1+L2)  Notes
flask        Python      4/4           3/3
requests     Python      4/4           2/2
fastapi      Python      4/4           2/2 via L2    Renamed class still calls APIRouter()
express      JavaScript  4/4           2/2
nestjs       TypeScript  4/4           3/3
spring-boot  Java        4/4           2/2
kafka        Java        4/4           2/3           Prose-only description; sim 0.747, below threshold
gin          Go          4/4           2/2 via L2    Renamed struct still embeds RouterGroup
actix-web    Rust        4/4           1/2           Design-rationale question; sim 0.714, below threshold
aspnetcore   C#          4/4           2/2 via L2    Renamed class still calls new WebApplication()

Two documented misses (FN): kafka-h003 (sim=0.747 — generic messaging prose) and actix-web-h001 (sim=0.714 — design-rationale question on a Rust generic struct). Both score below the similarity threshold (L2); the embedding lacks enough signal from prose alone. This is the primary known gap.

Three cases show sim=0.000 — caught by L1 (symbol match) before L2 ran, because the renamed code still called the original library's API. This is a valid real-world detection pattern, not a test artifact.
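
The reported F1 figures can be reproduced from the case counts. The short Python check below assumes zero false positives in both tiers; that assumption is not stated explicitly on this page, but it is the only count consistent with the published 0.955.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Easy tier: 40 cases, all detected.
easy = f1_score(tp=40, fp=0, fn=0)

# Hard tier: 23 cases, 2 documented misses, so 21 detected.
hard = f1_score(tp=21, fp=0, fn=2)

print(f"easy tier F1: {easy:.3f}")  # 1.000
print(f"hard tier F1: {hard:.3f}")  # 0.955
```

With 21 of 23 hard cases detected, recall is 21/23 (about 0.913), precision is 1.0, and the harmonic mean is 42/44 (about 0.955).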

Languages and tools

Kiri works with any tool that accepts a custom base URL and a Bearer API key. Tools with hardcoded endpoints cannot be redirected.

Indexed languages

Chunked with tree-sitter, embedded with all-MiniLM-L6-v2 (384-dim).

Python · JavaScript · TypeScript · Java · Go · Rust · C#
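
Similarity scores such as sim=0.94 are the kind of value a cosine comparison between two embedding vectors produces. The sketch below shows that computation in plain Python; that Kiri uses cosine similarity specifically is an assumption, and in practice the input vectors would be the 384-dimensional embeddings from the model rather than the toy vectors used here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as carrying no signal
    return dot / (norm_a * norm_b)
```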
AI coding tools
Claude Code (API key): set ANTHROPIC_BASE_URL
Claude Code (Pro/Max): OAuth passthrough mode
Cursor: base URL in Settings
Windsurf: base URL in Settings
Cline / Continue.dev: provider URL in config
OpenCode / Codex CLI: set OPENAI_BASE_URL
GitHub Copilot: hardcoded endpoint, cannot be redirected
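
For custom scripts, redirection works the same way as for the tools listed above: point the base URL at the local gateway and send a Bearer key. The Python sketch below only builds the request and does not send it. The gateway address http://localhost:8765 and the gateway_key parameter are hypothetical (this page does not state the port or how keys are issued), and the /v1/messages path is used purely as an example.

```python
import json
import os
import urllib.request

# Hypothetical local gateway address; the real port is not given on this page.
DEFAULT_GATEWAY = "http://localhost:8765"


def build_gateway_request(
    gateway_key: str,
    payload: dict,
    path: str = "/v1/messages",  # example path, an assumption
) -> urllib.request.Request:
    """Build (but do not send) a POST request aimed at the local gateway."""
    base_url = os.environ.get("ANTHROPIC_BASE_URL", DEFAULT_GATEWAY).rstrip("/")
    return urllib.request.Request(
        url=base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {gateway_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Reading the base URL from the environment mirrors how the listed command-line tools are configured, so one variable controls both the tool and any helper scripts.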

What Kiri does not catch

We document the gaps honestly. The benchmark results above reflect these limits in practice.

Semantic reformulation — pure prose

A developer describing code behaviour in plain English, without naming symbols or pasting code, often scores below the L2 similarity threshold because the embedding lacks enough signal. This is the primary documented gap (see benchmark FN cases above).

Very short or generic snippets

Method bodies shorter than ~4 lines, or code using only common patterns (trivial getters, constructors with no distinctive logic), may not produce a vector distinctive enough to cross the threshold.

GitHub Copilot

Copilot uses a hardcoded endpoint that cannot be redirected to a custom gateway. There is no client-side workaround.

Structural analogs — false positives in the grace zone

Codebases that share vocabulary with a protected library (e.g. an internal HTTP client using the same connection pool patterns as requests.HTTPAdapter) can score 0.75–0.90 and receive an unwanted REDACT. The worst case is a REDACT, not a BLOCK — the developer gets a useful response but the implementation is replaced with a stub. This tradeoff is documented as acceptable; it can be reduced by narrowing what is protected.

Fail-open by design

If the embedding service or Ollama classifier is unavailable, requests are passed through rather than blocked. L1 symbol matching always remains active as the safety net. This is an explicit design decision — documented in ADR-004.

Kiri is not a DLP system. It protects developer workflows without breaking them.