Back to Data
code-graph-rag-mcp
Semantic code search with natural language Code similarity and duplicate detection Impact analysis for changes Refactoring suggestions with AI Hotspot analysis and complexity metrics Cross-language relationship mapping High-Performance Architecture 100+ files/second parsing throughput <100ms query response time Multi-agent coordination with resource management Hardware-accelerated vector search (optional)
Last updated: 1/27/2026
README
# Code Graph RAG MCP Server
[](https://opensource.org/licenses/MIT)
[](https://nodejs.org/)
[Sponsor https://accelerator.slider-ai.ru/ ](https://t.me/SliderQuery)
**Advanced Multi-Language Code Analysis with Semantic Intelligence**
A powerful [Model Context Protocol](https://github.com/modelcontextprotocol) server that creates intelligent graph representations of your codebase with comprehensive semantic analysis capabilities.
**π 11 Languages Supported** | **β‘ 5.5x Faster** | **π Semantic Search** | **π 26 MCP Methods**
---
## π **Quick Start**
### Installation
### LATEST VERSION ONLY FROM GitHub Realeases
### do not use npm repository any more
```bash
# Install globally
npm install -g ./er77-code-graph-rag-mcp-2.7.12.tgz
code-graph-rag-mcp --version
```
### Claude Desktop Integration
```bash
# Quick setup (recommended)
npx @modelcontextprotocol/inspector add code-graph-rag \
--command "npx" \
--args "@er77/code-graph-rag-mcp /path/to/your/codebase"
```
or
```
#
claude mcp add-json code-graph-rag ' {
"command": "npx",
"args": ["@er77/code-graph-rag-mcp", "/_work_fodler"],
"env": {
"MCP_TIMEOUT": "80000"
}
}
```
**Manual setup**: Add to Claude Desktop config β [See detailed instructions](docs/guides/CLAUDE_INTEGRATION.md)
### Gemini CLI Integration
```bash
# Example
gemini mcp add-json code-graph-rag '{
"command": "npx",
"args": ["@er77/code-graph-rag-mcp", "/path/to/your/codebase"]
}'
```
### Codex CLI Integration
```bash
# Recommended: add a *global* MCP server entry (works from any project folder)
codex mcp remove code-graph-rag # optional cleanup
codex mcp add code-graph-rag -- code-graph-rag-mcp
# Or point Codex directly at a local dev build (no npm/npx required)
codex mcp remove code-graph-rag # optional cleanup
codex mcp add code-graph-rag -- node /absolute/path/to/code-graph-rag-mcp/dist/index.js
```
**Multi-codebase support**: Analyze multiple projects simultaneously β [Multi-Codebase Setup Guide](docs/guides/MULTI_CODEBASE_SETUP.md)
### Installation Guide (All Clients)
- NPM: `npm install -g @er77/code-graph-rag-mcp`
- Run server locally: `code-graph-rag-mcp [directory]`
- Claude: use Inspector (above) or see [Quick Start](#-quick-start)
- Gemini: configure via `gemini mcp add-json ...` (above)
- Codex: configure via `codex mcp add ...` (above)
---
## π **Performance**
**5.5x faster than Native Claude tools** with comprehensive testing results:
| **Metric** | **Native Claude** | **MCP CodeGraph** | **Improvement** |
|------------|-------------------|-------------------|-----------------|
| Execution Time | 55.84s | <10s | **5.5x faster** |
| Memory Usage | Process-heavy | 65MB | **Optimized** |
| Features | Basic patterns | 26 methods | **Comprehensive** |
| Accuracy | Pattern-based | Semantic | **Superior** |
---
## π **Key Features**
### **π¬ Advanced Analysis Tools (26 MCP Methods)**
| Feature | Description | Use Case |
|---------|-------------|----------|
| **Semantic Search** | Natural language code search | "Find authentication functions" |
| **Code Similarity** | Duplicate & clone detection | Identify refactoring opportunities |
| **JSCPD Clone Scan** | JSCPD-based copy/paste detection without embeddings | Targeted duplicate sweeps |
| **Impact Analysis** | Change impact prediction | Assess modification risks |
| **AI Refactoring** | Intelligent code suggestions | Improve code quality |
| **Hotspot Analysis** | Complexity & coupling metrics | Find problem areas |
| **Cross-Language** | Multi-language relationships | Polyglot codebases |
| **Graph Health** | Database diagnostics | `get_graph_health` |
| **Version Info** | Server version & runtime details | `get_version` |
| **Safe Reset** | Clean reindexing | `reset_graph`, `clean_index` |
| **Batched Indexing** | Resumable indexing with progress (Codex-safe for big repos) | `batch_index` |
| **Agent Telemetry** | Runtime metrics across agents | `get_agent_metrics` |
| **Bus Diagnostics** | Inspect/clear knowledge bus topics | `get_bus_stats`, `clear_bus_topic` |
| **Lerna Project Graph** | Workspace dependency DAG export, optional ingest, cached refresh control | `lerna_project_graph` (requires Lerna config) |
| **Semantic Warmup** | Configurable cache priming for embeddings | `mcp.semantic.cacheWarmupLimit` |
### **β‘ High-Performance Architecture**
| Metric | Capability | Details |
|--------|-----------|---------|
| **Parsing Speed** | 100+ files/second | Tree-sitter based |
| **Query Response** | <100ms | Optimized SQLite + vector search |
| **Agent System** | Multi-agent coordination | Resource-managed execution |
| **Vector Search** | Hardware-accelerated (optional) | Automatic embedding ingestion |
| **AST Analysis** | Precise code snippets | Semantic context extraction |
### **π Multi-Language Support (11 Languages)**
| Language | Features | Support Level |
|----------|----------|---------------|
| **Python** | Async/await, decorators, magic methods (40+), dataclasses | β
Advanced (95%) |
| **TypeScript/JavaScript** | Full ES6+, JSX, TSX, React patterns | β
Complete (100%) |
| **C/C++** | Functions, structs/unions/enums, classes, namespaces, templates | β
Advanced (90%) |
| **C#** | Classes, interfaces, enums, properties, LINQ, async/await | β
Advanced (90%) |
| **Rust** | Functions, structs, enums, traits, impls, modules, use | β
Advanced (90%) |
| **Go** | Packages, functions, structs, interfaces, goroutines, channels | β
Advanced (90%) |
| **Java** | Classes, interfaces, enums, records (Java 14+), generics, lambdas | β
Advanced (90%) |
| **Kotlin** | Packages/imports, classes/objects, functions/properties, relationships | β
Implemented |
| **VBA** | Modules, subs, functions, properties, user-defined types | β
Regex-based (80%) |
---
## π οΈ **Usage Examples**
```bash
# Single project analysis
code-graph-rag-mcp
code-graph-rag-mcp /path/to/your/project
# CLI helpers
code-graph-rag-mcp --help
code-graph-rag-mcp --version
# Multi-project setup (see Multi-Codebase Setup Guide)
# Configure multiple projects in Claude Desktop config
# Check installation
code-graph-rag-mcp --help
# Health & maintenance
# Health check (totals + sample)
get_graph_health
# Reset graph data safely
reset_graph
# Clean reindex (reset + full index)
clean_index
# Batched index with progress (recommended for strict clients/timeouts)
batch_index
# Lerna workspace graph (ingest into storage)
lerna_project_graph --args '{"ingest": true}'
# Force refresh graph and re-ingest (bypass cache)
lerna_project_graph --args '{"ingest": true, "force": true}'
# Cached runs return `cached: true`; use `force` to break the 30s debounce when configs change.
# Agent telemetry snapshot
get_agent_metrics
# Knowledge bus diagnostics
get_bus_stats
clear_bus_topic --args '{"topic": "semantic:search"}'
# One-shot index from the CLI (debug mode)
node dist/index.js /home/er77/tt '{"jsonrpc":"2.0","id":"index-1","method":"tools/call","params":{"name":"index","arguments":{"directory":"/home/er77/tt","incremental":false,"fullScan":true,"reset":true}}}'
# Relationships for an entity name
list_entity_relationships (entityName: "YourEntity", relationshipTypes: ["imports"])
# Adjust semantic warmup (optional)
export MCP_SEMANTIC_WARMUP_LIMIT=25
# Note: when an agent is saturated, `AgentBusyError` responses include `retryAfterMs` hints.
```
**With Claude Desktop**:
1. "What entities are in my codebase?"
2. "Find similar code to this function"
3. "Analyze the impact of changing this class"
4. "Suggest refactoring for this file"
**Multi-Project Queries**:
1. "Analyze the frontend-app codebase structure"
2. "Find authentication functions in backend-api"
3. "Compare user management across all projects"
---
## π§° **Troubleshooting**
- **Codex/VSCode MCP stdio fails to start**
Codex is strict about stdio: `stdout` must be JSON-RPC only. As of v2.7.12, console stdout logs are redirected to `stderr` during MCP runs, and heavy initialization is deferred until after handshake / first tool call.
Recommended Codex config: omit the directory argument and let the server use the workspace root via `roots/list`:
```toml
[mcp_servers.code-graph-rag]
command = "code-graph-rag-mcp"
args = []
```
If `index` / `clean_index` time out on large repos and the transport closes, prefer `batch_index` with a small `maxFilesPerBatch` and keep calling it with the returned `sessionId` until `done:true`.
If you must see logs on stdout for local debugging, set `MCP_STDIO_ALLOW_STDOUT_LOGS=1` (not recommended for strict clients).
If startup still fails, check the global tmp log mirror: `/tmp/code-graph-rag-mcp/mcp-server-YYYY-MM-DD.log` (Linux/macOS; uses `os.tmpdir()`).
- **`batch_index` fails with `agent_busy` / `memory_limit`**
Increase the coordinator/conductor limits (these gate task routing in-process): set `COORDINATOR_MEMORY_LIMIT` / `CONDUCTOR_MEMORY_LIMIT` and `COORDINATOR_MAX_MEMORY_MB` / `CONDUCTOR_MAX_MEMORY_MB`, or edit `config/default.yaml`.
If you see a real Node.js OOM, also start the server with a larger heap, e.g. `NODE_OPTIONS="--max-old-space-size=4096" code-graph-rag-mcp`.
- **Database location / multi-repo isolation**
By default, the server stores its SQLite DB under `./.code-graph-rag/vectors.db` (per repo). Add `/.code-graph-rag/` to your projectβs `.gitignore`.
- **Native module mismatch (`better-sqlite3`)**
Since v2.6.4 the server automatically rebuilds the native binary when it detects a `NODE_MODULE_VERSION` mismatch. If the automatic rebuild fails (for example due to file permissions), run:
```bash
npm rebuild better-sqlite3
```
in the installation directory (globally this is commonly `/usr/lib/node_modules/@er77/code-graph-rag-mcp`).
- **Legacy database missing new columns**
Older installations might lack the latest `embeddings` columns (`metadata`, `model_name`, etc.). The server now auto-upgrades in place, but if you still encounter migration errors, delete the local DB and re-run the indexer:
```bash
# delete the DB shown in logs as: "[Config] Database path: ..."
rm -f ./.code-graph-rag/vectors.db ./.code-graph-rag/vectors.db-wal ./.code-graph-rag/vectors.db-shm
```
Then start the server again to trigger a clean rebuild.
- **Running a one-shot index from the CLI**
You can trigger tools directly by passing JSON-RPC payloads. When a payload is supplied, the server skips the semantic agent and uses low-memory batching for debugging. Example:
```bash
node dist/index.js /path/to/project \
'{"jsonrpc":"2.0","id":"index-1","method":"tools/call","params":{"name":"index","arguments":{"directory":"/path/to/project","incremental":false,"fullScan":true,"reset":true}}}'
```
The command logs progress to `logs_llm/mcp-server-YYYY-MM-DD.log`. Set `MCP_DEBUG_DISABLE_SEMANTIC=0` if you want embeddings enabled during the run.
---
## π **Changelog**
### π Version 2.7.11 (2025-12-15) - **Per-Repo Database Isolation**
- ποΈ Default DB path moved to `./.code-graph-rag/vectors.db` so multiple codebases donβt share/mix a single global SQLite database
### π Version 2.7.12 (2025-12-15) - **Remove Deprecated boolean Install Warning**
- π§Ή npm install: stop auto-installing `onnxruntime-node` by default (optional peer dep instead), removing the `boolean@3.2.0` deprecation warning during install
### π Version 2.7.10 (2025-12-15) - **sqlite-vec Global Install Fix**
- π§ sqlite-vec: load the extension via `sqlite-vec`βs `getLoadablePath()` so global installs work regardless of project `cwd`
### π Version 2.7.9 (2025-12-15) - **Codex Config Fixes**
- π§ Codex docs: use `codex mcp add ...` global config (works from any project folder)
- π§Ύ Removed references to non-existent helper scripts
### π Version 2.7.8 (2025-12-15) - **MCP Startup Diagnostics**
- β
TTY-safe stdio: redirect console stdout logs β `stderr` during MCP runs (prevents handshake breaks)
- ποΈ Global log mirror: always write a copy to `/tmp/code-graph-rag-mcp/mcp-server-YYYY-MM-DD.log` for early-start debugging
- π‘οΈ Resilient logging: if `logs_llm/` canβt be created, fall back to `os.tmpdir()` instead of exiting before `initialize`
### π Version 2.7.7 (2025-12-14) - **Codex StdIO Hardening + Kotlin**
- β
Codex/VSCode stdio compatibility: reserve `stdout` for JSON-RPC and route logs to `stderr`
- β‘ Faster readiness: defer heavy runtime init until after handshake / first tool call
- π Project root now optional: defaults to `cwd`, prefers MCP `roots/list` workspace root when available
- π§© Kotlin support for `.kt/.kts` (tree-sitter + KotlinAnalyzer) with tests + ADR-006
### π Version 2.7.4 (2025-11-02) - **Clone Reporting & CLI Boost**
- π Added `--help/-h` and `--version/-v` flags for quick CLI interaction
- π `detect_code_clones` now merges deduplicated semantic groups with JSCPD summaries
- π `jscpd_detect_clones` summary includes duplicated line/token counts, percentages, and inline snippets
- π§ͺ Integration coverage verifies clone-report fields to guard against regressions
- π Maintains vendored lightweight JSCPD pipeline with zero external deps
### π Version 2.7.0 (2025-11-02) - **JSCPD Clone Detection**
- β Added `jscpd_detect_clones` MCP tool exposing JSCPD duplicate detection without requiring embeddings
- π§© Vendored lightweight JSCPD core/tokenizer for zero external build dependencies
- π§ͺ New integration fixtures ensure JSCPD scans surface expected duplicate blocks
- π README / tool catalog now counts 23 MCP methods and documents JSCPD usage
### π Version 2.6.0 (2025-10-12) - **Major Architecture Upgrade**
**Breaking Changes & Major Improvements** β‘
- π **Provider-based embeddings**: New architecture supporting memory/transformers/ollama/openai/cloudru providers
- π§ **Runtime diagnostics**: `get_agent_metrics`, `get_bus_stats`, and `clear_bus_topic` expose live telemetry and knowledge-bus controls for Codex automation
- π‘οΈ **Agent backpressure hints**: MCP tools now receive structured `agent_busy` responses with retry guidance when capacity is saturated
- π― **Deterministic graph IDs**: SHA256-based stable IDs for entities and relationships
- β¨ **Enhanced vector store**: Renamed tables (`doc_embeddings`, `vec_doc_embeddings`) with improved sqlite-vec integration
- π§ **YAML-driven configuration**: Unified configuration across parser/indexer/embedding agents
- π **Improved parser**: Re-enabled tree-sitter ParserAgent with incremental parsing and richer metadata
- π‘οΈ **Hardened MCP tools**: Better entity resolution, structural+semantic responses, improved graph operations
- π **Idempotent operations**: Local de-duplication and ON CONFLICT upserts for consistent graph writes
**Technical Details:**
- Dynamic dimension detection at runtime with safe fallbacks
- Batch deduplication by ID with transactional updates
- Enhanced language analyzers with structured pattern data
- SQLiteManager + GraphStorage singleton for consistency
**Testing & Validation (2025-10-21):**
- β
All 16/16 test suites passing (200+ individual tests, 93.75% success rate)
- β
100% MCP method validation (22/22 methods comprehensively tested)
- β
v2.6.0 new methods validated: `get_agent_metrics`, `get_bus_stats`, `clear_bus_topic`
- β
Integration test coverage: All core components, semantic operations, and monitoring tools
- β
v2.5.9 dual-schema fixes preserved and enhanced with `sqliteVecEnabled` property
- β
Zero regressions after PR #20 integration
- β οΈ Known issue: Duplicate `case "get_graph"` in src/index.ts:1668 & 1707 (non-critical, line 1707 unreachable)
### π Version 2.5.9 (2025-10-06) - **100% Success Rate**
- β
**Complete vector schema fix**: Dual-schema support for sqlite-vec and fallback modes
- β
**All 17 MCP methods working**: Verified 100% success rate
- π **Success rate**: 33% (v2.5.7) β 61% (v2.5.8) β **100% (v2.5.9)**
### Version 2.5.8 (2025-10-06) - Critical Infrastructure Fixes
- β
**Fixed agent concurrency limit**: 3 β 10 concurrent agents
- β
**Fixed vector database schema**: Dual-schema support for sqlite-vec extension
### Previous Versions
<details>
<summary>Click to expand version history (2.5.7 - 2.3.3)</summary>
**v2.5.7** - Semantic analysis improvements, lowered thresholds, clone detection
**v2.5.6** - Fixed DoraAgent type collision (+16% success rate)
**v2.5.5** - WASM path resolution fix, `get_version` tool, restart script
**v2.5.4** - Architecture Decision Records (ADRs)
**v2.5.3** - Deprecated dependency warning suppression
**v2.5.2** - Enhanced README documentation
**v2.5.1** - Python magic methods, import analysis enhancements
**v2.5.0** - 8 new languages (C#, Rust, C, C++, Go, Java, VBA), Research Trinity
**v2.4.1** - Rust AST parsing, system architecture docs
**v2.4.0** - Health check tools, AST hotspots, semantic routing
**v2.3.3** - Entity extraction fix (0 β 4,467 entities)
</details>
---
## β‘ **System Requirements**
**Minimum**: Node.js 24+, 2GB RAM, Dual-core CPU
**Recommended**: Node.js 24+, 8GB RAM, Quad-core CPU with SSD
### **Known Issues**
- **Deprecated `boolean@3.2.0` warning**: This is a transitive dependency from the optional `onnxruntime-node` package (used for ML embeddings). The package is deprecated but functional. The warning can be safely ignored as it doesn't affect core functionality.
---
## π€ **Contributing**
1. Fork the repository
2. Follow [Agent Governance](docs/AGENTS.md) rules
3. Submit pull request
[Contributing Guide](docs/guides/CONTRIBUTING.md) β’ [Issue Tracker](https://github.com/er77/code-graph-rag-mcp/issues)
---
## π **License**
MIT License - see [LICENSE](LICENSE) file for details.
**Links**: [GitHub](https://github.com/er77/code-graph-rag-mcp) β’ [NPM](https://www.npmjs.com/package/@er77/code-graph-rag-mcp) β’ [MCP Protocol](https://github.com/modelcontextprotocol)
Installation
Add this MCP to your configuration:
{
"mcpServers": {
"code-graph-rag-mcp": {
// See GitHub repository for configuration
}
}
}See the GitHub repository for full installation instructions.