AI Code Search Tools Benchmark

Claude Native (Grep/Glob) vs index1 vs qmd — Complete performance comparison of three search tools at different scales

Apple Silicon M2 · 48 GB RAM · Test Scale: 10K / 50K / 100K Documents · Ollama v0.15.4 · bge-m3 1024d (CJK) · 2026-02-06

Claude Native

Grep / Glob / Read — Claude Code Built-in Tools (ripgrep)
Rust + SIMD · Zero Dependencies · O(N) Linear Scan · Exact Match · No Index Needed

index1

BM25 + Vector Hybrid Search — Python MCP Server
FTS5 O(log N) · sqlite-vec · Ollama · 5 Chunkers · CJK Optimized · L1/L2 Cache

qmd

BM25 + Vector + LLM Reranking — Bun/TypeScript MCP Server
3 GGUF Models · Query Expansion · LLM Rerank · No External Dependencies · Markdown Only
10K Documents / ~50 MB Code — Medium Projects (Django, Flask)
50K Documents / ~250 MB Code — Large Projects (React, Vue)
100K Documents / ~500 MB Code — Very Large Projects (Linux Kernel Subset)

0. Data Sources & Methodology

Data Label Legend

MEASURED — Actually measured on the index1 project (17,725 chunks, 1,707 documents)
REFERENCE — From official benchmarks or papers (ripgrep, SQLite FTS5, sqlite-vec, Ollama)
PROJECTED — Projected from component performance + algorithm complexity (formulas annotated)
Data Source and Credibility Details

Data Source | Details | Confidence
index1 Measured | This project: 17,725 chunks / 1,707 docs / 185 MB database, median of multiple runs | High (Measured)
ripgrep Benchmark | Official burntsushi/ripgrep repo README benchmarks (Linux kernel, ~1 GB corpus) | High (Official)
SQLite FTS5 | sqlite.org official docs + andrewmara.com 18M-row trigram benchmark | High (Official + Third-party)
sqlite-vec | alexgarcia.xyz official v0.1.0 release benchmark (SIFT1M / GIST500K) | High (Author Published)
Ollama embed | collabnix.com embedding model guide + nomic.ai official blog | Medium-High
qmd Pipeline | tobi/qmd source analysis + component benchmark projection (no official benchmark) | Medium (Projected)
Scale Projection | Linear/logarithmic extrapolation from the measured baseline based on O(N) / O(log N) / O(N·D) complexity | Medium (Projected)

1. AI Agent Experience

⚡ AI Search Result Reading Speed Comparison

Regardless of which AI Agent you use (Claude Code, Cursor, Cline...), every search requires reading the returned results.
The more results returned, the slower and more expensive for AI to read.

Metric | index1 | qmd
Tokens per Search | ~460 tok | ~900 tok
AI Reading Time | < 0.1 s | ~0.2 s
Search Latency (End-to-End) | ~100 ms | ~1,200 ms
20 Searches Cumulative | 9,200 tok | 18,000 tok
200K Window Usage | 4.6% | 9%
AI Cost (20 searches / $3/M) | $0.03 | $0.05
Tokens the AI must read per search (shorter is better): qmd ~900 tok (~0.2 s) · index1 ~460 tok (< 0.1 s)

MEASURED — Cost estimated at Claude Sonnet $3/M input tokens
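A quick worked check of the session figures above — a minimal sketch assuming the per-search token counts from the table and the $3 per million input-token Sonnet price:

# Sketch: reproduces the session-level numbers above from per-search tokens.
PRICE_PER_TOKEN = 3 / 1_000_000   # assumed Claude Sonnet input price ($3/M)
CONTEXT_WINDOW = 200_000          # tokens
SEARCHES = 20

for tool, tokens_per_search in [("index1", 460), ("qmd", 900)]:
    session_tokens = tokens_per_search * SEARCHES
    share = session_tokens / CONTEXT_WINDOW
    cost = session_tokens * PRICE_PER_TOKEN
    print(f"{tool}: {session_tokens:,} tok, {share:.1%} of window, ${cost:.2f}")
# index1: 9,200 tok, 4.6% of window, $0.03
# qmd: 18,000 tok, 9.0% of window, $0.05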

AI Agent search efficiency depends not just on speed, but also on invocation complexity and token consumption. A tool that requires multiple trial-and-error calls wastes the Agent's reasoning tokens and the user's wait time.
Dimension | Claude Native | index1 | qmd
API Calls per Search | 1 call (Grep) | 1 call (docs_search) | 1-3 calls (must choose search/vsearch/query)
APIs AI Must Understand | 3 (intuitive) | 1 unified endpoint | 6 tools
Index Initialization | Not needed | 1 step: index1 index | 2 steps: bun run index + bun run vector
Environment Auto-diagnosis | N/A | index1 doctor --fix | ✗
💰 Single Query Tokens | ~36,000 tok (high-frequency words) | ~460 tok (Top-5) | ~900 tok (Top-10)
Search Latency (Perceived) | < 50 ms | ~100 ms | ~1,200 ms (query full pipeline)
When Ollama Unavailable | N/A | Auto-fallback to text-only search | Vector search completely unavailable

AI Agent Actual Usage: Complete Flow for a Single Search

Claude Native (Grep)
1. Grep("search")
2. Returns 950 lines (~36K tokens)
Context window gets flooded
1 call | ~40ms | But token explosion
index1
1. docs_search("搜索怎么工作的")
2. Returns Top-5 results (~460 tokens)
Precise, token-efficient, CJK support
1 call | ~100ms | 98%+ token savings
qmd
1. AI hesitates: use search? vsearch? query?
2. Tries search("search") → mediocre results
3. Switches to query("search") → waits 1.2 s
More precise results, but high AI decision cost
1-3 calls | 1.2s+ | Highest precision but complex flow

Major AI Agent / Editor Compatibility

All tools connect to AI Agents via MCP protocol. Claude Native is built-in, requiring no additional setup.

AI Agent | MCP Support | Claude Native | index1 | qmd | Notes
Claude Code | ✓ Native | Built-in | ✓ 1 endpoint | ✓ 6 tools | index1: one docs_search handles all
OpenClaw | ✓ MCP | Built-in | ✓ | ✓ | Open-source Claude Code alternative, native MCP
Cursor | ✓ MCP | Built-in (different impl) | ✓ | ✓ Must choose tool | Cursor's AI tends to select the fewest tools
Windsurf | ✓ MCP | Built-in | ✓ | ✓ | MCP stdio mode
Cline | ✓ MCP | Built-in (VS Code) | ✓ | ✓ | VS Code extension, MCP support
Aider | ✗ | Built-in grep | Manual integration needed | Manual integration needed | No native MCP support yet
GitHub Copilot | Partial | Built-in | Requires Agent Mode | Requires Agent Mode | Copilot Agent Mode supports MCP

index1's 1 unified endpoint design lets any AI Agent hit the right result in one call, without understanding the differences between multiple tools. qmd's 6 tools (search/vsearch/query/get/multi_get/status) increase the AI's decision burden.

2. CJK Language Support

CJK (Chinese-Japanese-Korean) search quality depends on three dimensions: tokenization accuracy, embedding model language coverage, and query strategy tuning. The following comparison is based on actual test results with index1 configured with bge-m3 + jieba tokenization.

2.1 CJK Capability Matrix

Capability | Claude Native | index1 + bge-m3 | qmd
Chinese Tokenization | N/A (literal matching) | ✓ jieba precise tokenization (pip install index1[chinese]) | ✗ porter unicode61 (splits by Unicode character, no semantics)
Embedding Model | N/A | bge-m3 1024d (BAAI multilingual model, CJK optimized) | embeddinggemma 300M (English-focused, severe CJK semantic loss)
CJK Query Strategy | N/A | Dynamic weighting: CJK detected → BM25=0.2 / Vec=0.8 (see the sketch after this table) | Fixed weights (language-agnostic, no adaptation)
Cross-lingual Search | ✗ Literal match only | ✓ Chinese query finds English code ("配置合并" → config.py merge()) | Limited (weak model semantics)
Japanese / Korean | ✗ | ✓ bge-m3 native support (vector search works; BM25 needs an extra tokenizer) | ✗
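A minimal sketch of the dynamic-weighting behaviour described in the table above. The BM25=0.2 / Vec=0.8 split for CJK queries is taken from the table; the detection regex and the 0.5/0.5 default for non-CJK queries are illustrative assumptions, not index1's actual code.

import re

# Rough CJK detection: Han, Hiragana/Katakana, Hangul ranges (assumption).
CJK_RE = re.compile(r"[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]")

def hybrid_weights(query: str) -> tuple[float, float]:
    """Return (bm25_weight, vector_weight) used when fusing scores."""
    if CJK_RE.search(query):
        # CJK query: keyword matching is unreliable after char-level
        # tokenization, so lean on the multilingual embedding model.
        return 0.2, 0.8
    return 0.5, 0.5  # assumed default for English queries

print(hybrid_weights("向量搜索的实现原理"))    # (0.2, 0.8)
print(hybrid_weights("how does search work"))  # (0.5, 0.5)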

2.2 CJK Configuration Resource Costs

Configuration | Model Size | Vector Dimensions | Memory Usage | Index Speed (10K) | Storage (10K)
index1 (BM25 only) | 0 | N/A | ~80 MB | ~10 s | ~60 MB
index1 + Ollama (nomic-embed-text 768d) | 274 MB | 768d | ~600 MB | ~10 min | ~150 MB
index1 + Ollama + bge-m3 (bge-m3 1024d, CJK optimized) | ~1.2 GB | 1024d | ~900 MB | ~14 min | ~200 MB
qmd (embeddinggemma 300M + 2 GGUF) | ~2.2 GB (3 models) | 768d | ~2.5 GB | ~8 min | ~80 MB

2.3 FTS5 Tokenization Comparison (Why It Matters)

FTS5 full-text search effectiveness depends entirely on the tokenizer. Chinese has no spaces between words — with the wrong tokenizer, BM25 search is broken.
qmd — porter unicode61
-- store.ts table creation
CREATE VIRTUAL TABLE documents_fts
  USING fts5(filepath, title, body,
  tokenize='porter unicode61');

-- Query "中文搜索" processing:
Input:"中文搜索"
split(/\s+/):["中文搜索"] ← as whole string
FTS5 index:中|文|搜|索 ← char by char
✗ Query term ≠ Index term, match fails
porter is an English stemmer (running→run), ineffective for Chinese
unicode61 splits at Unicode character boundaries, breaking Chinese into individual characters
index1 — jieba + default tokenizer
# db.py preprocessing (simplified)
import jieba

def tokenize_for_fts5(text):
    if has_cjk(text):  # has_cjk() is defined elsewhere in db.py
        # pre-segment so FTS5 indexes words, not single characters
        return " ".join(jieba.cut(text))
    return text  # pure English passes through untouched

# Query "中文搜索" processing:
Input: "中文搜索"
jieba.cut(): ["中文", "搜索"]
FTS5 query: "中文" OR "搜索"
✓ Query term = index term, exact match
Both indexing and querying go through jieba, ensuring consistency
Pure English automatically skips jieba, zero overhead
Query Example | index1 BM25 Result | qmd BM25 Result
"搜索功能" (search feature) | search.py, cli.py (precise) | No results or random matches
"配置合并" (config merge) | config.py merge() (precise) | No results
"search function" | search.py (normal) | search.py (normal)

MEASURED — Based on actual search tests on the index1 source project · CODE — qmd FTS5 config source: store.ts:519
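The segmentation difference can be reproduced directly. A small demo of jieba's typical output for the queries in the table above (requires pip install jieba; exact splits depend on the dictionary version):

import jieba

for query in ["中文搜索", "搜索功能", "配置合并"]:
    print(query, "->", list(jieba.cut(query)))
# Typical output:
# 中文搜索 -> ['中文', '搜索']
# 搜索功能 -> ['搜索', '功能']
# 配置合并 -> ['配置', '合并']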

🌏 index1 CJK One-Click Setup

index1 (3 steps)
pip install index1[chinese]
index1 config embedding_model bge-m3
index1 index --force
✓ Done! CJK search + cross-lingual queries enabled
qmd (no official solution)
1. Find a Chinese GGUF embedding model yourself
2. Modify the model URI in the store.ts source code
3. Change dimension config, rebuild index
4. FTS5 tokenization still can't be improved
✗ No out-of-the-box CJK support

MEASURED — bge-m3 model size ~1.2 GB (BAAI/bge-m3) · MEASURED — 1024d vectors vs 768d: +33% storage, +40% indexing, +15% search latency

3. Resource Requirements vs Scale

3.1 Index Build Time

Tool | Index Strategy | 10K Docs | 50K Docs | 100K Docs | Source
Claude Native | No indexing needed | 0 s | 0 s | 0 s | MEASURED
index1 | FTS5 inverted index | ~10 s | ~30 s | ~60 s | PROJECTED
index1 + Ollama | FTS5 + Ollama embedding | ~10 min | ~45 min | ~90 min | PROJECTED @ 9K tok/s
qmd | FTS5 + GGUF embedding | ~8 min | ~35 min | ~70 min | PROJECTED
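The PROJECTED embedding times scale roughly linearly with corpus size at the stated ~9K tok/s throughput. A sketch of the arithmetic; the average tokens per document is an illustrative assumption, and the table's 50K/100K figures sit slightly below a pure linear scale:

# Sketch of the linear embedding-time projection (index1 + Ollama rows).
EMBED_TOK_PER_SEC = 9_000   # stated throughput
TOKENS_PER_DOC = 550        # assumed average, not a measured value

def embed_minutes(doc_count: int) -> float:
    return doc_count * TOKENS_PER_DOC / EMBED_TOK_PER_SEC / 60

for docs in (10_000, 50_000, 100_000):
    print(f"{docs:,} docs -> ~{embed_minutes(docs):.0f} min")
# 10,000 docs -> ~10 min · 50,000 -> ~51 min · 100,000 -> ~102 min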

3.2 Database / Storage Size

Tool | Storage Content | 10K | 50K | 100K
Claude Native | None (scans source files) | 0 MB | 0 MB | 0 MB
index1 | FTS5 index + original text | ~60 MB | ~250 MB | ~500 MB
index1 + Ollama | FTS5 + 768d vectors | ~150 MB | ~600 MB | ~1.2 GB
qmd | FTS5 + vectors + llm_cache | ~80 MB | ~350 MB | ~700 MB

3.3 Runtime Memory Usage

Tool | Memory Composition | 10K | 50K | 100K
Claude Native | ripgrep process + file buffer | ~50 MB | ~150 MB | ~400 MB
index1 | Python + SQLite page cache | ~80 MB | ~150 MB | ~250 MB
index1 + Ollama | Python + SQLite + Ollama model | ~600 MB | ~700 MB | ~800 MB
qmd | Bun + SQLite + 3 GGUF models | ~2.5 GB | ~2.7 GB | ~3.0 GB

REFERENCE — ripgrep on 9.5M files = ~400 MB RSS (burntsushi/ripgrep#1823) · REFERENCE — Ollama nomic-embed-text ~500 MB resident

4. Search Latency vs Scale

Latency = end-to-end time from query initiation to result return. Claude Native and index1 figures are measured; qmd figures are projected from component benchmarks.

4.1 End-to-End Search Latency

Tool / Mode | Formula | 10K Docs | 50K Docs | 100K Docs
Claude Native (Grep) | ripgrep_scan | 30-50 ms | 150-250 ms | 300-500 ms
index1 + Ollama | BM25 + embed + vec + RRF | ~60 ms | ~90 ms | ~120 ms
index1 | BM25 + RRF | < 5 ms | ~8 ms | ~15 ms
index1 Cache Hit | L1 cache lookup | < 1 ms | < 1 ms | < 1 ms
qmd search (BM25) | FTS5 only | < 5 ms | ~8 ms | ~15 ms
qmd vsearch (Vector) | embed + vec | ~30 ms | ~65 ms | ~90 ms
qmd query (full pipeline) | expand + BM25 + vec + RRF + rerank | ~1,200 ms | ~1,250 ms | ~1,300 ms
qmd Cache Hit | llm_cache lookup | ~5 ms | ~5 ms | ~5 ms
100K Document Search Latency Comparison (bar chart; shorter is faster):
Grep ~400 ms · index1 + Ollama ~120 ms · index1 BM25 ~15 ms · index1 cache < 1 ms · qmd BM25 ~15 ms · qmd vector ~90 ms · qmd full ~1,300 ms · qmd cache ~5 ms

4.2 Latency Visualization (100K Scale)

Keyword Search @ 100K Documents

Claude Native (Grep): 300-500 ms
index1 + Ollama: ~120 ms
qmd query: ~1,300 ms

Latency Growth Trend (10K → 100K)

Claude Native: 10x growth
index1 + Ollama: 2x growth
qmd query: 1.08x growth
qmd latency is dominated by LLM inference (constant), minimal impact from scale growth.
Grep O(N) linear growth, index1 BM25 O(log N) nearly constant.
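A sketch of how these growth factors follow from each tool's cost structure. The constants are illustrative values fitted to the 10K and 100K columns of table 4.1, not separately measured stage timings:

def grep_ms(docs: int) -> float:
    # Pure O(N) scan: latency proportional to corpus size.
    return 40 * docs / 10_000

def index1_ollama_ms(docs: int) -> float:
    # ~53 ms constant (embedding call + RRF) plus an O(N) vector-scan term.
    return 53 + 6.7 * docs / 10_000

def qmd_query_ms(docs: int) -> float:
    # ~1.19 s of LLM expansion + reranking dominates; the scan term is minor.
    return 1_189 + 11 * docs / 10_000

for docs in (10_000, 100_000):
    print(f"{docs:,}: grep {grep_ms(docs):.0f} ms, "
          f"index1+Ollama {index1_ollama_ms(docs):.0f} ms, "
          f"qmd query {qmd_query_ms(docs):.0f} ms")
# 10,000: grep 40 ms, index1+Ollama 60 ms, qmd query 1200 ms
# 100,000: grep 400 ms (10x), index1+Ollama 120 ms (2x), qmd query ~1300 ms (~1.08x)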

index1 Measured Latency Breakdown (Current Scale)

Python Process Startup: ~80 ms
Ollama HTTP: ~50 ms
BM25 + Vec + RRF: ~85 ms
MEASURED — MCP persistent-process mode has zero startup overhead; an actual query takes ~85-120 ms
MEASURED — index1 Measured Data (17,725 chunks / 1,707 documents)
Query | Engine Time (Cold) | Engine Time (Hot) | CLI Total Time | Grep Comparison
"中日韩" (low-frequency, 46 lines) | 576 ms | 94 ms | 0.21 s | 0.43 s (Grep slower)
"embedding" (mid-frequency, 257 lines) | 119 ms | 85 ms | 0.20 s | 0.24 s (close)
"search" (high-frequency, 950 lines) | 100 ms | - | 0.22 s | 0.21 s (tied)
"config" (very high-frequency, 4,386 lines) | 119 ms | - | 0.24 s | 0.18 s (Grep faster)
"搜索是怎么工作的" (CJK semantic) | 91 ms | - | 0.20 s | Grep can't do semantic search
"向量搜索的实现原理" (CJK semantic) | 89 ms | - | 0.21 s | Grep can't do semantic search
"how does search work" (EN semantic) | 332 ms | - | 0.44 s | Grep can't do semantic search
"how to configure watch paths" | 228 ms | - | 0.34 s | Requires 2-3 Grep combinations

5. Token Consumption vs Scale (Context Window Impact)

Token consumption = tokens injected into the AI context window from search results. Grep returns all matching lines (O(N)), while index1/qmd return only the Top-K (constant). This is the most critical difference at large scale.

5.1 High-frequency Word "search": Query Token Consumption

Scale | Grep Matched Lines | Claude Native (Grep) | index1 (Top-5) | qmd (Top-10) | index1 Savings
Current (64 files) | 950 lines (M) | ~36,000 tokens (M) | ~460 tokens (M) | ~900 tokens | 98.7%
10K Documents | ~15,000 lines (P) | ~375,000 tokens | ~460 tokens | ~900 tokens | 99.88%
50K Documents | ~60,000 lines (P) | ~1,500,000 tokens | ~460 tokens | ~900 tokens | 99.97%
100K Documents | ~120,000 lines (P) | ~3,000,000 tokens | ~460 tokens | ~900 tokens | 99.98%

M = Measured, P = Projected from word-frequency density. Claude's context window is 200K tokens; searching "search" at 100K docs via Grep would require 15 full windows to fit the results.
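A sketch of the projection behind the P rows: Grep's cost is matched lines times an average tokens-per-line factor, while the Top-K output stays constant. The ~25 tok/line factor is what the projected rows imply (the measured 950-line sample came out higher, ~38 tok/line):

# Sketch of the Grep token projection used for the P rows above.
TOKENS_PER_LINE = 25   # implied by the projected rows (assumption)
INDEX1_TOKENS = 460    # Top-5 output, constant regardless of scale

def grep_tokens(matched_lines: int) -> int:
    return matched_lines * TOKENS_PER_LINE

for docs, lines in [(10_000, 15_000), (50_000, 60_000), (100_000, 120_000)]:
    grep = grep_tokens(lines)
    saving = 1 - INDEX1_TOKENS / grep
    print(f"{docs:,} docs: Grep ~{grep:,} tok, index1 saves {saving:.2%}")
# 10,000 docs: Grep ~375,000 tok, index1 saves 99.88%
# 50,000 docs: Grep ~1,500,000 tok, index1 saves 99.97%
# 100,000 docs: Grep ~3,000,000 tok, index1 saves 99.98%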

5.2 Token Consumption by Word Frequency (100K Document Scale)

Query | Frequency Type | Grep Matched Lines | Grep Tokens | index1 | qmd | index1 Savings
中日韩 (CJK) | Rare word | ~1,000 | ~25,000 | ~460 | ~900 | 98.2%
embedding | Mid-frequency | ~30,000 | ~750,000 | ~460 | ~900 | 99.94%
search | High-frequency | ~120,000 | ~3,000,000 | ~460 | ~900 | 99.98%
config | Very high-frequency | ~500,000 | ~12,500,000 | ~460 | ~900 | 99.99%
import | Extremely high-frequency | ~800,000 | ~20,000,000 | ~460 | ~900 | 99.99%

5.3 Session-level Token Impact (20 Searches / 200K Window)

Grep's consumption is orders of magnitude larger (~250K tok on a 10K project / ~2.4M tok on a 100K project) and not comparable — only index1 vs qmd are shown here.

10K / 50K / 100K Document Projects (identical at every scale):
index1 usage: ~9.2K tok (~5% of the 200K window) · qmd usage: ~18K tok (~9%)
index1/qmd's Top-K output always < 1K tokens, unaffected by scale.
index1 returns about half of qmd's output (460 tok vs 900 tok).

6. Search Quality Scores

Each tool is scored on seven dimensions — Exact Match, Semantic Search, CJK Queries, Result Ranking, Cross-lingual, Recall, and Precision (shown as radar charts in the original page).

Claude Native (Grep/Glob) — Best for: precise lookup of known identifiers
index1 + Ollama — Best for: exploratory questions + multilingual projects
qmd (Query + Rerank) — Best for: high-precision semantic search of English docs

7. Feature Matrix

Feature | Claude Native | index1 | qmd
Runtime | Rust (ripgrep) | Python 3.10+ | Bun (TypeScript)
Search Algorithm | Literal match + regex | BM25 + Vector + RRF | BM25 + Vector + QE + RRF + Rerank
BM25 Full-text Search | ripgrep (not BM25) | ✓ FTS5 | ✓ FTS5
Vector Semantic Search | ✗ | ✓ sqlite-vec | ✓ sqlite-vec
Embedding Engine | N/A | Ollama (local) | node-llama-cpp (local GGUF)
Default Embedding Model | N/A | nomic-embed-text 768d (default), bge-m3 1024d (CJK optimized) | embeddinggemma-300M (English only)
Model Ecosystem & Extensibility
Available Models | N/A | Ollama ecosystem (hundreds) | GGUF format only
Model Switching | N/A | One command: index1 config embedding_model xxx | Edit source code with a GGUF URI
Model Presets | N/A | 5 presets (lightweight / standard / chinese / multilingual / high_precision) | No presets, 3 hardcoded models
Chinese Model Switching | N/A | ollama pull bge-m3 (one-click switch) | Must find a GGUF Chinese model yourself
Hot Model Switching | N/A | ✓ Change config + index --force | ✗ Requires recompile/restart
Query Expansion | ✗ | ✗ | ✓ Fine-tuned 1.7B GGUF
LLM Reranking | ✗ | ✗ | ✓ qwen3-reranker 0.6B GGUF
RRF Fusion | ✗ | ✓ k=60 | ✓ k=60 + position-aware (see the RRF sketch after this table)
Query Cache | ✗ | ✓ L1/L2 (10 min TTL) | ✓ llm_cache (SQLite)
MCP Tools | 3 (Grep/Glob/Read) | 5 | 6
Chunking Strategy | None (full text) | 5 language-aware chunkers (md/py/rs/js/txt) | Markdown chunking (800 tok/chunk)
Supported File Types | All text files | .md .py .rs .js .ts .jsx .tsx .txt | .md (Markdown only)
CJK Optimization | ✗ | ✓ Dynamic weighting BM25=0.2 / Vec=0.8 | ✗
Web UI | ✗ | ✓ Flask (port 6888) | ✗
File Watching | ✗ | ✓ Real-time | ✓ watcher
Auto-diagnosis (doctor) | N/A | ✓ index1 doctor --fix | ✗
External Service Dependency | None | Ollama (optional) | None
Cross-platform | macOS / Linux / Windows | macOS / Linux / Windows | macOS / Linux (Bun)
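Both hybrid tools merge their BM25 and vector result lists with Reciprocal Rank Fusion at k=60 (row above). A minimal sketch of plain RRF; qmd's position-aware variant adds extra weighting that is not reproduced here:

from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

bm25_hits = ["search.py", "cli.py", "config.py"]     # illustrative lists
vector_hits = ["query.py", "search.py", "embed.py"]
print(rrf([bm25_hits, vector_hits])[:3])
# search.py ranks first: it appears near the top of both lists.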

8. Cross-platform Compatibility

MCP tools need to work across different AI editors. Below compares integration difficulty and compatibility across major AI Agents.

8.1 AI Editor Compatibility Matrix

AI Editor / Agent | Claude Native | index1 | qmd
Claude Code | ✓ Built-in | ✓ MCP stdio | ✓ MCP stdio
Cursor | ✓ Built-in | ✓ MCP stdio | ✓ MCP stdio
Windsurf | ✓ Built-in | ✓ MCP stdio | ✓ MCP stdio
Cline (VS Code) | ✓ Built-in | ✓ MCP stdio | ✓ MCP stdio
OpenClaw | ✓ Built-in | ✓ MCP stdio | ✓ MCP stdio
CLI / Scripts | ✗ No standalone CLI | index1 search | bun run search

8.2 Operating System Compatibility

Platform | index1 | qmd
macOS (ARM / Intel) | ✓ pip / pipx | ✓ bun
Linux (x64 / ARM) | ✓ pip / pipx | ✓ bun
Windows | ✓ pip / pipx | ⚠ Bun Windows support limited
Docker

9. Use Case Recommendations (When to Use What)

Claude Native (Grep/Glob)

Speed Champion — Best for exact lookup and small projects
Best For
• Exact lookup of known function/class names
• Small projects with < 10K files
• Need zero latency, zero configuration
• Advanced regex matching
Avoid When
• > 10K files + high-frequency words (token explosion)
• Semantic understanding / concept search
• CJK queries searching English code
• Need result ranking
10K→100K Latency Growth
10x (O(N) linear)

index1

Balanced Choice — Low latency + Low tokens + Multilingual
Best For
• Medium to large multilingual code projects
• Mixed CJK-English search (CJK optimized)
• Limited AI Agent token budget
• Need to search .py/.rs/.js/.md simultaneously
Avoid When
• Pure Markdown doc library (qmd is better)
• Pursuing ultimate precision (qmd has rerank)
• Memory < 8 GB
10K→100K Latency Growth
2x (O(log N) BM25 + O(N) vec)

qmd

Precision Champion — LLM Rerank achieves highest search quality
Best For
• Large English doc libraries (README, manuals)
• Need highest search precision
• Don't want to install/maintain Ollama
• Sufficient memory (≥ 16 GB)
Avoid When
• Code file search (only supports .md)
• CJK / multilingual queries
• Need low latency (< 500ms)
• Insufficient memory (< 16 GB)
10K→100K Latency Growth
1.08x (LLM inference constant dominates)

10. Overall Rating

Dimension | Claude Native | index1 | qmd
Search Speed (Small Scale) | ★★★★★ | ★★★★☆ | ★★☆☆☆
Search Speed (Large Scale) | ★★☆☆☆ | ★★★★☆ | ★★★☆☆
Search Precision | ★★★☆☆ | ★★★★☆ | ★★★★★
Token Efficiency | ★★☆☆☆ | ★★★★★ | ★★★★☆
Semantic Understanding | ★☆☆☆☆ | ★★★★☆ | ★★★★★
CJK / Multilingual | ★☆☆☆☆ | ★★★★☆ | ★★☆☆☆
Ease of Use / Zero Config | ★★★★★ | ★★★★☆ | ★★★☆☆
Resource Consumption | ★★★★★ | ★★★☆☆ | ★★☆☆☆
Code File Support | ★★★★★ | ★★★★☆ | ★★☆☆☆
Large-scale Scalability | ★★☆☆☆ | ★★★★☆ | ★★★★☆
Cross-platform Compatibility | ★★★★★ (macOS / Linux / Windows) | ★★★★★ (macOS / Linux / Windows) | ★★★☆☆ (macOS / Linux, Windows limited)
🤖 AI Agent Usability | ★★★★☆ | ★★★★★ | ★★☆☆☆
🏆 Total Score | 40 / 60 | 50 / 60 | 37 / 60

Conclusion: Three-Tool Strategy

No single tool wins across all dimensions. The recommended strategy is to choose based on project scale and query type:

< 1K files:Claude Native is sufficient, no extra tools needed.
1K-10K files:Claude Native (exact lookup) + index1 (semantic/CJK queries).
10K-100K files:index1 as primary (99%+ token savings), Grep only for known identifiers.
Large English doc libraries:qmd (highest precision) + Grep (quick lookup).
Multilingual code projects:index1 (the only one supportingCJK optimization+ 5 language chunkers).