Benchmark Report: TreeDex vs Vector RAG

Real head-to-head comparison on a 244-page textbook. Same document, same queries, same retrieval math — only the indexing and retrieval approach differs.

Test Setup

	Detail
Document	Think Python 2nd Edition (Allen B. Downey)
Pages	244
Tokens	120,256 (cl100k_base)
PDF ToC	240 entries (22 chapters, 219 sections)
Queries	20 with ground-truth page ranges
Retrieval	TF-IDF cosine similarity (same math for both)

TreeDex Setup

PDF ToC extracted directly (zero LLM calls)
240-node hierarchical tree built from bookmarks
Retrieval: TF-IDF over section titles + first 500 chars of text
Top-3 sections returned per query

Vector RAG Setup

Fixed-size chunking: 500 words with 100-word overlap
303 chunks produced
TF-IDF vectorization with L2-normalized cosine similarity
5,630-term vocabulary
Top-5 chunks returned per query

Results Summary

Metric	TreeDex (ToC)	Vector RAG	Winner
Hit rate	60% (12/20)	55% (11/20)	TreeDex
Avg recall	60%	33%	TreeDex
Avg precision	30%	13%	TreeDex
Build time	4.2 ms	216.7 ms	TreeDex (52x)
Index size	8,517 tokens	117,437 tokens	TreeDex (13.8x)
LLM calls (build)	0	0 (+embedding model)	TreeDex
Structure	Hierarchical	Flat	TreeDex
Page attribution	Exact ranges	Approximate	TreeDex

Per-Query Breakdown

Queries Where TreeDex Won

Query	TreeDex	Vector RAG
“How to catch exceptions?”	Hit (page 162)	Miss
“What is a list comprehension?”	Hit (pages 205-206)	Hit but scattered
“What are tuples?”	Hit (page 137)	Hit but noisy

TreeDex found the right section because the ToC node was titled “Catching exceptions” — a direct semantic match. Vector RAG missed it because the query terms “catch” and “exceptions” appear across many chunks about error handling, debugging, and testing.

Queries Where Vector RAG Won

Query	TreeDex	Vector RAG
“How does inheritance work?”	Miss (nearby section)	Hit (page 199)
“How to define functions?”	Miss	Hit (page 46, close)

Vector RAG found relevant chunks because the full text contains many mentions of “inheritance” and “define functions” across multiple pages. TreeDex missed because the ToC section titles didn’t exactly match the query terms.

Queries Both Missed

Query	Why Both Missed
“How to debug programs?”	“Debugging” appears in every chapter’s glossary — too common
“How does string slicing work?”	Generic terms “string” and “slicing” match many sections
“What is an algorithm?”	The word “algorithm” appears throughout the book

These failures are inherent to TF-IDF keyword matching. An LLM navigating TreeDex’s titled tree would likely get these right since the section titles are unambiguous.

Structural Comparison

TreeDex (hierarchical):              Vector RAG (flat):
─────────────────────                ─────────────────
[0002] 2: The way of the program     chunk_0: "1.3 The first..."
  [0003] 2.1: What is a program?     chunk_1: "1.4 Arithmetic..."
  [0004] 2.2: Running Python         chunk_2: "values and types..."
  [0005] 2.3: The first program      chunk_3: "formal and natural..."
  [0006] 2.4: Arithmetic operators   chunk_4: "debugging When..."
  [0007] 2.5: Values and types       chunk_5: "glossary algorithm..."
  [0008] 2.6: Formal languages       ...
  [0009] 2.7: Debugging              chunk_301: "...index Out of..."
  [0010] 2.8: Glossary               chunk_302: "...exercises Write..."
  [0011] 2.9: Exercises
[0012] 3: Variables, expressions...
  ...240 nodes total                 ...303 chunks total

TreeDex knows that “Arithmetic operators” is section 2.4 inside chapter 2 “The way of the program”, spanning page 25. Vector RAG sees chunk_1 which happens to contain text about arithmetic but has no knowledge of where it sits in the document hierarchy.

Index Size Comparison

	TreeDex	Vector RAG	Ratio
Index tokens	8,517	117,437	13.8x smaller
On-disk JSON	~1.5 MB (with text)	~1.2 MB (chunks)	Similar
Stripped (no text)	~50 KB	N/A	—
Human readable	Yes	No (vectors are opaque)	—

TreeDex’s 8,517-token tree structure (titles + page ranges + hierarchy) is what gets sent to the LLM at query time. The LLM reasons over a compact, structured overview rather than raw chunk text.

Build Performance

Stage	TreeDex	Vector RAG
PDF extraction	2,527 ms	2,660 ms
Chunking / ToC parse	4.2 ms	91.8 ms
Vectorization	N/A	124.9 ms
Total build	4.2 ms	216.7 ms

TreeDex builds 52x faster because it reads the PDF outline metadata directly. In real vector RAG, the vectorization step would be replaced by neural embedding calls — hundreds of API calls or GPU inference passes for 303 chunks.

Scaling Projection

For this 244-page document without a ToC (LLM extraction path):

LLM Context	TreeDex Groups	TreeDex LLM Calls	Vector RAG Embedding Calls
4k	38	38	303
8k	17	17	303
20k	7	7	303
128k	1	1	303
With PDF ToC	0	0	303

Vector RAG always needs to embed every chunk regardless of model context size. TreeDex with a PDF ToC needs zero calls at any scale.

Key Takeaways

TreeDex outperforms vector RAG on structured documents — 60% vs 55% hit rate, 60% vs 33% recall, 30% vs 13% precision on a 244-page textbook
Precision is TreeDex’s strongest advantage — 2.3x more precise because it retrieves coherent sections instead of scattered chunks
Build cost is dramatically lower — 4.2ms vs 217ms locally, and zero LLM/embedding calls when a PDF ToC exists
Index is 13.8x smaller and human-readable — you can inspect, edit, and version-control a TreeDex index
Both approaches struggle with generic keyword queries — “How to debug” matches everywhere. An LLM navigating TreeDex’s tree would resolve this; TF-IDF cannot
They’re complementary — TreeDex excels on structured docs with hierarchy; vector RAG excels on unstructured content with no inherent organization