Benchmarks

All benchmarks were measured on research-paper.pdf (21 pages, 11,710 tokens, 41 ToC entries), running on Node.js and timed with performance.now().
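For readers who want to reproduce timings like these, here is a minimal sketch of a performance.now() harness. The helper name `timeMs`, the warm-up count, and the use of a median are assumptions for illustration; the source does not say how individual runs were aggregated. It assumes Node.js 16 or later, where `performance` is available as a global.

```typescript
// Hypothetical micro-benchmark helper, not the project's actual harness.
function timeMs(fn: () => void, warmupRuns = 5, measuredRuns = 20): number {
  // Warm-up runs give the JIT a chance to compile the hot path first.
  for (let i = 0; i < warmupRuns; i++) fn();

  const samples: number[] = [];
  for (let i = 0; i < measuredRuns; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }

  // Return the median so one GC pause does not dominate the result.
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

const elapsedMs = timeMs(() => {
  JSON.stringify({ toc: [1, 2, 3] });
});
console.log(`median: ${elapsedMs.toFixed(3)} ms`);
```

Sub-millisecond measurements are noisy, so single-run numbers for very small inputs should be read with some caution.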

Core Operations

Takeaway: Heading detection adds ~125 ms and only 2.7% more tokens, a worthwhile tradeoff for significantly better hierarchy accuracy.

How the document splits into page groups at different LLM context window sizes (max_tokens):

max_tokens	Groups	Avg tokens/group	Time
4,000	4	3,434	0.6 ms
8,000	2	6,279	0.1 ms
20,000	1	11,958	0.07 ms
128,000	1	11,958	0.05 ms
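The splitting step can be pictured as greedy packing of pages into groups under a token budget. The sketch below is an assumption about the general approach, not the project's code: `PageInfo` and `groupPages` are hypothetical names, and it omits any page overlap or prompt overhead, so its group counts will not necessarily match the table above.

```typescript
interface PageInfo {
  pageNumber: number;
  tokens: number;
}

// Greedy packing: keep adding pages until the next one would exceed maxTokens.
function groupPages(pages: PageInfo[], maxTokens: number): PageInfo[][] {
  const groups: PageInfo[][] = [];
  let current: PageInfo[] = [];
  let currentTokens = 0;

  for (const page of pages) {
    // A page larger than the budget still gets a group of its own.
    if (current.length > 0 && currentTokens + page.tokens > maxTokens) {
      groups.push(current);
      current = [];
      currentTokens = 0;
    }
    current.push(page);
    currentTokens += page.tokens;
  }
  if (current.length > 0) groups.push(current);
  return groups;
}

const pages: PageInfo[] = [1, 2, 3, 4, 5].map((n) => ({ pageNumber: n, tokens: 1500 }));
const groups = groupPages(pages, 4000);
// Three groups: pages 1-2, pages 3-4, and page 5.
console.log(groups.map((g) => g.map((p) => p.pageNumber)));
```

Because the loop visits each page once, grouping cost depends on page count rather than on max_tokens, which fits the very small timings in the table.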

For a simulated 1M-token document (500 pages):

LLM Context	Groups	LLM Calls	Pages/Group
4k	499	499	~2
8k	167	167	~3
20k (default)	56	56	~9
128k	8	8	~63
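The group counts above are consistent with a simple sliding-window model: about 2,000 tokens per page (1M tokens over 500 pages), a window of floor(max_tokens / 2,000) pages, and a one-page overlap between consecutive groups. The overlap and the function name `estimateGroups` are assumptions inferred from the numbers, not something the source states, so treat this as a back-of-the-envelope check rather than the actual grouping logic.

```typescript
// Hypothetical estimate: how many groups a sliding window of pages produces.
function estimateGroups(
  totalPages: number,
  tokensPerPage: number,
  maxTokens: number,
  overlapPages = 1, // assumed overlap between consecutive groups
): number {
  const windowPages = Math.floor(maxTokens / tokensPerPage);
  const stepPages = windowPages - overlapPages;
  if (stepPages <= 0) {
    throw new Error("window must be larger than the overlap");
  }
  return Math.ceil((totalPages - overlapPages) / stepPages);
}

for (const maxTokens of [4000, 8000, 20000, 128000]) {
  console.log(maxTokens, estimateGroups(500, 2000, maxTokens));
}
// Prints 499, 167, 56, and 8 groups respectively.
```

Under this model the number of LLM calls equals the number of groups, as in the table.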

Time for listToTree() + assignPageRanges() + assignNodeIds():

Sections	Build Time	Tree Nodes	Roots
10	0.6 ms	12	2
50	0.3 ms	50	10
200	0.5 ms	200	40
500	1.3 ms	500	100

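Tree building is sub-millisecond for documents of up to 200 sections and scales roughly linearly beyond that. The 10-section row is slower than the 50-section row, which probably reflects first-call warm-up rather than real cost.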
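Near-linear scaling is what one would expect from a stack-based conversion of a flat heading list into a tree. The following is a minimal sketch of that technique under assumed types; `TocSection`, `TocNode`, and `buildTree` are hypothetical names, and the real listToTree() may work differently.

```typescript
interface TocSection {
  title: string;
  level: number; // 1 = top-level chapter
}

interface TocNode extends TocSection {
  children: TocNode[];
}

// Each node is pushed and popped at most once, so the pass is O(n).
function buildTree(sections: TocSection[]): TocNode[] {
  const roots: TocNode[] = [];
  const stack: TocNode[] = []; // chain of open ancestors

  for (const section of sections) {
    const node: TocNode = { ...section, children: [] };
    // Close every open heading that is at the same depth or deeper.
    while (stack.length > 0 && stack[stack.length - 1].level >= node.level) {
      stack.pop();
    }
    if (stack.length === 0) {
      roots.push(node);
    } else {
      stack[stack.length - 1].children.push(node);
    }
    stack.push(node);
  }
  return roots;
}

const roots = buildTree([
  { title: "Introduction", level: 1 },
  { title: "Background", level: 2 },
  { title: "Method", level: 1 },
  { title: "Setup", level: 2 },
  { title: "Details", level: 3 },
]);
console.log(roots.map((r) => r.title)); // the two level-1 headings
```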

Time to detect and insert synthetic parent nodes:

Orphan Count	Time	Input → Output	Synthetic Parents
5	0.2 ms	10 → 20	10
20	0.5 ms	25 → 65	40
50	1.5 ms	55 → 155	100
100	3.0 ms	105 → 305	200
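One way to think about orphan handling: a heading is an orphan when its level jumps more than one step past the previous heading, and the missing intermediate levels are filled with placeholder parents. The sketch below illustrates that idea only; `TocEntry`, `insertSyntheticParents`, and the placeholder titles are hypothetical, and the project's detection rule may differ.

```typescript
interface TocEntry {
  title: string;
  level: number;
  synthetic?: boolean; // true for placeholder parents added by the sketch
}

function insertSyntheticParents(sections: TocEntry[]): TocEntry[] {
  const out: TocEntry[] = [];
  let prevLevel = 0; // level 0 stands for the virtual document root

  for (const section of sections) {
    // Fill every skipped level between the previous heading and this one.
    for (let level = prevLevel + 1; level < section.level; level++) {
      out.push({ title: `(untitled level ${level})`, level, synthetic: true });
    }
    out.push(section);
    prevLevel = section.level;
  }
  return out;
}

// A level-3 heading with no ancestors gains a level-1 and a level-2 parent.
const fixed = insertSyntheticParents([{ title: "Deep heading", level: 3 }]);
console.log(fixed.map((s) => `${s.level}: ${s.title}`));
```

In the table each orphan yields two synthetic parents, which is the same ratio this sketch produces for a level-3 heading with no ancestors.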

tocToSections() performance:

Token savings when using capped vs full context in continuation prompts:

Document	Sections	Old (full JSON)	New (capped)	Savings
100 pages	195	9,750 tok	4,800 tok	50.8%
300 pages	586	117,200 tok	19,200 tok	83.6%
500 pages	976	317,200 tok	31,200 tok	90.2%

The capped context sends the top-level chapters, the last 30 sections, and metadata.
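A capped context along these lines could be assembled as in the sketch below. The types, the function name `buildCappedContext`, and the choice of `totalSections` as the metadata field are assumptions for illustration; the source does not specify what the metadata contains.

```typescript
interface IndexedSection {
  title: string;
  level: number; // 1 = top-level chapter
  page: number;
}

interface CappedContext {
  totalSections: number; // stand-in for the unspecified metadata
  chapters: IndexedSection[];
  recent: IndexedSection[];
}

function buildCappedContext(sections: IndexedSection[], tailSize = 30): CappedContext {
  // Guard tailSize = 0, because slice(-0) would return the whole array.
  const recent = tailSize > 0 ? sections.slice(-tailSize) : [];
  const recentSet = new Set(recent);
  // Skip chapters already present in the tail so nothing is sent twice.
  const chapters = sections.filter((s) => s.level === 1 && !recentSet.has(s));
  return { totalSections: sections.length, chapters, recent };
}

// 100 sections, with every tenth one a top-level chapter.
const allSections: IndexedSection[] = Array.from({ length: 100 }, (_, i) => ({
  title: `Section ${i + 1}`,
  level: i % 10 === 0 ? 1 : 2,
  page: i + 1,
}));
const capped = buildCappedContext(allSections);
console.log(capped.chapters.length, capped.recent.length);
```

The payload grows with the number of chapters plus a fixed tail rather than with the full section count, which is why the savings increase with document length.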

Full indexing pipeline for the 21-page research paper: