Getting Started
Installation
Python
pip install treedex
With optional LLM providers:
pip install treedex[gemini] # Google Gemini
pip install treedex[openai] # OpenAI
pip install treedex[claude] # Anthropic Claude
pip install treedex[all] # All providers
Node.js
npm install treedex
Install your preferred LLM SDK:
npm install @google/generative-ai # Gemini
npm install openai # OpenAI
npm install @anthropic-ai/sdk # Claude
Quick Start
Python
from treedex import TreeDex, GeminiLLM
# 1. Create an LLM backend
llm = GeminiLLM(api_key="your-api-key")
# 2. Index a document
index = TreeDex.from_file("document.pdf", llm=llm)
# 3. See the tree structure
index.show_tree()
# 4. Query
result = index.query("What methods were used?")
print(result.pages_str) # "pages 12-15"
print(result.reasoning) # "Section 3 covers methodology"
print(result.context) # Full text from those pages
# 5. Agentic mode — get a direct answer
result = index.query("What methods were used?", agentic=True)
print(result.answer) # "The study used survey-based..."
Node.js
import { TreeDex, GeminiLLM } from "treedex";
// 1. Create an LLM backend
const llm = new GeminiLLM("your-api-key");
// 2. Index a document
const index = await TreeDex.fromFile("document.pdf", llm);
// 3. See the tree structure
index.showTree();
// 4. Query
const result = await index.query("What methods were used?");
console.log(result.pagesStr); // "pages 12-15"
console.log(result.reasoning); // "Section 3 covers methodology"
console.log(result.context); // Full text
// 5. Agentic mode
const answer = await index.query("What methods?", { agentic: true });
console.log(answer.answer);
Save & Load
Build once, query many times:
# Save
index.save("my_index.json")
# Load later (no re-indexing needed)
loaded = TreeDex.load("my_index.json", llm=llm)
result = loaded.query("question?")
await index.save("my_index.json");
const loaded = await TreeDex.load("my_index.json", llm);
const result = await loaded.query("question?");
The JSON index is cross-compatible — build in Python, query from Node.js, or vice versa.
Supported Formats
| Format | Extension | Python Deps | Node.js Deps |
|---|---|---|---|
.pdf | pymupdf (included) | pdfjs-dist (included) | |
| Plain text | .txt, .md | None | None |
| HTML | .html, .htm | None | htmlparser2 (optional) |
| DOCX | .docx | python-docx | mammoth |
How It Works (Summary)
- Load — Extract pages from your document
- Detect — Check for PDF table of contents or detect headings by font size
- Index — Build a tree structure (from ToC directly, or via LLM extraction)
- Query — LLM navigates the tree to find relevant sections
- Return — Get context text, source pages, and reasoning
For PDFs with a table of contents, zero LLM calls are needed for indexing — the tree is built directly from the bookmarks.
Next: Architecture →