Getting Started

Installation

Python

pip install treedex

With optional LLM providers:

pip install "treedex[gemini]"     # Google Gemini
pip install "treedex[openai]"     # OpenAI
pip install "treedex[claude]"     # Anthropic Claude
pip install "treedex[all]"        # All providers (quotes keep shells such as zsh from expanding the brackets)

Node.js

npm install treedex

Install your preferred LLM SDK:

npm install @google/generative-ai   # Gemini
npm install openai                    # OpenAI
npm install @anthropic-ai/sdk        # Claude

Quick Start

Python

from treedex import TreeDex, GeminiLLM

# 1. Create an LLM backend
llm = GeminiLLM(api_key="your-api-key")

# 2. Index a document
index = TreeDex.from_file("document.pdf", llm=llm)

# 3. See the tree structure
index.show_tree()

# 4. Query
result = index.query("What methods were used?")
print(result.pages_str)    # "pages 12-15"
print(result.reasoning)    # "Section 3 covers methodology"
print(result.context)      # Full text from those pages

# 5. Agentic mode — get a direct answer
result = index.query("What methods were used?", agentic=True)
print(result.answer)       # "The study used survey-based..."
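
The key in step 1 is hard-coded for brevity. In shared code it is usually safer to read it from the environment. A minimal sketch, assuming the key lives in a variable named `GEMINI_API_KEY`; both that name and the helper `read_api_key` are illustrative and not part of TreeDex:

```python
import os


def read_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored in env_var, or raise a clear error.

    The default variable name is an assumption made for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Usage with the Quick Start above (requires treedex to be installed):
# llm = GeminiLLM(api_key=read_api_key())
```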

Node.js

import { TreeDex, GeminiLLM } from "treedex";

// 1. Create an LLM backend
const llm = new GeminiLLM("your-api-key");

// 2. Index a document
const index = await TreeDex.fromFile("document.pdf", llm);

// 3. See the tree structure
index.showTree();

// 4. Query
const result = await index.query("What methods were used?");
console.log(result.pagesStr);    // "pages 12-15"
console.log(result.reasoning);   // "Section 3 covers methodology"
console.log(result.context);     // Full text

// 5. Agentic mode
const agenticResult = await index.query("What methods were used?", { agentic: true });
console.log(agenticResult.answer);   // "The study used survey-based..."

Save & Load

Build once, query many times:

Python

# Save the index to disk
index.save("my_index.json")

# Load it later (no re-indexing needed)
loaded = TreeDex.load("my_index.json", llm=llm)
result = loaded.query("question?")

Node.js

// Save the index to disk
await index.save("my_index.json");

// Load it later (no re-indexing needed)
const loaded = await TreeDex.load("my_index.json", llm);
const result = await loaded.query("question?");

The JSON index is cross-compatible: build it in Python and query it from Node.js, or vice versa.
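
Because the index is an ordinary JSON file, it can be inspected with any JSON tooling. The sketch below uses only the Python standard library and makes no assumption about the index schema, which this page does not describe; it simply reports the top-level shape of whatever file it is given. `describe_json` is a hypothetical helper name.

```python
import json
from pathlib import Path


def describe_json(path: str) -> str:
    """Summarise the top-level shape of a JSON file without assuming a schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return "object with keys: " + ", ".join(sorted(data))
    if isinstance(data, list):
        return f"array with {len(data)} items"
    return f"scalar of type {type(data).__name__}"


# Example: print(describe_json("my_index.json"))
```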

Supported Formats

| Format | Extension | Python dependency | Node.js dependency |
| --- | --- | --- | --- |
| PDF | `.pdf` | `pymupdf` (included) | `pdfjs-dist` (included) |
| Plain text | `.txt`, `.md` | None | None |
| HTML | `.html`, `.htm` | None | `htmlparser2` (optional) |
| DOCX | `.docx` | `python-docx` | `mammoth` |
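
A pre-flight check can catch unsupported files before any indexing work starts. The sketch below maps the extensions listed above to their format names. How TreeDex itself reacts to an unknown extension is not stated here, so `detect_format` and `SUPPORTED_EXTENSIONS` are hypothetical names for illustration only.

```python
from pathlib import Path

# Extensions taken from the Supported Formats table.
SUPPORTED_EXTENSIONS = {
    ".pdf": "PDF",
    ".txt": "Plain text",
    ".md": "Plain text",
    ".html": "HTML",
    ".htm": "HTML",
    ".docx": "DOCX",
}


def detect_format(path: str) -> str:
    """Return the format name for path, or raise ValueError if unsupported."""
    suffix = Path(path).suffix.lower()  # case-insensitive match on the extension
    try:
        return SUPPORTED_EXTENSIONS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
```

Calling `detect_format("document.pdf")` before `TreeDex.from_file` gives an early, readable error for files such as `.csv` that are not in the table.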

How It Works (Summary)

1. Load: Extract pages from your document
2. Detect: Check for a PDF table of contents, or detect headings by font size
3. Index: Build a tree structure (from the ToC directly, or via LLM extraction)
4. Query: The LLM navigates the tree to find relevant sections
5. Return: Get context text, source pages, and reasoning
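
The Query and Return steps can be pictured as a walk from the root of the tree down to a leaf. The sketch below is a toy model and not TreeDex's implementation: the `Node` class, the `navigate` function, and the sample tree are all made up for illustration, and a simple keyword-overlap function stands in for the LLM that would normally choose which branch to follow.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    title: str
    start_page: int
    end_page: int
    children: List["Node"] = field(default_factory=list)


def keyword_chooser(question: str, options: List[Node]) -> Node:
    """Stand-in for the LLM: pick the option sharing the most words with the question."""
    words = set(question.lower().replace("?", "").split())
    return max(options, key=lambda n: len(words & set(n.title.lower().split())))


def navigate(
    root: Node,
    question: str,
    choose: Callable[[str, List[Node]], Node] = keyword_chooser,
) -> Tuple[Tuple[int, int], str]:
    """Descend from the root to a leaf, recording the path taken."""
    node, path = root, [root.title]
    while node.children:
        node = choose(question, node.children)
        path.append(node.title)
    return (node.start_page, node.end_page), " > ".join(path)


# Hypothetical document tree with page ranges.
tree = Node("Paper", 1, 30, [
    Node("Introduction", 1, 4),
    Node("Methods", 5, 15, [
        Node("Survey design", 5, 9),
        Node("Statistical methods", 10, 15),
    ]),
    Node("Results", 16, 30),
])

pages, reasoning = navigate(tree, "What statistical methods were used?")
print(pages, reasoning)  # (10, 15) Paper > Methods > Statistical methods
```

The returned page range plays the role of the source pages, and the recorded path plays the role of the reasoning. Swapping `keyword_chooser` for a function that asks a model to pick a branch keeps the rest of the walk unchanged.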

For PDFs with an embedded table of contents (bookmarks), indexing needs no LLM calls: the tree is built directly from the bookmarks.
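
To see why no model is needed in that case, note that a bookmark list already encodes the hierarchy through its levels. The sketch below nests flat `(level, title, page)` entries into a tree. That tuple shape matches what PyMuPDF's `Document.get_toc()` returns, but whether TreeDex reads bookmarks exactly this way is an assumption, and `build_tree` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

TocEntry = Tuple[int, str, int]  # (level, title, 1-based page)


def build_tree(toc: List[TocEntry]) -> List[Dict]:
    """Nest flat ToC entries into a tree using their levels."""
    roots: List[Dict] = []
    stack: List[Tuple[int, Dict]] = []  # open ancestors as (level, node)
    for level, title, page in toc:
        node = {"title": title, "page": page, "children": []}
        # Close every open node at this level or deeper.
        while stack and stack[-1][0] >= level:
            stack.pop()
        if stack:
            stack[-1][1]["children"].append(node)
        else:
            roots.append(node)
        stack.append((level, node))
    return roots


# Hypothetical bookmark list for a short paper.
toc = [
    (1, "Introduction", 1),
    (1, "Methods", 5),
    (2, "Survey design", 5),
    (2, "Statistical methods", 10),
    (1, "Results", 16),
]
tree = build_tree(toc)
print([n["title"] for n in tree])  # ['Introduction', 'Methods', 'Results']
print([c["title"] for c in tree[1]["children"]])  # ['Survey design', 'Statistical methods']
```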
