sdk · Run LLama (LlamaIndex Inc.) · watching

LlamaIndex

The document-layer framework for production RAG. Readers, node parsers, indices, query engines, and node postprocessors: the messy 80% of "RAG over a corpus" turned into composable primitives. The framework is MIT licensed and free; the paid product is LlamaCloud (managed parsing). Status: watching, not applied. The natural next build is the builddaily.io chat-bridge upgrade.

Updated May 24, 2026

LlamaIndex is the framework you reach for when "RAG over a corpus" is the actual job and you've already burned a week stitching loaders, chunkers, and retrievers together yourself. This page is the orient-and-pick-the-right-primitive surface — official docs at docs.llamaindex.ai own the API contracts.

Not yet in GL production. This page is the map before the build — the builddaily.io chat-bridge upgrade is the natural first slice.

What it is

A Python (and TypeScript) framework for building LLM applications over private data. Maintained by LlamaIndex Inc., the company formed around the open-source project (formerly GPT Index). MIT licensed; install with pip install llama-index. Three concept tiers:

Loaders — read raw sources (filesystem, S3, Notion, PDFs, web, databases) into Document objects.
Indices — turn Documents into queryable structures via node parsing (chunking), embedding, and storage. VectorStoreIndex, SummaryIndex, KeywordTableIndex, etc.
Query Engines — combine an index with a retrieval strategy, a node postprocessor pipeline (reranking, deduping, citation), and a response synthesizer (the prompt that turns retrieved nodes into an answer).

The pitch: each of the messy parts of RAG — loaders for 100+ source types, semantic / sentence-window / hierarchical chunking, hybrid retrieval (BM25 + vector), reranking through Cohere or local cross-encoders, citation back to source — is a class you swap in, not a week of plumbing.

When to use it

Reach for it when:

The corpus is documents — PDFs, markdown, web pages, knowledge bases — and the user query lands somewhere in them.
You need hybrid retrieval (BM25 + vector) or reranking out of the box. Rolling these from primitives is real work.
You need citation back to source as a product requirement. The node abstraction carries source metadata through the pipeline cleanly.
You're integrating many source types (Notion, S3, Slack, Postgres, web). The loader catalog is worth the framework footprint alone.
The roadmap includes swapping vector stores or rerankers — the abstractions buy you that flexibility cheaply.
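The citation point above is easier to see in miniature. The sketch below is plain Python, not LlamaIndex code: the Node class, split_into_nodes, and format_citations are hypothetical names that only illustrate how stamping source metadata on each chunk at parse time lets a citation fall out at answer time.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A chunk of text plus the metadata that says where it came from."""
    text: str
    metadata: dict = field(default_factory=dict)


def split_into_nodes(doc_text: str, source: str) -> list[Node]:
    """Split on blank lines and stamp every chunk with its source and position."""
    paragraphs = [p.strip() for p in doc_text.split("\n\n") if p.strip()]
    return [
        Node(text=p, metadata={"source": source, "paragraph": i})
        for i, p in enumerate(paragraphs)
    ]


def format_citations(nodes: list[Node]) -> list[str]:
    """Render one citation string per retrieved node."""
    return [f'{n.metadata["source"]}#p{n.metadata["paragraph"]}' for n in nodes]


nodes = split_into_nodes("Intro text.\n\nPricing details.\n\nFAQ.", source="guide.md")
# Pretend retrieval returned the last two chunks.
citations = format_citations(nodes[1:])
```

Because the metadata travels with the chunk, any retriever or reranker in between can reorder or drop nodes without losing the link back to the source.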

Skip it when:

The data is structured (SQL, on-chain reads, REST APIs) — query it directly; RAG is the wrong shape.
The corpus is one document — a single PDF Q&A is pdfplumber + a prompt, not a framework.
The workload is agent orchestration with no document layer — that's DSPy territory, not LlamaIndex.
You need to ship in an hour and the corpus is small enough to fit in the model's context window. Stuffing the context is fine until it isn't.
A LangChain codebase already exists and works — mixing both is a dependency tax without a clear win.
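A quick way to test the "small enough to fit in the context window" condition is a rough token budget. This is a sketch under stated assumptions: the four-characters-per-token ratio is a rule of thumb for English text, not a tokenizer, and the 2,000-token reserve and 8,192-token window are illustrative numbers.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; assumes ~4 characters per token (heuristic)."""
    return int(len(text) / chars_per_token) + 1


def fits_in_context(corpus: list[str], context_window: int, reserve: int = 2000) -> bool:
    """True if the whole corpus plus a reserve for question and answer fits."""
    total = sum(estimate_tokens(doc) for doc in corpus)
    return total + reserve <= context_window


small_corpus = ["a" * 4000, "b" * 8000]  # roughly 3,000 tokens
large_corpus = ["c" * 400_000]           # roughly 100,000 tokens

small_ok = fits_in_context(small_corpus, context_window=8192)
large_ok = fits_in_context(large_corpus, context_window=8192)
```

If the check passes with room to spare and the corpus is not growing, stuffing the context is the cheaper option; once it fails, retrieval starts to pay for itself.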

At a glance

Core primitives

SimpleDirectoryReader — walks a directory, infers loader per file type. The "first ten minutes" entry point.
Node parsers — SentenceSplitter, SemanticSplitterNodeParser, HierarchicalNodeParser. Chunking is where most production RAG wins or loses.
Vector stores — pluggable backend. Defaults to in-process SimpleVectorStore; plugs into PGVectorStore, ChromaVectorStore, QdrantVectorStore, WeaviateVectorStore, PineconeVectorStore, and ~30 more.
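Retrievers: VectorIndexRetriever, BM25Retriever, QueryFusionRetriever (fuses several retrievers for hybrid search). Tunable top-k, metadata filters, and the BM25/vector balance, set through per-retriever weights in the fusion retriever or through alpha on vector stores with native hybrid search.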
Node postprocessors — rerankers (CohereRerank, SentenceTransformerRerank), recency / similarity filters, source-citation enrichers.
Response synthesizers — Refine, CompactAndRefine, TreeSummarize. The default answer prompt is configurable; this is the seam where DSPy belongs.
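Since chunking carries so much of the outcome, it helps to see what chunk size and overlap do mechanically. The sketch below is a simplified stand-in, not the library's algorithm: SentenceSplitter counts tokens and tries to respect sentence boundaries, while this hypothetical chunk_words helper just slides a window over whitespace-separated words.

```python
def chunk_words(text: str, chunk_size: int = 512, chunk_overlap: int = 64) -> list[str]:
    """Sliding-window chunking over words; each chunk repeats the tail of the last."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(str(i) for i in range(10))
chunks = chunk_words(sample, chunk_size=4, chunk_overlap=1)
```

With a window of 4 and an overlap of 1, each chunk starts on the last word of the previous one, so a fact that straddles a boundary still appears whole in at least one chunk.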

Distribution

Framework — open-source, MIT, pip install llama-index. This is what every GL build would use.
LlamaCloud — paid SaaS for managed parsing (especially PDFs / tables / complex layouts) and managed indices. Free tier exists; not a default for GL.
LlamaParse — the parser product spun out as a standalone service. Useful for messy PDFs; can be replaced by self-hosted alternatives (pdfplumber, Unstructured) for simpler corpora.

How to integrate

Default integration order for a GL document-heavy build:

Load. SimpleDirectoryReader(input_dir, recursive=True).load_data() over the corpus. Sanity-check the document count and a sample's text content before chunking.
Parse into nodes. Start with SentenceSplitter(chunk_size=512, chunk_overlap=64). Promote to SemanticSplitterNodeParser if eval shows retrieval grouping is fighting the chunk boundaries.
Embed + store. Set the embed model explicitly via Settings.embed_model (the framework falls back to OpenAI embeddings when none is configured); keep vectors in SimpleVectorStore (in-memory, persisted to disk) for v1 of a small corpus. Graduate to pgvector when the corpus or filter requirements outgrow it.
Retrieve. Hybrid retrieval as the default — QueryFusionRetriever fusing BM25 + vector with reciprocal rank fusion. Pure-vector is fine for v0 but tends to miss exact-match queries.
Postprocess. Rerank the top-k=20 retrieved nodes down to top-n=5. Use Cohere if budget allows, a local cross-encoder otherwise.
Synthesize. Replace the default response synthesizer with a DSPy Module — pass nodes + question in, get structured answer + citations out. This is the seam where the framework comparison stops mattering.
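The reciprocal rank fusion named in the retrieve step can be sketched in plain Python. This is not the QueryFusionRetriever implementation, only the underlying scoring idea: each document earns 1 / (k + rank) from every ranking it appears in. The function name, the document ids, and the commonly used constant k=60 are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of ids into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["doc_a", "doc_b", "doc_c"]    # hypothetical keyword results
vector_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical embedding results

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
top_n = fused[:2]  # the cut a reranker stage would then refine
```

Documents that both retrievers agree on rise to the top, which is why fusion tends to rescue exact-match queries that pure vector search ranks poorly.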

The first version of a GL deployment should stay file-backed and local-embed for as long as possible — no new vendor, no new oncall lane. Promote backends only when the eval set says retrieval quality demands it.
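A minimal sketch of that file-backed, local-embed first version, covering the load, parse, embed, and persist steps. Assumptions: llama-index 0.10 or later import paths, the llama-index-embeddings-huggingface integration package for local embeddings, and an illustrative model name. The function name build_local_index and the ./storage directory are hypothetical.

```python
def build_local_index(input_dir: str, persist_dir: str = "./storage"):
    """Load, chunk, embed locally, and persist a file-backed vector index."""
    # Imports live inside the function so this module still imports on a
    # machine without llama-index installed. Assumed packages:
    # llama-index-core and llama-index-embeddings-huggingface.
    from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
    from llama_index.core.node_parser import SentenceSplitter
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding

    # Local embedding model; the model name is an illustrative assumption.
    Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

    # Load: sanity-check the document count before chunking.
    documents = SimpleDirectoryReader(input_dir=input_dir, recursive=True).load_data()
    print(f"loaded {len(documents)} documents")

    # Parse into nodes with the starting chunk settings from the list above.
    splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
    nodes = splitter.get_nodes_from_documents(documents)

    # Embed + store: in-memory vector store, persisted to disk.
    index = VectorStoreIndex(nodes)
    index.storage_context.persist(persist_dir=persist_dir)
    return index
```

Calling build_local_index("path/to/corpus") would write the index files under ./storage with no external vendor involved; retrieval, reranking, and synthesis then layer on top of the returned index.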

In the GL stack

Concrete places LlamaIndex slots into the three active GL products. None are shipped today; the chat-bridge upgrade is the highest-leverage first slice.

builddaily.io

Chat-bridge retrieval upgrade (first slice). Replace the _load_knowledge() concat-everything path with SimpleDirectoryReader → SemanticSplitterNodeParser → in-process vector store → hybrid retriever → DSPy answer Module. Same Ollama backend. Expected effect: per-call context drops by roughly an order of magnitude, latency falls with it, and source citations become exact instead of voice-of-the-corpus.
Resources index semantic search. Index the web/content/resources/ tree. A visitor asking "what do you use for X" lands on the right resource page with the relevant snippet pre-highlighted.
Projects + posts unified retrieval. Index posts, projects, and resources together; the chat can answer "what's the read on Pendle?" with citations across all three surfaces, not just markdown blobs.

paiddaily.io

Tickers-as-Resources RAG. The 253 prerendered ticker pages are a corpus. Index them; let users ask "which tickers have the strongest pre-earnings setup right now" with semantic retrieval + reranking over the page text.
Catalyst archive search. Past Pendle catalysts indexed as Documents with (market, date, outcome) metadata. "Show me precedent for this market shape" queries fall out of the same retriever.
Operator playbook lookup. Aerodrome voting, IL handling, pool-selection playbooks. Index the markdown; in-app assistant retrieves the relevant chunk inline when a user is on the matching page.

sagedaily.io

Astrology + tarot canon RAG. Vedic dasha meanings, transit interpretations, and tarot symbolism are currently baked into module prompts as static text. Externalize the canon as an indexed corpus; each reading retrieves the relevant passages per card + transit, and the DSPy response synthesizer grounds in them. Removes thousands of tokens from every reading's system prompt.
User reading history retrieval. Semantic index over a user's prior readings — surface recurring archetypes, themes, intentions. Pairs with Neo4j for the temporal layer (Neo4j answers when, LlamaIndex answers what's similar).
Public sage-daily archive search. Index the published readings + posts so the public surface can answer "what did sage say about Pluto in Aquarius last month."

Gotchas

The default response synthesizer is a demo, not production. Brittle on edge-case queries, hard to constrain output format. Plan to swap it for a DSPy Module — that's where the seam lives.
Chunking is the silent failure. Bad chunk boundaries hurt retrieval more than embedding model choice. Inspect chunks on a sample of real queries before scaling.
LlamaIndex ≠ LangChain. They overlap at the loaders/embeddings layer and diverge elsewhere. Mixing them adds dependency surface for marginal gain — pick one document-layer framework.
LlamaParse is a hosted service. If a build uses it, the corpus leaves your machine. Default to local parsers (pdfplumber, Unstructured) unless the corpus genuinely needs LlamaParse's table/layout fidelity.
API churn. llama-index 0.10 split the single package into llama-index-core plus separately installed integration packages, and moved core imports under llama_index.core. Pin versions; budget a re-pin per upgrade.
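For the chunking gotcha above, a small report over a sample of chunks makes bad boundaries visible before scaling. This is a heuristic sketch, not a LlamaIndex utility: inspect_chunks is a hypothetical helper, and "starts with a lowercase letter" or "does not end in closing punctuation" are only rough signals of a mid-sentence cut.

```python
import statistics


def inspect_chunks(chunks: list[str]) -> dict:
    """Summarize chunk lengths and flag chunks that look cut mid-sentence."""
    stripped = [c.strip() for c in chunks]
    lengths = [len(c.split()) for c in stripped]
    starts_mid = [i for i, c in enumerate(stripped) if c and c[0].islower()]
    ends_mid = [i for i, c in enumerate(stripped) if c and c[-1] not in ".!?\"')"]
    return {
        "count": len(chunks),
        "median_words": statistics.median(lengths) if lengths else 0,
        "starts_mid_sentence": starts_mid,
        "ends_mid_sentence": ends_mid,
    }


sample_chunks = [
    "Pendle splits yield into two tokens.",
    "and the second token trades at a discount",
    "Fixed yield comes from holding to maturity.",
]
report = inspect_chunks(sample_chunks)
```

Here the second chunk is flagged on both counts, which is the kind of boundary worth reading by hand against real queries.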

Risks

Vendor concentration on the paid surface. LlamaCloud / LlamaParse are single-vendor. Builds that lean on them inherit a dependency. The framework itself is fine — MIT, broad community.
The "easy demo, hard production" gap. Hello-world RAG in LlamaIndex is ten lines; production RAG (chunking, hybrid retrieval, reranking, citation, prompt compilation) is still real engineering. The framework removes plumbing, not judgment.
Lock-in is shallow but spreads. Loaders and node parsers are easy to port; query engine + node postprocessor pipelines accrete framework-specific configuration. Worth keeping the application-level interface (input: query → output: answer + citations) framework-agnostic so the inside can be swapped.
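One way to keep that application-level interface framework-agnostic is a small protocol that every backend implements. All names here (Answer, AnswerEngine, StubEngine, handle_chat) are hypothetical; a LlamaIndex-backed class would sit behind the same answer method.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Answer:
    """The only shape application code sees: answer text plus citations."""
    text: str
    citations: tuple[str, ...]


class AnswerEngine(Protocol):
    """Any backend qualifies if it maps a query to an Answer."""

    def answer(self, query: str) -> Answer: ...


class StubEngine:
    """Stand-in backend; a framework-backed class would expose the same method."""

    def answer(self, query: str) -> Answer:
        return Answer(text=f"No index yet for: {query}", citations=())


def handle_chat(engine: AnswerEngine, query: str) -> dict:
    """Application code depends only on the AnswerEngine protocol."""
    result = engine.answer(query)
    return {"answer": result.text, "sources": list(result.citations)}


response = handle_chat(StubEngine(), "what do you use for X")
```

Swapping the retrieval stack then means writing one new class that satisfies AnswerEngine, with no change to handle_chat or its callers.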

Alternatives

6 substitutes. Pick LlamaIndex unless one of these wins on your specific brief.

01 · LangChain
Broader chain and agent framework with a document layer.
Wins when: the codebase is already on LangChain end-to-end. Document-layer ergonomics are weaker than LlamaIndex's, but the integration surface is wider.
02 · Haystack (deepset)
Document-oriented NLP framework with a pipeline-as-DAG mental model.
Wins when: the team has prior Haystack experience or wants the DAG-style pipeline. Strong on classical NLP + RAG, with a smaller community than LlamaIndex.
03 · Unstructured.io
Parsing and loading layer only, not a full RAG framework.
Wins when: the hard problem is parsing (messy PDFs, tables, scanned docs) rather than the rest of the pipeline. Pair it with a hand-rolled retriever.
04 · Vectara
Managed RAG-as-a-service: hosted chunking, embedding, retrieval, citation.
Wins when: you want RAG-as-a-service and will trade framework control for ops velocity. Sensible for teams without an ML-infra lane.
05 · Hand-rolled
pgvector + tiktoken + Cohere SDK + a custom retriever.
Wins when: the corpus is tiny, the query shape is narrow, or the team wants zero framework dependency. The from-scratch path: more work, less abstraction tax.
06 · Stuff the context
Concatenate the entire corpus into the system prompt, with no retrieval at all.
Wins when: the corpus fits in the model's window with room to spare and will stay that size. The current builddaily.io chat-bridge default: fine for v0, hits a ceiling at scale.

DSPy — the brain-layer counterpart. Lives inside a LlamaIndex query engine as the response synthesizer once the workload is document-heavy.