Build Daily

Tinley Park · May 29, 2026
tool · Nomic AI · watching

nomic-embed-text

A small, fast, open-weight text-embedding model: 137M parameters, 768-dimensional output, 8192-token context. Runs locally through Ollama, on the same server and API surface as your existing chat models. Free, Apache 2.0-licensed weights. The default embedding choice for any GL corpus that fits on local hardware (i.e., all of them, today).

Updated May 24, 2026

nomic-embed-text is the embedding model that lets a GL RAG build stay entirely local with no quality trade-off you'd notice. It is already on the chat-bridge's ALLOWED_MODELS list, just not used for retrieval yet. This page is the orient-and-wire-it surface.

What it is

A general-purpose sentence-embedding model from Nomic AI. Open weights (Apache 2.0); 137M parameters; produces 768-dimensional vectors; supports up to 8192-token input — long-document chunks fit cleanly without aggressive splitting. Released as nomic-embed-text-v1.5 (current) with Matryoshka-Representation-Learning: you can truncate the 768-dim vector down to 512 / 256 / 128 dims at retrieval time and trade quality for storage cleanly.

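On MTEB (the standard embedding benchmark), Nomic's published results put it roughly level with OpenAI's text-embedding-3-small, slightly ahead on the overall average, and you run it on your laptop with no API call. Retrieval-specific scores vary by dataset, so check the current leaderboard before leaning on the comparison.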

When to use it

Reach for it when:

  • The corpus fits on local hardware and the build doesn't have an API-budget mandate to "use OpenAI." For a markdown corpus the size of builddaily.io's, this is always.
  • You want zero embedding spend as a hard constraint.
  • The context window matters — chunk sizes up to ~3000 tokens fit cleanly into the model's 8192-token input without truncation.
  • You're already running Ollama for inference and don't want a separate embedding service.

Skip it when:

  • You need a dimension > 768 for some retrieval study. Look at bge-large or e5-mistral instead.
  • The corpus is multilingual-first and you need strong non-English performance — Cohere's embed-multilingual-v3 or bge-m3 lead here.
  • You need proprietary-stack support contracts for compliance reasons — open-weights models don't come with an SLA.

At a glance

Specs

  • Architecture — encoder-only transformer (BERT-family).
  • Parameters — 137M.
  • Output dimension — 768 (truncatable to 512 / 256 / 128 via Matryoshka).
  • Max input — 8192 tokens.
  • License — Apache 2.0 (commercial use OK).

Distribution

  • Ollama: ollama pull nomic-embed-text → exposed via /api/embeddings on the local Ollama server. This is the GL default path.
  • sentence-transformers: SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True) for direct Python use.
  • llama.cpp / GGUF — for non-Ollama local hosting; same weights.
  • Hugging Face Inference API — hosted, free tier, but defeats the "stay local" benefit.

Prompt prefixes (important)

Nomic models expect a task prefix on each input:

  • search_document: <text> for documents you're indexing.
  • search_query: <text> for the user query you're matching against the index.
  • clustering: <text> for clustering use cases.
  • classification: <text> for classifier features.

Forgetting these silently drops retrieval quality by 5–15%. Most LlamaIndex integrations handle it automatically; raw API calls don't.
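For raw API calls, one way to make the prefixes hard to forget is to route every string through a small helper before it reaches the embedder. The sketch below is illustrative only: the function names (with_prefix, prepare_documents, prepare_query) are hypothetical and not part of any Nomic or Ollama library; only the four prefix strings come from the list above.

```python
# Task prefixes from the list above. Helper names are hypothetical.
TASK_PREFIXES = ("search_document", "search_query", "clustering", "classification")


def with_prefix(task: str, text: str) -> str:
    """Return text with '<task>: ' prepended, without double-prefixing."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task prefix: {task!r}")
    prefix = f"{task}: "
    # If the caller already added this exact prefix, leave the text alone.
    if text.startswith(prefix):
        return text
    return prefix + text


def prepare_documents(chunks: list[str]) -> list[str]:
    """Prefix every chunk that is about to be indexed."""
    return [with_prefix("search_document", chunk) for chunk in chunks]


def prepare_query(query: str) -> str:
    """Prefix a user query that will be matched against the index."""
    return with_prefix("search_query", query)
```

Calling prepare_documents at indexing time and prepare_query at query time keeps the two sides of the index on the same convention.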

How to integrate

Default integration for a GL retrieval build:

  1. Pull the model. ollama pull nomic-embed-text on the host running Ollama. ~270MB.
  2. Verify the endpoint. curl http://localhost:11434/api/embeddings -d '{"model":"nomic-embed-text","prompt":"search_document: hello"}' → should return a 768-element vector.
  3. Wire to LlamaIndex. from llama_index.embeddings.ollama import OllamaEmbedding, then embed_model = OllamaEmbedding(model_name="nomic-embed-text"). Confirm the integration applies the search_document: / search_query: prefixes: it should as of recent versions, but verify by comparing its output against a manually prefixed call.
  4. Set globally. Settings.embed_model = embed_model so every index / retrieval uses it without per-call config.
  5. Persist the index. First-time embedding of a corpus is the only slow step; persist to disk (StorageContext.persist(...)) so subsequent runs are read-only.
  6. Spot-check. Pull a few documents, embed both query and document with explicit prefixes, cosine-similarity them by hand. Sanity-checks the wiring before trusting retrieval scores.
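Steps 2 and 6 can be sketched with the standard library alone. This is a minimal sketch, assuming the Ollama server is reachable at the localhost URL from step 2 and that the response carries the vector under an "embedding" key, as the /api/embeddings endpoint does at the time of writing. The function names and the sample sentences are illustrative, not part of any GL code.

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # endpoint from step 2
MODEL = "nomic-embed-text"


def build_payload(text: str, model: str = MODEL) -> bytes:
    """JSON body in the same shape as the curl call in step 2."""
    return json.dumps({"model": model, "prompt": text}).encode("utf-8")


def embed(text: str, url: str = OLLAMA_URL) -> list[float]:
    """POST one already-prefixed string to Ollama and return its vector."""
    request = urllib.request.Request(
        url,
        data=build_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["embedding"]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / norm


def spot_check() -> None:
    """Needs a running Ollama server with the model pulled (step 1)."""
    doc = embed("search_document: nomic-embed-text produces 768-dimensional vectors.")
    on_topic = embed("search_query: how many dimensions does the embedding have?")
    off_topic = embed("search_query: best pizza near the office")
    if len(doc) != 768:
        raise RuntimeError(f"unexpected embedding dimension: {len(doc)}")
    print("on-topic :", round(cosine(doc, on_topic), 3))
    print("off-topic:", round(cosine(doc, off_topic), 3))
```

With Ollama running, calling spot_check() prints two similarity scores; the on-topic query should score clearly higher than the off-topic one. If it does not, check the prefixes first.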

In the GL stack

builddaily.io

  • Chat-bridge retrieval upgrade — embedding choice. Replace the no-embed concat-all path with: chunked corpus → nomic-embed-text via Ollama → file-backed vector store. Zero new dependencies (model is already on the ALLOWED_MODELS list); zero API spend; same Ollama process the chat already uses.
  • Resources / Posts / Projects unified index. One persisted vector store covering all of web/content/; rebuilt nightly or on content PR merge.

paiddaily.io

  • Tickers-as-Resources index. 253 ticker pages embedded once; queried at chat time. The 8192-token context window absorbs the typical ticker page in a single chunk.
  • Catalyst archive index. Each Pendle catalyst as a Document with (market, date, outcome) metadata; query embedding fused with structured filters.

sagedaily.io

  • Astrology / tarot canon index. Vedic dasha entries, transit interpretations, tarot symbolism — currently inline in module prompts. Embed once, retrieve per reading.
  • User reading history index (paired with Neo4j). Neo4j answers when; this index answers what's similar.

Gotchas

  • Forgetting the task prefix tanks retrieval quality. Document gets search_document:, query gets search_query:. Easy to miss; silent failure mode.
  • Matryoshka truncation needs re-normalization. If you truncate from 768 to 256 dims, L2-normalize the truncated vector, not the full one. Most libraries handle this; verify in a sanity test.
  • The Ollama embeddings endpoint is rate-limited differently than chat. High-throughput indexing should batch through the Python ollama package rather than the chat-bridge proxy.
  • Don't mix prefix-aware and prefix-naive corpora in one index. Re-index from scratch when changing the prefix convention.
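The truncation gotcha above can be sketched in plain Python. The order used here (layer-normalize the full vector, slice, then L2-normalize the slice) follows the recipe on the nomic-embed-text-v1.5 model card as best I recall it; treat the layer-norm step and its eps value as assumptions to verify against that card. The function names are illustrative.

```python
import math


def layer_norm(vec: list[float], eps: float = 1e-5) -> list[float]:
    """Zero-mean, unit-variance scaling over the whole vector (no learned weights)."""
    mean = sum(vec) / len(vec)
    variance = sum((x - mean) ** 2 for x in vec) / len(vec)
    scale = math.sqrt(variance + eps)
    return [(x - mean) / scale for x in vec]


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def truncate_embedding(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components and re-normalize the shortened vector."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be between 1 and {len(vec)}")
    # Normalizing AFTER the slice is the point: a slice of a unit vector
    # is generally not unit length any more.
    return l2_normalize(layer_norm(vec)[:dim])
```

Apply the same target dimension to documents and queries; vectors truncated to different sizes cannot be compared.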

Risks

  • Single-vendor research-shop output. Nomic AI is a small company. Apache-licensed weights mean you keep them even if Nomic disappears, but a model upgrade path depends on them continuing to ship.
  • Quality ceiling vs frontier embedders. text-embedding-3-small and bge-large both edge it out on some benchmarks. For a corpus where retrieval quality dominates (e.g., scientific literature), it is worth an A/B test before locking in.

Related

  • LlamaIndex — the framework that hosts this model in the retrieval pipeline. OllamaEmbedding wires the two together.
  • bge-reranker — the natural companion downstream. Embeddings retrieve top-k=20; reranker collapses to top-n=5 before answer synthesis.
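
The retrieve-then-rerank shape in the last bullet can be sketched as two sorts. This is a toy under stated assumptions: the corpus is an in-memory list of (text, vector) pairs, and overlap_score is a deliberately crude stand-in for a real reranker such as bge-reranker, not its API. All names here are hypothetical.

```python
import math
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def overlap_score(query: str, doc: str) -> float:
    """Stand-in reranker: counts shared lowercase words. NOT bge-reranker."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def retrieve_then_rerank(
    query_text: str,
    query_vec: list[float],
    corpus: list[tuple[str, list[float]]],
    rerank_score: Callable[[str, str], float] = overlap_score,
    top_k: int = 20,
    top_n: int = 5,
) -> list[str]:
    """Stage 1: top_k by embedding similarity. Stage 2: top_n by reranker score."""
    candidates = sorted(
        corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True
    )[:top_k]
    reranked = sorted(
        candidates, key=lambda item: rerank_score(query_text, item[0]), reverse=True
    )
    return [text for text, _ in reranked[:top_n]]
```

The cheap embedding pass narrows the corpus to top_k candidates so that the more expensive reranker only scores a handful of documents per query.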