tool · Nomic AI · watching

nomic-embed-text

A small, fast, open-weight text-embedding model. 137M parameters, 768-dimensional output, 8192-token context. Runs locally through Ollama — same shape as your existing chat model surface. Free, Apache 2.0-licensed weights. The default embedding choice for any GL corpus that fits on local hardware (i.e., all of them, today).

Updated May 24, 2026

nomic-embed-text is the embedding model that lets a GL RAG build stay entirely local with no quality trade-off you'd notice. It is already on the chat-bridge's ALLOWED_MODELS list, just not used for retrieval yet. This page is the orient-and-wire-it surface.

What it is

A general-purpose sentence-embedding model from Nomic AI. Open weights (Apache 2.0); 137M parameters; produces 768-dimensional vectors; supports up to 8192-token input, so long-document chunks fit cleanly without aggressive splitting. Released as nomic-embed-text-v1.5 (current), trained with Matryoshka Representation Learning: you can truncate the 768-dim vector down to 512 / 256 / 128 dims and trade quality for storage cleanly, as long as documents and queries are truncated to the same size.
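A minimal sketch of the truncation step in plain Python, assuming you already hold a full 768-dimensional vector from the model. The name truncate_embedding is illustrative and not part of any library. It only slices and re-normalizes; check the model card for any extra normalization it recommends before slicing.

```python
import math
from typing import Sequence


def truncate_embedding(vec: Sequence[float], dims: int) -> list[float]:
    """Keep the first `dims` components, then L2-normalize the shortened vector."""
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be in 1..{len(vec)}, got {dims}")
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero vector: nothing to normalize
    return [x / norm for x in head]
```

Apply the same dims value to documents at index time and to queries at search time; vectors of different lengths cannot be compared.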

On MTEB (the standard embedding leaderboard), Nomic's reported results put it slightly above OpenAI's text-embedding-3-small, and you run it on your laptop with no API call. The margin is small; treat the two as comparable on quality.

When to use it

Reach for it when:

The corpus fits on local hardware and the build doesn't have an API-budget mandate to "use OpenAI." For a markdown corpus the size of builddaily.io's, this is always.
You want zero embedding spend as a hard constraint.
The context window matters — chunk sizes up to ~3000 tokens fit cleanly into the model's 8192-token input without truncation.
You're already running Ollama for inference and don't want a separate embedding service.

Skip it when:

You need a dimension > 768 for some retrieval study. Look at bge-large or e5-mistral instead.
The corpus is multilingual-first and you need strong non-English performance — Cohere's embed-multilingual-v3 or bge-m3 lead here.
You need proprietary-stack support contracts for compliance reasons — open-weights models don't come with an SLA.

At a glance

Specs

Architecture — encoder-only transformer (BERT-family).
Parameters — 137M.
Output dimension — 768 (truncatable to 512 / 256 / 128 via Matryoshka).
Max input — 8192 tokens.
License — Apache 2.0 (commercial use OK).

Distribution

Ollama — ollama pull nomic-embed-text → exposed via /api/embeddings on the local Ollama server. This is the GL default path.
sentence-transformers — SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True) for direct Python use.
llama.cpp / GGUF — for non-Ollama local hosting; same weights.
Hugging Face Inference API — hosted, free tier, but defeats the "stay local" benefit.
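For the Ollama path, a standard-library-only sketch of the raw call. It assumes Ollama is listening on its default local address and that the response carries the vector under an "embedding" key; the helper names are illustrative.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def build_embedding_request(
    text: str, model: str = "nomic-embed-text"
) -> urllib.request.Request:
    """Build the POST request for Ollama's /api/embeddings endpoint."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(
    text: str, model: str = "nomic-embed-text", timeout: float = 30.0
) -> list[float]:
    """Send one text to the local Ollama server and return its embedding."""
    request = build_embedding_request(text, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["embedding"]
```

With the server running, embed("search_document: hello") should hand back a list of 768 floats.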

Prompt prefixes (important)

Nomic models expect a task prefix on each input:

search_document: <text> for documents you're indexing.
search_query: <text> for the user query you're matching against the index.
clustering: <text> for clustering use cases.
classification: <text> for classifier features.

Forgetting these silently drops retrieval quality by 5–15%. Most LlamaIndex integrations handle it automatically; raw API calls don't.
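When calling the model directly, a small guard makes the prefix hard to forget. This is a sketch only; with_task_prefix is a hypothetical helper, not something Nomic or Ollama ships.

```python
# The four task names listed above.
TASKS = ("search_document", "search_query", "clustering", "classification")


def with_task_prefix(text: str, task: str) -> str:
    """Return `text` with the '<task>: ' prefix, adding it only if it is missing."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    prefix = f"{task}: "
    if text.startswith(prefix):
        return text  # already prefixed for this task
    return prefix + text
```

Index-time code would call with_task_prefix(chunk, "search_document"); query-time code would call with_task_prefix(question, "search_query").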

How to integrate

Default integration for a GL retrieval build:

Pull the model. ollama pull nomic-embed-text on the host running Ollama. ~270MB.
Verify the endpoint. curl http://localhost:11434/api/embeddings -d '{"model":"nomic-embed-text","prompt":"search_document: hello"}' → should return a 768-element vector.
Wire to LlamaIndex. from llama_index.embeddings.ollama import OllamaEmbedding → embed_model = OllamaEmbedding(model_name="nomic-embed-text"). Confirm the integration applies the search_document: / search_query: prefixes. Behavior can differ between versions, so verify against a test query rather than assuming.
Set globally. Settings.embed_model = embed_model so every index / retrieval uses it without per-call config.
Persist the index. First-time embedding of a corpus is the only slow step; persist to disk (StorageContext.persist(...)) so subsequent runs are read-only.
Spot-check. Pull a few documents, embed both query and document with explicit prefixes, cosine-similarity them by hand. Sanity-checks the wiring before trusting retrieval scores.
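Steps 3 to 5 can be sketched as one function. This assumes the llama-index-core and llama-index-embeddings-ollama packages are installed and Ollama is on its default port. The persist directory name "storage" is a placeholder, and the "non-empty directory means a finished index" check is a simplification.

```python
from pathlib import Path


def build_or_load_index(content_dir: str = "web/content", persist_dir: str = "storage"):
    """Load a persisted index if one exists, otherwise embed the corpus and persist it."""
    # Imported lazily so this module can be imported without LlamaIndex installed.
    from llama_index.core import (
        Settings,
        SimpleDirectoryReader,
        StorageContext,
        VectorStoreIndex,
        load_index_from_storage,
    )
    from llama_index.embeddings.ollama import OllamaEmbedding

    # Set globally so every index and retriever uses the same embedder.
    Settings.embed_model = OllamaEmbedding(
        model_name="nomic-embed-text",
        base_url="http://localhost:11434",  # assumed default Ollama address
    )

    store = Path(persist_dir)
    if store.is_dir() and any(store.iterdir()):
        # Fast path: reuse the vectors written by an earlier run.
        storage = StorageContext.from_defaults(persist_dir=persist_dir)
        return load_index_from_storage(storage)

    # Slow path: read the corpus, embed it once, and write it to disk.
    docs = SimpleDirectoryReader(content_dir, recursive=True).load_data()
    index = VectorStoreIndex.from_documents(docs)
    index.storage_context.persist(persist_dir=persist_dir)
    return index
```

A first retrieval test could then be index.as_retriever(similarity_top_k=5).retrieve("your question"), with the hits checked by eye before trusting the scores.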

In the GL stack

builddaily.io

Chat-bridge retrieval upgrade — embedding choice. Replace the no-embed concat-all path with: chunked corpus → nomic-embed-text via Ollama → file-backed vector store. Zero new dependencies (model is already on the ALLOWED_MODELS list); zero API spend; same Ollama process the chat already uses.
Resources / Posts / Projects unified index. One persisted vector store covering all of web/content/; rebuilt nightly or on content PR merge.

paiddaily.io

Tickers-as-Resources index. 253 ticker pages embedded once; queried at chat time. The 8192-token context window absorbs the typical ticker page in a single chunk.
Catalyst archive index. Each Pendle catalyst as a Document with (market, date, outcome) metadata; query embedding fused with structured filters.
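One way to read "query embedding fused with structured filters" is filter first, rank second. The sketch below is plain Python with hypothetical names (CatalystDoc, search); a real build would more likely lean on the vector store's own metadata filters.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CatalystDoc:
    text: str
    embedding: list[float]
    market: str
    date: str  # ISO date string
    outcome: str


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    docs: Sequence[CatalystDoc],
    query_embedding: Sequence[float],
    *,
    market: Optional[str] = None,
    outcome: Optional[str] = None,
    top_k: int = 5,
) -> list[CatalystDoc]:
    """Apply the structured filters, then rank the survivors by cosine similarity."""
    candidates = [
        d
        for d in docs
        if (market is None or d.market == market)
        and (outcome is None or d.outcome == outcome)
    ]
    ranked = sorted(
        candidates, key=lambda d: cosine(query_embedding, d.embedding), reverse=True
    )
    return ranked[:top_k]
```

Filtering before ranking keeps the similarity pass small and guarantees that every hit satisfies the metadata constraints.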

sagedaily.io

Astrology / tarot canon index. Vedic dasha entries, transit interpretations, tarot symbolism — currently inline in module prompts. Embed once, retrieve per reading.
User reading history index (paired with Neo4j). Neo4j answers when; this index answers what's similar.

Gotchas

Forgetting the task prefix tanks retrieval quality. Document gets search_document:, query gets search_query:. Easy to miss; silent failure mode.
Matryoshka truncation needs L2 re-normalization afterward. If you truncate from 768 to 256 dims, normalize the truncated vector, not the full one. Most libraries handle this; verify in a sanity test.
The Ollama embeddings endpoint is rate-limited differently than chat. High-throughput indexing should batch through the Python ollama package rather than the chat-bridge proxy.
Don't mix prefix-aware and prefix-naive corpora in one index. Re-index from scratch when changing the prefix convention.
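For the indexing pass, a sketch of batching with one prefix convention applied throughout. The embed_batch argument is an injected callable, for example a thin wrapper around the Python ollama package's embedding call. Its exact shape is an assumption here, and both function names are illustrative.

```python
from typing import Callable, Iterator, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_corpus(
    texts: Sequence[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 32,
) -> list[list[float]]:
    """Embed every text as a document, one batch at a time, preserving order."""
    vectors: list[list[float]] = []
    for batch in batched(texts, batch_size):
        # Every indexed chunk gets the document prefix, never the query prefix.
        prefixed = [f"search_document: {t}" for t in batch]
        result = embed_batch(prefixed)
        if len(result) != len(batch):
            raise RuntimeError("embedder returned a different number of vectors")
        vectors.extend(result)
    return vectors
```

Keeping the prefix inside embed_corpus means the whole index shares one convention, which is what makes a later re-index from scratch unnecessary.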

Risks

Single-vendor research-shop output. Nomic AI is a small company. Apache-licensed weights mean you keep them even if Nomic disappears, but a model upgrade path depends on them continuing to ship.
Quality ceiling vs frontier embedders. text-embedding-3-small and bge-large both edge it out on some benchmarks. For a markdown corpus where retrieval quality dominates (e.g., scientific literature), worth A/B before locking in.

Alternatives · 5 substitutes
Pick nomic-embed-text unless one of these wins on your specific brief.

01
OpenAI text-embedding-3-small
Hosted embedding, 1536-dim, $0.02 per 1M tokens.
Wins when ▸the project is already on the OpenAI API surface and adding another local model isn't worth it. Edges nomic on some benchmarks; you pay per call.
02
bge-large-en-v1.5 · BAAI
Open-weight 1024-dim embedder; a consistently strong MTEB performer.
Wins when ▸retrieval quality is the bottleneck and a few extra points of MTEB matter more than smaller vectors. ~335M params, slower; uses more storage per vector. MIT-licensed, so similarly permissive; same local-deploy story.
03
mxbai-embed-large · Mixedbread AI
Open-weight 1024-dim, Matryoshka-trained, available on Ollama.
Wins when ▸you want the Ollama deploy story but want a beefier embedder than nomic. Trades model size for a marginal quality edge. Worth A/B if retrieval quality is bottlenecking the build.
04
Cohere embed-v3
Hosted, multilingual-strong; native input-type-aware API.
Wins when ▸multilingual retrieval is core to the build. English-only? nomic is competitive at zero spend. Cohere's strongest argument is the rerank API alongside the embed API.
05
e5-mistral-7b-instruct
Instruction-tuned 7B-parameter embedder; near the top of MTEB for English at release.
Wins when ▸you have GPU headroom and retrieval quality is the bottleneck. 50× the parameters of nomic; serious infra ask. For a corpus the size of builddaily.io it's overkill.

LlamaIndex — the framework that hosts this model in the retrieval pipeline. OllamaEmbedding wires the two together.
bge-reranker — the natural companion downstream. Embeddings retrieve top-k=20; reranker collapses to top-n=5 before answer synthesis.
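The retrieve-then-rerank shape, sketched with the reranker left abstract. rerank_score stands in for whatever cross-encoder you plug in (bge-reranker or otherwise); the function name and signature are hypothetical.

```python
from typing import Callable, Sequence


def retrieve_then_rerank(
    query: str,
    candidates: Sequence[tuple[str, float]],
    rerank_score: Callable[[str, str], float],
    top_k: int = 20,
    top_n: int = 5,
) -> list[str]:
    """Shortlist by embedding similarity, then reorder the shortlist with a reranker.

    `candidates` holds (text, embedding_similarity) pairs.
    """
    # Stage 1: cheap cut using the similarity scores from the embedding index.
    shortlist = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]
    # Stage 2: expensive scoring, run only on the shortlist.
    rescored = sorted(
        shortlist, key=lambda c: rerank_score(query, c[0]), reverse=True
    )
    return [text for text, _ in rescored[:top_n]]
```

The reranker only ever sees top_k texts, so its cost stays fixed as the corpus grows.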