Farel Ganlaky

From Keywords to Meaning: A Complete Guide to Text Search

A practical map for engineers who want to build text search, from exact matching to semantic, hybrid, and conversational systems.

Figure: Search map. One problem, different tools (panel 1 of 7: Literal match).

Query: shoes for rainy days, tokenized as shoes, rainy, days.

  • Trail shoes (literal token found): Lightweight trail shoes with storm-ready traction.
  • Waterproof sneakers (good product, wrong words): Sealed mesh upper, dry feet, grippy sole.
  • Water-resistant trainers (good product, wrong words): Built for wet pavement and everyday walking.

Literal match: Exact keyword search checks whether the words appear as written. It is great for IDs and known items, but it misses products when the catalog uses different words.

I am writing this article for the engineer I was nine months ago, as a harness for everything I have been learning and building in production. I want it to be the ultimate resource for anyone who wants to build a platform around text search and does not know where to start or which step to take.

Let's say you want to build a search engine for an ecommerce platform, just for fun. A catalog, a search bar, product pages, filters, rankings. Then, on top of that, you want a chatbot so the user can simply talk and receive what they want.

A user comes in and looks for shoes for rainy days. In the catalog, you have a lot of shoes, but the names do not exactly match that. They say things like waterproof sneakers, water-resistant trainers, and trail shoes.

That tiny mismatch is the whole problem of search.

Figure: The mismatch. Same need, different words.

  • What you mean: shoes for rainy days
  • What the catalog says: waterproof sneakers, water-resistant trainers, trail shoes
  • Search's job: translate the words you use into the words the system has, without losing the intent in between.

What do you do? Where do you start? How do you ensure the user gets exactly what they are looking for?

This is exactly what we are going to figure out in this article. A good search system does not merely find text. It helps a person move from what they typed to what they meant.

This seems easy until you start listing everything that can go wrong. The user may misspell a brand. The product may use a synonym. A query may contain an exact product code, a vague desire, a price constraint, and a taste preference at the same time. The best result may not contain the query words. The top lexical match may be irrelevant. The most semantically similar result may ignore a crucial constraint. The user may need an answer, not a list.

0. Before Search: What Are We Building?

Figure: Platform shape. Two interfaces, one retrieval core.

  • Surfaces: search bar, chat interface, product pages
  • Shared search layer: parse intent, retrieve evidence, rank candidates
  • User value: ranked results, guided answers, product comparisons

The interface changes, but both paths need the same reliable retrieval foundation.

Before writing code, define the shape of the problem.

We are using ecommerce as the example, but the techniques apply to almost every kind of search: documentation search, support search, marketplace search, code search, enterprise search, legal search, and even RAG systems that need to retrieve the right context before generating an answer.

In ecommerce, relevance is not only about text. A good result may need to be relevant, in stock, available in the user's size, deliverable to their country, within budget, well reviewed, and visually close to what they imagined.

We also want a chatbot on top. That changes the interface, but it does not remove search. The chatbot still needs a retrieval system underneath it. If retrieval is weak, the chatbot becomes a fluent narrator of bad results.

So the platform has two main interfaces:

  • Search bar: the user types a query and expects ranked results.
  • Chat interface: the user describes a need and expects guidance, comparison, recommendations, or a direct answer.

Under both surfaces is the same core question: given this user intent, what evidence in the catalog should we retrieve, rank, and show?

Deep dive: One retrieval core behind two interfaces

  • Search bar: shoes for rain
  • Chat: find me rain shoes
  • Both pass through the same stages: intent, retrieval, ranking

That is the thread we will follow.

1. Exact Matching: The Smallest Search Engine

Figure: Exact matching. The first search engine is just a loop.

  • Query: shoes, rainy, days
  • Scan every product: contains shoes? contains rainy? contains days?
  • Returned: products with shoes, exact SKU matches, missed synonyms

Exact matching is precise for codes and names, but it misses intent when vocabulary changes.

The first thing we can do is embarrassingly simple.

Take the query. Scan every product. Return the products whose text contains the query string.

If the user searches Nike, return products that contain Nike. If the user searches invoice-8392 in an internal tool, return the document that contains invoice-8392. If they search a product SKU, return the matching product.

In our case, the user searches shoes for rainy days. A naive loop might split the query into words and return every product containing shoes. It may also look for rainy and days. If those words do not appear, the product is ignored.

This is exact keyword search. It is not fashionable, but it is still essential.

Exact search is what you want for IDs, SKUs, invoice numbers, usernames, error codes, filenames, quoted phrases, legal references, product model numbers, and anything where interpretation would be a bug.

If a developer searches for ERR_AUTH_4017, they do not want a conceptually similar authentication article. They want the exact page, log line, issue, or source file containing that code. If a shopper searches for a specific model number, the system should not get creative. It should find the model.

So exact matching is not bad. It is narrow. It answers one question: does this text contain that text?
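To make the loop concrete, here is a minimal sketch in Python. The `Product` class, the `exact_scan` function, and the four-item catalog are all hypothetical names invented for illustration, and the match is case-insensitive by assumption.

```python
from dataclasses import dataclass


@dataclass
class Product:
    product_id: str
    text: str


def exact_scan(query: str, catalog: list[Product]) -> list[Product]:
    """Return every product whose text contains the query string."""
    needle = query.lower()  # assumption: matching ignores case
    return [p for p in catalog if needle in p.text.lower()]


# Hypothetical toy catalog.
catalog = [
    Product("p1", "Waterproof sneakers"),
    Product("p2", "City shoes"),
    Product("p3", "Trail shoes"),
    Product("p4", "Rain jacket"),
]

print([p.product_id for p in exact_scan("shoes", catalog)])
print([p.product_id for p in exact_scan("shoes for rainy days", catalog)])
```

The first call finds only the two products that literally say shoes. The second finds nothing, because no product contains the full phrase. Both the cost (one pass over every product per query) and the recall problem are visible in a dozen lines.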

Deep dive: Exact matching literally asks: does the string appear?

The same check, product.text.includes("shoes"), runs against every product:

  • waterproof sneakers
  • city shoes
  • trail shoes
  • rain jacket

The problem is that our ecommerce user does not speak like the catalog.

They ask for shoes for rainy days, but the catalog says waterproof sneakers, water-resistant trainers, and trail shoes. Exact matching may return products containing shoes, but miss products that would actually be perfect.

That is our first limitation: exact matching is precise, but it has poor recall when language changes.

Precision asks: of the results you returned, how many are actually relevant? Recall asks: of all the relevant items in the catalog, how many did you return? Exact search can be very precise. It often fails at recall.

Now we need the next layer.

2. The Inverted Index: Stop Reading Every Product

Figure: Indexing. The inverted index stops reading the whole catalog.

  • Terms: shoes, waterproof, trail
  • Posting lists: shoes -> p1, p8; waterproof -> p2, p8; trail -> p5, p8
  • Fast lookup: intersect lists, merge candidates, skip full scan

Search becomes a prepared memory of where words appear.

A loop is easy to understand, but it does not scale.

If your catalog has 100 products, scanning all of them is fine. If it has 10 million products, scanning every title, description, review, and attribute for every query is not a search engine. It is a slow script.

The classic solution is the inverted index.

Instead of storing only products and scanning them one by one, the system builds a map from terms to the products that contain those terms.

A tiny index might look like this:

  • shoes: product1, product3, product8, product12
  • sneakers: product2, product4, product9
  • waterproof: product2, product8
  • trail: product5, product8, product11
  • nike: product3, product9

Now when the user searches, you do not read the whole catalog. You look up posting lists. A posting list is the list of documents or products where a term appears.

This is the first real architectural jump. Search becomes an indexing problem.
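A minimal sketch of that idea, assuming whitespace tokenization and an in-memory dictionary. The function names and the toy catalog are hypothetical; a real engine stores posting lists in compressed, sorted form on disk.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of product IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all_terms(index: dict[str, set[str]], query: str) -> set[str]:
    """Return products that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    postings.sort(key=len)  # start from the shortest posting list
    return set.intersection(*postings)


# Hypothetical toy catalog.
docs = {
    "p2": "waterproof sneakers",
    "p3": "nike city shoes",
    "p5": "trail running shoes",
    "p8": "waterproof trail shoes",
}
index = build_index(docs)
print(search_all_terms(index, "waterproof shoes"))
print(search_all_terms(index, "trail"))
```

The query never touches product text. It looks up two posting lists and intersects them, which is why lookup cost depends on the length of those lists rather than on the size of the catalog.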

Deep dive: An inverted index flips products into term lookups

  • shoes: p1, p3, p8
  • waterproof: p2, p8
  • trail: p5, p8
  • nike: p3, p9

Real inverted indexes store more than product IDs. They may store how many times a term appears, where it appears, which field it appears in, and the positions of words. That information later helps ranking, phrase search, highlighting, and debugging.

For ecommerce, field information matters a lot. A match in a product title should usually count more than a match buried in a long description. A match in a brand field may count differently from a match in a review.

At this stage, you can build something fast. But fast is not enough. The system still needs to decide what counts as a searchable term.

3. Analysis: Turning Product Text Into Searchable Terms

Figure: Analysis. Raw product text becomes searchable terms.

  • Raw title: Nike Pegasus Trail, GORE-TEX, Waterproof Shoes
  • Analyzer pipeline: character filters, tokenizer, token filters
  • Terms: nike, pegasus, trail, waterproof

If analysis loses the signal, ranking cannot recover it later.

Before text can enter an index, it must be analyzed.

Analysis is the pipeline that turns raw text into searchable terms. This is where a lot of search quality is won or lost.

Deep dive: Analysis decides what the engine is allowed to find

  • Raw title: GORE-TEX Shoes
  • Tokenizer: GORE | TEX | Shoes
  • Lowercase: gore | tex | shoes
  • Synonyms: waterproof | shoes
  • Index terms: waterproof shoes

A typical analyzer has three parts:

  1. Character filters clean the raw text before tokenization. They can remove HTML, normalize punctuation, or handle special characters.
  2. Tokenizers split text into tokens. They decide where words begin and end.
  3. Token filters transform tokens. They can lowercase, remove stop words, stem words, fold accents, apply synonyms, or create n-grams.

Take a product title like:

Nike Pegasus Trail 5 GORE-TEX Men's Waterproof Running Shoes

An analyzer may turn it into terms like nike, pegasus, trail, 5, gore, tex, men, waterproof, running, shoes.
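The three stages can be sketched as three small functions. This is a toy analyzer under stated assumptions: the regular expressions, the stop-word list, and the possessive stripping are hypothetical choices made for this example, not the behavior of any particular engine.

```python
import re

# Hypothetical stop-word list; a real one depends on the domain.
STOP_WORDS = {"a", "an", "and", "for", "of", "the"}


def char_filter(text: str) -> str:
    """Clean raw text: drop HTML tags, normalize curly apostrophes."""
    text = re.sub(r"<[^>]+>", " ", text)
    return text.replace("\u2019", "'")


def tokenize(text: str) -> list[str]:
    """Split on anything that is not a letter or digit, keeping possessives."""
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", text)


def token_filters(tokens: list[str]) -> list[str]:
    """Lowercase, strip possessive 's, and remove stop words."""
    terms = []
    for token in tokens:
        token = token.lower()
        if token.endswith("'s"):
            token = token[:-2]
        if token and token not in STOP_WORDS:
            terms.append(token)
    return terms


def analyze(text: str) -> list[str]:
    return token_filters(tokenize(char_filter(text)))


print(analyze("Nike Pegasus Trail 5 GORE-TEX Men's Waterproof Running Shoes"))
```

Notice that this tokenizer splits GORE-TEX into gore and tex. That is a design decision hiding inside one regular expression, which is exactly why the questions below matter.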

But every decision matters.

  • If you split GORE-TEX into gore and tex, will users searching goretex still find it?
  • If you lowercase everything, will brand and model matching still behave correctly?
  • If you remove stop words, will phrases like to be or not to be still work in other domains?
  • If you stem running to run, will runner and running become useful matches or noisy ones?
  • If you treat sneakers and shoes as unrelated, will the rainy-day query miss good products?

There is no perfect analyzer. There is only an analyzer that fits the domain.

For ecommerce, you usually care about product names, brands, categories, attributes, sizes, colors, materials, and user language. Your analyzer must respect exact identifiers while still being flexible with normal words.

This is the second lesson: search quality starts before ranking. If analysis loses the meaning you need, the ranker cannot recover it later.

4. Ranking: Matching Is Not Enough

Figure: Ranking. BM25 turns matches into an ordered list.

  • Candidates: p8: waterproof shoes; p2: trail sneaker; p4: shoe care
  • Scoring signals: rare terms, field boosts, length normalization
  • Top results: p8 score 9.8; p2 score 7.1; p4 score 3.2

Matching finds candidates. Ranking decides what deserves attention first.

Now the system can find candidates quickly. But it still has to order them.

If 500 products contain shoes, which one should appear first?

The simplest ranking is count-based: products with more query terms rank higher. If a product contains shoes and rainy, it beats a product containing only shoes.

That is a start, but it is not enough.

A word that appears in every product description is not very informative. A rare word is more useful. A match in the title is more important than a match in a review. A short product title matching the query may be more relevant than a huge description that happens to contain the same word once.

This is where TF-IDF and BM25 enter.

Deep dive: BM25 balances rarity, repetition, and document length

  • Rare term: waterproof
  • Term frequency: 2 mentions
  • Length normalization: short title

TF-IDF combines two intuitions: terms that appear often in a document may matter, and terms that are rare across the collection matter more than common terms.

BM25 is one of the most durable ranking functions built from that family of ideas. You do not need the formula to understand the instinct:

  • Rare query terms should matter more than common query terms.
  • Repeating a term helps, but only up to a point.
  • Document length matters because matching a word in a short title is different from matching it somewhere in a giant text.

BM25 is strong because it is boring in the best way. It is fast, explainable, and hard to beat as a lexical baseline.
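If you do want to see the instinct as code, here is a compact sketch of a common BM25 variant. The parameter defaults `k1=1.2` and `b=0.75` and the `log(1 + ...)` form of IDF are assumptions borrowed from widely used implementations; the documents are pre-tokenized toy data.

```python
import math
from collections import Counter


def bm25_scores(query_terms: list[str], docs: dict[str, list[str]],
                k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    """Score every document against the query with a BM25 variant."""
    n_docs = len(docs)
    avg_len = sum(len(terms) for terms in docs.values()) / n_docs
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(t for terms in docs.values() for t in set(terms))
    scores = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Rare terms get a larger IDF weight.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates, and long documents are discounted.
            length_norm = 1 - b + b * len(terms) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores[doc_id] = score
    return scores


# Hypothetical, already-analyzed product texts.
docs = {
    "p1": ["city", "shoes"],
    "p2": ["trail", "sneakers"],
    "p4": ["shoes", "care", "kit", "for", "leather", "shoes"],
    "p8": ["waterproof", "trail", "shoes"],
}
scores = bm25_scores(["waterproof", "shoes"], docs)
for doc_id in sorted(scores, key=scores.get, reverse=True):
    print(doc_id, round(scores[doc_id], 3))
```

In this toy collection shoes appears in three of four documents, so it carries little weight, while waterproof appears once and dominates the score. The product that never mentions either term scores zero, which is the limitation the next paragraph describes.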

For our query, BM25 can help rank products that contain shoes, waterproof, rainy, trail, or related indexed terms. But if the product never uses the same words as the query, BM25 cannot invent the connection by itself.

So we need to teach the system more language.

5. Query Forgiveness: Phrase, Proximity, Typos, and Synonyms

Figure: Query forgiveness. Close words, typos, and domain language all help recall.

  • Messy query: rainy shoes, nik shooes, wet streets
  • Forgiveness layer: phrase proximity, edit distance, domain synonyms
  • Expanded candidates: waterproof sneakers, Nike shoes, water-resistant trainers

Forgiveness should recover intent without making every product match every query.

Once lexical search works, the next job is forgiveness. People do not type like clean databases, and products do not describe themselves with one perfect vocabulary.

First, word position matters. A product that says waterproof running shoes is probably more relevant than a product where waterproof appears in one review, running appears in another paragraph, and shoes appears somewhere else. Phrase and proximity search reward words that form one idea.

Second, spelling is messy. If the user types nik waterproof shooes, the system should probably understand Nike waterproof shoes. Fuzzy search uses edit distance to recover from insertions, deletions, substitutions, and transpositions.

Third, vocabulary differs. The user says shoes. The catalog says sneakers or trainers. The user says rainy days. The catalog says waterproof or water-resistant. Synonyms connect those words, but they should come from the domain, not from a random thesaurus.
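The typo case is the easiest of the three to sketch. Below is the optimal string alignment distance, a restricted form of Damerau-Levenshtein that counts insertions, deletions, substitutions, and adjacent transpositions. The `correct` helper, its vocabulary, and the rule that tokens containing digits are never corrected are hypothetical choices for this example.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance with insert, delete, substitute, and adjacent transpose."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)  # transposition
    return dist[-1][-1]


def correct(token: str, vocabulary: set[str], max_distance: int = 1) -> str:
    """Replace a token with its closest vocabulary word, if close enough."""
    # Assumption: anything with a digit looks like an ID and stays exact.
    if token in vocabulary or any(ch.isdigit() for ch in token):
        return token
    best = min(sorted(vocabulary), key=lambda word: osa_distance(token, word))
    return best if osa_distance(token, best) <= max_distance else token


VOCABULARY = {"nike", "shoes", "trail", "waterproof"}
print([correct(t, VOCABULARY) for t in "nik waterproof shooes".split()])
```

A production engine would not compare the token against the whole vocabulary. It would use an automaton or an n-gram index to find candidates, but the distance being approximated is the same.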

Deep dive: Query forgiveness combines closeness, typos, and domain language

Four mechanisms feed one forgiven query:

  • Phrase: waterproof shoes stay together
  • Typo: shooes becomes shoes
  • Synonym: rainy maps to waterproof
  • Autocomplete: rain suggests waterproof sneakers

These tools are powerful because they solve different kinds of mismatch:

  • Phrase and proximity protect meaning when word order and closeness matter.
  • Fuzzy search forgives typos while keeping exact IDs, SKUs, and model numbers protected.
  • Synonyms teach the system that sneakers, trainers, and shoes may belong together.
  • Autocomplete can gently guide users toward searchable language before they submit the query.

The danger is over-expansion. If everything becomes related to everything, search gets generous but confused. Good query forgiveness improves recall without destroying precision.
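One way to keep expansion under control is to expand each query token into a small, capped group of alternatives and treat each group as an OR while the groups together remain an AND. This is a sketch, assuming a hand-curated synonym map; the `SYNONYMS` table and the `max_alternatives` cap are hypothetical.

```python
# Hypothetical domain synonym map, curated by hand rather than pulled
# from a general thesaurus.
SYNONYMS = {
    "shoes": {"sneakers", "trainers"},
    "rainy": {"waterproof", "water-resistant"},
}


def expand(tokens: list[str], max_alternatives: int = 2) -> list[list[str]]:
    """Turn each token into [token, *synonyms], capped to limit drift."""
    groups = []
    for token in tokens:
        alternatives = sorted(SYNONYMS.get(token, set()))[:max_alternatives]
        groups.append([token, *alternatives])
    return groups


def matches(groups: list[list[str]], product_terms: set[str]) -> bool:
    """A product matches if it satisfies at least one term from every group."""
    return all(any(term in product_terms for term in group) for group in groups)


groups = expand(["rainy", "shoes"])
print(groups)
print(matches(groups, {"waterproof", "sneakers"}))
print(matches(groups, {"waterproof", "jacket"}))
```

Because every group must still be satisfied, a waterproof jacket does not slip in just because waterproof was added as a synonym for rainy.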

6. Filters And Facets: Constraints Are Not Vibes

Figure: Facets and filters. Constraints should be enforced, not guessed.

  • Constraints: under $120, size 42, in stock
  • Structured filters: price <= 120, variant has size 42, inventory > 0
  • Valid candidates: relevant, available, buyable

Ranking should not be asked to fix constraints that filters can enforce.

A user rarely wants only "shoes". They want size 42, under $120, black, in stock, deliverable this week, suitable for rain, maybe from Nike or Salomon.

That is not just text search. That is search plus structured constraints.

Deep dive: Filters enforce constraints before ranking

Each filter narrows the candidate set: all shoes -> waterproof -> size 42 -> under $120 -> in stock.

Filters let the system restrict results by structured fields. Facets let users refine results and understand the result set.

For our ecommerce platform, useful fields might include:

  • category
  • brand
  • price
  • size
  • color
  • material
  • waterproof or water-resistant
  • terrain
  • rating
  • availability

Facets are important because they move some work out of the query. The user does not need to type waterproof Nike trail shoes under 120 size 42 black. They can type the rough intent, then use filters.

For the chatbot, structured fields matter even more. If the user says "under $120", the system should not treat that as vague text. It should apply a price filter. If they say "available in size 42", it should check inventory.

This is a major lesson: not everything should be solved by ranking. Some things should be solved by filtering.
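As a sketch, a hard filter is just a boolean check on structured fields that runs before any scoring. The `Product` and `Variant` classes and the catalog below are hypothetical, and I am assuming that size and inventory live on the variant rather than on the product.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    size: int
    inventory: int


@dataclass
class Product:
    name: str
    price: float
    variants: list[Variant]


def passes_filters(product: Product, max_price: float, size: int) -> bool:
    """Hard constraints: within budget and the requested size is in stock."""
    in_stock_in_size = any(
        v.size == size and v.inventory > 0 for v in product.variants
    )
    return product.price <= max_price and in_stock_in_size


# Hypothetical catalog.
catalog = [
    Product("Waterproof trail shoes", 110.0, [Variant(42, 3), Variant(43, 0)]),
    Product("Premium waterproof sneakers", 160.0, [Variant(42, 5)]),
    Product("Water-resistant trainers", 95.0, [Variant(41, 2), Variant(42, 0)]),
]

buyable = [p.name for p in catalog if passes_filters(p, max_price=120, size=42)]
print(buyable)
```

The second product is relevant but over budget, and the third is affordable but sold out in size 42. No amount of clever ranking should put either of them in front of this user.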

7. Semantic Search: Meaning, Not Just Words

Figure: Semantic search. Meaning can be modeled in more than one way.

  • Natural language: shoes that will not get soaked, walking in wet streets
  • Meaning models: knowledge graph, ontology, embeddings
  • Intent matches: waterproof sneakers, city walking trainers, rain-ready shoes

Semantic search is the goal; vectors are the most common modern implementation.

At this point, lexical search is fairly strong. We have exact matching, an inverted index, analysis, BM25, query forgiveness, filters, and facets.

But there is still a problem: users can describe needs in ways your rules never anticipated.

They may search:

  • shoes that will not get soaked
  • something for walking in the rain
  • comfortable sneakers for wet streets
  • what should I wear for a rainy commute?

You cannot manually write synonyms for every possible phrasing.

Semantic search tries to solve this by matching meaning instead of only matching words.

Semantic search is broader than vector search. At this level, semantic search names the goal: retrieve by meaning, not only by surface words. You can model meaning with knowledge graphs, ontologies, taxonomies, rules, structured domain relationships, or embeddings.

Deep dive: Semantic search is broader than vectors

  • Knowledge graph: rain -> weather -> waterproof
  • Ontology: trail shoe is-a shoe
  • Embeddings: meaning as geometry

Semantic search is the goal. Vectors are one implementation, not the definition.

The most common modern tool is an embedding model because it scales well, works with natural language, and can retrieve paraphrases without hand-writing every connection. An embedding turns text into a vector: a list of numbers representing something about meaning. Queries and products become points in a high-dimensional space. Similar meanings should land near one another.

Now shoes for rainy days can be close to waterproof sneakers even if the words are different.

This is powerful for vague intent, natural language, paraphrases, and discovery. It is also useful for the chatbot because users naturally speak in full sentences.

But semantic search is not magic.

It may blur details that should remain sharp. It might think hiking boots and waterproof sneakers are similar, even if the user wanted lightweight city shoes. It may ignore exact constraints like Nike, size 42, under $120, or a specific SKU. It may retrieve conceptually similar products that are wrong in practice.

So semantic search improves recall over meaning, but it can hurt precision over constraints.

That is why modern systems rarely choose pure lexical or pure semantic search. They combine them.

8. Vector Indexes: Making Semantic Search Fast

Figure: Vector index. Approximate nearest neighbors make meaning search fast.

  • Embedding: query vector, product vectors, metadata
  • ANN graph: shortcuts, neighbor hops, candidate recall
  • Fast semantic pool: top 100 vectors, low latency, tunable recall

Vector search is also a systems problem: memory, latency, updates, and recall.

If every product has an embedding, searching means finding vectors near the query vector.

For a small catalog, you can compare the query vector to every product vector. For a large catalog, that becomes expensive.

Vector indexes solve this problem. They use approximate nearest neighbor search to find close vectors quickly without comparing against everything.
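It helps to see the exhaustive baseline that these indexes approximate. The sketch below compares the query vector against every product vector using cosine similarity. The three-dimensional vectors are invented for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine(q: list[float], p: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(q, p))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in p))
    return dot / norm if norm else 0.0


def nearest(query_vec: list[float], vectors: dict[str, list[float]], k: int = 2) -> list[str]:
    """Brute force: score every product, keep the k most similar."""
    ranked = sorted(vectors, key=lambda name: cosine(query_vec, vectors[name]), reverse=True)
    return ranked[:k]


# Made-up toy vectors, not the output of any real embedding model.
product_vectors = {
    "umbrella": [0.90, 0.10, 0.20],
    "trail shoes": [0.40, 0.70, 0.30],
    "waterproof sneakers": [0.10, 0.80, 0.50],
}
query_vec = [0.12, 0.83, 0.44]

print(nearest(query_vec, product_vectors))
```

This loop does one similarity computation per product per query. An approximate index such as HNSW trades a small amount of recall for visiting only a fraction of the vectors.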

Deep dive: Vector search embeds text, then compares geometry

$$\cos(q, p) = \frac{q \cdot p}{\lVert q \rVert \, \lVert p \rVert}$$

  • query vector: [0.12, 0.83, 0.44]
  • waterproof sneakers: 0.91
  • trail shoes: 0.84
  • umbrella: 0.38

Systems may use algorithms and structures such as HNSW graphs, IVF indexes, product quantization, or GPU search. The details can get deep, but the product question is simple: can we retrieve semantically relevant candidates fast enough?

Vector search introduces its own engineering decisions:

  • Which embedding model should we use?
  • Should we embed titles, descriptions, attributes, reviews, or all of them?
  • How do we represent product variants?
  • How often do we refresh embeddings?
  • How do we handle deleted or out-of-stock products?
  • How much latency and memory can we afford?

For ecommerce, product representation matters. A product title alone may be too thin. A full description may be too noisy. Often you build a clean text representation from title, brand, category, attributes, and a short description, then embed that.

9. Hybrid Search: The Practical Default

Figure: Hybrid retrieval. Lexical precision plus semantic flexibility.

  • Mixed query: waterproof, Nike, trail, under $120
  • Parallel retrievers: BM25 candidates, vector candidates, filters
  • Merged pool: exact brand kept, intent recovered, constraints respected

Hybrid search works because lexical and semantic retrieval fail differently.

Hybrid search combines lexical retrieval and semantic retrieval.

Deep dive: Hybrid retrieval merges different evidence streams

  • BM25: Nike, trail, shoes
  • Vector: rainy-day intent
  • Filters: under $120, in stock

These streams meet in an RRF merge, which produces the hybrid top results.

This is usually where search starts feeling production-grade.

Our query may contain multiple kinds of intent:

waterproof Nike trail shoes under $120

Here is what each layer should do:

  • Lexical search respects exact words like Nike and trail.
  • Semantic search connects waterproof to rainy-day intent.
  • Filters enforce under $120 and availability.
  • Business signals may consider rating, stock, margin, or shipping speed.

Hybrid search can run BM25 and vector search in parallel, then merge the results. It can also retrieve a candidate pool from both systems and send that pool to a reranker.

One common merging approach is reciprocal rank fusion. Instead of trying to compare raw BM25 scores with vector similarity scores directly, it combines rank positions. A product that appears high in both lists gets promoted.
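Reciprocal rank fusion is short enough to write out. Each list contributes `1 / (k + rank)` for every product it contains, and the contributions are summed. The constant `k = 60` is a commonly used default that I am assuming here; the two input rankings are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists by summing 1 / (k + rank) for each document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of the two retrievers, best first.
bm25_ranking = ["p3", "p8", "p5"]
vector_ranking = ["p8", "p2", "p3"]

print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
```

Here p8 is second in one list and first in the other, so it overtakes p3, which is first in one list but only third in the other. Products that appear in just one list stay in the pool but rank lower.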

Hybrid search works because lexical and semantic systems fail differently. Lexical search is precise but literal. Semantic search is flexible but sometimes soft. Together, they cover more ground.

10. Reranking: Retrieve Broadly, Then Think Carefully

Figure: Reranking. Retrieve broadly, then think carefully.

  • Candidate pool: top 200 products, mixed signals, rough order
  • Second pass: query + product, relevance model, reorder top K
  • Sharper top results: best match #1, near miss demoted, accessory removed

Rerankers improve the top of the list, but only if retrieval found the right candidates.

Most serious search systems are multi-stage.

The first stage retrieves candidates quickly. The second stage reranks a smaller set more carefully.

Deep dive: A reranker reads query and product together

  • candidate #8 -> reranked to #2
  • candidate #2 -> reranked to #1
  • candidate #14 -> reranked to #4

A reranker looks at the query and each candidate product together, then predicts relevance. This can be more accurate than comparing separate embeddings because the model sees the interaction between the query and the product.

For example, a first-stage hybrid search may retrieve 200 products. A reranker then sorts the top 50 by how well they answer the user's actual intent.
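The shape of that second stage can be sketched without a real model. In the code below, `rerank` reorders only the head of the candidate list and leaves the tail untouched. The `toy_score` function is a deliberately crude stand-in for a learned relevance model (token overlap plus a hypothetical accessory penalty), not how an actual reranker scores pairs.

```python
from collections.abc import Callable


def rerank(query: str, candidates: list[str],
           score_pair: Callable[[str, str], float], top_k: int = 50) -> list[str]:
    """Reorder the first top_k candidates by a pairwise score; keep the rest as is."""
    head, tail = candidates[:top_k], candidates[top_k:]
    head = sorted(head, key=lambda product: score_pair(query, product), reverse=True)
    return head + tail


# Hypothetical accessory words used by the stand-in scorer.
ACCESSORY_WORDS = {"spray", "laces", "insoles"}


def toy_score(query: str, product_text: str) -> float:
    """Stand-in for a relevance model: token overlap minus an accessory penalty."""
    query_terms = set(query.lower().split())
    product_terms = set(product_text.lower().split())
    overlap = len(query_terms & product_terms) / len(query_terms)
    penalty = 0.5 if product_terms & ACCESSORY_WORDS else 0.0
    return overlap - penalty


# Hypothetical first-stage output, best first.
candidates = ["waterproof spray for shoes", "waterproof trail shoes", "city shoes"]
print(rerank("waterproof shoes", candidates, toy_score))
```

The accessory matched both query words, so the first stage ranked it on top. The second pass sees query and product together and pushes the actual shoe to the front.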

Reranking is useful because it can understand details like:

  • Does this product actually satisfy rainy-day use?
  • Is this a shoe or only an accessory?
  • Is the waterproof feature central or just mentioned casually?
  • Does the product match the user's constraints?

But reranking has trade-offs. It adds latency and cost. It cannot fix a bad candidate pool. If the right product never appears in the first-stage results, the reranker cannot rescue it.

The pattern is: retrieve broadly, then think carefully.

11. Search as an AI Interface: Chat, RAG, and Agents

Figure: AI interface. Chat, RAG, and agents still depend on retrieval.

  • User says: rainy city walking, not hiking, not too expensive
  • Search tools: retrieve evidence, apply filters, compare candidates
  • Grounded answer: recommendations, trade-offs, follow-up if unclear

The model can narrate the answer, but search has to supply trustworthy evidence.

Now let us return to the chatbot. The user does not type a short query. They say:

I need shoes I can wear on rainy days, mostly for walking in the city, not hiking, and I do not want anything too expensive.

A chatbot has to parse intent before it searches. It should identify the product type, use case, terrain, negative preference, and price preference. Then it should turn those into retrieval actions: semantic search for rainy-day intent, lexical matching for shoes, filters for category and price, and maybe a reranking penalty for hiking boots.

This is where search becomes the foundation for the AI interface. In RAG, the language model writes the answer, but retrieval supplies the evidence. In agentic search, the system may need several steps: extract constraints, search, filter, compare, ask a follow-up question, and recommend.
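One way to picture the parsed request is as a small structured plan that the retrieval layer can execute. The `SearchPlan` class, its field names, and the follow-up wording are all hypothetical, and the plan below is filled in by hand; in a real system a language model or a parser would produce it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SearchPlan:
    product_type: str                        # lexical match and category filter
    semantic_query: str                      # text sent to the embedding model
    max_price: Optional[float] = None        # hard filter, once known
    price_preference: Optional[str] = None   # vague preference from the user
    demote_terms: list[str] = field(default_factory=list)  # reranking penalty

    def follow_up_question(self) -> Optional[str]:
        """Ask for a budget when the user hinted at price but gave no number."""
        if self.price_preference == "low" and self.max_price is None:
            return "What is the most you would like to spend?"
        return None


# Hand-written plan for the request above.
plan = SearchPlan(
    product_type="shoes",
    semantic_query="shoes for walking in the city on rainy days",
    price_preference="low",
    demote_terms=["hiking"],
)
print(plan.follow_up_question())

plan.max_price = 120.0
print(plan.follow_up_question())
```

The point of the structure is that "not too expensive" does not get embedded and forgotten. It either becomes a real price filter or triggers a follow-up question.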

Deep dive: Chat, RAG, and agents are interfaces over retrieval

The loop: parse request -> retrieve products -> apply constraints -> generate grounded answer -> plan another step.

But the rule stays the same: the model should not invent products. It should use search tools, inspect product data, compare candidates, and answer with grounded recommendations.

If retrieval is weak, RAG gives fluent bad answers. If the catalog data is messy, the chatbot becomes confident about the wrong things. If filters are ignored, the answer may sound helpful while recommending products the user cannot buy.

The chatbot is not separate from search. It is another interface over search.

12. Operating the System: Evaluation, Debugging, and Build Order

Figure: Operations. Measure, trace, and build in the right order.

  • Query set: head queries, typos, intent queries, chat queries
  • Operating loop: measure quality, trace failures, ship next layer
  • Search platform: better ranking, fewer dead ends, grounded chat

Evaluation and debugging are what turn a demo into a search system.

Search quality is not something you feel once and declare solved. You measure it, debug it, and improve it layer by layer.

Start with a query set: head queries, exact product codes, typo queries, synonym queries, filter-heavy queries, no-result queries, and chat-style requests. For each query, define what good results look like.

Useful metrics include precision, recall, MRR, NDCG, recall@K, latency, zero-result rate, and reformulation rate. For ecommerce, also watch add-to-cart rate, conversion, filter usage, and abandonment. For chat, watch grounded answer rate and whether users keep correcting the assistant.
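Three of those metrics fit in a few lines each. This is a sketch with hypothetical judgments: `relevant` is the set of products judged correct for a query, and `gains` assigns graded relevance for NDCG.

```python
import math


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / position of the first relevant result, or 0 if none is found."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant products that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked: list[str], gains: dict[str, float], k: int) -> float:
    """Discounted gain of this ranking divided by the best possible ranking."""
    dcg = sum(gains.get(doc_id, 0.0) / math.log2(i + 2)
              for i, doc_id in enumerate(ranked[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(i + 2) for i, gain in enumerate(ideal))
    return dcg / idcg if idcg else 0.0


# Hypothetical results and judgments for one query.
ranked = ["p4", "p8", "p2"]
relevant = {"p8", "p2"}
gains = {"p8": 3.0, "p2": 1.0}

print(reciprocal_rank(ranked, relevant))
print(recall_at_k(ranked, relevant, k=2))
print(round(ndcg_at_k(ranked, gains, k=3), 3))
```

MRR is the mean of `reciprocal_rank` over the whole query set. Run these on a fixed query set before and after every change, so that an improvement on one kind of query does not silently break another.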

Deep dive: Evaluation, debugging, and build order form one loop

The loop: query set -> metrics -> trace -> fix layer -> ship -> monitor.

When search fails, trace the pipeline instead of blaming ranking immediately:

  1. Did the product enter the system?
  2. Did analysis create the right terms?
  3. Did lexical or semantic retrieval find it?
  4. Did a filter remove it?
  5. Did the scorer or reranker demote it?
  6. Did the UI or chatbot distort the evidence?

If I were building this from zero, I would start with clean product data, exact and lexical search, analyzers, BM25, filters, typo tolerance, autocomplete, and synonyms. Then I would add evaluation sets, semantic search, hybrid retrieval, reranking, and finally chat, RAG, or agentic flows.

Learning to rank can come later, when you have enough reliable judgments or behavior data to learn from. It is powerful, but behavior data is biased, so it should improve a strong system rather than replace judgment.

The practical order is simple: build the foundation first, add intelligence when the earlier layer exposes a real limitation, and keep measuring.

13. The Core Mental Model

Figure: Mental model. Each layer translates one kind of mismatch.

  • User intent: words, constraints, meaning, goals
  • Search stack: lexical, structured, semantic, conversational
  • Useful evidence: right candidates, right order, grounded answer

Do not ask whether a technique is the future. Ask which failure mode it solves.

The whole story can be compressed into one sentence:

Search is the engineering of translation between user intent and stored evidence.

Deep dive: Every layer solves one failure mode

The layers, in order: characters, terms, ranking, constraints, meaning, answers.

Exact matching translates characters. Analyzers translate text into terms. BM25 translates term evidence into lexical relevance. Query forgiveness translates messy language. Filters translate constraints. Embeddings translate meaning. Hybrid search translates across retrieval methods. Rerankers translate candidate sets into better ordering. RAG translates retrieved evidence into answers. Agents translate user goals into multi-step retrieval plans.

Each layer exists because the previous layer was useful but incomplete.

So when you hear about a new search technique, do not ask: is this the future of search?

Ask: which failure mode does this solve?

That question will keep you grounded.

The future of search is not simply semantic, or conversational, or agentic. It is layered. Exact when exactness matters. Lexical when words carry the signal. Structured when constraints matter. Semantic when vocabulary diverges. Hybrid when real queries mix all of it. Conversational when the user needs help deciding.

The user types a few words. The system has to understand the need, find the evidence, respect the constraints, rank the candidates, and present something useful.

That is the craft.
