Figure: Search map. One problem, different tools. Sample catalog entries: "Lightweight trail shoes with storm-ready traction.", "Sealed mesh upper, dry feet, grippy sole.", "Built for wet pavement and everyday walking." Panel "Literal match": exact keyword search checks whether the words appear as written. It is great for IDs and known items, but it misses products when the catalog uses different words.
I am writing this article for the engineer I was nine months ago, as a home for everything I have been learning and building in production. I want it to be the resource for anyone who wants to build a platform around text search and does not know where to start or which step to take next.
Let's say you want to build a search engine for an ecommerce platform, just for fun. A catalog, a search bar, product pages, filters, rankings. Then, on top of that, you want a chatbot so the user can simply talk and receive what they want.
A user comes in and looks for shoes for rainy days. In the catalog, you have a lot of shoes, but the names do not exactly match that. They say things like waterproof sneakers, water-resistant trainers, and trail shoes.
That tiny mismatch is the whole problem of search.


Translate the words you use into the words the system has, without losing the intent in between.
What do you do? Where do you start? How do you ensure the user gets exactly what they are looking for?
This is exactly what we are going to figure out in this article. A good search system does not merely find text. It helps a person move from what they typed to what they meant.
This seems easy until you start listing everything that can go wrong. The user may misspell a brand. The product may use a synonym. A query may contain an exact product code, a vague desire, a price constraint, and a taste preference at the same time. The best result may not contain the query words. The top lexical match may be irrelevant. The most semantically similar result may ignore a crucial constraint. The user may need an answer, not a list.
0. Before Search: What Are We Building?
Figure: Platform shape. Two interfaces, one retrieval core (panels: Surfaces, Shared search layer, User value). The interface changes, but both paths need the same reliable retrieval foundation.
Before writing code, define the shape of the problem.
We are using ecommerce as the example, but the techniques apply to almost every kind of search: documentation search, support search, marketplace search, code search, enterprise search, legal search, and even RAG systems that need to retrieve the right context before generating an answer.
In ecommerce, relevance is not only about text. A good result may need to be relevant, in stock, available in the user's size, deliverable to their country, within budget, well reviewed, and visually close to what they imagined.
We also want a chatbot on top. That changes the interface, but it does not remove search. The chatbot still needs a retrieval system underneath it. If retrieval is weak, the chatbot becomes a fluent narrator of bad results.
So the platform has two main interfaces:
- Search bar: the user types a query and expects ranked results.
- Chat interface: the user describes a need and expects guidance, comparison, recommendations, or a direct answer.
Under both surfaces is the same core question: given this user intent, what evidence in the catalog should we retrieve, rank, and show?
That is the thread we will follow.
1. Exact Matching: The Smallest Search Engine
Figure: Exact matching. The first search engine is just a loop (panels: Query, Scan every product, Returned). Exact matching is precise for codes and names, but it misses intent when vocabulary changes.
The first thing we can do is embarrassingly simple.
Take the query. Scan every product. Return the products whose text contains the query string.
If the user searches Nike, return products that contain Nike. If the user searches invoice-8392 in an internal tool, return the document that contains invoice-8392. If they search a product SKU, return the matching product.
In our case, the user searches shoes for rainy days. A naive loop might split the query into words and return every product containing shoes. It may also look for rainy and days. If those words do not appear, the product is ignored.
This is exact keyword search. It is not fashionable, but it is still essential.
Exact search is what you want for IDs, SKUs, invoice numbers, usernames, error codes, filenames, quoted phrases, legal references, product model numbers, and anything where interpretation would be a bug.
If a developer searches for ERR_AUTH_4017, they do not want a conceptually similar authentication article. They want the exact page, log line, issue, or source file containing that code. If a shopper searches for a specific model number, the system should not get creative. It should find the model.
So exact matching is not bad. It is narrow. It answers one question: does this text contain that text?
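The loop described above fits in a few lines. This is a minimal sketch, not a production matcher: the `Product` class, the SKUs, and the four-item catalog are invented for illustration, and lowercasing both sides is my own choice so that Nike and nike behave the same.

```python
from dataclasses import dataclass


@dataclass
class Product:
    sku: str   # hypothetical identifier, made up for this example
    text: str  # the searchable text of the product


def exact_search(query: str, catalog: list[Product]) -> list[Product]:
    """Scan every product and keep those whose text contains the query string."""
    needle = query.lower()
    return [p for p in catalog if needle in p.text.lower()]


catalog = [
    Product("SKU-001", "Waterproof sneakers"),
    Product("SKU-002", "City shoes"),
    Product("SKU-003", "Trail shoes"),
    Product("SKU-004", "Rain jacket"),
]

print([p.sku for p in exact_search("shoes", catalog)])
print([p.sku for p in exact_search("shoes for rainy days", catalog)])
```

The first call returns the two products that literally contain shoes. The second returns nothing, because no product text contains the whole phrase.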
Exact matching literally asks: does the string appear? Checking product.text.includes("shoes") against four products:
- waterproof sneakers: no match
- city shoes: match
- trail shoes: match
- rain jacket: no match
The problem is that our ecommerce user does not speak like the catalog.
They ask for shoes for rainy days, but the catalog says waterproof sneakers, water-resistant trainers, and trail shoes. Exact matching may return products containing shoes, but miss products that would actually be perfect.
That is our first limitation: exact matching is precise, but it has poor recall when language changes.
Precision asks how many of the results you returned are actually relevant. Recall asks how many of the relevant items you managed to return. Exact search can be very precise. It often fails at recall.
Now we need the next layer.
2. The Inverted Index: Stop Reading Every Product
Figure: Indexing. The inverted index stops reading the whole catalog (panels: Terms, Posting lists, Fast lookup). Search becomes a prepared memory of where words appear.
A loop is easy to understand, but it does not scale.
If your catalog has 100 products, scanning all of them is fine. If it has 10 million products, scanning every title, description, review, and attribute for every query is not a search engine. It is a slow script.
The classic solution is the inverted index.
Instead of storing only products and scanning them one by one, the system builds a map from terms to the products that contain those terms.
A tiny index might look like this:
- shoes: product1, product3, product8, product12
- sneakers: product2, product4, product9
- waterproof: product2, product8
- trail: product5, product8, product11
- nike: product3, product9
Now when the user searches, you do not read the whole catalog. You look up posting lists. A posting list is the list of documents or products where a term appears.
This is the first real architectural jump. Search becomes an indexing problem.
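Here is a rough sketch of that idea. It assumes the simplest possible tokenization (lowercase, split on whitespace) and stores posting lists as plain sets of product IDs. The function names and the tiny catalog are mine, and a real engine would use sorted, compressed postings instead.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of product IDs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all_terms(index: dict[str, set[str]], query: str) -> set[str]:
    """Look up one posting list per query term and intersect them."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)


docs = {
    "product2": "waterproof sneakers",
    "product5": "trail socks",
    "product8": "waterproof trail shoes",
    "product12": "city shoes",
}
index = build_index(docs)

print(sorted(index["waterproof"]))
print(sorted(search_all_terms(index, "waterproof shoes")))
```

The work of reading product text happens once, at indexing time. At query time the system only touches the posting lists for the terms the user typed.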
Real inverted indexes store more than product IDs. They may store how many times a term appears, where it appears, which field it appears in, and the positions of words. That information later helps ranking, phrase search, highlighting, and debugging.
For ecommerce, field information matters a lot. A match in a product title should usually count more than a match buried in a long description. A match in a brand field may count differently from a match in a review.
At this stage, you can build something fast. But fast is not enough. The system still needs to decide what counts as a searchable term.
3. Analysis: Turning Product Text Into Searchable Terms
Figure: Analysis. Raw product text becomes searchable terms (panels: Raw title, Analyzer pipeline, Terms). If analysis loses the signal, ranking cannot recover it later.
Before text can enter an index, it must be analyzed.
Analysis is the pipeline that turns raw text into searchable terms. This is where a lot of search quality is won or lost.
A typical analyzer has three parts:
- Character filters clean the raw text before tokenization. They can remove HTML, normalize punctuation, or handle special characters.
- Tokenizers split text into tokens. They decide where words begin and end.
- Token filters transform tokens. They can lowercase, remove stop words, stem words, fold accents, apply synonyms, or create n-grams.
Take a product title like:
Nike Pegasus Trail 5 GORE-TEX Men's Waterproof Running Shoes
An analyzer may turn it into terms like nike, pegasus, trail, 5, gore, tex, men, waterproof, running, shoes.
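To make the three stages concrete, here is a small sketch that reproduces those terms. It is one possible analyzer, not the analyzer of any particular engine. The function names are mine, the character filter only handles HTML tags and the possessive 's, and the only token filter is lowercasing.

```python
import re


def char_filter(text: str) -> str:
    """Clean raw text: drop HTML tags and the possessive 's."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"['’]s\b", "", text)


def tokenize(text: str) -> list[str]:
    """Split on anything that is not a letter or a digit."""
    return re.findall(r"[A-Za-z0-9]+", text)


def token_filters(tokens: list[str]) -> list[str]:
    """Transform tokens; this sketch only lowercases them."""
    return [token.lower() for token in tokens]


def analyze(text: str) -> list[str]:
    """Run the three analyzer stages in order."""
    return token_filters(tokenize(char_filter(text)))


title = "Nike Pegasus Trail 5 GORE-TEX Men's Waterproof Running Shoes"
print(analyze(title))
```

Notice what this tokenizer already decided for you: GORE-TEX became gore and tex, so a query analyzed the same way as goretex would produce a single term that matches neither.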
But every decision matters.
- If you split GORE-TEX into gore and tex, will users searching goretex still find it?
- If you lowercase everything, will brand and model matching still behave correctly?
- If you remove stop words, will phrases like to be or not to be still work in other domains?
- If you stem running to run, will runner and running become useful matches or noisy ones?
- If you treat sneakers and shoes as unrelated, will the rainy-day query miss good products?
There is no perfect analyzer. There is only an analyzer that fits the domain.
For ecommerce, you usually care about product names, brands, categories, attributes, sizes, colors, materials, and user language. Your analyzer must respect exact identifiers while still being flexible with normal words.
This is the second lesson: search quality starts before ranking. If analysis loses the meaning you need, the ranker cannot recover it later.
4. Ranking: Matching Is Not Enough
Figure: Ranking. BM25 turns matches into an ordered list (panels: Candidates, Scoring signals, Top results). Matching finds candidates. Ranking decides what deserves attention first.
Now the system can find candidates quickly. But it still has to order them.
If 500 products contain shoes, which one should appear first?
The simplest ranking is count-based: products with more query terms rank higher. If a product contains shoes and rainy, it beats a product containing only shoes.
That is a start, but it is not enough.
A word that appears in every product description is not very informative. A rare word is more useful. A match in the title is more important than a match in a review. A short product title matching the query may be more relevant than a huge description that happens to contain the same word once.
This is where TF-IDF and BM25 enter.
TF-IDF combines two intuitions: terms that appear often in a document may matter, and terms that are rare across the collection matter more than common terms.
BM25 is one of the most durable ranking functions built from that family of ideas. You do not need the formula to understand the instinct:
- Rare query terms should matter more than common query terms.
- Repeating a term helps, but only up to a point.
- Document length matters because matching a word in a short title is different from matching it somewhere in a giant text.
BM25 is strong because it is boring in the best way. It is fast, explainable, and hard to beat as a lexical baseline.
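If you do want to see the three instincts as code, here is a compact sketch of one common BM25 variant. The parameter values k1 = 1.2 and b = 0.75 are widely used defaults, not something this article tuned, and the three tiny documents are made up. The IDF form with the added 1 inside the logarithm is the variant that keeps scores non-negative.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every tokenized document against the tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Longer-than-average documents get a larger normalizer.
        length_norm = 1 - b + b * len(doc) / avg_len
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Rare terms get a higher IDF than common ones.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates: each repeat adds less than the last.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "waterproof trail shoes".split(),
    "city shoes with soft lining and classic leather shoes styling".split(),
    "rain jacket".split(),
]
scores = bm25_scores("waterproof shoes".split(), docs)
print([round(s, 3) for s in scores])
```

The short product that matches both terms wins. The long product mentions shoes twice, but the repeat and the extra length do not let it catch up, and the jacket scores zero.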
For our query, BM25 can score products on the terms the query actually contains, such as shoes or rainy, and on related terms like waterproof or trail only if analysis or synonym expansion has added them. But if the product never uses the same words as the query, BM25 cannot invent the connection by itself.
So we need to teach the system more language.
5. Query Forgiveness: Phrase, Proximity, Typos, and Synonyms
Figure: Query forgiveness. Close words, typos, and domain language all help recall (panels: Messy query, Forgiveness layer, Expanded candidates). Forgiveness should recover intent without making every product match every query.
Once lexical search works, the next job is forgiveness. People do not type like clean databases, and products do not describe themselves with one perfect vocabulary.
First, word position matters. A product that says waterproof running shoes is probably more relevant than a product where waterproof appears in one review, running appears in another paragraph, and shoes appears somewhere else. Phrase and proximity search reward words that form one idea.
Second, spelling is messy. If the user types nik waterproof shooes, the system should probably understand Nike waterproof shoes. Fuzzy search uses edit distance to recover from insertions, deletions, substitutions, and transpositions.
Third, vocabulary differs. The user says shoes. The catalog says sneakers or trainers. The user says rainy days. The catalog says waterproof or water-resistant. Synonyms connect those words, but they should come from the domain, not from a random thesaurus.
These tools are powerful because they solve different kinds of mismatch:
- Phrase and proximity protect meaning when word order and closeness matter.
- Fuzzy search forgives typos while keeping exact IDs, SKUs, and model numbers protected.
- Synonyms teach the system that sneakers, trainers, and shoes may belong together.
- Autocomplete can gently guide users toward searchable language before they submit the query.
The danger is over-expansion. If everything becomes related to everything, search gets generous but confused. Good query forgiveness improves recall without destroying precision.
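The typo part of this section can be sketched with an edit distance that counts insertions, deletions, substitutions, and adjacent transpositions (the optimal string alignment variant). The `fuzzy_lookup` helper, the one-edit limit, and the rule that any token containing a digit is treated as an identifier are assumptions I made for the example, not fixed rules.

```python
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Edits needed to turn a into b: insert, delete, substitute, or swap neighbors."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def fuzzy_lookup(token: str, vocabulary: set[str], max_edits: int = 1) -> Optional[str]:
    """Return the closest vocabulary term within max_edits, or None."""
    token = token.lower()
    if token in vocabulary:
        return token
    if any(ch.isdigit() for ch in token):
        return None  # assumption: tokens with digits are IDs, so exact match only
    best = min(vocabulary, key=lambda term: (edit_distance(token, term), term), default=None)
    if best is not None and edit_distance(token, best) <= max_edits:
        return best
    return None


vocabulary = {"nike", "waterproof", "shoes", "sneakers"}
print([fuzzy_lookup(t, vocabulary) for t in "nik waterproof shooes".split()])
```

Keeping the edit limit small is one way to stay on the right side of over-expansion: a generous limit would start mapping unrelated short words onto each other.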
6. Filters And Facets: Constraints Are Not Vibes
Figure: Facets and filters. Constraints should be enforced, not guessed (panels: Constraints, Structured filters, Valid candidates). Ranking should not be asked to fix constraints that filters can enforce.
A user rarely wants only "shoes". They want size 42, under $120, black, in stock, deliverable this week, suitable for rain, maybe from Nike or Salomon.
That is not just text search. That is search plus structured constraints.
Filters let the system restrict results by structured fields. Facets let users refine results and understand the result set.
For our ecommerce platform, useful fields might include:
- category
- brand
- price
- size
- color
- material
- waterproof or water-resistant
- terrain
- rating
- availability
Facets are important because they move some work out of the query. The user does not need to type waterproof Nike trail shoes under 120 size 42 black. They can type the rough intent, then use filters.
For the chatbot, structured fields matter even more. If the user says "under $120", the system should not treat that as vague text. It should apply a price filter. If they say "available in size 42", it should check inventory.
This is a major lesson: not everything should be solved by ranking. Some things should be solved by filtering.
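A small sketch of both ideas, assuming products are plain dictionaries. The field names, the four products, and the helper names are invented for the example. Note that "under $120" is implemented as strictly less than 120.

```python
from collections import Counter

products = [
    {"id": "p1", "brand": "Nike", "price": 110, "sizes": [41, 42], "in_stock": True},
    {"id": "p2", "brand": "Salomon", "price": 140, "sizes": [42, 43], "in_stock": True},
    {"id": "p3", "brand": "Nike", "price": 95, "sizes": [40, 41], "in_stock": True},
    {"id": "p4", "brand": "Salomon", "price": 99, "sizes": [42], "in_stock": False},
]


def apply_filters(items, under_price=None, size=None, in_stock_only=False):
    """Keep only the items that satisfy every constraint that was supplied."""
    kept = []
    for item in items:
        if under_price is not None and item["price"] >= under_price:
            continue
        if size is not None and size not in item["sizes"]:
            continue
        if in_stock_only and not item["in_stock"]:
            continue
        kept.append(item)
    return kept


def facet_counts(items, field):
    """Count how many items fall under each value of a structured field."""
    return Counter(item[field] for item in items)


valid = apply_filters(products, under_price=120, size=42, in_stock_only=True)
print([item["id"] for item in valid])
print(dict(facet_counts(apply_filters(products, under_price=120), "brand")))
```

The filter is a hard yes or no. A product that costs $140 is not ranked lower, it is simply gone, which is exactly what the user asked for.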
7. Semantic Search: Meaning, Not Just Words
Figure: Semantic search. Meaning can be modeled in more than one way (panels: Natural language, Meaning models, Intent matches). Semantic search is the goal; vectors are the most common modern implementation.
At this point, lexical search is fairly strong. We have exact matching, an inverted index, analysis, BM25, query forgiveness, filters, and facets.
But there is still a problem: users can describe needs in ways your rules never anticipated.
They may search:
- shoes that will not get soaked
- something for walking in the rain
- comfortable sneakers for wet streets
- what should I wear for a rainy commute?
You cannot manually write synonyms for every possible phrasing.
Semantic search tries to solve this by matching meaning instead of only matching words.
Semantic search is broader than vector search. At this level, semantic search names the goal: retrieve by meaning, not only by surface words. You can model meaning with knowledge graphs, ontologies, taxonomies, rules, structured domain relationships, or embeddings.
Semantic search is the goal. Vectors are one implementation, not the definition.
The most common modern tool is an embedding model because it scales well, works with natural language, and can retrieve paraphrases without hand-writing every connection. An embedding turns text into a vector: a list of numbers representing something about meaning. Queries and products become points in a high-dimensional space. Similar meanings should land near one another.
Now shoes for rainy days can be close to waterproof sneakers even if the words are different.
This is powerful for vague intent, natural language, paraphrases, and discovery. It is also useful for the chatbot because users naturally speak in full sentences.
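To show what "near one another" means, here is a toy sketch using cosine similarity. The vectors are hand-made and three-dimensional so you can read them. They are not the output of any real embedding model, and the axis meanings in the comment are purely hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors, invented for illustration. Hypothetical axes:
# [footwear, wet weather, outerwear]
product_vectors = {
    "waterproof sneakers": [0.9, 0.8, 0.0],
    "city shoes": [0.9, 0.1, 0.0],
    "rain jacket": [0.0, 0.8, 0.9],
}
query_vector = [0.8, 0.9, 0.0]  # stand-in for "shoes for rainy days"

ranked = sorted(
    product_vectors,
    key=lambda name: cosine(query_vector, product_vectors[name]),
    reverse=True,
)
print(ranked)
```

The query shares no word with waterproof sneakers, yet that product ranks first because its vector points in almost the same direction. A real model uses hundreds or thousands of dimensions that nobody labels by hand.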
But semantic search is not magic.
It may blur details that should remain sharp. It might think hiking boots and waterproof sneakers are similar, even if the user wanted lightweight city shoes. It may ignore exact constraints like Nike, size 42, under $120, or a specific SKU. It may retrieve conceptually similar products that are wrong in practice.
So semantic search improves recall over meaning, but it can hurt precision over constraints.
That is why modern systems rarely choose pure lexical or pure semantic search. They combine them.
8. Vector Indexes: Making Semantic Search Fast
Figure: Vector index. Approximate nearest neighbors make meaning search fast (panels: Embedding, ANN graph, Fast semantic pool). Vector search is also a systems problem: memory, latency, updates, and recall.
If every product has an embedding, searching means finding vectors near the query vector.
For a small catalog, you can compare the query vector to every product vector. For a large catalog, that becomes expensive.
Vector indexes solve this problem. They use approximate nearest neighbor search to find close vectors quickly without comparing against everything.
Systems may use algorithms and structures such as HNSW graphs, IVF indexes, product quantization, or GPU search. The details can get deep, but the product question is simple: can we retrieve semantically relevant candidates fast enough?
Vector search introduces its own engineering decisions:
- Which embedding model should we use?
- Should we embed titles, descriptions, attributes, reviews, or all of them?
- How do we represent product variants?
- How often do we refresh embeddings?
- How do we handle deleted or out-of-stock products?
- How much latency and memory can we afford?
For ecommerce, product representation matters. A product title alone may be too thin. A full description may be too noisy. Often you build a clean text representation from title, brand, category, attributes, and a short description, then embed that.
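One way to build that representation is sketched below. The field names (`title`, `brand`, `category`, `attributes`, `short_description`) and the joining format are assumptions of mine. The point is only that you choose what the embedding model sees, and here reviews are deliberately left out.

```python
def embedding_text(product: dict) -> str:
    """Join selected fields into one clean string to pass to an embedding model."""
    parts = [
        product.get("title", ""),
        product.get("brand", ""),
        product.get("category", ""),
    ]
    # Sort attributes so the same product always yields the same text.
    attributes = product.get("attributes", {})
    parts.extend(f"{key}: {value}" for key, value in sorted(attributes.items()))
    parts.append(product.get("short_description", ""))
    return ". ".join(part.strip() for part in parts if part and part.strip())


product = {
    "title": "Pegasus Trail 5 GORE-TEX",
    "brand": "Nike",
    "category": "Running shoes",
    "attributes": {"waterproof": "yes", "terrain": "trail"},
    "short_description": "Lightweight trail shoe for wet weather",
    "reviews": ["Great grip, runs a bit small."],  # ignored on purpose
}
print(embedding_text(product))
```

Whatever format you pick, keep it stable. If the representation changes, every stored embedding built from the old format needs to be refreshed.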
9. Hybrid Search: The Practical Default
Figure: Hybrid retrieval. Lexical precision plus semantic flexibility (panels: Mixed query, Parallel retrievers, Merged pool). Hybrid search works because lexical and semantic retrieval fail differently.
Hybrid search combines lexical retrieval and semantic retrieval.
This is usually where search starts feeling production-grade.
Our query may contain multiple kinds of intent:
waterproof Nike trail shoes under $120
Here is what each layer should do:
- Lexical search respects exact words like Nike and trail.
- Semantic search connects waterproof to rainy-day intent.
- Filters enforce under $120 and availability.
- Business signals may consider rating, stock, margin, or shipping speed.
Hybrid search can run BM25 and vector search in parallel, then merge the results. It can also retrieve a candidate pool from both systems and send that pool to a reranker.
One common merging approach is reciprocal rank fusion. Instead of trying to compare raw BM25 scores with vector similarity scores directly, it combines rank positions. A product that appears high in both lists gets promoted.
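Reciprocal rank fusion is short enough to sketch in full. Each list contributes 1 / (k + rank) for every product it contains, and the contributions are summed. The constant k = 60 is the value commonly used for RRF. The two input rankings are invented, and breaking ties by product ID is my own choice to keep the output deterministic.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists by summing 1 / (k + rank) for each product."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["p8", "p3", "p12"]    # hypothetical lexical results
vector_ranking = ["p2", "p8", "p5"]   # hypothetical semantic results

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Product p8 is first in one list and second in the other, so it collects two contributions and ends up on top, ahead of p2, which is first in only one list.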
Hybrid search works because lexical and semantic systems fail differently. Lexical search is precise but literal. Semantic search is flexible but sometimes soft. Together, they cover more ground.
10. Reranking: Retrieve Broadly, Then Think Carefully
Figure: Reranking. Retrieve broadly, then think carefully (panels: Candidate pool, Second pass, Sharper top results). Rerankers improve the top of the list, but only if retrieval found the right candidates.
Most serious search systems are multi-stage.
The first stage retrieves candidates quickly. The second stage reranks a smaller set more carefully.
A reranker looks at the query and each candidate product together, then predicts relevance. This can be more accurate than comparing separate embeddings because the model sees the interaction between the query and the product.
For example, a first-stage hybrid search may retrieve 200 products. A reranker then sorts the top 50 by how well they answer the user's actual intent.
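The shape of that second stage can be sketched without a real model. Here `rerank` takes any function that scores a query and a product together. The `toy_overlap_score` function is only a stand-in I wrote so the example runs; a real reranker would be a trained model, and all names here are hypothetical.

```python
from typing import Callable


def rerank(query: str, candidates: list[str],
           score_pair: Callable[[str, str], float], top_n: int = 50) -> list[str]:
    """Rescore only the first top_n candidates; the tail keeps its first-stage order."""
    head, tail = candidates[:top_n], candidates[top_n:]
    head = sorted(head, key=lambda doc: score_pair(query, doc), reverse=True)
    return head + tail


def toy_overlap_score(query: str, doc: str) -> float:
    """Stand-in for a real reranking model: share of query words found in the product."""
    query_words = set(query.lower().split())
    return len(query_words & set(doc.lower().split())) / len(query_words)


first_stage = [
    "rain jacket waterproof",
    "waterproof city shoes",
    "shoe cleaning kit",
    "trail shoes",
]
print(rerank("waterproof shoes", first_stage, toy_overlap_score, top_n=3))
```

With `top_n=3`, the fourth candidate is never scored at all. It stays wherever the first stage put it, which is why the quality of the candidate pool matters so much.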
Reranking is useful because it can understand details like:
- Does this product actually satisfy rainy-day use?
- Is this a shoe or only an accessory?
- Is the waterproof feature central or just mentioned casually?
- Does the product match the user's constraints?
But reranking has trade-offs. It adds latency and cost. It cannot fix a bad candidate pool. If the right product never appears in the first-stage results, the reranker cannot rescue it.
The pattern is: retrieve broadly, then think carefully.
11. Search as an AI Interface: Chat, RAG, and Agents
Figure: AI interface. Chat, RAG, and agents still depend on retrieval (panels: User says, Search tools, Grounded answer). The model can narrate the answer, but search has to supply trustworthy evidence.
Now let us return to the chatbot. The user does not type a short query. They say:
I need shoes I can wear on rainy days, mostly for walking in the city, not hiking, and I do not want anything too expensive.
A chatbot has to parse intent before it searches. It should identify the product type, use case, terrain, negative preference, and price preference. Then it should turn those into retrieval actions: semantic search for rainy-day intent, lexical matching for shoes, filters for category and price, and maybe a reranking penalty for hiking boots.
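One way to picture the output of that parsing step is a small structured plan. The sketch below fills the plan in by hand for the example message, because the extraction itself would normally be done by a language model or a parser. The `RetrievalPlan` fields, the product data, and the price ceiling of 100 for "not too expensive" are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalPlan:
    semantic_query: str                                   # free text for vector search
    lexical_terms: list[str] = field(default_factory=list)
    filters: dict[str, object] = field(default_factory=dict)
    penalize_terms: list[str] = field(default_factory=list)


def apply_plan(plan: RetrievalPlan, products: list[dict]) -> list[dict]:
    """Enforce the plan's filters, then push penalized products to the end."""
    kept = [
        p for p in products
        if p["category"] == plan.filters.get("category", p["category"])
        and p["price"] <= plan.filters.get("max_price", p["price"])
    ]
    # Stable sort: unpenalized products keep their incoming order.
    return sorted(kept, key=lambda p: any(t in p["name"] for t in plan.penalize_terms))


plan = RetrievalPlan(
    semantic_query="shoes for walking in the city on rainy days",
    lexical_terms=["shoes"],
    filters={"category": "shoes", "max_price": 100},  # 100 is an assumed threshold
    penalize_terms=["hiking"],
)
products = [
    {"name": "waterproof hiking boots", "category": "shoes", "price": 90},
    {"name": "waterproof city sneakers", "category": "shoes", "price": 85},
    {"name": "premium rain trainers", "category": "shoes", "price": 180},
    {"name": "rain jacket", "category": "jackets", "price": 60},
]
print([p["name"] for p in apply_plan(plan, products)])
```

The hard constraints (category, price) remove products. The soft preference (not hiking) only reorders them, so the hiking boots stay available if nothing better exists.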
This is where search becomes the foundation for the AI interface. In RAG, the language model writes the answer, but retrieval supplies the evidence. In agentic search, the system may need several steps: extract constraints, search, filter, compare, ask a follow-up question, and recommend.
But the rule stays the same: the model should not invent products. It should use search tools, inspect product data, compare candidates, and answer with grounded recommendations.
If retrieval is weak, RAG gives fluent bad answers. If the catalog data is messy, the chatbot becomes confident about the wrong things. If filters are ignored, the answer may sound helpful while recommending products the user cannot buy.
The chatbot is not separate from search. It is another interface over search.
12. Operating the System: Evaluation, Debugging, and Build Order
Figure: Operations. Measure, trace, and build in the right order (panels: Query set, Operating loop, Search platform). Evaluation and debugging are what turn a demo into a search system.
Search quality is not something you feel once and declare solved. You measure it, debug it, and improve it layer by layer.
Start with a query set: head queries, exact product codes, typo queries, synonym queries, filter-heavy queries, no-result queries, and chat-style requests. For each query, define what good results look like.
Useful metrics include precision, recall, MRR, NDCG, recall@K, latency, zero-result rate, and reformulation rate. For ecommerce, also watch add-to-cart rate, conversion, filter usage, and abandonment. For chat, watch grounded answer rate and whether users keep correcting the assistant.
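Three of those metrics are easy to compute once you have judged results. The sketch below uses the standard definitions: MRR is the mean over queries of the reciprocal rank, and NDCG here uses the linear gain with a log2 position discount. The ranked list and the relevance judgments are made up.

```python
import math


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / position of the first relevant result, or 0.0 if there is none."""
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Share of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked: list[str], gains: dict[str, float], k: int) -> float:
    """Discounted gain of this ranking divided by the gain of the ideal ranking."""
    dcg = sum(gains.get(doc_id, 0.0) / math.log2(pos + 1)
              for pos, doc_id in enumerate(ranked[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(pos + 1) for pos, gain in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0


ranked = ["p3", "p8", "p2", "p12"]   # what the system returned
gains = {"p8": 3.0, "p2": 1.0}       # hypothetical graded judgments
relevant = set(gains)

print(reciprocal_rank(ranked, relevant))
print(recall_at_k(ranked, relevant, k=2))
print(round(ndcg_at_k(ranked, gains, k=3), 3))
```

To get MRR for a whole query set, average `reciprocal_rank` across all queries. Tracking these numbers per query type (typos, exact codes, chat-style requests) tells you which layer to work on next.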
When search fails, trace the pipeline instead of blaming ranking immediately:
- Did the product enter the system?
- Did analysis create the right terms?
- Did lexical or semantic retrieval find it?
- Did a filter remove it?
- Did the scorer or reranker demote it?
- Did the UI or chatbot distort the evidence?
If I were building this from zero, I would start with clean product data, exact and lexical search, analyzers, BM25, filters, typo tolerance, autocomplete, and synonyms. Then I would add evaluation sets, semantic search, hybrid retrieval, reranking, and finally chat, RAG, or agentic flows.
Learning to rank can come later, when you have enough reliable judgments or behavior data to learn from. It is powerful, but behavior data is biased, so it should improve a strong system rather than replace judgment.
The practical order is simple: build the foundation first, add intelligence when the earlier layer exposes a real limitation, and keep measuring.
13. The Core Mental Model
Figure: Mental model. Each layer translates one kind of mismatch (panels: User intent, Search stack, Useful evidence). Do not ask whether a technique is the future. Ask which failure mode it solves.
The whole story can be compressed into one sentence:
Search is the engineering of translation between user intent and stored evidence.
Exact matching translates characters. Analyzers translate text into terms. BM25 translates term evidence into lexical relevance. Query forgiveness translates messy language. Filters translate constraints. Embeddings translate meaning. Hybrid search translates across retrieval methods. Rerankers translate candidate sets into better ordering. RAG translates retrieved evidence into answers. Agents translate user goals into multi-step retrieval plans.
Each layer exists because the previous layer was useful but incomplete.
So when you hear about a new search technique, do not ask: is this the future of search?
Ask: which failure mode does this solve?
That question will keep you grounded.
The future of search is not simply semantic, or conversational, or agentic. It is layered. Exact when exactness matters. Lexical when words carry the signal. Structured when constraints matter. Semantic when vocabulary diverges. Hybrid when real queries mix all of it. Conversational when the user needs help deciding.
The user types a few words. The system has to understand the need, find the evidence, respect the constraints, rank the candidates, and present something useful.
That is the craft.