KB User's Guide - Understanding the New Semantic Search Engine

This document describes how the new semantic search engine works and how it compares to the traditional keyword/title search engine.

Overview

A semantic search engine is based on the words we use and how they relate to one another. By modeling how we naturally write and speak, it can interpret the overall meaning behind search queries that are phrased in that same natural way. This is commonly referred to as "Natural Language Processing", or NLP.

Where the traditional keyword/title search works by finding literal matches to the words you search, the new semantic search engine first tries to understand the broader meaning and intent behind your query, and then finds the most likely matches. Additionally, rather than returning matches based only on the title and keywords fields of a KB document, it factors in the full text of the documents being searched.

This is particularly helpful for users who do not already know what content is published on a site and what keywords are used. Users who are unfamiliar with a site's content may frame their queries differently from the author. In these cases, their search may not have any literal matches, but semantic search will still be able to return documents with language that is semantically related to the query. 

How the KB's traditional keyword/title search engine works

To understand the differences between the two search engines, it helps to first understand how the traditional search engine works. The traditional KB search engine can be considered a "strict" or "literal" search. This means that every word entered in the search needs to have a match in the fields being searched. As a result, searching a greater number of terms returns fewer results (i.e. more documents are filtered out).

Traditional search process

When a user performs a search in the keyword/title search engine, the following happens:

  1. Certain common "noise" words (like "how", "the", "is", etc.) are filtered out and ignored.

  2. Synonyms for your search terms are brought in from our manually-maintained synonyms list, if applicable.

    • E.g. entering the word "delete" in your search results in the words "deleting", "deletes", "remove", "removing", etc. also being searched.

  3. The resulting set of search terms (i.e. your original search with noise words removed and synonyms added as alternatives) is run against the title and keywords fields of the KB site's documents.

  4. If at least one result is found, only the documents that match ALL of the entered search terms or their synonyms are returned as results.

  5. If no results are found, the same set of terms is searched against the body of the KB site's documents (i.e. a "fulltext" search).

  6. If no results are found again, the same set of terms is searched against the contents of any text-indexed attachments in the site's KB documents (i.e. a "fulltext + attachments" search).

  7. If there are still no results, the user sees a message stating that no matching documents could be found. At this point, they would need to refine their search themselves to try to find results. 
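
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the KB's actual implementation: the noise-word list, the synonym table, the sample documents, and the names `traditional_search` and `matches` are all hypothetical. It also assumes that the "fulltext + attachments" stage searches the body and the attachment text together, and that terms are matched as substrings.

```python
# Hypothetical data: the KB's real noise-word and synonym lists are not shown here.
NOISE_WORDS = {"how", "the", "is", "do", "i", "a", "an"}
SYNONYMS = {"delete": {"deleting", "deletes", "remove", "removing"}}

# Hypothetical sample documents.
DOCS = [
    {"id": 1, "title": "Removing a user account", "keywords": "account remove user",
     "body": "Steps to remove an account.", "attachments": ""},
    {"id": 2, "title": "Mailing list basics", "keywords": "list mail",
     "body": "You can delete a mailing list from the settings page.", "attachments": ""},
    {"id": 3, "title": "Hardware notes", "keywords": "hardware",
     "body": "See the attached file to delete it.", "attachments": "printer driver notes"},
]


def matches(term_groups, text):
    """True if EVERY term group has at least one alternative somewhere in the text."""
    text = text.lower()
    return all(any(alt in text for alt in group) for group in term_groups)


def traditional_search(query, documents):
    # Step 1: filter out noise words.
    terms = [w for w in query.lower().split() if w not in NOISE_WORDS]
    # Step 2: add synonyms as alternatives for each remaining term.
    term_groups = [{t} | SYNONYMS.get(t, set()) for t in terms]
    # Steps 3-6: try each scope in order and stop at the first one with results.
    scopes = [
        ("title/keywords", lambda d: d["title"] + " " + d["keywords"]),
        ("fulltext", lambda d: d["body"]),
        # Assumption: this stage searches body and attachment text together.
        ("fulltext + attachments", lambda d: d["body"] + " " + d["attachments"]),
    ]
    for scope_name, get_text in scopes:
        hits = [d["id"] for d in documents if matches(term_groups, get_text(d))]
        if hits:
            return scope_name, hits
    # Step 7: nothing matched in any scope.
    return "no results", []


print(traditional_search("how do I delete the account", DOCS))  # ('title/keywords', [1])
print(traditional_search("delete mailing list", DOCS))          # ('fulltext', [2])
print(traditional_search("delete printer", DOCS))               # ('fulltext + attachments', [3])
print(traditional_search("delete scanner", DOCS))               # ('no results', [])
```

Note that the first query finds document 1 only because the synonym "remove" appears in its title and keywords, and that adding a term no document contains ("scanner") removes every result.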

Put another way, the keyword/title search depends on all documents for a given KB site having an appropriate number of keywords. If the majority of documents have a large number of keywords, but a few do not, those few documents will be very difficult to find. This is because most traditional searches will not look at the full text of the document, so those documents that are lacking keywords will frequently fail to match searches containing multiple keywords.

Another important note is that the keyword/title search accepts partial matches: a search term also matches longer words that contain it. This is useful when a user searches "email" and still gets results for "emails". However, short search terms can yield many unrelated results; for example, searching "bus" will also return results for "busy" and "business".
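
The effect of partial matching can be illustrated with a short Python sketch. The function name `partial_match` and the sample titles are hypothetical, and the sketch assumes that a term matches any word that contains it; the KB's exact matching rule is not described in this document.

```python
def partial_match(term, text):
    """True if the term appears inside any word of the text (assumed rule)."""
    return any(term.lower() in word for word in text.lower().split())


# Hypothetical document titles.
titles = ["Bus schedule", "Busy season tips", "Business cards",
          "Email setup", "Managing emails"]

print([t for t in titles if partial_match("email", t)])
# ['Email setup', 'Managing emails']
print([t for t in titles if partial_match("bus", t)])
# ['Bus schedule', 'Busy season tips', 'Business cards']
```

The first search benefits from partial matching, while the second pulls in two titles that have nothing to do with buses.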

How the semantic search engine works

The new semantic search engine uses a large language model to understand the relationships between words. Where the traditional keyword/title search bluntly removes noise words and incorporates pre-defined synonyms, the semantic search engine uses all of the searched words to interpret the overall meaning of the search. It also bases results on all text fields, rather than just the title and keywords fields. This has the following advantages:

  • It encourages searching with "natural language", i.e., phrasing searches as questions or statements. This is more in line with how we tend to use larger search engines, like Google.

  • Synonyms are handled on a much broader scale than in the traditional search engine, as they are based on general word usage in the English language rather than manually populated sets of synonyms.

  • Typos and common misspellings will usually be understood and processed as the intended word, so they will still produce search results.

  • When a word with multiple meanings is searched, other words included in the search will influence the results towards one of those meanings.

    • E.g. searching "How do I submit an application" will yield more results referencing job or scholarship applications, whereas searching "How do I install an application" will yield more results pertaining to software applications.

  • You are less likely to accidentally filter out a relevant result when you use more search terms.

  • Documents that are lacking a robust set of keywords will be returned as results more often than they are with the traditional keyword/title search.

Additionally, the KB's new search engine is technically a hybrid between semantic search and the traditional search. In effect, there is a limit on how many documents will be returned by the semantic search engine (with the default limit being 20). Any documents with literal keyword/title matches that were not returned by semantic search will then be added to the results.
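
The hybrid behavior described above can be sketched as follows. This is a minimal illustration, not the KB's actual code: the function name `hybrid_results` and the document IDs are hypothetical, and placing the extra keyword/title matches after the semantic results is an assumption, since this document does not state how the combined results are ordered.

```python
def hybrid_results(semantic_ranked, keyword_matches, semantic_limit=20):
    """Keep the top semantic results, then add literal matches not already present."""
    # Semantic results are capped at the limit (default 20, per the text above).
    results = list(semantic_ranked[:semantic_limit])
    seen = set(results)
    # Assumption: extra keyword/title matches are appended after the semantic results.
    for doc_id in keyword_matches:
        if doc_id not in seen:
            results.append(doc_id)
            seen.add(doc_id)
    return results


# Hypothetical document IDs, with a small limit to keep the example short.
print(hybrid_results([5, 3, 9, 1], [3, 7], semantic_limit=3))
# [5, 3, 9, 7]
```

Here document 1 falls outside the semantic limit and is dropped, document 3 is not duplicated, and document 7 is added because it was a literal match that semantic search did not return.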

Overall, the semantic search engine will return, on average, a larger number of results. Compared to the keyword/title search, where it is not uncommon to get fewer than ten results, semantic search will more often return 15+ results.



Keywords:
understanding the differences between searching new versus old original strict literal nlp natural language processing hybrid keywords title fulltext comparison 
Doc ID:
142365
Owned by:
Leah S. in KB User's Guide
Created:
2024-09-12
Updated:
2024-10-07
Sites:
KB User's Guide