A fuzzy query is a search engine feature that lets website users find the right results even when they misspell their search terms. Since we have no control over what users type into the search box, a fuzzy query search engine has to be robust enough to handle misspelled input. Fuzzy queries usually rely on a mathematical formula that measures the distance between two strings, such as the typed query and a word in the index.
How do fuzzy query searches work?
Call the word that the user types A, and the candidate words in the database B1, B2, B3, and so on.
The search engine computes the distance from A to each candidate and returns the candidates in order of increasing distance as result 1, result 2, result 3, and so on. Levenshtein distance is a common way to calculate the distance between two words.
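As a rough illustration of this ranking step, the following Python sketch scores each candidate against the typed word and returns the closest ones first. The function names `levenshtein` and `rank_candidates`, the `limit` parameter, and the sample words are assumptions made for this example, not part of any particular search engine.

```python
def levenshtein(a, b):
    """Edit distance between a and b, keeping only two rows of the table."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def rank_candidates(query, candidates, limit=3):
    """Return up to `limit` candidates, closest to the query first."""
    return sorted(candidates, key=lambda word: levenshtein(query, word))[:limit]


# Hypothetical data: the typed word plays the role of A,
# the list plays the role of B1, B2, B3, ...
print(rank_candidates("serch", ["search", "perch", "church", "speech"]))
# → ['search', 'perch', 'speech']
```

Because Python's `sorted` is stable, candidates at the same distance keep their original order, so a real ranking would probably need a tie-breaker such as word popularity.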
Levenshtein Distance
The Levenshtein distance is a measure of how different two words are. It counts the minimum number of single-character changes (insertions, deletions, or substitutions) required to change one word into the other.
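Mathematically, the Levenshtein distance between two strings $a$ and $b$ (of length $|a|$ and $|b|$ respectively) is given by $\operatorname{lev}_{a,b}(|a|, |b|)$, where

$$
\operatorname{lev}_{a,b}(i, j) =
\begin{cases}
\max(i, j) & \text{if } \min(i, j) = 0, \\
\min
\begin{cases}
\operatorname{lev}_{a,b}(i - 1, j) + 1 \\
\operatorname{lev}_{a,b}(i, j - 1) + 1 \\
\operatorname{lev}_{a,b}(i - 1, j - 1) + 1_{(a_i \neq b_j)}
\end{cases} & \text{otherwise,}
\end{cases}
$$

$1_{(a_i \neq b_j)}$ is the indicator function, equal to 0 when $a_i = b_j$ and 1 otherwise, and $\operatorname{lev}_{a,b}(i, j)$ is the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$.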
To visualize this, have a look at the image below. There are two words, S1 and S2, and the distance is the number of edits needed to obtain S2 from S1.
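The recursive definition can also be checked on a small example. The sketch below is a direct, memoized Python translation of the recurrence for the distance between the first i characters of a and the first j characters of b. The function name `lev` and the sample words are chosen for illustration, and the recursive form is only suitable for short strings.

```python
from functools import lru_cache


def lev(a, b):
    """Levenshtein distance computed directly from the recursive definition."""

    @lru_cache(maxsize=None)
    def distance(i, j):
        # Base case: when min(i, j) = 0 the distance is max(i, j).
        if min(i, j) == 0:
            return max(i, j)
        # Indicator term: 0 when the i-th and j-th characters match, 1 otherwise.
        indicator = 0 if a[i - 1] == b[j - 1] else 1
        return min(distance(i - 1, j) + 1,              # deletion
                   distance(i, j - 1) + 1,              # insertion
                   distance(i - 1, j - 1) + indicator)  # match or substitution

    return distance(len(a), len(b))


print(lev("kitten", "sitting"))  # → 3
```

The three edits here are two substitutions (k to s, e to i) and one insertion (the final g).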
Calculate Fuzziness
Calculating the Levenshtein distance is easy on a Linux operating system with the following function. All you have to do is store the two strings you want to compare in the variables w1 and w2.
The call stringdist.levenshtein(w1, w2) returns the Levenshtein distance between the two words.
Fuzzy search in Python
Here is sample Python code that computes the Levenshtein distance and similarity ratio used for fuzzy search:
import numpy as np


def levenshtein_ratio_and_distance(s, t, ratio_calc=False):
    """levenshtein_ratio_and_distance:
    Calculates the Levenshtein distance between two strings.
    If ratio_calc = True, the function computes the
    Levenshtein distance ratio of similarity between two strings.
    For all i and j, distance[i, j] will contain the Levenshtein
    distance between the first i characters of s and the
    first j characters of t.
    """
    # Initialize matrix of zeros
    rows = len(s) + 1
    cols = len(t) + 1
    distance = np.zeros((rows, cols), dtype=int)

    # Populate the first column and first row with the prefix lengths.
    # Two separate loops, so this also works when one string is empty.
    for i in range(1, rows):
        distance[i][0] = i
    for k in range(1, cols):
        distance[0][k] = k

    # Iterate over the matrix to compute the cost of deletions, insertions and/or substitutions
    for col in range(1, cols):
        for row in range(1, rows):
            if s[row - 1] == t[col - 1]:
                cost = 0  # Same character at this position in both strings, so the cost is 0
            elif ratio_calc:
                # In order to align the results with those of the Python Levenshtein package,
                # the cost of a substitution is 2 when we calculate the ratio.
                cost = 2
            else:
                cost = 1  # When we calculate just the distance, a substitution costs 1
            distance[row][col] = min(distance[row - 1][col] + 1,         # Cost of deletions
                                     distance[row][col - 1] + 1,         # Cost of insertions
                                     distance[row - 1][col - 1] + cost)  # Cost of substitutions

    # The bottom-right cell holds the result for the full strings
    # (indexed explicitly so it is defined even for empty input).
    edits = distance[rows - 1][cols - 1]
    if ratio_calc:
        # Computation of the Levenshtein distance ratio
        total_length = len(s) + len(t)
        if total_length == 0:
            return 1.0  # Assumption: two empty strings are treated as identical
        return (total_length - edits) / total_length
    # print(distance)  # Uncomment if you want to see the matrix showing how the algorithm
    # computes the cost of deletions, insertions and/or substitutions
    # This is the minimum number of edits needed to convert string s to string t
    return "The strings are {} edits away".format(edits)
Source: https://www.datacamp.com/community/tutorials/fuzzy-string-python
Fuzzy search in JavaScript
https://github.com/bevacqua/fuzzysearch
Tiny and blazing-fast fuzzy search in JavaScript
Fuzzy searching allows for flexibly matching a string with partial input, useful for filtering data very quickly based on lightweight user input.
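To make the idea of matching partial input concrete, here is a minimal Python sketch of in-order subsequence matching, where a short input matches an item if all of its characters appear in the item in the same order. This is an illustration of that style of matching under stated assumptions, not the fuzzysearch library's code; the function name `subsequence_match` and the sample items are hypothetical.

```python
def subsequence_match(needle, haystack):
    """Return True if every character of needle occurs in haystack, in order."""
    remaining = iter(haystack)
    # `char in remaining` advances the iterator until it finds char,
    # so each match must come after the previous one.
    return all(char in remaining for char in needle)


# Hypothetical item list filtered by a short, partial input.
items = ["cartwheel", "wheelbarrow", "towel", "castle"]
print([item for item in items if subsequence_match("twl", item)])
# → ['cartwheel', 'towel']
```

Unlike a Levenshtein-based comparison, this kind of check tolerates skipped characters but not wrong ones, which keeps it cheap enough to run on every keystroke.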
Demo
To see fuzzysearch in action, head over to bevacqua.github.io/horsey, which is a demo of an autocomplete component that uses fuzzysearch to filter out results based on user input.
Fuzzy search with no coding
- Go to the Fuzzy query search engine creator.
- Enter your website URL.
- If you have a sitemap, enter its URL.
- Once the crawl completes, add the code to your website and go live.
- Expertrec is a paid fuzzy search engine for websites that costs $9 per month.