N-grams (unigrams, bigrams, trigrams, and so on) are used in search engines to predict the next word in an incomplete sentence. An n-gram is a sequence of n adjacent words: if n=1 it is a unigram, if n=2 it is a bigram, if n=3 it is a trigram, and so on.

What is an N-gram

An n-gram model groups every run of N adjacent words in a sentence into one unit, where N is chosen in advance.

For the input “wireless speakers for tv”, the output is:

N=1 (unigram): “wireless”, “speakers”, “for”, “tv”

N=2 (bigram): “wireless speakers”, “speakers for”, “for tv”

N=3 (trigram): “wireless speakers for”, “speakers for tv”
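The grouping above can be sketched in a few lines of Python. This is a minimal illustration that assumes simple whitespace tokenization; a real search engine would also normalize case and punctuation. The function name `ngrams` is chosen here for illustration.

```python
def ngrams(text: str, n: int) -> list[str]:
    """Return every run of n adjacent words in text, joined by spaces."""
    words = text.split()  # naive whitespace tokenization (an assumption)
    # Slide a window of width n across the word list.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "wireless speakers for tv"
for n in (1, 2, 3):
    print(n, ngrams(sentence, n))
```

A sentence with k words yields k - n + 1 n-grams, so this 4-word query gives 4 unigrams, 3 bigrams and 2 trigrams.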

Bigram example

Imagine we have to build a search engine whose source document is all the dialogue from Game of Thrones.


Suppose the computer is asked to find the missing word after “valar …”. The answer could be “valar morghulis” or “valar dohaeris”. You can see this in action in Google's search suggestions. How can we program a computer to figure it out?


By counting how often each term occurs after “valar” in the source document, we can estimate probabilities and pick the most likely next term.
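As a rough illustration, the Python sketch below counts which word follows “valar” in a tiny corpus and turns those counts into probabilities. The corpus lines are hypothetical stand-ins for the full set of dialogues, not real counts from the show, and `next_word_counts` is a name chosen for this example.

```python
from collections import Counter

# Hypothetical toy corpus standing in for all the dialogue lines.
corpus = [
    "valar morghulis",
    "valar dohaeris",
    "valar morghulis said the man",
    "all men must die valar morghulis",
]


def next_word_counts(lines, prev_word):
    """Count which word immediately follows prev_word across all lines."""
    counts = Counter()
    for line in lines:
        words = line.lower().split()
        # zip(words, words[1:]) yields each adjacent (first, second) pair.
        for first, second in zip(words, words[1:]):
            if first == prev_word:
                counts[second] += 1
    return counts


counts = next_word_counts(corpus, "valar")
total = sum(counts.values())
for word, c in counts.most_common():
    print(f"valar {word}: {c}/{total} = {c / total:.2f}")
```

With this toy corpus the sketch prints 0.75 for “morghulis” and 0.25 for “dohaeris”, so “morghulis” would be suggested first.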

Bigram Mathematics

For example, word frequencies in a document can show that “like a baby” is more probable than “like a bad”.


Let's understand the mathematics behind this.


A bigram count table records how many times each pair of adjacent words occurs in a document.


For example, in the sample document:

  • “i want” occurred 827 times.
  • “want want” occurred 0 times.
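A bigram count table can be built with a counter keyed by word pairs. The sketch below uses a short hypothetical sample text (the 827 figure comes from a much larger document) and treats the text as one continuous word sequence, so pairs that span sentence boundaries are counted too, which is a simplification.

```python
from collections import Counter


def bigram_counts(text):
    """Map each (previous word, word) pair to how often it occurs in text."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


# Hypothetical sample text, not the document behind the counts above.
text = "i want chinese food i want to eat i want chinese food"
table = bigram_counts(text)
print(table[("i", "want")])     # 3
print(table[("want", "want")])  # 0: a Counter returns 0 for unseen pairs
```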

Now let's calculate the probability of the sentence “i want chinese food”.

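We can use the bigram formula, where C(·) is the number of times a word or word pair occurs in the document:

$$P(w_n \mid w_{n-1}) = \frac{C(w_{n-1}\,w_n)}{C(w_{n-1})}$$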

For example, the probability of “chinese” given “want” is P(chinese | want) = C(want chinese) / C(want).
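The formula translates directly into code. This is a minimal maximum-likelihood sketch over a hypothetical sample text; returning 0.0 when the previous word was never seen is a choice made for this example, not part of the formula.

```python
from collections import Counter


def bigram_prob(words, prev, word):
    """Estimate P(word | prev) = C(prev word) / C(prev) from a word list."""
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    if unigrams[prev] == 0:
        return 0.0  # prev never seen: no estimate (choice made for this sketch)
    return bigrams[(prev, word)] / unigrams[prev]


# Hypothetical sample text.
words = "i want chinese food i want to eat i want chinese food".split()
print(bigram_prob(words, "want", "chinese"))  # C(want chinese)=2, C(want)=3
```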


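Applying the formula to each adjacent pair, and using the unigram probability P(i) = C(i)/N for the first word (N is the total number of words in the document), gives:

$$P(\text{i want chinese food}) \approx P(\text{i})\,P(\text{want} \mid \text{i})\,P(\text{chinese} \mid \text{want})\,P(\text{food} \mid \text{chinese})$$

$$= \frac{C(\text{i})}{N} \cdot \frac{C(\text{i want})}{C(\text{i})} \cdot \frac{C(\text{want chinese})}{C(\text{want})} \cdot \frac{C(\text{chinese food})}{C(\text{chinese})}$$

The ≈ reflects the bigram assumption that each word depends only on the word immediately before it.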
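The chain of bigram factors can be sketched as below. This example assumes a hypothetical sample text and includes a unigram estimate for the first word; dropping that first factor gives the probability of the rest of the sentence given its first word. `sentence_prob` is a name chosen for illustration.

```python
from collections import Counter


def sentence_prob(sentence, words):
    """Bigram estimate of P(sentence): P(w1) times each P(w_k | w_{k-1})."""
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    tokens = sentence.split()
    prob = unigrams[tokens[0]] / len(words)  # P(w1) = C(w1) / N
    for prev, word in zip(tokens, tokens[1:]):
        if unigrams[prev] == 0:
            return 0.0  # unseen previous word: the estimate is zero
        prob *= bigrams[(prev, word)] / unigrams[prev]
    return prob


# Hypothetical sample text.
words = "i want chinese food i want to eat i want chinese food".split()
print(sentence_prob("i want chinese food", words))  # 3/12 * 3/3 * 2/3 * 2/2 = 1/6
print(sentence_prob("i want english food", words))  # 0.0: "want english" never occurs
```

The second call shows a known weakness of raw counts: any unseen pair makes the whole sentence probability zero, which is why practical n-gram models apply smoothing.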




Muthali Ganesh

Muthali loves writing about emerging technologies and easy solutions for complex tech issues.
