[SOLVED] | What is Bigram with examples

Muthali

Mar 06, 2023

Introduction: What Are N-Grams in NLP?

If you have ever used a search engine and noticed it predicting what you are about to type, you have seen n-grams at work. An n-gram is a contiguous sequence of n items (usually words) extracted from a given text or speech sample. N-grams form one of the foundational concepts in Natural Language Processing (NLP), computational linguistics, and modern search engine technology.

Search engines like Google rely heavily on n-gram analysis to power features such as autocomplete, spell correction, and query understanding. When Google suggests “wireless speakers for tv” after you type just “wireless sp…”, it is using statistical models built on billions of n-grams extracted from web pages and past search queries.

In this guide, we will explain unigrams, bigrams, and trigrams with clear definitions, practical examples, a Python code tutorial, and a look at how n-gram techniques improve search relevance.

N-Gram Definitions: Unigram, Bigram, and Trigram

The value of n in “n-gram” simply refers to the number of consecutive words (or tokens) grouped together. Here are the three most common types:

Unigram (n = 1)

A unigram is a single word taken from a text. Unigram analysis treats each word independently without considering context. For the sentence “wireless speakers for tv”, the unigrams are:

“wireless”
“speakers”
“for”
“tv”

Bigram (n = 2)

A bigram (also called a digram) is a pair of two consecutive words. Bigrams capture basic word-pair context, making them far more useful than unigrams for understanding phrases. The idea is simple: slide a window of size 2 across the text, one word at a time. For “wireless speakers for tv”, the bigrams are:

“wireless speakers”
“speakers for”
“for tv”

Trigram (n = 3)

A trigram groups three consecutive words together, providing even richer context. For the same input “wireless speakers for tv”, the trigrams are:

“wireless speakers for”
“speakers for tv”

Bigram Example: Predicting Words in Game of Thrones

Imagine we are building a search engine from all of the Game of Thrones dialogue.

Suppose the computer is asked to fill in the missing word after “Valar …”. The answer could be “Valar Morghulis” or “Valar Dohaeris”. You can see this in action in Google's search suggestions. How can we program a computer to work it out?

By counting how often each term occurs in the source document, we can use probability to find the most likely term after “Valar.” This is exactly how bigram-based language models work: they estimate the probability of a word given the word that came before it.
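The counting idea can be sketched in a few lines of Python. This is a minimal illustration, not how any real search engine is implemented: the four-line corpus is made up, and the helper names build_bigram_counts and predict_next are hypothetical.

```python
from collections import Counter, defaultdict

def build_bigram_counts(sentences):
    """Map each word to a Counter of the words that directly follow it."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def predict_next(following, word):
    """Return (next_word, probability) pairs, most likely first."""
    counts = following.get(word.lower(), Counter())
    total = sum(counts.values())
    return [(w, c / total) for w, c in counts.most_common()]

# Hypothetical toy corpus, not real dialogue counts
corpus = [
    "valar morghulis",
    "valar morghulis",
    "valar morghulis",
    "valar dohaeris",
]

following = build_bigram_counts(corpus)
print(predict_next(following, "Valar"))
# → [('morghulis', 0.75), ('dohaeris', 0.25)]
```

With this toy data, “morghulis” follows “valar” three times out of four, so it is ranked first.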

Bigram Mathematics: How Probability Works

The image below illustrates this: word frequencies show that “like a baby” is more probable than “like a bad”:

Let’s look at the mathematics behind this.

This table shows the bigram counts of a document. Individual counts are given here:

It simply means:

“I want” occurred 827 times in the document.
“want want” occurred 0 times.

Now let’s calculate the probability of the occurrence of “I want Chinese food”.

We can use the formula P(w_n | w_{n−1}) = C(w_{n−1} w_n) / C(w_{n−1})

This means the probability of “Chinese” given “want” = P(Chinese | want) = count(want Chinese) / count(want)

p(I want Chinese food)

= p(want | I) × p(Chinese | want) × p(food | Chinese)

= [count(I want) / count(I)] × [count(want Chinese) / count(want)] × [count(Chinese food) / count(Chinese)]

= (827/2533) × (6/927) × (82/158)

≈ 0.0011
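The same arithmetic can be checked with a short script. This is a sketch that uses only the counts quoted in the worked example above; the function names are illustrative, and like the hand calculation it ignores any sentence-boundary markers.

```python
# Counts quoted in the worked example above
unigram_counts = {"i": 2533, "want": 927, "chinese": 158}
bigram_counts = {
    ("i", "want"): 827,
    ("want", "chinese"): 6,
    ("chinese", "food"): 82,
}

def bigram_probability(prev, word):
    """P(word | prev) = count(prev word) / count(prev)."""
    return bigram_counts.get((prev, word), 0) / unigram_counts[prev]

def sentence_probability(sentence):
    """Multiply the bigram probabilities of each consecutive word pair."""
    words = sentence.lower().split()
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= bigram_probability(prev, word)
    return prob

p = sentence_probability("I want Chinese food")
print(f"{p:.6f}")
# → 0.001097
```

A pair that was never seen, such as “want want”, gets probability 0 here. Practical language models usually apply some form of smoothing so that one unseen pair does not zero out the whole sentence.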

Practical Applications of Bigrams and N-Grams

N-gram analysis is used across many areas of technology. Here are the most important real-world applications:

Search Engines

Autocomplete & Search Suggestions: When you start typing a query, the search engine uses bigram and trigram frequency data to predict and suggest the most likely completions.
Spell Correction: Character-level bigrams help detect and correct misspellings. If a user types “wirless spekers,” the engine compares character bigrams against known correct words to suggest “wireless speakers.”
Query Understanding: Bigram analysis helps search engines distinguish between different meanings of the same word by looking at surrounding context. For example, “apple pie” vs. “apple store” are identified as different intents through bigram patterns.
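To make the spell-correction idea concrete, here is a minimal sketch that compares character bigrams using the Dice coefficient. The similarity measure and the five-word vocabulary are assumptions chosen for illustration; the article does not say which measure any particular search engine uses.

```python
def char_bigrams(word):
    """Return the set of character bigrams in a word."""
    word = word.lower()
    return {word[i:i + 2] for i in range(len(word) - 1)}

def dice_similarity(a, b):
    """Dice coefficient between the character-bigram sets of two words."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))

def suggest(word, vocabulary):
    """Pick the vocabulary entry whose bigrams overlap most with `word`."""
    return max(vocabulary, key=lambda candidate: dice_similarity(word, candidate))

# Hypothetical vocabulary; a real engine would draw on its own index
vocabulary = ["wireless", "wired", "speakers", "sneakers", "spikes"]

print([suggest(w, vocabulary) for w in "wirless spekers".split()])
# → ['wireless', 'speakers']
```

“wirless” shares five of its six character bigrams with “wireless”, which is why that candidate scores highest.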

Natural Language Processing (NLP)

Language Modeling: Bigram and trigram models estimate the probability of word sequences, enabling applications like machine translation and speech recognition.
Sentiment Analysis: Bigrams such as “not good” or “very bad” carry meaning that individual unigrams miss, making them critical for accurate sentiment classification.
Text Classification: Spam filters and topic classifiers use n-gram features to categorize documents more accurately than single-word analysis alone.
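The sentiment point can be illustrated with a toy scorer. The two lexicons below are hand-written and hypothetical; real sentiment classifiers learn their weights from labeled data rather than using a lookup table like this.

```python
# Hypothetical hand-written lexicons, for illustration only
UNIGRAM_SCORES = {"good": 1, "great": 1, "bad": -1}
BIGRAM_SCORES = {("not", "good"): -1, ("not", "bad"): 1}

def unigram_score(text):
    """Sum word-level scores, ignoring all context."""
    return sum(UNIGRAM_SCORES.get(w, 0) for w in text.lower().split())

def bigram_aware_score(text):
    """Prefer a known bigram score; fall back to the single word."""
    words = text.lower().split()
    score = 0
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in BIGRAM_SCORES:
            score += BIGRAM_SCORES[pair]
            i += 2  # the pair consumed both words
        else:
            score += UNIGRAM_SCORES.get(words[i], 0)
            i += 1
    return score

review = "the sound is not good"
print(unigram_score(review), bigram_aware_score(review))
# → 1 -1
```

The unigram scorer sees only “good” and rates the review as positive, while the bigram-aware scorer picks up “not good” and rates it as negative.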

Text Analysis & Data Science

Keyword Extraction: Extracting the most frequent bigrams from a document reveals key topics and themes.
Plagiarism Detection: Comparing n-gram fingerprints between documents can detect copied content efficiently.
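Both ideas can be sketched with the standard library alone. The helper names and sample strings are made up for illustration, and Jaccard similarity over word trigrams is just one simple way to compare n-gram fingerprints.

```python
from collections import Counter

def word_ngrams(text, n):
    """Return the word n-grams of a text as tuples."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def top_bigrams(text, k=3):
    """Keyword extraction: the k most frequent bigrams."""
    return Counter(word_ngrams(text, 2)).most_common(k)

def ngram_overlap(doc_a, doc_b, n=3):
    """Plagiarism check: Jaccard similarity between n-gram sets."""
    a, b = set(word_ngrams(doc_a, n)), set(word_ngrams(doc_b, n))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

doc = "wireless speakers for tv and wireless speakers for pc"
print(top_bigrams(doc, 2))
# → [(('wireless', 'speakers'), 2), (('speakers', 'for'), 2)]

original = "wireless speakers for tv"
suspect = "cheap wireless speakers for tv"
print(round(ngram_overlap(original, suspect), 3))
# → 0.667
```

An overlap close to 1 means the two texts share most of their trigrams. Where to set the threshold for flagging a document is a judgment call that depends on the corpus.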

N-Grams in ExpertRec Search

ExpertRec uses n-gram-based techniques to power its AI-powered site search features, including smart autocomplete suggestions and typo-tolerant search. These n-gram models analyze your site’s content to deliver relevant suggestions as users type, improving both search accuracy and user experience.

Python Code: How to Generate Bigrams from Text

Here is a simple Python example that shows how to generate unigrams, bigrams, and trigrams from a sentence:

def generate_ngrams(text, n):
    """Generate n-grams from input text."""
    words = text.lower().split()
    ngrams = []
    for i in range(len(words) - n + 1):
        ngram = ' '.join(words[i:i + n])
        ngrams.append(ngram)
    return ngrams

# Example text
text = "wireless speakers for tv"

# Generate unigrams, bigrams, and trigrams
unigrams = generate_ngrams(text, 1)
bigrams  = generate_ngrams(text, 2)
trigrams = generate_ngrams(text, 3)

print("Unigrams:", unigrams)
# Output: ['wireless', 'speakers', 'for', 'tv']

print("Bigrams:", bigrams)
# Output: ['wireless speakers', 'speakers for', 'for tv']

print("Trigrams:", trigrams)
# Output: ['wireless speakers for', 'speakers for tv']

You can also use the NLTK library for a more concise approach:

from nltk import ngrams

text = "wireless speakers for tv"
words = text.split()

bigram_list = list(ngrams(words, 2))
print("Bigrams:", [' '.join(bg) for bg in bigram_list])
# Output: ['wireless speakers', 'speakers for', 'for tv']

Comparison Table: Unigram vs Bigram vs Trigram

Type	Window Size (n)	Example (from “I love coding in Python”)	Primary Use Case
Unigram	1	“I”, “love”, “coding”, “in”, “Python”	Bag-of-words models, basic word frequency analysis
Bigram	2	“I love”, “love coding”, “coding in”, “in Python”	Autocomplete, phrase detection, sentiment analysis
Trigram	3	“I love coding”, “love coding in”, “coding in Python”	Language modeling, machine translation, context-rich predictions

Get Started with N-Gram Powered Search

You can create your own n-gram powered search engine using ExpertRec. ExpertRec’s site search plans start at $49/mo and include AI-powered autocomplete, typo tolerance, and relevance tuning built on n-gram techniques.

What is a bigram in NLP?

A bigram is a sequence of two consecutive words or characters extracted from a text. For example, in the sentence ‘I love coding’, the bigrams are ‘I love’ and ‘love coding’. Bigrams are widely used in language modeling, autocomplete systems, sentiment analysis, and search engines to capture word-pair context that single words miss.

What is the difference between unigram, bigram, and trigram?

A unigram (n=1) is a single word, a bigram (n=2) is a pair of two consecutive words, and a trigram (n=3) is a sequence of three consecutive words. These are all types of n-grams used in natural language processing. Bigrams and trigrams capture more context than unigrams, making them better for tasks like phrase detection and language modeling.

How are bigrams used in search engines?

Search engines use bigrams to power autocomplete suggestions, correct spelling errors, and understand query intent. By analyzing the frequency of word pairs across billions of documents, search engines can predict what a user is likely searching for and return more relevant results. Bigram analysis also helps distinguish between different meanings of ambiguous queries.

What is the bigram definition?

A bigram is defined as a contiguous sequence of two items (typically words) from a given sample of text or speech. In the context of NLP and computational linguistics, bigrams are used to build statistical language models that predict the probability of a word based on the preceding word. The bigram model is one of the simplest and most widely used n-gram models.

How do you generate bigrams in Python?

You can generate bigrams in Python by splitting text into words and then pairing consecutive words using a loop or list comprehension. For example: words = text.split(); bigrams = [words[i] + ' ' + words[i+1] for i in range(len(words)-1)]. Alternatively, you can use the nltk.ngrams() function from the NLTK library for a more concise solution.
