In this article, we will look at what indexing is and why it is important in a search engine.
Every search engine has three main components:
- Query processing.
What is indexing / an index?
Let us imagine you have a website that sells laptops online. One of your tasks is to create a search engine that can search through your inventory of laptops. We will also assume you have a list of laptops with their names, prices, and so on in a CSV file.
Before you can search through this data, you have to create a search engine index. Having an index reduces processing time, so search results come back faster. Without one, the search engine has to check every product one by one, which takes a long time on a large inventory.
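To make the cost of searching without an index concrete, here is a minimal Python sketch. The CSV contents, the column names (`name`, `price`) and the helper functions `load_laptops` and `linear_search` are all made up for illustration; they are assumptions, not part of any real inventory or library.

```python
import csv
import io

# Hypothetical inventory; in practice this text would come from the CSV file.
CSV_DATA = """name,price
Dell Inspiron 15,550
Apple MacBook Air,999
Dell XPS 13,1100
Lenovo ThinkPad X1,1300
"""

def load_laptops(text):
    """Parse CSV text into a list of dicts, one per laptop."""
    return list(csv.DictReader(io.StringIO(text)))

def linear_search(laptops, term):
    """Without an index: look at every product, one by one."""
    term = term.lower()
    return [row["name"] for row in laptops
            if term in row["name"].lower().split()]

laptops = load_laptops(CSV_DATA)
print(linear_search(laptops, "dell"))
```

Every query has to touch every row, so the work per search grows with the size of the inventory. An index moves that work to a one-time build step.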
This is similar to the index at the end of a book, which helps you find content faster.
What is an inverted index?
In an inverted index, each indexed term points to a list of the documents that contain that term. Here is an example that shows what an inverted index looks like.
Compare this with a regular book index. Can you see the similarity?
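The sketch below builds a small inverted index in Python. The document IDs, the laptop names and the functions `build_inverted_index` and `search` are assumptions chosen for illustration. Splitting on whitespace and lowercasing stands in for the more careful text analysis a real search engine would do.

```python
from collections import defaultdict

# Hypothetical documents: document ID -> laptop name.
documents = {
    1: "Dell Inspiron 15",
    2: "Apple MacBook Air",
    3: "Dell XPS 13",
    4: "Lenovo ThinkPad X1",
}

def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

index = build_inverted_index(documents)
print(index["dell"])              # [1, 3]
print(search(index, "dell xps"))  # [3]
```

Looking up a term is now a single dictionary access instead of a scan over all documents, much like jumping from a word in a book index straight to its page numbers.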
How does the indexer get the data?
- XML, JSON feed
- Web crawl
- RSS or Atom feeds.
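Whatever the source, the indexer usually ends up converting each entry into a uniform document before indexing it. The following Python sketch shows one way to do that for a JSON feed and an RSS feed using only the standard library. The feed contents, the `example.com` URL, the `id`/`text` document shape and the two helper functions are assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical JSON feed of products.
JSON_FEED = '[{"id": 1, "name": "Dell XPS 13", "price": 1100}]'

# Hypothetical RSS 2.0 feed with a single item.
RSS_FEED = """<rss version="2.0"><channel>
<title>Laptop store</title>
<item><title>Lenovo ThinkPad X1</title><link>https://example.com/x1</link></item>
</channel></rss>"""

def docs_from_json(text):
    """Turn each JSON object into a document with an id and searchable text."""
    return [{"id": str(item["id"]), "text": item["name"]}
            for item in json.loads(text)]

def docs_from_rss(text):
    """Turn each RSS <item> into a document, using its link as the id."""
    root = ET.fromstring(text)
    return [{"id": item.findtext("link"), "text": item.findtext("title")}
            for item in root.iter("item")]

documents = docs_from_json(JSON_FEED) + docs_from_rss(RSS_FEED)
for doc in documents:
    print(doc)
```

Once every source produces the same document shape, the indexing step does not need to know where the data came from.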
How to index data?
The following open-source tools let you create an index for free (you will need coding knowledge).
- Elasticsearch
Expertrec is a paid solution that takes care of indexing once you upload a document in any of the above-mentioned formats (no coding required).
How to increase indexing speed?
- Reduce the number of fields to be indexed.
- Use SSDs.
- Increase the RAM of the machines that are indexing.
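The first tip can be illustrated with a small Python sketch that builds the same toy index over all fields and over the name field only, then compares how many (term, document) pairs each one stores. The product records, field names and helper functions here are hypothetical, and the count is only a rough stand-in for indexing work.

```python
# Hypothetical product records with several text fields.
products = [
    {"id": 1, "name": "Dell XPS 13",
     "description": "Thin and light laptop with a 13 inch display",
     "sku": "DX13-001"},
    {"id": 2, "name": "Apple MacBook Air",
     "description": "Light laptop with long battery life",
     "sku": "AMA-002"},
]

def build_index(records, fields):
    """Build a term -> set of ids index over only the given fields."""
    index = {}
    for record in records:
        for field in fields:
            for term in record[field].lower().split():
                index.setdefault(term, set()).add(record["id"])
    return index

def posting_count(index):
    """Total number of (term, document) pairs stored in the index."""
    return sum(len(ids) for ids in index.values())

all_fields = build_index(products, ["name", "description", "sku"])
name_only = build_index(products, ["name"])
print(posting_count(all_fields), posting_count(name_only))
```

Indexing fewer fields means fewer terms to process and store, but it also means those fields can no longer be searched, so only drop fields that users do not need to query.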