An intranet search engine is actually a special case of enterprise search, though it is rarely treated as such. Enterprise search, while itself a vague term, deals with indexing data from multiple sources or software systems and providing a search on top of it. Add features like access control and other very specific requirements, and it is no wonder that enterprise search tools are so expensive.
The users of an intranet more often than not require a search tool that can search their private documents. These documents may themselves be structured or unstructured, but are generally not spread over multiple software systems and databases. In a sense, these users need a more toned-down and specialized version of enterprise search.
As these first two paragraphs suggest, there is no well-defined set of requirements out there for an intranet search tool. Companies often work around this by adopting enterprise search tools that are overkill for their requirements, and end up burning a lot of money in the process.
The Tried and Tested Way
Building a Google-like search experience behind a firewall is no easy task. Google Search Appliance used to be the go-to product for this purpose until it was shut down in 2018. Turning to open source projects like Solr or Elasticsearch is another option. Although general IT staff should be capable of conducting a basic implementation of an intranet search engine, they typically do not have the specialist knowledge to tune the search system. Hence, many intranet search experiences fundamentally work, in that they return results in answer to queries, but they lack relevancy and do not provide a comprehensive search service over intranet content.
On top of this, developing such a system takes time and resources and can get expensive pretty fast. Even if you manage to build it, now you have the overhead of maintenance and will have to keep updating features with time. A SaaS solution is the cheaper and more efficient way to go. The tricky part here is finding a search provider that caters to your specific needs.
P.S. This is where ExpertRec comes in.
The Road to Building an Intranet Search Tool
Having spent a lot of time building internal search engines for websites, we at ExpertRec discovered that the data on the internet is not as structured as it should be. Because of this, our crawlers have over time gained the ability to index content that does not conform to the standards of the web and still provide a search for it.
One such non-standard case is our "crawling behind login pages" feature, which lets us provide a search even if the content is protected by authentication. This is very close to crawling private intranets, but we are still in internet territory. It became a more popular feature than we initially anticipated, and our systems evolved to support numerous types of authentication, such as form auth, NTLM auth, cookie-based auth, and IP whitelisting.
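To make form auth concrete, here is a minimal sketch of how a crawler might log in through a form and keep the resulting session cookie, using only Python's standard library. It is an illustration, not ExpertRec's implementation: the URL, the form field names, and the helper names make_session and build_login_request are all assumptions. NTLM is left out because it needs a third-party library.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that remembers cookies across requests, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, fields):
    """Build (but do not send) the POST request that submits a login form."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(login_url, data=body, method="POST")


opener, jar = make_session()
login = build_login_request(
    "https://intranet.example.com/login",           # hypothetical login URL
    {"username": "crawler", "password": "secret"},  # hypothetical field names
)

# A crawler would now call opener.open(login) and then opener.open(page_url).
# Any session cookie set by the login response is stored in the jar and
# replayed automatically on later requests made through the same opener.
```

Nothing is sent over the network until opener.open is called, so the request can be inspected before the crawl starts.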
This in turn opened up new capabilities for our crawler to index data that is otherwise difficult to access and to set up a search just as quickly as we do with any regular website. The first of these was building a search for cloud intranets, that is, company intranets that are reachable over the web but require some form of authentication. The data to build the search on ranged from a simple set of PDFs to more complicated and unstructured data coming from multiple sources.
Behind the corporate firewall, however, documents are seldom written to be found and typically lack useful metadata. These issues of content normalization and metadata improvement can be addressed, but where multiple data sources were involved, content had to be manipulated and analyzed before indexing. On top of this, documents were not always connected to each other by links, so content discovery remained a major challenge. If the crawler cannot reach a source in a sustainable manner, its content is never made available to be searched.
But thanks to our experience of dealing with a wide variety of websites and their search requirements, our systems were already equipped to handle the chaotic nature of this data. When our automatic extraction failed, we still had the fallback option of setting up custom rules with our manual extraction feature. With a few extra rules, the crawler could extract any content and build the search immediately.
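The fallback from automatic extraction to custom rules can be sketched roughly as follows. This is a simplified illustration and not how our crawler is actually built: the "automatic" step here just looks for an article element, the custom rules are assumed to be regular expressions with one capture group, and the function names are made up for the example.

```python
import re


def strip_tags(fragment):
    """Drop HTML tags and collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", " ", fragment)
    return " ".join(text.split())


def auto_extract(html):
    """Naive stand-in for automatic extraction: text inside <article>, if any."""
    match = re.search(r"<article[^>]*>(.*?)</article>", html, re.S | re.I)
    return strip_tags(match.group(1)) if match else None


def extract(html, custom_rules=()):
    """Try automatic extraction first, then each custom rule in order."""
    content = auto_extract(html)
    if content:
        return content
    for pattern in custom_rules:  # each rule: a regex with one capture group
        match = re.search(pattern, html, re.S | re.I)
        if match:
            return strip_tags(match.group(1))
    return None


page = '<div id="wiki-body"><p>Leave  policy</p></div>'
rules = [r'<div id="wiki-body">(.*?)</div>']  # hypothetical site-specific rule
print(extract(page))         # automatic extraction finds nothing on this page
print(extract(page, rules))  # the custom rule recovers the content
```

A production crawler would use a proper HTML parser rather than regular expressions; the point here is only the order of attempts.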
The next capability was (or so we thought) the tricky part: truly private intranets that cannot be accessed over the internet. But we quickly realized that the only limitation was establishing a channel through which our crawler could enter the private network. Beyond this point, all the steps and pitfalls for building a search remained exactly the same.
From there, we started to develop features specific to private intranet searches, like providing a filtered set of results depending on a user’s access level. There are, of course, more features, but discussing all of them is beyond the scope of this article.
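As a rough illustration of filtering results by access level, the sketch below drops any document a user is not cleared to see while keeping the ranking order. The numeric levels, the Document fields, and the filter_by_access function are assumptions made for the example; a real deployment would more likely map to groups or roles from the company's directory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    title: str
    required_level: int  # hypothetical scale: higher means more restricted


def filter_by_access(results, user_level):
    """Keep only the documents the user may see, preserving rank order."""
    return [doc for doc in results if doc.required_level <= user_level]


# Results as they might come back from the ranker, best match first.
ranked = [
    Document("Holiday calendar", 0),
    Document("Salary bands", 2),
    Document("Engineering handbook", 1),
]

visible = filter_by_access(ranked, user_level=1)
print([doc.title for doc in visible])
```

Filtering after ranking is the simplest approach to show; applying the same check at query time inside the index avoids ever loading restricted documents.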
Building a Search For Your Private Intranet Using ExpertRec
We want to make building an intranet search engine with ExpertRec something you can do yourself easily. But I admit that at this stage there may be one or two places where you can get stuck. I encourage you to try it out by signing up here for our 14-day free trial.
There is a good chance that the crawl will fail a few times before the authentication is set up correctly. But no worries, our awesome support team is always available on live chat. On top of that, you can raise a support ticket for assistance and we’ll get back to you ASAP.