Enterprise search engines explained in detail. Big data has transformed the way search works and you need to know how to tap into all the data you have to harness the real power of search. In this article, we look into how enterprise search has transformed over the years and the best practices that go into building one.
ENTERPRISE SEARCH ENGINES
Enterprise search is the organized retrieval of stored business data within an organization so that users can securely enter and find data across enterprise databases. This type of software cleans and structures data to make information that is usually spread across a variety of repositories easier to find.
Enterprise search software “Google-izes” enterprise data: it circumvents the time and effort that went into tagging, filing, sharing, and retrieving information regardless of size and media type, creating a secure and powerful, easy-to-use search function.
This type of software usually integrates with business intelligence and data management solutions which are used to clean and structure data in order to make information easier to find.
While integration is not required, enterprise search software can pull information from various sources such as
- HR Management Suites,
- Supply Chain Management Suites, or
- Product Lifecycle Management.
To qualify for inclusion in the Enterprise Search category, a product must:
- Collect and update information from different data sources, types, and formats
- Index or archive data
- Provide intelligent search options to auto-complete, find similar, or rank by relevance
- Create an interface to search and retrieve data
- Allow users to refine their search using advanced filters
- Define user permissions to access information
MAIN CHALLENGES IN ENTERPRISE SEARCH ENGINES
- Lack of Access to Content: If it is not indexed, then it cannot be found. Gaining timely and complete access to content-sets can be tough, especially where document-level security, remote locations, or very large content sets are involved.
- Lack of Search Relevancy: A great algorithm alone is not sufficient to produce great relevancy. Content cleansing, metadata capture and automated creation, the normalization of both content and metadata, and application-specific user query enhancement, are all important to the achievement of search relevancy.
HOW ENTERPRISE SEARCH ENGINES WORK
Enterprise search is made up of several sub-systems.
- First, a “crawler” walks directories and websites, extracts content from databases and other repositories, and arranges for content to be transferred to it on a regular basis so that it can notify the search engine when new information is available.
- Next, a searchable index is created, and other value-added processing, such as metadata extraction and auto-summarization, may take place. These functions group information into logical categories that in turn can be searched and return results to users based on how the particular search engine has categorized them.
- Query processing: Once the index is created, queries can be accepted. Queries aren’t necessarily questions; they can also be terms or phrases, typed into the search box, that represent whatever you’re looking for. The search engine processes the query by passing over the index, finding the information that matches the term or subject entered, and sending that information to a processor, which sorts it by relevancy or other measures, clusters it based on the categorization, or applies some other logic (such as “best bets” or “recommended best”).
- Last comes the formatting, which presents the results page that you’re used to seeing, in whatever format you’ve chosen.
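The query-processing and formatting steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: the corpus, the `BEST_BETS` table, and the scoring rule (count of matching terms) are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy corpus: document id -> text.
DOCS = {
    "hr-1": "vacation policy and leave requests",
    "hr-2": "expense policy for travel",
    "it-1": "vpn setup guide for remote travel",
}

# Hypothetical editor-curated "best bets": query term -> promoted document id.
BEST_BETS = {"vacation": "hr-1"}


def build_index(docs):
    """Map each term to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def process_query(query, index, best_bets):
    """Match query terms against the index, rank, then apply best bets."""
    terms = query.lower().split()
    hits = defaultdict(int)
    for term in terms:
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1  # relevance here = number of matching terms
    # Sort by score (descending), then by id so ties have a stable order.
    ranked = sorted(hits, key=lambda d: (-hits[d], d))
    # Promote any best bet for the query terms to the top of the list.
    promoted = [best_bets[t] for t in terms if t in best_bets]
    return promoted + [d for d in ranked if d not in promoted]


def format_results(doc_ids, docs):
    """Formatting step: turn ranked ids into numbered display lines."""
    return [f"{i}. [{d}] {docs[d]}" for i, d in enumerate(doc_ids, 1)]


index = build_index(DOCS)
results = process_query("travel policy", index, BEST_BETS)
print("\n".join(format_results(results, DOCS)))
```

A production engine would replace the term-count score with a proper relevance model, but the order of operations (match, sort, apply extra logic, format) is the same as in the steps described above.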
Turning a database into a series of results is a five-part process. An enterprise search solution encompasses all steps of the process. The first three steps are completed before any “search” is actually made.
- Content Awareness: The search has to know which databases it can access. This is known as “content awareness.”
- Processing: The content has to be processed so that it can be quickly and efficiently recalled. The source content is converted to the same type of document so that it can be quickly searched by the search solution.
- Indexing: The processed content is sorted into an index that keeps track of how often each term occurs.
- Query: A user makes a query, or search. The query is a combination of what the user is looking for and directions to certain parts of the index. For example, a user looking for marketing figures might enter the query “Marketing Statistics.”
- Matching: The search compares the query to the index and returns any matching entries. The search will return any entries that include “Marketing Statistics,” but may also return similar results.
The last two steps are what most people think of when they think of a “search.” A request is made to the enterprise search engine to find a certain term and the engine returns results that relate to that term based on the tuning and optimization criteria.
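The five steps can be illustrated with a small inverted index that records term frequencies. This is a sketch under stated assumptions: the `SOURCES` dictionary, the document ids, and the scoring rule (sum of term frequencies) are invented for the example and are far simpler than what a real engine uses.

```python
import re
from collections import Counter, defaultdict

# Content awareness: the sources the engine is allowed to read (hypothetical).
SOURCES = {
    "wiki": {"w1": "Marketing statistics for Q1", "w2": "Sales playbook"},
    "crm": {"c1": "Marketing campaign statistics and marketing budget"},
}


def process(text):
    """Processing: reduce any source text to a list of lowercase terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(sources):
    """Indexing: term -> {document id: frequency of the term in that document}."""
    index = defaultdict(dict)
    for source, docs in sources.items():
        for doc_id, text in docs.items():
            for term, freq in Counter(process(text)).items():
                index[term][f"{source}/{doc_id}"] = freq
    return index


def match(query, index):
    """Query and matching: score documents by summed term frequency."""
    scores = Counter()
    for term in process(query):
        for doc_id, freq in index.get(term, {}).items():
            scores[doc_id] += freq
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


index = build_index(SOURCES)
print(match("Marketing Statistics", index))
```

Everything up to `build_index` runs before any search is made; only `match` runs at query time, which is why the first three steps can be done in advance.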
ADDITIONAL TERMS IN ENTERPRISE SEARCH
- Federated Search Results: A federated search allows a single query to search multiple databases. Each database sends back its results and they are combined into a single list of results for the user.
- Faceted Search: A faceted search lets a user narrow the list of search results by filtering on attributes (facets) of those results.
- Custom Result Templates: Public web search result pages are often designed to generate ad revenue. An enterprise search can instead use a customized result template with its own “look and feel”, one that end users find more visually appealing, that converts better, or that minimizes distractions caused by items such as advertisements.
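Federated and faceted search can be sketched together. The two backend functions below are hypothetical stand-ins for real repositories, and the sketch assumes their scores are directly comparable, which is often not true in practice and usually requires score normalization.

```python
from collections import Counter


# Hypothetical backends: each returns (score, record) pairs for a query.
def search_wiki(query):
    return [(0.9, {"title": "Onboarding guide", "type": "page", "source": "wiki"})]


def search_files(query):
    return [
        (0.7, {"title": "Onboarding checklist", "type": "pdf", "source": "files"}),
        (0.4, {"title": "Onboarding slides", "type": "slides", "source": "files"}),
    ]


def federated_search(query, backends):
    """Send one query to every backend and merge the answers into one list."""
    merged = []
    for backend in backends:
        merged.extend(backend(query))
    # Assumption: scores from different backends share the same scale.
    merged.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in merged]


def facet_counts(results, field):
    """Count how many results fall under each value of a facet field."""
    return Counter(record[field] for record in results)


def apply_facet(results, field, value):
    """Keep only the results whose facet field has the selected value."""
    return [record for record in results if record[field] == value]


results = federated_search("onboarding", [search_wiki, search_files])
print(facet_counts(results, "source"))
print(apply_facet(results, "type", "pdf"))
```

The facet counts are what a results page typically shows next to each filter, and `apply_facet` is what happens when the user clicks one.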
Traditional Enterprise Search
Big data changed a lot of things. Enterprise search is one of them. The era of old enterprise search lies not before the advent of big data, but before that of big data management.
With huge volumes of data comes the task of managing it, and of doing so in a way that makes it easy to search and fetch relevant results. Today, the word ‘search’ stands for technology that can index not just millions or billions of pages across your enterprise but an effectively unlimited number of content sources.
Whenever B2B leaders are asked about the challenges they face, one of the common responses is, “The search doesn’t work that good. Like it’s not like Google but it should be!” It’s natural for end users to expect a Google- or Amazon-like search, as they have experienced its power.
To understand modern enterprise search, which uses AI-based algorithms and machine learning to provide contextual and personalized results, you first need to understand the evolution of search technology and of enterprise search.
THE EVOLUTION OF ENTERPRISE SEARCH ENGINES
Traditional search was based on the client-server model. Most (structured) data storage followed this model: data was centrally managed by one or more physical servers, which were accessed by multiple end users (clients).
Traditional search solutions, therefore, were structured around this architecture. Cut to a few years later… The data exploded. Old models couldn’t keep up with indexing the entire documentation within an enterprise as the volume of data kept on growing to mammoth levels.
Below are a few of the problems that began to surface:
- At the technical level, the existing technologies were unable to index millions of pages quickly
- The search crawled too many URLs or wrong URLs
- At the strategic level, the vendors started charging per page indexed making it extremely expensive for businesses to carry on
This is when the new search was born to combat all these challenges. As data storage and indexing technologies were upgraded, so was search, which leveraged these cutting-edge technologies.
CUTTING-EDGE SEARCH ENGINE TECHNOLOGIES
The advent of Solr indicated the feasibility and availability of a scalable search solution for anyone who was willing to take care of hardware management, data ingestion, and UI development on their own.
Solr initiated the extinction of the old client-server search vendors. For them, it was getting difficult to keep up with the modern requirements and scale. So, they vanished from the market.
Players like FAST, Autonomy, and Endeca were acquired by bigger players – Microsoft, HP, Oracle. Those who survived changed their product focuses from enterprise search. New solutions emerged, but the Solr ecosystem continued to rule the market of search technology.
Modern Enterprise Search for Businesses
This is where third-party search providers come into the picture. An advanced AI-algorithm-based search has the capability to index data from the content sources you provide, be it in or out of the cloud.
And with new startups sprouting, the competition has driven them to create search engines that support unlimited content sources out of the box and at no additional cost. Such a search eliminates information silos across diverse platforms to give users a seamless experience.
With powerful in-built connectors, you can connect to most of the content sources across your enterprise or make custom integrations for a tailored solution.
ENTERPRISE SEARCH ENGINE USER EXPECTATIONS
Google and Amazon have spoiled users with simple search. This is what users expect when looking for information online:
- They want better results, and they want them without typing their complete search queries. They want to type the first letter and let technology do the rest by offering apt suggestions to click on.
- It frustrates them when the answer they need is not presented among the top three results and they have to scroll down to find it at the 15th position.
- They do not want to switch consoles or sites to get answers; they want all the information in one place.
Two things work at the core of modern cognitive search: artificial intelligence and machine learning. A smart AI-driven cognitive search engine is the answer to all of the above user concerns. The deep analytics embedded in it tap the information related to a particular user, for instance the number of minutes the user spends on a page, the number of clicks after logging into your portal or site, the number of searches with no results, and the cases logged for a particular topic. It is then up to your analytics team to follow the breadcrumbs left by users and create more personalized experiences for them.
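The expectation that suggestions appear after the first few letters can be illustrated with a simple prefix lookup over a sorted list of known queries. This is a minimal sketch: the `Autocomplete` class and the sample phrases are invented for the example, and real engines usually rank suggestions by popularity or personalization rather than alphabetically.

```python
import bisect


class Autocomplete:
    """Prefix suggestions over a sorted list of known phrases."""

    def __init__(self, phrases):
        self._phrases = sorted(p.lower() for p in phrases)

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        # Binary search for the first phrase that is >= the prefix.
        start = bisect.bisect_left(self._phrases, prefix)
        suggestions = []
        for phrase in self._phrases[start:]:
            if not phrase.startswith(prefix):
                break  # sorted order: no later phrase can match
            suggestions.append(phrase)
            if len(suggestions) == limit:
                break
        return suggestions


# Hypothetical past queries collected from a search log.
ac = Autocomplete([
    "reset password", "request leave", "return policy",
    "remote access", "sales report",
])
print(ac.suggest("re"))
print(ac.suggest("res"))
```

Because the list is sorted, each lookup costs a binary search plus the number of suggestions returned, which is what keeps the response fast enough to run on every keystroke.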
PERSONALIZATION IN ENTERPRISE SEARCH
Google has been the most successful of all in giving users personalized results. It has happened to you too, even if you have not realized it yet.
The last time you were taking your car to your favorite restaurant, using Google’s navigation, it displayed the route with less traffic.
It used your geographical location to entice you into watching the latest movie with a variety of show timings. This is the level of personalization we have reached with deep analytics and rich insights into user behavior.
Achieving personalization requires access to a huge amount of user data that can be analyzed. This is where advanced enterprise search comes in, or rather, this is where enterprise search transforms into cognitive search. It is no longer used just to find things but to gain rich insights into user behavior.
What Is Next in Enterprise Search Engines?
Search has come a long way from indexing data on-premises to indexing that on the cloud. More and more data has been moving to the cloud.
However, a lot of data is still on-premises and will remain there, and businesses need to index all of it. Search as a solution, therefore, will continue with the cloud-hybrid model. Below are a couple of things that will prevail in the enterprise search market.
It won’t be long before we search without having to type; we will just say what we need, and voilà! Actually, we already do: Siri and Alexa are classic examples of voice-controlled search headed our way. It solves the problem mentioned above of users having to type a full query, or having to type at all. It would be like talking to a real person who understands your intent and gives you exactly what you need.
Think about it: you are driving and you need to find something. You won’t have to pull over to search for it, or even look down at your phone’s screen to call someone. Just say it and the work is done.
Digital personal assistants will be integrated seamlessly with the products you use daily. It has already started: your TV takes voice commands from your remote to switch to your favorite channel, open Netflix, browse the internet, or remind you of an appointment in your calendar. Voice search uses signals such as speech patterns, personal preferences, and user intent to identify the context of the search and return the best results.
ComScore says that by 2020, 50 percent of all searches will be voice searches.
How many times have you said, “I am not sure, but I do have a picture in my mind. I will know it when I see it”?
There’s a huge number of people who understand and remember things via pictures. It comes as no surprise because humans by nature are hard-wired to respond to images more than words.
Search based on visual cues has already started; a very good example is the Pinterest app. When you create an account, it asks various questions to set your preferences and, based on the images you like or click on, sends you more of them. Imagine that a part of your car breaks: you take a picture of it and submit it to a visual search tool, like Google Lens, to get all the information related to it.
You would be shown how to fix the part, nearby mechanics, and so on. Wouldn’t life be easy?
What to Look for When Evaluating an Enterprise Search Solution
Base Technology and Fit
The first step is to dig into the base, or underlying, technology of the solution. This includes the following areas:
- What technology stack is the search solution built on, and what programming languages would be used to implement and extend it? Is this the same as the technology used within your organization?
- Where is data stored? What technology is used for storing data?
- Is any part of the solution, or all of it, open source? Or is it completely proprietary? Some mix of the two?
- Does it fit and work within the Content Management Solution or the application that will be exposing the search?
- What parts of the solution are essentially “off-limits” vs. what is customizable if necessary?
- What skills are necessary to do customization?
Evaluating the base technology behind the solution is important to understand how much it will take to run and support the solution, including what would happen if the organization decides to “go it alone” and support the solution with internal resources. While open source solutions could provide licensing advantages (more on licensing below) and possibly access to the source code (if necessary), they could also lead to support demands that an organization is not ready for.
For instance, choosing an entirely open-source option without a real business behind it and then building a solution in-house would mean that the organization is signing up to be a software developer that essentially competes with the existing enterprise search software vendors. This is still possible and may be the right decision in some situations, presumably in circumstances that require extreme customization anyway.
But the constant software development, testing, implementation, and support necessary to keep up with changes in the market may not make sense for organizations that just want a product that works and runs in a truly cloud-based environment (which would be difficult to achieve in-house as well). Choosing a solution that fits the organization’s current technology stack is an important consideration.
Connectors are pre-built code to integrate systems together. Many are built for such things as Content Management Systems and CRM systems, but they could be any environment that the enterprise search solution provider felt was necessary or would provide them a marketing advantage. Commercial applications typically would have a stronger eye toward marketing and would naturally provide more connectors, while open source solutions would tend to give the tools necessary for developers to create their own connectors. These connectors need to be evaluated according to the following questions to assess the appropriateness:
- How many connectors are currently available?
- Are the necessary connectors available for the organization’s immediate needs? What about growth?
- If a particular connector is not available, is it possible to create a custom connector? How difficult is this process?
- How deep do the connectors go? Do they provide the right level of integration to be effective, or do they just brush the surface so the vendor can check a marketing box? If a connector is incomplete in some way, how difficult is it to shore it up to get what is needed?
- Does the provider seem dedicated to the continual development of additional connectors?
Vision and Architecture Philosophy
Understanding what the enterprise search solution was created for and where it is headed in the future is important. Some solutions were created and optimized for specific systems, applications, or use cases such as CRM or customer service or knowledge bases. While it may not be important to know how a particular vendor is going to handle predictive analytics or machine learning in detail, for instance, it might be important to an organization to consider a particular solution where the provider is working on artificial intelligence capabilities for automating taxonomy management. This could show that the vendor is thinking about the future and has the same vision of where the organization wants to go.
Other considerations of vision and philosophy include how data is extracted from the source systems, and how the search engine solution then processes that data and merges data from all of the sources together, commonly called federation. It is important to understand how data from different sources is joined and normalized to create a common structure, for example when one data source has a full name in one field while another has first and last name in different fields. There are many ways to do this, and being on the same page is critical. In addition, the way that a system handles taxonomy is important. Taxonomy is the set of categorization capabilities or methods for giving context to the data and structuring it, including creating filters, facets, and other user interface features. All of these areas can affect the evaluation of a search engine solution.
Most search engines today need to handle very large databases and index sizeable quantities of data, sometimes millions or even billions of records. The system therefore needs to be built to answer queries efficiently: given the amount of data involved, it must process queries quickly enough not to frustrate users who are accustomed to Google-like response times. Areas to consider under scalability include the following:
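The full-name versus first-and-last-name case can be sketched as a small normalization function. The field names (`full_name`, `first_name`, `last_name`, `source`) are assumptions for illustration, and splitting on the first space is deliberately naive: it will mishandle multi-part given names, which is exactly the kind of decision worth discussing with a vendor.

```python
def normalize_person(record):
    """Return a record with first_name, last_name, and full_name fields."""
    if "full_name" in record:
        # Naive assumption: everything before the first space is the first name.
        first, _, last = record["full_name"].strip().partition(" ")
        last = last.strip()
    else:
        first = record.get("first_name", "").strip()
        last = record.get("last_name", "").strip()
    return {
        "first_name": first,
        "last_name": last,
        "full_name": f"{first} {last}".strip(),
        "source": record["source"],
    }


# Two hypothetical sources that describe the same person differently.
from_hr = {"source": "hr", "full_name": "Ada Lovelace"}
from_crm = {"source": "crm", "first_name": "Ada", "last_name": "Lovelace"}

for record in (from_hr, from_crm):
    print(normalize_person(record))
```

After normalization both records share one structure, so they can be indexed, searched, and faceted on the same fields regardless of where they came from.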
- Number of data sources
- The number of records within each data source, and the expected growth of that data
- Frequency of updates and how much of the data needs to be updated with each update
- How many queries will be performed? What is the expected growth?
Indexing is the method for gathering the data. It covers whether (and how) a crawler is used, how often data is captured (the time between index runs), how long the indexing process takes, and whether some or all fields need secondary processing to create metadata before the data can be used. All of these matter because, depending on how the system is built and how long the process takes, the system may be down or slow while indexing occurs. If the system is unavailable or slower from the user’s perspective during this time, it is a concern. It is also an issue if the data is stale because of the time between indexing runs. Often the processing can be handled offline on a separate server, such as a staging server, and intelligent means of data capture can be used, such as fetching only the data that has changed rather than the entire data set from every data source. All of these architecture decisions should be evaluated when selecting a search engine solution.
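One common way to capture only the data that has changed is to keep a fingerprint of each document from the previous run and compare it with the current content. The sketch below assumes documents are available as an id-to-text dictionary; the function names are invented for the example, and many real connectors rely on modification timestamps or change feeds instead of hashing.

```python
import hashlib


def fingerprint(text):
    """Stable content hash used to detect changes between index runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_incremental_update(current_docs, seen_fingerprints):
    """Return (to_index, to_delete) and update seen_fingerprints in place."""
    to_index = []
    for doc_id, text in current_docs.items():
        digest = fingerprint(text)
        if seen_fingerprints.get(doc_id) != digest:
            to_index.append(doc_id)  # new or changed since the last run
            seen_fingerprints[doc_id] = digest
    # Anything seen before but missing now has been removed at the source.
    to_delete = [d for d in seen_fingerprints if d not in current_docs]
    for doc_id in to_delete:
        del seen_fingerprints[doc_id]
    return sorted(to_index), sorted(to_delete)


seen = {}
print(plan_incremental_update({"a": "one", "b": "two"}, seen))
print(plan_incremental_update({"a": "one", "b": "two (edited)", "c": "three"}, seen))
print(plan_incremental_update({"a": "one"}, seen))
```

Only the first run touches every document; later runs re-index just the new or edited ones, which shortens the indexing window and keeps the data fresher.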
Search Features and User Experience
The core query functionality of the system is critical to look at. At this point within the search industry, there are quite a few search features that should be expected in a modern search solution. The features and functionality that should be in most systems include sorting, filtering, faceting, stemming, keyword searches, Boolean searches, the use of wildcards, field searches, range searches, synonyms, “did you mean” type features, auto-suggesting, and auto-completion. If any of these are missing, it should be cause for concern.
The search solution also needs to be flexible enough to support a world-class user interface and experience. In many ways, the user experience is just as important as, or more important than, the back-end functionality. The system should support modern user interface components such as responsive (mobile) designs, filters, facets, keyword highlighting, etc. Establishing the user interface can be expensive, so care should be taken to understand how easy it is to make changes if the requirements change.
Search relevancy is the process of determining which search results end up at the top of any particular results list based on how relevant the data is to the search that was performed. Search relevancy is a constant process of optimizing the search algorithm to the needs of the individual system and improving the system’s ability to determine user intent. Indexing and architecture can heavily influence search relevancy and how data is processed. The search engine solution should be graded on how easily and flexibly it can be tuned to the needs of the organization, how search scoring is handled, how accurate the scoring is and how readily it can be tweaked, and the system’s ability to boost relevancy either manually by an administrator or through additional criteria added to the algorithm.
Measurement of search relevance should include aggregation and analysis of search logs, keyword information, result logs, click information, abandon statistics, and possibly even conversion statistics if they are available, particularly if they can be traced back to the search data. All of this information helps build a clearer picture of users’ needs and intent, with the goal of continual tuning. Eventually, it could lead to personalizing each search for the individual user performing it. To reach this ultimate goal of a complete understanding of the user and their intent, it is appropriate to use big data techniques and tools, machine learning methodologies and technologies, and predictive analytics to improve the relevancy scores and continually improve the search results.
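Two of the features listed above, “did you mean” and wildcard matching, can be sketched with Python’s standard library. The vocabulary and the 0.75 similarity cutoff are assumptions for illustration; production engines typically use purpose-built spell-checking and index-level wildcard support.

```python
import difflib
import fnmatch

# Hypothetical vocabulary: the distinct terms found in the index.
VOCABULARY = ["invoice", "inventory", "onboarding", "payroll", "policy"]


def did_you_mean(term, vocabulary, cutoff=0.75):
    """Suggest the closest indexed term for a probable misspelling."""
    if term in vocabulary:
        return None  # the term exists, so no correction is needed
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def wildcard_terms(pattern, vocabulary):
    """Expand a wildcard pattern such as 'inv*' into matching indexed terms."""
    return [t for t in vocabulary if fnmatch.fnmatchcase(t, pattern)]


print(did_you_mean("payrol", VOCABULARY))
print(wildcard_terms("inv*", VOCABULARY))
```

Both helpers operate on the index vocabulary rather than on documents, so they stay cheap even when the document set is large.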
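Tuning and boosting can be illustrated with a toy scoring function. The field weights, the documents, and the boost multipliers below are all hypothetical, and the score (weighted term counts) is far simpler than the ranking functions real engines use; the point is only to show where an administrator’s tuning knobs enter the calculation.

```python
# Hypothetical tuning knob: a match in the title counts three times as much.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}

DOCS = [
    {"id": "doc-1", "title": "Travel policy", "body": "rules for travel booking"},
    {"id": "doc-2", "title": "Expense guide", "body": "travel expense limits"},
    {"id": "doc-3", "title": "Holiday calendar", "body": "office closures"},
]


def score(doc, terms, boosts):
    """Weighted count of query terms per field, times any admin boost."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = doc.get(field, "").lower().split()
        total += weight * sum(words.count(t) for t in terms)
    return total * boosts.get(doc["id"], 1.0)


def rank(docs, query, boosts=None):
    """Return ids of matching documents, best score first."""
    boosts = boosts or {}
    terms = query.lower().split()
    scored = [(score(d, terms, boosts), d["id"]) for d in docs]
    scored = [pair for pair in scored if pair[0] > 0]
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


print(rank(DOCS, "travel"))
# An administrator manually boosts doc-2 (multiplier chosen for illustration).
print(rank(DOCS, "travel", boosts={"doc-2": 5.0}))
```

Without the boost, the title match puts doc-1 first; with it, the administrator’s preference overrides the default ordering. How easily such weights and boosts can be changed is the flexibility the evaluation should look for.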
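A first pass at this kind of measurement can be computed directly from a search log. The log format below (query, result count, clicked rank) is an assumption made for the example; actual logs vary by product.

```python
from collections import Counter

# Hypothetical log rows: one per search; clicked_rank is None if nothing was clicked.
LOG = [
    {"query": "vpn", "results": 12, "clicked_rank": 1},
    {"query": "vpn", "results": 12, "clicked_rank": None},
    {"query": "payrol", "results": 0, "clicked_rank": None},
    {"query": "expense form", "results": 5, "clicked_rank": 4},
]


def summarize(log):
    """Aggregate basic relevance signals from a search log."""
    if not log:
        raise ValueError("log is empty")
    total = len(log)
    zero = [row for row in log if row["results"] == 0]
    clicks = [row["clicked_rank"] for row in log if row["clicked_rank"] is not None]
    # Abandoned: results were shown, but the user clicked none of them.
    abandoned = [
        row for row in log
        if row["results"] > 0 and row["clicked_rank"] is None
    ]
    return {
        "click_through_rate": len(clicks) / total,
        "zero_result_rate": len(zero) / total,
        "abandonment_rate": len(abandoned) / total,
        "mean_clicked_rank": sum(clicks) / len(clicks) if clicks else None,
        "top_zero_result_queries": Counter(r["query"] for r in zero).most_common(3),
    }


print(summarize(LOG))
```

Zero-result queries often point to missing synonyms or misspellings, while a high mean clicked rank suggests the right answer exists but is ranked too low.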
Licensing Models and Cost
How licensing and pricing work is an important criterion in deciding on the right application for an organization. Licensing can be very complex and have many components. These components may not be linear either, with potential hidden costs that aren’t immediately obvious. For instance, although a purely open-source solution could look inexpensive with no direct license expense, the ongoing support and additional development expense could end up being cost-prohibitive, particularly if the organization doesn’t have the skills to manage an open-source solution properly. Some questions to ask on pricing include the following:
- Is the solution SaaS? On-premise? Hybrid?
- If on-premise, how would hosting be handled? Is there a flat fee or tiered pricing? Are there maintenance costs? Is the license price contingent upon the number of servers or processors?
- If SaaS or some type of hybrid, is there a base cost? Is it per month? Is there some additional volume-based expense per month (most commonly based on the number of queries)? Is there additional per-person pricing?
- How is support handled? Is this an extra expense? Is there maintenance expense for additional years?
- How is training handled? What training expenses are necessary?
Security and Authentication
Protection of data continues to be one of the biggest challenges for modern organizations. Sensitive and proprietary documents need to be secured from individuals and systems that should not have access. Some security areas to consider include the following:
- How is authorization provided?
- Is a single sign-on available?
- Can the system provide document-level security?
- What other security capabilities does the system provide?
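Document-level security usually means that each result is checked against the searching user’s permissions before it is shown. The sketch below assumes a simple group-based access list stored with each document; the `Document` class and group names are hypothetical, and real systems typically mirror the permissions of the source repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    title: str
    allowed_groups: frozenset  # groups permitted to see this document


def visible_results(results, user_groups):
    """Drop every result the user has no group-level permission to see."""
    user_groups = set(user_groups)
    return [doc for doc in results if doc.allowed_groups & user_groups]


# Hypothetical result list returned by the matching step.
RESULTS = [
    Document("d1", "Company handbook", frozenset({"everyone"})),
    Document("d2", "Salary bands", frozenset({"hr"})),
    Document("d3", "Incident runbook", frozenset({"it", "security"})),
]

print([d.title for d in visible_results(RESULTS, ["everyone", "it"])])
```

Filtering after matching, as here, is the simplest approach; applying the permission check inside the index query avoids leaking result counts and scales better, so it is worth asking a vendor which one they do.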
Administration and Skills Necessary
Modern enterprise search solutions provide reporting and administrative capabilities that help employees understand how the system is operating and optimize search results. Evaluating the reporting, and understanding which options are available to tune the results, is necessary to judge the breadth of the solution. Some considerations for administration include:
- Are there tools for synonyms? Is there an administrative interface to manage synonyms?
- How are misspellings handled? Is there an automated system to detect misspellings?
- What skills or employees are necessary to administer the system?
- Is there a way to boost favored content within the search results?
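Synonym management can be pictured as a table that an administrator maintains and that the engine applies at query time. The sketch below is one possible approach under stated assumptions: the `SYNONYMS` table and both function names are invented for the example, and some engines apply synonyms at index time instead.

```python
# Hypothetical synonym table maintained through an admin interface.
SYNONYMS = {"pto": {"vacation", "leave"}, "laptop": {"notebook"}}


def expand_query(query, synonyms):
    """Turn each query term into a group of acceptable alternatives."""
    expanded = []
    for term in query.lower().split():
        group = [term] + sorted(synonyms.get(term, ()))
        expanded.append(group)
    return expanded


def matches(doc_text, expanded):
    """A document matches if it contains one alternative from every group."""
    words = set(doc_text.lower().split())
    return all(any(term in words for term in group) for group in expanded)


expanded = expand_query("PTO policy", SYNONYMS)
print(expanded)
print(matches("Vacation policy 2024", expanded))
print(matches("Laptop policy", expanded))
```

With the synonym in place, a user who types the internal abbreviation still finds the document that uses the formal term, without anyone re-tagging the content.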