What is a Google Search Appliance?
The Google Search Appliance is a search and indexing solution for organizations of all sizes. Using a search appliance, organizations can quickly deploy search on a web site or intranet. The appliance comes with Google search software preinstalled on powerful hardware (which looks like a box). This simplifies going live because you do not need to choose a hardware platform or go through a complicated software configuration process.
Two models of the Google Search Appliance
The Google Search Appliance came in two models, distinguished by the number of documents each could index.
- The “G100” – can index up to 20,000,000 (20 million) documents.
- The “G500” – can index up to 100,000,000 (100 million) documents.
What is an index/indexing?
Indexing means finding pages, reading their content, and arranging that content in advance in a systematic structure, similar to the index at the end of a book, so that it can be found quickly when a search query is performed.
A search index looks similar to the image below.
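As a rough text illustration of the idea, the sketch below builds a tiny inverted index: a mapping from each word to the documents that contain it. It is not how the Google Search Appliance stores its index. The page names, the sample text, and the `build_index` and `search` helpers are all made up for this example.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def search(index, query):
    """Return the IDs of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)

# Hypothetical sample content, not real crawled pages.
documents = {
    "page1": "Google Search Appliance crawls web sites",
    "page2": "The appliance builds a search index",
    "page3": "An index at the end of a book",
}
index = build_index(documents)
```

Because the words are arranged ahead of time, a query such as `search(index, "search appliance")` only has to look up two entries instead of rereading every page.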
What is crawling?
Crawling is the process by which the Google Search Appliance discovers content and creates a master search index. You can think of it as feeding the search appliance the data it will search.
Crawling is done by software known as spider bots, which visit the pages of your website by following link after link until every linked page has been visited and its data extracted and stored. The resulting index consists of all of the words, phrases, and metadata in the crawled documents.
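The link-following behaviour can be sketched in a few lines. The example below is only an illustration under stated assumptions: the "site" is a hypothetical in-memory dictionary rather than real HTTP pages, and `crawl` is a made-up helper, not part of any Google product.

```python
from collections import deque

# Hypothetical site: each URL maps to (page text, links found on that page).
SITE = {
    "/": ("Home page", ["/products", "/about"]),
    "/products": ("Product list", ["/", "/products/gsa"]),
    "/products/gsa": ("Search appliance details", ["/products"]),
    "/about": ("About us", ["/"]),
    "/orphan": ("Not linked from anywhere", []),
}

def crawl(start_url, fetch):
    """Visit every page reachable from start_url by following links, breadth first."""
    seen = {start_url}          # URLs already queued, so no page is visited twice
    queue = deque([start_url])
    pages = {}                  # url -> extracted text
    while queue:
        url = queue.popleft()
        fetched = fetch(url)
        if fetched is None:     # broken link: nothing to store
            continue
        text, links = fetched
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

pages = crawl("/", SITE.get)
```

Note that `/orphan` never ends up in `pages`: a crawler that only follows links cannot discover a page that nothing links to.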
What can the Google Search Appliance crawl?
The Google Search Appliance crawls content on web sites or file systems according to crawl patterns that you specify by using the Admin Console. As the search appliance crawls public content sources, it indexes documents that it finds. To find more documents, the crawler follows links within the documents that it indexes. The search appliance does not crawl content that you exclude from the index.
The Google Search Appliance can crawl:
- Public content – Public content is not restricted in any way; users don’t need credentials to view it.
- Controlled-access content (see Crawling and Serving Controlled-Access Content) – This content is secured so that not all users have access to it. To view it, users need to enter a username and password. To crawl such pages, you need to give the Google Search Appliance a username and password in the Admin Console.
The Google Search Appliance is also capable of indexing:
- Content in non-web repositories, such as content management systems (see Indexing Content in Non-Web Repositories)
- Hard-to-find content, such as content that cannot be found through links on crawled web pages (see Indexing Content in Non-Web Repositories)
What content is not crawled by the Google Search Appliance?
The Google Search Appliance does not crawl content that is excluded by any of the following:
- URLs that match the do-not-crawl patterns specified in the Admin Console
- Rules in the robots.txt file – The Google Search Appliance always obeys the rules in robots.txt
- nofollow Robots META tags that appear in content sources
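To show what obeying robots.txt means in practice, here is a small sketch using Python's standard `urllib.robotparser` module. The robots.txt content, the `example-crawler` user agent, and the example.com URLs are assumptions made for illustration. This is a generic crawler-side check, not the Google Search Appliance's own implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url, user_agent="example-crawler"):
    """Return True if the robots.txt rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)

for url in (
    "https://www.example.com/public/index.html",
    "https://www.example.com/private/report.html",
):
    print(url, "->", "crawl" if allowed(url) else "skip")
```

A well-behaved crawler runs a check like this before requesting each URL and skips anything that is disallowed.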
What file formats can the Google Search Appliance index?
How much does the Google Search Appliance cost?
Google published a paper on the pricing of the Google Search Appliance. It costs around $45,000 per year.
Google Search Appliance features
- Automatic spell-check – This feature helps users find correct search results even when they make spelling errors. The search appliance automatically suggests accurate spelling corrections, even for company-specific words and phrases. The spell checker can suggest corrections in multiple languages.
- Sorting search results based on relevance – The search appliance finds the highest-quality and most relevant documents for a search query; Google factors in more than 100 variables for each query. There is also an option to sort search results by date and other parameters.
- Automatic filtering of duplicate snippets – If multiple documents contain identical titles, as well as the same information in their snippets, only the most relevant document of that set is displayed in the results.
- Automatic filtering of duplicate directories – If there are many results in a single web directory, then only the two most relevant results for the directory are displayed. An output flag indicates that more results are available from that directory.
- Automatic filtering of languages – Limits search to a specified language, as determined by the majority language used in the web document body.
- Dynamic page summaries – Users can judge the relevance of results more easily with dynamically generated snippets showing a query in the context of the page.
- Results grouping – Users can navigate search results easily and clearly using intelligent grouping of documents residing in the same narrow subdirectories.
- Cached pages – Users can view search results even when the sites are down by using cached copies of pages included in the search results.
- Highlighted query terms – Users can quickly find the most relevant section of a document by using the highlighted query terms displayed on cached documents.
- View as HTML – Users can display documents without needing the original client application of the file format because the search appliance automatically converts over 220 file formats into HTML.
- Sort by date – Users can access time-sensitive information first by using date sorting.
- Advanced Search page – Users can perform complex and sophisticated queries with over 10 special query terms, including Boolean AND, OR, and NOT searches.
- Wildcard search – Wildcard search is a feature that enables your users to search by entering a word pattern rather than the exact spelling of a term. The search appliance supports two wildcard operators:
- * – matches zero or more characters
- ? – matches exactly 1 character
Using wildcards can simplify queries for long names, technical data, pharmaceutical information, or strings where the exact spelling varies or is unknown. A user can search for all words starting with a particular pattern, ending with a particular pattern, or having a particular substring pattern.
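The two wildcard operators behave much like shell-style patterns, so they can be illustrated with Python's standard `fnmatch` module. This is a sketch of the matching semantics only, not the search appliance's query engine. The term list and the `wildcard_search` helper are hypothetical. One difference to be aware of: `fnmatch` also treats `[...]` as a character class, which the description above does not mention.

```python
from fnmatch import fnmatchcase

def wildcard_search(pattern, terms):
    """Return the terms matching a pattern where * is any run of characters and ? is one character."""
    pattern = pattern.lower()
    return [term for term in terms if fnmatchcase(term.lower(), pattern)]

# Hypothetical indexed terms, including a common misspelling.
TERMS = ["ibuprofen", "ibuprofin", "naproxen", "acetaminophen"]

print(wildcard_search("ibu*", TERMS))       # starts with a pattern
print(wildcard_search("*en", TERMS))        # ends with a pattern
print(wildcard_search("*pro*", TERMS))      # contains a substring
print(wildcard_search("ibuprof?n", TERMS))  # one unknown character
```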
Google Search Appliance End of Life
Google Search Appliance will reach its end of life in 2019. The exact date depends on your license agreement. Google’s engineering teams will continue to support the product by providing technical support, fixing bugs, providing security updates, and offering usability improvements.
You will continue to receive customer and technical support through the duration of your license agreement. However, now that the GSA is deprecated, future feature development will be limited.
What happens after Google’s support ends? Would my GSA still run?
Once your final license expires, your GSA will soon cease to work. The critical date for a GSA customer is therefore the license expiration date.
Is Google Cloud Search a replacement for Google Search Appliance?
Google Cloud Search is not a replacement for the Google Search Appliance. It is important to note that Google Cloud Search was launched to G Suite Business and Enterprise edition customers in February 2017. While it offers powerful search and assist capabilities, it addresses a different set of search requirements, relative to the GSA.
Features to keep in mind while migrating from GSA
- Security – When you migrate to a third-party solution, you need to be sure that your data and website remain secure. Adding external solutions to your website could open a new window for data security threats. Make sure you choose a well-tested solution provider.
- Search relevance – Google’s search relevance was great and is difficult to match. Test your replacement search provider’s relevance by asking for a demo, and make sure that you have some control over relevance so that, in case of a wrong search result, you can fine-tune it to match your website’s theme.
- Search features – Make sure that basic search features such as those mentioned above, including query autocomplete, spelling correction, PDF and DOC indexing, and image search, are available.
- Search UI – Most search providers have a default UI for their search, but not all have a UI editor. Make sure this box is checked before you narrow in on a search provider. This is one feature you cannot live without.
- Crawler – A robust crawler is a must-have feature for any search engine. Ask whether the crawler puts extra load on your website, since some poorly optimized crawlers can slow a website down.
- Connector integration – The new search engine should be able to integrate with currently available and supported connectors.
- Analytics and reporting – Insights into the search queries that users make on your website can provide new avenues to increase sales and improve website content.
Is there a search engine that can replicate all of my GSA features?
Expertrec search appliance is one of the best replacements for the Google Search Appliance. It caters to all the search features provided by the Google Search Appliance.
You can create your own search appliance within a few minutes from here.