[Oracle] Endeca Search

Search and navigation are the most important parts of online shopping. When a user searches for a given term, it is important that relevant products are shown within a short period of time. To make this possible, the managers at an online shopping company need a powerful tool to manage the different aspects of the search engine.

This is what Endeca search does. It provides e-commerce site owners with tools to help manage the search experience on their website.

What is Oracle Endeca?

The name Endeca comes from the German word “entdecken”, meaning “to discover”. Endeca, the company, was founded in 1999 and focused mainly on:

  1. Ecommerce.
  2. Enterprise search.
  3. Business intelligence.

In traditional e-commerce inventory query systems, you had to start at the top by selecting, say, men’s clothes or women’s clothes, then select from men’s trousers, men’s shirts, men’s coats and so on. Eventually, you’d get to the 36″ men’s trousers in black, but it was a very linear, “guided” route through the data. Websites that used Endeca’s search technology, by contrast, presented a list of dimensions and attributes down one side, and the user could make any selection from them to narrow down their search. All of this happened lightning-fast, with a back end that was very easy for the customer to maintain.

So Endeca focused on this e-commerce market first, and developed the MDEX engine to support this, marketing it as a column-store, rapid-development query engine that allows “faceted searches” across lots of different, “jagged” data sets (i.e. data sets that don’t have the same data model but with some commonality between them).
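To make the idea of faceted search over “jagged” data concrete, here is a minimal Python sketch. It is not how MDEX is implemented; the records, attribute names, and the `select` and `facets` helpers are all hypothetical and only illustrate that a user can start from any attribute and still see counts for the remaining ones.

```python
from collections import Counter, defaultdict

# Hypothetical "jagged" records: they share some attributes but not one schema.
RECORDS = [
    {"id": 1, "category": "trousers", "color": "black", "waist": "36"},
    {"id": 2, "category": "trousers", "color": "blue", "waist": "34"},
    {"id": 3, "category": "shirts", "color": "black", "collar": "16"},
    {"id": 4, "category": "coats", "color": "black"},
]


def select(records, selections):
    """Keep the records that carry every selected attribute value."""
    return [r for r in records
            if all(r.get(k) == v for k, v in selections.items())]


def facets(records, selections):
    """Count the remaining values of each attribute that is not yet selected."""
    counts = defaultdict(Counter)
    for record in records:
        for key, value in record.items():
            if key != "id" and key not in selections:
                counts[key][value] += 1
    return {key: dict(counter) for key, counter in counts.items()}


chosen = {"color": "black"}           # the user may start from any attribute
matches = select(RECORDS, chosen)
print([r["id"] for r in matches])     # [1, 3, 4]
print(facets(matches, chosen))
```

Records that lack an attribute (the coat has no `waist`) simply do not contribute to that facet, which is what lets differently shaped data sets sit side by side.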

A brief introduction to Endeca terminology

Endeca wanted to give its users a simple way to interact with and analyze data through an easy-to-use user interface that scales to large volumes of both structured and unstructured data.

Endeca search caters to users who need to search, navigate, and analyze data of all sizes from multiple data sources. It also lets them slice and dice across dimensions, drilling down to the finest details or taking a macroscopic view of the data. Users should also be able to perform complex search queries easily.

In addition to returning search results for a query, Oracle Endeca guided navigation tells users the possible next steps, such as refining and exploring, and helps them avoid “no results found” pages. These suggestions are re-ranked and re-organized with each click, which delivers a much better navigation experience.

Oracle Endeca Guided Search components

Oracle Endeca Guided Search has three major components:

  • Endeca Information Transformation Layer (ITL)
  • Endeca MDEX Engine
  • Endeca Application Tier

[Figure: Oracle Endeca Guided Search components]

The Endeca Information Transformation Layer (ITL) reads your raw source data and converts it into Oracle Endeca MDEX Engine indices. The ITL consists of:

  1. The Content Acquisition System (CAS)
    1. Endeca CAS Server and Console
    2. CAS API
    3. Endeca Web Crawler
  2. Data Foundry
    1. Forge (a data manipulation program)

Oracle Endeca MDEX Engine

What is MDEX, and how does it compare to Oracle products such as Oracle Database and Oracle Essbase?

First of all, it’s worth understanding the design goals behind MDEX compared to, say, an Essbase cube or an Oracle relational database. Oracle databases are designed to store lots of detail-level data in the most space-efficient way possible, with fast retrieval times for individual rows of data; Essbase cubes are designed to pre-compute and aggregate lots of detail-level data and then provide slices of it quickly, making strong assumptions about the query paths that users will take. MDEX, though, was designed to support Endeca search and discovery use cases, where the user can search and filter arbitrarily and get fast aggregated views in return. As such, Endeca positions MDEX as a hybrid search/analytical database designed for analysis of diverse and fast-changing data.

The Oracle Endeca MDEX Engine is the query engine of Oracle Endeca Guided Search. It contains:

  1. Indexer (Dgidx)
  2. Dgraph
  3. Agraph

The indices generated by the ITL are loaded into the MDEX Engine.

After the index is loaded, the MDEX Engine receives search queries from the Application Tier, matches them against the index, and returns relevant results to the user’s web browser application.

The Application Tier provides an interface to the MDEX Engine. The two default interfaces, which can be used in the same application, are the Presentation API and the Web services interface.

These interfaces are used to query the MDEX Engine and manipulate the results. The Endeca ITL components, such as Forge, are run offline at intervals that are appropriate for your business requirements. The Endeca MDEX Engine and Endeca Application Tier are both online processes, meaning they must remain running as long as you want clients to have access to your data set.

Endeca MDEX Engine query results

All query results returned from the Endeca MDEX Engine contain two types of information:

  • The appropriate results for the query (for example, a recordset or an individual record).
  • The supporting information for building follow-on queries. This information allows users to refine or broaden their query and, correspondingly, their query results, for example by using facets and filters.

The MDEX Engine computes search results in a way that prevents dead ends such as “no results found” by providing suitable next-step refinement options.

This is a key feature that differentiates Endeca from other search solutions.

Two types of queries

Oracle Endeca Search supports two types of search queries: navigation queries and keyword search queries.

  • Navigation queries return a set of records based on application-defined record characteristics (such as laptop type or region in an online laptop store), plus any follow-on query information.
  • Keyword search queries return a set of records based on a user-defined keyword, plus any follow-on query information.

Navigation queries and keyword search queries are complementary. In fact, a keyword search query is a special kind of navigation query, and the data structures for the results of the two queries are identical: a set of records and follow-on query information.

Users can execute a combination of navigation queries and keyword search queries to navigate to their desired record set in the way that works best for them. For example, users can execute a keyword search query to retrieve a set of records, then use a follow-on navigation query to refine that set of records. The reverse situation is also valid.
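The relationship between the two query types can be sketched in a few lines of Python. This is an illustrative model only, not the Presentation API: the laptop records, the `QueryResult` class, and the `run_query` function are hypothetical. It shows both query types returning the same shape (records plus follow-on information), and a keyword search being refined by a follow-on navigation query.

```python
from dataclasses import dataclass

# Hypothetical laptop-store records; the field names are illustrative only.
LAPTOPS = [
    {"id": 1, "name": "Ultralight 13 notebook", "type": "ultrabook", "region": "EU"},
    {"id": 2, "name": "Gaming 17 notebook", "type": "gaming", "region": "EU"},
    {"id": 3, "name": "Gaming 15 laptop", "type": "gaming", "region": "US"},
]
DIMENSIONS = ("type", "region")


@dataclass
class QueryResult:
    records: list       # the matching record set
    refinements: dict   # follow-on query information


def run_query(selections=None, keyword=None):
    """Apply dimension selections and/or a keyword; both share one result shape."""
    selections = selections or {}
    records = [r for r in LAPTOPS
               if all(r[d] == v for d, v in selections.items())]
    if keyword is not None:
        records = [r for r in records if keyword.lower() in r["name"].lower()]
    # Refinements are drawn only from the remaining records, so every
    # refinement offered leads to at least one result.
    refinements = {
        d: sorted({r[d] for r in records})
        for d in DIMENSIONS if d not in selections
    }
    return QueryResult(records, refinements)


first = run_query(keyword="notebook")                        # keyword search query
second = run_query({"type": "gaming"}, keyword="notebook")   # follow-on navigation query
print([r["id"] for r in first.records], first.refinements)
print([r["id"] for r in second.records], second.refinements)
```

Because the refinements are computed from the current record set, the sketch also never offers a choice that would produce an empty result.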

What are Endeca records

Endeca records contain the data that users navigate to or search for.

Endeca records are based on traditional records in a source database. Source database records typically contain information such as the bottles of wine in a wine store, the customer records in a CRM application, or the mutual funds in a fund evaluator.

Source database records store this information in one or more key/value pairs, known as properties. This information becomes available to your application when you transform the source database records into Endeca records. To transform the source database records into Endeca records, you must map the source record properties to properties of Endeca records.

Thus, dimensions and Endeca records correspond to the properties of source database records. Like source record properties, Endeca properties are key/value pairs. The following figure illustrates key/value pairs in a simple Endeca record:

[Figure: key/value pairs in a simple Endeca record]

A single Endeca record can correspond to any number of source records. For example, suppose that four different source records refer to the same book in different formats: hardcover, paperback, large print, and audio. You can configure your Guided Search application to combine the information in these four source records into a single Endeca record.
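As a rough illustration of the book example, the following Python sketch merges several source records into one combined record of key/value pairs. In a real deployment this mapping is configured in the ITL (for example in Forge), not written by hand; the `combine` function, the `isbn_group` key, and the sample data are assumptions made for this sketch.

```python
# Hypothetical source records: the same book in four formats.
SOURCE_RECORDS = [
    {"isbn_group": "B100", "title": "Wine Basics", "format": "hardcover"},
    {"isbn_group": "B100", "title": "Wine Basics", "format": "paperback"},
    {"isbn_group": "B100", "title": "Wine Basics", "format": "large print"},
    {"isbn_group": "B100", "title": "Wine Basics", "format": "audio"},
]


def combine(source_records, key):
    """Merge source records that share `key` into one record per key value.

    Each property keeps the distinct values seen, in first-seen order.
    """
    combined = {}
    for source in source_records:
        record = combined.setdefault(source[key], {})
        for name, value in source.items():
            values = record.setdefault(name, [])
            if value not in values:
                values.append(value)
    return list(combined.values())


records = combine(SOURCE_RECORDS, key="isbn_group")
print(records)
```

The four source records collapse into a single record whose `format` property carries all four values.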

What are Endeca dimensions and dimension values

Dimensions are logical categories that make it possible to organize your Endeca records into structures that customers can navigate through to find information about products or services that they might want to purchase.

A dimension is a hierarchy of dimension values. A dimension as a whole typically corresponds to a general category of products or services. Dimension values contain increasingly specific information about products and services, the lower they are in the hierarchy.

The top-most dimension value in a dimension is known as the dimension root. A dimension root serves as the name of its dimension. Each dimension value can have one or more child dimension values; a dimension value with child dimension values is known as a parent dimension value.

A child dimension value can have only one parent dimension value. Dimension values that are children of the same parent dimension value are known as sibling dimension values. Sibling dimension values cannot be identical. However, dimension values that are not siblings can be identical, even within the same dimension.

The dimension values that have no children are known as leaf dimension values. Leaf dimension values typically contain information about particular products and services. For example, a non-leaf dimension value might represent a range of prices and the leaf dimension values — its children — might represent individual products whose prices fall within that range. The following figure illustrates a simple dimension named “Wine Type”:

[Figure: the “Wine Type” dimension]
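The rules for dimension values described above (one parent per child, distinct siblings, leaves without children, and a root that names the dimension) can be modeled with a small tree. This Python sketch is an illustrative data structure, not an Endeca API; the `DimensionValue` class and the “Merlot” value are hypothetical.

```python
class DimensionValue:
    """A node in a dimension hierarchy (an illustrative model, not Endeca's API)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # a child has exactly one parent
        self.children = []

    def add_child(self, name):
        # Sibling dimension values cannot be identical.
        if any(child.name == name for child in self.children):
            raise ValueError(f"sibling dimension values must differ: {name!r}")
        child = DimensionValue(name, parent=self)
        self.children.append(child)
        return child

    @property
    def is_leaf(self):
        return not self.children

    def path(self):
        """Names from the dimension root down to this value."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


wine_type = DimensionValue("Wine Type")   # the dimension root names the dimension
red = wine_type.add_child("Red")
white = wine_type.add_child("White")
merlot = red.add_child("Merlot")          # hypothetical leaf value

print(merlot.path())                  # ['Wine Type', 'Red', 'Merlot']
print(merlot.is_leaf, red.is_leaf)    # True False
```

Note that the same name may still appear under a different parent, since only siblings are required to differ.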

Records can be organized into searchable hierarchies by tagging them with dimension values. Records are typically tagged with leaf dimension values but can be tagged with non-leaf dimension values for special purposes.

Tagging a record with a dimension value does the following things:

  •  It specifies the location of the record within the associated dimension. In the example below, the Endeca records for Bottles A and B are tagged with the Red dimension value in the Wine Type dimension, while the Endeca records for Bottles C and D are tagged with the White dimension value, and so on.
  •  It identifies the record as a valid result when that dimension value is selected in a navigation query. In the example below, a navigation query on the Red dimension value produces a result set that contains Bottles A and B.

[Figure: Endeca navigation query]
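The two effects of tagging can be sketched with a simple inverted index in Python. This is a simplified illustration using the bottle example from the text; the `TAGS` mapping and the `navigate` function are hypothetical names, not part of any Endeca interface.

```python
from collections import defaultdict

# Tags from the wine example: record -> dimension value in "Wine Type".
TAGS = {
    "Bottle A": "Red",
    "Bottle B": "Red",
    "Bottle C": "White",
    "Bottle D": "White",
}

# Inverted index: dimension value -> records tagged with it.
index = defaultdict(list)
for record, dimension_value in TAGS.items():
    index[dimension_value].append(record)


def navigate(dimension_value):
    """Return the records that are valid results for the selected value."""
    return index.get(dimension_value, [])


print(navigate("Red"))    # ['Bottle A', 'Bottle B']
```

Selecting “Red” returns exactly the records tagged with that dimension value, which is the behavior the navigation query example describes.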

Endeca search best practices

Search engines, like cars, require regular maintenance.

  1. Accurate search results
    1. Generate a list of daily searched keywords for which Endeca reported zero results. The list can be extracted from the Endeca engine request log.
    2. Make all text fields searchable without making all of them part of the existing Endeca search interface. Only the fields/dimensions in the search interface will be searched; those that are not part of the search interface will never participate in search even though they are indexed.
    3. Run the zero-result search terms identified in step 1 against each searchable text field.
    4. Generate a report that shows the result count discrepancy between the text fields and the search interface. If the search interface returns no results but individual text fields return more than 0 results, we have identified the cases in which Endeca mistakenly gave users zero results.
       [Figure: zero-result search terms]
    5. The following table records the output of the above step. For all search terms (column 1) that did NOT truly produce zero results (column 5), further analysis and action are needed. Those search terms were in fact not supposed to have produced zero results.
       [Figure: zero-result search verification table]
    6. Based on the above findings, we can use one of the following approaches to solve the problem:
      • Add the text fields that returned results to the existing search interface, or
      • Copy the value of the text field that returned results to one of the existing fields in the search interface.

      In addition, partial match configurations can also contribute to zero-hit scenarios. Consider cases in which users search for “powerful dishwasher” and “quiet refrigerator” on a home appliance website. The default Endeca partial match configuration dictates that results have to match at least 2 words (see screenshot below), which effectively turns all search terms with two keywords into “match all keywords.” As a result, if the retailer’s website doesn’t include “powerful” or “quiet” in its product descriptions or titles, no refrigerator or dishwasher would show up on the user’s search results page. Retailers can consider tuning partial match to “match at least 1 word” to reduce zero-hit rates.

      [Figure: Endeca search interface partial match configuration]

  2. Efficient search results: Endeca uses an engine cache to store results that were already processed in previous requests, which improves search performance because it avoids processing the same requests repeatedly. While it is advantageous to leverage the engine cache to boost performance, there are several things to consider:
    • Identify, from the engine request log, the queries whose results can be cached. These queries will be used to warm up the engine. For example, Endeca-powered top navigation menu items are generally common across all pages, which makes them a good candidate for cached results rather than hitting the engine for every request. Another good candidate for caching is popular search queries. For example, for an electronics retailer or a department store, some popular holiday search queries could include “Xbox,” “Amazon Echo,” or “Black Friday deals.”
    • The engine cache memory has to be big enough to hold the cached results.
    • The engine cache is invalidated after every baseline update (full refresh of the index), at which point the cache needs to be repopulated (warmed) using the queries identified above.
  3. Search relevance: Endeca search relevancy is strongly influenced by two major components:
    1. Endeca search interface: a list of searchable fields from each record in the index. The more searchable fields a search interface includes, the wider the search; fewer fields produce a narrower search.
    2. Relevance ranking modules: out-of-the-box ranking algorithms that, when placed one after another, produce the desired ranking order. The most frequently used modules are:
        • Number of terms: ranks results by the number of search terms matched.

          Search term: “leaking kitchen sink”

          Matching records: “my kitchen sink does not leak anymore after I fixed it” and “I have not installed a sink in my kitchen yet”

          Ranking: Record 1 is ranked higher than record 2 because it matched all three keywords (assuming stemming matches “leak” to “leaking”), while record 2 matched only two.

        • Single field match vs. cross field match: a record that matches all search terms in a single field scores higher than one that matches them across several fields.

          Search term: “popular spring break destinations”

          Matching records:

          Record 1:

          Title: “Popular destinations for spring break!”

          Description: “Discounted airfare, hotel for spring break…”

          Record 2:

          Title: “What’s popular for spring break?”

          Description: “These are everyone’s dream destinations!”

          Ranking: Record 1 is ranked higher than record 2 because its title matched all keywords in the search term.

        • Sort by field value: sorts results by field values in ascending or descending order. Popularity is a good example of a field to which you can apply this algorithm.

    The following diagram demonstrates how the relevancy components work together to produce the desired ranking order.

    [Figure: Endeca search relevance]
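To illustrate how ranking modules placed one after another can combine, here is a minimal Python sketch that treats each module as one component of a sort key, so later modules only break ties left by earlier ones. It is an approximation for illustration, not Endeca's implementation: the function names, the two-field search interface, and the `popularity` values are assumptions, and the sketch does no stemming.

```python
import re


def tokens(text):
    """Lowercase word tokens of a text field."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def number_of_terms(record, terms):
    """Count how many search terms appear anywhere in the searchable fields."""
    found = set()
    for field in ("title", "description"):
        found |= tokens(record[field]) & terms
    return len(found)


def single_field_match(record, terms):
    """1 if some single field contains every search term, else 0."""
    return int(any(terms <= tokens(record[f]) for f in ("title", "description")))


def rank(records, query):
    terms = tokens(query)
    # Modules are applied one after another: later ones only break ties.
    return sorted(
        records,
        key=lambda r: (number_of_terms(r, terms),
                       single_field_match(r, terms),
                       r["popularity"]),   # hypothetical field, descending
        reverse=True,
    )


RECORDS = [
    {"id": 2, "title": "What's popular for spring break?",
     "description": "These are everyone's dream destinations!", "popularity": 9},
    {"id": 1, "title": "Popular destinations for spring break!",
     "description": "Discounted airfare, hotel for spring break…", "popularity": 5},
]

print([r["id"] for r in rank(RECORDS, "popular spring break destinations")])  # [1, 2]
```

Both records match all four terms, so the first module ties; the single field match then puts record 1 first even though record 2 has the higher popularity, because popularity is consulted last.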

Why did the usage of Oracle Endeca decline?

Oracle has lost its way. It has fallen behind the technology curve and failed to provide a compelling roadmap to address the new demands of digital retail. Endeca was originally an innovative, open platform built by some of the brightest engineers in the industry, but over the years Oracle has turned Endeca into a massive, rigid “black box” that’s limited in functionality, painfully hard to change, slow to deploy, expensive to maintain, and darn near impossible to innovate with.

If you are looking for an alternative to Endeca, you can have a look at Expertrec’s Endeca alternative.
