Consumer Oriented Search In Oracle Endeca Information Discovery – Part 1
By Bob Zurek on Mar 16, 2012
Information Discovery, a core capability of Oracle Endeca Information Discovery, enables business users to rapidly search, discover and navigate through a wide variety of big data including structured, unstructured and semi-structured data. One of the key capabilities, among many, that differentiate our solution from others in the Information Discovery market is our deep support for search across this growing amount of varied big data. Our method and approach is very different than classic simple keyword search that is found in may information discovery solutions. In this first part of a series on the topic of search, I will walk you through many of the key capabilities that go beyond the simple search box that you might experience in products where search was clearly an afterthought or attempt to catch up to our core capabilities in this area. Lets explore.
The core data management solution of Oracle Endeca Information Discovery is the Endeca Server, a hybrid search-analytical database that his highly scalable and column-oriented in nature. We will talk in more technical detail about the capabilities of the Endeca Server in future blog posts as this post is intended to give you a feel for the deep search capabilities that are an integral part of the Endeca Server.
The Endeca Server provides best-of-breed search features aw well as a new class of features that are the first to be designed around the requirement to bridge structured, semi-structured and unstructured big data. Some of the key features of search include type a heads, automatic alphanumeric spell corrections, positional search, Booleans, wildcarding, natural language, and category search and query classification dialogs. This is just a subset of the advanced search capabilities found in Oracle Endeca Information Discovery.
Search is an important feature that makes it possible for business users to explore on the diverse data sets the Endeca Server can hold at any one time. The search capabilities in the Endeca server differ from other Information Discovery products with simple “search boxes” in the following ways:
The Endeca Server Supports Exploratory Search.
Enterprise data frequently requires the user to explore content through an ad hoc dialog, with guidance that helps them succeed. This has implications for how to design search features. Traditional search doesn’t assume a dialog, and so it uses relevance ranking to get its best guess to the top of the results list. It calculates many relevance factors for each query, like word frequency, distance, and meaning, and then reduces those many factors to a single score based on a proprietary “black box” formula. But how can a business users, searching, act on the information that the document is say only 38.1% relevant? In contrast, exploratory search gives users the opportunity to clarify what is relevant to them through refinements and summaries. This approach has received consumer endorsement through popular ecommerce sites where guided navigation across a broad range of products has helped consumers better discover choices that meet their, sometimes undetermined requirements. This same model exists in Oracle Endeca Information Discovery. In fact, the Endeca Server powers many of the most popular e-commerce sites in the world.
The Endeca Server Supports Cascading Relevance.
Traditional approaches of search reduce many relevance weights to a single score. This means that if a result with a good title match gets a similar score to one with an exact phrase match, they’ll appear next to each other in a list. But a user can’t deduce from their score why each got it’s ranking, even though that information could be valuable. Oracle Endeca Information Discovery takes a different approach. The Endeca Server stratifies results by a primary relevance strategy, and then breaks ties within a strata by ordering them with a secondary strategy, and so on. Application managers get the explicit means to compose these strategies based on their knowledge of their own domain. This approach gives both business users and managers a deterministic way to set and understand relevance.
Now that you have an understanding of two of the core search capabilities in Oracle Endeca Information Discovery, our next blog post on this topic will discuss more advanced features including set search, second-order relevance as well as an understanding of faceted search mechanisms that include queries and filters.