Tuesday Jun 30, 2015

Oracle Commerce, Data Ingest for Indexing: the Content Acquisition System Forge-less Pipeline

Written by Phil Franklin, Principal Instructor, Oracle University

Summary

This article is intended to give Oracle Commerce platform users information about the Forge-less pipeline design for indexing and configuring data for the Endeca Guided Search MDEX engine.

The Content Acquisition System (CAS) is a key component in the indexing architecture, whether you're a full Oracle Commerce platform user or you're using Endeca Guided Search as a stand-alone product.

History

First, let’s take a look at the history of data ingest into the Endeca MDEX engine.

NOTE: RS = record store, LMC = last-mile-crawl, ECR = Endeca Configuration Repository
Blue arrow = pull operation; green arrow = push operation


Referring to the numbering on the diagram above:

1. Forge/Dgidx pipeline (Information Transformation Layer) - this is the ‘traditional’ way of preparing data and configuration for the MDEX index. Forge data design was done with Developer Studio, a tool that allowed visual modification of the configuration files in the indexing project’s config\pipeline folder. Forge is a 32-bit process that can only pull data from data sources for transformation.

It is also single-threaded, so processing large data sets can be challenging. This sometimes meant running multiple instances of Forge or writing multithreaded Java manipulators. Forge can, however, join data from multiple data sources and generate Endeca record structures from the result of the join.

2. CAS-Forge-Dgidx - the Content Acquisition System was originally created to acquire data from ‘unstructured’ data sources, such as file systems and various third-party CMS systems, where large text fields or documents could be found. Document conversion can be plugged in to convert documents into metadata source fields and text for indexing. Early uses of CAS pulled data into the Forge process via a custom adapter that could connect to the CAS service recordstores.

CAS was later extended to cover data sources that originally could only be imported directly into Forge, while retaining its unstructured data capabilities. After the Oracle acquisition, the third-party CMS adapters were removed and CAS became the recommended method for ingesting data. It also managed dimension-value configuration items for indexing.

The most sophisticated usage of this data ingest design in Oracle Commerce is represented by the Product Catalog integration template, used to integrate the core platform (ATG) with Endeca up to the release of version 11.1 (see diagram above).

This indexing project template uses two instances of Forge: one to generate configuration files and a second that uses the generated files (and some manually managed files) to generate the information for Dgidx. Using Developer Studio became difficult in this design, as most configuration information was now stored elsewhere.

For both (1) and (2), Forge can still be a bottleneck and dimension values (classification values) require careful management, as there is no central place of record for them. They are generated to data\state and to forge_output; moving them between environments (staging and production) requires care.

3. CAS-Dgidx - this is now the recommended design as of Oracle Commerce release 11.1. It completely removes Forge (and the use of Developer Studio for configuration).

The Benefits

  • Data can either be pushed (via command line, scripts or API) into a CAS record store or acquired via a pull mechanism (CAS Crawl).
  • CAS is a 64-bit, multi-threaded server, so it can carry out these operations on demand and in parallel.
  • CAS recordstores support both full and incremental updates, which can take place asynchronously to the indexing process.
  • Document conversion and data manipulation in CAS prior to indexing is also supported.
  • Data and configuration are integrated via the ‘last-mile-crawl’ component dedicated to each indexing project. Configuration is obtained from three main places: managed dimension values from a CAS recordstore, schema and precedence rules from the Endeca Configuration Repository (ECR), and the indexing project’s config\mdex folder. The ECR also has a programmatic API, so it can be populated automatically. The ‘last-mile-crawl’ writes out all the information required for Dgidx to index the project.
  • The project also has a centralised dimension ID manager, which provides a central place of record for managing dimension values for the indexing project – a big improvement over the Forge-based approach.
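To make the idea of a central place of record for dimension values concrete, here is a minimal sketch in Python. It is not the CAS API: the class DimensionIdManager and its methods are hypothetical names invented for illustration. It shows only the principle that each (dimension, value) pair receives one stable ID, and that the whole mapping can be exported and reloaded so that another environment sees the same IDs.

```python
import json


class DimensionIdManager:
    """Hypothetical registry mapping (dimension, value) pairs to stable IDs.

    Illustrative only; this is not the Oracle Commerce implementation.
    """

    def __init__(self, first_id=1):
        self._ids = {}            # (dimension, value) -> assigned ID
        self._next_id = first_id  # next unused ID

    def get_id(self, dimension, value):
        """Return the existing ID for the pair, or assign the next free one."""
        key = (dimension, value)
        if key not in self._ids:
            self._ids[key] = self._next_id
            self._next_id += 1
        return self._ids[key]

    def dump(self):
        """Serialise the mapping so it can be moved to another environment."""
        entries = [
            {"dimension": d, "value": v, "id": i}
            for (d, v), i in self._ids.items()
        ]
        return json.dumps({"next_id": self._next_id, "entries": entries})

    @classmethod
    def load(cls, text):
        """Rebuild a manager from the output of dump()."""
        data = json.loads(text)
        manager = cls(first_id=data["next_id"])
        for entry in data["entries"]:
            manager._ids[(entry["dimension"], entry["value"])] = entry["id"]
        return manager
```

A staging manager could assign IDs during indexing, and a production manager created with DimensionIdManager.load(staging.dump()) would then return the same ID for every value already known, which is the property that was hard to guarantee when dimension values lived in separate generated files.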

Architecture

The diagram above shows the essential elements of a Forge-less pipeline. Note that for the 11.1 Catalog integration project, data and dimvals are pushed into record stores via the APIs. Only the last-mile-crawl is executed as a crawl via the baseline and partial update scripts.

Note that the CAS design can be extended to include other inputs, which may come from third-party sources and need not be extracted or pushed from a core platform repository.

Possible Futures

The 11.1 release represents the first time the Forge-less design became the default approach for data and config integration for indexing. Currently CAS cannot join data natively, so any data joining must be done externally before loading the recordstores. We can perhaps expect further developments in this area in future (please note: the opinions are solely those of the author).
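As an illustration of joining data externally before it is loaded, the following Python sketch performs a simple left join of two in-memory record lists on a shared key. The function name join_records and the field names used with it are assumptions made for this example. The step that would push the joined records into a record store is deliberately left out, since it depends on the CAS tooling in use.

```python
from collections import defaultdict


def join_records(primary, secondary, key):
    """Left-join two lists of dict records on `key`.

    Every primary record is kept. Fields from matching secondary records
    are merged in, but never overwrite a field that is already present,
    so primary values win and the first matching secondary record wins.
    """
    # Index the secondary records by join key for fast lookup.
    by_key = defaultdict(list)
    for record in secondary:
        by_key[record[key]].append(record)

    joined = []
    for record in primary:
        merged = dict(record)  # copy, so the input records are not modified
        for match in by_key.get(record[key], []):
            for field, value in match.items():
                if field != key:
                    merged.setdefault(field, value)
        joined.append(merged)
    return joined
```

For example, joining a list of product records with a list of price records on a hypothetical "sku" field yields one record per product, with the price added wherever a matching price record exists.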

What can be said is that the Forge-less approach is the product direction, so for any new projects or major updates to existing projects using Endeca Guided Search, we recommend you take a good look at it.

Note that the Oracle University “Oracle Commerce: Implementing Guided Search 11-1” (3 days) class covers this area in detail.

About the Author

Phil Franklin

Phil Franklin has over 25 years’ experience building, delivering and managing consulting and education services in privately owned and publicly quoted companies, with emphasis on software development and process improvement. He has successfully held CTO and service director management positions over the course of his career. Over the last 10 years, Phil has specialized in technical development education for ecommerce and search solutions, coming to Oracle via the Endeca acquisition. He teaches Oracle Commerce (Endeca/ATG) and Oracle Knowledge, as well as development classes for Cloud and the Java language.

Thursday Apr 23, 2015

Oracle Commerce: Guide to Selecting the Right Training

With the recent availability of Oracle Commerce Release 11.1, Oracle continues to unify and enhance the leading commerce, content and experience technologies on the market.

Given that Oracle Commerce combines ATG Web Commerce and Oracle Endeca Commerce technologies, select courses from the existing ATG and Endeca Commerce curriculum also apply to Oracle Commerce Rel 11.1 customers. Your training needs will vary depending on your particular situation.

At Oracle University, we recognize this can be confusing to you. Therefore, we created a Guide to Help You Select the Right Oracle Commerce Training.


This guide addresses multiple customer scenarios, including:
  • New Oracle Commerce customers
  • Customers upgrading from an earlier release of ATG or Endeca Commerce
  • Customers only implementing 'Guided Search' within an existing ATG, Oracle Commerce or other commerce platform environment
  • Customers in need of training to support an existing ATG or Endeca Commerce implementation

View the Oracle Commerce Training Selection Guide now.

Monday Dec 22, 2014

New Oracle Commerce Rel 11.1 Training

With the recent availability of Oracle Commerce 11.1, Oracle continues to make strides in unifying and enhancing the leading web commerce, content and experience technologies on the market.

Courses built on Oracle Commerce 11.1 will continue to become available over the coming months. 

Given that Oracle Commerce combines ATG Web Commerce and Endeca Commerce Technologies, specific courses from the existing ATG and Endeca Commerce curriculum also apply to Oracle Commerce Release 11.1 customers. 

The following is a summary of the recommended training.  At the bottom are learning paths to further guide you.

Oracle Commerce Business User Training

Oracle Commerce Developer Training

The following courses are designed for developers focused on Guided Search and Experience Manager in cases where ATG Commerce may not be implemented.

Oracle Commerce System Administrator Training

Training on Prior Releases of Oracle Commerce, ATG, or Endeca Commerce

Customers who are following a learning path on a previous release of the curriculum and intend to take another course, or who are implementing a prior release of the products, should contact their Oracle University representative for specific recommendations on how best to meet their training needs.

Oracle Commerce Service Center Training

Due to the highly customized nature of the Oracle Commerce Service Center product, no formal training courses are currently available.  Custom training, however, is available through Oracle University as a private event. Contact your Oracle University representative for details.

See all available Oracle Commerce training.


Wednesday May 21, 2014

Oracle Commerce, ATG and Endeca - Which Training Should I Take?

by Jim Vonick, Oracle University Market Development Manager

Oracle Commerce Release 11 makes strategic headway in unifying and enhancing ATG and Endeca Commerce technologies. ATG and Endeca Commerce are the leading commerce, content and experience technologies on the market.

But what about available training? 

Does it matter which courses you take if you're an Oracle Commerce 11, ATG Web Commerce or Endeca Commerce customer? 

Take Oracle Commerce 11 Training

There is currently one course that's specifically developed for Oracle Commerce Release 11.

This course, Oracle Commerce: Managing Your Site Using Business Tools Rel 11, is ideal for business users. Training built specifically for Oracle Commerce 11 will continue to become available over the coming months.

Since Oracle Commerce combines ATG Web Commerce and Oracle Endeca Commerce, specific courses from the existing ATG and Endeca Commerce curriculum also apply to Oracle Commerce Rel 11 customers.

Oracle University has created a guide to help you select the best courses so you can simplify your Oracle Commerce 11 implementation or upgrade. 

View the Oracle Commerce: A Guide to Select the Right Training. 

Enroll in ATG and Endeca Training

If you're an ATG or Endeca customer, please note that all of the existing curriculum will maintain its current course titles. Since each course either includes "ATG" or "Endeca" in the title, this will give you some assurance that you're looking at the right courses. 

ATG Web Commerce Courses

If you're implementing or upgrading to ATG Web Commerce Rel 10.1 or 10.2 (or even prior releases), we recommend that you attend the current ATG training. Please visit the ATG training page on the Oracle University website. From there, you'll see three job role categories from which you can select.

Select the job role that best identifies what you need to learn. 

  • ATG Business Users
  • ATG Developers
  • ATG System Administrators 

Endeca Commerce Courses

If you're implementing Endeca Commerce 3.1 or 3.1.2 and are interested in training, consider taking one of the existing Oracle Endeca Commerce courses. Training is available for:

  • Implementation Team Members
  • Developers
  • Business Users 
Please feel free to leave comments or ask questions about any of these courses below. We welcome your feedback!

About

Oracle University is THE trusted provider of quality, expert Oracle training & certification. All training is delivered by our Elite global team of Oracle experts and made available in multiple learning formats for anytime, anywhere training. Delivery methods include:
- Traditional classroom,
- Live Virtual Class,
- Oracle Learning Streams,
- Oracle Learning Cloud and
- Training On Demand.
For buying confidence all our training is backed by the unique 100% student satisfaction guarantee.
