By srenga on Mar 08, 2007
We were recently discussing why ETL is a Service Engine rather than a Binding Component. I would like to post some of the implicit assumptions behind that choice. First of all, let's see how the spec defines SE and BC.
• Service Engine (SE). SEs provide business logic and transformation services to other components, as well as consume such services. SEs can integrate Java-based applications (and other resources), or applications with available Java APIs.
• Binding Component (BC). BCs provide connectivity to services external to a JBI environment. This can involve communications protocols, or services provided by Enterprise Information Systems (EIS resources). BCs can integrate applications (and other resources) that use remote access technology that is not available directly in Java.
Because the definition says that BCs provide connectivity to services external to a JBI environment, the argument arose that "ETL connects to external systems, so it should rightly be called a BC". Here is a detailed analysis of why it should be an SE and not a BC.
The ETL service qualifies as an SE because it does a lot more than connectivity, and it fits the spec's definition well: it provides extraction, transformation, and loading services. Of course, it connects to external systems to do the job, but note that connectivity is not the service it is offering.
ETL is a data integration tool. Such tools are most often associated with offline batch processing. The tool can extract data from heterogeneous sources such as databases, files, and XML documents. For databases, the usual mechanism for extracting and loading data is JDBC; for files, we read the data directly from the files. We are still working on support for XML documents.
Note that the tool requires more than one protocol or transport in order to fetch data.
ETL as Service:
How do we expose such a capability as a service? The ETL service engine is a JBI (JSR 208) based service engine component. It exposes this capability as a service, which means that it can be part of any composite application.
ETL and JBI:
The ETL service engine component could talk to the external systems in two ways.
1. All access to the data goes through a binding component, such as a JDBC binding component or a file binding component. The data flow would then look like this:
[external-systems]<--------------->[jdbc-bc or file-bc]<--------------------->[ETL Service engine]
2. The ETL service engine accesses the data directly, using JDBC or any other mechanism, instead of going through a binding component.
[external-systems]<-------(more than one transport protocol)---------->[ETL Service engine]
The reason for taking the second approach is performance. An ETL tool is most often involved in extracting and loading thousands, even millions, of rows. Any communication between JBI components is mediated through the NMR (Normalized Message Router). In the first case, the communication between the BC and the SE would look like this:
[jdbc-bc or file-bc]<-----------NMR---------->[ETL Service engine]
Imagine the amount of data the NMR has to handle when we extract and load millions of rows through it.
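To put rough numbers on that, here is a small Java sketch. It is only an illustration, not how the ETL service engine or any binding component is implemented: the class name EtlPathSketch, the rows-per-message figure, and the XML-like envelope are all assumptions made up for this example. It counts how many message exchanges the router would have to mediate if rows were shipped in batches, and how much markup a naive normalization step would add on top of the row data itself.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: estimates the work a message router would do
 * if every extracted row had to travel through it in normalized form.
 * None of these names come from JBI or from the ETL service engine.
 */
public class EtlPathSketch {

    /** Exchanges needed when each message carries at most rowsPerMessage rows. */
    public static long exchangesNeeded(long totalRows, int rowsPerMessage) {
        if (totalRows < 0 || rowsPerMessage <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and rowsPerMessage > 0");
        }
        return (totalRows + rowsPerMessage - 1) / rowsPerMessage; // ceiling division
    }

    /**
     * Stand-in for normalization: wraps a batch of rows in an XML-like envelope.
     * Assumption: row values need no escaping; a real normalizer would handle that.
     */
    public static String normalize(List<String> rows) {
        StringBuilder sb = new StringBuilder("<rows>");
        for (String row : rows) {
            sb.append("<row>").append(row).append("</row>");
        }
        return sb.append("</rows>").toString();
    }

    /** Characters of envelope markup added on top of the raw row data. */
    public static long markupOverhead(List<String> rows) {
        long payload = 0;
        for (String row : rows) {
            payload += row.length();
        }
        return normalize(rows).length() - payload;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;   // assumed table size
        int perMessage = 500;     // assumed batch size per message
        System.out.println("Exchanges through the router: " + exchangesNeeded(rows, perMessage));

        List<String> sample = Arrays.asList("1,alice", "2,bob");
        System.out.println("Normalized sample: " + normalize(sample));
        System.out.println("Markup overhead in characters: " + markupOverhead(sample));
    }
}
```

With these assumed figures, one million rows at 500 rows per message means 2,000 exchanges through the router for the extract alone, and the load doubles that. With direct access the same rows are read and written over JDBC without any normalized message being created.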
Note that in the second case the ETL service engine is not trying to isolate the JBI environment from a particular protocol by providing normalization and denormalization from and to the protocol-specific format when it talks to the outside world, which is the actual job of a binding component. When it accesses the external systems, it is performing ETL operations (the E and L of ETL).
Acknowledgements: I would like to acknowledge my colleague Sujit Biswas for the second reason.