Saturday Sep 20, 2008

DataMashup Service Engine available with GlassfishESB

The OpenESB Community recently announced GlassfishESB. Interestingly, the DataMashup Service Engine, part of the open-dm-ds project of the Mural community, is now available with GlassfishESB. My colleague Manish's DataMashup Tutorial is a good starting point for trying it out. I will blog more about this component and its features. Stay tuned for some cool stuff.

Monday Sep 15, 2008

Shallow dive into Mural


Mural is an open-source community that aims to build the infrastructure for Master Data Management (MDM) through sub-projects that each address a different aspect of the MDM life cycle.


Before going any further, let us be clear on what Master Data Management is all about. Products, customers, partners and similar entities form the fundamental vectors of a business entity. It is critical for a business to be able to answer questions such as:

  • How many unique employees work for the company?
  • Who are the customers/partners I have relationships with, and how are they related?
  • Complete visibility into the products
  • For a government, it manifests as citizen services and complete visibility into citizen information (Single Citizen View)
  • For a healthcare network, it manifests as a Single Patient View, etc.

Master Data Management is a discipline, backed by technology, tools and processes, that provides answers to such queries.
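To make the first question concrete, here is a minimal Java sketch of how duplicate employee records from several systems might be collapsed before counting. It assumes a simple deterministic match key (normalized name plus date of birth); the record shape, the system names and the sample data are all hypothetical, and real MDM matching is usually probabilistic and far more tolerant of variation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Sketch only: collapses duplicate employee records from several source
 * systems using a deterministic match key (normalized name + birth date).
 * Record shape, system names and data are hypothetical.
 */
public class UniqueEmployees {

    // One record as it might arrive from a source system (hypothetical shape).
    record SourceRecord(String system, String name, String birthDate) {}

    // Lower-case the name, keep only ASCII letters a-z and spaces, collapse whitespace.
    static String matchKey(SourceRecord r) {
        String name = r.name().toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z ]", "")
                .trim()
                .replaceAll("\\s+", " ");
        return name + "|" + r.birthDate();
    }

    // Groups records by match key; each group stands for one real-world employee.
    static Map<String, List<SourceRecord>> cluster(List<SourceRecord> records) {
        Map<String, List<SourceRecord>> clusters = new LinkedHashMap<>();
        for (SourceRecord r : records) {
            clusters.computeIfAbsent(matchKey(r), k -> new ArrayList<>()).add(r);
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<SourceRecord> records = List.of(
                new SourceRecord("HR", "Asha Rao", "1980-02-14"),
                new SourceRecord("Payroll", "  asha  RAO ", "1980-02-14"),
                new SourceRecord("HR", "John Smith", "1975-07-01"),
                new SourceRecord("Badge", "John Smith", "1990-11-30"));
        // The two "Asha Rao" variants merge; the two John Smiths differ by birth date.
        System.out.println("Unique employees: " + cluster(records).size()); // prints "Unique employees: 3"
    }
}
```

The trade-off is visible even here: two different people who share a name and birth date would be merged, which is one reason production matching engines weigh many attributes rather than a single key.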
In subsequent entries, we will dive deeper into different aspects of MDM, the benefits of open-source MDM, and more.







Friday Aug 24, 2007

DataMashup@JavaOne2007

I have been doing a bad job of blogging lately and thought I would catch up. The reason, of course, is probably that I got absorbed in what our team was building here at the Sun Bangalore office. It all started when we realized that the extraction capabilities of the ETL Service Engine could be applied to building a virtual view of heterogeneous data sources, which fits into the category of server-side data mashup. That realization felt really good, and there was no looking back after that.
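A minimal sketch of the server-side mashup idea, assuming two hypothetical sources: customer rows that would normally come from a JDBC query (here an in-memory map) and order lines that would normally come from a flat file (for example via `Files.readAllLines`). The join is computed when the view is requested, so nothing is copied into a separate store first. None of these names come from the EDM/DataMashup engine itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch only: a "virtual view" computed at request time over two sources.
 * The customer map stands in for a JDBC result; the order lines stand in for
 * a flat file. All names and data are hypothetical.
 */
public class VirtualView {

    // Sums "customerId,amount" lines from the file-like source per customer.
    static Map<String, Double> orderTotals(List<String> csvLines) {
        Map<String, Double> totals = new HashMap<>();
        for (String line : csvLines) {
            if (line.isBlank()) continue;
            String[] f = line.split(",");
            totals.merge(f[0].trim(), Double.parseDouble(f[1].trim()), Double::sum);
        }
        return totals;
    }

    // Left join: every customer appears once; customers without orders get 0.0,
    // and orders for unknown customer ids are ignored.
    static List<String> customerSpend(Map<String, String> customers, List<String> orderLines) {
        Map<String, Double> totals = orderTotals(orderLines);
        List<String> view = new ArrayList<>();
        for (Map.Entry<String, String> c : customers.entrySet()) {
            view.add(c.getValue() + "=" + totals.getOrDefault(c.getKey(), 0.0));
        }
        return view;
    }

    public static void main(String[] args) {
        Map<String, String> customers = new LinkedHashMap<>();
        customers.put("C1", "Acme");
        customers.put("C2", "Globex");
        List<String> orders = List.of("C1,100.0", "C1,50.5", "C3,10.0");
        System.out.println(customerSpend(customers, orders)); // prints [Acme=150.5, Globex=0.0]
    }
}
```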

 

So we built a service engine called the EDM Service Engine. Here are some useful links to explore:

Community corner talk at JavaOne2007 

DataMashup Project Wiki Page 

DataMashup demos showcased @ JavaOne2007

 

If you are interested in getting this demo up and running, you can get all of it here.

DataMashupDemoSetup

 

So what are we doing with it now?

We are building DataMashup as a RESTful web service, and we hope to release it soon.


 


 

Saturday Mar 17, 2007

Where to get started

ETLSE is a good place to learn more about this project.


Thursday Mar 08, 2007

Why ETLSE and not ETLBC?

We were recently discussing why ETL is a Service Engine rather than a Binding Component. I would like to post some of the implicit assumptions behind that choice. First, let's see how the JBI specification defines an SE and a BC.

• Service Engine (SE). SEs provide business logic and transformation services to other components, as well as consume such services. SEs can integrate Java-based applications (and other resources), or applications with available Java APIs.

• Binding Component (BC). BCs provide connectivity to services external to a JBI environment. This can involve communications protocols, or services provided by Enterprise Information Systems (EIS resources). BCs can integrate applications (and other resources) that use remote access technology that is not available directly in Java.

 

The statement that BCs provide connectivity to services external to a JBI environment led to the argument: "ETL connects to external systems, so it should rightly be called a BC." Here is a detailed analysis of why it should be an SE and not a BC.

Reason 1:

The ETL service qualifies as an SE because it does a lot more than provide connectivity, and it fits the spec's definition well: it provides extraction, transformation and loading services. *Of course, it connects to external systems to do the job, but note that connectivity is not the service it offers.*

Reason 2:

About ETL:
---------------
It's a data integration tool. Tools of this kind are most often associated with offline batch processing. The tool can extract data from heterogeneous sources such as databases, files and XML documents. When dealing with databases, the usual mechanism to extract and load data is JDBC; when dealing with files, we read the data directly from the files. We are still working on XML documents.
Note that the tool requires /*more than one protocol or transport*/ to fetch data.
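The need for more than one access mechanism can be sketched in Java as one extraction contract with two implementations: a direct file reader and a JDBC reader. This is an illustration under assumptions, not the ETLSE code; the interface and class names are hypothetical, and the JDBC variant needs a driver for the given URL on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Sketch only: one extraction contract, two access mechanisms.
 * Names are hypothetical, not the ETLSE API.
 */
public class Extractors {

    interface Extractor {
        List<List<String>> extract();
    }

    // Reads a delimited flat file directly from disk, skipping blank lines.
    record DelimitedFileExtractor(Path file, String delimiter) implements Extractor {
        public List<List<String>> extract() {
            try {
                List<List<String>> rows = new ArrayList<>();
                for (String line : Files.readAllLines(file)) {
                    if (!line.isBlank()) {
                        rows.add(Arrays.asList(line.split(Pattern.quote(delimiter), -1)));
                    }
                }
                return rows;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Reads query results through JDBC; a driver for `url` must be on the classpath.
    record JdbcExtractor(String url, String user, String password, String query) implements Extractor {
        public List<List<String>> extract() {
            try (Connection con = DriverManager.getConnection(url, user, password);
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery(query)) {
                ResultSetMetaData md = rs.getMetaData();
                List<List<String>> rows = new ArrayList<>();
                while (rs.next()) {
                    List<String> row = new ArrayList<>();
                    for (int i = 1; i <= md.getColumnCount(); i++) {
                        row.add(rs.getString(i));
                    }
                    rows.add(row);
                }
                return rows;
            } catch (SQLException e) {
                throw new IllegalStateException("JDBC extraction failed", e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("employees", ".csv");
        Files.writeString(tmp, "1,Asha\n2,John\n");
        Extractor source = new DelimitedFileExtractor(tmp, ",");
        System.out.println(source.extract()); // prints [[1, Asha], [2, John]]
        Files.delete(tmp);
    }
}
```

Both implementations hand the engine the same row shape, so the transformation step does not care which transport produced the data.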


ETL as Service:
-----------------------
How do we expose such a capability as a service? The ETL service engine is a /*JBI (JSR 208) based service engine component*/. It exposes this capability as a service, which means it can be part of any composite application.


ETL and JBI:
------------------

The ETL service engine component could talk to the external systems in one of two ways.

1. All access to the data goes through a binding component, such as a JDBC binding component or a file binding component. The data flow would look like this:

[external-systems]<--------------->[jdbc-bc or file-bc]<--------------------->[ETL Service engine]

2. The ETL service engine accesses the data directly, using JDBC or any other mechanism, instead of going through a binding component.

[external-systems]<-------(more than one transport protocol)---------->[ETL Service engine]



The reason for taking the second approach is performance. An ETL tool is most often used to extract and load thousands, even millions, of rows. All communication between JBI components is mediated by the NMR (Normalized Message Router). Note that in the first case, the communication between the [BC] and the [SE] would look like this:



[jdbc-bc or file-bc]<-----------NMR---------->[ETL Service engine]


Imagine the amount of data the NMR would have to handle when we extract and load millions of rows through it.
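A rough back-of-the-envelope estimate helps here. The Java sketch below assumes each row travels as its own normalized message, a hypothetical 500 bytes per normalized row, and, in approach 1, one NMR crossing on the extract side plus one on the load side; approach 2 keeps the bulk rows out of the NMR entirely. The figures are illustrative, not measurements of OpenESB.

```java
import java.util.Locale;

/**
 * Back-of-the-envelope sketch, not a measurement. Assumptions: each row is
 * sent as its own normalized message, every normalized row has the same
 * (hypothetical) size, and in approach 1 each row crosses the NMR twice:
 * source BC -> ETL SE on extract, ETL SE -> target BC on load.
 */
public class NmrLoadEstimate {

    // Bytes routed through the NMR for a given number of crossings per row.
    static long nmrBytes(long rows, long bytesPerNormalizedRow, int crossingsPerRow) {
        return rows * bytesPerNormalizedRow * crossingsPerRow;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        long bytesPerRow = 500L;      // hypothetical size of one row as an XML message
        int viaBindingComponents = 2; // approach 1: extract hop + load hop
        int direct = 0;               // approach 2: bulk rows never enter the NMR

        System.out.println(String.format(Locale.ROOT, "Approach 1: %,d messages, %,d bytes",
                rows * viaBindingComponents, nmrBytes(rows, bytesPerRow, viaBindingComponents)));
        System.out.println(String.format(Locale.ROOT, "Approach 2: %,d messages, %,d bytes",
                rows * direct, nmrBytes(rows, bytesPerRow, direct)));
    }
}
```

Under these assumptions, one million rows already mean 2,000,000 message exchanges and 10^9 bytes (about 1 GB) routed through the NMR, before counting the cost of creating and parsing each message.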


Note that in case 2, when the ETL engine talks to the outside world, it is not trying /*to isolate the JBI environment from the particular protocol by providing normalization and denormalization from and to the protocol-specific format*/, which is the actual job of a binding component. When it accesses the external systems, it is performing ETL operations, i.e. the E and L of ETL.
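To show what that normalization work means per row, here is a hedged Java sketch of what a binding component would have to do for every row in approach 1: turn a protocol-specific record (a CSV line) into an XML payload and turn it back on the other side. The `<row>`/`<col>` element names are hypothetical rather than a JBI-defined format, and the CSV split is naive (no quoted commas).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/**
 * Sketch only: the per-row normalize/denormalize work a binding component
 * would add in approach 1. Element names are hypothetical; the CSV handling
 * is naive (fields must not contain commas).
 */
public class RowNormalizer {

    // Escapes the characters that would break the XML payload.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Protocol-specific record (CSV line) -> normalized XML payload.
    static String normalize(String csvLine) {
        StringBuilder xml = new StringBuilder("<row>");
        for (String field : csvLine.split(",", -1)) {
            xml.append("<col>").append(escape(field)).append("</col>");
        }
        return xml.append("</row>").toString();
    }

    // Normalized XML payload -> CSV line, as the receiving side would have to do.
    static String denormalize(String xml) {
        try {
            Element row = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            NodeList cols = row.getElementsByTagName("col");
            List<String> fields = new ArrayList<>();
            for (int i = 0; i < cols.getLength(); i++) {
                fields.add(cols.item(i).getTextContent());
            }
            return String.join(",", fields);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a valid normalized row: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String row = "42,A & B,<x>";
        String xml = normalize(row);
        System.out.println(xml);
        System.out.println(denormalize(xml).equals(row)); // prints true
    }
}
```

Multiplied by millions of rows, this build-and-parse step is exactly the overhead that approach 2 avoids by letting the ETL engine read and write the external systems itself.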

Acknowledgements: I would like to thank my colleague Sujit Biswas for Reason 2.

Monday Dec 25, 2006

Merry Christmas & Happy New Year!!!!!!!!!!!

Hello World!!!!

I wanted to do something really exciting this New Year, so here I am, creating my first entry. At Sun, I work on Java CAPS product development. I am also involved in the open-source project ETLSE. My current interests include understanding how JBI is changing the data integration space. I intend to write more about it.

About

This blog is dedicated to writing about my work and work life at Sun.
