Are Data Laundromats a Waste of Quarters?

There’s been some interesting discussion around what’s next for data quality and the fascinating challenges of cleaning data for data warehouses and business intelligence applications. I am always intrigued by blogs that discuss the challenges of data management and applying cleansing principles to complex data-centric applications across the enterprise. However, I’m dismayed by discussions that quickly jump to the conclusion of outsourcing data quality as a software-as-a-service model. For example, David Rosenberg writes on his blog:

Look for the emergence of third party B2B integration and commerce management service providers that support data entry and validation for all trading partners. Integrated suites of direct system-to-system integration and Web portal services will be supplemented with combined e-mail and smart-form technologies solving the data quality problem associated with paper-based exchanges with small and occasional trading partners.

While it sounds good on paper, I think this is more marketing spin than a realistic use case. I think we can meet the goal of achieving clean, trusted, authoritative data without going off-premise. When we ask companies to deliver their most critical asset into 3rd-party hands, it’s going to lead to challenges that aren’t easily solved. Terabytes of data aren’t moved as easily as sacks of dirty laundry. Here are 5 reasons why the business model of outsourced data quality is ahead of its time:

1) Moving data is hard – Moving terabytes of information off-premise even once can be challenging enough; moving it continuously as changes occur is even more challenging.
2) Auditing – Turning bad data good means lots of changes. Keeping track of those changes and offering rollback capabilities and full auditing is critical. How can these be easily managed when they are off-premise?
3) Customization – Every company is unique in how it approaches data, even data like address information, which would seem commonplace. Most on-premise data quality engines offer some type of customizable rules approach, whereas many 3rd-party solutions use generic approaches.
4) Profiling – The forgotten aspect of data cleansing is first understanding and seeing the stain. Off-premise data cleansing solutions assume that the data needs cleansing, but profiling needs to happen on-premise, across the enterprise-wide data-centric applications – and that isn’t necessarily a single source or a single data warehouse. (A small illustrative sketch follows this list.)
5) Trust – It is part psychology and part technology. Companies are likely to outsource certain aspects of their data. For example, a bank might outsource check scans, but only to validate what’s already typed into the system at the bank ATM. Companies will choose to keep most of their core data on-premise, so they’re still going to need an on-premise data quality solution to manage it.
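
To make the profiling and customization points concrete, here is a minimal sketch of the kind of on-premise first pass I have in mind: look at the data before deciding what, if anything, needs cleansing. It is purely illustrative and assumes a hypothetical ZIP-code column, a hypothetical profile_column helper, and one sample rule; it is not tied to any particular product.

import re
from collections import Counter

def value_pattern(value):
    """Collapse a value to its shape: letters become 'A', digits become '9'."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", str(value)))

def profile_column(values):
    """Basic profile of one column: row count, blanks, distinct values, common patterns."""
    populated = [str(v).strip() for v in values if v is not None and str(v).strip() != ""]
    return {
        "rows": len(values),
        "null_or_blank": len(values) - len(populated),
        "distinct": len(set(populated)),
        "top_patterns": Counter(value_pattern(v) for v in populated).most_common(5),
    }

# A customizable rule of the kind that tends to be company-specific --
# here, "a US ZIP code is 5 digits, optionally ZIP+4". Yours will differ.
ZIP_RULE = re.compile(r"^\d{5}(-\d{4})?$")

def rule_violations(values, rule=ZIP_RULE):
    """Populated values that fail the supplied rule."""
    populated = [str(v).strip() for v in values if v is not None and str(v).strip() != ""]
    return [v for v in populated if not rule.match(v)]

zip_codes = ["94065", "9406", None, "94065-1234", "N/A", "94065"]
print(profile_column(zip_codes))   # see the stain first
print(rule_violations(zip_codes))  # ['9406', 'N/A']

Nothing here is sophisticated, and that is the point: even a pass this simple has to run where the data lives, against every application that holds a copy, before anyone can say whether an outsourced cleansing service is worth paying for.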

If these data Laundromats sound utopian, it is because they are. We may see some type of outsourced data quality, especially where companies need to access outside reference information such as DUNS or UNSPSC codes, but not for the critical core business assets that make up the bulk of their data. Those I would first run through an on-premise cleansing cycle.

Comments:

A timely post Dain, this is one of the most worrying trends I'm seeing in the data quality industry right now. I firmly believe companies need to be taking more care over more of their data, not shipping it out of the building, if they want to succeed in today's tougher climate.

I agree with all your points above, but the biggest problem I see lies with data quality rules. DQ rules are the very fabric that drives information services in every organisation on the planet, and unfortunately most organisations have little idea of what they mean or how to manage them. (A good starting point is this article, which features further tutorials: http://www.dataqualitypro.com/dqr) DQ rules are incredibly complex, and most profiling tools only scratch the surface: finding simple stats and relationships is easy, but the really complex rules, the ones that really matter to a business, are very tough to discover and manage internally, let alone outsourced.

The problem is that when we ship this data out of the building, we don't ship the years of experience and business nous that keep this data afloat. It may well come back pristine in the context it was delivered, yet completely wreck an established business process in another part of the business. I've seen this a myriad of times: a company cleans up DQ in one part of the business, ignoring advice about understanding the complete information chain, then gets irate calls from the downstream business as their "cleansed" data plays havoc.

What is required is a thorough commitment to data governance and data quality across the enterprise. Only when these are implemented can an organisation even begin to think about outsourcing its data quality improvement work, in my opinion. Sure, small-scale bureau work for sorting out names and addresses can work okay in the short term, but even here I don't really see the benefits long term; the root causes should be resolved before cleansing is considered an option. And as for outsourcing data migration, don't even get me started...

Interesting post Dain, really glad I came across it, and I've added it to the Data Quality Pro blog roundup as it provides some important lessons for all concerned in the industry.

Posted by Dylan Jones on December 02, 2008 at 09:52 PM PST #
