We're still in the early days of big data, and much of the focus continues to be on use cases and on figuring out how to get value from the technology and the data. Many of those use cases involve customer or personal data in some way. As is typical in early days, secondary considerations are often glossed over until they become critical, and one of those considerations, particularly for big data, is privacy. Companies tend to focus on collecting data and extracting value from it, and we technology practitioners are happy to help them. But it's not always clear whether companies collecting personal data should be using all of it, and if they do, how they should involve consumers. I think it's our duty, as practitioners and enablers of big data, to be knowledgeable about privacy issues and to address privacy in our big data endeavors.
I was encouraged when I attended the Strata+Hadoop conference recently, because there was a track on Law, Ethics, and Open Data. It wasn't the hottest topic at the event, but it was fairly well attended, which suggests that others are thinking about this too. Intuit gave one of the presentations and talked about how they have brought their legal and data science teams together. Even though it sounds counterintuitive (pun intended), this marriage of legal and tech is helping them drive innovation. Check out their presentation here.
When Intuit looked at how they were handling customer data across all of their products, they found considerable inconsistency, because each product team had been making those decisions in isolation. In their new approach, they established a mission to democratize data for the benefit of the customer and then defined data stewardship principles that teams could use to guide their decision making.
This should be a familiar approach to those of us in the engineering world. In my time as a consultant, some of the best projects I was involved in established guiding principles at the outset, which helped the project team stay focused and allowed decisions to be made more quickly and effectively.
That raises the question of what the guiding principles for a big data solution should be. I don't think there is any one answer that fits all situations, but I do think there are some common themes that deserve further exploration.