The Biggest Breaches are Yet to Come
Where a few years ago we saw 1 million to 10 million records breached in a single incident, today we are in the age of mega-breaches, in which incidents exposing 100 million to 200 million records are not uncommon.
Combine this with the findings of the 2014 Verizon Data Breach Investigations Report, which tallied more than 63,000 security incidents, including 1,367 confirmed data breaches, and the scale of the problem is clear.
As business and IT executives are learning by experience, big data brings big security headaches. Hadoop, an open-source software framework for storing and processing big data in a distributed fashion, was built with very little security in mind; simply put, it was developed to address massive data storage and faster processing, not security. Yet Hadoop is now being integrated with existing IT infrastructure, which can expose data from existing databases through the less secure Hadoop layer.
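To make "storing and processing big data in a distributed fashion" concrete, here is a toy, single-process sketch of the MapReduce model that Hadoop popularized: map each input chunk to key/value pairs, shuffle by key, then reduce. This is an illustration only, not Hadoop itself; a real cluster runs these phases in parallel across many machines.

```python
from collections import defaultdict

# Toy word count in the MapReduce style. Hadoop distributes the same
# three phases (map, shuffle, reduce) across a cluster of machines.

def map_phase(chunk):
    # Emit (word, 1) for every word in an input chunk.
    for word in chunk.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Combine the grouped values; here, sum the counts per word.
    return {key: sum(values) for key, values in groups.items()}

chunks = ["big data brings big headaches", "big data needs security"]
pairs = [p for c in chunks for p in map_phase(c)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"])   # 3
print(counts["data"])  # 2
```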
Top Three Big Data Threats
For big data environments, and Hadoop in particular, today's top threats include:
- Unauthorized access. Built with the notion of “data democratization”—meaning all data was accessible by all users of the cluster—Hadoop is unable to stand up to rigorous compliance standards, such as HIPAA and PCI DSS, due to the lack of access controls on data. The lack of password controls, basic file system permissions, and auditing leaves sensitive data in the cluster exposed.
- Data provenance. In traditional Hadoop, it has been difficult to determine where a particular data set originated and which data sources it was derived from. At a minimum, the potential for garbage-in, garbage-out issues arises; worse, analytics that drive business decisions could be drawn from suspect or compromised data. Users need to know the source of the data in order to trust its validity, which is critical for reliable predictive activities.
- DIY Hadoop. A build-your-own cluster presents inherent risks, especially in shops with few engineers experienced in building and maintaining a Hadoop cluster. As a cluster grows from a small project into an advanced enterprise deployment, every aspect of that growth—patching, tuning, verifying versions across Hadoop modules, OS libraries, utilities, user management, etc.—becomes more difficult. Security holes and operational stability may be ignored until a major disaster, such as a data breach, occurs.
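On the first threat above, the "data democratization" defaults can be tightened in configuration. A minimal sketch, using standard Hadoop property names, of moving a cluster off trust-the-client ("simple") authentication and enforcing file permissions on HDFS; a production hardening effort involves much more (Kerberos key distribution, encryption, auditing), but these settings are where access control starts:

```xml
<!-- core-site.xml: replace "simple" (trust-the-client) authentication -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>

<!-- hdfs-site.xml: enforce POSIX-style permissions and ACLs on HDFS -->
<property>
  <name>dfs.permissions.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.acls.enabled</name>
  <value>true</value>
</property>
```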
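On the provenance threat, the core idea is simply that every derived data set should carry a record of its parents. The hypothetical sketch below (the `Dataset` and `derive` names are my own, not a Hadoop API) shows the minimum: lineage names plus a content fingerprint, so a downstream consumer can ask where its inputs came from. In practice, tools such as Apache Atlas exist to track this kind of metadata across a cluster.

```python
from dataclasses import dataclass, field
import hashlib

# Hypothetical provenance record: each data set names the sources it was
# derived from and exposes a content hash for integrity checks.

@dataclass
class Dataset:
    name: str
    records: list
    sources: list = field(default_factory=list)  # names of parent data sets

    def fingerprint(self):
        # Short content hash; changes if the records are tampered with.
        return hashlib.sha256(repr(self.records).encode()).hexdigest()[:12]

def derive(name, parents, transform):
    # Build a new data set from parents, recording lineage as we go.
    records = transform([r for p in parents for r in p.records])
    return Dataset(name, records, sources=[p.name for p in parents])

raw = Dataset("clickstream_raw", [3, 1, 2])
clean = derive("clickstream_clean", [raw], sorted)
print(clean.sources)  # ['clickstream_raw']
print(clean.records)  # [1, 2, 3]
```

Without even this much metadata, "where did this number come from?" has no answer, which is exactly the garbage-in, garbage-out risk described above.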
Big data security is an important topic that I plan to write more about. I am currently working with MIT on a new paper to help provide some more answers to the challenges raised here. Stay tuned.