Cloud Security Perspectives and Insights

Three Big Data Threat Vectors

The Biggest Breaches are Yet to Come

Where a few years ago we saw 1 million to 10 million records breached in a single incident, today we are in the age of mega-breaches, where incidents exposing 100 million or 200 million records are not uncommon.

According to the Independent Oracle Users Group Enterprise Data Security Survey, 34% of respondents say that a data breach at their organization is "inevitable" or "somewhat likely" in 2015.

Combine this with the 2014 Verizon Data Breach Investigations Report, which tallied more than 63,000 security incidents, including 1,367 confirmed data breaches. That's a lot of data breaches.

As business and IT executives are learning by experience, big data brings big security headaches. Hadoop is an open-source software framework for storing and processing big data in a distributed fashion. Simply put, it was developed to address massive data storage and faster processing, not security. Built with very little security in mind, Hadoop is now being integrated with existing IT infrastructure, which can expose existing database data through the less secure Hadoop infrastructure.

With enormous amounts of less secure big data, integrated with existing database information, I fear the biggest data breaches are yet to be announced. When organizations are not focusing on security for their big data environments, they jeopardize their company, employees, and customers.

Top Three Big Data Threats

For big data environments, and Hadoop in particular, today's top threats include:
  • Unauthorized access. Hadoop was built around the notion of “data democratization,” meaning all data was accessible by all users of the cluster. Because it lacks access controls on data, it is unable to stand up to rigorous compliance standards such as HIPAA and PCI DSS. The lack of password controls, basic file system permissions, and auditing leaves sensitive data in the Hadoop cluster exposed.
  • Data provenance. In traditional Hadoop, it has been difficult to determine where a particular data set originated and what data sources it was derived from. At a minimum, the potential for garbage-in-garbage-out issues arises; at worst, analytics that drive business decisions could be drawn from suspect or compromised data. Users need to know the source of the data in order to trust its validity, which is critical for relevant predictive activities.
  • DIY Hadoop. A build-your-own cluster presents inherent risks, especially in shops where there are few experienced engineers who can build and maintain a Hadoop cluster. As a cluster grows from a small project to advanced enterprise Hadoop, every task that comes with growth (patching, tuning, verifying versions between Hadoop modules, OS libraries, and utilities, user management, and so on) becomes more difficult. Security holes, operational security, and stability may be ignored until a major disaster occurs, such as a data breach.
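
To make the data provenance point concrete, here is a minimal sketch of what tracking the origin of a data set could look like. It is illustrative only and is not a Hadoop feature or API: the names ProvenanceRecord, record_derivation, and verify are hypothetical, and the sketch assumes each data set fits in memory as bytes. The idea is simply to store, for every derived data set, the names of its sources and a SHA-256 fingerprint of its contents, so that a later consumer can check that the data has not changed.

```python
import hashlib
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical lineage entry: what a data set is and where it came from."""
    dataset: str               # name of this data set
    sources: Tuple[str, ...]   # names of the data sets it was derived from
    sha256: str                # fingerprint of the contents at recording time


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the data set contents."""
    return hashlib.sha256(data).hexdigest()


def record_derivation(
    name: str, data: bytes, parents: Sequence[ProvenanceRecord]
) -> ProvenanceRecord:
    """Create a lineage entry that names the parent data sets."""
    return ProvenanceRecord(
        dataset=name,
        sources=tuple(p.dataset for p in parents),
        sha256=fingerprint(data),
    )


def verify(record: ProvenanceRecord, data: bytes) -> bool:
    """Check that the data still matches the fingerprint that was recorded."""
    return record.sha256 == fingerprint(data)


# Example with made-up data set names and contents.
raw_bytes = b"user,amount\n1,10\n2,20\n"
raw = record_derivation("raw_events", raw_bytes, [])
summary = record_derivation("daily_summary", b"total\n30\n", [raw])
```

With records like these, an analyst looking at daily_summary can see that it came from raw_events, and verify(raw, raw_bytes) fails if the underlying data was altered after it was recorded. A real deployment would need to store these records somewhere tamper-resistant, which this sketch does not address.
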
Big data security is an important topic that I plan to write more about. I am currently working with MIT on a new paper to help provide some more answers to the challenges raised here. Stay tuned.