Thursday Sep 10, 2009

"Anonymized" Data Really Isn't

I enjoy watching re-runs of the television drama NCIS, where a dysfunctional little group of crime-fighting superstars often analyzes divergent bits of data to solve seemingly unsolvable mysteries.  Last night, Agent McGee correlated data from phone records, automobile registrations and police station activity records to pinpoint a bad cop in collusion with an international drug lord.  Far-fetched?  Perhaps not.

I have been spending much of my time recently preparing a white paper addressing the issues of HIPAA privacy and security compliance, particularly in light of expanded regulations emerging from the “stimulus bill” signed into law earlier this year.  As I have explored privacy issues related to electronic health records, I was particularly intrigued by an article by Nate Anderson entitled “’Anonymized’ Data Really Isn’t and here’s why not”, published in Ars Technica earlier this week.

On the surface, it would seem that removing obvious identifiers such as name, address and Social Security Number from a person’s data record would cause that record to be “anonymous” – not traceable to a single individual.  This approach is commonly used by large data repositories and marketing firms to allow mass data analysis or demographic advertising targeting.

However, work by computer scientists over the past fifteen years shows that it is quite straightforward to extract personal information by analyzing seemingly unrelated, “anonymized” data sets. This work has “shown a serious flaw in the basic idea behind ‘personal information’: almost all information can be 'personal' when combined with enough other relevant bits of data.”

For example, researcher Latanya Sweeney showed in 2000 that “87 percent of all Americans could be uniquely identified using only three bits of information: ZIP code, birthdate, and sex.”
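To make that point concrete, here is a minimal sketch of how such a linkage might work. Everything in it is made up for illustration: the records, the names, and the helper functions `unique_fraction` and `reidentify` are my own assumptions, not Sweeney's data or method. It simply counts how many "anonymized" records are unique on ZIP code, birthdate and sex, then joins those against a named public list.

```python
from collections import Counter

# Hypothetical "anonymized" health records: names removed, quasi-identifiers kept.
health_records = [
    {"zip": "84043", "birthdate": "1961-03-14", "sex": "M", "diagnosis": "hypertension"},
    {"zip": "84043", "birthdate": "1975-07-02", "sex": "F", "diagnosis": "asthma"},
    {"zip": "84062", "birthdate": "1975-07-02", "sex": "F", "diagnosis": "diabetes"},
    {"zip": "84062", "birthdate": "1980-11-30", "sex": "M", "diagnosis": "migraine"},
    {"zip": "84062", "birthdate": "1980-11-30", "sex": "M", "diagnosis": "depression"},
]

# Hypothetical public list (think of a voter roll) that still carries names.
public_records = [
    {"name": "Alice Example", "zip": "84043", "birthdate": "1975-07-02", "sex": "F"},
    {"name": "Bob Example", "zip": "84043", "birthdate": "1961-03-14", "sex": "M"},
    {"name": "Carol Example", "zip": "84062", "birthdate": "1975-07-02", "sex": "F"},
    {"name": "Dave Example", "zip": "84062", "birthdate": "1980-11-30", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birthdate", "sex")


def key(record):
    """Return the (zip, birthdate, sex) combination for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def unique_fraction(records):
    """Fraction of records whose quasi-identifier combination occurs only once."""
    counts = Counter(key(r) for r in records)
    unique = sum(1 for r in records if counts[key(r)] == 1)
    return unique / len(records)


def reidentify(anonymized, public):
    """Attach names to anonymized records that match exactly one public record."""
    anon_counts = Counter(key(r) for r in anonymized)
    pub_counts = Counter(key(r) for r in public)
    # Only keep public entries whose combination is unambiguous.
    names = {key(r): r["name"] for r in public if pub_counts[key(r)] == 1}
    matches = []
    for r in anonymized:
        k = key(r)
        if anon_counts[k] == 1 and k in names:
            matches.append((names[k], r["diagnosis"]))
    return matches


if __name__ == "__main__":
    print(f"Unique on ZIP/birthdate/sex: {unique_fraction(health_records):.0%}")
    for name, diagnosis in reidentify(health_records, public_records):
        print(f"{name}: {diagnosis}")
```

In this toy data, three of the five "anonymous" records are unique on those three fields, and each of them picks up a name from the public list. The two records that share the same combination stay ambiguous, which is the only protection they have.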

Professor Paul Ohm of the University of Colorado Law School, in his lengthy new paper on "the surprising failure of anonymization," wrote:

As increasing amounts of information on all of us are collected and disseminated online, scrubbing data just isn't enough to keep our individual "databases of ruin" out of the hands of the police, political enemies, nosy neighbors, friends, and spies.

If that doesn't sound scary, just think about your own secrets, large and small—those films you watched, those items you searched for, those pills you took, those forum posts you made. The power of re-identification brings them closer to public exposure every day. So, in a world where the PII concept is dying, how should we start thinking about data privacy and security?

Ohm went on to outline a nightmare scenario:

For almost every person on earth, there is at least one fact about them stored in a computer database that an adversary could use to blackmail, discriminate against, harass, or steal the identity of him or her. I mean more than mere embarrassment or inconvenience; I mean legally cognizable harm. Perhaps it is a fact about past conduct, health, or family shame. For almost every one of us, then, we can assume a hypothetical 'database of ruin,' the one containing this fact but until now splintered across dozens of databases on computers around the world, and thus disconnected from our identity. Re-identification has formed the database of ruin and given access to it to our worst enemies.

I won’t ask what your “blackmail-able facts” might be, and won’t tell you mine.  But it is sobering to think what abuses might emerge from the continued amassing of online data about all of us.  This certainly casts new light on the importance of privacy and security protections for all of our personal data.

Wednesday Dec 10, 2008

Kearns: Faking it Online

In his Network World column this morning, Dave Kearns addressed the issue of online "pseudonymity" - the use of artificial or "fake" identities.  He indicated that the use of a fake identity or fake persona online doesn't automatically make one a criminal.

I acknowledge Dave's reasoning, but propose that any attempt to use a false identity with the intent to defraud, harm another person or otherwise do mischief is at least unethical, if not criminal.  I loathe the practice of people hiding behind the cloak of online anonymity or pseudonymity to do and say things they apparently do not have courage enough to do or say in the open. 

A long time ago, one of my engineering professors told us, in essence, "Always be proud enough of something you produce that you will gladly put your name on it."  That is sage advice that bears repeating, even in the online world.


Saturday Sep 22, 2007

Identity Deception ... So Much Cooler Online

I heard a song on the radio today that aptly illustrates the stark difference between Identity reality and Identity fantasy acted out online by many people. The protagonist in Brad Paisley's hit song Online admits, "I'm 5 foot 3 and overweight ... And I've never been to second base. But there's a whole 'nother me that you need to see. Go check out MySpace. 'Cause online I'm out in Hollywood. I'm 6 foot 5 and I look d\*\*\* good. I drive a Maserati. I'm a black-belt in karate ... So much cooler online."

It sounds like the guy in Brad's song is fairly harmless, but too many participants on the Internet hide behind a mask of anonymity, presenting themselves as someone they are not. There are too many predators out there posturing as harmless friends, sometimes just lurking, sometimes actively seeking to lure innocents into a web of deceit and destruction. I understand and accept a few valid reasons for anonymity online, but the deliberate deception practiced by far too many casts a sordid shadow across cyberspace.



Discovering Identity was founded on in May 2005 as a means of documenting my exploration of the field of Identity and Access Management. In February, 2010, I switched to hosting the blog at In March 2012, I began posting Oracle-related information in both places.

Thanks for stopping by.

Please connect with me in cyberspace at LinkedIn or Twitter.

The views expressed on this blog are my own and do not necessarily reflect the views of my employer, Oracle Corporation, or any other person or organization.

