Challenges with the classification of content, or, "data labels suck"!
By Simon Thorpe on Jul 15, 2009
A mildly heated debate has arisen regarding whether the classification of content/data (and the use of labels) for security purposes is worth the effort. On one side of the discussion is the position that trying to manually classify and label both content and data is time-consuming, prone to error and quickly results in out-of-date classifications/labels. The opposing opinion is that, whilst in theory there are problems, in the real world you need to apply some level of classification/labeling manually, and it isn't as hard as you think if you keep things simple.
Simple... this reminds me of a key phrase that was drilled into me from day one when working on IRM: simplicity is the key to effective security. Of course, in reality implementing enterprise security is never simple, but the goal should always be to achieve the simplest solution possible. Humans are simple creatures, and the more complex a solution, the greater the risk.
Anyway, back on topic. One difference being discussed is between the definition of information classification, e.g. what is the definition of top secret, and the mechanisms for applying the classification, "top secret", to data and documents. The crux of the dispute is that relying on manual application of the "top secret" classification doesn't work. The reasons being;
Technologies such as DLP can, however, provide some clever real-time capabilities for identifying important information, yet the methods of protection are often invasive and limiting. For example, you might want to copy an important document to your USB flash drive and the DLP technology stops you. This means frustration on the user's part, and trying to place restrictions on every possible endpoint can be complex and expensive.
I spoke with Andy Peet, Oracle IRM product manager, who had the following to say on this topic.
"I don’t think that there is any disagreement that sensitive data needs to be classified. The problem faced by security products such as DLP and IRM is that it is easiest for these solutions to label a document with information about its classification. However data classification is a dynamic process: a highly restricted document today may become a public document in a few days time (for example quarterly financial results). So if a document is permanently labelled with a fixed classification then it cannot evolve with the data that it contains."
I've seen and dealt with this sort of discussion at most of our customers. Oracle IRM provides the ability to create classifications, define roles within them (e.g. can someone edit and print a document?), assign those roles to users and then apply the classification to a document. But customers often ask:
- What if someone doesn't apply the classification at all in the first place, we have no security!
- What if someone applies the wrong classification (secures a top secret document to the public classification), this is even worse than no security, the technology is now actively allowing the document to be accessed by the wrong people.
- What happens if my document changes classification? Or if someone joins the company after the point where the content was secured, how do I enable access?
Here is how we have often addressed these issues with Oracle IRM. IRM doesn't solve all the problems, but it does provide a simple and powerful mechanism for addressing a high percentage of them. Remember, good security is defense in depth. IRM and DLP combined also create a compelling set of solutions, and we are working with the leading DLP companies to build integrations between the technologies.
What if someone doesn't apply the classification at all in the first place, we have no security!
A very common question. IRM doesn't enforce the creation of sealed content for all documents; users have to actively make the decision to classify and then secure the document... or do they? We've been developing IRM for over 10 years, so we've had plenty of time to create solutions for this problem. The best way to address this concern is to remove the choice from the end user and simply apply the classification in a way that makes sense from the start. How?
Using sealed Office templates
With Oracle IRM you can seal Word, Excel and PowerPoint templates. Users then follow their existing workflows for creating documents from templates, so the classification choice is removed; it is predefined instead. From its first instance the document is always sealed and protected. But what if the document already exists? Or if the user didn't choose to start from a template?
Protecting the content storage location
We've done many integrations with the locations where content gets stored. We have an integration with our own Oracle content server which allows documents to be automatically classified and sealed upon check-in, and we've had customers build integrations with Documentum and Microsoft SharePoint. Here again the user is unaware of making the classification decision; the system automates the protection. We also have a tool called "Hot Folders" which allows content to be secured when it is stored on a file system, either locally or on a network file share. But what if a user stores the content in a location that doesn't have an IRM agent actively classifying and protecting the content?
Integrating with DLP technologies
Another line of defense is to leverage the powerful content scanning and identification functionality offered by DLP. If the user attempts to store that sensitive financial report on their local USB flash drive, instead of preventing them from performing the copy, simply have DLP call IRM to encrypt it. The user gets to move the content, securely, because IRM secures access to the content no matter where it resides, and they are still unaware that a decision to classify the content has been made.
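The "seal instead of block" handoff described above can be sketched in a few lines. This is only an illustration under stated assumptions: the keyword rules, the classification names, and the functions `detect_classification`, `on_copy_to_removable_drive` and `fake_seal` are all hypothetical and are not part of any DLP product or of the Oracle IRM API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword rules standing in for a real DLP content scanner.
SENSITIVE_MARKERS = {
    "quarterly financial results": "Finance - restricted",
    "social security number": "HR - restricted",
}


@dataclass
class CopyDecision:
    allowed: bool
    classification: Optional[str]  # classification used to seal, if any


def detect_classification(text):
    """Return the first matching classification, or None if nothing matches."""
    lowered = text.lower()
    for marker, classification in SENSITIVE_MARKERS.items():
        if marker in lowered:
            return classification
    return None


def on_copy_to_removable_drive(text, seal):
    """Never block the copy; seal sensitive content before it leaves."""
    classification = detect_classification(text)
    if classification is None:
        return CopyDecision(True, None), text
    return CopyDecision(True, classification), seal(text, classification)


def fake_seal(text, classification):
    # Placeholder only: a real integration would encrypt the payload here.
    return f"[SEALED:{classification}]"


decision, payload = on_copy_to_removable_drive(
    "Draft quarterly financial results for review", fake_seal
)
print(decision, payload)
```

The point of the design is that the copy is always allowed; what changes is whether the content leaves in sealed form, so the user is never asked to make the classification decision.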
In summary IRM, integrated with the storage locations and DLP, provides a much richer solution to classifying and securing content.
What if someone applies the wrong classification (secures a top secret document to the public classification), this is even worse than no security, the technology is now actively allowing the document to be accessed by the wrong people.
This is a tough one. Any technology that gives the user a choice about how to secure a document is prone to poor decision making. Consider two network file shares: one called "financial reports", which contains sensitive financial documents and has a very limited access control list, and one called "financial documents", which holds public documents for redistribution and access by a wide number of users. It doesn't take much for a user to drop a document into the wrong location. The same is true with IRM: someone could choose the wrong classification and potentially allow the wrong set of users to have access. Remember that IRM is part of a total security solution, so applying the wrong classification doesn't by itself mean unauthorised users get access, only that they have the potential to; they must first get their hands on the document. But this can happen...
Real time auditing with Oracle IRM
IRM may not be able to stop misclassification, but it can provide audit reporting of both the securing of content and access to it. IRM audit logs contain information about the file, who is accessing or securing it, and from what IP address (both local and firewall), plus other data. This can be combined with other technologies such as DLP, Governance, Risk, and Compliance (GRC) tools and Business Activity Monitoring (BAM).
Obviously there is still risk involved, but at least with IRM you now have the ability to view every single access to your sensitive information. Again, DLP here plays a good role in being able to identify information that isn't just unclassified, but has been misclassified.
What happens if my document changes classification? Or if someone joins the company after the point where the content was secured, how do I enable access?
This is a question at the heart of the dispute. Information changes and therefore so does its classification. Consider the following:
- You open an email and enter employee details such as address and social security number.
- You are working on a spreadsheet about a 35nm technology and enter details about a 22nm technology you are working on.
- You edit a financial document and remove information pertaining to quarterly financial results.
All three scenarios mean the classification of the document or email changes. In the first, a document now requires classification; in the second, the classification changes; and in the third, content is actually being declassified. I have spent a lot of time with customers who deal with export control and foreign national compliance regulations which dictate who should have access to what information, often based on the type of technology. For example, the US government may decide that the point at which compliance controls take effect moves from 35nm to 22nm, and therefore all documents classified as 35nm change classification from controlled to non-controlled. The subject matter of the document doesn't change, but the classification has.
Separation of rights from the content.
There are different scenarios which breed different methods for addressing this problem. They typically depend on the model used for classification, but fall into two main areas. In one, the classification applied reflects the document's content, i.e. in the above example the classification would be "35nm technology". In the other, the classification is directly mapped to the document, i.e. "L1 Top Secret technology documents".
Now we come to one of the most important aspects of Oracle IRM: the separation of rights from content. This allows dynamic changes to be made to rights on the server, and these changes affect all content associated with those rights. Continuing the above example, imagine:
- 1000 documents classified as "35nm technology - chip designs".
- 1 group on the IRM server called "3E001 Chip designs - top secret"
- The above group assigned a "Contributor" role which allows the engineers to create, edit, print the document.
- 500 engineers who are members of the above group
3E001 is an example of an ECCN (Export Control Classification Number). Let's say that this set of documents is no longer covered by this control and therefore all 1000 documents are reclassified. With Oracle IRM this is easily handled by assigning a different group, let's say "Declassified information - top secret", which contains a wider variety of users within the company who can now access this information, BUT it still remains classified as top secret to the organization. Because this change is made on the server, the new rights are issued the next time someone tries to access a secured document, and it all happens dynamically!
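To make the idea concrete, here is a toy in-memory model of rights held on a server and looked up at access time. It is a sketch under stated assumptions, not the Oracle IRM API: `RightsServer`, `SealedDocument` and `can_open` are hypothetical names, and the users "alice" and "bob" are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RightsServer:
    """Hypothetical stand-in for a rights server."""
    # classification name -> {group name: role name}
    assignments: dict = field(default_factory=dict)
    # group name -> set of user names
    groups: dict = field(default_factory=dict)

    def role_for(self, user, classification):
        """Return the role granted to `user` for `classification`, or None."""
        for group, role in self.assignments.get(classification, {}).items():
            if user in self.groups.get(group, set()):
                return role
        return None


@dataclass(frozen=True)
class SealedDocument:
    """A sealed document carries only its classification, never a user list."""
    name: str
    classification: str


def can_open(server, user, doc):
    # Rights are resolved at access time, so server-side changes apply at once.
    return server.role_for(user, doc.classification) is not None


CLASSIFICATION = "35nm technology - chip designs"

server = RightsServer()
server.groups["3E001 Chip designs - top secret"] = {"alice"}
server.groups["Declassified information - top secret"] = {"alice", "bob"}
server.assignments[CLASSIFICATION] = {"3E001 Chip designs - top secret": "Contributor"}

docs = [SealedDocument(f"design-{i}.doc", CLASSIFICATION) for i in range(1000)]

before = can_open(server, "bob", docs[0])

# The export control no longer applies: swap the group on the server only.
server.assignments[CLASSIFICATION] = {
    "Declassified information - top secret": "Contributor"
}

after = all(can_open(server, "bob", d) for d in docs)
print(before, after)
```

None of the 1000 document records is touched by the change; only the server-side assignment moves, which is why the new rights take effect on the next access.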
Oracle IRM allows the resealing, or reclassification of content.
Another example, depending on the classification model, is the ability to reclassify the documents themselves. Let's say that in the above example we had a document classified as "35nm technology - chip designs" and some new content was inserted into it containing details on a 22nm technology. This document now needs to be reclassified. Oracle IRM allows you to re-encrypt the document from "35nm technology - chip designs" to "22nm technology - chip designs". This can be done either by the end user or en masse automatically. The trigger could be a change of metadata in the content repository, after which all affected documents get reclassified automatically.
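A metadata-driven bulk reclassification of this kind might look like the following sketch. It models documents as plain dictionaries and assumes a hypothetical `process_node` metadata field; the `reclassify` function is illustrative only and does not perform any real re-encryption or call any Oracle IRM interface.

```python
OLD = "35nm technology - chip designs"
NEW = "22nm technology - chip designs"


def reclassify(documents, old, new, needs_change):
    """Return new records with `old` swapped for `new` where the predicate matches."""
    resealed = []
    for doc in documents:
        if doc["classification"] == old and needs_change(doc):
            # Build a new record; a real system would re-encrypt the file here.
            doc = {**doc, "classification": new}
        resealed.append(doc)
    return resealed


repository = [
    {"name": "a.xls", "classification": OLD, "metadata": {"process_node": "35nm"}},
    {"name": "b.xls", "classification": OLD, "metadata": {"process_node": "22nm"}},
]

# Only documents whose repository metadata now says 22nm are moved.
updated = reclassify(
    repository, OLD, NEW, lambda d: d["metadata"]["process_node"] == "22nm"
)
print([(d["name"], d["classification"]) for d in updated])
```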
Oracle IRM 11g brings even more dynamic possibilities
Now there are still some limitations here: documents still have some sort of descriptive classification that needs to be managed and is relatively static. The next release of Oracle IRM makes another big leap in this area. I'll let the words of Andy Peet explain.
"The next generation of IRM will see truly dynamic classifications; where a document is categorized by its content and not according to a security classification in place when the document was created. This enables the classification to evolve as the content enters different stages of its lifecycle. Take the simple example of document workflow: a set of marketing collateral for an exciting new campaign is being collaborated on (highly restricted), then the collateral is sent off for wider internal review (company confidential), after review the collateral is shared with trusted external partners (restricted) and eventually at the launch of the campaign the collateral is revealed to consumers (public). This demonstrates how the same content has its classification dynamically changing; it is exactly these processes that Oracle IRM will be supporting in future releases as the technology evolves to match the business requirements."
So yes, there is a huge challenge in trying to apply a classification model to information. In theory it would be fantastic if it was possible to have security always use dynamic classification but in reality this isn't available yet with current technologies. Oracle IRM is close and by far the leading IRM technology which has a lot of the required capabilities. From its creation it has separated the rights from the information which is crucial to an effective, large scale enterprise rights solution.
We see in the real world that IRM is applied at a high level and in simple scenarios, for instance ensuring all sensitive content about a large acquisition is protected, or using a corporate-wide classification as a catch-all for sensitive data. Our experience is that simplicity is key, and we often advise customers to compromise between complex regulatory controls and complex inter-enterprise security requirements by applying IRM with large-scale, simple classifications.