Matching Content Lifecycle to Storage
By Brian Dirking on Oct 18, 2010
Documents, images, e-mails and other unstructured data make up an estimated 80% of the information in organizations. Yet this content is scattered across many different repositories, file systems and storage solutions. In many cases, your organization may not even know what information it is managing. Without knowing that, you cannot ensure you are using the right storage for that content. So how can you cut costs and improve performance by managing this information effectively?
Some interesting facts about content:
- Content tends to be created and stored haphazardly in most organizations.
- It tends to rot. In the short term, it may not get updated or accessed. In the long term, it may end up stored on media that stop working, or in formats we can no longer read.
- Once it is spread across repositories and file shares, we often don't know what we are managing on those systems.
- We can't find the content that is useful, and sometimes we find content that is detrimental.
- We spend thousands of dollars recreating content that is useful (see Monica Crocker, Corporate Records Manager at Land O'Lakes, discuss how, as a consultant, she was often brought in to create content only to find it had already been created but no one could find it).
- Sometimes the content we find is detrimental because it is old and out of date, and acting on it can also be costly.
- We often have no usage statistics on content, and without comprehensive usage information it is hard to judge the relative value of any piece of content.
- Content is often stored on the wrong medium: archived content sits on spinning magnetic disk, which costs money every day to keep running. Magnetic disk is also not a stable medium over the long term.
- Without a lifecycle plan, key content cannot be refreshed or replaced when it goes stale.
Without knowing what we are managing, we can't know the value of any of the content we are storing. Part of the answer is knowing usage statistics for content as it is stored and managed, and part of it is setting a lifecycle policy for content when it is created.
One of my favorite parts of Oracle Universal Content Management is Content Tracker. Tracker not only tells me which documents are being visited, it can tell me by whom, from which department, and how often. I may find that a document I have written is constantly referred to. I may decide to make it more visible by putting it on a page on the intranet. I may decide to add more to it, or to update it more often. Without Tracker, I would be shooting in the dark about how the content I create is being used.
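To make the idea of usage tracking concrete, here is a minimal sketch of the kind of aggregation a tracking tool performs: counting visits per document, distinct visitors, and a per-department breakdown. This is not the Content Tracker API; the `AccessEvent` record and `summarize_usage` function are hypothetical names, assuming access events are available as simple (document, user, department) records.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    """Hypothetical access-log record: one user opening one document."""
    doc_id: str
    user: str
    department: str


def summarize_usage(events):
    """Aggregate raw access events into per-document usage statistics.

    Returns (visits, visitors, by_department):
      visits        -- Counter of total visits per document
      visitors      -- dict of document -> set of distinct users
      by_department -- dict of document -> Counter of visits per department
    """
    visits = Counter()
    visitors = defaultdict(set)
    by_department = defaultdict(Counter)
    for e in events:
        visits[e.doc_id] += 1
        visitors[e.doc_id].add(e.user)
        by_department[e.doc_id][e.department] += 1
    return visits, visitors, by_department


events = [
    AccessEvent("pricing-guide", "alice", "Sales"),
    AccessEvent("pricing-guide", "bob", "Sales"),
    AccessEvent("pricing-guide", "carol", "Marketing"),
    AccessEvent("old-memo", "dave", "Finance"),
]
visits, visitors, by_department = summarize_usage(events)
print(visits.most_common(1))                 # [('pricing-guide', 3)]
print(len(visitors["pricing-guide"]))        # 3
print(dict(by_department["pricing-guide"]))  # {'Sales': 2, 'Marketing': 1}
```

A document that keeps rising to the top of `visits.most_common()` is a candidate for more visibility or more frequent updates; one that never appears is a candidate for review.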
So we have all this content. The difficulty is sifting through it to find the important material. In many cases you can base that judgment on metadata: the document type or the author can be a key indicator of important content. But often you need other methods. A content usage tracking system can be a good way to identify important content, since less-used content is generally less important. Some content is important only in the short term. A sales proposal might be important for 30 days; a marketing plan might be important for a year. Then there is content that is not important on a day-to-day basis but is important in the long term. You might not refer to your insurance policy for 40 years, but when you need it, it might be the most important document in the company.
One of my favorite stories about old documents in litigation comes from Simplot. Simplot is one of the largest private companies in the world. It makes and sells food products (Sara Lee, McDonald's French fries) as well as fertilizers and pesticides. When the Fresno County water district found traces of a Simplot pesticide in the groundwater that appeared to have been introduced in the 1950s, a lawsuit ensued. Simplot's director of records management, Dave McDermott (former President of ARMA, and an Oracle Records Management customer), pulled all of Simplot's insurance policies. Upon reading them, he found a common clause stating that "this policy covers this period of time in perpetuity," meaning that any claim against the company for that period was still covered. Dave served 52 insurance companies, and in a pre-trial hearing, in front of their attorneys, the judge grilled him on his records management program. The insurance firms challenged the veracity of his documents. At the end of the day, the judge concluded that it was probably the best records management program he had ever seen, and that he saw no reason to stop the suit. The following Monday, 50 of the insurance companies settled, and eventually the other two did as well, resulting in a $23 million payout to Simplot.
One reason I love this story is that it depicts records management as a revenue generator. The other is that it illustrates how documents can be of no value day to day for decades and then suddenly become hugely important. Having a way to ensure you can get to them on that day is a key part of our job, and that is where being able to set a lifecycle policy for content becomes important.
The ability of a content management system to apply its knowledge of the content to storage choices is very powerful. The content management system knows usage statistics, and it also knows lifecycle metadata and retention schedules. By applying this knowledge to the storage system, it can save valuable resources. If a web page is accessed frequently, it can go on flash storage. An old insurance policy can go on tape, where it is still there, searchable, and can be brought back for reference in litigation. There is no reason to spend the electricity to keep magnetic hard drives spinning for content that won't be accessed for 40 years.
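The placement logic described above can be sketched as a small rule set that combines a usage statistic with lifecycle metadata. This is an illustrative assumption, not how Oracle UCM or any particular storage product implements tiering: the `ContentItem` fields, the `choose_tier` function, and the `HOT_ACCESS_THRESHOLD` value are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Tier(Enum):
    FLASH = "flash"  # fast, expensive: frequently accessed content
    DISK = "disk"    # spinning magnetic disk: content in active use
    TAPE = "tape"    # cheap, offline: retained but rarely needed


@dataclass
class ContentItem:
    name: str
    created: date
    active_days: int            # lifecycle metadata: how long the item stays "current"
    accesses_last_30_days: int  # usage statistic from a tracking system


# Hypothetical threshold; in practice it would be tuned from real usage data.
HOT_ACCESS_THRESHOLD = 100


def choose_tier(item: ContentItem, today: date) -> Tier:
    """Pick a storage tier from usage statistics and lifecycle metadata."""
    if item.accesses_last_30_days >= HOT_ACCESS_THRESHOLD:
        return Tier.FLASH
    in_active_window = today <= item.created + timedelta(days=item.active_days)
    if in_active_window or item.accesses_last_30_days > 0:
        return Tier.DISK
    # Past its active window and unused: keep it (retention still applies),
    # but on the cheapest medium.
    return Tier.TAPE


today = date(2010, 10, 18)
web_page = ContentItem("home-page", date(2009, 1, 1), 3650, 500)
proposal = ContentItem("sales-proposal", date(2010, 10, 8), 30, 5)
policy = ContentItem("insurance-policy", date(1975, 1, 1), 365, 0)
for item in (web_page, proposal, policy):
    print(item.name, choose_tier(item, today).value)
# home-page flash
# sales-proposal disk
# insurance-policy tape
```

Note that moving an item to tape does not delete it: the retention schedule still governs how long it is kept, and the content management system can still find it and recall it when needed.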
And because these systems can automate migrating content from old tape to fresh tape every decade or so, you can avoid the information rot that comes with aging magnetic media, so the document will still be accessible 40 years later.
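A media-refresh policy like the one described above reduces to a simple schedule: copy each item onto new media at a fixed interval after it was first written. The sketch below assumes a 10-year interval to match "every decade or so"; `add_years` and `refresh_schedule` are hypothetical helpers, not part of any product.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def refresh_schedule(written: date, until: date, interval_years: int = 10) -> list[date]:
    """Dates on which an item should be migrated to fresh media, up to `until`.

    Each date is computed from the original write date (not the previous
    refresh), so the Feb 29 fallback never accumulates drift.
    """
    dates = []
    k = 1
    while True:
        next_refresh = add_years(written, k * interval_years)
        if next_refresh > until:
            break
        dates.append(next_refresh)
        k += 1
    return dates


# A policy archived today, kept for 40 years, refreshed every decade.
for d in refresh_schedule(date(2010, 10, 18), date(2050, 10, 18)):
    print(d)
# 2020-10-18
# 2030-10-18
# 2040-10-18
# 2050-10-18
```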
To find out more about this topic, check out this week's web seminar Know Where Your Information Lives.