Ten Requirements for Achieving Collaboration #6: Data Accessibility for People and Computers
By billy.cripe on Nov 02, 2009
1. High relevance leads to lean systems.
2. People want relevant information, not potentially relevant hits.
3. Context drives relevancy, delivery drives efficiency.
We are in the midst of a series investigating collaboration. We previously wrote about the two types of collaboration - intentional and accidental.
INTENTIONAL: where we get together to achieve a goal and
ACCIDENTAL: where you interact with something of mine and I am never aware of your interaction
While intentional collaboration is good, it is not where the bulk of untapped collaborative potential lies. Accidental collaboration is. The challenge is to intentionally facilitate accidental collaboration. For the full list of 10 requirements see the original post. Last time I wrote about requirement #5: why data must be referencable and portable. This time we continue on that theme and discuss why the data we made portable and referencable last time must still be accessible to both people and computers.
First remember that the data we're talking about is not nicely contained in a row or cell in a traditional relational database. The data we're interested in and that we have been talking about is the data that exists inside documents, web pages, images and other information artifacts. So in one way at least, the information is already human accessible. It is in a document or other information artifact after all. And those are typically created by people for people. Parsed and extracted data that is referencable is still accessible because we do not fundamentally alter the original container (i.e. the document). Any good enterprise information architecture must include a fully-fledged ECM (enterprise content management) system for this reason. There needs to be a place to store the original source documents, images, videos and web pages.
Also, computers and systems should have no problem accessing the data that we derived from the artifacts in the previous posts. This is because after the data is parsed, extracted and marked up in the ways we've previously described, it gets stored in a computer-referencable system like a database, an RDF store or a linked combination of similar stores and indexes. Computers and systems can access that data (assuming, of course, that network connections are established and maintained). Indeed, many SOA and Service Bus integration layers have been doing similar things for some time. They are able to access transaction, web service and request data and attach it to the brokered request while bringing along original documents and other unstructured information files as payload.
But did you notice what I just wrote there? The relevant data as well as the containing or supporting unstructured data files are attached to the request and passed around from system to transaction to data store to website. It is the equivalent of carrying around a file cabinet full of stock photos when all I really want is to sort catalog entries on blue shoes. "Blue" is important data that is only accessible to a human looking at a picture. Or, best case, to a computer system that can parse attached metadata, assuming that "blue" was entered by a person somewhere further up the line (and not "teal", "aqua", or "navy"). But if a similar SOA request had access to the full complement of parsed and extracted data, then it could carry with it only the data that was actually needed rather than the over-full payload it carries today. This is point 1: The efficiency advantages in terms of transaction processing, bandwidth usage and parsing overhead accrue when the *relevant* information is available to computer systems. High relevance means a low quantity of extraneous information. Low levels of extraneous information are what we want. High relevance leads to lean systems.
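The lean-payload idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `CatalogEntry`, `lean_payload` and the sample catalog are hypothetical names invented here, and the `extracted` dictionary stands in for whatever database or RDF store holds the parsed data. The point is that the request carries the extracted facts and leaves the original image behind.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Hypothetical catalog record: extracted facts plus the original artifact."""
    sku: str
    extracted: dict      # data parsed out of the artifact, e.g. {"color": "blue"}
    source_image: bytes  # the original stock photo (the "file cabinet")


def lean_payload(entry, needed_fields):
    """Build a payload holding only the extracted fields the request needs."""
    payload = {"sku": entry.sku}
    for name in needed_fields:
        if name in entry.extracted:
            payload[name] = entry.extracted[name]
    return payload  # the source image is deliberately left out


# Illustrative data only; each fake image is about 500 KB.
catalog = [
    CatalogEntry("SH-001", {"color": "blue", "category": "shoes"}, b"\x00" * 500_000),
    CatalogEntry("SH-002", {"color": "red", "category": "shoes"}, b"\x00" * 500_000),
    CatalogEntry("SH-003", {"color": "blue", "category": "shoes"}, b"\x00" * 500_000),
]

# Select on the extracted data, then ship only what the consumer asked for.
blue_shoes = [
    lean_payload(entry, ["color", "category"])
    for entry in catalog
    if entry.extracted.get("color") == "blue"
    and entry.extracted.get("category") == "shoes"
]
```

Each resulting payload is a handful of short strings instead of a half-megabyte attachment, which is where the savings in bandwidth and parsing overhead would come from.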
But back to the humans. Documents and information artifacts created by us and for us are great. But remember the previous posts in this series. We rarely want to re-use information artifacts in full. Presentations created by my colleagues are great *starting points* for me to do my work but they are not usually the sum-total of what I have to do. Websites are wonderful collections of information that usually (or hopefully) contain what the majority of visitors are looking for. But almost no website visitors want to see every single page in your website! While the collection of pages, documents and artifacts that makes up a large website is very convenient for *browsing*, it is really quite terrible for targeted information retrieval. Therefore we add search capability to our sites. We create *micro-sites*: websites that are small and laser-focused on a single topic, product or campaign. We add predictive modeling and persuasive content delivery to our sites in attempts to deliver high relevance with low levels of extraneous information.
What this shows is that what we really want is the ability to combine some of what others have done with some of what I have done in the past and with some new information, context or data that I have that is uniquely relevant to my task or my desire at hand. In short, I want composite content. I want a mashup. The most important aspect of that mashup is *my intent*. It is my intent that is the one key which makes or breaks the relevance calculation.
This is point 2: I want relevant, data-informed information, not search "hits" that may be relevant to my intent only to varying degrees.
Furthermore, I do not want composite content delivered to me as a whole. Usually, I simply want it available. Remember that here we're in a business context. I am not on the public web looking for a composite view of all movie ratings and comments from numerous sites. For that, social analytics tools like Glue or OpenPreferences work well. Instead, I have a job or a task to perform and I need the best information possible available to help me do my job. I do not want to repeat the mistakes of others, and I do not want to reinvent what they have already done. Systems that force people into such a repetitious model both stifle business agility and are a terrible drag on business momentum. Additionally, if only part of what you have done before is useful to me in my present task, I do not want to be forced to trudge through all of the extraneous information in order to get to the truly useful parts. Remember though that this is not about managing people more effectively or helping them create leaner information artifacts. When we say "extraneous information", this raises the question "extraneous to what?", because concepts like "extraneous" are inherently *relative*. Something that I find extraneous to my task at hand would have been of utmost relevance and importance to you when you created that information.
Let me provide a concrete example. Suppose you create a Project Plan. It is a complex and highly valuable information artifact. It contains a project schedule, a resource and staffing plan, contract information, milestones, functional designs of what will be delivered, reference architectures and mockups. If I want to create a presentation that talks about similar functional capabilities, access to the information originating in your Project Plan is highly relevant to me. But this does not mean that the entire project plan is relevant to me. I do not care about your resource and staffing plan. I may not be provisioned to see contract details. I am not writing a technical specification so your architecture and mockups are useless to me. All of that is extraneous information, forcing me to spend more time "getting to the good stuff" even though I know it is in your Project Plan somewhere. Instead, what I want is the relevant information about functionality. In previous posts we described how a system of text analytics, RDF storage, Semantic Indexing and Ontology Assisted search can extract that kind of information. But if all this does is return me to the first page of the document container, then all we have achieved is a slightly more powerful information retrieval system. While that is not insignificant, a system that combines retrieval with information delivery is vastly more powerful. This is what we mean when we talk about keeping the extracted data accessible to humans.
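The Project Plan example can be made concrete with a small Python sketch. Everything in it is an assumption for illustration: the `Section` type, the `deliver` function, the topic tags and the `restricted` flag are hypothetical stand-ins for what the text analytics and semantic indexing described in earlier posts would produce. It shows delivery of the sections that match my intent, filtered by what I am provisioned to see, rather than a pointer to page one of the whole document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """Hypothetical unit of extracted content from a larger artifact."""
    title: str
    topics: frozenset  # tags assumed to come from text analytics
    restricted: bool   # True if special provisioning is required
    text: str


def deliver(sections, intent_topics, may_see_restricted=False):
    """Return only the sections that match the reader's intent and access."""
    wanted = frozenset(intent_topics)
    return [
        s for s in sections
        if (s.topics & wanted) and (may_see_restricted or not s.restricted)
    ]


# Illustrative contents of "your" Project Plan.
project_plan = [
    Section("Schedule", frozenset({"schedule", "milestones"}), False, "Phase 1 ends in Q2."),
    Section("Staffing plan", frozenset({"staffing"}), False, "Two developers, one analyst."),
    Section("Contract details", frozenset({"contract"}), True, "Fixed-price terms."),
    Section("Functional design", frozenset({"functionality"}), False, "Users can tag and search content."),
    Section("Reference architecture", frozenset({"architecture"}), False, "Three-tier deployment."),
]

# My intent: a presentation about functional capabilities.
for_my_presentation = deliver(project_plan, {"functionality"})
```

Here `for_my_presentation` contains only the functional design section. The staffing plan, architecture and contract details stay out of my way, and the contract section would stay hidden even if I asked for it, unless `may_see_restricted` were set.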
This is point 3: Context drives relevancy, delivery drives efficiency.
Next time we will continue the series with requirement #7, which covers the importance of tracking the change and evolution of individual information artifacts. Keep checking back for #8 on the changing patterns of the *relationships* between data and information artifact, information artifact and context, and context and behavior; #9 on understanding and leveraging information and data creation patterns; and finally #10 on how all of the above must be made available back to the end users, be they people or computers, in context-sensitive and persuasive ways so that, ultimately, intentional and accidental collaboration are achieved in the organization.