Friday Apr 18, 2008

Lovely data, lovely model

The most important step in refining data is writing down what makes it tick. In my last blog entry I outlined the process by which my team refined the data flow for Sun Web sites and mentioned the Unified Product Data Model (UPDM). UPDM is a formal means of organizing the information for Sun Web sites, with an emphasis on product data. The goal in developing UPDM was to reduce the redundancy and inefficiency in the data sets for the Starlight publishing platform, and then to extend the benefits to the e-commerce sites, each of which happens to be hosted on a different platform.

UPDM provides definitions, basic business rules, and relationships between facets of product data, including the hierarchy and the attributes of each category, product, and part. It’s maintained in a simple XML format that can be transformed to HTML, a spreadsheet, or even a UML diagram. The following listing is a snippet from the actual core UPDM 1.0 model used in Starlight.

<?xml version="1.0" encoding="UTF-8"?>
  <label>Data Model Browser - UPDM 1.0</label>
  <p>The UPDM Data Model Browser describes concepts and attributes that are
     core to the <b>Sun Unified Product Data Model</b>. This
     version [UPDM 1.0] covers product data elements as they are represented
     in <b>,, and Sun Catalogue</b>. Product elements
     that describe transactions, implementation, or presentation are not
     included in UPDM...</p>

  <concept id="product">
    <explanation>Actual product entity.  Representation of the unit offered
                 to the market by Sun (i.e. Sun Java System Application Server
                 Platform Edition 9.0; Sun SPARC Enterprise M5000 Server.)</explanation>
      Use the id as a stand-in for the product itself
    <association ref="swordfish-id"/>
    <association ref="name">
      <constraint>Strictly syndicated through SwoRDFish</constraint>
    </association>
    <association ref="description"/>
    <association ref="image"/>
      <concept id="plc-date">
        <label>Product life cycle date</label>
        <explanation>A date on which a change of PLC status occurs</explanation>
          <data type="date"/>
        <comment>Related to the price effectivity date</comment>
      </concept>
  </concept>

  <concept id="industry">
    <explanation>Industry for which suited or targeted</explanation>
    <association ref="swordfish-id"/>
      Use the SwoRDFish ID as a stand-in for the industry itself
  </concept>
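
To give a feel for the kind of transformation mentioned above, here is a minimal Python sketch that flattens concepts and their associations into spreadsheet-style CSV rows. It is not the tooling we use in Starlight. The `<model>` root element, the trimmed sample, and the function names `concept_rows` and `to_csv` are all assumptions made for illustration, since the excerpt above does not show an enclosing element.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample modeled on the listing above.
# The <model> root element is an assumption; the excerpt does not show one.
SAMPLE = """\
<model>
  <concept id="product">
    <explanation>Actual product entity.</explanation>
    <association ref="swordfish-id"/>
    <association ref="name">
      <constraint>Strictly syndicated through SwoRDFish</constraint>
    </association>
    <association ref="description"/>
  </concept>
  <concept id="industry">
    <explanation>Industry for which suited or targeted</explanation>
    <association ref="swordfish-id"/>
  </concept>
</model>
"""


def concept_rows(xml_text):
    """Return one (concept, association, constraint) row per association."""
    root = ET.fromstring(xml_text)
    rows = []
    for concept in root.iter("concept"):
        for assoc in concept.findall("association"):
            # findtext returns the default when no <constraint> child exists
            constraint = assoc.findtext("constraint", default="")
            rows.append((concept.get("id"), assoc.get("ref"), constraint))
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["concept", "association", "constraint"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(concept_rows(SAMPLE)))
```

The same walk over concepts and associations could just as well emit HTML table rows; only the serialization step would change.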

We developed UPDM 1.0 at a time when there were not as many good options for expressing such data models as there are now. RDF and XMI carried too much baggage, and we wanted something simple and clear, although RDF does play an important role in how things are bound together in the implementation, as I’ll discuss another day. Again, we can generate all these other representations as needed. As an example, the following picture is a UML class diagram generated from the XML above.

OK, it’s a bit of an eye chart. But when you develop a broad data model such as UPDM, it’s a real eye-opener, and you gain more than just the end product. You learn a lot about which business problems and business rules are not really well expressed anywhere and are only to be found in someone’s head. Sometimes you learn about the key tensions between how different roles and departments interpret and process information.

To take one example, at Sun what we sell for hardware, the actual SKUs, are called “parts” in the marketing department, including e-commerce. In many other departments, and in a lot of the vendor software we use, these are called “products”. We don’t really market at this level, though. We market families of these, such as “Sun Fire T2000”, and these families are what we call “products” and what others call “product families”.

UPDM itself doesn’t provide any magic to reconcile such differences. It does provide the best you can hope for – a framework for writing down the knowledge so it’s open, shared, and even accessible through code. Then you have half a chance to build some magic on top of the model.

Sunday Mar 30, 2008

Captain Data Modeler Chronicles: Prologue

It’s been a while since I’ve written here. The key reason is that I’ve made a transition from Captain Data Modeler to directorship of Sun’s Content Management Engineering department. I suppose the nice message is that a lot of the hard work to balance good data architecture and practical business need is what put me on the radar for promotion. Of course, the downside is that now I have to peer longingly over a desk piled high with budgets, vendor contracts, and HR priorities in order to catch a glimpse of the bits and bytes in the distance. I do miss those bits and bytes, and how they would always ground me in the comfort of tangible, creative deliverables. There are days when I’m a bit jealous of my engineering team, who get to dive in and immerse themselves in the bits and bytes every day. Ah well, business needs first.

But on the bright side, I get to pull the lens back and take a broader look at how we use data on the Web to put our strategies to work. In the past whirlwind year and a half I’ve overseen data flows from legacy data stores, ERP, isolated data silos, and files from all sorts of footlockers and broom closets, and I’ve had to conduct that data into new Web site venues and features, low- and high-volume e-commerce, unification of product documentation, community sites like BigAdmin, developer resource sites, and much, much more. The first thing that occurs to me, sitting at this lookout point, is that Sun has so much information that we somehow manage to squeeze outside our firewall through various tiny slits. We’re certainly ahead of the marketplace in opening up data to serve customers and partners, but we can do more, and I’m working to see that we do.

We all know that it’s now a much more collaborative marketplace, thanks to the Web. At Sun our marketplace contains some of the best brains in technology, and if we could open up more information in forms that they could easily digest, the possibilities are endless. The most obvious thing we need to provide is more Web feeds, in Atom and RSS, and it would be nice if we offered more data in JSON form, which is now one of the preferred inputs for mash-ups. In general we’d like to provide more content and data in source data formats such as well-defined XML and JSON. Right now too much of what we provide is in presentation formats such as HTML and PDF. And, in some cases, it is still all rolled up with the business rules that govern its current use.
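
As a small illustration of the difference between source and presentation formats, here is a hedged Python sketch that turns a flat XML record into JSON that a mash-up could consume directly. The `<product>` record, its element names, and the `product_to_json` function are hypothetical, made up for this example; they are not taken from any Sun feed or schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; element and attribute names are illustrative only.
PRODUCT_XML = """\
<product id="example-server">
  <name>Example Server</name>
  <description>Placeholder description for illustration.</description>
</product>
"""


def product_to_json(xml_text):
    """Convert a flat product element into a JSON object string."""
    element = ET.fromstring(xml_text)
    record = {"id": element.get("id")}
    for child in element:
        # Each child element becomes a key; surrounding whitespace is trimmed
        record[child.tag] = (child.text or "").strip()
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(product_to_json(PRODUCT_XML))
```

The point of the sketch is that a consumer gets named fields it can work with, rather than having to scrape them back out of HTML or PDF.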

But to get where you’re going it helps to remember where you’ve come from. I think the data architecture for Sun’s Web content, while not perfect, is in pretty good shape to expand its usage as ambitiously as I want. There are some interesting lessons in how we tamed the data to that extent. I gave a presentation at XML 2007 (my co-presenter Uche Ogbuji was not able to make it for unfortunate personal reasons), covering some of the work we’d done in data architecture and focusing on some of the lessons learned for managing collections of XML. The presentation was very well received, and that gives me the impression that we’re ahead of the curve in what we’ve accomplished behind the scenes, and that this doesn’t manifest enough in what you see on Sun’s Web sites. My experience at XML 2007 encouraged me to discuss such things more often here: not just some of the neat things we’re doing inside the firewall, but more on how we plan to put them to the service of Sun’s customers, Sun’s community and partners, and ultimately Sun’s strategy.


Passionate about data engineering strategy and solutions for Sun’s external web sites. Happiest when building taxonomies, data models, and high performing teams.

Kristen Harris
Web Data Engineering

