Friday Apr 18, 2008

Lovely data, lovely model

The most important step in refining data is writing down what makes it tick. In my last blog entry I outlined the process by which my team refined the data flow for Sun web sites, and mentioned the Unified Product Data Model (UPDM). UPDM is a formal means of organizing the information for Sun web sites, with an emphasis on product data. The goal in developing UPDM was to reduce the redundancy and inefficiency in the data sets for the Starlight publishing platform, and then to extend the benefits to the e-commerce sites, each of which happens to be hosted on a different platform.

UPDM provides definitions, basic business rules, and relationships between facets of product data, including the hierarchy and the attributes of each category, product, and part. It’s maintained in simple XML, which can be transformed to HTML, a spreadsheet, or even a UML diagram. The following listing is a snippet from the actual core UPDM 1.0 model used in Starlight.

<?xml version="1.0" encoding="UTF-8"?>
<data-model>
  <label>Data Model Browser - UPDM 1.0</label>
  <explanation>
   <p>The UPDM Data Model Browser describes concepts and attributes that are 
      core to the <b>Sun Unified Product Data Model</b>. This 
      version [UPDM 1.0] covers product data elements as they are represented 
      in <b>Sun.com, shop.sun.com, and Sun Catalogue</b>. Product elements 
      that describe transactions, implementation, or presentation are not 
      included in UPDM...</p>
  </explanation>

  <concept id="product">
    <label>Product</label>
    <explanation>Actual product entity.  Representation of the unit offered 
                 to the market by Sun (i.e. Sun Java System Application Server 
                 Platform Edition 9.0; Sun SPARC Enterprise M5000 Server.)
    </explanation>
    <implementation-guideline>
      Use the id as a stand-in for the product itself
    </implementation-guideline>
    <association ref="swordfish-id"/>
    <association ref="name">
      <constraint>Strictly syndicated through SwoRDFish</constraint>
    </association>
    <association ref="description"/>
    <association ref="image"/>
    <association>
      <concept id="plc-date">
        <label>Product life cycle date</label>
        <explanation>A date on which a change of PLC status occurs</explanation>
        <implementation-guideline>
          <data type="date"/>
        </implementation-guideline>
[...]
        <example>2006-10-10</example>
        <comment>Related to the price effectivity date</comment>
      </concept>
    </association>
  </concept>

  <concept id="industry">
    <label>Industry</label>
    <explanation>Industry for which suited or targeted</explanation>
    <association ref="swordfish-id"/>
    <implementation-guideline>
      Use the SwoRDFish ID as a stand-in for the industry itself
    </implementation-guideline>
  </concept>
[...]
</data-model>
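
To make the "transform it to a spreadsheet" idea concrete, here is a minimal sketch in Python. It is not the tooling we actually used. The sample below is a trimmed, well-formed stand-in for the listing above with the elided parts left out, and the function names and column layout are my own assumptions for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed stand-in for the UPDM listing; elided parts are simply omitted.
SAMPLE = """<data-model>
  <label>Data Model Browser - UPDM 1.0</label>
  <concept id="product">
    <label>Product</label>
    <association ref="swordfish-id"/>
    <association ref="name">
      <constraint>Strictly syndicated through SwoRDFish</constraint>
    </association>
    <association>
      <concept id="plc-date">
        <label>Product life cycle date</label>
      </concept>
    </association>
  </concept>
  <concept id="industry">
    <label>Industry</label>
    <association ref="swordfish-id"/>
  </concept>
</data-model>"""


def association_rows(model_root):
    """Yield (concept id, label, associated concept id, constraint) rows."""
    for concept in model_root.findall("concept"):  # top-level concepts only
        label = concept.findtext("label", default="")
        for assoc in concept.findall("association"):
            # An association either points at a concept by ref
            # or nests the associated concept inline.
            target = assoc.get("ref")
            if target is None:
                nested = assoc.find("concept")
                target = nested.get("id") if nested is not None else ""
            constraint = assoc.findtext("constraint", default="")
            yield concept.get("id"), label, target, constraint


def to_csv(xml_text):
    """Flatten the model into CSV text that a spreadsheet can open."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["concept", "label", "association", "constraint"])
    writer.writerows(association_rows(root))
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(SAMPLE), end="")
```

Each row is one association, so a constraint such as the SwoRDFish syndication rule sits next to the attribute it governs. An HTML or UML rendering would walk the same tree and only change the output step.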

We developed UPDM 1.0 at a time when there were not as many good options for expressing such data models. RDF and XMI carried too much baggage, and we wanted something simple and clear. RDF does play an important role in how things are bound together in the implementation, though, as I’ll discuss another day. Again, we can generate all these other representations as needed. As an example, the following picture is a UML class diagram generated from the XML above.

OK, a bit of an eye chart. But when you develop a broad data model such as UPDM, it’s a real eye-opener, and you gain more than just the end product. You learn a lot about which business problems and business rules are not well expressed anywhere and are only to be found in someone’s head. Sometimes you learn about the key tensions between how different roles and departments interpret and process information.

To take one example, at Sun the hardware we actually sell, the SKUs, are called “parts” in the marketing department, including e-commerce. In many other departments, and in a lot of the vendor software we use, these are called “products”. We don’t really market at this level, though. We market families of these, such as “SunFire T2000”, and these are what we call “products” and what others call “product families”.

UPDM itself doesn’t provide any magic to reconcile such differences. It does provide the best you can hope for – a framework for writing down the knowledge so it’s open, shared, and even accessible through code. Then you have half a chance to build some magic on top of the model.
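
As a small illustration of what "accessible through code" can mean, here is a hedged sketch of a lookup that returns what the model records about one concept. The sample model text is a cut-down stand-in, and the `describe` function and its return shape are hypothetical, not part of UPDM or Starlight.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down model text in the shape of the UPDM listing.
MODEL = """<data-model>
  <concept id="product">
    <label>Product</label>
    <explanation>Actual product entity.</explanation>
    <association ref="name">
      <constraint>Strictly syndicated through SwoRDFish</constraint>
    </association>
  </concept>
  <concept id="industry">
    <label>Industry</label>
    <explanation>Industry for which suited or targeted</explanation>
  </concept>
</data-model>"""


def describe(root, concept_id):
    """Return the label, explanation, and constraints recorded for a concept."""
    # iter() also reaches concepts nested inside associations.
    for concept in root.iter("concept"):
        if concept.get("id") == concept_id:
            explanation = concept.findtext("explanation", default="")
            return {
                "label": concept.findtext("label", default=""),
                # Collapse the line breaks and indentation of the XML source.
                "explanation": " ".join(explanation.split()),
                # Map each constrained association to its constraint text.
                "constraints": {
                    assoc.get("ref"): assoc.findtext("constraint")
                    for assoc in concept.findall("association")
                    if assoc.find("constraint") is not None
                },
            }
    return None  # the model has no concept with this id


if __name__ == "__main__":
    print(describe(ET.fromstring(MODEL), "product"))
```

A lookup like this lets a tool show the agreed definition of "product" next to whatever a department or vendor package happens to call it. The reconciling itself still has to be built on top.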

About

Passionate about data engineering strategy and solutions for Sun’s external web sites. Happiest when building taxonomies, data models, and high performing teams.

Kristen Harris
Director,
Web Data Engineering
