Implementing Tags in a Database?

I was considering why del.icio.us doesn't offer a cloud for tag-based queries while it does offer one for user-based queries. The reply I received was that it was (probably) related to query cost, which set me thinking: what would the del.icio.us database look like?

[Figure: entity model]

Is this right? As the next paragraph explains, the version of the model above loses the requirement that a tag is used by a particular user to describe a bookmark.

Surely we have only three entities: User, Tag & Bookmark (also known as URL). Both User & Tag have a many-to-many relationship with Bookmark. We can resolve these many-to-many relationships in two ways: by using an allocation table, or by adopting a meta model.
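As a minimal sketch of the allocation-table approach, assuming SQLite via Python's standard sqlite3 module (all table, column and user names here are hypothetical), each many-to-many gets its own allocation table. The query shows the weakness described in the next paragraph: the schema has no way to say which user applied which tag, so the best it can report is every tag on every bookmark a user has saved.

```python
import sqlite3

# Hypothetical schema: each many-to-many is resolved by its own allocation table.
SCHEMA = """
CREATE TABLE users     (user_id     INTEGER PRIMARY KEY, name  TEXT UNIQUE NOT NULL);
CREATE TABLE bookmarks (bookmark_id INTEGER PRIMARY KEY, url   TEXT UNIQUE NOT NULL);
CREATE TABLE tags      (tag_id      INTEGER PRIMARY KEY, label TEXT UNIQUE NOT NULL);

-- User <-> Bookmark allocation
CREATE TABLE user_bookmark (
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    bookmark_id INTEGER NOT NULL REFERENCES bookmarks(bookmark_id),
    PRIMARY KEY (user_id, bookmark_id)
);

-- Bookmark <-> Tag allocation: no user column, so nobody owns the tag
CREATE TABLE bookmark_tag (
    bookmark_id INTEGER NOT NULL REFERENCES bookmarks(bookmark_id),
    tag_id      INTEGER NOT NULL REFERENCES tags(tag_id),
    PRIMARY KEY (bookmark_id, tag_id)
);
"""

def tags_attributed_to(conn: sqlite3.Connection, user_id: int) -> list:
    """Best this schema can do: every tag on any bookmark the user has saved."""
    rows = conn.execute(
        """
        SELECT DISTINCT t.label
        FROM user_bookmark AS ub
        JOIN bookmark_tag AS bt ON bt.bookmark_id = ub.bookmark_id
        JOIN tags AS t ON t.tag_id = bt.tag_id
        WHERE ub.user_id = ?
        ORDER BY t.label
        """,
        (user_id,),
    ).fetchall()
    return [label for (label,) in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executescript("""
INSERT INTO users VALUES (1, 'DaveLevy'), (2, 'another_user');
INSERT INTO bookmarks VALUES (1, 'http://www.guardian.co.uk');
INSERT INTO tags VALUES (1, 'UK'), (2, 'journalism');
-- Both users save the page; DaveLevy meant 'UK', another_user meant 'journalism'
INSERT INTO user_bookmark VALUES (1, 1), (2, 1);
INSERT INTO bookmark_tag VALUES (1, 1), (1, 2);
""")
```

Here `tags_attributed_to(conn, 1)` and `tags_attributed_to(conn, 2)` return the same list, `['UK', 'journalism']`, even though each user applied only one of those tags.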

The data model must take into account the fact that a User describes a bookmark using one or more tags. For example, DaveLevy describes http://www.guardian.co.uk as (UK web World News politics). Other people have used different tags, although only a few associated it with journalism. Each of these user/bookmark/tag relationships must be stored separately; otherwise we lose the user's relationship to, and ownership of, the tag set. This mandates that somewhere a user/bookmark/tag database object (either a table or an index) is required. It means that the bookmark entity must be related to the user by an allocation, which is required to implement the many-to-many relationship but also to allow the tags to be owned by a user. The tag must also have an allocation between the Bookmark & itself, but that allocation is owned by the defining user; i.e. again we must have a User/Bookmark/Tag intersection entity. The model immediately below meets this requirement, although implementing Tag as an attribute of the user/bookmark/tag relationship, held as a foreign key, is also a possibility.

[Figure: Dave's refined ERD]
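One possible relational rendering of the refined model is sketched below, again assuming SQLite via Python's sqlite3 module; the table names and the `describe` and `tags_for` helpers are hypothetical. The User/Bookmark/Tag intersection entity becomes the `user_bookmark_tag` table, whose composite foreign key hangs each tag off a particular user's bookmark allocation, so ownership of the tag set is preserved.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users     (user_id     INTEGER PRIMARY KEY, name  TEXT UNIQUE NOT NULL);
CREATE TABLE bookmarks (bookmark_id INTEGER PRIMARY KEY, url   TEXT UNIQUE NOT NULL);
CREATE TABLE tags      (tag_id      INTEGER PRIMARY KEY, label TEXT UNIQUE NOT NULL);

-- User/Bookmark allocation: the user's own copy of the bookmark
CREATE TABLE user_bookmark (
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    bookmark_id INTEGER NOT NULL REFERENCES bookmarks(bookmark_id),
    PRIMARY KEY (user_id, bookmark_id)
);

-- User/Bookmark/Tag intersection: each tag belongs to one user's bookmark
CREATE TABLE user_bookmark_tag (
    user_id     INTEGER NOT NULL,
    bookmark_id INTEGER NOT NULL,
    tag_id      INTEGER NOT NULL REFERENCES tags(tag_id),
    PRIMARY KEY (user_id, bookmark_id, tag_id),
    FOREIGN KEY (user_id, bookmark_id) REFERENCES user_bookmark(user_id, bookmark_id)
);
"""

def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the composite foreign key
    conn.executescript(SCHEMA)
    return conn

def _id(conn, table, id_col, key_col, value):
    """Insert the master row if it is new, then return its id."""
    conn.execute(f"INSERT OR IGNORE INTO {table}({key_col}) VALUES (?)", (value,))
    sql = f"SELECT {id_col} FROM {table} WHERE {key_col} = ?"
    return conn.execute(sql, (value,)).fetchone()[0]

def describe(conn, user, url, *labels):
    """Hypothetical helper: record that `user` describes `url` with `labels`."""
    uid = _id(conn, "users", "user_id", "name", user)
    bid = _id(conn, "bookmarks", "bookmark_id", "url", url)
    conn.execute("INSERT OR IGNORE INTO user_bookmark VALUES (?, ?)", (uid, bid))
    for label in labels:
        tid = _id(conn, "tags", "tag_id", "label", label)
        conn.execute("INSERT OR IGNORE INTO user_bookmark_tag VALUES (?, ?, ?)",
                     (uid, bid, tid))

def tags_for(conn, user, url):
    """Only the tags this user applied to this bookmark."""
    rows = conn.execute("""
        SELECT t.label
        FROM user_bookmark_tag AS ubt
        JOIN users     AS u ON u.user_id = ubt.user_id
        JOIN bookmarks AS b ON b.bookmark_id = ubt.bookmark_id
        JOIN tags      AS t ON t.tag_id = ubt.tag_id
        WHERE u.name = ? AND b.url = ?
        ORDER BY t.label
    """, (user, url)).fetchall()
    return [label for (label,) in rows]

URL = "http://www.guardian.co.uk"
conn = connect()
describe(conn, "DaveLevy", URL, "UK", "web", "World", "News", "politics")
describe(conn, "another_user", URL, "journalism")  # hypothetical second user
```

With this design `tags_for(conn, "another_user", URL)` returns only `['journalism']`, while DaveLevy's five tags stay attached to his own allocation.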

I have not documented the names of the relationships between Tag, Bookmark and their allocations. This is quite hard because we have transformed the entities into "operational masters", and they are likely to be implemented as indexes. The difficulty in naming the relationships between these entities and the allocation entities implies that we have modelled the problem well. Nor does the diagram above show which relationships are mandatory and which are optional. The Knows of relationship is optional: during the period between registration and the first bookmark, a User will have zero user/bookmark allocations. I suppose it is possible to enter a Bookmark without tagging it, which makes the Described by relationship optional as well. If either of the allocations exists, though, their masters must exist.
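The optionality rules above map directly onto foreign keys. The sketch below, assuming SQLite via Python's sqlite3 module and hypothetical table names, shows that a user with no allocations and a bookmark with no tags are both accepted, while an allocation whose master row does not exist is rejected. Note that SQLite only enforces foreign keys when `PRAGMA foreign_keys` is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on (per connection).
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users     (user_id     INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE bookmarks (bookmark_id INTEGER PRIMARY KEY, url   TEXT NOT NULL);
CREATE TABLE tags      (tag_id      INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE user_bookmark (
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    bookmark_id INTEGER NOT NULL REFERENCES bookmarks(bookmark_id),
    PRIMARY KEY (user_id, bookmark_id)
);
CREATE TABLE user_bookmark_tag (
    user_id     INTEGER NOT NULL,
    bookmark_id INTEGER NOT NULL,
    tag_id      INTEGER NOT NULL REFERENCES tags(tag_id),
    PRIMARY KEY (user_id, bookmark_id, tag_id),
    FOREIGN KEY (user_id, bookmark_id) REFERENCES user_bookmark(user_id, bookmark_id)
);
""")

def accepted(sql: str, params: tuple = ()) -> bool:
    """Run one INSERT; report False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

# "Knows of" is optional: a newly registered user needs no bookmarks.
new_user_ok = accepted("INSERT INTO users VALUES (1, 'new_user')")
# "Described by" is optional: a bookmark can be saved without any tag.
accepted("INSERT INTO bookmarks VALUES (1, 'http://www.guardian.co.uk')")
untagged_ok = accepted("INSERT INTO user_bookmark VALUES (1, 1)")
# Masters are mandatory: allocations pointing at missing rows are rejected.
orphan_bookmark_ok = accepted("INSERT INTO user_bookmark VALUES (1, 99)")
orphan_tag_ok = accepted("INSERT INTO user_bookmark_tag VALUES (1, 1, 42)")
```

The first two flags come out `True` and the last two `False`; the optional relationships need no special handling, because an allocation table with no rows for a master is simply valid.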

It should be noted that any domain of definition can be applied to the bookmark entity (or at least its key). By stating that the bookmark must be (say) a Roller article URL, we have a viable tag model for Roller articles. In fact, I have considered opening a new del.icio.us account exclusively to act as a blog index and to provide a tag/cloud map for this blog.
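One way to apply a domain of definition to the bookmark key is a CHECK constraint. The sketch below assumes SQLite via Python's sqlite3 module, and `BLOG_PREFIX` is a hypothetical placeholder rather than any real blog address; only URLs starting with that prefix can be stored as bookmarks.

```python
import sqlite3

# Hypothetical domain of definition: only entries under this prefix may be
# bookmarked. Substitute the real blog URL; it must not contain quotes.
BLOG_PREFIX = "https://blogs.example.org/entry/"

conn = sqlite3.connect(":memory:")
# DDL cannot take bound parameters, so the prefix is inlined into the CHECK.
conn.execute(f"""
    CREATE TABLE bookmarks (
        bookmark_id INTEGER PRIMARY KEY,
        url TEXT UNIQUE NOT NULL
            CHECK (substr(url, 1, {len(BLOG_PREFIX)}) = '{BLOG_PREFIX}')
    )
""")

def add_bookmark(url: str) -> bool:
    """Insert a bookmark; return False if it falls outside the domain."""
    try:
        conn.execute("INSERT INTO bookmarks(url) VALUES (?)", (url,))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False
```

`substr` is used rather than `LIKE` because SQLite's `LIKE` is case-insensitive for ASCII and treats `_` as a wildcard, either of which would loosen the domain.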

I am next going to look at some queries and the relational algebra that can be applied to this model; the reason I developed the model was to examine the performance implications of different entry points.
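As an illustration only (not a description of how del.icio.us actually works), the two cloud queries might look like this against the intersection entity, assuming SQLite via Python's sqlite3 module, a hypothetical `user_bookmark_tag` table with the tag label denormalised into it for brevity, and assuming a tag-based cloud means the tags that co-occur with a chosen tag. Entering by user is a single filtered scan served by the primary key; entering by tag needs a second index and a self-join back to the intersection table.

```python
import sqlite3

SCHEMA = """
CREATE TABLE user_bookmark_tag (
    user_id     INTEGER NOT NULL,
    bookmark_id INTEGER NOT NULL,
    tag         TEXT    NOT NULL,
    PRIMARY KEY (user_id, bookmark_id, tag)  -- serves the user entry point
);
-- Second access path for the tag entry point.
CREATE INDEX ubt_by_tag ON user_bookmark_tag (tag, bookmark_id);
"""

# User entry point: one user's taggings, grouped by tag.
USER_CLOUD = """
SELECT tag, COUNT(*) AS weight
FROM user_bookmark_tag
WHERE user_id = ?
GROUP BY tag
ORDER BY weight DESC, tag
"""

# Tag entry point: find bookmarks carrying the chosen tag, then join back to
# count every other tagging on those bookmarks (by any user).
TAG_CLOUD = """
SELECT other.tag, COUNT(*) AS weight
FROM user_bookmark_tag AS chosen
JOIN user_bookmark_tag AS other
  ON other.bookmark_id = chosen.bookmark_id
 AND other.tag <> chosen.tag
WHERE chosen.tag = ?
GROUP BY other.tag
ORDER BY weight DESC, other.tag
"""

def clouds(rows, user_id, tag):
    """Load (user_id, bookmark_id, tag) rows and return both clouds."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO user_bookmark_tag VALUES (?, ?, ?)", rows)
    return (conn.execute(USER_CLOUD, (user_id,)).fetchall(),
            conn.execute(TAG_CLOUD, (tag,)).fetchall())

# Hypothetical sample data: users 1 and 2, bookmarks 10 and 11.
ROWS = [
    (1, 10, "UK"), (1, 10, "politics"), (1, 11, "politics"), (1, 11, "web"),
    (2, 10, "journalism"), (2, 10, "politics"),
]
user_cloud, tag_cloud = clouds(ROWS, 1, "politics")
```

Because the tag query pairs every tagging of the chosen tag with every other tagging on the same bookmark, its intermediate result grows with the popularity of the bookmarks involved, which is consistent with the query-cost explanation, though it does not prove it. Prefixing each query with `EXPLAIN QUERY PLAN` in SQLite is one way to compare how the two entry points are executed.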


Comments:

You may be interested in semantic webs but I think Tim Berners-Lee is better qualified to take you forward than me: http://dig.csail.mit.edu/breadcrumbs/blog/4

Posted by Dominic Kay on January 04, 2006 at 09:59 PM PST #
