Implementing Tags in a Database??
By DaveLevy on Jan 03, 2006
I was considering why del.icio.us doesn't offer a cloud for tag based queries, while it does offer the cloud for User based queries, and I received the reply that it was (probably) related to query cost, which set me thinking. What would the del.icio.us database look like?
Is this right, look at the para below, the requirement that a tag is used by someone to describe the bookmark is lost in above version of the model.
Surely we have only three entities, User, Tag & Bookmark (also known as URL). Both User & Tag have a many to many relationship with each Bookmark. We can resolve these many2manies in two ways, by using an allocation table, or by adopting a meta model model.
The data model must take into account the fact that a User describes a
bookmark using one or more tags. i.e.
http://www.guardian.co.uk as (UK web
World News politics). Other people have used different tags, although only a
few associated it with journalism. Each of these user, bookmark, tag
relationships must be stored seperately; otherwise we loose the user's
relationship and ownership of the tag set. This implies/mandates that somwhere
a user/bookmark/tag database object (either table or index is required). This
means that the bookmark entity must be related to the user by an allocation.
This is required to implement the many-to-many, but also to allow the tags to
be owned by a user. The tag must also have an allocation between the Bookmark
& itself but is owned by the defining user i.e. again we must have a
User/Bookmark/Tag intersection entity. The model immediately below meets
this requirement, although implementing the Tag attribute as a membership of
the tag - user/bookmark relationship and implementing it as a foreign key is
also a possibility.
I have not documented the name of the relationship between tag, bookmark and
their allocations. This becomes quite hard because we have transformed the
entity into "operational masters" and they are likely to be
implemented as indexes. The difficulty in naming the relationship between these
entites and the allocation entities implies that we have modelled the problem
well. Neither does the diagram above illustrate mandatory/optional attributes of the relationships. The
Knows of relationship is optional. During the period between registration and the first bookmark, a User will have zero user/bookmark allocations. I suppose it is possible to enter a Bookmark without tagging it, which makes the
Described by relationship optional as well. If either of the allocations exist though, their masters must exist.
It should be noted that any domain of definition can be applied to the bookmark entity (or at least its key). By stating that the bookmark must be (say) a roller article URL, we have a viable tag model for roller articles. In fact, I have considered opening a new del.icio.us account exclusively to act as a blog index and to provide a tag/cloud map for this blog.
I am next going to look at some queries and the relational algebra that can be applied to this model; the reason I developed the model was to examine the performance implications of different entry points.