Monday Apr 02, 2007

Optimizing calendar timerange based queries

Whether talking about clients or servers, most calendar queries are based on a timerange (e.g. give me all events in calendar X for April 2007, or what is the freebusy time of user Y for next week ?). Depending on their requirements, some calendar software may need to optimize those queries. Nevertheless, this is not as simple as creating a couple of database indexes.

Timerange based queries depends on multiple inter related ical properties:

This is well summarized in the 4 tables defining the CalDAV time-range query in RFC 4791. Those tables describe the matching rules of a time-range query (defined by a timerange start and end time in UTC) for VEVENT, VTODO, VJOURNAL and VFREEBUSY: what component properties to take into account, how to interpret them, etc... These tables are part of the CalDAV base spec but they could almost appear in the iCalendar spec as they offer guidelines applicable to most calendar applications, regardless of the access protocol.

In any case, the matching rules are not trivial and this makes indexing a complicated task. There are some other properties of calendar components that create even more complexity.

Indexing infinity:

The most obvious obstacle to indexing is the existence of recurring components, especially recurring components that repeat forever (no COUNT or UNTIL parameter in their RRULE definition). Obviously, indexes can contain only a limited set of instances.

Some servers have a hard limit on the number of instances they are willing to store. Access protocols (whether WCAP or CalDAV) offer ways to retrieve this hard limit, so that client software can adjust their UI (or at least warn the end user about the limit).

Some servers recalculate a portion of their index as needed, counting on the fact that, in most cases, the timerange people are interested in varies over time (usually, one month back and a few month ahead).

Indexing moving targets: 

Also problematic is the existence of components expressed in floating time. Per RFC 2445 definition:

Floating DATETIME values are used to represent the same hour, minute, and second
value regardless of which time zone is currently being observed.
For example, an event can be defined that indicates that an individual will be
busy from 11:00 AM to 1:00 PM every day, no matter which time zone the person is

Most events/todos are either expressed in UTC time or reference a particular timezone (and hence each individual instance can be translated into UTC time). Since they share the same absolute time, it is easy to sort and index them, once and for all... or at least until timezone definitions are altered due to DST regulation changes.

When using floating time on the other hand, the corresponding absolute time is not known until a timerange query is made because we need a timezone to translate a floating time into absolute time and this timezone may vary over time:

  • the timezone to use may be part of the query (e.g. in WCAP or CalDAV queries),
  • or it may be a configurable parameter associated with the calendar container,
  • or it may come from the end user's desktop local timezone.
  • ...

To take an example, let's imagine that I have put on my laptop favorite calendar software a recurring event starting at noon to remind me to grab something to eat (no, I'm not suffering from any eating disorder). Let's make it last for 2 hours (I'm French, this is strict minimum). This event is expressed using floating time. I have also a 1 hour "team meeting" at 21:00 CET ( = 19:00 UTC for April 2nd 2007).

When in France, my list of events, ordered by UTC start time would look like:

20070402T100000Z 12:00 CET "Don't forget to eat" (for 2 hours)

20070402T190000Z 21:00 CET "Team meeting" (for 1 hour)

Now if I go visit my colleagues in California, after I change my laptop local timezone to PST, the same list of events will look like:

20070402T190000Z 12:00 PST "Team meeting" (for 1 hour)

20070402T190000Z 12:00 PST "Don't forget to eat" (for 2 hours)

The order has changed + I now have a conflict (you should know by now which one I'm going to decline).

In short, I can not create a 100% valid index containing calendar components with floating times.

The same can be said for calendar components with start/end time expressed as a DATE (20070402) only. They should be treated like floating start/end time.

Some more fun: 

A last example of hard to index components (there are probably others) is VTODO with no start or end property. Those should always be returned, regardless of the timerange requested in the query.




« April 2014