Even More On Cache & VDS
By Mark Wilcox - OVD Product Manager on Feb 15, 2009
Ash asked me to comment on a couple of responses to his blog. This post is in response to the post made by the Identity Data Delivery blog.
First off, while it’s true there are times when cache makes no sense, there are other times when it really does. Cache is used everywhere in your PC, software, servers, EVERYWHERE - so arguing against cache seems completely strange to me.
Nobody is arguing against cache. As I have mentioned, OVD leverages the benefit of EXISTING caches in the data-stores and applications. Thus any additional cache is not going to have a large pay-off in the most common non-telco IT case. And that is just talking about a traditional, in-memory cache. A persistent cache goes further: it requires actually copying, synchronizing and securing the data, plus getting enough hard-drive space and backups.
Second, I always find it funny to hear the arguments against having more options. Why argue against choice and options? I can cite many projects that have been deployed with caching using virtual directories, and yes, this includes "persistent" cache.
Nobody is arguing against options. The specific question is "do you need a persistent cache in the most common virtual directory scenarios?". And the answer is a definite no. And as I pointed out here - even if you think you might need a type of persistent cache - it's better to make that an enterprise architectural choice so that you pick the right solution.
Third, cache is necessary because when you merge multiple tables (join) across different databases (or directories for that matter), the results are just not fast enough for any type of security application. Anyone familiar with databases should understand this quickly. Once you join several objects or tables, the response rate of the source is dramatically reduced. The joins necessary to create views are sometimes too complex to do on the fly for most directory-enabled applications, such as would be common for IdM/security. This, in my mind, is a key functionality of virtual directories after aggregating sources for a common protocol.
Yes - aggregating heterogeneous sources without copying them into a central store is the primary reason to use a virtual directory. And we have many production instances where data is spread across multiple sources, and we have never needed a persistent cache to meet performance requirements. This is because modern data-sources are very fast, applications are better at keeping session data, and OVD can optimize the queries so that if a data-store is not going to contain the answer, it's not searched.
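To make the "it's not searched" point concrete, here is a minimal sketch of namespace-based routing. It is not OVD's actual implementation. The `Adapter` class, the adapter names and the example DNs are all hypothetical, and the DN handling is deliberately naive (it ignores escaped commas). The idea is simply that a back-end is queried only when its namespace overlaps the search base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adapter:
    """Hypothetical back-end mapped into the virtual tree at base_dn."""
    name: str
    base_dn: str


def normalize(dn):
    # Naive normalization: lower-case and trim each RDN.
    # Assumption: no escaped commas inside attribute values.
    return ",".join(part.strip().lower() for part in dn.split(","))


def is_under(dn, ancestor):
    """True if dn equals ancestor or sits somewhere below it."""
    dn, ancestor = normalize(dn), normalize(ancestor)
    return dn == ancestor or dn.endswith("," + ancestor)


def adapters_to_search(search_base, adapters):
    """Keep only adapters whose namespace can contain matching entries."""
    return [
        a for a in adapters
        if is_under(search_base, a.base_dn) or is_under(a.base_dn, search_base)
    ]


adapters = [
    Adapter("hr_ldap", "ou=People,dc=example,dc=com"),
    Adapter("crm_db", "ou=Customers,dc=example,dc=com"),
]

# A lookup under ou=People never touches the CRM database.
hits = adapters_to_search("uid=jdoe,ou=People,dc=example,dc=com", adapters)
print([a.name for a in hits])
```

A search rooted at `dc=example,dc=com` would still fan out to both adapters, which is where join rules and search filters would need to do further pruning.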
Fourth, 2 to 5 milliseconds can be a big deal, and cache is essential to eliminating that lag. Think about it, if I have to search for a member in a directory, and then search a database table for additional attributes to join to this object — do you really think it will perform at close to the same speed? And that is with just two sources...imagine the performance hit you’d take by adding additional sources and multiple join operators.
I do not know of any production scenario where the overhead of 2-5 milliseconds is an issue. OVD is already being used in production telco scenarios and we're actively involved with several Unified User Profile type applications for the cable/ISP/telco market. And in every case 2-5 milliseconds is not only OK - it's an order of magnitude lower than the specified requirement (usually around 20-25 ms). Also, in scenarios where joins are involved, the data can usually be pre-fetched. For example, notification of an incoming customer call can prompt a notification to OVD to fetch the customer record prior to any application using the data. Thus it's cached in memory but not stored. Remember, the argument here is on "persistent cache", not cache in general.
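As an illustration of the pre-fetch idea, here is a rough sketch of a short-lived, in-memory-only cache that is filled by an external trigger. It is an assumption-laden example, not how OVD does it: `PrefetchCache`, the `fetch` callable and the TTL value are all hypothetical names and numbers chosen for the example.

```python
import time


class PrefetchCache:
    """In-memory only: entries expire after ttl_seconds and never hit disk."""

    def __init__(self, fetch, ttl_seconds=30.0, clock=time.monotonic):
        self._fetch = fetch      # hypothetical callable: key -> joined record
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}       # key -> (expires_at, record)

    def prefetch(self, key):
        """Called on a trigger, e.g. an incoming-call notification."""
        self._entries[key] = (self._clock() + self._ttl, self._fetch(key))

    def get(self, key):
        """Return the pre-fetched record if still fresh, else fetch on demand."""
        entry = self._entries.get(key)
        if entry is not None and self._clock() < entry[0]:
            return entry[1]
        self._entries.pop(key, None)  # drop the stale entry, if any
        return self._fetch(key)


def join_customer(customer_id):
    # Stand-in for the real directory + database join.
    return {"id": customer_id, "plan": "gold"}


cache = PrefetchCache(join_customer, ttl_seconds=30.0)
cache.prefetch("cust-42")      # the call notification arrives
print(cache.get("cust-42"))    # the application reads it moments later
```

Nothing here is copied, synchronized or backed up. If the process restarts, the entry is gone and the next read simply goes back to the sources.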
When your directory is expected to perform at 8000 queries a second, adding 2 to 5 milliseconds can be a VERY big deal. OK, let's keep the math simple and take a closer look what the problem is…
- I have a directory that performs at 5000 q/sec
- That translates into 0.2 milliseconds per query (1/5000 of a second)
- Your "overhead" is 2 milliseconds (the best performance cited)
- My queries now take 2.2 milliseconds (11 times slower)
- Now instead of 5000 q/sec when I access my directory I get only 455 q/sec
The blogger made a fanciful leap. The 2 ms I mentioned is per-query overhead, and it stays consistent even at several thousand queries per second. The math above treats every query as if it had to wait for the previous one to finish, which is not how a server under load works. We have a customer in EMEA who authenticates all of their 3G phones via OVD without any cache in OVD. So this response is simply false.
Fifth, the idea that you compromise the "freshness" of data for the sake of speed misses the point about what sort of information we’re dealing with here. We’re dealing mostly with identity using directories (people and other objects), and the identities themselves do not change very often in comparison to other data, such as transactions where updates and write operations are more common than search/query. For example, in your bank account, your “identity” information (name, address, phone, pin number, passwords, etc) changes far less than your balance and activity.
If the only value a virtual directory provided was real-time access to data - then this would be a valid point. However, the primary value a virtual directory brings is not data-freshness. Rather it's that you get an aggregated view of identity data without needing to consolidate it into a single data-store. If your virtual directory requires a persistent cache then it is not living up to the promise of a virtual directory and instead is selling you a meta-directory disguised as a virtual directory.
And if you want a meta-directory or similar type of centralized identity store - that is fine. It's a valid architecture, and it is not even necessarily an either/or problem (e.g. either use meta or use virtual).
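The difference between latency and throughput can be checked with a few lines of arithmetic. The sketch below uses the blogger's own numbers and assumes queries are independent and can be in flight at the same time (Little's law: throughput = requests in flight / latency). The function names are mine, and this is an idealized model, not a benchmark of OVD.

```python
import math

MICROS_PER_SECOND = 1_000_000


def serial_qps(latency_us):
    """Throughput if each query had to wait for the previous one."""
    return MICROS_PER_SECOND / latency_us


def concurrent_qps(latency_us, in_flight):
    """Throughput with in_flight independent queries being served at once."""
    return in_flight * MICROS_PER_SECOND / latency_us


def in_flight_needed(target_qps, latency_us):
    """How many concurrent queries sustain target_qps at this latency."""
    return math.ceil(target_qps * latency_us / MICROS_PER_SECOND)


base_us = 200        # 0.2 ms per query at the source (5000 q/sec, serial)
overhead_us = 2000   # the 2 ms of added latency
total_us = base_us + overhead_us

print(round(serial_qps(total_us)))        # 455, the blogger's figure
print(in_flight_needed(5000, total_us))   # 11
print(concurrent_qps(total_us, 11))       # 5000.0
```

So the 455 q/sec figure only holds for a single client issuing one query at a time. With roughly a dozen queries in flight, the same 2.2 ms latency supports the original 5000 q/sec, provided the back-end sources can keep up.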
And at Oracle we have products that can help you solve that type of problem - it's just that they are not virtual directories.