Friday May 29, 2009

Java DB 10.5.1.1 has been released

Java DB 10.5.1.1 has just been posted on the Java DB site. This release contains the exact same jar files as, and is functionally equivalent to, the recently released Apache Derby 10.5.1.1. In addition, you get

  • installers for Solaris, Linux and Windows
  • the option to register and track your Java DB installations in Sun Inventory
  • the option to buy support from Sun
For more information about what's new in Java DB 10.5.1.1, see my previous blog entries about Apache Derby 10.5.1.1, the Apache Derby site and the Java DB site.

Sunday May 03, 2009

Derby 10.5.1.1 is out

Apache Derby 10.5.1.1 was released two days ago. See the full announcement here.

In addition to lots of bug fixes and documentation fixes, there are a number of new features in this release, including

  • SQL roles: read more in my previous blog entry
  • SQL generated columns: read more in my previous blog entry
  • LOB improvements: many performance and usability improvements for BLOBs and CLOBs
  • Replication improvements: replication is now also possible with encrypted databases
  • SQL OFFSET/FETCH (aka limit functionality): read more in my previous blog entry
  • In-memory storage engine: read more in my previous blog entry
  • Update Statistics stored procedure: refresh outdated index cardinality statistics in an easier fashion; more info here and here, and see the example below
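
For example, refreshing the cardinality statistics for all indexes on a table might look like this (a sketch; the table name is made up, and passing NULL as the third argument updates the statistics for all of the table's indexes):

ij> call syscs_util.syscs_update_statistics('APP', 'MYTABLE', null);
0 rows inserted/updated/deleted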

Also, the index split deadlock frequently seen in previous releases has been fixed.

Monday Apr 27, 2009

Derby 10.5 preview: SQL roles

Derby has supported SQL authorization with GRANT/REVOKE since version 10.2. In the upcoming 10.5 release, this is taken one step further with the addition of the concept of roles defined by the SQL:2003 standard.

Before, with SQL authorization enabled, you'd need to grant each required privilege explicitly to each user. With roles, this can be simplified by creating different roles which are granted sets of privileges, and granting roles instead of privileges to the users.

So instead of granting the same four privileges to three different users, you grant the privileges to a role and grant the role to those users.

ij> create role ordinary_user;
0 rows inserted/updated/deleted
ij> grant select on t1 to ordinary_user;
0 rows inserted/updated/deleted
ij> grant select on t2 to ordinary_user;
0 rows inserted/updated/deleted
ij> grant update on t2 to ordinary_user;
0 rows inserted/updated/deleted
ij> grant execute on procedure p1 to ordinary_user;
0 rows inserted/updated/deleted
ij> grant ordinary_user to username1, username2, username3;
0 rows inserted/updated/deleted
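
Note that granting a role to a user doesn't automatically activate it. The role's privileges only take effect after the user has activated the role in the current session with the SET ROLE statement, as in this small sketch of a session where one of the users has connected:

ij> set role ordinary_user;
0 rows inserted/updated/deleted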

Later, if you want to take back the update privilege on T2 from those users, you simply revoke the privilege from the role, which is a simpler and less error-prone process than revoking the privilege from each single user.

ij> revoke update on t2 from ordinary_user;
0 rows inserted/updated/deleted

A role can also inherit privileges from another role. To create a superuser role which has all the privileges of the ordinary_user role, plus the privilege to insert rows into T1, do this:

ij> create role superuser;
0 rows inserted/updated/deleted
ij> grant ordinary_user to superuser;
0 rows inserted/updated/deleted
ij> grant insert on t1 to superuser;
0 rows inserted/updated/deleted

More detailed information about how to use SQL roles in Derby can be found in this section of Derby's reference manual.

Wednesday Apr 22, 2009

Derby 10.5 preview: FETCH/OFFSET

One of the most frequently requested features for Derby is the LIMIT/OFFSET syntax supported by many other databases. That syntax allows you to skip the first N rows from the result and return the next M rows. In the upcoming 10.5 release, it will finally be available to Derby's users too, except that it's called FETCH/OFFSET and uses the somewhat more verbose syntax dictated by the SQL:2008 standard.

The new clauses can be added to any SELECT statement, and they're defined like this:

[ OFFSET integer-literal {ROW | ROWS} ]
[ FETCH { FIRST | NEXT } [integer-literal] {ROW | ROWS} ONLY ]

(No, I'm not kidding, the standard actually requires you to type all those seemingly redundant keywords...)

You can use FETCH and OFFSET both alone and in combination with each other. For example, to select the three persons with the highest score from a table, you'd use a FETCH clause and no OFFSET clause, like this:

ij> SELECT NAME, SCORE FROM RESULTS ORDER BY SCORE DESC
> FETCH FIRST 3 ROWS ONLY;
NAME      |SCORE      
----------------------
John      |33         
Anne      |28         
Sue       |21         

3 rows selected

If you want to select all persons except the two with the highest score, you could use an OFFSET clause and skip the FETCH clause:

ij> SELECT NAME, SCORE FROM RESULTS ORDER BY SCORE DESC OFFSET 2 ROWS;
NAME      |SCORE      
----------------------
Sue       |21         
Peter     |19         
Bob       |7          

3 rows selected

And if you're only interested in the fourth person, you could use a SELECT statement that combines OFFSET and FETCH like this:

ij> SELECT NAME, SCORE FROM RESULTS ORDER BY SCORE DESC
> OFFSET 3 ROWS FETCH NEXT ROW ONLY;
NAME      |SCORE      
----------------------
Peter     |19         

1 row selected

The keywords FIRST and NEXT are synonyms, and so are ROW and ROWS, so all four of these clauses are allowed and have the same meaning:

FETCH FIRST 10 ROWS ONLY
FETCH NEXT 10 ROWS ONLY
FETCH FIRST 10 ROW ONLY
FETCH NEXT 10 ROW ONLY

SQL:2008 also allows you to have ORDER BY/OFFSET/FETCH clauses in subqueries, but Derby doesn't implement that yet, so you'll be limited to using them on the top level of your query for now.
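
For instance, a query along these lines, which uses FETCH FIRST in a derived table, is valid SQL:2008 but will be rejected by Derby 10.5 (reusing the RESULTS table from the examples above):

SELECT AVG(SCORE) FROM
  (SELECT SCORE FROM RESULTS
   ORDER BY SCORE DESC FETCH FIRST 3 ROWS ONLY) AS TOP3;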

If you're interested in helping out testing the Derby 10.5 release candidate, more information (including download links) can be found here.

Tuesday Apr 21, 2009

Derby 10.5 preview: In-memory backend

The upcoming Derby 10.5 release will contain a long-awaited feature: an in-memory storage backend. With this backend, your entire database will be stored in main memory instead of on disk.

But isn't the whole point of using a database that the data should be stored safely on disk or some other kind of persistent storage? Normally, yes, but there are cases where you don't really care if you lose the database when the application crashes.

For instance, if you are running unit tests against your application, it's probably more important that the tests run fast and that it's easy to clean up after the tests. With the in-memory storage backend, you'll notice that many database operations (like database creation, inserts and updates) are a lot faster because they don't need to access the disk. Also, there's no need to clean up and delete the database files after the tests have completed, since the database goes away when the application terminates.

So how is the in-memory backend enabled? That's simple: you just add the memory subprotocol to the JDBC connection URL, and no other changes should be needed in your application. If you normally connect to the database with the URL jdbc:derby:MyDB, you should instead use jdbc:derby:memory:MyDB (and of course add any of the connection attributes needed, like create=true to create the database). Here's an example in IJ, Derby's command line client:

$ java -jar derbyrun.jar ij
ij version 10.5
ij> connect 'jdbc:derby:memory:MyDB;create=true';
ij> create table my_table(x int);
0 rows inserted/updated/deleted
ij> insert into my_table values 1, 2, 3;
3 rows inserted/updated/deleted
ij> exit;

After exiting, you can verify that no database directory was created:

$ ls MyDB
MyDB: No such file or directory
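
The same thing can be done straight from JDBC, which is handy in unit tests. Here's a minimal sketch (the class and database names are made up; it assumes derby.jar is on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class InMemoryDbTest {
    public static void main(String[] args) throws Exception {
        // Load the embedded driver (not needed on Java 6 and later,
        // where the driver is autoloaded)
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        // Create a throwaway database in main memory
        Connection c = DriverManager.getConnection(
                "jdbc:derby:memory:TestDB;create=true");
        Statement s = c.createStatement();
        s.execute("create table my_table(x int)");
        s.execute("insert into my_table values 1, 2, 3");
        // ... exercise the code under test here ...
        c.close();
        // Nothing to clean up afterwards; the database simply
        // goes away when the JVM exits
    }
}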

More or less everything you can do with an ordinary database should be possible to do with an in-memory database, including taking a backup and restoring it. This can be useful, as it allows you to dump the database to disk before you shut down your application, and reload it into memory the next time you start the application. Looking again at the example above, you could issue this command before typing exit in IJ:

ij> call syscs_util.syscs_backup_database('/var/backups');
0 rows inserted/updated/deleted

Later, when you restart the application, you can load the database back into the in-memory store by using the createFrom connection attribute:

$ java -jar derbyrun.jar ij
ij version 10.5
ij> connect 'jdbc:derby:memory:MyDB;createFrom=/var/backups/MyDB';
ij> select * from my_table;
X          
-----------
1          
2          
3          

3 rows selected

Thursday Apr 16, 2009

Derby 10.5 preview: Generated Columns

In the Apache Derby community we're currently working on a new feature release (version 10.5) which we hope to have ready very soon now. While we're waiting for it to be properly tested and approved, I'll give a sneak preview of some of the features that we've added since 10.4. In this entry, I'll take a look at generated columns.

A generated column is a column whose value is defined as an expression. The expression may refer to other columns in the same table, and the value of a generated column will be automatically updated if any of the columns on which it depends are modified.

Let's look at a simple example. The statement below will create a table with three integer columns, where the last one is automatically generated as the sum of the first two columns.

ij> CREATE TABLE T(COL1 INT, COL2 INT, COL3 GENERATED ALWAYS AS (COL1+COL2));
0 rows inserted/updated/deleted

Now insert a couple of rows, and see that the value of the third column is calculated automatically:

ij> INSERT INTO T (COL1, COL2) VALUES (1,2), (3,4), (5,6);
3 rows inserted/updated/deleted
ij> SELECT * FROM T;
COL1       |COL2       |COL3       
-----------------------------------
1          |2          |3          
3          |4          |7          
5          |6          |11         

3 rows selected

Also, any updates of COL1 or COL2 will immediately trigger an update of COL3:

ij> UPDATE T SET COL2 = 100 WHERE COL1 = 1;
1 row inserted/updated/deleted
ij> UPDATE T SET COL1 = NULL WHERE COL1 = 5;
1 row inserted/updated/deleted
ij> SELECT * FROM T;
COL1       |COL2       |COL3       
-----------------------------------
1          |100        |101        
3          |4          |7          
NULL       |6          |NULL       

3 rows selected

A more practical use case for generated columns is case-insensitive search. Traditionally, this has been solved by converting the column in question to upper case on the fly in the SELECT statement, for example like this to find all values starting with 'c' regardless of case: SELECT ... WHERE UPPER(WORD) LIKE 'C%'

The traditional approach has some disadvantages:

  • The upper case values must be regenerated each time the query is performed
  • Derby doesn't currently let you create an index on an expression (like UPPER(WORD)), so the query must always go through the entire table to find the rows you are looking for

With generated columns, the upper case values are only generated once when the value is inserted or modified. You're also allowed to create an index on a generated column to speed up queries that access it. (Yes, you could achieve the same by using insert/update triggers to keep an ordinary, indexed column updated, but it's a lot easier with generated columns, and insert/update performance should also be better.) To make it possible to perform an efficient case-insensitive search on a column, add a generated column which contains an upper case copy of it and create an index on that column, as shown in the code below:

ij> CREATE TABLE WORDS(WORD VARCHAR(20), UWORD GENERATED ALWAYS AS (UPPER(WORD)));
0 rows inserted/updated/deleted
ij> CREATE INDEX IDX_UWORD ON WORDS(UWORD);
0 rows inserted/updated/deleted
ij> INSERT INTO WORDS(WORD) VALUES 'chocolate', 'Coca-Cola', 'hamburger', 'carrot';
4 rows inserted/updated/deleted
ij> SELECT WORD FROM WORDS WHERE UWORD LIKE 'C%';
WORD                
--------------------
chocolate           
Coca-Cola           
carrot              

3 rows selected

If you want to try it out yourself, you can download the latest Derby 10.5 release candidate (RC2 at the time of writing this).

Wednesday Dec 10, 2008

Java DB as a service on OpenSolaris 2008.11

The recently released OpenSolaris 2008.11 comes with Java DB 10.4.2.1. One of the new features compared to the Java DB version in OpenSolaris 2008.05 is support for SMF (Service Management Facility). Now it's very easy to start a Java DB Network Server and make it run as a service. SMF takes care of starting and stopping the server when the computer is booted or shut down. It will even attempt to restart the server process if it crashes. To enable the Java DB service, make sure the Java DB package is installed and then enable the service with svcadm(1M):

kah@opensolaris:~ % pfexec pkg install SUNWjavadb
...
kah@opensolaris:~ % svcadm enable javadb

Now check the status of the service. It should be online:

kah@opensolaris:~ % svcs -xv javadb
svc:/application/database/javadb:default (Java DB)
 State: online since Wed Dec 10 14:35:53 2008
   See: http://developers.sun.com/javadb/
   See: /var/svc/log/application-database-javadb:default.log
Impact: None.

Configuration options may be changed with svccfg(1M). For example, if you want the Java process in which the server is running to be started with -server -Xmx1024M, do this:

kah@opensolaris:~ % svccfg
svc:> select javadb
svc:/application/database/javadb> setprop javadb/java_args=("-server" "-Xmx1024M")
svc:/application/database/javadb> quit
kah@opensolaris:~ % svcadm refresh javadb
kah@opensolaris:~ % svcadm restart javadb

And finally, use ij to connect to the server:

kah@opensolaris:~ % java -jar /opt/SUNWjavadb/lib/derbyrun.jar ij
ij version 10.4
ij> connect 'jdbc:derby://localhost:1527/MyDB;create=true';
ij> select count(*) from sys.systables;
1          
-----------
20         

1 row selected
ij> 

Friday Dec 05, 2008

User-defined ordering of strings in Apache Derby

Apache Derby (and Java DB) has supported language-based ordering of strings (aka territory-based collation) since version 10.3.1.4. This means for instance that you can get an ORDER BY clause in an SQL query to sort the results alphabetically according to the rules of the language of your choice, instead of sorting the strings by the value of their Unicode codepoints, which is the default.

So when you're using the default codepoint-based ordering, your results would be ordered like this (note that 'a' comes after 'B'):

ij> connect 'jdbc:derby:TestDB;create=true';
ij> select * from (values 'a','A','B') as s(x) order by x;
X
-
A
B
a

3 rows selected

Using language-based ordering for the en_US locale would give you a more natural ordering of the results (both 'a' and 'A' come before 'B'):

ij> connect 'jdbc:derby:EnglishDB;create=true;territory=en_US;collation=TERRITORY_BASED';
ij> select * from (values 'A','B','a') as s(x) order by x;
X
-
a
A
B

3 rows selected

This feature is very useful as long as you're happy with the ordering provided by the locales supported by your Java Runtime Environment. But sometimes you may want to sort your strings based on some other rules than those in the languages supported by the JRE. The most common example is that users want case-insensitive string matching. Another example came up recently in this thread on derby-user, where a user wanted to modify the German collation rules to make Greek characters sort near their Latin equivalents ('α' near 'a', 'β' near 'b', and so on).

Derby currently (as of 10.4) doesn't allow you to define your own collation rules; it merely relies on the collators returned by java.text.Collator.getInstance(java.util.Locale). Fortunately, the JRE allows you to define new locales with their own collation rules. And it turns out to be quite easy: just perform these three simple steps:

  • Create a class that extends java.text.spi.CollatorProvider and returns a collator that orders strings the way you want it to
  • Create a text file named META-INF/services/java.text.spi.CollatorProvider which contains one line with the name of your collator provider class
  • Put the compiled class file and the text file in a jar file which you drop into your JRE's lib/ext directory or in one of the directories specified by the java.ext.dirs property

Let's look at an example. Collators can have different strengths, and the strength determines which differences are regarded as significant when comparing strings. Case difference is a tertiary difference, so to get a case-insensitive collator we can set the strength to secondary, and case differences will be ignored. If we want to define a variant of the en_US locale, say en_US_caseinsensitive, whose collators follow the ordering rules of the English language but ignore case differences, we could write this small CollatorProvider:

import java.text.Collator;
import java.text.spi.CollatorProvider;
import java.util.Locale;

public class MyCollatorProvider extends CollatorProvider {
    public Locale[] getAvailableLocales() {
        return new Locale[] {
            new Locale("en", "US", "caseinsensitive")
        };
    }
    public Collator getInstance(Locale locale) {
        Collator c = Collator.getInstance(Locale.US);
        // Ignore tertiary differences (case differences)
        c.setStrength(Collator.SECONDARY);
        return c;
    }
}
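
To compile the class and package it together with the services file in a jar, something along these lines should work (a sketch; the jar file name is made up, and the class is assumed to be in the default package):

$ javac MyCollatorProvider.java
$ mkdir -p META-INF/services
$ echo MyCollatorProvider > META-INF/services/java.text.spi.CollatorProvider
$ jar cf collators.jar MyCollatorProvider.class META-INF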

Then, in the directory where the jar file is stored, start ij (Derby's SQL client) with the following command:

$ java -Djava.ext.dirs=. -jar /path/to/derbyrun.jar ij
ij version 10.4
ij> 

Then create a database which uses the new locale that you just defined, and insert some test data and see that the string matching really is case-insensitive:

ij> connect 'jdbc:derby:NoCaseDB;territory=en_US_caseinsensitive;collation=TERRITORY_BASED;create=true';
ij> create table t (x varchar(12));
0 rows inserted/updated/deleted
ij> insert into t values 'hi!','Hi!','hello','Hello','HELLO','HeLlO';
6 rows inserted/updated/deleted
ij> select * from t where x = 'hi!';
X           
------------
hi!         
Hi!         

2 rows selected
ij> select distinct * from t order by x;
X           
------------
hello       
hi!         

2 rows selected
ij> select x, count(*) from t group by x;
X           |2          
------------------------
hello       |4          
hi!         |2          

2 rows selected

If you want to define your own custom ordering, like having Greek characters in between Latin characters, you could define another locale with a CollatorProvider that returns a RuleBasedCollator with whichever rules you want. See its class javadoc for details about how you specify rules. In its simplest form, a set of rules might look like "a,A < b,B < c,C", which means more or less that a and A should be sorted before b and B, which should be sorted before c and C. So to get the Greek characters sorted near similar Latin characters, define a CollatorProvider that looks like this one:

import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.text.spi.CollatorProvider;
import java.util.Locale;

public class MyCollatorProvider extends CollatorProvider {
    public Locale[] getAvailableLocales() {
        return new Locale[] {
            new Locale("en", "US", "greek")
        };
    }
    public Collator getInstance(Locale locale) {
        StringBuilder rules = new StringBuilder();
        // alpha should go between a and b
        rules.append("< a,A < \u03b1 < b,B");
        // beta should go between b and c
        rules.append("& b,B < \u03b2 < c,C");
        // add more rules here ....
        try {
            return new RuleBasedCollator(rules.toString());
        } catch (ParseException pe) {
            throw new Error(pe);
        }
    }
}

Again, put the compiled class and META-INF/services/java.text.spi.CollatorProvider in a jar file, and start ij with -Djava.ext.dirs=. in the directory where the jar file is located. Create a database which uses the new locale and insert some data with both Greek and Latin characters:

ij> connect 'jdbc:derby:GreekDB;territory=en_US_greek;collation=TERRITORY_BASED;create=true';
ij> create table t (x varchar(12));
0 rows inserted/updated/deleted
ij> insert into t values 'a', 'b', 'c', 'α', 'β';
5 rows inserted/updated/deleted
ij> select * from t order by x;
X           
------------
a           
α           
b           
β           
c           

5 rows selected

Look, the ordering was just like we wanted it to be, with the Greek characters in between the Latin ones, and not at the end where they would normally be located!

One final word of caution: If you ever update your custom CollatorProvider so that the ordering is changed after you have created a database, you will need to recreate the database. This is because the indexes in the database are ordered, and you may see strange results if the indexes are ordered with a different collator than the one your database is currently using.

Want to try this out yourself? Here's a Java class that implements the CollatorProvider for the two examples. Put it in a directory together with this Ant script, and execute ant to create a jar file with the required file in META-INF/services (requires Apache Ant 1.7.0 or later).

Wednesday Sep 10, 2008

Memcached UDF for Java DB on OpenSolaris

Trond Norbye recently blogged about how he installed Memcached user-defined functions for MySQL on OpenSolaris. Of course, I told him I could do the same thing in Java DB in less than half an hour, so now he won't stop nagging me until I can show him the code. So, here's how you can get Memcached user-defined functions for Java DB on OpenSolaris:

First, you need to make sure you have Java DB and Memcached installed. The versions in OpenSolaris should work fine. Just install them from pkg.opensolaris.org like this:

kah@tecra:~ % pfexec pkg install SUNWjavadb SUNWmemcached SUNWmemcached-java 
DOWNLOAD                                    PKGS       FILES     XFER (MB)
Completed                                    3/3       61/61     0.28/0.28 

PHASE                                        ACTIONS
Install Phase                                118/118 
PHASE                                          ITEMS
Reading Existing Index                           8/8 
Indexing Packages                                3/3

Next, you need a Java class with a couple of static methods that allow you to manipulate data on Memcached servers. I wrote four very simple methods, init(), set(), get() and delete(), to get access to the basic functionality:

import com.danga.MemCached.Logger;
import com.danga.MemCached.MemCachedClient;
import com.danga.MemCached.SockIOPool;

public class MemcachedUDF {
    public static void init(String servers) {
        SockIOPool pool = SockIOPool.getInstance();
        pool.setServers(servers.split(","));
        pool.initialize();
        // Make MemCachedClient less chatty
        Logger.getLogger(MemCachedClient.class.getName(), Logger.LEVEL_WARN);
    }
    public static void set(String key, String value) {
        MemCachedClient client = new MemCachedClient();
        client.set(key, value);
    }
    public static String get(String key) {
        MemCachedClient client = new MemCachedClient();
        return (String) client.get(key);
    }
    public static void delete(String key) {
        MemCachedClient client = new MemCachedClient();
        client.delete(key);
    }
}

Then compile the class:

kah@tecra:~/src % javac -classpath /usr/share/lib/java/java_memcached-release_2.0.1.jar MemcachedUDF.java

Finally, Java DB must be told how to call the methods. Put the required jar files and the directory where the MemcachedUDF class is located in your CLASSPATH variable. Then start the IJ client and execute CREATE FUNCTION and CREATE PROCEDURE statements to make the methods accessible from SQL.

kah@tecra:~/src % export CLASSPATH=/opt/SUNWjavadb/lib/derbyrun.jar:/usr/share/lib/java/java_memcached-release_2.0.1.jar:$HOME/src
kah@tecra:~/src % java org.apache.derby.tools.ij
ij version 10.3
ij> connect 'jdbc:derby:testdb;create=true';
ij> create procedure memcached_init(servers varchar(32672))
       language java parameter style java
       external name 'MemcachedUDF.init';
0 rows inserted/updated/deleted
ij> create procedure memcached_set(k varchar(32672), v varchar(32672))
       language java parameter style java
       external name 'MemcachedUDF.set';
0 rows inserted/updated/deleted
ij> create function memcached_get(k varchar(32672)) returns varchar(32672)
       language java parameter style java
       external name 'MemcachedUDF.get';
0 rows inserted/updated/deleted
ij> create procedure memcached_delete(k varchar(32672))
       language java parameter style java
       external name 'MemcachedUDF.delete';
0 rows inserted/updated/deleted

There! The UDFs have been defined, and are ready to be used. To test them, we also need a Memcached server to connect to. Simply execute the following command in another terminal window:

kah@tecra:~ % /usr/lib/memcached -p 12345 -l localhost

Let's see how the functions work by looking at an example. The SQL statements below initialize the UDFs and create a table with triggers that automatically update the cache as the contents of the table change.

ij> call memcached_init('localhost:12345');
0 rows inserted/updated/deleted
ij> create table my_table(id varchar(20) primary key, x varchar(20));
0 rows inserted/updated/deleted
ij> create trigger insert_trigger after insert on my_table
       referencing new as n
       for each row
       call memcached_set(n.id, n.x);
0 rows inserted/updated/deleted
ij> create trigger update_trigger after update of x on my_table
       referencing new as n
       for each row
       call memcached_set(n.id, n.x);
0 rows inserted/updated/deleted
ij> create trigger delete_trigger after delete on my_table
       referencing old as o
       for each row
       call memcached_delete(o.id);
0 rows inserted/updated/deleted

Now, if you insert a couple of rows into the table, the values will be readily available in the cache:

ij> insert into my_table values ('one', 'First row'), ('two', 'Second row');
2 rows inserted/updated/deleted
ij> values memcached_get('one'), memcached_get('two');
1
--------------------
First row
Second row

2 rows selected

Also, if you update or delete a row, the changes will be reflected in the cache:

ij> update my_table set x = 'First row - updated' where id = 'one';
1 row inserted/updated/deleted
ij> delete from my_table where id = 'two';
1 row inserted/updated/deleted
ij> values memcached_get('one'), memcached_get('two');
1
--------------------
First row - updated
NULL

2 rows selected

See, it works! :)

Tuesday Sep 09, 2008

Derby 10.4.2.0 has been released

Rick Hillegas has just announced that Apache Derby 10.4.2.0 has been released. From the announcement:

Derby 10.4.2.0 contains localizations for messages added in the previous feature release, 10.4.1.3. In addition, Derby 10.4.2.0 contains many bug and documentation fixes, including improvements to large object access, UNION queries, pooled connections, function arguments, metadata access, and statement caching.

See the full announcement here.

Friday May 16, 2008

JavaOne 2008

I spent last week at JavaOne in San Francisco. Although there were many interesting sessions, I found it even more interesting to be at the Java DB booth and talk to users and fellow developers who popped by. The general impression was that the users were very happy with Java DB/Apache Derby, especially that it was so easy to get it up and working. Not surprisingly, the most frequently requested feature was support for LIMIT and OFFSET in SQL. (Yes, I did tell them about the ROW_NUMBER function, but because of the current limitations of the implementation and the more verbose syntax, it didn't quite match what they wanted.)

Also, many of our new colleagues from MySQL visited us at the Java DB booth. The picture below shows Geir Høydalsvik (left) welcoming Mårten Mickos to the Java DB team. Mårten, we're still waiting for your first patch! :)

Sunday Apr 27, 2008

Derby 10.4.1.3 is released

Apache Derby 10.4.1.3 has just been released. This is the first official release on the 10.4 branch, and it introduces many new features, improvements and bug fixes, including:

  • Asynchronous replication with manual failover
  • Table functions that allow you to query non-relational data using SQL
  • Monitoring with Java Management Extensions (JMX)
  • Performance improvements (better scalability on multi-core machines and caching of session state and compiled statements in the network client driver)
  • Fix for a data corruption bug that could make the database unrecoverable

For download details and the full list of improvements, see the announcement from Dyre Tjeldvoll who was the release manager for this release.

Java DB 10.4.1, which is based on Apache Derby 10.4.1.3, has also been released and is available for download here.

Monday Feb 04, 2008

Derby documentation quick search in Firefox

Since Derby's documentation is spread across a number of different manuals, and I have never actually tried to learn which manual contains what, I usually end up using Google when I need to look something up.

So, say I wanted to look up the syntax for ALTER TABLE: I would type "derby alter table" in the search bar. However, the results from that search only contained links to sections about ALTER TABLE in old Derby manuals archived on the Apache site, plus a number of hits in forums and mailing list archives. To find it in the most recent manuals, I would therefore also append "site:db.apache.org/derby/docs/dev", which would limit the search to the alpha version manuals. That's tedious to type, though, and hits in the Portuguese translation of the manual tended to be at the top of the list (and I don't read Portuguese that well... :) ). Using "Advanced Search" and limiting the search to English pages would be even more tedious.

Finally, I got sufficiently tired of typing and scrolling, and came up with a quicker way of doing it. Firefox has this quick search function, which allows you to type a keyword that tells which search engine to use, followed by the search terms, and then Firefox automatically jumps to a URL that performs the search. To create my own quick search for the Derby documentation, I selected Bookmarks->Organize Bookmarks and added a bookmark in the "Quick Searches" folder with the following properties:

Name: Derby documentation
Location: http://www.google.com/search?q=%s+site:db.apache.org/derby/docs/dev&lr=lang_en
Keyword: derby

Now, I only need to go to Firefox's address bar and type "derby alter table", and Firefox sends me to this page, which shows exactly what I want.

Friday Dec 21, 2007

The importance of good hash functions

I have always been told that it is important to design hash functions carefully in order to ensure that they return values distributed evenly across the entire range of possible values. Otherwise, many distinct keys may map to the same bucket in the hash table, and hash lookups in the worst case degenerate into linked list traversals. Although this sounds like a very good piece of advice in theory, I had never seen a real-world example where a naive implementation of a hash function did any harm, until now.

I was making some changes to a stress test that ran simple join queries against an Apache Derby database. The changes were quite innocent: I was just changing the order in which the tables and indices used in the joins were created, so I didn't expect them to influence the test noticeably. But they did! When I got the test back up and running, its performance had dropped to less than 50% of what it was before.

My first thought was that the changed table/index order had somehow tricked Derby's query optimizer into choosing a different query execution plan than before. If the optimizer makes a bad decision (or an uninformed decision, like in this case recently discussed on the derby-user mailing list), performance drops of this size, or even greater, may be experienced. However, when I made Derby print the query plans, I couldn't see any difference at all.

The next step was to profile the test in order to find out why it took so long. I have found collect/analyzer from Sun Studio to be a valuable tool for such tasks. You simply start your application from the command line with "collect -j on" followed by the command you normally start it with:

  collect -j on java MyApp arg1 arg2 ...

When I opened the collected performance data in analyzer, one method stood out:

Okay, more than 60% of the CPU time was spent in RecordId.equals(), whose only job is to check the equality of four integer fields in two RecordId objects. Since it's a cheap method, it must have been called very frequently in order to consume that much of the total CPU time. Further investigation showed that almost all the calls to this method came from a call to ConcurrentHashMap.get() in Derby's lock manager, which used RecordId objects as hash table keys. However, there was nothing in the profiling data indicating that ConcurrentHashMap.get() was invoked more frequently now than before. So there had to be more calls to RecordId.equals() per call to ConcurrentHashMap.get().

Now, what could cause that? The only thing I could think of, was collisions in the hash table caused by a bad hash function. I inserted some print statements in the Derby code and found out that the test visited about 350 000 distinct records in the database. In the original version of the test, 350 000 distinct RecordId values mapped to only 12 000 distinct hash values, which is bad to start with. In the new version of the test, the number of distinct hash values had dropped even further down to 6000. When I looked at the calculation of the hash value in RecordId.hashCode(), it was pretty obvious how this could happen. The hash value was constructed by taking the bitwise XOR of these four integers:

  • recordId: a record number which is unique within the page on which the record exists
  • pageNumber: the id of the page containing the record
  • containerId: the id of the container (that is, table or index) containing the page
  • segmentId: uninteresting in this discussion, since it's always 0

Since all of these variables essentially are counters starting from 0, they tend to be in the lower value range, and all the higher bits are unused. When you take the bitwise XOR of such integers, the higher bits will still be unused, and you'll end up with lots of values clustered in a relatively small range. This basically means that collisions are bound to happen. When I changed the table and index creation order in the test, I must have been unlucky and got less variation in the containerId variable for the tables involved in the join, hence the lower number of distinct hash values and the increased frequency of collisions.
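
In other words, the old hash code was presumably computed more or less like this (a sketch based on the description above, not the verbatim Derby source):

    public int hashCode() {
        // XOR of four small counters stays small itself, so the
        // results cluster in a narrow range of values
        return recordId ^ pageNumber ^ containerId ^ segmentId;
    }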

So, now the problem had been identified: the hash function is biased. But how could it be fixed? I mentioned that I've always heard that careful design is needed for hash functions. Unfortunately, I must have missed the lecture where we were taught how to write good hash functions, if there ever was one. Luckily, after a couple of unsuccessful attempts to fix it, I discovered that NetBeans had this nice Insert Code menu.

I only needed to pick Generate hashCode() and select the fields I wanted to include in the hash code calculation. NetBeans then generated a RecordId.hashCode() method which looked like this:

    public int hashCode() {
        int hash = 7;
        hash = 89 * hash + pageId.hashCode();
        hash = 89 * hash + recordId;
        return hash;
    }

It doesn't look much more complex than the old method which only XOR'ed the fields, but thanks to the repeated addition and multiplication with a prime number, the hash values will be distributed across a wider range, even if the individual fields don't differ very much. (Apparently, this is very similar to the approach suggested by Joshua Bloch in his Effective Java Programming Language Guide.) I repeated the procedure for PageKey.hashCode(), which is called from RecordId.hashCode().

Then, when I reran the test with the modified Derby code, the performance was back at its normal level, more than twice as high as without the fixed hash functions. And the 350 000 records that previously mapped to 12 000 or 6000 distinct hash values, now mapped to 280 000 distinct hash values. Not bad to get such an improvement with just a couple of mouse clicks in NetBeans, and not writing a single line of code myself! :)
