Thursday Apr 18, 2013

New Derby feature release: 10.10.1.1

The Derby community voted to approve Apache Derby 10.10.1.1 earlier this week, and now it has been released. Get it from http://db.apache.org/derby/derby_downloads.html and try it out for yourself!

This release contains a number of new features, as well as bug fixes and other improvements. Here are the highlights from the release announcement:

  • JDBC 4.2 - Derby supports the Java 8 enhancements to JDBC.
  • Small device profile - Derby runs on the small CP2 profile of Java 8. See the corresponding JEP at http://openjdk.java.net/jeps/161
  • User-defined aggregates - Applications can create their own aggregate operators (a sketch follows this list). See the "CREATE DERBY AGGREGATE" section in the Derby Reference Manual.
  • Varargs routines - SQL routines can be bound to user-written Java methods which have variable length argument lists.
  • Optional tools - Derby has new, optional tools to support data migration and metadata queries. See the "Optional tools" section in the Derby Tools and Utilities Guide.
  • SYSCS_UTIL.SYSCS_INVALIDATE_STORED_STATEMENTS - Derby has a new system procedure for invalidating stored prepared statements when you think that your metadata queries or triggers are misbehaving--for example if they throw a NoSuchMethodError on execution. See the section on this new system procedure in the Derby Reference Manual.
  • Faster query compilation - A number of changes have made SQL compilation faster.
  • Unencrypting databases - Encryption can be turned off on existing databases. See the section on the decryptDatabase attribute in the Derby Reference Manual.
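
To give a taste of the user-defined aggregates, here's a minimal sketch (the aggregate, class and column names are invented; see the Reference Manual for the authoritative syntax). An aggregate is a Java class implementing the org.apache.derby.agg.Aggregator interface:

import org.apache.derby.agg.Aggregator;

// Hypothetical aggregate that returns the longest string in a column.
public class LongestString
        implements Aggregator<String, String, LongestString> {
    private String longest;

    public void init() { longest = null; }

    public void accumulate(String value) {
        if (longest == null || value.length() > longest.length()) {
            longest = value;
        }
    }

    public void merge(LongestString other) {
        if (other.longest != null) {
            accumulate(other.longest);
        }
    }

    public String terminate() { return longest; }
}

Once the class is on the classpath, the aggregate is declared and invoked from SQL like this:

CREATE DERBY AGGREGATE LONGEST_STRING FOR VARCHAR(32672)
    EXTERNAL NAME 'LongestString';

SELECT LONGEST_STRING(NAME) FROM CUSTOMERS;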


Wednesday Oct 26, 2011

Derby 10.8.2.2 released

Apache Derby 10.8.2.2 was released earlier today. From the release announcement:

The Apache Derby project is pleased to announce a new bug-fix release, 10.8.2.2. 

Apache Derby is a subproject of the Apache DB project. Derby is a pure
Java relational database engine which conforms to the ISO/ANSI SQL and
JDBC standards. Derby aims to be easy for developers and end-users to
work with.

Derby 10.8.2.2 can be obtained from the Derby download site:
 http://db.apache.org/derby/derby_downloads.html.

Please try out this new release.

Wednesday Jan 05, 2011

OpenGrok 0.10

We're proud to announce that OpenGrok 0.10 has just been released. See the release notes below for details on how to get it and what's new in this release.

OPENGROK 0.10 RELEASE NOTES

OpenGrok is a fast and usable source code search and cross reference engine. It helps you search, cross-reference and navigate your source tree. It can understand various program file formats (C, C++, C#, Java, sh, Perl, Python, ...) and version control histories like Mercurial, Git, SCCS, RCS, CVS, Subversion, Teamware, ClearCase, Perforce, Monotone and Bazaar. In other words it lets you grok (profoundly understand) the open source, hence the name OpenGrok. It is written in Java.

For more details go to http://www.opensolaris.org/os/project/opengrok/

To download the binaries, please go to http://hub.opensolaris.org/bin/view/Project+opengrok/files#HBinarysourcereleases and look for the 0.10 binary, source, or package files.

This is a major update. It is advised that you clean up your old DATA_ROOT (ideally remove it, or move it aside) and rerun the indexing from scratch. The web application should also be replaced, since the new indexes are not compatible with the old web application. Note that this also applies to upgrades from the 0.10 release candidate, since the format of the xrefs changed between the release candidate and the release.

For install help refer to the install wiki page. The discussions page lists forums and mailing lists where you can ask for help or discuss bugs or new features. Bugs should be logged in Bugzilla in the Development/OpenGrok category.

New features

  • python analyzer, perl analyzer
  • listing of variables/methods for quicker navigation
  • sh analyzer xrefs improved
  • SQL analyzer updated to SQL:2008
  • UI usability improvements
  • updated Lucene to 3.0.2
  • introduced Maven as alternative build system to Ant

Bug fixes

  • Bug #5590: Unhighlighted difference in compare list
  • Bug #11264: Add a python specific language analyzer.
  • Bug #13184: provide quick links to annotate,history and download from search results view: diff link for history search
  • Bug #13184: provide quick links to annotate,history and download from search results view
  • Bug #13777: Indexer should print "Adding: ..." before it adds a file
  • Bug #14545: clicking on annotate doesn't preserve line number
  • Bug #14663: broken syntax highlight in ksh script when ` is used (and when " is escaped)
  • Bug #14924: Java method calls incorrectly classified as definitions
  • Bug #15468: Log useful info from indexer regardless of -v
  • Bug #15661: OpenGrok should support new ksh command substitution x=${ cmd ; }
  • Bug #15890: Missing anchor in xref
  • Bug #15938: Analyzers should only use JFlex's public API
  • Bug #15947: Detect GNU CSSC in addition to SCCS
  • Bug #15948: Fail if svnadmin is missing: tests depend on it
  • Bug #15954: teamware repository not detected anymore
  • Bug #15956: indexer fails on indexing
  • Bug #15961: Allow certain symlinks to be followed to help with using clearcase dynamic views
  • Bug #16006: Indexer's -P option removes customization of existing projects
  • Bug #16070: our ctags parser fails to parse a c signature
  • Bug #16076: problem of ELFAnalyzer in opengrok 0.9
  • Bug #16334: Indexer hangs when files contain 0x1A characters
  • Bug #16370: build.xml: Check for JFlex.jar and taskdef for jflex-task use different classpath
  • Bug #16451: Tray app doesn't work with other protocols than jmxmp
  • Bug #16454: Trayapp shuts down when a window is closed
  • Bug #16465: SVN history of filenames containing @ cannot be retrieved
  • Bug #16481: NumberFormatException when opening Connection Settings the second time
  • Bug #16611: svn repository with svn:// protocol as urn results in NPE when reading history
  • Bug #16660: Perforce repositories return months 1-12, not 0-11
  • Bug #16677: revamp option -v
  • Bug #16692: Index files left open after multiple project search
  • Bug #16760: Generating incorrect links for C/C++ "#include" statements if extension of a header file is not on the lexer's list
  • Bug #16761: Perforce history reader would not use per-project clients (defined by P4CONFIG files)
  • Bug #16764: Perforce source annotation: the date is one month ahead, time information is not available
  • Bug #16772: status.jsp should be less chatty in production environments
  • Bug #16783: more.jsp garbles matches on lines crossing internal JFlex buffer boundary
  • Bug #16786: Control for User Information Provider URL (userPage)
  • Bug #16833: Subversion repositories disabled by default
  • Bug #16848: Search context is displayed in lower case
  • Bug #16883: ShXref should reset stack on reuse
  • Bug #16891: Unneeded files included in opengrok.jar
  • Bug #16892: Maven build includes management client source in jar
  • Bug #16893: Maven build includes SQLXref.lex
  • Bug #16909: Move JFlex generated sources to default location (Maven)
  • Bug #16913: OpenGrok fails to use JDBC cache if -R option is specified
  • Bug #16961: Definitions should have serialVersionUID
  • Bug #16986: Relative path for data root and source root has stopped working
  • Bug #17074: Home link no longer works
  • Bug #17127: OpenGrok leaving too many open files
  • Bug #17128: add more syntax highlighting styles and list all definitions/functions
  • Bug #17169: enable wildcard prefix search in OpenGrok script by default
  • Bug #17294: show progress on demand
  • Bug #17349: path needs to be percent encoded
  • Bug #17373: breadcrumb path needs to be URI encoded
  • Bug #17374: history links need percent encoding
  • Bug #17375: annotate link needs percent encoding
  • Bug #17376: download link needs percent encoding
  • Bug #17378: Please remove email obfuscation
  • Bug #17550: Malformed search for query with national chars
  • Bug #17551: Double-click on some project in project list doesn't work in Internet Explorer
  • Bug #17554: Empty "Navigate" window in Perl code
  • Bug #17582: context shown in hits doesn't show symbol type for non lowercased symbols
  • Bug #17631: Cookie name "OpenGrok/project" is a reserved token

For bug details refer to Bugzilla.

The road ahead

A tentative roadmap for OpenGrok can be found here: http://www.opensolaris.org/os/project/opengrok/Roadmap/

Contributors

We'd like to thank all the people who report bugs, send us ideas for improvements or contribute code. Below are just the people who sent patches or pushed code, but that doesn't mean that we don't appreciate your work. Reports and feedback from the users of OpenGrok are essential to enable the developers to continue improving the product. Note that this is just a summary of how many changes each person made, which doesn't necessarily reflect how significant each change was. Inspect the history log with hg log -v -r 0.9:0.10 to get all the details.

68 Knut Anders Hatlen <knut.hatlen@oracle.com>, <Knut.Hatlen@Sun.COM>
39 Lubos Kosco <Lubos.Kosco@Sun.COM>, <Lubos.Kosco@Oracle.COM>
10 Jorgen Austvik <jaustvik@acm.org>
9 Trond Norbye <trond.norbye@gmail.com>
5 Patrick Higgins <patrick.allen.higgins@gmail.com>
3 Yuriy Vasylchenko <yuriy.av@gmail.com>
2 Leo Davis <ldavis@fonix.com>
1 cicicici <cicicici@gmail.com>
1 Martin Walsh <martin.walsh@oracle.com>
1 seadog <seadog1234567890@gmail.com>

Friday Dec 17, 2010

What's new in Derby 10.7?

Apache Derby 10.7.1.1 was released earlier this week. You'll find the binaries and the release notes at the 10.7.1.1 download page. This is the first release from the 10.7 branch, and it contains some useful new features that were not part of the 10.6 releases. I will present some of these features here. There's also an interview with Rick Hillegas, the release manager for Derby 10.7.1, posted on the Java Spotlight Podcast, where the new features are discussed.

Boolean data type

Starting with 10.7, Derby supports the Boolean data type (yay, finally!). In earlier releases you would have to work around this limitation by using a CHAR(1) column or an SMALLINT column. Now you can declare the columns as BOOLEAN directly:

ij> create table t(b boolean);
0 rows inserted/updated/deleted
ij> insert into t values true, false, null;
3 rows inserted/updated/deleted
ij> select * from t;
B    
-----
true 
false
NULL 

3 rows selected
ij> select * from t where b;
B    
-----
true 

1 row selected

BOOLEAN may also be used for parameters and return values in stored procedures and functions, just like any other data type.
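
For example, a BOOLEAN-returning function could be backed by this Java method (a sketch; the function, class and table names are invented):

public class MyBooleanFunctions {
    // Backing method for a SQL function that returns BOOLEAN.
    public static boolean isLongName(String name) {
        return name != null && name.length() > 10;
    }
}

and declared and used from SQL like this:

CREATE FUNCTION IS_LONG_NAME(NAME VARCHAR(128)) RETURNS BOOLEAN
    EXTERNAL NAME 'MyBooleanFunctions.isLongName'
    LANGUAGE JAVA PARAMETER STYLE JAVA;

SELECT * FROM NAMES WHERE IS_LONG_NAME(NAME);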

Plan exporter tool

Sometimes you want to know exactly how Derby executes a query. For example, you may have a big join that takes a very long time to complete, and you want to check if Derby uses the relevant indexes to get the best performance.

It has always been possible to get runtime statistics from Derby (see the Working with RunTimeStatistics chapter in Tuning Derby), but the statistics output has been hard to analyze (read: unstructured text format). The 10.6 release improved the situation somewhat by introducing XPLAIN style statistics, which puts the collected data into database tables that can be queried from SQL. Still, it can be difficult to dig up the information you want, especially for complex queries.

Derby 10.7 further improves this by adding a tool that processes the data in the XPLAIN tables for you, and produces an XML document representing the execution plan for a specific query. It can also do some simple rendering and output a graphical version of the plan in HTML format. This new tool is documented in the Derby Tools and Utilities Guide, in the PlanExporter chapter.

Let's take a look at an example. I wanted to inspect the query plan for a big and ugly join performed by OpenGrok to get the commit log for a version-controlled file. Enabling collection of statistics is easy. Just call one system procedure to enable it, and another one to say that you want XPLAIN style statistics and where to store them. I added code along these lines (sketched below; the XPLAIN schema name matches the PlanExporter command further down) in the method that executed the query I was interested in:
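
Statement s = conn.createStatement();
// Turn on collection of run-time statistics ...
s.execute("CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)");
// ... and ask for XPLAIN style statistics, stored in the XPLAIN schema.
s.execute("CALL SYSCS_UTIL.SYSCS_SET_XPLAIN_SCHEMA('XPLAIN')");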

And for good measure I also disabled collection before returning from the method, so that I wouldn't get lots of statistics for queries I didn't care about:
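
// Sketch: setting an empty schema name stops the XPLAIN capture,
// and the statistics switch is turned off again.
s.execute("CALL SYSCS_UTIL.SYSCS_SET_XPLAIN_SCHEMA('')");
s.execute("CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(0)");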

Then I ran a JUnit test that I knew would exercise this code. After it had completed, I could connect to the database with the ij tool and verify that the statistics had been collected into the XPLAIN tables:

ij version 10.7
ij> connect 'jdbc:derby:/path/to/DB';
ij> select stmt_id from xplain.sysxplain_statements
>          where stmt_text like 'SELECT CS.REVISION%';
STMT_ID                             
------------------------------------
f93441e2-012c-e092-02e5-000003bb1070
8ef50298-012c-e092-02e5-000003bb1070
7bef4397-012c-e092-02e5-000003bb1070
0614c3de-012c-e092-02e5-000003bb1070
5623848d-012c-e092-02e5-000003bb1070

5 rows selected

The query I was interested in had been executed five times. I picked one of the STMT_ID values and passed it to the PlanExporter tool, asking it to produce both XML and HTML versions of the selected plan:

$ java org.apache.derby.tools.PlanExporter \
    jdbc:derby:/path/to/DB XPLAIN \
    f93441e2-012c-e092-02e5-000003bb1070 \
    -xml plan.xml -html plan.html

The above command produced the following two files: plan.xml and plan.html. If you open the XML version, you'll see that most modern web browsers allow you to expand and collapse the nodes, which is neat when you analyze a large query and you want to focus on one specific part of it. For comparison, I'm also including plan.txt, which shows what the plan would look like in the old format (which is still supported, by the way).

Definer's rights for routines

A Derby stored function or stored procedure may execute SQL statements against the database using a nested connection. If you have SQL authorization enabled, the user that invokes the routine must have been granted privileges for all the SQL that is executed by the routine. So if the routine deletes rows from a table, you would have to grant the user delete privileges on that table, even if you don't really want that user to be allowed to delete rows except via the routine.
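
For illustration, here's roughly what such a routine body might look like (a sketch only; the method matches the MyProcedures.addItem name used in the CREATE PROCEDURE statements below, but the table layout is invented). The nested connection is obtained through the special jdbc:default:connection URL:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class MyProcedures {
    public static void addItem(int id, String name, int category)
            throws SQLException {
        // The nested connection runs in the caller's transaction.
        Connection c = DriverManager.getConnection("jdbc:default:connection");

        // A select against CATEGORIES ...
        PreparedStatement check =
                c.prepareStatement("SELECT NAME FROM CATEGORIES WHERE ID = ?");
        check.setInt(1, category);
        ResultSet rs = check.executeQuery();
        if (!rs.next()) {
            throw new SQLException("No such category: " + category);
        }
        rs.close();
        check.close();

        // ... and an insert into ITEMS, so the invoker needs privileges
        // for both, unless the routine runs with definer's rights.
        PreparedStatement ins = c.prepareStatement(
                "INSERT INTO ITEMS(ID, NAME, CATEGORY) VALUES (?, ?, ?)");
        ins.setInt(1, id);
        ins.setString(2, name);
        ins.setInt(3, category);
        ins.executeUpdate();
        ins.close();
    }
}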

To address this concern, Derby 10.7 adds an optional clause (EXTERNAL SECURITY { DEFINER | INVOKER }) to the CREATE FUNCTION and CREATE PROCEDURE statements. This clause lets you specify whose privileges the routine should use when executing SQL statements on a nested connection, and could be used to delegate more fine-grained privileges to users.

For example, to allow a user to execute a procedure that performs some insert and select statements, you would previously have to give the user select and insert privileges on the tables involved, giving him carte blanche to insert whatever he wants into the tables. The code would look something like this, with multiple GRANT statements:

ij> create procedure add_item(id int, name varchar(10), category int)
>          external name 'MyProcedures.addItem'
>          language java parameter style java;
0 rows inserted/updated/deleted
ij> grant execute on procedure add_item to user1;
0 rows inserted/updated/deleted
ij> grant insert on items to user1;
0 rows inserted/updated/deleted
ij> grant select on categories to user1;
0 rows inserted/updated/deleted

If you use the new EXTERNAL SECURITY clause introduced in Derby 10.7 and specify that the procedure should execute with the definer's privileges, you only need to grant the user privileges to execute the procedure, and he will no longer have privileges to insert data directly into the table:

ij> create procedure add_item(id int, name varchar(10), category int)
>          external name 'MyProcedures.addItem'
>          language java parameter style java
>          external security definer;
0 rows inserted/updated/deleted
ij> grant execute on procedure add_item to user1;
0 rows inserted/updated/deleted

Less code, easier to maintain, and more secure.

TRUNCATE TABLE

New in 10.7 is also the TRUNCATE TABLE statement, which is handy if you want to delete all rows in a table. It is faster than DELETE FROM <TABLE>, especially on large tables, because it wipes the entire table and its indexes in one go instead of deleting one row at a time.

ij> select count(*) from t;
1          
-----------
9          

1 row selected
ij> truncate table t;
0 rows inserted/updated/deleted
ij> select count(*) from t;
1          
-----------
0          

1 row selected

Unicode database names

The embedded driver has always supported all characters in the Unicode character set in database names, as long as the underlying file system allows those characters in file names. But the network protocol has only worked with ASCII characters in the database names until now, so such databases couldn't be created or accessed from the client driver. In this release, the network protocol switched to UTF-8 encoding in the fields that hold the database name, so that you now, for example, can create and use a database with Japanese characters:

ij version 10.7
ij> connect 'jdbc:derby://localhost/データベース;create=true';
ij> 

This only affects the database names. The network protocol used UTF-8 encoding for SQL queries and results even before this improvement, so inserting non-ASCII text values into the database, or using non-ASCII characters in table and column names, is also supported, just like it was in all previous releases.

Wednesday Oct 13, 2010

Derby 10.6.2 is out

Apache Derby 10.6.2.1 has just been released. See the announcement from the release manager, Lily Wei, here. This is a maintenance release which contains mostly bug fixes, including the fix for one potential data corruption (DERBY-4677) and one regression in 10.6.1.0 (DERBY-4671). It also adds Chinese, French, German, Italian, Japanese, Korean and Spanish translations for the messages that were new in 10.6.1.0.

The downloads and the release notes can be found at http://db.apache.org/derby/releases/release-10.6.2.1.html.

Wednesday May 19, 2010

Derby 10.6.1 has been released

Apache Derby 10.6.1.0 has just been released. This is a new feature release, and the first release in the 10.6 series. See the full announcement here. The downloads and the release notes can be found at this URL: http://db.apache.org/derby/releases/release-10.6.1.0.cgi

Make sure that you read the release notes carefully before you upgrade. Pay special attention to the note about the security issue CVE-2009-4269, which details how to protect your database from a vulnerability in Derby's BUILTIN authentication and password substitution in previous releases. (The vulnerable code is still there to make it possible to access old databases, but the defaults have changed so that new databases will not have the vulnerability.)

The new features in this release include:

  • Sequence Generators - Named generators for allocating successive, evenly spaced numbers (a sketch follows this list). See feature T176 of the SQL Standard.
  • User-defined types - Named types bound to serializable Java objects.
  • Restricted table functions - Limits on the columns and rows returned by table functions.
  • XPLAIN statistics collection - Query plan statistics stored in tables for later analysis.
  • GROUP BY ROLLUP - A subset of the SQL Standard ROLLUP functionality on the GROUP BY clause. See feature T431 of the SQL Standard.
  • CROSS JOIN - CROSS JOIN syntax. See feature F401-04 of the SQL Standard.
  • Named columns join - USING clauses in joins.
  • SHOW FUNCTIONS - IJ command that lists stored functions.
  • In-memory back end enhancements - Numerous improvements, including the ability to delete in-memory databases.
  • ORDER BY in subqueries - Syntax for explicitly ordering rows returned by subqueries. See features F851, F852, and F855 of the SQL Standard.
  • OFFSET, FETCH FIRST/NEXT in subqueries - Generalized syntax for retrieving row subsets. See features F856, F857, F858, F859, F861, F862, F863, and F864 of the SQL Standard.
  • NATURAL JOIN - Support for NATURAL JOIN. See feature F401-01 of the SQL Standard.
  • Qualified identifiers in ij - Ability to reference cursors and prepared statements in other connections.
  • Configurable hash algorithm - Ability to customize the hash algorithm used by BUILTIN authentication.
  • Context-sniffing scripts - Ability of shipped scripts to locate Derby jars when DERBY_HOME isn't set.
  • Case-insensitive strings - Ability to ignore case in string comparisons and sorts.
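
As a taste of the sequence generators, here's a small sketch (the sequence name is invented; see the Reference Manual for the full syntax):

Statement s = conn.createStatement();
// Create a named generator ...
s.execute("CREATE SEQUENCE ORDER_ID START WITH 1 INCREMENT BY 1");
// ... and allocate the next number from it.
ResultSet rs = s.executeQuery("VALUES NEXT VALUE FOR ORDER_ID");
rs.next();
System.out.println("Next order id: " + rs.getInt(1));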

See also my earlier preview entry, which describes some of these features in more detail. Most of them are also described in the Derby 10.6 manuals.


Thursday Dec 10, 2009

An early look at Derby 10.6

The Derby community is currently working on the next feature release, Derby 10.6, and a number of new features have already been implemented. If you haven't already done so, now would be a good time to download the development sources and kick the tires on your favourite new feature! Early feedback to the community (either on the mailing lists or in the JIRA bug tracker) increases the likelihood of getting improvements and bug fixes implemented before the final release.

If you don't feel like building jar files from the development sources yourself, you can download the binaries used in the nightly regression testing from http://dbtg.foundry.sun.com/derby/bits/trunk/. Make sure that you understand all the warnings about the difference between a production build and a development build before you start using it!

The release page on the wiki lists some of the features that are planned for 10.6. Let's take a quick look at some of those that are mostly done and ready for testing.

Store Java objects in columns

DERBY-651 tracks the ongoing work that enables storing of Java objects directly in the columns of a table. The objects must belong to a class that implements the java.io.Serializable interface.

For example, if you want to store java.util.List objects in a table, you first need to declare a user-defined type that maps to the Java type, and create a column with that type:

stmt.execute("CREATE TYPE JAVA_UTIL_LIST " +
             "EXTERNAL NAME 'java.util.List' " +
             "LANGUAGE JAVA");
stmt.execute("CREATE TABLE T(C1 JAVA_UTIL_LIST)");

Next, you prepare a statement and use setObject() to insert the Java object that you want to store:

PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO T(C1) VALUES (?)");

ArrayList<String> lst = new ArrayList<String>();
lst.add("First element");
lst.add("Second element");

ps.setObject(1, lst);
ps.execute();

Finally, you execute a SELECT query and use getObject() to restore the object from the database:

ResultSet rs = stmt.executeQuery("SELECT C1 FROM T");
while (rs.next()) {
    List<?> list = (List<?>) rs.getObject(1);
    System.out.println("Size of list: " + list.size());
}

You can also use your object types as arguments to user-defined functions or procedures:

stmt.execute("CREATE FUNCTION LIST_SIZE(LST JAVA_UTIL_LIST) " +
             "RETURNS INTEGER " +
             "EXTERNAL NAME 'MyListFunctions.listSize' " +
             "LANGUAGE JAVA PARAMETER STYLE JAVA");
ps = conn.prepareStatement("values list_size(?)");
ps.setObject(1, lst);

Multi-level grouping

Derby's GROUP BY syntax has been extended with the ROLLUP keyword, which allows for multi-level grouping. There's a fine write-up about the feature on this wiki page. The example below shows how you can ask for the total amount of sales of each product in each state, the total amount of sales in each state, the total amount of sales in each region, and the amount of sales in all regions, in one single query:

ij> SELECT REGION, STATE, PRODUCT, SUM(SALES) FROM SALES_HISTORY
        GROUP BY ROLLUP(REGION, STATE, PRODUCT)
        ORDER BY REGION, STATE, PRODUCT;
REGION    |STA&|PRODUCT   |4          
--------------------------------------
East      |MA  |Boats     |10     <-- all boat sales in Massachusetts
East      |MA  |Cars      |190        
East      |MA  |NULL      |200    <-- sum of all sales in Massachusetts
East      |NY  |Boats     |570        
East      |NY  |Cars      |10         
East      |NY  |NULL      |580        
East      |NULL|NULL      |780    <-- sum of all sales in region East
West      |AZ  |Boats     |40         
West      |AZ  |Cars      |300        
West      |AZ  |NULL      |340        
West      |CA  |Boats     |570        
West      |CA  |Cars      |750        
West      |CA  |NULL      |1320       
West      |NULL|NULL      |1660       
NULL      |NULL|NULL      |2440   <-- sum of all sales in all regions

15 rows selected

Richer JOIN syntax

Derby already supports INNER JOIN, LEFT OUTER JOIN and RIGHT OUTER JOIN. In 10.6, the syntax will be enhanced with support for

  • the CROSS JOIN operation
  • named column joins
  • sub-queries in ON clauses

CROSS JOIN is the simplest of the join operations, and SELECT * FROM T1 CROSS JOIN T2 is just another way of writing SELECT * FROM T1, T2. It may also be combined with other join operations:

ij> SELECT * FROM T1 CROSS JOIN T2 LEFT OUTER JOIN T3 ON T1.A = T3.B;
A          |B          |A          |B          |A          |B          
-----------------------------------------------------------------------
5          |2          |2          |7          |NULL       |NULL       

1 row selected

Named columns join may save you some typing when the columns in the join key have the same name in the two tables being joined. In the example below, where two tables COUNTRIES and CITIES are joined, you no longer need to write the full join condition ON COUNTRIES.COUNTRY = CITIES.COUNTRY. Instead, you just say that the tables should be joined on the COUNTRY column with a USING clause:

ij> SELECT COUNTRY, COUNT(CITY_ID) FROM COUNTRIES JOIN CITIES
        USING (COUNTRY) GROUP BY COUNTRY;
COUNTRY                   |2          
--------------------------------------
Afghanistan               |1          
Argentina                 |1          
Australia                 |2          
.
.
.
United Kingdom            |1          
United States             |37         
Venezuela                 |1          

45 rows selected

Explain plan

The XPLAIN functionality makes it possible to access run-time statistics with SQL queries. See this section in the development version of Tuning Derby for details about how to enable it and use it. Here's one of the more advanced examples from the manual, which shows how to get all queries that performed a table scan on the COUNTRIES table, and the number of pages and rows visited in the scan:

ij> SELECT ST.STMT_TEXT,
           SP.NO_VISITED_PAGES AS PAGES,
           SP.NO_VISITED_ROWS AS "ROWS"
    FROM STATS.SYSXPLAIN_SCAN_PROPS SP, 
         STATS.SYSXPLAIN_RESULTSETS RS, 
         STATS.SYSXPLAIN_STATEMENTS ST 
    WHERE ST.STMT_ID = RS.STMT_ID AND 
          RS.SCAN_RS_ID = SP.SCAN_RS_ID AND 
          RS.OP_IDENTIFIER = 'TABLESCAN' AND 
          SP.SCAN_OBJECT_NAME = 'COUNTRIES';
STMT_TEXT                     |PAGES      |ROWS       
------------------------------------------------------
SELECT * FROM COUNTRIES        |2          |114        

1 row selected

ORDER BY in sub-queries

Up till now, Derby has only accepted ORDER BY clauses in top-level SELECT queries. The ongoing work on DERBY-4397 will allow ORDER BY clauses in sub-queries as well. This is going to resolve the very old request for ordering of inserts (DERBY-4). There's also a request for OFFSET/FETCH in sub-queries (DERBY-4398), which would be very useful in combination with ORDER BY.
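
For instance, with both pieces in place you should be able to write something like this (a sketch; the table names are invented), so that rows in a table with a generated identity column get their identity values assigned in sorted order:

// The ORDER BY in the sub-query controls the order in which the rows
// are inserted, and hence the order of any generated identity values.
stmt.execute("INSERT INTO NUMBERED_NAMES(NAME) " +
             "SELECT NAME FROM RAW_NAMES ORDER BY NAME");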

Dropping of in-memory databases

Derby 10.5 added support for in-memory databases, but there was no convenient API for deleting the in-memory databases and reclaiming the memory without taking down the Java process. In Derby 10.6, in-memory databases can be destroyed with an API similar to the one used for creating and shutting down databases. Simply add drop=true to the JDBC URL.

ij> connect 'jdbc:derby:memory:mydb;create=true';
ij> create table t (x int);
0 rows inserted/updated/deleted
ij> connect 'jdbc:derby:memory:mydb;drop=true';
ERROR 08006: Database 'memory:mydb' dropped.

Very nice if you're running JUnit tests and want to clear the database and free up the memory between each test case.

SHOW FUNCTIONS

Derby's interactive tool for running scripts or queries against a database, ij, lacked a command to show the functions that are stored in the database. Derby 10.6 adds SHOW FUNCTIONS to ij's list of SHOW commands, so now you can issue the following command to view all the functions in the APP schema:

ij> SHOW FUNCTIONS IN APP;
FUNCTION_SCHEM|FUNCTION_NAME               |REMARKS                            
----------------------------------------------------------
APP           |MY_FUNC                     |MyFunctions.f                      

1 row selected

Restricted table functions

In Derby 10.6 it will be possible to push predicates into table functions and process the restrictions before the function returns the results to the SQL engine. This can speed up execution of some table function queries significantly. Take this example:

SELECT * FROM TABLE ( MY_FUNCTION() ) AS T WHERE X = 5

If MY_FUNCTION() returns thousands of rows, and only a handful of them match the restriction X = 5, a lot of work is wasted generating and scanning rows that are just thrown away. With restricted table functions, Derby's SQL engine will pass information about the restriction down to the table function, and the table function may use this information to produce and return only the rows that are actually needed.

Details about how to use restricted table functions can be found in the functional spec attached to DERBY-4357.

ROW_NUMBER improvements

Previous releases of Derby only allowed the ROW_NUMBER function to appear in the select list of a query, and it could not be used to build up more complex expressions. Now these limitations have been removed, and you can write queries like this one without getting syntax errors:

SELECT X / ROW_NUMBER() OVER ()
  FROM T
  ORDER BY ROW_NUMBER() OVER () DESC

Friday Nov 13, 2009

OpenGrok 0.8 RC1

Lubos has just built and posted the first release candidate for OpenGrok 0.8. See his announcement here. Please test it and give feedback on opengrok-discuss, or file bugs or enhancement requests on http://defect.opensolaris.org/.

It's been a long time since the previous release (more than a year!), so we're of course eager to get all the bug fixes and the fine new features we've been working on out very soon now.

Apart from a couple of bug fixes, my main task for this release has been to implement the Java DB history back-end (bug #3624). This is a new cache layer on top of the code that talks to all the version control systems, and it can be used instead of the old XML-based cache. The XML-based cache is still the default, but Lubos wrote a nice section on how to set up the new cache in README.txt. If you try the Java DB back-end, you may notice that the initial indexing and generation of the history cache is somewhat slower, but that's compensated by the ability to update the cache incrementally later. Here's the full list of new functionality provided by the new back-end:

  • Incremental update of the history cache (bug #3052), so no need to retrieve and process the full history more than once
  • History for directories is cached
  • The history view shows which files were touched by the different check-ins

Other new features:

  • Ability to search across multiple projects has been reintroduced
  • Lucene back-end upgraded to 2.4.1
  • New scripts to simplify indexing and deployment
  • Auto-generated OpenSearch description enables web browsers to detect OpenGrok as a search engine
  • Several UI improvements

Saturday Sep 12, 2009

Java DB 10.5.3 is ready for download

Java DB 10.5.3.0, Sun's supported distribution of Apache Derby 10.5.3.0, is now ready for download on the Java DB site. This release includes all the bug fixes and the new localized messages found in Derby 10.5.3. Also new in this release is a native installer for Mac OS X, which comes in addition to the existing Java DB installers for Solaris, Linux and Windows and the platform-independent zip archive.

Thursday Aug 27, 2009

Derby 10.5.3 has been released

Apache Derby 10.5.3.0 has been released. It's a maintenance release which fixes a number of bugs found in the latest feature release (10.5.1), and it contains updated localizations for the new messages introduced in 10.5.1. If you haven't already upgraded to Derby 10.5, now is a good time to do it!

See the full release announcement here.

You may have noticed that Derby 10.5.2.0 was also released recently. That release, however, suffered from a wrong-results regression that was discovered just as the release was posted on the website. Derby 10.5.3.0 is essentially the same as Derby 10.5.2.0, but with the regression and a couple of other bugs fixed.

Wednesday Jul 01, 2009

Accessing Derby from Perl

Bernt found out that There is More Than One Way To Do It (TMTOWTDI) is true for Derby too, and wrote a wiki page which explains how to access a Derby database from Perl using the DBD::JDBC module. Very cool! :) See the wiki page here.

Friday May 29, 2009

Java DB 10.5.1.1 has been released

Java DB 10.5.1.1 has just been posted on the Java DB site. This release contains the exact same jar files as, and is functionally equivalent to, the recently released Apache Derby 10.5.1.1. In addition, you get

  • installers for Solaris, Linux and Windows
  • the option to register and track your Java DB installations in Sun Inventory
  • the option to buy support from Sun

For more information about what's new in Java DB 10.5.1.1, see my previous blog entries about Apache Derby 10.5.1.1, the Apache Derby site and the Java DB site.

Sunday May 03, 2009

Derby 10.5.1.1 is out

Apache Derby 10.5.1.1 was released two days ago. See the full announcement here.

In addition to lots of bug fixes and documentation fixes, there are a number of new features in this release, including

  • SQL roles - Read more in my previous blog entry
  • SQL generated columns - Read more in my previous blog entry
  • LOB improvements - Many performance and usability improvements for BLOBs and CLOBs
  • Replication improvements - Replication is now also possible with encrypted databases
  • SQL OFFSET/FETCH (aka limit functionality) - Read more in my previous blog entry
  • In-memory storage engine - Read more in my previous blog entry
  • Update Statistics Stored Procedure - Refresh outdated index cardinality statistics in an easier fashion. More info here and here

Also, the index split deadlock frequently seen in previous releases has been fixed.

Friday Dec 05, 2008

User-defined ordering of strings in Apache Derby

Apache Derby (and Java DB) has supported language-based ordering of strings (aka territory-based collation) since version 10.3.1.4. This means for instance that you can get an ORDER BY clause in an SQL query to sort the results alphabetically according to the rules of the language of your choice, instead of sorting the strings by the value of their Unicode codepoints, which is the default.

So when you're using the default codepoint-based ordering, your results would be ordered like this (note that 'a' comes after 'B'):

ij> connect 'jdbc:derby:TestDB;create=true';
ij> select * from (values 'a','A','B') as s(x) order by x;
X
-
A
B
a

3 rows selected

Using language-based ordering for the en_US locale would give you a more natural ordering of the results (both 'a' and 'A' come before 'B'):

ij> connect 'jdbc:derby:EnglishDB;create=true;territory=en_US;collation=TERRITORY_BASED';
ij> select * from (values 'A','B','a') as s(x) order by x;
X
-
a
A
B

3 rows selected

This feature is very useful as long as you're happy with the ordering provided by the locales supported by your Java Runtime Environment. But sometimes you may want to sort your strings based on some other rules than those in the languages supported by the JRE. The most common example is that users want case-insensitive string matching. Another example came up recently in this thread on derby-user, where a user wanted to modify the German collation rules to make Greek characters sort near their Latin equivalents ('α' near 'a', 'β' near 'b', and so on).

Derby currently (as of 10.4) doesn't allow you to define your own collation rules; it merely relies on the collators returned by java.text.Collator.getInstance(java.util.Locale). Fortunately, the JRE allows you to define new locales with their own collation rules. And it turns out that it's quite easy: just perform these three simple steps:

  • Create a class that extends java.text.spi.CollatorProvider and returns a collator that orders strings the way you want it to
  • Create a text file named META-INF/services/java.text.spi.CollatorProvider which contains one line with the name of your collator provider class (shown below)
  • Put the compiled class file and the text file in a jar file which you drop into your JRE's lib/ext directory or in one of the directories specified by the java.ext.dirs property
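
For the MyCollatorProvider class used in the examples below (which lives in the default package), the provider configuration file consists of this single line:

$ cat META-INF/services/java.text.spi.CollatorProvider
MyCollatorProvider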

Let's look at an example. A collator's strength determines which differences it regards as significant when comparing strings. Case is a tertiary difference, so to get a case-insensitive collator we can set the strength to secondary, and case differences will be ignored. If we want to define a variant of the en_US locale, say en_US_caseinsensitive, whose collators follow the ordering rules of the English language but ignore case differences, we could write this small CollatorProvider:

import java.text.Collator;
import java.text.spi.CollatorProvider;
import java.util.Locale;

public class MyCollatorProvider extends CollatorProvider {
    public Locale[] getAvailableLocales() {
        return new Locale[] {
            new Locale("en", "US", "caseinsensitive")
        };
    }
    public Collator getInstance(Locale locale) {
        Collator c = Collator.getInstance(Locale.US);
        // Ignore tertiary differences (case differences)
        c.setStrength(Collator.SECONDARY);
        return c;
    }
}

Now compile the class and package it in a jar, together with a text file in META-INF/services as described above. In the directory where the jar is stored, start ij (Derby's SQL client) with the following command:

$ java -Djava.ext.dirs=. -jar /path/to/derbyrun.jar ij
ij version 10.4
ij> 

Then create a database which uses the new locale that you just defined, and insert some test data and see that the string matching really is case-insensitive:

ij> connect 'jdbc:derby:NoCaseDB;territory=en_US_caseinsensitive;collation=TERRITORY_BASED;create=true';
ij> create table t (x varchar(12));
0 rows inserted/updated/deleted
ij> insert into t values 'hi!','Hi!','hello','Hello','HELLO','HeLlO';
6 rows inserted/updated/deleted
ij> select * from t where x = 'hi!';
X           
------------
hi!         
Hi!         

2 rows selected
ij> select distinct * from t order by x;
X           
------------
hello       
hi!         

2 rows selected
ij> select x, count(*) from t group by x;
X           |2          
------------------------
hello       |4          
hi!         |2          

2 rows selected

If you want to define your own custom ordering, like having Greek characters in between Latin characters, you could define another locale with a CollatorProvider that returns a RuleBasedCollator with whichever rules you want. See its class javadoc for details about how you specify rules. In its simplest form, a set of rules might look like "a,A < b,B < c,C", which means more or less that a and A should be sorted before b and B, which should be sorted before c and C. So to get the Greek characters sorted near similar Latin characters, define a CollatorProvider that looks like this one:

import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.text.spi.CollatorProvider;
import java.util.Locale;

public class MyCollatorProvider extends CollatorProvider {
    public Locale[] getAvailableLocales() {
        return new Locale[] {
            new Locale("en", "US", "greek")
        };
    }
    public Collator getInstance(Locale locale) {
        StringBuilder rules = new StringBuilder();
        // alpha should go between a and b
        rules.append("< a,A < \u03b1 < b,B");
        // beta should go between b and c
        rules.append("& b,B < \u03b2 < c,C");
        // add more rules here ....
        try {
            return new RuleBasedCollator(rules.toString());
        } catch (ParseException pe) {
            throw new Error(pe);
        }
    }
}

Again, put the compiled class and META-INF/services/java.text.spi.CollatorProvider in a jar file, and start ij with -Djava.ext.dirs=. in the directory where the jar file is located. Create a database which uses the new locale and insert some data with both Greek and Latin characters:

ij> connect 'jdbc:derby:GreekDB;territory=en_US_greek;collation=TERRITORY_BASED;create=true';
ij> create table t (x varchar(12));
0 rows inserted/updated/deleted
ij> insert into t values 'a', 'b', 'c', 'α', 'β';
5 rows inserted/updated/deleted
ij> select * from t order by x;
X           
------------
a           
α           
b           
β           
c           

5 rows selected

Look, the ordering was just like we wanted it to be, with the Greek characters in between the Latin ones, and not at the end where they would normally be located!

One final word of caution: If you ever update your custom CollatorProvider so that the ordering is changed after you have created a database, you will need to recreate the database. This is because the indexes in the database are ordered, and you may see strange results if the indexes are ordered with a different collator than the one your database is currently using.

Want to try this out yourself? Here's a Java class that implements the CollatorProvider for the two examples. Put it in a directory together with this Ant script, and execute ant to create a jar file with the required file in META-INF/services (requires Apache Ant 1.7.0 or later).

Friday Dec 21, 2007

The importance of good hash functions

I have always been told that it is important to design hash functions carefully in order to ensure that they return values distributed evenly across the entire range of possible values. Otherwise, you risk having many distinct keys map to the same bucket in the hash table, and the hash lookups in the worst case degenerate into linked list traversals. Although this sounds like a very good piece of advice in theory, I had never seen a real-world example where a naive implementation of a hash function did any harm, until now.

I was making some changes to a stress test that ran simple join queries against an Apache Derby database. The changes were quite innocent; I was just changing the order in which the tables and indices used in the joins were created, so I didn't expect them to influence the test noticeably. But they did! When I got the test back up and running, its performance had dropped to less than 50% of what it was before.

My first thought was that the changed table/index order had somehow tricked Derby's query optimizer into choosing a different query execution plan than before. If the optimizer makes a bad decision (or an uninformed decision, like in this case recently discussed on the derby-user mailing list), performance drops of this size, or even greater, may be experienced. However, when I made Derby print the query plans, I couldn't see any difference at all.

The next step was to profile the test in order to find out why it took so long. I have found collect/analyzer from Sun Studio to be a valuable tool for such tasks. You simply start your application from the command line with "collect -j on" followed by the command you normally start it with:

  collect -j on java MyApp arg1 arg2 ...

When I opened the collected performance data in analyzer, one method dominated the profile: more than 60% of the CPU time was spent in RecordId.equals(), whose only job is to check the equality of four integer fields in two RecordId objects.

Since it's a cheap method, it must have been called very frequently in order to account for that much of the total CPU time. Further investigation showed that almost all the calls to this method came from a call to ConcurrentHashMap.get() in Derby's lock manager, which used RecordId objects as hash table keys. However, there was nothing in the profiling data indicating that ConcurrentHashMap.get() was invoked more frequently now than it was before. So there had to be more calls to RecordId.equals() per call to ConcurrentHashMap.get().

Now, what could cause that? The only thing I could think of, was collisions in the hash table caused by a bad hash function. I inserted some print statements in the Derby code and found out that the test visited about 350 000 distinct records in the database. In the original version of the test, 350 000 distinct RecordId values mapped to only 12 000 distinct hash values, which is bad to start with. In the new version of the test, the number of distinct hash values had dropped even further down to 6000. When I looked at the calculation of the hash value in RecordId.hashCode(), it was pretty obvious how this could happen. The hash value was constructed by taking the bitwise XOR of these four integers:

  • recordId - A record number which is unique within the page on which the record exists
  • pageNumber - The id of the page containing the record
  • containerId - The id of the container (that is, table or index) containing the page
  • segmentId - Uninteresting in this discussion, since it's always 0

Since all of these variables essentially are counters starting from 0, they tend to be in the lower value range, and all the higher bits are unused. When you take the bitwise XOR of such integers, the higher bits will still be unused, and you'll end up with lots of values clustered in a relatively small range. This basically means that collisions are bound to happen. When I changed the table and index creation order in the test, I must have been unlucky and got less variation in the containerId variable for the tables involved in the join, hence the lower number of distinct hash values and the increased frequency of collisions.
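
Here's a toy illustration of the effect, with numbers on roughly the same scale as the test (this is not Derby's code, just a demonstration):

import java.util.HashSet;
import java.util.Set;

public class XorDemo {
    public static void main(String[] args) {
        Set<Integer> hashes = new HashSet<Integer>();
        // 350,000 distinct (page, record) pairs ...
        for (int page = 0; page < 1000; page++) {
            for (int record = 0; record < 350; record++) {
                hashes.add(page ^ record);
            }
        }
        // ... but the XOR of two small counters can't escape their range,
        // so this prints at most 1024 (10 bits' worth of values).
        System.out.println(hashes.size());
    }
}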

So, now the problem had been identified: the hash function is biased. But how could it be fixed? I mentioned that I've always heard that careful design is needed for hash functions. Unfortunately, I must have missed the lecture where we were taught how to write good hash functions, if there ever was one. Luckily, after a couple of unsuccessful attempts to fix it, I discovered that NetBeans had this nice Insert Code menu.

I only needed to pick Generate hashCode() and select the fields I wanted to include in the hash code calculation. NetBeans then generated a RecordId.hashCode() method which looked like this:

    public int hashCode() {
        int hash = 7;
        hash = 89 * hash + pageId.hashCode();
        hash = 89 * hash + recordId;
        return hash;
    }

It doesn't look much more complex than the old method which only XOR'ed the fields, but thanks to the repeated addition and multiplication with a prime number, the hash values will be distributed across a wider range, even if the individual fields don't differ very much. (Apparently, this is very similar to the approach suggested by Joshua Bloch in his Effective Java Programming Language Guide.) I repeated the procedure for PageKey.hashCode(), which is called from RecordId.hashCode().

Then, when I reran the test with the modified Derby code, the performance was back at its normal level, more than twice as high as without the fixed hash functions. And the 350 000 records that previously mapped to 12 000 or 6 000 distinct hash values now mapped to 280 000 distinct hash values. Not bad to get such an improvement with just a couple of mouse clicks in NetBeans, without writing a single line of code myself! :)
