IT Innovation

XQuery: A New Way to Search

XQuery makes finding the right XML easy.

By Caroline Kvitka

January/February 2005

XML has grown to become the de facto standard for data representation. This growth in popularity has spurred the need for a way to search XML documents for specific information and to scour repositories of documents represented in XML to find specific documents. Enter XQuery, the XML Query language, designed specifically for querying collections of XML data. The challenge for XQuery is to search XML documents and pull out the document you want, or just the parts of it you want, and to do so efficiently, easily, reliably, and predictably.

"The situation is actually complicated a little bit more by the fact that XML has come to serve different purposes, sometimes at the same time," says Jim Melton, consulting member of the technical staff at Oracle, cochair of the W3C's XML Query Working Group, and editor of all parts of the SQL standard. "XML is used to mark up documents, but it's also used to mark up data to say, 'Here is the employee ID,' 'Here is the employee salary,' and so on. Because XML serves a mixture of purposes, the query language has to be able to deal with the different uses of XML."

From the looks of it, XQuery is standing up to the challenge. Currently a working draft specification of the World Wide Web Consortium (W3C), XQuery was first proposed in 1998. Oracle, IBM, Microsoft, DataDirect Technologies, Bell Labs, and BEA are all active in the development of the specification. In addition, several universities have been involved in solving some of the theoretical data model problems.

Why XQuery?

XQuery provides a mechanism for efficiently and easily extracting information from relational databases, files, and other data sources, requiring only that those data sources provide an XML view of the data. XQuery also allows you to construct XML data by using the result of a query. With Oracle's XQuery, you can view and query a relational database table as just another XML data source. One of XQuery's most distinctive features is its ability to run a single query on multiple data sources. For example, you can run a query that combines an incoming purchase order in native XML format, an archive of catalog data in native XML format, and an inventory system held in a relational database.

"With XQuery, you have the ability to correlate information from multiple documents in a single query, so that your answers are more comprehensive, coming straight out of the query, without your application having to combine the information," says Melton.
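To see what "without your application having to combine the information" saves you, it may help to look at the hand-written alternative. The sketch below is not XQuery and not Oracle code; it is a small Python illustration of an application correlating a purchase order, a catalog, and a relational inventory table by itself. All element names, the `inventory` table, the function names, and the sample values are assumptions made up for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data: element names, SKUs, and quantities are invented.
PURCHASE_ORDER = """
<order id="PO-1">
  <item sku="A100" qty="2"/>
  <item sku="B200" qty="5"/>
</order>
"""

CATALOG = """
<catalog>
  <product sku="A100"><name>Widget</name></product>
  <product sku="B200"><name>Gadget</name></product>
</catalog>
"""


def build_inventory():
    """Create an in-memory table standing in for a relational inventory system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, on_hand INTEGER)")
    conn.executemany(
        "INSERT INTO inventory VALUES (?, ?)",
        [("A100", 10), ("B200", 3)],
    )
    return conn


def check_order(order_xml, catalog_xml, conn):
    """Correlate each order line with its catalog name and stock level by SKU."""
    order = ET.fromstring(order_xml)
    catalog = ET.fromstring(catalog_xml)

    # Index the catalog once so each order line is a dictionary lookup.
    names = {p.get("sku"): p.findtext("name") for p in catalog.iter("product")}

    report = []
    for item in order.iter("item"):
        sku = item.get("sku")
        qty = int(item.get("qty"))
        row = conn.execute(
            "SELECT on_hand FROM inventory WHERE sku = ?", (sku,)
        ).fetchone()
        on_hand = row[0] if row else 0
        # Tuple: (sku, product name, quantity ordered, can the order be filled?)
        report.append((sku, names.get(sku), qty, qty <= on_hand))
    return report
```

Calling `check_order(PURCHASE_ORDER, CATALOG, build_inventory())` returns one tuple per order line. Every step here (parsing, indexing, looking up, comparing) is application code; the point Melton makes is that a single query over all three sources can take over that work.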

But the feature XQuery fans are probably most excited about is that the specification delivers a real data model for XML (based on, but not limited to, XML Schema). "Until now, there have been several attempts at building data models, and almost always the people who built them immediately claimed they were not really data models but just a way of talking about XML," says Melton. "So we finally have a real data model, and we can start applying more-rigorous handling of XML data, with a more theoretically sound foundation."

Other Query Options: XPath and SQL

Without XQuery support, you have a couple of options for querying XML data. The first is XPath 1.0, an expression language used to select portions of an XML document. The second is to convert your data to a SQL-like representation and run SQL queries against it. "Your only other alternative is to go to one of the so-called 'native' XML databases, which don't provide all the robustness and security and functionality and world-class support that one of the mainstream databases such as Oracle provides," says Stephen Buxton, director of product management for search and XML at Oracle and one of Oracle's representatives on the W3C's XML Query Working Group.

XPath 1.0 is relatively primitive in that you can't readily combine information from multiple documents into one query result. Instead, you have to query one document at a time. "People who deal with XML data all the time really are saying, 'This XPath language is OK, but it cannot do joins. In order to do joins, I have to have XQuery,'" says Buxton.
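For readers who have not used path expressions, here is a minimal sketch of the kind of selection XPath is good at: picking portions out of one document. It uses Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath, so treat it as an approximation rather than full XPath 1.0. The `<staff>` document and the `names_in_dept` function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical single document; names and departments are invented.
EMPLOYEES = """
<staff>
  <employee id="7"><name>Ann</name><dept>Sales</dept></employee>
  <employee id="8"><name>Raj</name><dept>Support</dept></employee>
  <employee id="9"><name>Lee</name><dept>Sales</dept></employee>
</staff>
"""


def names_in_dept(xml_text, dept):
    """Return the names of employees whose <dept> child equals `dept`.

    Sketch only: `dept` is pasted into the path, so it must not contain
    a single quote.
    """
    root = ET.fromstring(xml_text)
    # Path with a predicate: select <name> under each matching <employee>.
    matches = root.findall(f"./employee[dept='{dept}']/name")
    return [m.text for m in matches]
```

`names_in_dept(EMPLOYEES, "Sales")` selects from this one document only. Relating these employees to data held in a second document would take additional code around the path expression, which is the gap Buxton describes.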

However, there's good news for XPath. XQuery and XPath 2.0 are being developed by the same W3C Working Group, and XQuery actually includes XPath 2.0 as a subset. Merging the development efforts is expected to make XPath much more powerful. With the XQuery data model and the integration of XQuery and XPath, you can apply strong typing to XPath operations as well as to XQuery. With strong typing, you decrease the chances of using the wrong kind of data to develop your answers.
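A generic illustration of "using the wrong kind of data" may be useful here. The Python sketch below does not reproduce XPath or XQuery comparison rules; it only shows how the same salary values give a different answer when treated as text than when treated as numbers. The `<payroll>` document and both function names are assumptions invented for the example.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical payroll document; values are invented.
PAYROLL = """
<payroll>
  <employee id="1"><salary>9500</salary></employee>
  <employee id="2"><salary>12000</salary></employee>
</payroll>
"""


def top_salary_untyped(xml_text):
    """Compare salaries as raw text: ordering is character by character."""
    root = ET.fromstring(xml_text)
    return max(e.findtext("salary") for e in root.iter("employee"))


def top_salary_typed(xml_text):
    """Compare salaries as decimal numbers, as a declared type would require."""
    root = ET.fromstring(xml_text)
    return max(Decimal(e.findtext("salary")) for e in root.iter("employee"))
```

The untyped version reports 9500 as the top salary, because the character "9" sorts after "1"; the typed version reports 12000. A data model that knows a salary is a number can rule out the first kind of mistake before the query runs.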

A query alternative for XML in an object-relational database is to use the SQL extensions called SQL/XML. These extensions enable users to publish data as XML and to integrate XQuery with their SQL applications, according to Melton.

"In a sense, a query language for XML is not terribly different from a query language for ordinary business data, the kind that you find in relational databases," says Melton. However, because XML data is not as fully structured as relational data, there is a need for a language designed specifically for XML.

A Look Inside XQuery

What's so special about XQuery? For starters, XQuery aims to increase developer productivity by defining a library of nearly 100 functions and operators that applications can use in their XQuery or XPath expressions. "This is a bit unusual in the XML world, where people are used to building their own functions," explains Melton. "With XPath 1.0, you typically had to rely on extensions to have these sorts of functions available, but with XQuery they're now built into the language."

XQuery is also unique as far as query languages go in that it has two syntaxes: a human-readable syntax and an XML syntax—that is, XML elements and attributes that express a query equivalent to the human-readable syntax. The XML syntax for XQuery, called XQueryX, is easy for programs to parse and can be used by standard XML tools to create, interpret, or modify queries. So if you have a program generating XQuery queries automatically, the program could generate the queries in XQueryX, rather than generating them in human-readable syntax.

"If you develop libraries of queries that are expressed in XQueryX, you can write queries on those queries," says Melton. "So if you want to find out all of your corporation's queries that ask about salaries, it's very easy to do that now. You can write an XQuery that goes out and queries all of your XQueryX queries to find out which of them are asking questions about salaries."
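The idea of querying your queries is easier to picture with a small sketch. The XML below is a deliberately simplified, hypothetical encoding of three queries; it is not the real XQueryX vocabulary, and the search is written in Python rather than XQuery. The element names (`query`, `path`, `step`, `where`, `lessThan`), the library contents, and the `queries_mentioning` function are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical query library. The XML form is a simplified stand-in,
# NOT actual XQueryX markup.
QUERY_LIBRARY = {
    "payroll-report": """
<query>
  <path><step>employees</step><step>employee</step><step>salary</step></path>
</query>
""",
    "phone-list": """
<query>
  <path><step>employees</step><step>employee</step><step>phone</step></path>
</query>
""",
    "raise-candidates": """
<query>
  <path><step>employees</step><step>employee</step></path>
  <where><path><step>salary</step></path><lessThan>50000</lessThan></where>
</query>
""",
}


def queries_mentioning(library, element_name):
    """Return the names of stored queries that reference `element_name`."""
    hits = []
    for name, xml_text in library.items():
        root = ET.fromstring(xml_text)
        # A query "asks about" an element if any path step names it.
        if any(step.text == element_name for step in root.iter("step")):
            hits.append(name)
    return sorted(hits)
```

`queries_mentioning(QUERY_LIBRARY, "salary")` finds the two stored queries that touch salaries. Because the queries are themselves XML, no special parser for the human-readable syntax is needed; in Melton's scenario the search itself would be an XQuery over XQueryX documents.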

XQuery Support in Oracle

Although XQuery has not reached the final standard (recommendation) stage, Buxton says that the specification is sufficiently stable—and that customer demand is sufficiently great—to build support for XQuery into the Oracle technology stack, including the database, the application server, and the tool set.

"From Oracle's point of view, our customers are going to want to represent their data as XML," says Buxton. "Whether that's for exchanging data between companies, between applications, or between a data store and a browser, we need to make sure that our customers can store data as XML; represent data as XML—whether it's stored as XML or not; query it; and extract the right pieces from it." Oracle plans to support almost all of the XQuery specification as well as extensions for full-text search, which is not yet part of the specification.

In fact, Buxton feels that XQuery support is vital for any database that's going to survive. "Customers are asking for native XML support in the database, and native XQuery support in the query capabilities of the database," he says. "Oracle will be the first major database vendor to provide both of those, and we expect to lead the way in native XML support going forward."

Oracle Database 10g Release 1 included extensions to SQL that allowed users to look inside XML, using XPath 1.0. However, Oracle Database 10g Release 2—announced at Oracle OpenWorld, San Francisco, 2004—includes support for XQuery. That means that in Release 2, if you're dealing with XML data and running XQuery queries, you can apply those queries directly in your Oracle database and have them execute natively.

In Oracle Database 10g Release 2, Oracle also supports some SQL/XML extensions as well as some proprietary extensions that will allow users to run XQuery queries over XML stored in the database or XML views in the database.

With XQuery support in the Oracle database, you'll maintain all the benefits of the Oracle database while storing or representing your data in XML, explains Buxton. "You'll be able to do these arbitrarily complex queries on your data without having to analyze it, shred it, and convert it to some other format first, and you'll be able to store or represent your data in XML."

In the Oracle Application Server 10g release planned for later this year (2005), Oracle XML Data Synthesis (XDS) uses XQuery to integrate different data sources by firing XQuery at each of them.

The End . . . for Native XML Databases

The bottom line? XQuery is a whole new way of querying XML data, and it's here to stay. "The earliest implementers of XQuery have been the so-called 'native' XML databases," says Buxton. "It's interesting that the major players right now on the standards working group and task forces are the major database vendors—Oracle, IBM, and Microsoft. We see that as an indication that the 'native' XML database vendors will quietly fade away, and we see XQuery becoming mainstream in traditional, relational databases over the next couple of years."


Photography by Paul Frenzel, Unsplash