Monday, October 11, 2010

XQuery - Querying XML 'datastore'

XQuery is a query and functional programming language that is designed to query collections of XML data. It provides the means to extract and manipulate data from XML documents or any data source that can be viewed as XML, such as relational databases or office documents.
XQuery uses XPath expression syntax to address specific parts of an XML document. It supplements this with a SQL-like "FLWOR expression" for performing joins. A FLWOR expression is constructed from the five clauses after which it is named: FOR, LET, WHERE, ORDER BY, RETURN.
XPath is query language for selecting nodes from an XML document. It has ability to navigate around the XML tree structure, selecting nodes by a variety of criteria.

Usages:
1. Extracting information from a database for a use in web service.Eg: in ESB
2. Generating summary reports on data stored in an XML database.Eg: markmail.org
3. Searching textual documents on the Web for relevant information and compiling the results.Eg: markmail.org
4. Selecting and transforming XML data to XHTML to be published on the Web.
5. Pulling data from databases to be used for the application integration.
6. Splitting up an XML document that represents multiple transactions into multiple XML documents.
Eg: http://markmail.org/ - free service for searching mailing list archives. Here, Each email is stored internally as an XML document, and accessed using XQuery. All searches, faceted navigation, analytic calculations, and HTML page renderings are performed by a small MarkLogic Server cluster running against millions of messages.
[eg: of a url on xquery ->
 1. http://markmail.org/search/?q=xquery
 2. http://markmail.org/search/?q=xquery#query:xquery+page:1+mid:gt5gf3btmcdwyuks+state:results]

FLWOR detailed
for $d in document("depts.xml")//deptno
let $e := document("emps.xml")//employee[deptno = $d]
where count($e) >= 10
order by avg($e/salary) descending
return
<big-dept>
for $d in document("depts.xml")//deptno
let $e := document("emps.xml")//employee[deptno = $d]
where count($e) >= 10
order by avg($e/salary) descending
return
<big-dept>
{ $d,
<headcount>{count($e)}</headcount>,
<avgsal>{avg($e/salary)}</avgsal>
}
</big-dept>
{ $d,
<headcount>{count($e)}</headcount>,
<avgsal>{avg($e/salary)}</avgsal>
}
</big-dept>

for generates an ordered list of bindings of deptno values to $d
let associates to each binding a further binding of the list of emp elements with that department number to $e
at this stage, we have an ordered list of tuples of bindings: ($d,$e)
where filters that list to retain only the desired tuples
order sorts that list by the given criteria
return constructs for each tuple a resulting value


General rules:
for and let may be used many times in any order
only one where is allowed
many different sorting criteria can be specified 
for $x in /company/employee
    generates a list of bindings of $x to each employee element in the company, but:
let $x := /company/employee

    generates a single binding of $x to the list of employee elements in the company.


Another FLWOR eg:

Sample xml doc:
<bib>
<book>
<title>TCP/IP Illustrated</title>
<author>Stevens</author>
<publisher>Addison-Wesley</publisher>
</book>
<book>
<title>Advanced Programming
in the Unix Environment</title>
<author>Stevens</author>
<publisher>Addison-Wesley</publisher>
</book>
<book>
<title>Data on the Web</title>
<author>Abiteboul</author>
<author>Buneman</author>
<author>Suciu</author>
</book>
</bib>

Query eg: to author list and the lists of books published by each author
<authlist>
{
for $a in fn:distinct-values($bib/book/author)
order by $a
return
<author>
<name> {$a} </name>
<books>
{
for $b in $bib/book[author = $a]
order by $b/title
return $b/title
}
</books>
</author>
}
</authlist>

Result of above expression:
<authlist>
<author>
<name>Abiteboul</name>
<books>
<title>Data on the Web</title>
</books>
</author>
<author>
<name>Buneman</name>
<books>
<title>Data on the Web</title>
</books>
</author>
<author>
<name>Stevens</name>
<books>
<title>Advanced Programming
in the Unix Environment</title>
<title>TCP/IP Illustrated</title>
</books>
</author>
<author>
<name>Suciu</name>
<books>
<title>Data on the Web</title>
</books>
</author>
</authlist>

How FLWOR works:


Query to list each publisher and the average price of their books:
for $p in distinct-values(document("bib.xml")//publisher)
let $a := avg(document("bib.xml")//book[publisher = $p]/price)
return
<publisher>
<name>{ $p/text() }</name>
<avgprice>{ $a }</avgprice>
</publisher>

Operators available:
  • concatenation: ,
  • set operators: | (or union), intersect, except
  • functions: remove, index-of, count, avg, max, min, sum, distinct-values
  • Universal Quantification Expressions

The syntax of the universal quantification expression is:
every $v in seq-expression satisfies test-expression

Eg: The following query checks if the price of every item customer 1001 orders is over 200.

# Query listing
let $price := document("data/PO.xml")
//po[customer/custno='1001']
/lineitems/lineitem/item/price
return
if (every $v in $price satisfies ( $v > 200 ) ) then
<result>
customer always orders expensive items!
</result>
else
<result>
Customer does not always order expensive items
</result>

Unless customer 1001 always orders items priced over 200, the result is
<result>
Customer does not always order expensive items
</result>

Example Implementation
Zorba - XQuery Processor

XQuery Processor embeddable in a variety of environments such as other programming languages extended with XML processing capabilities, browsers, database servers, XML message dispatchers, or smartphones. Available as API for Java.

XQuery API for Java (JSR 225)A common API that allows an application to submit queries conforming to the W3C XQuery 1.0 specification and to process the results of such queries.Refer: Blog on XQJ - JSR 225
Misc:
W3C XQuery
XQJ - JSR 225
XML Querying
Zorba XQuery Processor
Excellent quick book

1 comment: