UNIT IV
XML – Databases
An XML database stores large amounts of information in XML
format. As XML is used in more and more fields, a secure place is
needed to store XML documents. The data stored in the database
can be queried using XQuery, serialized, and exported into a
desired format.
XML Database Types
There are two major types of XML databases:
XML-enabled
Native XML (NXD)
XML-Enabled Database
An XML-enabled database is a relational database that provides an
extension for converting XML documents to and from its own format.
Data is stored in tables consisting of rows and columns. The tables
contain sets of records, which in turn consist of fields.
Native XML Database
A native XML database is based on containers rather than tables.
It can store large amounts of XML documents and data. A native
XML database is queried with XPath expressions.
A native XML database has an advantage over an XML-enabled
database: it is better suited to storing, querying, and maintaining
XML documents.
Example
The following example shows an XML document as it might be stored in an XML database:
<?xml version = "1.0"?>
<contact-info>
<contact1>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact1>
<contact2>
<name>Manisha Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 789-4567</phone>
</contact2>
</contact-info>
Here, the contact-info element holds the contact records
(contact1 and contact2), each of which consists of three elements:
name, company and phone.
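As a minimal sketch of how such a document can be queried, the following Python snippet loads the contact document with the standard library module xml.etree.ElementTree and reads each contact's fields. The variable names (CONTACTS_XML, contacts, names) are illustrative and not part of the original example. ElementTree supports only a limited subset of XPath, so this shows path-based access in general rather than the behaviour of any particular XML database.

```python
import xml.etree.ElementTree as ET

CONTACTS_XML = """<?xml version="1.0"?>
<contact-info>
  <contact1>
    <name>Tanmay Patil</name>
    <company>TutorialsPoint</company>
    <phone>(011) 123-4567</phone>
  </contact1>
  <contact2>
    <name>Manisha Patil</name>
    <company>TutorialsPoint</company>
    <phone>(011) 789-4567</phone>
  </contact2>
</contact-info>"""

root = ET.fromstring(CONTACTS_XML)

# Each child of <contact-info> is one contact record; turn it into a dict
# that maps the field's tag name to its text.
contacts = [
    {field.tag: field.text for field in contact}
    for contact in root
]

# Path expression: the <name> of every contact, whatever the record tag is.
names = [n.text for n in root.findall("./*/name")]

for c in contacts:
    print(c["name"], c["company"], c["phone"])
```

The path "./*/name" selects every name element two levels below the root, so it works even though the two records use different tag names.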
XML Hierarchical (Tree) Data Model
The basic object in XML is the XML document. Two main
structuring concepts are used to construct an XML document: elements
and attributes.
The example later in this section shows an XML element called
<Projects>. As in HTML, elements are identified in a document by
their start tag and end tag. The tag names are enclosed between
angle brackets < ... >, and end tags are further identified by a
slash, </ ... >.
Complex elements are constructed from other elements
hierarchically, whereas simple elements contain data values. A major
difference between XML and HTML is that XML tag names are defined
to describe the meaning of the data elements in the document, rather
than to describe how the text is to be displayed. This makes it possible
to process the data elements in the XML document automatically by
computer programs.
Also, the XML tag (element) names can be defined in
another document, known as the schema document, to give a semantic
meaning to the tag names that can be exchanged among multiple users.
In HTML, all tag names are predefined and fixed; that is why they are
not extensible.
It is possible to characterize three main types of XML documents:
Data-centric XML documents.
These documents have many small data items that follow a
specific structure and hence may be extracted from a structured
database. They are formatted as XML documents in order to exchange
them over or display them on the Web. These usually follow a
predefined schema that defines the tag names.
Document-centric XML documents.
These are documents with large amounts of text, such as
news articles or books. There are few or no structured data elements in
these documents.
Hybrid XML documents.
These documents may have parts that contain structured
data and other parts that are predominantly textual or unstructured. They
may or may not have a predefined schema.
The following document shows a complex XML element called <Projects>:
<?xml version="1.0" standalone="yes"?>
<Projects>
<Project>
<Name>ProductX</Name>
<Number>1</Number>
<Location>Bellaire</Location>
<Dept_no>5</Dept_no>
<Worker>
<Ssn>123456789</Ssn>
<Last_name>Smith</Last_name>
<Hours>32.5</Hours>
</Worker>
<Worker>
<Ssn>453453453</Ssn>
<First_name>Joyce</First_name>
<Hours>20.0</Hours>
</Worker>
</Project>
<Project>
<Name>ProductY</Name>
<Number>2</Number>
<Location>Sugarland</Location>
<Dept_no>5</Dept_no>
<Worker>
<Ssn>123456789</Ssn>
<Hours>7.5</Hours>
</Worker>
<Worker>
<Ssn>453453453</Ssn>
<Hours>20.0</Hours>
</Worker>
<Worker>
<Ssn>333445555</Ssn>
<Hours>10.0</Hours>
</Worker>
</Project>
...
</Projects>
XML documents that do not follow a predefined schema of element names
and corresponding tree structure are known as schemaless XML
documents. It is important to note that data-centric XML documents can
be considered either as semistructured data or as structured data,
depending on whether they conform to a predefined schema or DTD.
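The tree model can be made concrete with a short Python sketch that parses the Projects document and walks its hierarchy. The helper names (is_simple, depth, hours_per_project) are illustrative assumptions, and the line containing "..." is left out because it stands for elided projects.

```python
import xml.etree.ElementTree as ET

PROJECTS_XML = """<?xml version="1.0" standalone="yes"?>
<Projects>
  <Project>
    <Name>ProductX</Name>
    <Number>1</Number>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    <Worker><Ssn>123456789</Ssn><Last_name>Smith</Last_name><Hours>32.5</Hours></Worker>
    <Worker><Ssn>453453453</Ssn><First_name>Joyce</First_name><Hours>20.0</Hours></Worker>
  </Project>
  <Project>
    <Name>ProductY</Name>
    <Number>2</Number>
    <Location>Sugarland</Location>
    <Dept_no>5</Dept_no>
    <Worker><Ssn>123456789</Ssn><Hours>7.5</Hours></Worker>
    <Worker><Ssn>453453453</Ssn><Hours>20.0</Hours></Worker>
    <Worker><Ssn>333445555</Ssn><Hours>10.0</Hours></Worker>
  </Project>
</Projects>"""


def is_simple(element):
    """A simple element holds a data value and has no child elements."""
    return len(element) == 0


def depth(element):
    """Number of levels in the subtree rooted at element."""
    return 1 + max((depth(child) for child in element), default=0)


root = ET.fromstring(PROJECTS_XML)

# Complex elements (Project, Worker) are navigated; simple ones (Hours) are read.
hours_per_project = {
    project.findtext("Name"): sum(float(h.text) for h in project.findall("Worker/Hours"))
    for project in root.findall("Project")
}

print(depth(root))
print(hours_per_project)
```

Projects, Project and Worker are complex elements because they contain other elements, while Name, Ssn and Hours are simple elements that carry the data values.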
XML - Documents
An XML document is a basic unit of XML information composed of
elements and other markup in an orderly package. An
XML document can contain a wide variety of data, for example a
database of numbers, numbers representing a molecular structure, or a
mathematical equation.
XML Document Example
A simple document is shown in the following example −
<?xml version = "1.0"?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
An XML document has two parts, the document prolog and the document elements, which are described below.
Document Prolog Section
Document Prolog comes at the top of the document, before the root
element. This section contains −
XML declaration
Document type declaration
Document Elements Section
Document Elements are the building blocks of XML. These divide the
document into a hierarchy of sections, each serving a specific
purpose. You can separate a document into multiple sections so that
they can be rendered differently, or used by a search engine. The
elements can be containers, with a combination of text and other
elements.
XML DTD
What is DTD
DTD stands for Document Type Definition. It defines the legal building blocks of an XML
document. It is used to define document structure with a list of legal elements and
attributes.
Purpose of DTD
Its main purpose is to define the structure of an XML document. It contains a list of legal
elements and defines the structure with their help.
Checking Validation
Before working with a DTD, it helps to distinguish two levels of correctness. An XML document is
called "well-formed" if it follows the XML syntax rules.
A "valid" XML document is one that is well-formed and has also been validated against a DTD.
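The well-formedness half of this distinction can be checked with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module. The function name is_well_formed and the two sample strings are illustrative assumptions. ElementTree does not validate against a DTD, so this checks syntax only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, i.e. obeys the syntax rules."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


good = "<employee><firstname>vimal</firstname></employee>"
bad = "<employee><firstname>vimal</employee>"  # <firstname> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document that passes this check may still be invalid, for example if it uses elements that its DTD does not declare.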
Valid and well-formed XML document with DTD
Let's take an example of a well-formed and valid XML document. It follows all the rules of
its DTD.
employee.xml
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
  <firstname>vimal</firstname>
  <lastname>jaiswal</lastname>
  <email>vimal@javatpoint.com</email>
</employee>
In the above example, the DOCTYPE declaration refers to an external DTD file. The content
of that file is shown below.
employee.dtd
<!ELEMENT employee (firstname,lastname,email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>
Description of DTD
<!DOCTYPE employee : It declares that the root element of the document is employee.
<!ELEMENT employee : It declares that the employee element contains three child elements,
firstname, lastname and email, in that order.
<!ELEMENT firstname : It declares that the firstname element contains #PCDATA (parsed
character data, that is, text).
<!ELEMENT lastname : It declares that the lastname element contains #PCDATA (parsed
character data).
<!ELEMENT email : It declares that the email element contains #PCDATA (parsed character
data).
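To illustrate what the content model in employee.dtd requires, here is a hand-written check in Python. It is only a sketch of the rules for this one DTD, not a general DTD validator (the Python standard library does not include one). The names EMPLOYEE_CHILDREN and matches_employee_dtd are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Content model taken from employee.dtd: employee (firstname,lastname,email)
EMPLOYEE_CHILDREN = ["firstname", "lastname", "email"]


def matches_employee_dtd(xml_text):
    """Check root name, child order, and that each child holds only text."""
    root = ET.fromstring(xml_text)
    if root.tag != "employee":
        return False
    # The comma in the content model means: exactly these children, in order.
    if [child.tag for child in root] != EMPLOYEE_CHILDREN:
        return False
    # <employee> has element content, so only whitespace may sit between children.
    stray = [root.text] + [child.tail for child in root]
    if any(t and t.strip() for t in stray):
        return False
    # (#PCDATA) children may contain character data but no nested elements.
    return all(len(child) == 0 for child in root)


valid_doc = """<employee>
  <firstname>vimal</firstname>
  <lastname>jaiswal</lastname>
  <email>vimal@javatpoint.com</email>
</employee>"""

wrong_order = """<employee>
  <lastname>jaiswal</lastname>
  <firstname>vimal</firstname>
  <email>vimal@javatpoint.com</email>
</employee>"""

print(matches_employee_dtd(valid_doc))
print(matches_employee_dtd(wrong_order))
```

The second document is well-formed but does not satisfy the DTD, because the children appear in a different order from the one the content model lists.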
XML DTD with entity declaration
A DOCTYPE declaration can also define entities, which are named strings that can be used in the XML file.
An entity reference has three parts:
1. An ampersand (&)
2. An entity name
3. A semicolon (;)
Syntax to declare an entity:
<!ENTITY entity-name "entity-value">
Let's see an example that defines an entity in the DOCTYPE declaration.
author.xml
<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE author [
  <!ELEMENT author (#PCDATA)>
  <!ENTITY sj "Sonoo Jaiswal">
]>
<author>&sj;</author>
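When a parser reads author.xml, it replaces the reference &sj; with the declared value. The following sketch shows this with Python's standard xml.etree.ElementTree module. The variable names are illustrative, and because entity expansion can be abused, parsing like this is best kept to trusted input.

```python
import xml.etree.ElementTree as ET

AUTHOR_XML = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE author [
  <!ELEMENT author (#PCDATA)>
  <!ENTITY sj "Sonoo Jaiswal">
]>
<author>&sj;</author>"""

# The parser expands the internal entity while building the tree,
# so the element text already contains the replacement string.
author = ET.fromstring(AUTHOR_XML)
print(author.text)
```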
XML Schema
What is XML schema
An XML schema language is used to express constraints on XML documents.
Several schema languages are in use today, for example RELAX NG
and XSD (XML Schema Definition).
An XML schema is used to define the structure of an XML document. It is like a DTD but
provides more control over the XML structure.
Checking Validation
An XML document is called "well-formed" if it follows the XML syntax rules. A document is
"valid" if it is well-formed and has also been validated against a schema.
XML Schema Example
Let's create a schema file.
employee.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.javatpoint.com"
    xmlns="http://www.javatpoint.com"
    elementFormDefault="qualified">

  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
The following XML file refers to the XSD file above.
employee.xml
<?xml version="1.0"?>
<employee
    xmlns="http://www.javatpoint.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.javatpoint.com employee.xsd">

  <firstname>vimal</firstname>
  <lastname>jaiswal</lastname>
  <email>vimal@javatpoint.com</email>
</employee>
Description of XML Schema
<xs:element name="employee"> : It defines the element name employee.
<xs:complexType> : It defines that the element 'employee' is complex type.
<xs:sequence> : It defines that the complex type is a sequence of elements.
<xs:element name="firstname" type="xs:string"/> : It defines that the element
'firstname' is of string/text type.
<xs:element name="lastname" type="xs:string"/> : It defines that the element
'lastname' is of string/text type.
<xs:element name="email" type="xs:string"/> : It defines that the element 'email' is of
string/text type.
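Because an XSD file is itself an XML document, it can be read with an ordinary XML parser. The sketch below uses Python's standard xml.etree.ElementTree module to list the elements declared in employee.xsd and to compare them with the children of employee.xml. This is not a schema validator (full XSD validation needs a third-party library); it only illustrates how the schema and the instance document line up. The constant and variable names are assumptions.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
TARGET_NS = "{http://www.javatpoint.com}"

EMPLOYEE_XSD = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.javatpoint.com"
           xmlns="http://www.javatpoint.com"
           elementFormDefault="qualified">
  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

EMPLOYEE_XML = """<?xml version="1.0"?>
<employee xmlns="http://www.javatpoint.com"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.javatpoint.com employee.xsd">
  <firstname>vimal</firstname>
  <lastname>jaiswal</lastname>
  <email>vimal@javatpoint.com</email>
</employee>"""

schema = ET.fromstring(EMPLOYEE_XSD)
top = schema.find(f"{XS}element")  # the <xs:element name="employee"> declaration

# (name, type) pairs declared inside <xs:sequence>, in document order.
declared = [
    (e.get("name"), e.get("type"))
    for e in top.findall(f"{XS}complexType/{XS}sequence/{XS}element")
]

doc = ET.fromstring(EMPLOYEE_XML)
# elementFormDefault="qualified": the children live in the target namespace.
actual = [child.tag for child in doc]
expected = [TARGET_NS + name for name, _ in declared]

print(declared)
print(actual == expected)
```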
XML Schema Data types
There are two kinds of data types in XML Schema.
1. simpleType
2. complexType
simpleType
A simpleType describes text-only content. An element of a simple type cannot have
attributes or child elements.
complexType
The complexType allows you to hold multiple attributes and elements. It can contain
additional sub elements and can be left empty.
Querying XML data
This lesson shows you how to query XML data by using SQL, XQuery (with XQuery expressions),
or a combination of both.
If you use only SQL, you can query only at the column level. That is, you can return an entire XML
document stored in the column, but you cannot query within the document or return fragments of
the document. To query values within an XML document or return fragments of a document, you
must use XQuery.
The queries in this lesson use XQuery in an SQL context and SQL in an XQuery context.
Important: XQuery is case sensitive, but SQL is not. Therefore, when using XQuery, carefully specify
names such as table and SQL schema names, which are both uppercase by default. Even in an SQL context,
XQuery expressions remain case sensitive.
Querying in an SQL context
Retrieving entire XML documents
To retrieve all of the XML documents stored in the column named INFO and values from the CID
primary key column, issue the following SELECT statement:
SELECT cid, info FROM customer~
This query returns the two stored XML documents.
Retrieving and filtering XML values
To query within the XML documents in the INFO column, issue the following SELECT statement,
which uses the XMLQUERY function to invoke an XQuery expression:
SELECT XMLQUERY (
'declare default element namespace "http://posample.org";
for $d in $doc/customerinfo
return <out>{$d/name}</out>'
passing INFO as "doc")
FROM Customer as c
WHERE XMLEXISTS ('declare default element namespace "http://posample.org";
$i/customerinfo/addr[city="Toronto"]' passing c.INFO as "i")~
In the XMLQUERY function, a default namespace is first specified. This namespace
matches the namespace of the documents previously inserted. The for clause specifies
iteration through the <customerinfo> elements in each document from the Info column. The
INFO column is specified by using the passing clause, which binds the INFO column to the
variable named doc that is referenced in the for clause. The return clause then constructs
an <out> element, which contains the <name> element from each iteration of the for clause.
The WHERE clause uses the XMLEXISTS predicate to consider only a subset of the
documents in the Info column. This filtering yields only those documents that have a <city>
element (along the path specified) with a value of Toronto.
The SELECT statement returns the following constructed element:
<out xmlns="http://posample.org"><name>Kathy Smith</name></out>
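The division of labour between XMLEXISTS (which rows to keep) and XMLQUERY (what to extract from each kept row) can be imitated in plain Python. This is only an analogy using the standard xml.etree.ElementTree module, not DB2 behaviour. The customer_rows list is a hypothetical stand-in for the CUSTOMER table. Cid 1000, Kathy Smith, Toronto and the element paths come from this lesson, while the second row is invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"po": "http://posample.org"}

# Hypothetical stand-in for the CUSTOMER table: (Cid, Info) rows.
customer_rows = [
    (1000, '<customerinfo xmlns="http://posample.org">'
           '<name>Kathy Smith</name><addr><city>Toronto</city></addr>'
           '</customerinfo>'),
    (1001, '<customerinfo xmlns="http://posample.org">'   # invented row
           '<name>Example Person</name><addr><city>Markham</city></addr>'
           '</customerinfo>'),
]


def xml_exists(info):
    """Rough analogue of XMLEXISTS: is there an addr/city equal to 'Toronto'?"""
    doc = ET.fromstring(info)
    return any(c.text == "Toronto" for c in doc.findall("po:addr/po:city", NS))


def xml_query(info):
    """Rough analogue of XMLQUERY: return the text of the <name> element."""
    return ET.fromstring(info).findtext("po:name", namespaces=NS)


# WHERE clause first (filter rows), then the SELECT expression (project a value).
result = [xml_query(info) for cid, info in customer_rows if xml_exists(info)]
print(result)
```

The namespace mapping plays the role of the default element namespace declaration: without it, the paths would not match the namespaced elements.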
Using db2-fn:sqlquery with parameters
To pass a value to the SQL fullselect in the db2-fn:sqlquery function, run the following
query:
VALUES XMLQUERY (
'declare default element namespace "http://posample.org";
for $d in db2-fn:sqlquery(
''SELECT INFO FROM CUSTOMER WHERE Cid = parameter(1)'',
$testval)/customerinfo
return <out>{$d/name}</out>'
passing 1000 as "testval" )~
The XMLQUERY function passes the value 1000 to the XQuery expression by using the
identifier testval. The XQuery expression then passes the value to the
db2-fn:sqlquery function by using the PARAMETER scalar function.
The XQuery expression returns the following constructed element:
<out xmlns="http://posample.org">
<name>Kathy Smith</name>
</out>
Querying in an XQuery context
DB2® XQuery offers two built-in functions specifically for use with DB2 databases:
db2-fn:sqlquery and db2-fn:xmlcolumn. db2-fn:sqlquery retrieves a sequence that is the result table of
an SQL fullselect. db2-fn:xmlcolumn retrieves a sequence from an XML column.
If your query invokes an XQuery expression directly, you must prefix it with the case-insensitive
keyword XQUERY.
Note: There are several options that you can set to customize your command-line processor environment,
particularly for displaying the results of an XQuery expression. For example, set the -i option to make the
results from XQuery expressions easier to read, as follows:
UPDATE COMMAND OPTIONS USING i ON~
Retrieving entire XML documents
To retrieve all of the XML documents previously inserted into the INFO column, you can use
XQuery with either db2-fn:xmlcolumn or db2-fn:sqlquery.
Using db2-fn:xmlcolumn
To retrieve all XML documents in the INFO column, run the following query:
XQUERY db2-fn:xmlcolumn ('CUSTOMER.INFO')~
Names in SQL statements are automatically converted to uppercase by default. Therefore,
when you created the CUSTOMER table by using the CREATE TABLE SQL statement,
the names of the table and columns were made uppercase. Because XQuery is case
sensitive, you must be careful to use the correct case when specifying the table and column
names when using db2-fn:xmlcolumn.
This query is equivalent to the SQL query SELECT Info FROM Customer.
Using db2-fn:sqlquery
To retrieve all XML documents in the INFO column, run the following query:
XQUERY db2-fn:sqlquery ('SELECT Info FROM Customer')~
You do not have to specify the INFO and CUSTOMER names in uppercase because the
SELECT statement is processed in an SQL context and is therefore not case sensitive.
Retrieving partial XML documents
Instead of retrieving an entire XML document, you can retrieve fragments of the document and
filter on values present in the document by using XQuery with either db2-fn:xmlcolumn or
db2-fn:sqlquery.
Using db2-fn:xmlcolumn
To return elements containing <name> nodes for all documents in the Info column that have a
<city> element (along the path specified) with a value of Toronto, run the following query:
XQUERY declare default element namespace "http://posample.org";
for $d in db2-fn:xmlcolumn('CUSTOMER.INFO')/customerinfo
where $d/addr/city="Toronto"
return <out>{$d/name}</out>~
The db2-fn:xmlcolumn function retrieves a sequence from the INFO column of the CUSTOMER
table. The for clause binds the variable $d to each <customerinfo> element in the
CUSTOMER.INFO column. The where clause restricts the items to only those that have a <city>
element (along the path specified) with a value of Toronto. The return clause constructs the
returned XML value. This value is an element <out> that contains the <name> element for all
documents that satisfy the condition specified in the where clause, as follows:
<out xmlns="http://posample.org">
<name>
Kathy Smith
</name>
</out>
Using db2-fn:sqlquery
To issue a fullselect within an XQuery expression, run the following query:
XQUERY declare default element namespace "http://posample.org";
for $d in db2-fn:sqlquery(
'SELECT INFO
FROM CUSTOMER
WHERE Cid < 2000')/customerinfo
where $d/addr/city="Toronto"
return <out>{$d/name}</out>~
In this example, the set of XML documents being queried is first restricted, in the fullselect,
by particular values in the non-XML CID column. This example demonstrates an advantage
of db2-fn:sqlquery: it allows SQL predicates to be applied within an XQuery expression.
The documents that result from the SQL query are then further restricted in
the where clause of the XQuery expression to those documents that have a <city> element
(along the path specified) with a value of Toronto.
The query yields the same results as in the previous example, which used db2-fn:xmlcolumn:
<out xmlns="http://posample.org">
<name>
Kathy Smith
</name>
</out>
Using db2-fn:sqlquery with parameters
To pass a value to the SQL fullselect in the db2-fn:sqlquery function, run the following
query:
XQUERY declare default element namespace "http://posample.org";
let $testval := 1000
for $d in db2-fn:sqlquery(
'SELECT INFO FROM CUSTOMER WHERE Cid = parameter(1)',
$testval)/customerinfo
return <out>{$d/name}</out>~
In the XQuery expression, the let clause sets the value of $testval to 1000. In the for clause,
the expression then passes the value to the db2-fn:sqlquery function using the
PARAMETER scalar function.
The XQuery expression returns the following constructed element:
<out xmlns="http://posample.org">
<name>Kathy Smith</name>
</out>
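The same two-step idea, a relational predicate with a bound parameter followed by navigation inside the returned XML, can be sketched with Python's standard sqlite3 and xml.etree.ElementTree modules. SQLite stands in for DB2 here purely as an assumption for illustration: it has no XML column type, so the documents are stored as text and parsed in Python. The row with Cid 2001 is invented.

```python
import sqlite3
import xml.etree.ElementTree as ET

NS = {"po": "http://posample.org"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER (Cid INTEGER PRIMARY KEY, Info TEXT)")
conn.executemany(
    "INSERT INTO CUSTOMER (Cid, Info) VALUES (?, ?)",
    [
        (1000, '<customerinfo xmlns="http://posample.org">'
               '<name>Kathy Smith</name><addr><city>Toronto</city></addr>'
               '</customerinfo>'),
        (2001, '<customerinfo xmlns="http://posample.org">'   # invented row
               '<name>Example Person</name><addr><city>Toronto</city></addr>'
               '</customerinfo>'),
    ],
)

testval = 1000  # plays the role of $testval, which is bound to parameter(1)

# SQL side: the predicate on the non-XML Cid column restricts the rows first.
rows = conn.execute(
    "SELECT Info FROM CUSTOMER WHERE Cid = ?", (testval,)
).fetchall()

# XQuery side: navigate into each returned document and pick out <name>.
names = [ET.fromstring(info).findtext("po:name", namespaces=NS) for (info,) in rows]
print(names)
```

Binding the value through a placeholder, rather than pasting it into the SQL text, mirrors the way parameter(1) receives $testval in the XQuery version.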
What is XHTML
XHTML stands for EXtensible HyperText Markup Language. It is a cross between the HTML
and XML languages.
XHTML is almost identical to HTML but it is stricter than HTML. XHTML is HTML defined as
an XML application. It is supported by all major browsers.
Because XHTML is stricter than HTML in syntax and case sensitivity, it is more important
to write your code correctly. XHTML documents are well-formed and can be parsed using
standard XML parsers, unlike HTML, which requires a lenient HTML-specific parser.
History
XHTML 1.0 became a World Wide Web Consortium (W3C) Recommendation on January
26, 2000. XHTML 1.1 became a W3C Recommendation on May 31, 2001. The standard
known as XHTML5 is being developed as an XML adaptation of the HTML5 specification.
Why use XHTML
XHTML was developed to make HTML more extensible and increase interoperability with
other data formats. There are two main reasons behind the creation of XHTML:
o It creates a stricter standard for making web pages, reducing incompatibilities
between browsers, so that pages behave consistently across all major browsers.
o It creates a standard that can be used on a variety of different devices without
changes.
Let's take an example to understand it.
HTML is mainly used to create web pages, but many pages on the internet contain
"bad" HTML (HTML that does not follow the HTML rules).
Such code works fine in most browsers even though it does not follow the rules.
For example:
<html>
<head>
<title>This is an example of bad HTML</title>
<body>
<h1>Bad HTML
<p>This is a paragraph
</body>
The above HTML code doesn't follow the HTML rules, although it runs. Nowadays, there are
different browser technologies. Some browsers run on computers, and some browsers run
on mobile phones or other small devices. The main issue with bad HTML is that smaller
devices often lack the resources to interpret it.
So, XHTML was introduced to combine the strengths of HTML and XML.
XHTML is HTML redesigned as XML. It helps you to create better formatted code on your
site.
XHTML does not accept badly formed code. Unlike
HTML, where simple errors (such as a missing closing tag) are ignored by the
browser, XHTML code must be written exactly as specified.
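The difference in strictness can be seen by feeding markup to a standard XML parser, here Python's xml.etree.ElementTree. The "bad" markup is the HTML example from this section. The well-formed version is an assumed rewrite of it, with the DOCTYPE omitted for brevity. The function name parses_as_xml is illustrative.

```python
import xml.etree.ElementTree as ET

BAD_HTML = """<html>
<head>
<title>This is an example of bad HTML</title>
<body>
<h1>Bad HTML
<p>This is a paragraph
</body>"""

GOOD_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>This is an example of well-formed XHTML</title>
</head>
<body>
<h1>Good XHTML</h1>
<p>This is a paragraph</p>
</body>
</html>"""


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml(BAD_HTML))
print(parses_as_xml(GOOD_XHTML))
```

An XML parser stops at the first unclosed tag, whereas a browser's HTML parser would silently repair the same markup.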
The course element contains the sub elements course_id, title, dept_name, and credits (in that
order).
Similarly, department and instructor have the attributes of their
relational schema defined as sub elements in the DTD. Finally, the elements course_id,
title, dept_name, credits, building, budget, IID, name, and salary are all declared to be of
type #PCDATA.
The keyword #PCDATA indicates text data; it derives its name,
historically, from “parsed character data”.
Two other special type declarations are EMPTY, which says that the
element has no contents, and ANY, which says that there is no constraint on the sub
elements of the element; that is, any elements, even those not mentioned in the DTD,
can occur as sub elements of the element. The absence of a declaration for an element is
equivalent to explicitly declaring the type as ANY.