An Overview of XML-Based Databases

Source: Internet
Author: User
When a large amount of data has to be processed, it is best kept in a database, which is why almost every large business application is tied to one. If XML is to be used in business, it too must be connected to databases. The first point to make is that XML itself is not a database: strictly speaking, "XML" just means XML documents. An XML document contains data, but unless other software processes it, it is only a text file. XML on its own therefore cannot stand in for a database. Together with some auxiliary tools, however, the XML family as a whole can be regarded as a kind of database system: the XML text is the data store, a DTD or schema plays the role of the database schema, XQL serves as the query language, and SAX or DOM acts as the processing interface. It still lacks much of what a real database provides, such as efficient storage, indexes, security, transactions, data integrity, triggers, and multi-user access.
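The analogy can be made concrete with a minimal sketch. It assumes Python's standard xml.etree.ElementTree module as the tree-based (DOM-like) processing tool, and the sample document and the flight_numbers helper are invented for illustration. The limited path syntax of findall stands in for a query language here; it is not XQL.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the element names are assumptions for this sketch.
DOCUMENT = """<?xml version="1.0"?>
<Flights>
  <Row><Airline>ACME</Airline><FltNumber>123</FltNumber></Row>
  <Row><Airline>Zenith</Airline><FltNumber>456</FltNumber></Row>
</Flights>"""


def flight_numbers(xml_text, airline):
    """Return the flight numbers (as text) of every row for one airline."""
    root = ET.fromstring(xml_text)      # the whole document is loaded as a tree
    return [
        row.findtext("FltNumber")
        for row in root.findall("Row")  # path expression acting as the "query"
        if row.findtext("Airline") == airline
    ]


print(flight_numbers(DOCUMENT, "ACME"))
```

Every call re-parses and scans the whole document, which illustrates the missing pieces listed above: there is no index, no transaction, and no protection against concurrent writers.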

So why associate XML with a database at all? Consider an e-business application that uses XML for data transfer. What matters to you is the logical structure of the data, not how it is physically stored in the document. If the application is simple, the file system may be all you need; if it is complex, you need a complete application development environment that supports XML. Or suppose you run a Web site whose content consists of a set of XML documents, and you want not only to manage the site but also to let users search its content. Both cases call for a database. The most important factor in choosing one is whether you need to store data or documents. To store data, you need a relational or object database to hold the actual data, plus middleware to move it between the database and XML documents. To store documents, you need a content management system. XML documents can accordingly be divided into two broad categories: data-centric and document-centric.

Data-centric documents: Data-centric documents have a very regular structure; examples are a sales order or a restaurant menu expressed in XML. They are usually designed for machine consumption. A typical use is a Web site that builds HTML pages dynamically: it finds the data-centric XML document that matches the user's query and then transforms it with XSL so that an HTML browser can display the result.
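The XML-to-HTML step can be sketched as follows. Python's standard library has no XSLT processor, so this sketch uses a hand-written transformation as a stand-in for an XSL stylesheet; the menu document and the menu_to_html function are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-centric document: a restaurant menu.
MENU = """<Menu>
  <Dish><Name>Soup</Name><Price>4.50</Price></Dish>
  <Dish><Name>Salad</Name><Price>6.00</Price></Dish>
</Menu>"""


def menu_to_html(xml_text):
    """Turn each <Dish> into one HTML table row (plain Python, not XSLT)."""
    menu = ET.fromstring(xml_text)
    table = ET.Element("table")
    for dish in menu.findall("Dish"):
        tr = ET.SubElement(table, "tr")
        ET.SubElement(tr, "td").text = dish.findtext("Name")
        ET.SubElement(tr, "td").text = dish.findtext("Price")
    return ET.tostring(table, encoding="unicode", method="html")


print(menu_to_html(MENU))
```

The regular structure is what makes this easy: every Dish has the same children, so one rule covers the whole document.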

Document-centric documents: Document-centric documents have an irregular structure and coarser-grained data. Examples are books, emails, and advertisements. They are designed primarily for human readers.

To store or retrieve data, you can use a database with middleware, an XML server, or an XML-enabled Web server. To store documents, you need a content management system or a persistent DOM implementation. Large amounts of data-centric content may live either in a database or in XML documents, so tools are needed to turn database data into XML documents and to load XML documents into a database. Note that storing a document's data in a database discards much information about the document itself: its name and DTD, its physical structure (entity definitions and usage), the order of sibling elements, the way binary data is encoded, and so on. Likewise, an XML document generated from a database usually contains no CDATA sections or entity references, and sibling elements appear simply in the order in which the database returns the records. As a result, a document that is stored in a database and then regenerated from it is almost never identical to the original.
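The loss of physical structure is easy to observe even without a database. The sketch below assumes Python's xml.etree.ElementTree as the processing layer and uses an invented one-line document; parsing and re-serializing it plays the role of a store-and-retrieve cycle.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose text is wrapped in a CDATA section.
ORIGINAL = "<note><![CDATA[price < 10 & rising]]></note>"


def round_trip(xml_text):
    """Parse a document and serialize it again, keeping only its data."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


restored = round_trip(ORIGINAL)
print(restored)
```

The text content survives, but the CDATA section does not: the serializer emits escaped character data instead, so the regenerated document differs from the original even though it carries the same data.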

To transfer data between a database and an XML document, you must define a mapping between the document structure and the database structure. Such mappings fall into two categories: template-driven and model-driven.

1. Template-driven mapping: commands are embedded in a template and executed by the data transfer middleware. For example, consider the following template:

<?xml version="1.0"?>
<FlightInfo>
<Intro>The following flights have available seats:</Intro>
<SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
<Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>

Note the embedded SELECT statement. When the data transfer middleware processes the template, it replaces each SELECT statement with its result set, expressed as XML:

<?xml version="1.0"?>
<FlightInfo>
<Intro>The following flights have available seats:</Intro>
<Flights>
<Row>
<Airline>ACME</Airline>
<FltNumber>123</FltNumber>
<Depart>Dec, 1998 13:43</Depart>
<Arrive>Dec, 1998 01:21</Arrive>
</Row>
...
</Flights>
<Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>

Template-driven mapping can be quite flexible. Some products let you place result sets anywhere in the XML document, pass parameters to the SELECT statement, and use constructs such as for loops and if conditions. Note that, at the time of writing, template-driven mapping is available only for transferring data between a relational database and an XML document.
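What such middleware does can be sketched in a few lines. This is only an illustration, not how any particular product works: it assumes an in-memory SQLite database, the function name expand_template is invented, the result element name is passed in as a parameter, and the sample times are made up.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEMPLATE = """<FlightInfo>
<Intro>The following flights have available seats:</Intro>
<SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
<Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>"""


def expand_template(template, conn, result_tag="Flights"):
    """Replace each <SelectStmt> child of the root with its result set."""
    root = ET.fromstring(template)
    for index, child in enumerate(list(root)):
        if child.tag != "SelectStmt":
            continue
        # The SQL comes straight from the template, so templates must be trusted.
        cursor = conn.execute(child.text.strip())
        columns = [description[0] for description in cursor.description]
        results = ET.Element(result_tag)
        for record in cursor:
            row = ET.SubElement(results, "Row")
            for name, value in zip(columns, record):
                ET.SubElement(row, name).text = str(value)
        results.tail = child.tail   # keep the template's line breaks
        root[index] = results       # swap the statement for its results
    return root


# Hypothetical sample data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER, Depart TEXT, Arrive TEXT)"
)
conn.execute("INSERT INTO Flights VALUES ('ACME', 123, '13:43', '01:21')")

expanded = expand_template(TEMPLATE, conn)
print(ET.tostring(expanded, encoding="unicode"))
```

The template's own elements (Intro, Conclude) pass through untouched; only the embedded statement is replaced, which is why the result set can sit anywhere in the document.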

2. Model-driven mapping: data is transferred between the database and the XML document according to a fixed model of the document's structure. Products based on model-driven mapping can incorporate XSL to transform the result. Two models are common for XML documents: the table model and the data-specific object model.

Table model: Many middleware packages use the table model to transfer data between XML documents and relational databases. It treats an XML document as a single table or a set of tables, so the structure of the document takes the following form:

<database>
<table>
<row>
<column1> ... </column1>
<column2> ... </column2>
...
</row>
...
</table>
...
</database>

Here the term "table" means a single result set when data is transferred from the database to an XML document, and a single table or view when data is transferred from an XML document to the database. This model breaks down, however, when there is more than one result set or when the XML document contains deeply nested structures.
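A minimal sketch of the table model in both directions, assuming an in-memory SQLite database and Python's xml.etree.ElementTree. The function names and the Parts table are invented for illustration; the element names database, table, and row follow the layout shown above.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table_name):
    """Dump one table in the database/table/row layout."""
    # table_name is placed directly in the SQL text, so it must be trusted.
    cursor = conn.execute(f'SELECT * FROM "{table_name}"')
    columns = [description[0] for description in cursor.description]
    database = ET.Element("database")
    table = ET.SubElement(database, "table")
    for record in cursor:
        row = ET.SubElement(table, "row")
        for name, value in zip(columns, record):
            if value is None:
                continue            # a NULL column is simply left out
            ET.SubElement(row, name).text = str(value)
    return database


def xml_to_rows(database):
    """Read the same layout back as a list of {column: text} dictionaries."""
    return [{cell.tag: cell.text for cell in row} for row in database.iter("row")]


# Hypothetical sample table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Parts (PartNumber INTEGER, Name TEXT)")
conn.executemany("INSERT INTO Parts VALUES (?, ?)", [(1, "bolt"), (2, "nut")])

document = table_to_xml(conn, "Parts")
rows = xml_to_rows(document)
print(ET.tostring(document, encoding="unicode"))
print(rows)
```

Note that every value comes back as text, and that the flat row/column shape leaves no room for nested elements, which is the limitation described above.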

Data-specific object model: The XML document is represented as a tree of data objects in which each element type corresponds to an object class. This model is used mainly with object-oriented and hierarchical databases, and it can be mapped to a relational database through conventional object-relational mapping. Note that this model is not the Document Object Model (DOM). For example, a sales order document can be viewed as a tree of objects from five classes: Orders, SalesOrder, Customer, Line, and Part.

When an XML document is viewed as a tree of data-specific objects, not every element has to correspond to an object. An element that contains only PCDATA, for example, can be treated as a property holding a single scalar value.
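The sales order example can be sketched with plain Python dataclasses. The five class names come from the text; the fields, the sample document, and the load_orders function are assumptions made for illustration. Elements that contain only PCDATA (Number, Name, Quantity, and so on) become scalar properties rather than objects.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Part:
    number: str
    price: float


@dataclass
class Line:
    quantity: int
    part: Part


@dataclass
class Customer:
    name: str


@dataclass
class SalesOrder:
    number: str
    customer: Customer
    lines: List[Line] = field(default_factory=list)


@dataclass
class Orders:
    orders: List[SalesOrder] = field(default_factory=list)


# Hypothetical document; element names are assumptions for this sketch.
SAMPLE = """<Orders>
  <SalesOrder>
    <Number>1234</Number>
    <Customer><Name>Example Corp</Name></Customer>
    <Line><Quantity>12</Quantity>
      <Part><PartNumber>A-1</PartNumber><Price>0.5</Price></Part></Line>
    <Line><Quantity>3</Quantity>
      <Part><PartNumber>B-2</PartNumber><Price>2.0</Price></Part></Line>
  </SalesOrder>
</Orders>"""


def load_orders(xml_text):
    """Build the object tree: one object per structured element."""
    result = Orders()
    for so in ET.fromstring(xml_text).findall("SalesOrder"):
        lines = [
            Line(
                quantity=int(ln.findtext("Quantity")),
                part=Part(
                    number=ln.findtext("Part/PartNumber"),
                    price=float(ln.findtext("Part/Price")),
                ),
            )
            for ln in so.findall("Line")
        ]
        customer = Customer(name=so.findtext("Customer/Name"))
        result.orders.append(SalesOrder(so.findtext("Number"), customer, lines))
    return result


orders = load_orders(SAMPLE)
print(orders.orders[0].customer.name, len(orders.orders[0].lines))
```

Unlike a DOM tree, whose nodes are generic Element and Text objects, the classes here are specific to the data, which is the distinction the text draws.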

In practice, transferring data between XML and a database involves two further procedures: generating a DTD from a database schema, and generating a database schema from a DTD.

The steps for generating a relational schema from a DTD are as follows:

1. For each element type, create a table and a primary key column.

2. For each element type with mixed content, create a separate table to hold the PCDATA, linked to the parent table through the parent table's primary key.

3. For each single-valued attribute of the element type, and for each PCDATA-only child element that occurs at most once, create a column in the table. If the attribute or child element is optional, make the column nullable.

4. For each multi-valued attribute, and for each PCDATA-only child element that can occur more than once, create a separate table to hold the values, linked to the parent table through the parent table's primary key.

5. For each child element that contains elements or mixed content, link the child element's table to the parent element's table through the parent table's primary key.
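The steps above can be sketched as a small schema generator. Python's standard library has no DTD parser, so this sketch starts from a hand-written dictionary that stands in for a parsed DTD; that dictionary format, the element names, and the function names are all assumptions. Step 2 (mixed content) is left out to keep the example short, and every column is TEXT because a DTD carries no data types.

```python
import sqlite3

# Stand-in for a parsed DTD. Per element type:
#   "single":   single-valued attributes and PCDATA-only children that occur
#               at most once, as (name, optional) pairs
#   "multiple": multi-valued attributes and repeating PCDATA-only children
#   "children": child element types that themselves contain elements
ELEMENT_TYPES = {
    "SalesOrder": {
        "single": [("Number", False), ("Note", True)],
        "multiple": ["Tag"],
        "children": ["Line"],
    },
    "Line": {"single": [("Quantity", False)], "multiple": [], "children": []},
}


def schema_from_element_types(element_types):
    """Return CREATE TABLE statements following steps 1, 3, 4 and 5."""
    # Assumes each child element type has exactly one parent type.
    parents = {
        child: parent
        for parent, spec in element_types.items()
        for child in spec["children"]
    }
    statements = []
    for name, spec in element_types.items():
        columns = [f"{name}_id INTEGER PRIMARY KEY"]                  # step 1
        for column, optional in spec["single"]:                       # step 3
            columns.append(f"{column} TEXT" + ("" if optional else " NOT NULL"))
        if name in parents:                                           # step 5
            parent = parents[name]
            columns.append(
                f"{parent}_id INTEGER NOT NULL REFERENCES {parent}({parent}_id)"
            )
        statements.append(f"CREATE TABLE {name} ({', '.join(columns)})")
        for value_name in spec["multiple"]:                           # step 4
            statements.append(
                f"CREATE TABLE {name}_{value_name} ("
                f"{name}_id INTEGER NOT NULL REFERENCES {name}({name}_id), "
                f"value TEXT NOT NULL)"
            )
    return statements


statements = schema_from_element_types(ELEMENT_TYPES)
conn = sqlite3.connect(":memory:")
for statement in statements:
    conn.execute(statement)     # confirms the generated SQL is accepted
    print(statement)
table_names = {
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
}
```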

The steps for generating a DTD from a relational database schema are as follows:

1. For each table, create an element.

2. For each column in the table, create an attribute or a PCDATA-only child element.

3. For each primary key/foreign key relationship involving the table, create a child element of the table element.
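The reverse direction can be sketched against SQLite, whose table_info and foreign_key_list pragmas expose the schema. This is a simplified illustration under several assumptions: every column becomes a PCDATA-only child element (never an attribute), nullable columns are marked optional, foreign key columns are dropped because nesting expresses the relationship, and the table and column names are invented.

```python
import sqlite3


def dtd_from_schema(conn):
    """Emit DTD element declarations for every table in a SQLite database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    children = {t: [] for t in tables}        # parent table -> referencing tables
    fk_columns = {t: set() for t in tables}   # table -> its foreign key columns
    for t in tables:
        for fk in conn.execute(f'PRAGMA foreign_key_list("{t}")'):
            parent, from_column = fk[2], fk[3]
            children.setdefault(parent, []).append(t)
            fk_columns[t].add(from_column)
    declared = set()
    lines = []
    for t in tables:                                              # step 1
        content, leaves = [], []
        for _, name, _, notnull, _, pk in conn.execute(f'PRAGMA table_info("{t}")'):
            if name in fk_columns[t]:
                continue
            content.append(name if (notnull or pk) else name + "?")   # step 2
            leaves.append(name)
        content += [child + "*" for child in children[t]]         # step 3
        lines.append(f"<!ELEMENT {t} ({', '.join(content)})>")
        for name in leaves:
            if name not in declared:   # a DTD may declare each element type once
                declared.add(name)
                lines.append(f"<!ELEMENT {name} (#PCDATA)>")
    return "\n".join(lines)


# Hypothetical schema for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesOrder (OrderId INTEGER PRIMARY KEY, "
    "OrderDate TEXT NOT NULL, Note TEXT)"
)
conn.execute(
    "CREATE TABLE Line (LineId INTEGER PRIMARY KEY, "
    "OrderId INTEGER REFERENCES SalesOrder(OrderId), Quantity INTEGER NOT NULL)"
)
dtd = dtd_from_schema(conn)
print(dtd)
```

The column types (INTEGER, TEXT) have no counterpart in the output, since a DTD can only say that an element holds character data.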

Classification of XML-Based Database Products

According to Ronald Bourret's survey of XML database products, these products fall into seven categories:

1. Middleware

2. XML-enabled databases. Vendors such as Oracle and Microsoft claim that their latest database products interface seamlessly with XML.

3. Native XML databases

4. XML servers

5. XML application servers, such as IBM's WebSphere

6. Content management systems

7. Persistent DOM implementations

Each kind of product is described below.

Middleware: Middleware is software that transfers data between XML documents and databases. It is used mainly in data-centric applications and can be written in a variety of languages; it generally accesses the database through ODBC, JDBC, or OLE DB. Although some middleware can ship data across the Internet itself, most relies on a Web server to do so.

When XML documents are to be stored in a database, the question is how to choose middleware that suits the application. The following factors should be considered:

1. Data types: XML does not support data types: all data in an XML document is text, even when it represents something else, such as a date or an integer. Data transfer middleware generally converts between this text and the corresponding database types.

2. Binary data: There are two common ways to store binary data in XML documents: unparsed entities and Base64 encoding.

3. Null values: In the relational world, NULL means that a value does not exist, which is not the same as 0 or an empty string. XML can express the same idea: if an optional element or attribute is null, it is simply left out of the document. When mapping an XML document's structure to a database, or generating an XML document from database content, check how the middleware maps optional element types and attributes to nullable columns.

4. Character sets: An XML document can contain any Unicode character, but many databases do not support Unicode. If your data includes non-ASCII characters, check how both the database and the middleware handle them.

5. Processing instructions: Processing instructions are not part of the data in an XML document, so it is hard for middleware to decide how to store them. When choosing middleware, check how it handles processing instructions.

6. Markup storage: Middleware products differ in how they treat markup inside element content, and in how that content ends up in the database. Consider the following example:

<description>
<b> Confusing example: </b>
</description>

The form stored in the database is as follows:

<b> Confusing example: </b>

This is a problem because, once the content is in the database, nothing indicates whether <b> and </b> are markup or literal text.
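Three of the factors above (data types, binary data, and null values) can be illustrated together. The sketch below assumes Python's standard library; the Part document, its field names, and both functions are invented for illustration, and it shows only the Base64 option for binary data, not unparsed entities.

```python
import base64
import xml.etree.ElementTree as ET
from datetime import date

RAW_IMAGE = b"\x89PNG\r\n\x1a\n"   # arbitrary binary payload for the demonstration


def build_part_xml(number, added, image=None):
    """Serialize typed values as text; omit the optional element when it is null."""
    part = ET.Element("Part")
    ET.SubElement(part, "Number").text = str(number)
    ET.SubElement(part, "Added").text = added.isoformat()
    if image is not None:
        # Binary data travels as Base64 text.
        ET.SubElement(part, "Image").text = base64.b64encode(image).decode("ascii")
    return ET.tostring(part, encoding="unicode")


def part_to_record(xml_text):
    """Convert the text back into typed values ready for database columns."""
    part = ET.fromstring(xml_text)
    image_text = part.findtext("Image")   # None when the element is absent
    return (
        int(part.findtext("Number")),                   # text -> integer
        date.fromisoformat(part.findtext("Added")),     # text -> date
        base64.b64decode(image_text) if image_text is not None else None,
    )


with_image = part_to_record(build_part_xml(123, date(1998, 12, 12), RAW_IMAGE))
without_image = part_to_record(build_part_xml(7, date(2000, 1, 1)))
print(with_image)
print(without_image)
```

A missing Image element maps to None, which a database driver would store as NULL; an empty element would map to an empty value instead, so the two cases stay distinct.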
