Relationship Between XML and Databases

Source: Internet
Author: User
1.0 Introduction
This paper briefly discusses the relationship between XML and databases and lists some of the software available for processing XML documents with databases. Although it does not describe that software in detail, I hope it covers the main issues involved in using databases to process XML data. Because of my own experience, it is somewhat biased towards relational databases.
2.0 Is XML a Database?
Before we begin to discuss XML and databases, we need to answer a question that many people ask: "Is XML a database?" Strictly speaking, if "XML" means an XML document, the answer is "no". Although an XML document contains data, without other software to process that data it is no more a database than any other text file.
In a broader sense, when "XML" refers to XML documents together with all of the related XML tools and technologies, the answer is "yes". The reason is that XML provides many of the things a database needs: storage (XML documents), structure (DTDs, XML schema languages), query languages (XQL, XML-QL, Quilt, and so on), programming interfaces (SAX, DOM), and so on. However, XML still lacks many things found in real databases: efficient storage, indexes, security, transactions, data integrity, multi-user access, triggers, and queries across multiple documents.
Therefore, XML can be used as a database in environments with moderate amounts of data, few users, and modest performance requirements. It cannot be used in most production environments, which have many users, strict data integrity requirements, and a need for high performance. Moreover, given that databases such as dBASE and Access are both cheap and easy to use, there is rarely a reason to use XML as a database even in the first case.
3.0 Why Use a Database?
When you consider using XML with a database, the first question to ask yourself is why you need a database at all. Do you have existing data that you want to export? Are you looking for a place to store your Web pages? Is the database used by an e-commerce application in which XML is the data transport format? The answers to these questions will directly affect your choice of database and middleware (if any).
For example, suppose you use XML for data transport in an e-commerce application. In that case your data almost certainly has a highly regular structure, and the entities and encodings used in the XML documents are not important to you. After all, you care only about the data, not about how it is physically stored in the documents. If your application is relatively simple, a relational database and data transfer middleware may meet your needs. If it is large and complex, you need a development environment that fully supports XML.
On the other hand, suppose you have a Web site built from a collection of loosely structured XML documents. Not only do you need to manage the site, you also want to give users a way to search its content. Your documents will be much less regular, and entity usage will matter to you, because the structure of these documents is the foundation of the site. In this case, you need some kind of "native XML" database that handles versioning, tracks entity usage, and supports a query language such as XQL.
4.0 Data versus Documents
Perhaps the most important factor in choosing a database is whether you are using it to store data or to store documents. If you want to store data, you need a database designed primarily for data storage (such as a relational or object-oriented database) plus software that converts between the database and XML documents. If you want to store documents, you need a content management system, which is designed specifically for storing documents.
Although you can store documents in a relational or object-oriented database, you will often find yourself reimplementing the features of a content management system. Similarly, although a content management system is usually built on top of an object-oriented or relational database, it can be difficult to use one as a general-purpose database.
Whether you need to store data or documents can often be determined by looking at your XML documents, because XML documents fall into two broad categories: data-centric and document-centric.
4.1 Data-Centric Documents
Data-centric documents are characterized by fairly regular structure, fine-grained data (that is, the smallest independent unit of data is a PCDATA-only element or an attribute), and little or no mixed content. The order in which sibling elements and PCDATA occur is not significant. Typical examples are XML documents that contain sales orders, flight schedules, restaurant menus, and so on. Data-centric documents are usually intended for machine consumption, and the use of XML is incidental: it is simply a means of transporting data.
For example, the following sales order document is data-centric:
ABC Industries
123 Main St.
Chicago
IL
60609
981215
Turkey wrench:
Stainless steel, one-piece construction,
lifetime guarantee.
9.95
10
Stuffing separator:
Aluminum, one-year guarantee.
13.27
5
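As a rough illustration of why sibling order does not matter in a data-centric document, the sketch below reads a sales order by element name. The text values come from the example above. The element names SalesOrder, Customer, Line, Part, CustName, and Description are mentioned later in this paper, while Street, City, State, PostCode, OrderDate, Price, and Quantity are assumptions made for this sketch. The second line item deliberately lists its children in a different order.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: only the text values are taken from the paper.
ORDER_XML = """
<SalesOrder>
  <Customer>
    <CustName>ABC Industries</CustName>
    <Street>123 Main St.</Street>
    <City>Chicago</City>
    <State>IL</State>
    <PostCode>60609</PostCode>
  </Customer>
  <OrderDate>981215</OrderDate>
  <Line>
    <Part><Description>Turkey wrench</Description><Price>9.95</Price></Part>
    <Quantity>10</Quantity>
  </Line>
  <Line>
    <Quantity>5</Quantity>
    <Part><Price>13.27</Price><Description>Stuffing separator</Description></Part>
  </Line>
</SalesOrder>
"""


def order_lines(xml_text):
    """Return one flat record per <Line>, looking values up by element name."""
    root = ET.fromstring(xml_text)
    return [
        {
            "description": line.findtext("Part/Description"),
            "price": float(line.findtext("Part/Price")),
            "quantity": int(line.findtext("Quantity")),
        }
        for line in root.findall("Line")
    ]


lines = order_lines(ORDER_XML)
total = sum(item["price"] * item["quantity"] for item in lines)
```

Because every value is fetched by name, both line items yield the same kind of record even though their child elements appear in different orders.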
In the XML world, many documents with rich prose content are in fact data-centric. Take, for example, a page on Amazon.com that displays information about a book. Although the page is largely text, its structure is highly regular: much of it is the same for every book description page, and each part of the page is limited in size. Such a page could therefore be built from a simple, data-centric XML document that contains the information retrieved from the database, together with an XSL style sheet. In general, any Web site that currently builds HTML pages dynamically by filling a template with database data could be replaced by data-centric XML documents and one or more XSL style sheets.
For example, consider a lease such as the following:
ABC Industries agrees to lease the property at 123 Main St., Chicago, IL from XYZ Properties for a term of not less than 18 months at a cost of 1000 USD per month.
It can be built from the following data-centric XML document and a simple style sheet:
ABC Industries
123 Main St., Chicago, IL
XYZ Properties
18
1000
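The idea of combining plain data with a style sheet can be sketched without XSL. In the minimal example below, a Python string template stands in for the style sheet. The key names (lessee, address, lessor, term_months, price_usd) are hypothetical, and the wording of the template is an assumption based on the lease text above.

```python
from string import Template

# The five values of the data-centric lease document; key names are hypothetical.
lease = {
    "lessee": "ABC Industries",
    "address": "123 Main St., Chicago, IL",
    "lessor": "XYZ Properties",
    "term_months": "18",
    "price_usd": "1000",
}

# Plays the role of the "simple style sheet": fixed prose with slots for data.
LEASE_TEMPLATE = Template(
    "$lessee agrees to lease the property at $address from $lessor "
    "for a term of not less than $term_months months "
    "at a cost of $price_usd USD per month."
)

lease_text = LEASE_TEMPLATE.substitute(lease)
```

All of the prose lives in the template, so the stored document only needs to carry the five data values.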
4.2 Document-Centric Documents
Document-centric documents are characterized by irregular structure, coarse-grained data (that is, the smallest independent unit of data may be an element with mixed content or the entire document), and a large amount of mixed content. The order in which sibling elements and PCDATA occur is significant. Typical examples are books, email, advertisements, and most XHTML documents. Document-centric documents are usually intended for human consumption.
For example, the following product description document is document-centric:
Turkey wrench

Full Fabrication Labs, Inc.

Like a monkey wrench, but not as big.

The turkey wrench, which comes in both right- and
left-handed versions (skyhook optional), is made of the finest
stainless steel. The READI-grip rubberized handle quickly adapts
to your hands, even in the greasiest situations. Adjustment is
possible through a variety of M dials.

You can:

Order your own turkey wrench
Read more about wrenches
Download the catalog

The turkey wrench costs just $19.99 and, if you
order now, comes with a hand-crafted shrimp hammer as
a bonus gift.
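To see why order matters in mixed content, the following sketch splits one sentence of the product description into its pieces. The markup (a Para element with b and i children) is hypothetical and is used only to produce mixed content.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the last sentence of the product description.
PARA_XML = (
    "<Para>The turkey wrench costs just <b>$19.99</b> and, if you "
    "order now, comes with a <i>hand-crafted shrimp hammer</i> as a bonus gift.</Para>"
)

para = ET.fromstring(PARA_XML)

# Mixed content is an ordered sequence of text and child elements.
pieces = [para.text]
for child in para:
    pieces.append(child.text)  # text inside the child element
    pieces.append(child.tail)  # prose that follows the child element

# Joining the pieces in document order reproduces the sentence.
in_order = "".join(pieces)

# Any other order (here: alphabetical) no longer reads as the same sentence.
reordered = "".join(sorted(pieces))
```

A system that stores only named values and forgets their position, which is acceptable for data-centric documents, would not be able to rebuild this paragraph.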
4.3 Data, Documents, and Databases
In practice, the distinction between data-centric and document-centric documents is not always clear. For example, an otherwise data-centric document (such as an invoice) may contain coarse-grained, irregularly structured data (such as a description). An otherwise document-centric document (such as a user manual) may contain fine-grained, regularly structured data (often metadata), such as an author and a revision date. Nevertheless, characterizing your documents as data-centric or document-centric helps you decide whether you are interested in data or in documents, which in turn determines what kind of system you need.
To store or retrieve data, you can use a database (usually relational, object-oriented, or hierarchical) together with middleware (built-in or third-party), or you can use an XML server (that is, a platform for building distributed applications, such as e-commerce applications, that use XML for data transport). To store documents, you need a content management system or a persistent DOM implementation. These systems are discussed in section 5.0, "Store and Retrieve Data", and section 6.0, "Store and Retrieve Documents". For a detailed list of related products, see the XML database product list at http://www.rpbourret.com/xml/XMLDatabaseProds.htm.

5.0 Store and Retrieve Data
The data in a data-centric document may originate in the database (in which case you want to export it as XML) or in an XML document (in which case you want to store it in the database). An example of the former is the large amount of existing (legacy) data stored in relational databases; an example of the latter is data published on the Web as XML that you want to store in your own database for further processing. Depending on your needs, you may therefore need software that transfers data from XML documents to the database, from the database to XML documents, or both.
5.1 Transfer Data
When storing data in a database, you often need to discard much of the information about the document, such as the document name and DTD, as well as its physical structure, such as entity definitions and usage, the order in which attribute values and sibling elements occur, the way binary data is stored (Base64 encoding, unparsed entities, and so on), CDATA sections, and encoding information. Similarly, when data is retrieved from a database, the resulting XML document usually contains no CDATA sections and no entity references other than the predefined entities lt ("<"), gt (">"), amp ("&"), apos ("'"), and quot ('"'). The order in which sibling elements and attributes appear is usually simply the order in which the data is returned by the database.
Although this may seem surprising at first, it is often quite reasonable. For example, suppose you use XML as the data format for transferring a sales order from one database to another. In this case, it does not matter whether the sales order number appears before or after the order date in the XML document, nor whether the customer name is stored in a CDATA section, in an external entity, or directly as PCDATA. What matters is that the relevant data is transferred from the first database to the second. The data transfer software therefore needs to respect the hierarchical structure of the data (which groups together the information belonging to one sales order) but little else.
One consequence of ignoring document information and physical structure is that documents often cannot be "round-tripped": if you store the data from a document in the database and then reconstruct the document from that data, the result frequently differs from the original, even when both are compared in a canonical form. Whether this is acceptable depends on your needs and may influence your choice of database and data transfer middleware.
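The loss of physical structure can be observed with nothing more than Python's standard XML library. The sketch below "stores" only the data of a small, made-up order document (the element and attribute names are assumptions) and then rebuilds a document from it. The data survives, but the CDATA section does not.

```python
import xml.etree.ElementTree as ET

# Made-up document whose note is written as a CDATA section.
ORIGINAL = '<Order Date="981215" Number="1"><Note><![CDATA[a < b]]></Note></Order>'

root = ET.fromstring(ORIGINAL)

# "Store" only the data: attribute values and element text.
stored = {"attrs": dict(root.attrib), "note": root.findtext("Note")}

# Rebuild a document from the stored data.
rebuilt_root = ET.Element("Order", dict(sorted(stored["attrs"].items())))
ET.SubElement(rebuilt_root, "Note").text = stored["note"]
rebuilt = ET.tostring(rebuilt_root, encoding="unicode")
```

Here `rebuilt` carries the same note text, but written with an `&lt;` entity reference instead of a CDATA section, so a byte-for-byte comparison with the original fails even though no data was lost.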
5.2 Mapping from Document Structure to Database Structure
To transfer data between XML documents and a database, the document structure must be mapped to the database structure and vice versa. Such mappings generally fall into two categories: template-driven and model-driven.
5.2.1 Template-Driven Mapping
In a template-driven mapping, there is no predefined mapping between document structure and database structure. Instead, you embed commands in a template that is processed by the data transfer middleware. For example, consider the following template (which does not correspond to any actual product), in which a SELECT statement is embedded in the <SelectStmt> element:

<?xml version="1.0"?>
<FlightInfo>
   <Intro>The following flights have available seats:</Intro>
   <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
   <Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>

When the data transfer middleware processes this document, each SELECT statement is replaced by its results, formatted as XML:

<?xml version="1.0"?>
<FlightInfo>
   <Intro>The following flights have available seats:</Intro>
   <Flights>
      <Row>
         <Airline>Acme</Airline>
         <FltNumber>123</FltNumber>
         <Depart>Dec 12, 1998</Depart>
         <Arrive>Dec 13, 1998</Arrive>
      </Row>
      ...
   </Flights>
   <Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>

Template-driven mapping can be quite flexible. For example, some products let you place result set values wherever you want in the output document (including using them as parameters in later SELECT statements) rather than simply formatting the results as in the example above. Others support programming constructs such as loops and conditionals. Some also support parameterized SELECT statements, for example with parameter values passed through HTTP.
Currently, template-driven mapping is supported only for transferring data from a relational database to an XML document.
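The template mechanism described in this section can be approximated in a few lines. The sketch below is not the behavior of any particular product: it assumes a template whose root has SelectStmt children, runs each statement against an in-memory SQLite database, and replaces the statement with a Flights element holding one Row per result. The table contents and the function name expand_template are made up for the example, and executing SQL taken from a template is only safe when the template itself is trusted.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEMPLATE = """<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <SelectStmt>SELECT Airline, FltNumber FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>"""


def expand_template(template_xml, conn):
    """Replace every <SelectStmt> child of the root with the rows its query returns."""
    root = ET.fromstring(template_xml)
    for index, child in enumerate(list(root)):
        if child.tag != "SelectStmt":
            continue
        cursor = conn.execute(child.text)
        columns = [col[0] for col in cursor.description]
        result = ET.Element("Flights")  # wrapper name assumed, as in the example above
        for row in cursor:
            row_elem = ET.SubElement(result, "Row")
            for name, value in zip(columns, row):
                ET.SubElement(row_elem, name).text = str(value)
        result.tail = child.tail  # keep the surrounding whitespace
        root[index] = result
    return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER)")
conn.execute("INSERT INTO Flights VALUES ('Acme', 123)")
expanded = expand_template(TEMPLATE, conn)
```

The fixed text in Intro and Conclude passes through untouched, which is what gives the template author control over the shape of the output.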
5.2.2 Model-Driven Mapping
In a model-driven mapping, a data model is imposed on the structure of the XML document and is mapped, explicitly or implicitly, to the structure of the database and vice versa. Its disadvantage is limited flexibility; its advantage is ease of use, because the mapping is built on a concrete data model and the middleware can do much of the conversion work for the user. Because data transferred from the database to XML follows a single model, such systems are usually combined with XSL to provide the flexibility of template-driven systems.
XML documents are generally viewed through one of two data models: the table model and the data-specific object model. Other models are possible. For example, an XML document that uses ID and IDREF attributes can be modeled as a directed graph. However, many existing middleware products do not support these other models.
5.2.2.1 Table Model
Many middleware packages use the table model to transfer data between XML and relational databases. It treats an XML document as a single table or a set of tables. That is, the structure of the XML document must resemble the following example, in which the <Database> element is omitted in the single-table case:

<Database>
   <Table>
      <Row>
         <Column1>...</Column1>
         <Column2>...</Column2>
         ...
      </Row>
      ...
   </Table>
   ...
</Database>

The term "table" here means a single result set (when transferring data from the database to XML) or a single table or updatable view (when transferring data from XML to the database). If the data must come from more than one result set (when transferring data from the database), or if the XML document is nested more deeply than is needed to represent a set of tables (when transferring data to the database), the transfer is simply not possible with this model.
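A minimal sketch of the table model in the XML-to-database direction follows. It assumes the single-table case, with the root element named after the table and each Row child holding one element per column. The Flights table, its columns, and the function name load_table are made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

TABLE_XML = """<Flights>
  <Row><Airline>Acme</Airline><FltNumber>123</FltNumber></Row>
  <Row><FltNumber>456</FltNumber><Airline>Apex</Airline></Row>
</Flights>"""


def load_table(xml_text, conn):
    """Insert each <Row> into the table named after the root element."""
    table = ET.fromstring(xml_text)
    for row in table.findall("Row"):
        columns = [cell.tag for cell in row]
        values = [cell.text for cell in row]
        placeholders = ", ".join("?" for _ in columns)
        # Table and column names come from element names; a real tool would
        # validate them against the schema before building SQL from them.
        conn.execute(
            f"INSERT INTO {table.tag} ({', '.join(columns)}) VALUES ({placeholders})",
            values,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER)")
load_table(TABLE_XML, conn)
rows = conn.execute(
    "SELECT Airline, FltNumber FROM Flights ORDER BY FltNumber"
).fetchall()
```

A document with elements nested below the column level has no place to go in this scheme, which is the limitation described above.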

5.2.2.2 Data-Specific Object Model

The second common data model for XML documents is a tree of data-specific objects. In this model, element types generally correspond to classes, and content models, attributes, and PCDATA correspond to properties of those objects. This model maps directly to object-oriented and hierarchical databases. With the help of traditional object-relational mapping techniques and SQL 3 object views, it can also be mapped to relational databases. Note that this model is not the Document Object Model (DOM). The DOM models the document itself rather than the data in the document. As described in section 6.1.2, the DOM can be used to build a content management system on top of a relational database.
For example, the sales order document shown above can be viewed as a tree of objects from five classes (Orders, SalesOrder, Customer, Line, and Part), as shown in the following diagram:

        Orders
          |
      SalesOrder
      /   |
Customer  Line
          |
         Part
When an XML document is modeled as a tree of data-specific objects, elements do not have to correspond to objects. For example, an element that contains only PCDATA, such as the CustName element in the sales order document, can be treated as a property, because, like a property, it holds a single scalar value. It is sometimes also useful to model elements with element or mixed content as properties. An example is the Description element in the sales order document: although it has mixed content in the form of XHTML, it is more useful to treat the description as a single property, because its components are meaningless on their own.
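The object view can be sketched with plain Python data classes. The example below is an illustration under assumptions rather than a mapping used by any product: the class and element names follow the paper, the Price and Quantity elements are assumed, the Orders wrapper is left out for brevity, and the Description element is stored as a single string property that still contains its markup.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Part:
    description: str  # mixed content kept as one string property
    price: str


@dataclass
class Line:
    quantity: str
    part: Part


@dataclass
class Customer:
    cust_name: str  # PCDATA-only element modeled as a scalar property


@dataclass
class SalesOrder:
    customer: Customer
    lines: List[Line] = field(default_factory=list)


ORDER_XML = (
    "<SalesOrder><Customer><CustName>ABC Industries</CustName></Customer>"
    "<Line><Part><Description><b>Turkey wrench:</b> stainless steel</Description>"
    "<Price>9.95</Price></Part><Quantity>10</Quantity></Line></SalesOrder>"
)


def inner_xml(elem):
    """Serialize an element's content (text plus child markup) without its own tags."""
    return (elem.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in elem
    )


def to_object(xml_text):
    """Build a SalesOrder object tree from the XML text."""
    root = ET.fromstring(xml_text)
    order = SalesOrder(Customer(root.findtext("Customer/CustName")))
    for line in root.findall("Line"):
        part = line.find("Part")
        order.lines.append(
            Line(
                quantity=line.findtext("Quantity"),
                part=Part(inner_xml(part.find("Description")), part.findtext("Price")),
            )
        )
    return order


order = to_object(ORDER_XML)
```

Keeping the description as one string avoids creating objects for the b element and the text fragments around it, which would carry no meaning on their own.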

5.3 Data Types, Null Values, Character Sets, and Other Issues
This section discusses several issues that arise when transferring data between XML documents and databases. You usually cannot control how your chosen middleware handles these issues, but you should be aware of them, because they can help you choose middleware that meets your needs.
5.3.1 Data Types
XML does not support data types in any meaningful sense. Except for unparsed entities, all data in an XML document is text, even if it represents another data type such as a date or an integer. In general, data transfer middleware converts the text in the XML document to the data types used in the database and vice versa. However, the text formats recognized for a given data type are limited, for example to those supported by a particular JDBC driver. Of the many data types, dates usually cause the most trouble. Numbers can also cause problems because number formats differ between countries and regions.
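The conversion problem is easy to reproduce. The sketch below converts two text values from the sales order example into typed values. It assumes that "981215" is a date written as YYMMDD and that prices use a decimal point. Both are assumptions about the document, which is exactly the kind of knowledge middleware needs and XML itself does not supply.

```python
from datetime import date, datetime
from decimal import Decimal


def parse_order_date(text):
    """Parse a date written as YYMMDD (the assumed format of "981215")."""
    return datetime.strptime(text, "%y%m%d").date()


def parse_price(text):
    """Parse a price written with a decimal point; "9,95" would be rejected."""
    return Decimal(text)


order_date = parse_order_date("981215")
price = parse_price("9.95")

# The same date in another common text format is not recognized.
try:
    parse_order_date("12/15/1998")
    other_format_accepted = True
except ValueError:
    other_format_accepted = False
```

A converter that accepts only one text format per data type behaves like the restricted drivers described above: values in any other format fail to load.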
5.3.2 Binary Data
There are two common ways to store binary data in an XML document: unparsed entities and Base64 encoding (a MIME encoding that maps binary data to a subset of US-ASCII).
For relational databases, both methods can be problematic, because the rules for storing and retrieving binary data in the database are quite strict and the middleware may not handle them well.
In addition, there is no standard notation for indicating that an element in an XML document contains Base64-encoded data, so the middleware may not recognize the encoding at all. Finally, when data is stored in the database, the notation associated with an unparsed entity or a Base64-encoded element may be discarded. Therefore, if binary data is important to you, make sure that your middleware supports it.
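The following sketch shows the Base64 route and why a marker is needed. The Photo element and its encoding attribute are a hypothetical, application-level convention invented for this example. Nothing in XML itself tells a reader that the element text is Base64.

```python
import base64
import xml.etree.ElementTree as ET

payload = bytes([0, 255, 16, 128])  # arbitrary binary data

# Writer: encode the bytes as text and flag the encoding with an attribute
# (a private convention, since XML defines no standard marker for this).
elem = ET.Element("Photo", {"encoding": "base64"})
elem.text = base64.b64encode(payload).decode("ascii")
document = ET.tostring(elem, encoding="unicode")

# Reader: without the flag, the content would be treated as ordinary text.
parsed = ET.fromstring(document)
if parsed.get("encoding") == "base64":
    restored = base64.b64decode(parsed.text)
else:
    restored = parsed.text.encode("utf-8")
```

If the attribute were dropped when the data passed through a database, the reader would take the second branch and store the encoded text instead of the original bytes.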
5.3.3 Null Values
In the database world, a null value means that the data does not exist. This is very different from a number whose value is 0 or a string whose length is 0. For example, suppose your data comes from weather stations. If the thermometer at one station is broken and no temperature can be read, a null value, not 0, is stored in the database, because a temperature of 0 would mean something entirely different.
XML supports the concept of null data through optional element types and attributes. If the value of an optional element type or attribute is null, the element or attribute is simply omitted from the document. As in databases, empty elements and attributes containing zero-length strings are not null: their value is a zero-length string.
When mapping between XML document structure and database structure, you must take care that optional element types and attributes correspond to nullable columns in the database and vice versa. If they do not, the result may be an insertion error (when transferring data to the database) or an invalid document (when transferring data from the database).

In practice, XML is likely to be treated more flexibly than databases in what counts as null. In particular, many XML users are likely to regard empty elements and attributes containing zero-length strings as null. You must therefore check how your middleware handles this situation. Some middleware lets the user choose what constitutes a null value in an XML document.
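The three cases (a value of 0, an empty element, and a missing element) can be told apart as in the sketch below. The Readings document and the empty_is_null switch are made up for this example. The switch mimics middleware that lets the user choose what counts as null.

```python
import xml.etree.ElementTree as ET

READINGS_XML = """<Readings>
  <Reading><Station>A</Station><Temperature>0</Temperature></Reading>
  <Reading><Station>B</Station><Temperature></Temperature></Reading>
  <Reading><Station>C</Station></Reading>
</Readings>"""


def temperature(reading, empty_is_null):
    """Return the temperature text, or None when the value is considered null."""
    elem = reading.find("Temperature")
    if elem is None:
        return None  # optional element omitted: null
    text = elem.text or ""
    if text == "" and empty_is_null:
        return None  # empty element treated as null by choice
    return text


readings = ET.fromstring(READINGS_XML).findall("Reading")
strict = [temperature(r, empty_is_null=False) for r in readings]
lenient = [temperature(r, empty_is_null=True) for r in readings]
```

With the strict setting, station B yields a zero-length string and only station C yields a null. With the lenient setting, both B and C are null, while the 0 reported by station A is a real value in either case.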
5.3.4 Character Sets

By definition, an XML document can contain any Unicode character except some control characters. Unfortunately, many databases offer limited or no Unicode support, or require special configuration to handle non-ASCII characters. If your data contains non-ASCII characters, check that both your database and your middleware can handle them.
5.3.5 Processing Instructions

Processing instructions are not part of the "data" in an XML document, so many middleware products cannot currently handle them. The difficulty is that processing instructions can occur virtually anywhere in a document, which makes them hard to fit into a strict mapping from XML document structure to database structure. It is therefore difficult for middleware to decide where to store them and where to put them back when the data is retrieved. If processing instructions and round-tripping of documents are important to you, check that your middleware handles them.
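Python's standard XML parser behaves much like the data-only middleware described here, which makes the effect easy to observe. In the sketch below, the print-hint processing instruction and the Order document are made up. The insert_pis option of TreeBuilder requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

DOC = "<Order><?print-hint landscape?><Quantity>10</Quantity></Order>"

# Default parsing keeps only elements and text, so the instruction is dropped.
plain = ET.tostring(ET.fromstring(DOC), encoding="unicode")

# A tree builder can be asked to keep processing instructions in the tree.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
kept = ET.tostring(ET.fromstring(DOC, parser=parser), encoding="unicode")
```

The first result no longer contains the instruction, so the document cannot be round-tripped. The second keeps it, but only because the tree remembers its position among the child elements, which is the information a strict mapping to database columns has no place for.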

5.3.6 Storing Markup

As mentioned in section 5.2.2.2, it is sometimes useful to store elements with element or mixed content in the database without parsing them further. The simplest way to do this is to store the markup itself in the database. Unfortunately, when the data is retrieved from the database, it is impossible to tell whether markup characters in the stored value are real markup or were originally entity references standing for markup characters, such as &lt; and &gt;.

For example, the following Description element:

<Description>
   <b>Confusing example:</b> &lt;foo/&gt;
</Description>

is stored in the database as:

<b>Confusing example:</b> <foo/>

In this case, it is impossible to tell from the stored value whether <b> and <foo/> are markup or text. There are several possible solutions, such as flagging real markup in some way or storing entity references for non-markup characters. However, take care that the chosen solution is compatible with other applications that use the data. For example, an application that queries the database for text containing "<" would have to search for the entity reference "&lt;" instead.
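The ambiguity, and one of the possible fixes (keeping entity references for non-markup characters), can be sketched as follows. The two helper functions are illustrative only. The naive one handles a single level of child elements, which is enough for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

DOC = "<Description><b>Confusing example:</b> &lt;foo/&gt;</Description>"
desc = ET.fromstring(DOC)


def naive_content(elem):
    """Rebuild the content with entity references resolved (loses information)."""
    return (elem.text or "") + "".join(
        f"<{c.tag}>{c.text or ''}</{c.tag}>{c.tail or ''}" for c in elem
    )


def escaped_content(elem):
    """Serialize the content as XML, so that a literal '<' stays '&lt;'."""
    return escape(elem.text or "") + "".join(
        ET.tostring(c, encoding="unicode") for c in elem
    )


naive = naive_content(desc)
escaped = escaped_content(desc)

# Reading the stored values back as markup gives different structures.
naive_children = len(ET.fromstring(f"<Description>{naive}</Description>"))
escaped_children = len(ET.fromstring(f"<Description>{escaped}</Description>"))
```

The naive value parses back with two child elements (b and foo), while the escaped value parses back with one, as in the original. The price is that the stored text now contains "&lt;", which other applications querying the column must know about.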

5.4 Generating DTDs from Database Schemas and Vice Versa

A common question when transferring data between XML documents and a database is how to generate a DTD from a database schema and how to generate a database schema from a DTD. Both are fairly straightforward operations, although the results are often not what users expect.

(Note that this is usually a one-time, design-time operation; most applications, especially vertical applications, work with a known set of DTDs and relational schemas. A notable exception is tools that store arbitrary XML documents in relational databases or publish relational data as XML documents. In the latter case, the usefulness of a DTD is not obvious.)

The following (simplified) procedure generates a relational schema from a DTD:

For each element type that contains element or mixed content, create a table and a primary key column.

For each element type that contains mixed content, create a separate table in which to store the PCDATA, linked to the parent table through the parent table's primary key.

For each single-valued attribute of that element type, and for each PCDATA-only child element that occurs only once, create a column in that table. If the child element type or attribute is optional, make the column nullable.

For each multi-valued attribute, and for each PCDATA-only child element that occurs multiple times, create a separate table to store the values, linked to the parent table through the parent table's primary key.

For each child element that has element or mixed content, link the parent element's table to the child element's table through the parent table's primary key.

The following (simplified) procedure generates a DTD from a relational schema:

For each table, create an element type.

For each column in a table, create an attribute or a PCDATA-only child element.

For each column that contains the primary key value in a primary key/foreign key relationship, create a child element.
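The first two steps of the schema-to-DTD procedure can be sketched against SQLite, whose PRAGMA table_info statement reports the columns of a table. This is a simplified illustration under assumptions: every column becomes a PCDATA-only child element, nullable columns become optional children, and the step for primary key/foreign key relationships is left out. The Part table and the function name dtd_from_schema are made up.

```python
import sqlite3


def dtd_from_schema(conn):
    """Emit one element type per table and one PCDATA-only child per column."""
    lines = []
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
        children = [
            name + ("" if notnull or pk else "?")  # nullable column: optional child
            for _, name, _, notnull, _, pk in columns
        ]
        lines.append(f"<!ELEMENT {table} ({', '.join(children)})>")
        lines.extend(f"<!ELEMENT {name} (#PCDATA)>" for _, name, *_ in columns)
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Part ("
    "PartNumber INTEGER PRIMARY KEY, Price TEXT NOT NULL, Description TEXT)"
)
dtd = dtd_from_schema(conn)
```

Because a DTD has a single namespace for element types, two tables with a column of the same name would produce duplicate declarations in this sketch, and the column types (INTEGER, TEXT) are lost entirely.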

Unfortunately, these procedures have a number of drawbacks. For example, there is no way to determine data types or column lengths from a DTD. Guessing them, for example by reading a sample document, may cause errors when other documents contain data of a different type or values longer than the column length. (The long-term solution is to use XML Schema data types.) Similarly, when a DTD is generated from a relational schema, it is impossible to determine in advance whether the order of child elements is significant or whether columns such as internal row IDs should be transferred at all.

In both directions, naming conflicts may occur.

Despite these drawbacks, the procedures provide a useful starting point for converting between relational schemas and DTDs.
