XML and Database

Source: Internet
Author: User
Turkey wrench:
Stainless steel, one-piece construction,
lifetime guarantee.
9.95
10

Stuffing separator:
Aluminum, one-year guarantee.
13.27
5

In the XML world, many text-rich documents are actually data-centric. Consider, for example, a page on the Amazon.com website that displays information about a book. Although the page is mostly text, the structure of that text is highly regular: much of it is the same on every book description page, and each piece of page-specific text is limited in size. The page can therefore be built from a simple, data-centric XML document that contains the book information retrieved from the database, plus an XSL stylesheet. In general, any Web site that dynamically constructs HTML pages by filling a template with database data can be replaced by data-centric XML documents of this kind and one or more XSL stylesheets.
  
For example, consider the following lease document:

ABC Industries agrees to lease 123 Main St., Chicago, IL from XYZ Properties for a term of not less than 18 months at a cost of 1000 USD per month.

It can be generated from the following XML document and a simple stylesheet:
  
ABC Industries
123 Main St., Chicago, IL
XYZ Properties
18
1000
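
As a rough illustration of this idea, the sketch below fills a fixed sentence template with values from a data-centric lease document. The element names (Lease, Lessee, Address, Lessor, LeaseTerm, Price) are assumptions made for this example; only the values and the timeunit and currency attributes appear in the text above. A Python string template stands in for the XSL stylesheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; only the values and the two attribute
# names (timeunit, currency) come from the example in the text.
LEASE_XML = """<Lease>
  <Lessee>ABC Industries</Lessee>
  <Address>123 Main St., Chicago, IL</Address>
  <Lessor>XYZ Properties</Lessor>
  <LeaseTerm timeunit="Months">18</LeaseTerm>
  <Price currency="USD" timeunit="Months">1000</Price>
</Lease>"""

# Boilerplate text that a stylesheet would normally supply.
TEMPLATE = (
    "{lessee} agrees to lease {address} from {lessor} for a term of "
    "not less than {term} {term_unit} at a cost of {price} {currency} "
    "per {price_unit}."
)


def render_lease(xml_text: str) -> str:
    """Fill the boilerplate template with values from the XML document."""
    lease = ET.fromstring(xml_text)
    term = lease.find("LeaseTerm")
    price = lease.find("Price")
    return TEMPLATE.format(
        lessee=lease.findtext("Lessee"),
        address=lease.findtext("Address"),
        lessor=lease.findtext("Lessor"),
        term=term.text,
        term_unit=term.get("timeunit").lower(),
        price=price.text,
        currency=price.get("currency"),
        price_unit=price.get("timeunit").lower().rstrip("s"),
    )


lease_text = render_lease(LEASE_XML)
print(lease_text)
```

The data lives in the XML document and the fixed wording lives in the template, so the same template can render any lease that follows the same structure.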

4.2 Document-centric files

Document-centric documents are characterized by an irregular structure, coarse-grained data (that is, the smallest independent unit of data may be an element with mixed content or the entire document), and a large amount of mixed content. The order in which sibling elements and PCDATA occur is significant. Typical examples are books, e-mail messages, advertisements, and most XHTML documents. Document-centric documents are intended for human consumption.

For example, the following product description is document-centric:

Turkey Wrench

Full Fabrication Labs, Inc.

Like a monkey wrench, but not as big.

The turkey wrench, which comes in both right- and
left-handed versions (skyhook optional), is made of the finest
stainless steel. The Readi-grip rubberized handle quickly adapts
to your hands, even in the greasiest situations. Adjustment is
possible through a variety of custom dials.

You can:

Order your own turkey wrench
Read more about wrenches
Download the catalog

The turkey wrench costs just $19.99 and, if you
order now, comes with a hand-crafted shrimp hammer as a
bonus gift.

4.3 Data, documents, and databases

In practice, the distinction between data-centric and document-centric documents is not always sharp. For example, an otherwise data-centric document, such as an invoice, may contain coarse-grained, irregularly structured data, such as a part description. An otherwise document-centric document, such as a user's manual, may contain fine-grained, regularly structured data (usually metadata), such as the author's name and a revision date. Even so, characterizing your documents as data-centric or document-centric helps you decide whether you care about the data or the documents, which in turn determines what kind of system you need.
  
To store and retrieve data, you can use a database (usually relational, object-oriented, or hierarchical) together with middleware (built-in or third-party), or you can use an XML server (a platform for building distributed applications, such as e-commerce applications, that use XML for data transfer). To store and retrieve documents, you need a content management system or a persistent DOM implementation. These systems are discussed in section 5.0, "Storing and retrieving data", and section 6.0, "Storing and retrieving documents". A detailed list of related products is available in XML Database Products (http://www.rpbourret.com/xml/XMLDatabaseProds.htm).

5.0 Storing and retrieving data

The data in a data-centric document can originate in a database (in which case you want to expose it as XML) or in an XML document (in which case you want to store it in a database). An example of the former is the large amount of existing (legacy) data stored in relational databases. An example of the latter is data that is published as XML on the Web and that you want to store in your database for further processing. Depending on your needs, you may therefore need software that transfers data from an XML document to a database, from a database to an XML document, or both.

5.1 Transferring data

When you store data in a database, it is often acceptable to discard much of the information about the document, such as its name and DTD, as well as its physical structure: the definition and use of entities, the order of attribute values and sibling elements, the way binary data is stored (Base64 encoding, unparsed entities, or some other way), CDATA sections, and encoding information. Similarly, when data is retrieved from a database, the resulting XML document contains no CDATA sections and no entity references other than those to the predefined entities lt ("<"), gt (">"), amp ("&"), apos ("'"), and quot ('"'). The order in which sibling elements and attributes appear is often simply the order in which the database returned the data.

This is often reasonable, although it may be surprising at first. For example, suppose you want to use XML as the data format for transferring sales orders from one database to another. In this case, it does not matter whether the sales order number appears before or after the order date in the XML document, nor whether the customer name is stored in a CDATA section, in an external entity, or directly as PCDATA. What matters is that the relevant data is transferred from the first database to the second. The data transfer software therefore needs to consider the hierarchical structure of the data (which groups the data belonging to a sales order), but not much else.

One consequence of ignoring information about a document and its physical structure is that "round-tripping" a document (storing its data in a database and then rebuilding a document from that data) often produces a different document, even when both are compared in canonical form. Whether this is acceptable depends on your needs, and it may affect your choice of database and data transfer middleware.
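
The following minimal sketch shows one way a round trip can change a document. It uses Python's standard xml.etree.ElementTree in place of real middleware, and the element name and customer name are invented for the example. The character data survives, but the CDATA section does not.

```python
import xml.etree.ElementTree as ET

original = "<CustName><![CDATA[Smith & Sons]]></CustName>"

# Parsing keeps the character data but forgets it came from a CDATA section.
element = ET.fromstring(original)
customer_name = element.text  # the value a database column would receive

# Rebuilding the document from the stored value yields different markup.
rebuilt = ET.Element("CustName")
rebuilt.text = customer_name
round_tripped = ET.tostring(rebuilt, encoding="unicode")

print(customer_name)
print(round_tripped)
print(round_tripped == original)
```

The two documents carry the same data, which is usually what matters for data transfer, but they are not identical as text.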

5.2 Mapping from document structure to database structure

To transfer data between XML documents and a database, you need to map the document structure to the database structure. Such mappings fall into two broad categories: template-driven and model-driven.

5.2.1 Template-Driven mappings

In a template-driven mapping, there is no predefined mapping between the document structure and the database structure. Instead, you embed command statements in a template, and the data transfer middleware processes the template. For example, consider the following template (which does not correspond to any actual product), in which a SELECT statement is embedded in the <SelectStmt> element:
  
<?xml version="1.0"?>
<FlightInfo>
  <intro>The following flights have available seats:</intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <conclude>We hope one of these meets your needs</conclude>
</FlightInfo>

When the data transfer middleware processes the document, it replaces each SELECT statement with its results, formatted as XML, which produces the following document:
  
<?xml version="1.0"?>
<FlightInfo>
  <intro>The following flights have available seats:</intro>
  <Flights>
    <Row>
      <Airline>ACME</Airline>
      <FltNumber>123</FltNumber>
      <Depart>Dec, 1998 13:43</Depart>
      <Arrive>Dec, 1998 01:21</Arrive>
    </Row>
    ...
  </Flights>
  <conclude>We hope one of these meets your needs</conclude>
</FlightInfo>
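
A minimal sketch of how such a template could be processed is shown below, using Python's standard sqlite3 and xml.etree.ElementTree modules. It is not how any particular product works. The function name expand_template, the result element name passed as result_tag, the in-memory Flights table, and the sample departure and arrival values are all assumptions made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEMPLATE = """<FlightInfo>
<intro>The following flights have available seats:</intro>
<SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
<conclude>We hope one of these meets your needs</conclude>
</FlightInfo>"""


def expand_template(template, conn, result_tag="Flights"):
    """Replace every <SelectStmt> element with the rows its query returns."""
    root = ET.fromstring(template)
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != "SelectStmt":
                continue
            cursor = conn.execute(child.text)
            columns = [column[0] for column in cursor.description]
            results = ET.Element(result_tag)
            results.tail = child.tail  # keep the surrounding whitespace
            for row in cursor:
                row_elem = ET.SubElement(results, "Row")
                for name, value in zip(columns, row):
                    ET.SubElement(row_elem, name).text = str(value)
            parent[index] = results  # swap the statement for its results
    return root


# Hypothetical sample data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER, Depart TEXT, Arrive TEXT)"
)
conn.execute(
    "INSERT INTO Flights VALUES ('ACME', 123, '1998-12-12 13:43', '1998-12-13 01:21')"
)

expanded = expand_template(TEMPLATE, conn)
print(ET.tostring(expanded, encoding="unicode"))
```

Because the template executes whatever SQL it contains, a real system would need to control who can author templates.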
  
Template-driven mappings can be quite flexible. For example, some products let you place result set values wherever you want in the output (including using them as parameters in a later SELECT statement) rather than simply formatting the results as in the example above. Others support programming constructs such as loops and conditionals. Some also support parameterized SELECT statements, for example with parameters passed through HTTP.

Currently, template-driven mappings support only the transfer of data from a relational database to an XML document.

5.2.2 Model-driven mappings

In a model-driven mapping, a data model is imposed on the structure of the XML document, and this model is mapped, explicitly or implicitly, to the structures in the database, and vice versa. This approach is less flexible but easier to use: because the mapping is based on a specific data model, the middleware can do most of the conversion work for the user. Because converting data from a database into XML always produces documents that follow a single model, model-driven products are commonly combined with XSL to provide the flexibility found in template-driven systems.

There are two common models for viewing the data in an XML document: a table model and a data-specific object model. Other models are also possible. For example, by using ID and IDREF attributes, an XML document can describe a graph. However, much of the existing middleware does not support such models.

5.2.2.1 Table Model

Many middleware packages use a table model to transfer data between XML and relational databases. It models the XML document as a single table or a set of tables. In other words, the structure of the XML document resembles the following example, in which the <database> element is omitted in the single-table case:
  
<database>
  <table>
    <row>
      <column1>...</column1>
      <column2>...</column2>
      ...
    </row>
    ...
  </table>
  ...
</database>

The term "table" is interpreted loosely: it means a single result set when data is transferred from the database to XML, and a single table or updatable view when data is transferred from XML to the database. The transfer is essentially impossible if the data must come from multiple result sets (when transferring from the database) or if the XML document contains elements nested more deeply than a set of tables requires (when transferring to the database).
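
The single-table case can be sketched in a few lines of Python. The helper names table_to_xml and xml_to_table, the Parts table, and its columns are assumptions made for this example, and SQLite stands in for whatever database the middleware targets.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table):
    """Export one table as <table><row><column>...</column></row></table>."""
    if not table.isidentifier():
        raise ValueError("unexpected table name")
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [column[0] for column in cursor.description]
    table_elem = ET.Element(table)
    for row in cursor:
        row_elem = ET.SubElement(table_elem, "row")
        for name, value in zip(columns, row):
            if value is not None:  # NULL columns are simply omitted
                ET.SubElement(row_elem, name).text = str(value)
    return table_elem


def xml_to_table(table_elem, conn):
    """Insert each <row> into the table named by the root element."""
    table = table_elem.tag
    for row_elem in table_elem.findall("row"):
        names = [column.tag for column in row_elem]
        if not names:
            continue
        if not table.isidentifier() or not all(n.isidentifier() for n in names):
            raise ValueError("unexpected identifier")
        placeholders = ", ".join("?" for _ in names)
        conn.execute(
            f"INSERT INTO {table} ({', '.join(names)}) VALUES ({placeholders})",
            [column.text for column in row_elem],
        )


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE Parts (PartNumber INTEGER, Price REAL)")
source.execute("INSERT INTO Parts VALUES (123, 9.95)")
document = table_to_xml(source, "Parts")
document_text = ET.tostring(document, encoding="unicode")

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE Parts (PartNumber INTEGER, Price REAL)")
xml_to_table(document, target)
copied = target.execute("SELECT PartNumber, Price FROM Parts").fetchall()
print(document_text)
print(copied)
```

Every value travels as text, so the sketch relies on the target columns' declared types to turn the text back into numbers.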

5.2.2.2 Data-specific object model

The second common data model for an XML document is a tree of data-specific objects. In this model, element types generally correspond to objects, while content models, attributes, and PCDATA correspond to object properties. This model maps directly to object-oriented and hierarchical databases, and it can also be mapped to relational databases with traditional object-relational mapping techniques or SQL 3 object views. Note that this model is not the Document Object Model (DOM). The DOM models the document itself, not the data in the document. As noted in section 6.1.2, the DOM is used to build content management systems on top of relational databases.
  
For example, the sales order document above can be viewed as a tree of objects from five classes, Orders, SalesOrder, Customer, Line, and Part, as shown in the following diagram:

         Orders
           |
       SalesOrder
      /    |    \
Customer  Line  Line
           |     |
          Part  Part
When an XML document is modeled as a tree of data-specific objects, elements do not have to correspond to objects. For example, if an element contains only PCDATA, such as the CustName element in the sales order document, it can be treated as a property, because, like a property, it holds a single scalar value. Similarly, it is sometimes useful to treat an element with mixed or element content as a property. An example is the Description element in the sales order document: although it has mixed content in the form of XHTML, it is more useful to treat the description as a single property, because its component parts are meaningless on their own.
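
The sketch below illustrates this mapping with Python dataclasses. The class names follow the diagram, but the XML document, its attribute names (SONumber, PartNumber), the Quantity and Price elements, and the helper names are assumptions made for this example. A PCDATA-only element becomes a scalar property, and the mixed-content Description is kept whole as a single string property.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical sales order; element and attribute names are illustrative.
ORDER_XML = """<SalesOrder SONumber="12345">
  <Customer><CustName>ABC Industries</CustName></Customer>
  <Line>
    <Part PartNumber="123">
      <Description><b>Turkey wrench:</b> Stainless steel</Description>
      <Price>9.95</Price>
    </Part>
    <Quantity>10</Quantity>
  </Line>
</SalesOrder>"""


@dataclass
class Part:
    number: str
    description: str  # mixed content kept whole as one property
    price: float


@dataclass
class Line:
    part: Part
    quantity: int


@dataclass
class Customer:
    name: str  # the PCDATA-only CustName element becomes a scalar property


@dataclass
class SalesOrder:
    number: str
    customer: Customer
    lines: list = field(default_factory=list)


def inner_xml(elem):
    """Serialize an element's mixed content (text plus child markup)."""
    children = "".join(ET.tostring(child, encoding="unicode") for child in elem)
    return (elem.text or "") + children


def load_order(xml_text):
    """Build the object tree from a sales order document."""
    root = ET.fromstring(xml_text)
    customer = Customer(name=root.findtext("Customer/CustName"))
    order = SalesOrder(number=root.get("SONumber"), customer=customer)
    for line_elem in root.findall("Line"):
        part_elem = line_elem.find("Part")
        part = Part(
            number=part_elem.get("PartNumber"),
            description=inner_xml(part_elem.find("Description")),
            price=float(part_elem.findtext("Price")),
        )
        quantity = int(line_elem.findtext("Quantity"))
        order.lines.append(Line(part=part, quantity=quantity))
    return order


order = load_order(ORDER_XML)
print(order)
```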

5.3 Data types, null values, character sets, and others

This section covers several issues that arise when data from XML documents is stored in a database. You usually cannot control how your chosen middleware handles these issues, but you should be aware of them because they can help you choose the right middleware.

5.3.1 Data type

XML does not support data types in any meaningful sense. Except for unparsed entities, all data in an XML document is treated as text, even if it represents another data type, such as a date or an integer. Typically, the data transfer middleware converts the text in an XML document to the corresponding data types in the database, and vice versa. However, the text formats recognized for a particular data type are limited, for example to those supported by a given JDBC driver. Of the many data types, dates usually cause the most trouble. Number formats that differ between locales can also cause problems.
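
To make the date problem concrete, the following sketch parses date text against an explicit list of accepted formats. The helper name parse_date and the sample values are assumptions made for this example. The same text yields two different dates depending on which format the converter expects.

```python
from datetime import datetime


def parse_date(text, formats):
    """Return the first successful parse of text against the accepted formats."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date text: {text!r}")


iso = parse_date("1998-12-15", ["%Y-%m-%d"])
as_us = parse_date("03/04/1998", ["%m/%d/%Y"])        # month first
as_european = parse_date("03/04/1998", ["%d/%m/%Y"])  # day first

print(iso, as_us, as_european)
```

Agreeing on one unambiguous text format for each data type on both sides of the transfer avoids this kind of silent misreading.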

5.3.2 Binary Data

There are two common ways to store binary data in an XML document: unparsed entities and Base64 encoding (a MIME encoding that maps binary data to a subset of US-ASCII). Both methods can be problematic with relational databases, because the rules for storing and retrieving binary data in a database can be quite strict, which can cause problems for the middleware.

In addition, there is no standard notation for indicating that an element in an XML document contains Base64-encoded data, so the middleware may not recognize the encoding at all. Finally, when data is stored in a database, the notation associated with an unparsed entity or a Base64-encoded element may be discarded. If binary data is important to you, make sure that your middleware supports it.
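
The sketch below shows the Base64 approach using Python's standard base64 module. Because there is no standard way to flag the encoding, the example marks the element with an encoding attribute. That attribute, the Photo element name, and the sample bytes are conventions invented for this example, not part of any standard.

```python
import base64
import xml.etree.ElementTree as ET

payload = bytes([0, 255, 16, 128])  # arbitrary binary data

# Hypothetical convention: flag the encoding with an attribute.
elem = ET.Element("Photo", {"encoding": "base64"})
elem.text = base64.b64encode(payload).decode("ascii")
document = ET.tostring(elem, encoding="unicode")

# The reader decodes only when it recognizes the flag.
parsed = ET.fromstring(document)
if parsed.get("encoding") == "base64":
    restored = base64.b64decode(parsed.text)
else:
    restored = parsed.text.encode("utf-8")

print(document)
print(restored == payload)
```

If the flag is lost when the data is stored, the reader has no reliable way to tell encoded bytes from ordinary text.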

5.3.3 Null value

In the database world, null data means that the data has no value. This is very different from a number with a value of 0 or a string of length 0. For example, suppose your data comes from a weather station. If the thermometer at the station fails to produce a reading, the database stores a null rather than 0. A value of 0 would mean something else entirely.

XML supports the concept of null data through optional element types and attributes. If the value of an optional element type or attribute is null, the element or attribute is simply left out of the document. As in a database, an empty element or an attribute that contains a zero-length string is not null: its value is a zero-length string.

When you map between XML documents and database structures, you must check carefully that optional element types and attributes correspond to nullable columns in the database, and vice versa. If they do not, you are likely to get an insert error (when transferring data to the database) or an invalid document (when transferring data from the database).

The notion of null is looser in XML than in the database world. In particular, many XML users are likely to regard empty elements or attributes that contain zero-length strings as null. You should therefore check how your chosen middleware handles this situation. Some middleware lets the user choose what counts as null in an XML document.
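
A minimal sketch of these conventions, using the weather station example, is shown below. The element names and the helper names are assumptions made for this example. A null temperature is written by omitting the element, and the empty_is_null flag illustrates the kind of user choice that some middleware offers for empty elements.

```python
import xml.etree.ElementTree as ET


def reading_to_xml(station, temperature):
    """Write a reading; a null temperature is expressed by omitting the element."""
    elem = ET.Element("Reading")
    ET.SubElement(elem, "Station").text = station
    if temperature is not None:
        ET.SubElement(elem, "Temperature").text = str(temperature)
    return elem


def temperature_from_xml(elem, empty_is_null=False):
    """Read the temperature text; an absent element maps to None (NULL)."""
    child = elem.find("Temperature")
    if child is None:
        return None
    text = child.text or ""
    if text == "" and empty_is_null:
        return None
    return text


missing = reading_to_xml("North", None)
zero = reading_to_xml("North", 0)
empty = ET.fromstring("<Reading><Temperature/></Reading>")

print(temperature_from_xml(missing))                  # absent element
print(temperature_from_xml(zero))                     # a real reading of zero
print(repr(temperature_from_xml(empty)))              # zero-length string
print(temperature_from_xml(empty, empty_is_null=True))
```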

5.3.4 Character Set

By definition, an XML document can contain any Unicode character except some control characters. Unfortunately, many databases offer limited or no support for Unicode, or require special configuration to handle non-ASCII characters. If your data contains non-ASCII characters, be sure to verify that your database and middleware can handle them.
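
One simple way to perform such a check is to write a sample string to the database and compare what comes back. The sketch below does this against an in-memory SQLite database, which is only a stand-in for the database you actually use. The table name and the sample text are assumptions made for this example. The last lines show what a lossy ASCII-only conversion would do to the same text.

```python
import sqlite3

sample = "Zürich, 東京, Ελλάδα"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cities (Name TEXT)")
conn.execute("INSERT INTO Cities VALUES (?)", (sample,))
stored = conn.execute("SELECT Name FROM Cities").fetchone()[0]
survived = stored == sample

# A layer limited to ASCII would silently replace the other characters.
lossy = sample.encode("ascii", errors="replace").decode("ascii")

print(survived)
print(lossy)
```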

5.3.5 Processing Instructions

Processing instructions are not part of the data in an XML document, so many middleware products may not handle them. The problem is that processing instructions can appear virtually anywhere in a document, which makes them hard to handle, especially when the XML document structure is mapped strictly to a database structure. It is therefore difficult for the middleware to determine where to store them and when to retrieve them. If processing instructions and round-tripping of documents are important to you, be sure to check how your middleware handles them.
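
Python's standard xml.etree.ElementTree parser illustrates the same behavior on a small scale: by default it builds the tree without processing instructions, and (in Python 3.8 and later) it keeps them only when asked to. The document and the print-layout instruction below are invented for this example.

```python
import xml.etree.ElementTree as ET

document = "<Order><?print-layout landscape?><Item>wrench</Item></Order>"

# The default parser builds the tree without processing instructions.
default_tree = ET.fromstring(document)
without_pi = ET.tostring(default_tree, encoding="unicode")

# Python 3.8+ can be asked to keep them in the tree.
keeping_parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
kept_tree = ET.fromstring(document, parser=keeping_parser)
with_pi = ET.tostring(kept_tree, encoding="unicode")

print(without_pi)
print(with_pi)
```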

5.3.6 Storing markup

As mentioned in the discussion of the data-specific object model, it is sometimes useful to store elements with element or mixed content in the database without further parsing. The most common way to do this is simply to store the markup itself in the database. Unfortunately, this causes a problem when the data is retrieved: it is impossible to tell whether markup characters in the database are real markup or characters that were originally written as entity references, such as lt and gt.

For example, the following description element:

<description>
<b>Confusing example:</b> &lt;foo/&gt;
</description>

is stored in the database as:

<b>Confusing example:</b> <foo/>

At this point, an application cannot determine whether <b> and <foo/> are markup or text. There are several possible solutions, such as flagging real markup in some way or storing entity references for markup characters that are not markup. Whatever you choose, make sure that it is compatible with other applications that use the data. For example, an application that queries the database for a less-than sign needs to know whether to search for the character ("<") or for the lt entity reference ("&lt;").
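
The difference between the two storage strategies can be sketched as follows, using Python's standard xml.etree.ElementTree. The variable and helper names are assumptions made for this example, and the input writes the foo tag as entity references so that it is text rather than markup. Storing decoded text next to real markup loses the distinction, while storing the serialized content keeps text characters escaped.

```python
import xml.etree.ElementTree as ET

document = "<description><b>Confusing example:</b> &lt;foo/&gt;</description>"
description = ET.fromstring(document)
bold = description.find("b")

# Naive storage: write real markup and decoded text side by side.
naive = f"<b>{bold.text}</b>{bold.tail}"

# Safer storage: serialize the content so that text characters stay escaped.
escaped = (description.text or "") + "".join(
    ET.tostring(child, encoding="unicode") for child in description
)


def child_tags(stored):
    """Re-parse stored content and list the child elements it contains."""
    wrapper = ET.fromstring(f"<description>{stored}</description>")
    return [child.tag for child in wrapper]


print(naive, child_tags(naive))
print(escaped, child_tags(escaped))
```

With the naive form, the text that looked like a tag comes back as a real foo element. With the escaped form, only the b element is markup, but any application that searches the stored value must know that it contains entity references.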

5.4 Generating DTDs from database schemas and vice versa

A common question when transferring data between XML documents and a database is how to generate an XML DTD from a database schema and how to generate a database schema from an XML DTD. In short, both are straightforward operations, but the results often differ somewhat from what users expect.

(Note also that this is usually a one-time operation: most applications, especially vertical applications, work with a known set of DTDs and relational schemas. The obvious exceptions are tools that store arbitrary XML documents in relational databases or publish relational data as XML documents, and in the latter case the role of the DTD is not very clear.)

For example, the following (simplified) procedure generates a relational schema from a DTD:

1. For each element type that has element or mixed content, create a table and a primary key column.

2. For each element type that has mixed content, create a separate table that holds the PCDATA and is linked to the parent table through the parent table's primary key.

3. For each single-valued attribute of the element type, and for each child element that occurs at most once and has PCDATA-only content, create a column in the table. If the child element type or attribute is optional, make the column nullable.

4. For each multi-valued attribute, and for each child element that can occur multiple times and has PCDATA-only content, create a separate table to store the values, linked to the parent table through the parent table's primary key.

5. For each child element that has element or mixed content, link the parent element's table to the child element's table through the parent table's primary key.
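
The procedure above can be sketched in Python as follows. Python's standard library has no DTD parser, so the sketch starts from a hand-written summary of each element type. The ElementType class, its fields, the Phone and OrderDate names, and the generated key column names are all assumptions made for this example. The separate PCDATA table for mixed content is left out for brevity, and every column is typed TEXT because a DTD carries no data type information.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class ElementType:
    """Hypothetical summary of one DTD element type with element content."""
    name: str
    attributes: dict = field(default_factory=dict)        # name -> optional?
    pcdata_children: dict = field(default_factory=dict)   # name -> (optional?, repeats?)
    element_children: list = field(default_factory=list)  # children with element content


def generate_schema(types):
    """Return CREATE TABLE statements for the given element types."""
    parent_of = {child: t.name for t in types for child in t.element_children}
    tables, value_tables = [], []
    for t in types:
        key = f"{t.name}_id"
        columns = [f"{key} INTEGER PRIMARY KEY"]      # table plus primary key
        if t.name in parent_of:                       # link child table to parent
            parent = parent_of[t.name]
            columns.append(f"{parent}_id INTEGER REFERENCES {parent}({parent}_id)")
        for attr, optional in t.attributes.items():   # single-valued attributes
            columns.append(f"{attr} TEXT" + ("" if optional else " NOT NULL"))
        for child, (optional, repeats) in t.pcdata_children.items():
            if repeats:                               # repeated values get their own table
                value_tables.append(
                    f"CREATE TABLE {t.name}_{child} "
                    f"({key} INTEGER REFERENCES {t.name}({key}), {child} TEXT)"
                )
            else:                                     # optional children are nullable
                columns.append(f"{child} TEXT" + ("" if optional else " NOT NULL"))
        tables.append(f"CREATE TABLE {t.name} ({', '.join(columns)})")
    return tables + value_tables


TYPES = [
    ElementType("SalesOrder", {"SONumber": False}, {"OrderDate": (False, False)},
                ["Customer", "Line"]),
    ElementType("Customer", {}, {"CustName": (False, False), "Phone": (True, True)}),
    ElementType("Line", {}, {"Quantity": (False, False)}, ["Part"]),
    ElementType("Part", {"PartNumber": False}, {"Price": (True, False)}),
]

conn = sqlite3.connect(":memory:")
for statement in generate_schema(TYPES):
    conn.execute(statement)
    print(statement)
```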

The following (simplified) procedure generates a DTD from a relational schema:

1. For each table, create an element type.

2. For each column in the table, create an attribute or a child element that contains only PCDATA.

3. For each primary key/foreign key relationship in which a column of the table supplies the primary key, create a child element.
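
The reverse direction can be sketched against SQLite's schema catalog. The function name generate_dtd and the sample tables are assumptions made for this example. The sketch turns every non-key-reference column into a PCDATA-only child element rather than an attribute, marks nullable columns as optional, and nests the referencing table under the table that supplies the primary key. The order of child elements simply follows the column order.

```python
import sqlite3


def generate_dtd(conn):
    """Return DTD element declarations derived from the tables in an SQLite database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    child_tables = {table: [] for table in tables}
    fk_columns = {table: set() for table in tables}
    for table in tables:
        # foreign_key_list rows: (id, seq, referenced table, from column, ...)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            child_tables.setdefault(fk[2], []).append(table)
            fk_columns[table].add(fk[3])

    declarations, declared = [], set()
    for table in tables:
        particles, column_decls = [], []
        # table_info rows: (cid, name, type, notnull, default, pk)
        for _cid, name, _type, notnull, _default, pk in conn.execute(
                f'PRAGMA table_info("{table}")'):
            if name in fk_columns[table]:
                continue  # the relationship is expressed by nesting instead
            particles.append(name if (notnull or pk) else name + "?")
            if name not in declared:  # avoid duplicate declarations
                declared.add(name)
                column_decls.append(f"<!ELEMENT {name} (#PCDATA)>")
        particles += [child + "*" for child in child_tables[table]]
        content = f"({', '.join(particles)})" if particles else "EMPTY"
        declarations.append(f"<!ELEMENT {table} {content}>")
        declarations += column_decls
    return "\n".join(declarations)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustNumber INTEGER PRIMARY KEY, CustName TEXT NOT NULL);
    CREATE TABLE SalesOrder (
        SONumber INTEGER PRIMARY KEY,
        OrderDate TEXT,
        CustNumber INTEGER REFERENCES Customer(CustNumber)
    );
""")
dtd = generate_dtd(conn)
print(dtd)
```

The sketch sidesteps one kind of name collision by declaring each column element only once, which is only safe when same-named columns in different tables mean the same thing.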
   
Unfortunately, these procedures have several drawbacks. For example, a DTD provides no way to specify data types or column lengths, and any guess (for example, one based on reading a sample document) can cause errors when another document contains data of a different type or values longer than the chosen column length. (The long-term solution is to use the data types of XML Schema.) Similarly, when a DTD is generated from a relational schema, there is no way to predict the order in which child elements should appear or whether columns such as internal row IDs should be converted at all. Name collisions can occur in both directions.

Despite these drawbacks, the procedures are a good starting point for converting between a relational schema and a DTD.
