1.0 Introduction
This paper gives a brief overview of the relationship between XML and databases and lists some of the software available for processing XML documents with databases. Although it does not describe that software in detail, I hope it covers the main issues involved in using databases to process the data in XML documents. Because of my own experience, it is somewhat biased toward relational databases.
2.0 Is XML a database?
Before discussing XML and databases, we need to answer a question that many people ask: "Is XML a database?" Strictly speaking, if "XML" means an XML document, the answer is "no". Although an XML document contains data, without additional software to process that data it is no more a database than any other text file.
In a broader sense, if "XML" means XML documents together with all the surrounding XML tools and technologies, the answer is "yes". The reason is that XML provides many of the things found in databases: storage (XML documents), schemas (DTDs, XML schema languages), query languages (XQL, XML-QL, Quilt, and so on), programming interfaces (SAX, DOM), and so on. However, XML still lacks many of the features found in real databases, including efficient storage, indexes, security, transactions, data integrity, multi-user access, triggers, and queries across multiple documents.
XML can therefore serve as a database in environments with modest amounts of data, few users, and low performance requirements. It is not suitable for most production environments, which have many users, strict data integrity requirements, and high performance requirements. Moreover, given that databases such as dBASE and Access are both cheap and easy to use, there is rarely a reason to use XML as a database even in the first case.
3.0 Why use a database?
When you start thinking about XML and databases, the first question to ask yourself is why you want to use a database at all. Do you have existing data that you want to expose? Are you looking for a place to store your Web pages? Is the database used by an e-commerce application in which XML is the data transport format? The answers to these questions will strongly influence your choice of database and middleware (if any).
For example, suppose you have an e-commerce application that uses XML for data transport. Your data then very likely has a highly regular structure, and things like entities and the encodings used in XML documents are probably not important to you. After all, you care about the data, not about how it is physically stored in a document. If your application is relatively simple, a relational database and data transfer middleware may well meet your needs. If it is large and complex, you may need a development environment with full XML support.
On the other hand, suppose you have a website built from a number of loosely structured XML documents. You need not only to manage the site but also to give users a way to search its content. Your documents are then likely to be irregular in structure, and entity usage is probably important to you because it is fundamental to how the documents are organized. In this case you probably need some kind of "native XML" database that can handle versioning, track entity usage, and support a query language such as XQL.
4.0 Data versus documents
Perhaps the most important factor in choosing a database is whether you are using it to store data or documents. If you want to store data, you need a database designed primarily for data storage (such as a relational or object-oriented database) together with middleware that converts between the database and XML documents. If, on the other hand, you want to store documents, you need a content management system, which is designed specifically for storing documents.
Although you can store documents in a relational or object-oriented database, you will often find yourself reimplementing the features of a content management system. Similarly, although a content management system is usually built on top of an object-oriented or relational database, it can be difficult to use one as a database.
Whether you need to store data or documents can usually be determined from your XML documents, because XML documents fall into two broad categories: data-centric and document-centric.
4.1 Data-centric documents
Data-centric documents are characterized by a fairly regular structure, fine-grained data (that is, the smallest independent unit of data is a PCDATA-only element or an attribute), and little or no mixed content. The order in which sibling elements and PCDATA occur is generally not significant. Typical examples are XML documents containing sales orders, flight schedules, restaurant menus, and so on. Data-centric documents are usually designed for machine consumption, and the use of XML may be incidental: it is simply a means of transporting data.
For example, the following sales order document is data-centric:
ABC Industries
123 Main St.
Chicago
IL
60609
981215
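The claim that sibling order is not significant in a data-centric document can be illustrated with a small Python sketch. The element names (Customer, CustName, Street, City, State, PostCode) are assumptions made for illustration; only the values come from the example above.

```python
import xml.etree.ElementTree as ET

# Element names are illustrative assumptions; only the values come from the text.
ORDER_A = """<Customer>
  <CustName>ABC Industries</CustName>
  <Street>123 Main St.</Street>
  <City>Chicago</City>
  <State>IL</State>
  <PostCode>60609</PostCode>
</Customer>"""

# The same data with the sibling elements in a different order.
ORDER_B = """<Customer>
  <PostCode>60609</PostCode>
  <State>IL</State>
  <City>Chicago</City>
  <Street>123 Main St.</Street>
  <CustName>ABC Industries</CustName>
</Customer>"""


def to_record(xml_text):
    """Flatten PCDATA-only child elements into a name -> value record."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "") for child in root}


# Both documents carry the same record, so they are equivalent as data.
same_data = to_record(ORDER_A) == to_record(ORDER_B)
```

A document-centric document would not survive this treatment, because reordering its siblings changes what a human reader sees.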
In the XML world, many prose-rich documents are in fact data-centric. Take the pages on Amazon.com that display information about a book. Although each page is largely text, the structure of that text is highly regular: much of it is the same for every book description page, and each page-specific part is limited in size. The page could therefore be built from a simple, data-centric XML document containing the text retrieved from the database, plus an XSL stylesheet. In general, any website that currently builds HTML pages dynamically by filling a template with database data can probably be replaced by data-centric XML documents of this kind and one or more XSL stylesheets.
For example, consider the following lease document:
ABC Industries agrees to lease the property at 123 Main St., Chicago, IL from XYZ Properties for a term of not less than 18 months at a cost of USD 1000 per month.
It can be built from the following XML document and a simple stylesheet:
ABC Industries
4.2 Document-centric documents
Document-centric documents are characterized by an irregular structure, coarser-grained data (that is, the smallest independent unit of data may be an element with mixed content or even the entire document), and a large amount of mixed content. The order in which sibling elements and PCDATA occur is almost always significant. Typical examples are books, email, advertisements, and most XHTML documents. Document-centric documents are usually designed for human consumption.
For example, the following product description document is document-centric:
Turkey Wrench
Full Fabrication Labs, Inc.
Like a monkey wrench, but not as big.
The turkey wrench, which comes in both right- and left-handed versions (skyhook optional), is made of the finest stainless steel. The READI-grip rubberized handle quickly adapts to your hands, even in the greasiest situations. Adjustment is possible through a variety of M dials.
You can:
Order your own turkey wrench
Read more about wrenches
Download the catalog
The turkey wrench costs just $19.99 and, if you order now, comes with a hand-crafted shrimp hammer as a bonus gift.
4.3 Data, documents, and databases
In practice, the distinction between data-centric and document-centric documents is not always clear-cut. For example, an otherwise data-centric document, such as an invoice, may contain coarse-grained, irregularly structured data, such as a description in the invoice. An otherwise document-centric document, such as a user manual, may contain fine-grained, regularly structured data (often metadata), such as the author and revision date.
Even so, characterizing your documents as data-centric or document-centric helps you decide whether you are mainly interested in data or in documents, which in turn determines the kind of system you need.
To store and retrieve data, you can use a database (usually relational, object-oriented, or hierarchical) together with middleware (built in or third party), or you can use an XML server, which is a platform for building distributed applications, such as e-commerce applications, that use XML for data transport. To store documents, you need a content management system or a persistent DOM implementation. For more information about these systems, see section 5.0, "Storing and retrieving data", and section 6.0, "Storing and retrieving documents". A detailed list of related products is available in XML Database Products (http://www.rpbourret.com/xml/XMLDatabaseProds.htm).
5.0 Storing and retrieving data
The data in a data-centric document may originate in a database (in which case you want to expose it as XML) or in an XML document (in which case you want to store it in a database). An example of the former is the large amount of existing (legacy) data stored in relational databases. An example of the latter is data published as XML on the Web that you want to store in your own database for further processing. Depending on your needs, you may therefore need software that transfers data from XML documents to the database, from the database to XML documents, or both.
5.1 Transferring data
When storing data in a database, it is often acceptable to discard much of the information about the document, such as its name and DTD, as well as its physical structure: entity definitions and usage, the order in which attribute values and sibling elements appear, the way binary data is stored (Base64 encoding, unparsed entities, or some other method), CDATA sections, and encoding information. Similarly, when retrieving data from a database, the resulting XML document is likely to contain no CDATA sections and no entity references other than the predefined entities lt (<), gt (>), amp (&), apos ('), and quot ("). The order in which sibling elements and attributes appear is usually simply the order in which the data is returned from the database.
Although this may surprise you at first, it is usually reasonable. For example, suppose you use XML as the data format for transferring a sales order from one database to another. In this case it does not matter whether the customer name is stored in a CDATA section, as an external entity, or directly as PCDATA. What matters is that the relevant data is transferred from the first database to the second. The data transfer software therefore needs to consider only the hierarchy of the data (the structure that relates the parts of the sales order).
One consequence of ignoring document information and physical structure is that "round-tripping" a document, that is, storing the data from a document in the database and then reconstructing a document from that data, often yields a document that differs from the original, even in canonical form. Whether this is acceptable depends on your needs and may influence your choice of database and data transfer middleware.
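The round-trip effect can be reproduced with a few lines of Python. This is only a sketch: a list of name/value pairs stands in for the database, and the element names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the customer name sits in a CDATA section and the
# street uses a character reference. Element names are assumptions.
original = (
    "<Customer>"
    "<CustName><![CDATA[ABC Industries]]></CustName>"
    "<Street>123 Main St&#46;</Street>"
    "</Customer>"
)

root = ET.fromstring(original)

# Stand-in for "store in the database": keep only names and values.
stored = [(child.tag, child.text) for child in root]

# Rebuild a document from the stored data alone.
rebuilt_root = ET.Element("Customer")
for tag, value in stored:
    ET.SubElement(rebuilt_root, tag).text = value
rebuilt = ET.tostring(rebuilt_root, encoding="unicode")

# The text of the two documents differs, but the data they carry is equal.
text_differs = rebuilt != original
data_equal = [(c.tag, c.text) for c in ET.fromstring(rebuilt)] == stored
```

The CDATA section and the character reference are gone from the rebuilt document, which is exactly the kind of physical detail that data transfer software tends to discard.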
5.2 Mapping document structure to database structure
To transfer data between XML documents and a database, the document structure must be mapped to the database structure and vice versa. Such mappings generally fall into two categories: template-driven and model-driven.
5.2.1 Template-driven mapping
In a template-driven mapping, commands such as SELECT statements are embedded in a template, and the middleware replaces them with the results of executing them. This kind of mapping can be quite flexible. For example, some products let you place values from the result set wherever you want in the output (including using them as parameters in a subsequent SELECT statement) rather than simply formatting the results in a fixed way. Some support programming constructs such as loops and conditionals. Others support parameterized SELECT statements, for example with parameters passed through HTTP.
Currently, template-driven mappings are available only for transferring data from a relational database to an XML document.
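A minimal sketch of the template-driven idea, using Python's sqlite3 module, might look as follows. The <SelectStmt> template syntax, the Flights table, and the run_template helper are all hypothetical; real products define their own template languages.

```python
import re
import sqlite3
from xml.sax.saxutils import escape

# Hypothetical template syntax: a <SelectStmt> element holds the query.
TEMPLATE = """<FlightInfo>
<Intro>The following flights have available seats:</Intro>
<SelectStmt>SELECT Airline, FltNumber FROM Flights WHERE Depart = :depart</SelectStmt>
</FlightInfo>"""


def run_template(template, conn, params):
    """Replace each <SelectStmt> with the rows produced by its query."""
    def expand(match):
        cursor = conn.execute(match.group(1), params)
        names = [col[0] for col in cursor.description]
        rows = []
        for row in cursor:
            cells = "".join(
                f"<{n}>{escape(str(v))}</{n}>" for n, v in zip(names, row)
            )
            rows.append(f"<Row>{cells}</Row>")
        return "\n".join(rows)

    return re.sub(r"<SelectStmt>(.*?)</SelectStmt>", expand, template, flags=re.S)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER, Depart TEXT)")
conn.executemany(
    "INSERT INTO Flights VALUES (?, ?, ?)",
    [("ACME", 123, "Dec 12, 1998"), ("XYZ", 456, "Dec 13, 1998")],
)

# The parameter could just as well arrive through an HTTP request.
result = run_template(TEMPLATE, conn, {"depart": "Dec 12, 1998"})
```

The document structure comes entirely from the template; the database only supplies the rows that fill it.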
5.2.2 Model-driven mapping
In a model-driven mapping, a data model is imposed on the structure of the XML document and is mapped, explicitly or implicitly, to the structures in the database and vice versa. What this approach loses in flexibility it gains in ease of use: because the mapping is based on a concrete data model, the middleware can usually do much of the conversion work for the user. Because data transferred from the database to XML follows a single model, model-driven systems are often combined with XSL to provide the flexibility of template-driven systems.
Two models are commonly used to view the data in an XML document: the table model and the data-specific object model. Other models are possible. For example, through ID and IDREF attributes, an XML document can represent a directed graph. However, many existing middleware products do not support such models.
5.2.2.1 Table model
Many middleware packages use the table model to transfer data between XML and relational databases. It models the XML document as a single table or a set of tables. That is, the structure of the XML document must resemble the following, where the <database> element does not appear in the single-table case:
<database>
   <table>
      <row>
         <column1>...</column1>
         <column2>...</column2>
         ...
      </row>
      ...
   </table>
   ...
</database>
Here, "table" means a single result set (when transferring data from the database to XML) or a single table or updatable view (when transferring data from XML to the database). If the data must come from more than one result set (when transferring from the database), or if the XML document contains elements nested more deeply than is needed to represent a set of tables (when transferring to the database), the transfer is usually not possible.
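As a rough illustration of the table model, the following Python sketch turns one result set into a <table> element with one <row> per record. The table and column names are placeholders, and leaving NULL columns out of the output is one possible design choice, not something every product does.

```python
import sqlite3
import xml.etree.ElementTree as ET


def result_set_to_table(cursor, table_name="table"):
    """Wrap every row of a result set in <row>, one child element per column."""
    table = ET.Element(table_name)
    names = [col[0] for col in cursor.description]
    for row in cursor:
        row_el = ET.SubElement(table, "row")
        for name, value in zip(names, row):
            if value is not None:  # a NULL column is simply left out
                ET.SubElement(row_el, name).text = str(value)
    return table


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (column1 TEXT, column2 INTEGER)")
conn.executemany("INSERT INTO parts VALUES (?, ?)", [("a", 1), ("b", None)])

cursor = conn.execute("SELECT column1, column2 FROM parts ORDER BY column1")
table_xml = ET.tostring(result_set_to_table(cursor), encoding="unicode")
```

Because the function only knows about rows and columns, it has no way to produce elements nested more deeply than <row>, which is the limitation described above.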
5.2.2.2 Data-specific object model
The second data model commonly used with XML documents is a tree of data-specific objects. In this model, element types generally correspond to objects, and content models, attributes, and PCDATA correspond to the properties of those objects. The model maps directly to object-oriented and hierarchical databases and, with the help of traditional object-relational mapping techniques or SQL 3 object views, can also be mapped to relational databases. Note that this model is not the Document Object Model (DOM). The DOM models the document itself rather than the data in the document and, as described in section 6.1.2, is used to build content management systems on top of relational databases.
For example, the sales order document shown earlier can be viewed as a tree of objects from five classes (Orders, SalesOrder, Customer, Line, and Part), as shown in the following diagram:
       Orders
         |
     SalesOrder
      /      \
 Customer    Line
              |
             Part
When an XML document is modeled as a tree of data-specific objects, elements need not correspond to objects. For example, an element that contains only PCDATA, such as the CustName element, can sensibly be treated as a property because, like a property, it holds a single scalar value. Similarly, it is sometimes useful to model elements with mixed or element content as properties. An example is the Description element in the sales order document: although it has mixed content in the form of XHTML, it is more useful to treat the description as a single property because its components are meaningless on their own.
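One possible way to express this model in code is sketched below with Python dataclasses. The attribute names (SONumber, CustNumber, LineNumber, PartNumber), the Quantity element, and the sample values are assumptions made for illustration. Note how CustName becomes a plain property and how the mixed content of Description is kept as one string.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Part:
    number: str
    description: str  # mixed content kept as a single property


@dataclass
class Line:
    number: int
    quantity: int
    part: Part


@dataclass
class Customer:
    number: str
    name: str  # PCDATA-only element mapped to a property


@dataclass
class SalesOrder:
    number: str
    customer: Customer
    lines: List[Line] = field(default_factory=list)


# Hypothetical document; names and values are illustrative only.
DOC = """<Orders>
  <SalesOrder SONumber="12345">
    <Customer CustNumber="543"><CustName>ABC Industries</CustName></Customer>
    <Line LineNumber="1">
      <Quantity>12</Quantity>
      <Part PartNumber="123">
        <Description><b>Turkey wrench:</b> stainless steel</Description>
      </Part>
    </Line>
  </SalesOrder>
</Orders>"""


def inner_xml(el):
    """Serialize an element's content, markup included, as one string."""
    return (el.text or "") + "".join(ET.tostring(c, encoding="unicode") for c in el)


def load_orders(xml_text):
    orders = []
    for so in ET.fromstring(xml_text).findall("SalesOrder"):
        cust = so.find("Customer")
        order = SalesOrder(
            number=so.get("SONumber"),
            customer=Customer(cust.get("CustNumber"), cust.findtext("CustName")),
        )
        for ln in so.findall("Line"):
            part = ln.find("Part")
            order.lines.append(Line(
                number=int(ln.get("LineNumber")),
                quantity=int(ln.findtext("Quantity")),
                part=Part(part.get("PartNumber"), inner_xml(part.find("Description"))),
            ))
        orders.append(order)
    return orders


orders = load_orders(DOC)
```

The resulting objects could then be persisted with an object database or an object-relational mapping layer; that step is outside the scope of this sketch.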
5.3 Data types, null values, character sets, and other issues
This section discusses a number of issues that arise when storing data from XML documents in a database. You generally cannot control how your chosen middleware handles them, but you should be aware that they exist, because that helps you choose middleware correctly.
5.3.1 Data types
XML does not support data types in any meaningful sense. Except for unparsed entities, all data in an XML document is text, even when it represents another data type such as a date or an integer. Data transfer middleware generally converts the text in the XML document to the data types used in the database and vice versa. However, the text formats recognized for a particular data type are likely to be limited, for example to those supported by a given JDBC driver. Dates are especially likely to cause trouble, and differences in number formats between locales can cause problems as well.
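The conversion step can be sketched in Python as a table of per-element converters. The element names and the YYMMDD date format are assumptions; real middleware either fixes the accepted formats or lets you configure them.

```python
from datetime import date, datetime
from decimal import Decimal


def parse_date(text, fmt="%y%m%d"):
    """Parse a date in one fixed, assumed format (YYMMDD by default)."""
    return datetime.strptime(text.strip(), fmt).date()


# Hypothetical mapping from element name to target data type.
CONVERTERS = {
    "OrderDate": parse_date,
    "Quantity": int,
    "Price": Decimal,
}


def convert(name, text):
    """Convert element text to a typed value; unknown names stay text."""
    converter = CONVERTERS.get(name, str)
    return converter(text)


order_date = convert("OrderDate", "981215")
quantity = convert("Quantity", "12")
price = convert("Price", "1000.50")

# A date in any other format is rejected rather than guessed at.
try:
    convert("OrderDate", "12/15/98")
    rejected = False
except ValueError:
    rejected = True
```

The last few lines show why dates are a common source of trouble: the same date written in a different format fails to convert at all.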
5.3.2 Binary data
There are two ways to store binary data in an XML document: unparsed entities and Base64 encoding (a MIME encoding that maps binary data to a subset of US-ASCII). Both can cause problems with relational databases, because the rules for storing and retrieving binary data in a database can be quite strict, which may cause trouble for middleware. In addition, there is no standard notation for indicating that an element in an XML document contains Base64-encoded data, so middleware may not recognize the encoding. Finally, the notation associated with an unparsed entity or a Base64-encoded element may be discarded when the data is stored in the database. If binary data is important to you, make sure your middleware supports it.
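The Base64 approach can be sketched in Python as follows. Because there is no standard marker, the encoding="base64" attribute and the Photo element used here are purely a local convention invented for this example.

```python
import base64
import xml.etree.ElementTree as ET

payload = bytes([0, 255, 16, 32, 128])  # arbitrary binary data

# Hypothetical convention: flag the element with an "encoding" attribute.
photo = ET.Element("Photo", {"encoding": "base64"})
photo.text = base64.b64encode(payload).decode("ascii")
xml_text = ET.tostring(photo, encoding="unicode")


def read_binary(element):
    """Decode an element only if it carries the agreed-upon marker."""
    if element.get("encoding") == "base64":
        return base64.b64decode(element.text)
    raise ValueError("element is not marked as Base64")


decoded = read_binary(ET.fromstring(xml_text))
```

If the marker attribute is dropped when the data is stored, the reader can no longer tell encoded bytes from ordinary text, which is the risk described above.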
5.3.3 Null values
In the database world, null data means data that simply is not there. This is very different from a number with the value 0 or a string of length 0. For example, suppose your data comes from a weather station. If the station's thermometer is broken, a null value is stored in the database rather than 0, which would mean something else entirely.
XML supports the concept of null data through optional element types and attributes. If the value of an optional element type or attribute is null, it is simply omitted from the document. As in databases, an empty element or an attribute containing a zero-length string is not null: its value is a zero-length string.
When mapping between XML document structure and database structure, take care that optional element types and attributes correspond to nullable columns and vice versa. Failing to do so is likely to cause an insertion error (when transferring data to the database) or an invalid document (when transferring data from the database).
In practice, XML users are likely to be more flexible about what counts as null than databases are. In particular, many XML users are likely to treat empty elements and attributes containing empty strings as null. Check how your middleware handles this situation. Some middleware lets the user choose what constitutes a null value in an XML document.
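The difference between a missing element, an empty element, and a zero value can be made concrete with Python and sqlite3. The Readings document and the empty_is_null switch are assumptions that mimic the user-selectable policy some middleware offers.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical weather data: station A reports 0, station B has an empty
# element, and station C has no Temperature element at all.
READINGS = """<Readings>
  <Reading><Station>A</Station><Temperature>0</Temperature></Reading>
  <Reading><Station>B</Station><Temperature/></Reading>
  <Reading><Station>C</Station></Reading>
</Readings>"""


def column_value(row_el, name, empty_is_null=False):
    """Map an optional child element to a column value or None (NULL)."""
    child = row_el.find(name)
    if child is None:
        return None  # element absent: null
    text = child.text or ""
    if text == "" and empty_is_null:
        return None  # policy choice: empty element counts as null
    return text


def null_stations(empty_is_null):
    """Load the readings and list the stations whose temperature is NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (station TEXT NOT NULL, temperature TEXT)")
    for reading in ET.fromstring(READINGS):
        conn.execute(
            "INSERT INTO readings VALUES (?, ?)",
            (reading.findtext("Station"),
             column_value(reading, "Temperature", empty_is_null)),
        )
    rows = conn.execute(
        "SELECT station FROM readings WHERE temperature IS NULL ORDER BY station"
    )
    return [r[0] for r in rows]


strict = null_stations(empty_is_null=False)
lenient = null_stations(empty_is_null=True)
```

Under the strict policy only the missing element becomes NULL; under the lenient one the empty element does too. Station A's value of 0 is never treated as null.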
5.3.4 Character sets
By definition, an XML document can contain any Unicode character except some control characters. Unfortunately, many databases offer limited or no Unicode support, or require special configuration to handle non-ASCII characters. If your data contains non-ASCII characters, check that both your database and your middleware can handle them.
5.3.5 Processing instructions
Processing instructions are not part of the "data" in an XML document, so much middleware probably cannot handle them at present. The difficulty, especially when the XML document structure is mapped strictly to the database structure, is that processing instructions can appear virtually anywhere in a document, which makes it hard for middleware to decide where to store them and when to retrieve them. If processing instructions and round-tripping of documents are important to you, check how your middleware handles them.
5.3.6 Storing markup
As mentioned in section 4.2.2, it is sometimes useful to store elements with element or mixed content in the database without parsing them further. The simplest way to do this is to store the markup itself in the database. Unfortunately, when the data is later retrieved, it is impossible to tell whether markup characters in the database are real markup or characters that were originally escaped with entities such as lt and gt.
For example, the following description element:
<Description>
   <B>Confusing example:</B> &lt;Foo/&gt;
</Description>
is stored in the database as:
<B>Confusing example:</B> <Foo/>
In this case, an application reading the database cannot tell whether <B> and <Foo/> are markup or text. There are several possible solutions, such as flagging the markup in some way or storing the non-markup characters as entities. However, take care that the chosen method is compatible with other applications that use the data. For example, a query against data stored this way must take the entity form of a character such as "<" into account.
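The ambiguity, and the "store non-markup characters as entities" remedy, can be demonstrated with a short Python sketch. The two helper functions are hypothetical, and the naive one ignores attributes for brevity.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

doc = "<Description><B>Confusing example:</B> &lt;Foo/&gt;</Description>"
root = ET.fromstring(doc)


def naive_inner(el):
    """Concatenate tags and decoded text: entity references are lost."""
    parts = [el.text or ""]
    for child in el:
        parts.append(f"<{child.tag}>{naive_inner(child)}</{child.tag}>")
        parts.append(child.tail or "")
    return "".join(parts)


def escaped_inner(el):
    """Serialize the content so that character data stays escaped."""
    return escape(el.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in el
    )


naive = naive_inner(root)      # what a careless store operation would keep
safe = escaped_inner(root)     # markup as tags, text as entities


def child_count(stored):
    """Re-parse a stored value and count the child elements it yields."""
    return len(list(ET.fromstring(f"<Description>{stored}</Description>")))


naive_children = child_count(naive)
safe_children = child_count(safe)
```

Re-parsing the naive value yields two child elements, because the escaped text has turned into a real Foo element, whereas the escaped value yields the original single child. The price is that a search for the literal text must now look for the entity form.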
5.4 Generating DTDs from database schemas and vice versa
A common question when transferring data between XML documents and a database is how to generate a DTD from a database schema and how to generate a database schema from a DTD. In short, this is a fairly straightforward operation, although the results often fall short of what users expect.
(Note that this is usually a one-time operation, since most applications, and certainly all vertical applications, work with a known set of DTDs and relational schemas. The obvious exceptions are tools that store arbitrary XML documents in a relational database or publish relational data as XML documents. In those cases the role of the DTD is less clear.)
The following (simplified) procedure generates a relational schema from a DTD:
For each element type with element or mixed content, create a table and a primary key column.
For each element type with mixed content, create a separate table in which to store the PCDATA, linked to the parent table through the parent table's primary key.
For each single-valued attribute of that element type, and for each singly-occurring child element that contains only PCDATA, create a column in that table. If the child element type or attribute is optional, make the column nullable.
For each multi-valued attribute and for each multiply-occurring child element that contains only PCDATA, create a separate table to store the values, linked to the parent table through the parent table's primary key.
For each child element that has element or mixed content, link the parent element's table to the child element's table through the parent table's primary key.
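A partial sketch of this procedure in Python is shown below. It does not parse a real DTD: the MODEL dictionary is a hand-written, hypothetical summary of one, and the PK/FK naming scheme is an assumption. Only the table, column, and parent/child linking steps are covered; mixed content and multi-valued items are left out.

```python
import sqlite3

# Hypothetical summary of a DTD. For each element type with element content:
# "columns" lists its single-valued attributes and PCDATA-only children,
# "children" lists its child element types that have element content.
MODEL = {
    "SalesOrder": {"columns": ["SONumber", "OrderDate"], "children": ["Customer", "Line"]},
    "Customer": {"columns": ["CustNumber", "CustName"], "children": []},
    "Line": {"columns": ["LineNumber", "Quantity"], "children": []},
}


def tables_from_model(model):
    """Return one CREATE TABLE statement per element type in the model."""
    parent_of = {
        child: parent for parent, spec in model.items() for child in spec["children"]
    }
    statements = []
    for name, spec in model.items():
        # One table and one primary key column per element type.
        cols = [f"{name}PK INTEGER PRIMARY KEY"]
        # Every column is TEXT because a DTD carries no data types.
        cols += [f"{col} TEXT" for col in spec["columns"]]
        # Link a child table to its parent through the parent's primary key.
        if name in parent_of:
            parent = parent_of[name]
            cols.append(f"{parent}FK INTEGER REFERENCES {parent}({parent}PK)")
        statements.append(f"CREATE TABLE {name} ({', '.join(cols)})")
    return statements


statements = tables_from_model(MODEL)

# Check that the generated DDL is accepted by a real database engine.
conn = sqlite3.connect(":memory:")
for statement in statements:
    conn.execute(statement)
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

Every generated column is nullable text, which already hints at how little type information a DTD provides.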
The following (simplified) procedure generates a DTD from a relational schema:
For each table, create an element type.
For each column in a table, create an attribute or a PCDATA-only child element.
For each primary key/foreign key relationship in which a column of the table supplies the primary key value, create a child element.
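The reverse direction can be sketched against SQLite, whose table_info and foreign_key_list pragmas expose the schema. The sample tables are hypothetical, and several choices here are assumptions rather than part of the procedure: columns become child elements rather than attributes, nullable columns become optional, and child tables are appended after the columns.

```python
import sqlite3


def dtd_from_schema(conn):
    """Generate element declarations from the tables of a SQLite database."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    children = {t: [] for t in tables}      # parent table -> child tables
    fk_columns = {t: set() for t in tables}  # foreign key columns per table
    for t in tables:
        for fk in conn.execute(f"PRAGMA foreign_key_list({t})"):
            children[fk[2]].append(t)  # fk[2] is the referenced table
            fk_columns[t].add(fk[3])   # fk[3] is the referencing column
    lines = []
    for t in tables:
        content = []
        for _cid, name, _type, notnull, _default, pk in conn.execute(
                f"PRAGMA table_info({t})"):
            if name in fk_columns[t]:
                continue  # expressed through nesting, not as a child element
            optional = not notnull and not pk
            content.append(name + ("?" if optional else ""))
            lines.append(f"<!ELEMENT {name} (#PCDATA)>")
        # The position of child tables in the content model is arbitrary.
        content += [child + "*" for child in children[t]]
        lines.append(f"<!ELEMENT {t} ({', '.join(content)})>")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesOrder (SONumber INTEGER PRIMARY KEY, OrderDate TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE Line (LineNumber INTEGER PRIMARY KEY, Quantity INTEGER, "
    "SORef INTEGER REFERENCES SalesOrder(SONumber))")

dtd = dtd_from_schema(conn)
```

Two tables with a column of the same name would produce duplicate element declarations, so a real tool would need a naming scheme to avoid such collisions.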
Unfortunately, these procedures have a number of drawbacks. For example, there is no way to predict data types or column lengths from a DTD: any prediction (made, for example, by reading a sample document) may turn out to be wrong when a document containing a different data type or a longer value is processed. (The long-term solution is the data types in XML Schema documents.) Similarly, when generating a DTD from a relational schema, there is no way to predict the order in which child elements "should" appear or whether a given column (for example, one used only internally) should be converted at all.
In both cases, naming conflicts may occur.
Despite such flaws, these methods can still lay a good foundation for the conversion between relational structures and DTD.