XML-based overall database analysis

Source: Internet
Author: User
Tags software ag

When a large amount of data has to be processed and analyzed, it is best kept in a database, which is why almost all large commercial applications are built on one. If XML is to spread into business applications, it too must work with databases. The first question, then, is whether XML itself is a database. Strictly speaking, "XML" means XML documents. An XML document contains data, but without other software to process it, it is only a text file, so XML by itself is not a database. With some auxiliary tools, however, XML can be viewed as a database system: the XML text is the data storage area, a DTD or schema is the database schema design, XQL is the query language, and SAX or DOM is the programming interface. It still lacks many things a real database needs, such as efficient storage, indexes, security, transaction processing, data integrity, triggers, and multi-user access.

So why associate XML with a database at all? Suppose you have an e-commerce application that uses XML for data transmission. What you care about is the logical structure of the data, not how it is physically stored in the document. If the application is simple, the file system may be enough; if it is complex, you need a complete application development environment that supports XML. Or suppose you have a Web site whose content is a collection of XML documents: you must manage the site and also give users a way to search its content. Both cases call for a database. The most important factor in choosing one is whether you want to store data or documents. To store data, you need a relational or object database to hold the actual data, plus middleware to bridge the database and the XML documents. To store documents, you need a content management system. XML documents therefore fall into two categories: data-centric and document-centric.

Data-centric documents: data-centric documents have a very regular structure, for example XML documents describing sales orders or restaurant menus. They are designed mainly for machine processing. A typical use is a Web site that builds HTML pages dynamically: it finds the relevant data-centric XML documents for the user's query and transforms them with XSL so that an HTML browser can display the result.

Document-centric documents: document-centric documents have an irregular structure and coarse-grained data. Examples include books, emails, and advertisements. They are designed mainly for human readers.

To store or retrieve data, you can use a database with middleware, an XML server, or an XML-enabled Web server. To store documents, you need a content management system or a persistent DOM implementation. Data-centric content may live in a database or in XML documents, so tools are needed to turn database data into XML documents and to load XML documents into a database. Note that storing a document's data in a database discards much of the document itself: its name and DTD, and its physical structure, such as entity definitions and usage, the order of sibling elements, and the way binary data is stored. Likewise, an XML document generated from a database usually contains no CDATA sections or entity references, and its sibling elements appear in whatever order the database returns them. As a result, a document that is stored in a database and then regenerated from it is almost never identical in form to the original.

To transfer data between a database and an XML document, a mapping must be established between the document structure and the database structure. Such mappings fall into two categories: template-driven and model-driven.

1. Template-driven mapping: commands are embedded in a template, which the data transfer middleware then processes. For example, consider the following template:

<?xml version="1.0"?>
<FlightInfo>
<Intro>The following flights have available seats:</Intro>
<SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
<Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>

Note that a SELECT statement is embedded in the template. When the data transfer middleware processes it, each SELECT statement is replaced by its results, formatted as XML:

<?xml version="1.0"?>
<FlightInfo>
<Intro>The following flights have available seats:</Intro>
<Flights>
<Row>
<Airline>ACME</Airline>
<FltNumber>123</FltNumber>
<Depart>Dec 12, 1998</Depart>
<Arrive>Dec 13, 1998</Arrive>
</Row>
...
</Flights>
<Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>

Template-driven mapping can be quite flexible. For example, some products let you place the result set anywhere in the XML document, parameterize the SELECT statement, and use for loops and if conditions. Note that template-driven mapping is currently available only for transferring data between relational databases and XML documents.
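The template expansion described above can be sketched in a few lines of Python. This is only an illustration, not the behavior of any particular middleware product: it assumes an in-memory SQLite database, a `Flights` table invented for the example, and a hypothetical `expand_template` helper that wraps each result set in a hard-coded `<Flights>` element.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEMPLATE = """<FlightInfo>
<Intro>The following flights have available seats:</Intro>
<SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
<Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>"""


def expand_template(template: str, conn: sqlite3.Connection) -> ET.Element:
    """Replace every <SelectStmt> child of the root with its result set."""
    root = ET.fromstring(template)
    for index, child in enumerate(list(root)):
        if child.tag != "SelectStmt":
            continue
        # The template is trusted input: its SQL is executed as written.
        cursor = conn.execute(child.text)
        columns = [col[0] for col in cursor.description]
        result = ET.Element("Flights")  # hypothetical wrapper element name
        result.tail = child.tail        # keep the whitespace that followed the statement
        for record in cursor:
            row = ET.SubElement(result, "Row")
            for name, value in zip(columns, record):
                ET.SubElement(row, name).text = str(value)
        root[index] = result            # swap the statement for its results
    return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER, Depart TEXT, Arrive TEXT)")
conn.execute("INSERT INTO Flights VALUES ('ACME', 123, 'Dec 12, 1998', 'Dec 13, 1998')")
document = expand_template(TEMPLATE, conn)
print(ET.tostring(document, encoding="unicode"))
```

The sketch only looks at direct children of the root element; real products that allow result sets anywhere in the document, parameters, or loops need a richer template language.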

2. Model-driven mapping: data is transferred between the database and the XML document according to a fixed model of the document. XSL can be combined with a model-driven mapping product. Two models of the XML document are common: the table model and the data-specific object model.

Table model: many middleware packages use the table model to transfer data between XML documents and relational databases. It models an XML document as a single table or a set of tables, so the document has the following structure:

<Database>
<Table>
<Row>
<Column1>...</Column1>
<Column2>...</Column2>
...
</Row>
...
</Table>
...
</Database>

Here "table" means a single result set when data is transferred from the database to the XML document, and a single table or view when data is transferred from the XML document to the database. This model does not fit when there is more than one result set, or when the XML document contains complex, deeply nested structures.
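As a rough sketch of the table model, the following Python code writes one table out in the `<Database>/<Table>/<Row>` layout and reads it back. The `Customers` table, the function names, and the choice to omit NULL columns are all assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn: sqlite3.Connection, table: str) -> ET.Element:
    """Serialize one table as Database/Table/Row/<column> elements."""
    # The table name is interpolated directly, so it must come from trusted code.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [col[0] for col in cursor.description]
    database = ET.Element("Database")
    table_el = ET.SubElement(database, "Table")
    for record in cursor:
        row = ET.SubElement(table_el, "Row")
        for name, value in zip(columns, record):
            if value is None:
                continue  # NULL columns are simply left out of the row
            ET.SubElement(row, name).text = str(value)
    return database


def xml_to_rows(database: ET.Element) -> list[dict[str, str]]:
    """Read the same layout back into one dict per <Row>."""
    return [{col.tag: col.text for col in row} for row in database.iter("Row")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Id INTEGER, Name TEXT, Phone TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [(1, "Acme", "555-0100"), (2, "Widgets Inc", None)],
)
tree = table_to_xml(conn, "Customers")
rows = xml_to_rows(tree)
print(ET.tostring(tree, encoding="unicode"))
```

Every value comes back as text, and the flat row layout leaves no room for nesting, which is the limitation noted above.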

Data-specific object model: this model represents an XML document as a tree of data objects, in which each element type corresponds to an object class. It is used mainly with object-oriented and hierarchical databases, and can also be mapped to relational databases through traditional object-relational mapping. Note that this model is not the Document Object Model (DOM). For example, a sales order document can be viewed as a tree of objects from five classes: Orders, SalesOrder, Customer, Line, and Part.

[Figure: object tree for the sales order document]

When an XML document is viewed as a data-centric object tree, elements do not necessarily correspond to objects. For example, an element that contains only PCDATA can be treated as a property, because it holds a single scalar value.
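A minimal Python sketch of the data-specific object model follows. The class fields and the sample document are assumptions invented for illustration; only the class names SalesOrder, Customer, Line, and Part come from the text, and the Orders root is represented here simply as a list of SalesOrder objects. Elements that contain only PCDATA (`Name`, `Quantity`, `Price`) become scalar properties rather than objects.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Part:
    number: str
    price: float


@dataclass
class Line:
    quantity: int
    part: Part


@dataclass
class Customer:
    name: str


@dataclass
class SalesOrder:
    number: str
    customer: Customer
    lines: list[Line] = field(default_factory=list)


# Hypothetical sample document; element and attribute names are assumptions.
SAMPLE = """<Orders>
  <SalesOrder Number="123">
    <Customer><Name>Acme</Name></Customer>
    <Line><Quantity>2</Quantity><Part Number="A-7"><Price>9.5</Price></Part></Line>
    <Line><Quantity>1</Quantity><Part Number="B-2"><Price>4.0</Price></Part></Line>
  </SalesOrder>
</Orders>"""


def load_orders(xml_text: str) -> list[SalesOrder]:
    """Build the object tree: one object per element type, scalars for PCDATA."""
    orders = []
    for so in ET.fromstring(xml_text).iter("SalesOrder"):
        customer = Customer(name=so.findtext("Customer/Name"))
        lines = [
            Line(
                quantity=int(line.findtext("Quantity")),
                part=Part(
                    number=line.find("Part").get("Number"),
                    price=float(line.findtext("Part/Price")),
                ),
            )
            for line in so.findall("Line")
        ]
        orders.append(SalesOrder(number=so.get("Number"), customer=customer, lines=lines))
    return orders


orders = load_orders(SAMPLE)
print(orders[0])
```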

Converting between XML and database schemas works in two directions: generating a DTD from a database schema, and generating a database schema from a DTD.

To generate a relational schema from a DTD, follow these steps:

1. For each element type, create a table with a primary key column.

2. For each element type with mixed content, create a separate table to store the PCDATA, linked to the parent table through the parent table's primary key.

3. For each single-valued attribute of the element type, and for each child element that occurs at most once and contains only PCDATA, create a column in the table. If the attribute or child element is optional, the column should be nullable.

4. For each multi-valued attribute, and for each child element that contains only PCDATA and can occur multiple times, create a separate table to store the values, linked to the parent table through the parent table's primary key.

5. For each child element that has element content or mixed content, link the parent element's table to the child element's table through the parent table's primary key.
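The steps above can be sketched in Python as follows. To keep the example short it does not parse a real DTD: the `ElementType` class is a hypothetical hand-written summary of each element declaration, the generated column names and types are assumptions, and step 2 (mixed content) is left out.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class ElementType:
    """Hand-written summary of one DTD element declaration (hypothetical)."""
    name: str
    # Single-valued attributes and PCDATA-only children: name -> is it optional?
    single: dict[str, bool] = field(default_factory=dict)
    # PCDATA-only children that may occur more than once.
    repeated: list[str] = field(default_factory=list)
    # Children that themselves have element content.
    children: list[str] = field(default_factory=list)


def schema_for(elements: list[ElementType]) -> list[str]:
    parent_of = {child: e.name for e in elements for child in e.children}
    statements = []
    for e in elements:
        # Step 1: one table with a primary key per element type.
        cols = [f"{e.name}_id INTEGER PRIMARY KEY"]
        # Step 5: a child with element content points back at its parent's key.
        if e.name in parent_of:
            p = parent_of[e.name]
            cols.append(f"{p}_id INTEGER REFERENCES {p}({p}_id)")
        # Step 3: single-valued items become columns, nullable when optional.
        for col, optional in e.single.items():
            cols.append(f"{col} TEXT" + ("" if optional else " NOT NULL"))
        statements.append(f"CREATE TABLE {e.name} ({', '.join(cols)})")
        # Step 4: repeatable PCDATA children get their own table keyed to the parent.
        for rep in e.repeated:
            statements.append(
                f"CREATE TABLE {e.name}_{rep} "
                f"({e.name}_id INTEGER REFERENCES {e.name}({e.name}_id), {rep} TEXT)"
            )
    return statements


elements = [
    ElementType("SalesOrder", single={"Number": False, "OrderDate": True}, children=["Line"]),
    ElementType("Line", single={"Quantity": False}, repeated=["Note"]),
]
statements = schema_for(elements)
conn = sqlite3.connect(":memory:")
for stmt in statements:
    conn.execute(stmt)  # check that the generated DDL is accepted by SQLite
    print(stmt)
```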

To generate a DTD from a relational schema, follow these steps:

1. Create an element for each table.

2. For each column in the table, create an attribute or a child element that contains only PCDATA.

3. For each primary key/foreign key relationship of the table, create a child element of the table's element.
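The reverse direction can be sketched the same way. The code below reads an SQLite schema through the `table_info` and `foreign_key_list` pragmas and emits element declarations. The `Orders` and `Items` tables are invented for the example, every column is turned into a PCDATA-only child element (the sketch never chooses attributes), and foreign key columns are dropped because nesting already expresses the relationship. Column names are assumed to be unique across tables, since a DTD allows only one declaration per element name.

```python
import sqlite3


def dtd_from_schema(conn: sqlite3.Connection) -> str:
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    children: dict[str, list[str]] = {t: [] for t in tables}
    fk_columns: dict[str, set[str]] = {t: set() for t in tables}
    for t in tables:
        # foreign_key_list rows: (id, seq, parent table, from column, to column, ...)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{t}")'):
            parent, from_col = fk[2], fk[3]
            children.setdefault(parent, []).append(t)
            fk_columns[t].add(from_col)
    lines = []
    for t in tables:
        # table_info rows: (cid, name, type, notnull, default, pk)
        columns = [r[1] for r in conn.execute(f'PRAGMA table_info("{t}")')
                   if r[1] not in fk_columns[t]]
        # Steps 1 and 3: one element per table, referencing tables nested as children.
        model = columns + [f"{child}*" for child in children[t]]
        lines.append(f"<!ELEMENT {t} ({', '.join(model)})>")
        # Step 2: each remaining column becomes a PCDATA-only child element.
        lines += [f"<!ELEMENT {col} (#PCDATA)>" for col in columns]
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, OrderDate TEXT)")
conn.execute(
    "CREATE TABLE Items (ItemId INTEGER PRIMARY KEY, "
    "OrderRef INTEGER REFERENCES Orders(OrderId), Quantity INTEGER)"
)
dtd = dtd_from_schema(conn)
print(dtd)
```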

XML-based database product categories

According to Ronald Bourret's "XML Database Products", XML database products fall into seven categories:

1. Middleware

2. XML-enabled databases: databases that can work with XML. For example, both Oracle and Microsoft claim that their latest database products connect seamlessly with XML.

3. Native XML databases

4. XML servers

5. XML application servers, such as IBM WebSphere

6. Content management systems

7. Persistent DOM implementations

The following is a detailed description and introduction of each product.

Middleware: middleware is software that transfers and converts data between XML documents and databases. It is used mainly in data-centric applications and can be written in a variety of languages. It generally requires ODBC, JDBC, or OLE DB. Although some middleware can transmit data over the Internet by itself, most of it relies on a Web server to do so.

Next, consider how to choose middleware that suits your application when storing XML documents in a database. The following factors matter:

1. Data types: XML has no notion of data types. All data in an XML document is text, even when it represents another type such as a date or an integer. Data transfer middleware generally converts the text to other types.

2. Binary data: there are two common ways to store binary data in XML documents: unparsed entities and Base64 encoding.

3. NULL handling: in the relational world, NULL means that the data does not exist, which is different from 0 or an empty string. XML supports a comparable concept: if an optional element type or attribute is null, it is simply left out of the document. When mapping the structure of an XML document to a database, or generating an XML document from database content, you need to consider how optional element types and attributes map to nullable columns.

4. Character sets: an XML document can contain any Unicode character. Unfortunately, many databases do not support Unicode. If your data includes non-ASCII characters, check how the database and the middleware handle them.

5. Processing instructions: processing instructions are not part of the data in an XML document, so it is hard for middleware to decide how to store them. When selecting middleware, check how it handles processing instructions.

6. Markup storage: different middleware handles markup differently, and the way it is stored in the database also differs. Consider the following example:

<Description>
<B>Confusing example:</B> &lt;Foo/&gt;
</Description>

Once the entity references have been resolved, the content is stored in the database as:

<B>Confusing example:</B> <Foo/>

The database can no longer tell whether <B> and <Foo/> are markup or text.
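Three of the factors above (data types, binary data, and NULL handling) can be illustrated with a short Python sketch. The column names and the `CONVERTERS` table are assumptions made for the example; a real middleware product would take this information from a schema or a mapping file.

```python
import base64
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical per-column converters from XML text back to typed values.
CONVERTERS = {
    "Quantity": int,
    "ShipDate": date.fromisoformat,
    "Logo": base64.b64decode,
}


def row_to_element(columns, record) -> ET.Element:
    """Turn one database row into a <Row> element."""
    row = ET.Element("Row")
    for name, value in zip(columns, record):
        if value is None:
            continue  # NULL: the optional element is left out entirely
        child = ET.SubElement(row, name)
        if isinstance(value, bytes):
            child.text = base64.b64encode(value).decode("ascii")  # binary as Base64
        else:
            child.text = str(value)  # everything else becomes plain text
    return row


def element_to_values(row: ET.Element, columns) -> dict:
    """Turn a <Row> element back into typed values."""
    values = {}
    for name in columns:
        text = row.findtext(name)  # None when the element is absent
        if text is None:
            values[name] = None    # a missing element maps back to NULL
        else:
            values[name] = CONVERTERS.get(name, str)(text)
    return values


columns = ["Quantity", "ShipDate", "Comment", "Logo"]
record = (3, date(2000, 1, 15), None, b"\x89PNG")
element = row_to_element(columns, record)
restored = element_to_values(element, columns)
print(ET.tostring(element, encoding="unicode"))
print(restored)
```

Because a present but empty element yields an empty string while an absent element yields `None`, this convention keeps NULL distinct from the empty string.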

Typical middleware:

ADO: ADO can convert in both directions between databases and XML documents. It can save a Recordset object as an XML document and open an XML document as a Recordset, which provides a bridge between XML documents and the database. This mapping is model-driven: the data is viewed as an object tree. A tree with a nested structure can be represented as a nested Recordset, and vice versa. Changes to the Recordset data can be reflected in the corresponding XML document, and changes to the XML document can in turn lead to changes in the database content.

ASP2XML: a COM object that transfers data between XML documents and databases accessed through ODBC or OLE DB. The product is model-driven and treats an XML document as a single table. When transferring data from a database to an XML document, you specify a SELECT statement, and the output contains ASP2XML-specific tags. When transferring data from an XML document to the database, the document must contain the ASP2XML-specific tags. The COM object supports Automation, so it can be used from scripting environments such as ASP.

XML-enabled databases: these databases provide extensions for transferring data between XML documents and the database itself. They are usually designed to store and retrieve data-centric documents: the XML document is parsed and its data is stored in the corresponding tables. Document-centric documents can also be stored, by placing the entire document in a single column of a table and querying it through a text retrieval mechanism. Because many databases can now publish content to the Web, the line between XML-enabled databases and XML servers has become blurred.

A typical product is Microsoft SQL Server 2000, which supports XML in three ways:

1. The FOR XML clause of the SELECT statement: the FOR XML clause has three modes that specify how the result of the SELECT statement is mapped to XML. In RAW mode, the result set is treated as a table: each row becomes an element, and each column becomes an attribute of that element or a child element. AUTO mode differs from RAW in that the row elements are named after their tables, and the generated XML is nested linearly in the order in which the tables appear in the SELECT statement. EXPLICIT mode lets you build the XML document from a UNION of several SELECT statements.

2. XPath queries: a mapping is defined between the elements and attributes of the XML document and the tables and columns of the database. The XML is treated as an object tree and queried with a subset of XPath.

3. The OPENXML function in stored procedures: OPENXML exposes any part of an XML document as a table. You can then name it in the FROM clause of a SELECT statement and combine it with an INSERT statement to transfer data between XML documents and the database. XPath is used to identify the specific elements or attributes.
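The difference between the RAW and AUTO row shapes can be imitated in Python. This sketch does not call SQL Server: it uses SQLite and a hypothetical `rows_as_xml` helper to render each row as one element with attribute-valued columns, named `row` (RAW style) or after the table (AUTO style, single table only). The `Customers` table and its data are invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_as_xml(conn: sqlite3.Connection, query: str, element_name: str = "row") -> str:
    """Render each result row as one empty element whose attributes are the columns."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    fragments = []
    for record in cursor:
        # NULL columns are left out of the attribute list in this sketch.
        attrs = {name: str(value) for name, value in zip(columns, record) if value is not None}
        fragments.append(ET.tostring(ET.Element(element_name, attrs), encoding="unicode"))
    return "".join(fragments)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("ALFKI", "Berlin"), ("ANATR", None)],
)
query = "SELECT CustomerID, City FROM Customers"
raw = rows_as_xml(conn, query)                             # RAW-style: generic <row> elements
auto = rows_as_xml(conn, query, element_name="Customers")  # AUTO-style: named after the table
print(raw)
print(auto)
```

Note that the output is a sequence of elements without a single root, so a wrapper element has to be added before it can be parsed as a document.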

Native XML databases: there are four ways to store XML documents in a database:

1. Store the entire document as text, for example in a BLOB column of a relational database or as a file in the file system.

2. Store the entire document in a modified form in the file system, for example compressed or pre-parsed.

3. Map the structure of the document to the database, for example by mapping the DOM to tables. How the mapping is established varies from database to database.

4. Map the structure of the data to the database, for example by mapping an XML document containing sales orders to tables such as order, items, parts, and customer.

The difference between native XML databases and XML-enabled databases is that native XML databases generally use methods 2 and 3, while XML-enabled databases generally use method 4.
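The contrast between option 1 and option 4 can be seen in a small Python sketch using SQLite. The document, table names, and column names are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sales order document.
ORDER = (
    '<SalesOrder Number="123"><Customer>Acme</Customer>'
    '<Item Part="A-7" Quantity="2"/></SalesOrder>'
)

conn = sqlite3.connect(":memory:")

# Option 1: keep the whole document as text in a single column.
conn.execute("CREATE TABLE Documents (Id INTEGER PRIMARY KEY, Body TEXT)")
conn.execute("INSERT INTO Documents (Body) VALUES (?)", (ORDER,))
stored = conn.execute("SELECT Body FROM Documents").fetchone()[0]

# Option 4: map the data to application tables; markup details are not kept.
conn.execute("CREATE TABLE Orders (Number TEXT PRIMARY KEY, Customer TEXT)")
conn.execute("CREATE TABLE Items (OrderNumber TEXT, Part TEXT, Quantity INTEGER)")
root = ET.fromstring(ORDER)
conn.execute(
    "INSERT INTO Orders VALUES (?, ?)",
    (root.get("Number"), root.findtext("Customer")),
)
for item in root.findall("Item"):
    conn.execute(
        "INSERT INTO Items VALUES (?, ?, ?)",
        (root.get("Number"), item.get("Part"), int(item.get("Quantity"))),
    )
total = conn.execute(
    "SELECT SUM(Quantity) FROM Items WHERE OrderNumber = '123'"
).fetchone()[0]
print(stored == ORDER, total)
```

With option 1 the document comes back character for character but can only be searched as text; with option 4 the data can be queried with ordinary SQL, but the original document form is gone.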

A typical product is the Lore system developed at Stanford University, which is a database for semi-structured data. A résumé is a good example of semi-structured data: it contains structured information, such as gender, age, and phone number, as well as unstructured information, such as a description of personal expertise. XML is itself a very good model for semi-structured data: it is self-describing, carries a lot of metadata, and can be extended with new metadata (new fields).

Lore was designed to store semi-structured data. It was initially used for HTML document data but can now serve as an XML database. It includes a query language (Lorel), multiple indexing mechanisms, a query optimizer, multi-user support, logging and recovery, and the ability to import external data. Because Lore supports semi-structured data, it can also store XML documents that have no DTD.

XML server: an XML server is generally a platform that provides data services, with the data delivered as XML, mainly to distributed applications such as e-commerce and B2B applications. It usually includes a complete application development environment and draws on various data stores, including traditional databases, email, and file systems, so that applications can obtain and use the data conveniently. Traditional Web servers deliver information as HTML text; with the emergence of XML, demand for XML-based Web servers has arisen as well.

So what is an XML server? The concept is hard to define precisely, because it is relatively new and used broadly. Many products already call themselves XML servers, for example DataChannel's DataChannel Server 4.1, Software AG's Tamino, and eXcelon's eXcelon, but each differs in scope and function. Instead of a definition, we therefore summarize what these products have in common. Put simply, an XML server is a platform that provides data and interacts with distributed applications, such as e-commerce applications, through XML documents. This sounds like a traditional database: it stores and retrieves data like one, but the data format is XML, so the underlying data processing technology is quite different from that of traditional databases.

In this sense, the XML server can be considered one kind of XML database. The XML-enabled Web server is easier to understand, because it is essentially a Web server. To a client browsing the Web, it feels much like a traditional Web server, but the two process requests in completely different ways, because XML documents and HTML documents have different characteristics. An XML document carries data rather than formatting information; presentation is supplied separately by XSL or CSS, so data and presentation are separated. When a client submits a request, the XML-enabled Web server combines content and presentation and sends the final result to the client's browser. This is a fat-server, thin-client model, which is the opposite of Microsoft's approach of building the XSL processor into Internet Explorer. It allows the server to deliver documents in the appropriate format to different kinds of browsing devices without manual intervention.

From another angle, an XML server manages XML data better than a plain collection of XML documents can. It also avoids the data conversion step needed with traditional databases, which improves efficiency: XML is a standard extensible markup language rather than any one company's proprietary technology, whereas each database vendor uses its own format, so middleware is required for conversion.

Of course, no technology is perfect, and any new technology has shortcomings before it matures. XML servers and XML-enabled Web servers are no exception. They have the following problems:

For XML servers: their performance has not yet been proven, because they organize data in a new way that has not been widely used before. There is reason to be optimistic about the prospects of XML servers, but also reason to be skeptical of products that have not yet seen wide deployment. Consider the hardware requirements of several typical XML server products:

△ DataChannel Server 4.1 hardware requirements on Windows:

500 MHz or faster Pentium III processor with at least 256 MB of RAM

△ DataChannel Server 4.1 hardware requirements on Sun (Solaris):

Sun Ultra 10 or equivalent with at least 256 MB of RAM

△ Tamino hardware requirements on Windows:

300 MHz or faster processor with at least 256 MB of RAM

For XML-enabled Web servers: the biggest problem is complexity. Compared with traditional Web servers serving HTML, few people yet know how to use the advanced XML technologies involved, installation is too complicated, and the development tools are too specialized. Making the technology quick to learn for beginners is another problem that remains to be solved.

[Figure: XML-based Web server architecture]

Finally, consider Microsoft's support for XML-based Web servers. SQL Server can be accessed directly over HTTP through an ISAPI extension for IIS, with the query results returned to the client as XML. The simplest way is to put an SQL statement in the URL:

http://iisserver/virtualroot?sql=select+*+from+customers+for+xml+auto

Stored procedures and XML document templates can also be executed through a URL.
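A URL of this shape can be built with Python's standard library, as in the sketch below. The `sql_url` helper is a hypothetical name, and the server and virtual root are the placeholder names from the example above. The code only constructs the URL; it does not contact a server.

```python
from urllib.parse import urlencode


def sql_url(base: str, statement: str) -> str:
    """Append an SQL statement to a virtual root as the 'sql' query parameter."""
    # urlencode turns spaces into '+'; safe="*" keeps the asterisk unescaped.
    return f"{base}?{urlencode({'sql': statement}, safe='*')}"


url = sql_url("http://iisserver/virtualroot", "select * from customers for xml auto")
print(url)
```

Building the query string with `urlencode` rather than by hand ensures that characters such as `&`, `=`, or quotes inside the SQL statement are escaped correctly.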

XML application server: an XML application server is a Web application server that supports XML. Such servers are usually template-driven: SQL statements embedded in a scripting language extract the data and build XML documents dynamically.

Content management system: a content management system is used to store, retrieve, and assemble XML documents. It typically includes an editor, version control, and multi-user concurrency, and hides the underlying database implementation from the user. It is used mainly to manage documents, which are usually XML but may also be in other formats such as RTF, PDF, or SGML. For a very simple set of documents, the file system may be enough; for a complex collection, a content management system is usually needed. Its purpose is to let you break a document into discrete content fragments, such as examples, procedures, chapters, or sidebars, together with metadata such as author name and version number. You can then reassemble the XML document as needed, or compose new XML documents from the fragments.

A content management system usually provides the following functions:

1. Version and access control

2. Search engine

3. Editor

4. Publishing engine for output to books, CD, or the Web

5. Separation of content and presentation

6. Extensibility through scripts and programming interfaces

7. Integration with database data

When the DOM is mapped to a database with an object-relational mapping, a table is created in the database for each kind of object in the DOM. Such a system generally contains five tables:

1. Attribute definition: defines attributes, including their types and valid values.

2. Element/attribute association: defines which attributes are associated with which elements.

3. Content model definition: defines which elements can contain other elements.

4. Attribute value: contains the attribute values, with pointers to the relevant rows in the attribute definition table and the element/attribute association table.

5. Element value: contains the element value (PCDATA or pointers to other element values), the position of the element within its parent node, a pointer to the row holding the parent node's element value, and a pointer to the corresponding row in the element/attribute association table.

The first three tables are equivalent to a simple DTD; the last two hold the actual data. By querying the last two tables repeatedly, any part of the XML document can be reconstructed.
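The repeated querying of the two data tables can be sketched in Python with SQLite. This is a simplified illustration, not the five-table design itself: the three definition tables are omitted, the table and column names (`ElementValue`, `AttributeValue`, `Ordinal`, and so on) are assumptions, and text that follows a child element (mixed content) is not stored.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ElementValue (
    Id INTEGER PRIMARY KEY, ParentId INTEGER, Ordinal INTEGER, Tag TEXT, Content TEXT);
CREATE TABLE AttributeValue (ElementId INTEGER, AttrName TEXT, AttrValue TEXT);
""")


def store(element: ET.Element, parent_id=None, ordinal: int = 0) -> int:
    """Insert one element, its attributes, and (recursively) its children."""
    cur = conn.execute(
        "INSERT INTO ElementValue (ParentId, Ordinal, Tag, Content) VALUES (?, ?, ?, ?)",
        (parent_id, ordinal, element.tag, element.text),
    )
    element_id = cur.lastrowid
    for name, value in element.attrib.items():
        conn.execute("INSERT INTO AttributeValue VALUES (?, ?, ?)", (element_id, name, value))
    for index, child in enumerate(element):
        store(child, element_id, index)
    return element_id


def load(element_id: int) -> ET.Element:
    """Rebuild the subtree rooted at element_id by querying both tables repeatedly."""
    tag, content = conn.execute(
        "SELECT Tag, Content FROM ElementValue WHERE Id = ?", (element_id,)
    ).fetchone()
    attrs = dict(conn.execute(
        "SELECT AttrName, AttrValue FROM AttributeValue WHERE ElementId = ?", (element_id,)
    ))
    element = ET.Element(tag, attrs)
    element.text = content
    child_ids = conn.execute(
        "SELECT Id FROM ElementValue WHERE ParentId = ? ORDER BY Ordinal", (element_id,)
    ).fetchall()
    for (child_id,) in child_ids:
        element.append(load(child_id))
    return element


doc = ET.fromstring('<Order Number="1"><Item Part="A">2</Item><Item Part="B">5</Item></Order>')
root_id = store(doc)
rebuilt = load(root_id)
print(ET.tostring(rebuilt, encoding="unicode"))
```

Because `load` accepts any element id, the same function reconstructs a single fragment as easily as the whole document.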

Persistent DOM implementations: these use a database to implement the DOM, which improves speed and avoids running out of memory, especially when XML documents are very large. They store the structure of the XML documents. A persistent DOM implementation can be used to store, retrieve, and query XML documents, or to create new documents from existing ones. DOM-based applications can work with it programmatically.

Middleware, XML-enabled databases, native XML databases, XML servers, and persistent DOM implementations all require you to write code to integrate them with your application. XML application servers require some scripting, and content management systems require some configuration.
