Applying XML to the Management of Hierarchical Data

  Introduction

In the real world, a great deal of data is hierarchical. Common examples include organizational structures (corporate organization charts, military establishments, orders of battle), classification systems (map classification schemes, equipment taxonomies), and family pedigrees. Managing hierarchical data is therefore a problem that comes up again and again in software development.

Data management is concerned with how to classify, organize, encode, store, retrieve, control, and maintain data; it is the central problem of data processing. Through powerful mechanisms such as indexing, query optimization, transaction processing, concurrency control, triggers, and error recovery, a relational database management system makes data access efficient, guarantees the integrity and consistency of data, and provides reliability and security. This gives it advantages that other approaches to data management cannot match. As a result, relational databases dominate data management in enterprise computing environments, especially for large-scale business data, and many large information systems are built on relational database platforms. However, the relational model also shows some inherent deficiencies when managing hierarchical data and the increasingly common semi-structured and unstructured data.

  Deficiencies of the relational model in managing hierarchical data

The theoretical foundation of the relational database is relational theory, and its data model is the relational model. From the user's point of view, the logical structure of the relational model is a regular two-dimensional table made up of rows and columns. As new computing technologies keep emerging and data becomes increasingly distributed, heterogeneous, semi-structured, and unstructured, relational database systems have shown a number of deficiencies in data management:

    • The real world is forced into a set of tables (that is, a set of two-dimensional relations), so much of the semantics of complex objects, such as aggregation and specialization, is lost.
    • The order semantics of real-world data must be described by additional fields in the relational table so that elements of the same kind can be put in order. In a relational recordset the order of records is meaningless. Relational theory deliberately does not require order, which makes it easier to look up records through indexes and to optimize performance, but it also makes order semantics cumbersome to maintain.
    • Similarly, the hierarchical structure of real-world data (such as parent/child and ancestor/descendant relationships) must also be described by additional fields in the relational table. In fact, hierarchy can be regarded as a special kind of order semantics.
    • The relational model requires relations to be normalized. To raise the degree of normalization (the highest level at present is fifth normal form), reduce data redundancy, and avoid update anomalies (on insert, modify, and delete), relations must be split at design time (normalized design) and then joined at run time. This makes system design harder and hurts performance, because joins are among the most time-consuming operations in a relational database. A balance therefore has to be struck between the degree of normalization chosen at design time and run-time performance.
    • Once the database schema changes, the interface programs may need to be rewritten.
    • Each record contains a fixed number of fields, each of which occupies a fixed amount of storage space. This can waste space when storing descriptive information.
    • The structure of a database should remain relatively stable and is not easy to change. Changing it (database restructuring, as distinct from database reorganization) can be extremely costly, so extensibility is poor. This also explains, to some extent, why database analysis and design play such an important role. It contrasts with the structural flexibility of XML.
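
The points above about extra fields for order and hierarchy can be sketched with a small in-memory SQLite table. The table name `org`, its columns, and the sample rows are hypothetical and chosen only for illustration; the sketch shows that sibling order has to be requested with `ORDER BY` on a dedicated column and that walking the hierarchy needs an explicit recursive query.

```python
import sqlite3

# Hypothetical schema: parent_code and sort_order exist only to carry
# hierarchy and sibling order, which a relational table does not keep itself.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE org (
        code        TEXT PRIMARY KEY,
        name        TEXT NOT NULL,
        parent_code TEXT,              -- extra field: hierarchy
        sort_order  INTEGER NOT NULL   -- extra field: sibling order
    )
    """
)
conn.executemany(
    "INSERT INTO org VALUES (?, ?, ?, ?)",
    [
        ("S", "Head office", None, 0),
        ("SB", "Branch 2", "S", 1),    # inserted out of order on purpose
        ("SA", "Branch 1", "S", 0),
        ("SAA", "Sub-branch 1", "SA", 0),
    ],
)


def children(parent_code):
    """Return child names in sibling order; ORDER BY is what supplies the order."""
    cur = conn.execute(
        "SELECT name FROM org WHERE parent_code = ? ORDER BY sort_order",
        (parent_code,),
    )
    return [name for (name,) in cur]


def path_to_root(code):
    """Follow parent_code links upward with a recursive common table expression."""
    cur = conn.execute(
        """
        WITH RECURSIVE chain(code, parent_code, depth) AS (
            SELECT code, parent_code, 0 FROM org WHERE code = ?
            UNION ALL
            SELECT o.code, o.parent_code, chain.depth + 1
            FROM org AS o JOIN chain ON o.code = chain.parent_code
        )
        SELECT code FROM chain ORDER BY depth
        """,
        (code,),
    )
    return [c for (c,) in cur]


print(children("S"))
print(path_to_root("SAA"))
```

Both helper functions exist only because the table itself carries no notion of order or nesting; that bookkeeping is what the list above describes as cumbersome.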

New ideas and methods of data management are therefore worth exploring. XML has some unique strengths that make it stand out in the management of hierarchical data, and it has attracted wide attention.

  Applying XML to the management of hierarchical data

Managing hierarchical data with XML has several notable advantages: the tree structure of the data is easy to build and maintain, the order semantics of the data are easy to preserve, and the data combines easily with the tree control of the development platform.

  The characteristics of the XML data model

XML is a standard established by the W3C and is designed as a lingua franca for exchanging information between users and programs. It has a range of attractive properties, such as extensibility, simplicity, self-description, and the separation of structure, content, and presentation. As a result, XML is strongly supported in both the free software community and the commercial software industry, and it gives developers great flexibility.

For data modeling, XML provides two tools: the DTD and XML Schema. They let developers make their ideas concrete and write a specification for a whole class of documents that share the same logical structure, rather than for a single document. It is through these two modeling tools that XML promises smarter documents, for example by providing a degree of error checking and by making it easy to extract useful information from a document and present it according to people's needs.

  Advantages of the XML data model

The XML data model is itself a tree model: when a well-formed and valid XML document is parsed by a DOM parser, a tree is created in memory. The order semantics and hierarchical structure of real-world data are therefore maintained conveniently by the XML parser, with no extra effort from the developer. Dynamic maintenance of the order semantics and hierarchy is also relatively simple.

In addition, the XML parser is a component with a standard interface, so developers avoid repeatedly developing and distributing interface programs and reduce the cost of testing and maintaining them.
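
As a minimal sketch of this behavior, the following uses Python's standard `xml.etree.ElementTree` module, which is a lightweight tree API rather than a DOM implementation and checks well-formedness only, not validity. The document and its element names are hypothetical English stand-ins. The point is that both nesting and sibling order come back from the parser without any extra fields.

```python
import xml.etree.ElementTree as ET

# Illustrative document; the element names are hypothetical stand-ins.
DOC = """<HeadOffice>S
  <Branch1>SA
    <SubBranch1>SAA</SubBranch1>
    <SubBranch2>SAB</SubBranch2>
  </Branch1>
  <Branch2>SB</Branch2>
</HeadOffice>"""


def outline(elem, depth=0):
    """Return (depth, tag, code) tuples in document order."""
    code = (elem.text or "").strip()   # text before the first child holds the code
    rows = [(depth, elem.tag, code)]
    for child in elem:                 # iteration yields children in document order
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(DOC)
for depth, tag, code in outline(root):
    print("  " * depth + f"{tag} [{code}]")
```

No sort column or parent reference appears anywhere in the data; the position of each element in the document is the order and the hierarchy.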

  The combination of XML and tree-like controls

XML is an international standard, and the tree control is a common and important user interface element; both are widely supported in the industry. Moreover, as noted earlier, the XML data model is itself a tree model. This intrinsic structural similarity makes the two a natural fit.

Because hierarchical data management comes up so often in software development, the author used component technology to combine XML with a tree control, applying OLE and custom draw techniques, and developed a reusable component. The aim was to avoid duplicated work and the errors that redevelopment can introduce, thereby improving the efficiency and quality of software development, cutting costs, and shortening development cycles. The component has been reused in several projects, including an organization coding system and a conceptual model of the mission space (CMMS) management system, and has achieved the expected results.

This component has the following features:

    • Makes it easy to build the tree structure while preserving the order semantics and hierarchy of the data
      • The data source is an XML document.
      • Whether the order semantics and hierarchy of the data meet requirements can be checked quickly in IE.
      • If the component loads the XML document successfully, the application displays the tree structure in the tree control; otherwise it reports the error so that the user can correct the document.
      • Supports persistence, saving the user's modifications back to the XML document.
    • Maintains the hierarchy graphically
      • Adds nodes, as either child nodes or sibling nodes.
      • Deletes the node at a specified position.
      • Modifies nodes.
      • Quickly finds and locates nodes.
    • Integrates tightly with OLE
      • Supports in-place editing when an item is modified.
      • Supports filtering of specific characters when an item is modified.
      • Supports drag-and-drop to copy or move nodes between trees or within a tree.
    • Encodes nodes flexibly; the node code is the key to combining the tree structure with a relational database
      • Node codes are made up of printable characters.
      • The length of a code can be extended arbitrarily, allowing the tree to have sufficient height (the maximum level of any node in the tree).
      • Each position in a code can take roughly 100 different values, allowing the tree to have sufficient degree (the maximum number of subtrees of any node in the tree).
    • Uses custom draw to give the tree control strong presentation capabilities
      • Visually adjusts the font and font size of nodes.
      • Visually sets a different color for each node.
      • Prints the tree structure.
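
The article does not spell out the encoding rules, but the codes in the bank example (S, SA, SAA, SAAA, SAAAA, SAAAB, ...) suggest that a child's code is its parent's code plus one character identifying the child among its siblings. The sketch below works under that assumption. The function names are hypothetical, and the alphabet is limited to 26 uppercase letters for readability, whereas the component is described as allowing roughly 100 values per position.

```python
import string

# Assumption: one symbol per tree level. Only A-Z is used here; the component
# described above is said to allow about 100 printable values per position.
ALPHABET = string.ascii_uppercase


def child_code(parent, index):
    """Code of the index-th child (0-based): the parent's code plus one symbol."""
    if not 0 <= index < len(ALPHABET):
        raise ValueError("too many siblings for this alphabet")
    return parent + ALPHABET[index]


def parent_code(code):
    """Drop the last symbol; the root (a one-symbol code) has no parent."""
    return code[:-1] or None


def level(code):
    """Depth of the node, with the root at level 1."""
    return len(code)


def is_descendant(code, ancestor):
    """True if `code` lies strictly below `ancestor` in the tree."""
    return len(code) > len(ancestor) and code.startswith(ancestor)


print(child_code("SAAA", 2), parent_code("SAAB"), level("SAAAA"))
```

Under this scheme the code length bounds the height of the tree and the alphabet size bounds the degree, which matches the two properties listed above.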

  Example

For clarity, Listing 1 shows sample data for a bank's organizational hierarchy as an XML document:

   Listing 1. A Bank Organization Sequence Example (XML document format)
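<?xml version='1.0' encoding='GB2312' standalone='yes'?>
<!-- Element names: 银行总部 = bank head office, 省分行 = provincial branch,
     县支行 = county sub-branch, 镇分理处 = town office, 村储蓄所 = village savings office -->
<A银行总部>S
  <B1省分行>SA
    <C1县支行>SAA
      <D1镇分理处>SAAA
        <E1村储蓄所>SAAAA</E1村储蓄所>
        <E2村储蓄所>SAAAB</E2村储蓄所>
        <E3村储蓄所>SAAAC</E3村储蓄所>
      </D1镇分理处>
      <D2镇分理处>SAAB</D2镇分理处>
    </C1县支行>
    <C2县支行>SAB</C2县支行>
  </B1省分行>
  <B2省分行>SB</B2省分行>
  <B3省分行>SC</B3省分行>
  <B4省分行>SD</B4省分行>
  <B5省分行>SE</B5省分行>
</A银行总部>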

By contrast, Listing 2 gives the same sample data for the bank's organizational hierarchy in relational form. Comparing the two makes the advantages of managing hierarchical data as an XML document easy to see:

  Listing 2. A bank organization sequence example (relational model data format)

Note that the "leaf node", "level", "parent code", "level-2 code", "level-3 code", and "level-4 code" fields are all added purely to describe the hierarchical structure of the data, and that a large number of null values have to be introduced into the table.

The following illustration shows how this component displays the bank's organizational hierarchy:
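
To show where such helper fields and null values come from, here is a hedged sketch that flattens a small XML tree into table-like rows. It does not reproduce the table from Listing 2. The column names (`is_leaf`, `level`, `parent_code`, `level2_code`, `level3_code`) are assumptions loosely modeled on the fields named above, and the document is a hypothetical English stand-in.

```python
import xml.etree.ElementTree as ET

# Illustrative document; the element names are hypothetical stand-ins.
DOC = """<HeadOffice>S
  <Branch1>SA
    <SubBranch1>SAA</SubBranch1>
  </Branch1>
  <Branch2>SB</Branch2>
</HeadOffice>"""

MAX_LEVEL = 3  # hypothetical: one "levelN_code" column per level below the root


def to_rows(elem, ancestors=()):
    """Flatten the tree into dict rows that mimic a relational table."""
    code = (elem.text or "").strip()
    chain = ancestors + (code,)        # codes from the root down to this node
    row = {
        "code": code,
        "name": elem.tag,
        "is_leaf": len(elem) == 0,
        "level": len(chain),
        "parent_code": ancestors[-1] if ancestors else None,
    }
    # Levels the node does not reach are padded with None (NULL in a table).
    for lvl in range(2, MAX_LEVEL + 1):
        row[f"level{lvl}_code"] = chain[lvl - 1] if len(chain) >= lvl else None
    rows = [row]
    for child in elem:
        rows.extend(to_rows(child, chain))
    return rows


rows = to_rows(ET.fromstring(DOC))
null_count = sum(value is None for row in rows for value in row.values())
for row in rows:
    print(row)
print("null values:", null_count)
```

Every column other than `code` and `name` is derived from the node's position in the tree, which the XML form expresses through nesting alone.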


  The combination of XML and relational database

As the discussion above shows, applying XML to data management has particular advantages: it can effectively compensate for the deficiencies of relational databases in managing hierarchical data, and it has broad prospects.

Of course, this solution is not perfect, and several problems remain. First, all data in an XML document is stored as strings. This adds time overhead when searching documents or converting data types, and developers must take that overhead seriously when the data is large or the application is time-critical. The only reliable way to assess it is to build a representative application and stress test it. Second, the human readability of XML documents also brings potential security risks. In addition, many XML-related standards and technologies are still at the draft stage and have not been finalized, and conflicts of interest among technology vendors may cause more serious problems.

A natural idea, therefore, is to combine XML with a relational database so that each plays to its strengths and covers the other's weaknesses. Specifically, XML manages the small-scale hierarchical data, the relational database manages the large-scale business data, and the two are linked through the node codes of the tree.

For example, starting from the organizational hierarchy provided by the organization coding system, relational database technology can then be used to collect, query, aggregate, and maintain information about each organization.
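
One way such a link could work is sketched below with an in-memory SQLite database. The `deposits` table and its amounts are invented for illustration; only the node codes are taken from Listing 1. The sketch also assumes that a descendant's code always begins with its ancestor's code, which the sample codes suggest but the article does not state as a rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical business table: each row refers to an organization by node code.
conn.execute(
    "CREATE TABLE deposits (org_code TEXT NOT NULL, amount INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO deposits VALUES (?, ?)",
    [("SAAAA", 120), ("SAAAB", 80), ("SAAB", 50), ("SAB", 40), ("SB", 300)],
)


def subtree_total(code):
    """Sum the amounts booked on a node and on all of its descendants."""
    # Comparing the leading characters avoids LIKE wildcards, which could
    # clash with codes built from arbitrary printable characters.
    cur = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM deposits"
        " WHERE substr(org_code, 1, ?) = ?",
        (len(code), code),
    )
    return cur.fetchone()[0]


print(subtree_total("SA"))
print(subtree_total("S"))
```

The hierarchy itself stays in the XML document; the database only stores the code as a plain key, so the business tables need no parent or level columns.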

  Summary

Applying XML to data management has particular advantages and can effectively compensate for the deficiencies of relational databases in managing hierarchical data. However, XML technology and relational database technology do not compete with or displace each other; they complement and reinforce each other. In fact, the two are so complementary that they seem designed to work together, and they will continue to coexist. Almost every comprehensive data management system needs both XML and a relational database. The good news is that almost all major relational database products support XML.

  About the author

Xu Jianzhong is a software engineer and simulation system developer who writes about relational databases, software development for simulation data visualization, and system modeling and simulation. He has extensive programming experience with assembly language, C++, Matlab, Fortran, and various web tools. You can contact him at xuxz02@21cn.com.


