A Detailed Look at XML Parsing Methods

Even developers with extensive experience in advanced XML topics are not always fully aware of some of XML's most fundamental aspects. To lay a solid foundation, this article discusses the most basic XML service of all: parsing. It introduces various parsing methods and focuses on explaining their advantages and disadvantages.

Understanding the Basics

XML has been around for about nine years. That is a short time, yet Extensible Markup Language has become so widespread that it is hard to find an application that doesn't use it at all.

Yet when I work with clients, I still find that the basics are not always well understood. Somewhat surprisingly, developers who have mastered complex XML topics often turn out to have significant gaps in foundational areas such as parsing.

Where does XML processing begin? With parsing. Parsing may well be the most basic service available to a developer. The parser reads the XML document, interprets its syntax, and delivers meaningful objects to the application. It may also provide other services, such as validation (guaranteeing that the document conforms to an XML Schema or DTD) or namespace resolution.
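As a minimal sketch of these services, assuming Java and the JAXP parser bundled with the JDK, the code below parses a small document with namespace resolution enabled and reports how the parser resolved the root element's prefix. The class name `NamespaceDemo`, the `inv` prefix, and the `http://example.com/invoice` URI are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NamespaceDemo {
    // Parses an XML string with namespace resolution switched on.
    static Document parse(String xml) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // resolve prefixes to namespace URIs
        // factory.setValidating(true) would additionally request DTD validation;
        // it stays off here because the sample document declares no DTD.
        try {
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<inv:invoice xmlns:inv='http://example.com/invoice'>"
                + "<inv:total>42.00</inv:total></inv:invoice>";
        Element root = parse(xml).getDocumentElement();
        System.out.println(root.getLocalName());    // invoice
        System.out.println(root.getNamespaceURI()); // http://example.com/invoice
    }
}
```

The application never sees the `inv:` prefix as a string to be split by hand; the parser hands back the local name and the namespace URI it resolved.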

This article introduces various parsing methods, analyzes their advantages and disadvantages, and should help you select the right tool for your next project. Once you have narrowed down your choice, study the chosen API in detail.

The importance of parsing

Why is parsing important? Because all XML processing starts with it. Whether you use a high-level language such as XSLT or low-level Java code, the first step is always to read the XML file, decode its structure, and retrieve the information it contains. That step is parsing.

The first decision when parsing a document is whether to use a ready-made parsing library (such libraries exist for practically every programming language, even COBOL [Common Business-Oriented Language]) or to write your own. The answer is simple: use a ready-made library.

Admittedly, XML syntax is not complex, so it is understandable to be tempted to parse it with regular expressions or other ad hoc techniques. But this rarely succeeds: XML requires support for multiple character encodings and for many elusive features, such as CDATA sections and entities. A custom implementation almost never handles all of these aspects correctly, and the result is incompatibility.
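To illustrate why ad hoc parsing breaks down, here is a small Java sketch (the class and method names are hypothetical) that compares a regular expression with the JDK's built-in DOM parser on a document containing an entity reference and a CDATA section. Both constructs are perfectly legal XML.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class RegexVersusParser {
    // Naive hand-rolled "parser": grab whatever sits between <title> and </title>.
    static String titleByRegex(String xml) {
        Matcher m = Pattern.compile("<title>(.*?)</title>").matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    // The same lookup through the JDK's built-in DOM parser.
    static String titleByParser(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("title")
                    .item(0)
                    .getTextContent(); // concatenates text and CDATA children
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // An entity reference (&amp;) plus a CDATA section.
        String xml = "<book><title>Fish &amp; Chips <![CDATA[<for two>]]></title></book>";
        System.out.println(titleByRegex(xml));  // Fish &amp; Chips <![CDATA[<for two>]]>
        System.out.println(titleByParser(xml)); // Fish & Chips <for two>
    }
}
```

The regular expression returns raw markup that the application would still have to decode, while the parser has already expanded the entity and unwrapped the CDATA section. Patching the regex for these two cases would still leave encodings, comments, and the other features mentioned above unhandled.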

Conversely, the parsers shipped with development environments have, for the most part, been thoroughly tested for compatibility. Since the main reason for adopting a standard syntax such as XML is compatibility with other applications and toolkits, that testing alone makes a well-tested library worth using.

Most parsers offer at least two APIs: typically an object model API and an event API (also called a streaming API). For example, the Java platform provides both DOM (Document Object Model) and SAX (Simple API for XML).

Both APIs provide the same services: document decoding, optional validation, namespace resolution, and so on. They differ not in the services they offer but in the data model they expose.
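The sketch below, assuming the JAXP APIs that ship with the JDK, reads the same hypothetical order document through DOM and through SAX. Both return the same data; what differs is the programming model. DOM hands back a tree to query after parsing, while SAX calls a handler as each start tag is read. The class, method, and element names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class DomVersusSax {
    // Hypothetical sample document.
    static final String XML = "<order><item sku='A1'/><item sku='B2'/></order>";

    private static InputStream bytes(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    // Object model API (DOM): build the whole tree first, then query it.
    static List<String> skusWithDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(bytes(xml));
            NodeList items = doc.getElementsByTagName("item");
            List<String> skus = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                skus.add(((Element) items.item(i)).getAttribute("sku"));
            }
            return skus;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Event API (SAX): the parser calls the handler while reading; no tree is kept.
    static List<String> skusWithSax(String xml) {
        List<String> skus = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("item")) {
                    skus.add(atts.getValue("sku"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(bytes(xml), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return skus;
    }

    public static void main(String[] args) {
        System.out.println(skusWithDom(XML)); // [A1, B2]
        System.out.println(skusWithSax(XML)); // [A1, B2]
    }
}
```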

Key choice: the object model API

An object model API defines a hierarchical object model to represent an XML document. In other words, it defines a class for each concept of the XML syntax: element, attribute, entity, document. When the parser reads an XML document, it establishes a one-to-one mapping between the XML syntax and these classes. For example, it instantiates an element object whenever it encounters a tag.
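Here is a minimal Java sketch of that one-to-one mapping, using the JDK's DOM implementation. It parses a tiny, made-up document and prints, for each node, the DOM interface (`Document`, `Element`, `Attr`, `Text`) that the parser instantiated for it. The class name `SyntaxToClasses` is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SyntaxToClasses {
    // Parses the document and lists the DOM interface behind every node.
    static List<String> describe(String xml) {
        try {
            Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            walk(doc, "", out);
            return out;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Depth-first walk: one output line per node, indented by depth.
    private static void walk(Node node, String indent, List<String> out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                out.add(indent + "Document");
                break;
            case Node.ELEMENT_NODE:
                out.add(indent + "Element " + node.getNodeName());
                // Attributes are held in a map on the element, not as children.
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Attr attr = (Attr) attrs.item(i);
                    out.add(indent + "  Attr " + attr.getName() + "=" + attr.getValue());
                }
                break;
            case Node.TEXT_NODE:
                out.add(indent + "Text \"" + node.getNodeValue() + "\"");
                break;
            default:
                out.add(indent + "Other " + node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, indent + "  ", out);
        }
    }

    public static void main(String[] args) {
        describe("<book id='7'><title>XML</title></book>").forEach(System.out::println);
        // Document
        //   Element book
        //     Attr id=7
        //     Element title
        //       Text "XML"
    }
}
```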

Not surprisingly, there is some debate about which data model is best. The main advantage of the DOM, which is standardized by the World Wide Web Consortium (W3C), is portability: it is defined as a set of CORBA IDL interfaces and mapped to many languages. So if you know the DOM in JavaScript, you know the DOM in Java, C++, Perl, Python, and other languages.

Another data model is JDOM, an object model designed specifically for Java. It is more tightly integrated with the Java language but, by design, lacks portability.

One could keep debating which data model best fits XML syntax, but I don't think it matters much, because the advantages and disadvantages of the various object-based APIs are essentially the same. On the plus side, an object model API is easy to understand if you are familiar with XML syntax. Because it maps XML syntax directly to classes, it is easy to learn, use, and debug.

The price is efficiency, at least for many projects. When reading a document, the parser creates objects that mirror its syntactic structure, and that structure is a poor fit for many applications:

XML syntax is verbose, and even if the document is small, the parser creates many objects.

XML vocabularies are usually optimized for storage and transmission efficiency rather than for processing, so the application may need to preprocess the data, for example to compute intermediate results or merge data from other sources, before actual processing can begin. In many cases, data must be copied from the XML object model into an application-specific object model or a database before it can be processed.

Because this object model is generic, it contains references between objects that are not needed by applications (for example, from child elements to parent elements). These references further increase memory consumption.
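The costs listed above can be made concrete with a short Java sketch (Java 16 or later, for the record type). It assumes the JDK's default DOM parser, a hypothetical order format (`order`, `line`, `sku`, `qty`), and a hypothetical application record `Line`. It counts the nodes the parser builds for two short lines of data, follows a child-to-parent reference the application never asked for, and then copies the data into application objects, as the second point describes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ObjectModelCost {
    // Hypothetical application-specific type the data finally ends up in.
    record Line(String sku, int quantity) {}

    // Hypothetical sample order, indented the way documents usually are.
    static final String SAMPLE = "<order>\n"
            + "  <line sku='A1' qty='2'/>\n"
            + "  <line sku='B2' qty='5'/>\n"
            + "</order>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Counts a node, its attribute nodes, and (recursively) its children,
    // including the whitespace-only text nodes produced by indentation.
    static int countNodes(Node node) {
        int count = 1;
        NamedNodeMap attrs = node.getAttributes(); // null unless node is an element
        if (attrs != null) {
            count += attrs.getLength();
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countNodes(child);
        }
        return count;
    }

    // Copies the generic DOM tree into the application's own objects.
    static List<Line> toLines(Document doc) {
        NodeList items = doc.getElementsByTagName("line");
        List<Line> lines = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Element e = (Element) items.item(i);
            lines.add(new Line(e.getAttribute("sku"), Integer.parseInt(e.getAttribute("qty"))));
        }
        return lines;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(countNodes(doc)); // 11
        // A generic child-to-parent reference, present whether or not it is needed.
        Node firstLine = doc.getElementsByTagName("line").item(0);
        System.out.println(firstLine.getParentNode().getNodeName()); // order
        for (Line line : toLines(doc)) {
            System.out.println(line.sku() + " x " + line.quantity()); // A1 x 2, then B2 x 5
        }
    }
}
```

Of the 11 nodes counted here, three are text nodes that hold nothing but indentation, and the application still ends up with a second copy of the data in its own `Line` objects.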

This may not be a big problem when working with small documents on the desktop, but in other environments, such as servers, the inherent inefficiency of the object model can be unacceptable.
