10 good XML habits


Declare the XML version and encoding

Frequently used acronyms
  • DOM: Document Object Model
  • DTD: Document Type Definition
  • HTML: Hypertext Markup Language
  • IDE: Integrated development environment
  • SAX: Simple API for XML
  • XSD: XML Schema Definition
  • XML: Extensible Markup Language
  • XSLT: Extensible Stylesheet Language Transformations

When you create an XML document quickly, it is tempting to write only the basic structure and skip some common XML document requirements, such as the XML declaration and the encoding of the data that the document contains.

Consider the XML document shown in Listing 1.

Listing 1. An XML document with no XML declaration or encoding

                
<phrases>
<phrase lang="en">Hello</phrase>
<phrase lang="it">Buongiorno</phrase>
<phrase lang="fr">Salut!</phrase>
</phrases>

A person can look at this document and recognize it as XML, but a program has a harder time making that judgment. Adding an XML declaration at the top of the file makes the document clearer and easier to identify. This single line states that the document is XML and gives the XML version number and the character encoding used by the data. For example:

<?xml version="1.0" encoding="us-ascii"?>

The encoding named in the declaration must also be correct. The XML parser uses it to make sure that each character in the XML document is loaded correctly. For example, adding a Russian phrase for Hello to the phrases in Listing 1 causes a problem, because the declared us-ascii encoding does not cover the extended character set that Russian requires.

Specifying the wrong encoding means that the parser cannot process the document correctly. For example, if multi-byte extended characters are read as a sequence of single-byte characters, the data is corrupted and the output is garbage.
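
The effect of a wrong declaration can be reproduced with a short Python sketch. It is only an illustration: it assumes the standard library xml.etree.ElementTree parser, and the sample phrase and the helper name parse_phrase are made up for this example. The same UTF-8 bytes are parsed under three different declarations.

```python
import xml.etree.ElementTree as ET

# A Russian greeting (sample data), stored as UTF-8: two bytes per letter.
RUSSIAN_HELLO = "Привет"
payload = RUSSIAN_HELLO.encode("utf-8")


def parse_phrase(declared_encoding: str) -> str:
    """Parse the same bytes under the given encoding declaration (hypothetical helper)."""
    declaration = f'<?xml version="1.0" encoding="{declared_encoding}"?>'.encode("ascii")
    document = declaration + b'<phrase lang="ru">' + payload + b"</phrase>"
    return ET.fromstring(document).text


# Correct declaration: the text survives intact.
correct = parse_phrase("utf-8")

# Wrong single-byte declaration: every byte is read as a separate character.
garbled = parse_phrase("iso-8859-1")

# us-ascii cannot represent these bytes at all, so the parser rejects the document.
try:
    parse_phrase("us-ascii")
    ascii_rejected = False
except ET.ParseError:
    ascii_rejected = True

print(len(correct), len(garbled), ascii_rejected)
```

With the single-byte declaration the document still parses, which is what makes this kind of corruption easy to miss.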




Use a DTD or XSD

After adding the XML declaration, use a DTD or an XSD to define the structure of a valid XML file. Both let an XML parser check that the content of the XML file matches the structure you modeled for the data.

For example, take a simple XML structure for a contact database. You want to define a structure that holds each contact's name, address, and phone numbers. With a DTD, you can describe this structure and make sure that every contact in the file follows the same layout.

The DTD for the contact database is shown in Listing 2.

Listing 2. DTD for the contact database

                
<!ELEMENT phone (#PCDATA)>
<!ATTLIST phone type (home | work | mobile) #REQUIRED>
<!ELEMENT contact (#PCDATA | name | phone | address)*>
<!ELEMENT contacts (#PCDATA | contact)*>
<!ELEMENT country (#PCDATA)>
<!ELEMENT road (#PCDATA)>
<!ELEMENT address (#PCDATA | road | city | state | postcode | country)*>
<!ATTLIST address type (home | work) #REQUIRED>
<!ELEMENT state (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT postcode (#PCDATA)>
<!ELEMENT city (#PCDATA)>

The DTD defines the elements and attributes needed to describe a contact, along with the values that those attributes accept. For example, Listing 2 shows that the phone element has a required type attribute, and that the address element has one as well.

When used with a validating tool, a DTD helps ensure that attributes are valid and helps you identify problems. When used with an XML-aware editor, a DTD can also help with editing and automatic completion of content.

An XSD, or schema, performs many of the same functions as a DTD, but each has its own strengths. For example, some XML editors need a DTD to complete content automatically, while a schema is more flexible when you design the actual hierarchy of the document. Choose the tool that suits your environment.




Remember to validate

Look at Listing 3. Can you spot the problem?

Listing 3. Validation example

                
<contacts>
<contact>
<name>Martin</name>
<phone type="home">123 456 7890</phone>
<phone type="mobile">123 456 7890</phone>
<phone type="work">123 456 7890</phone>
<address type="home">
<road>Home road</road>
<city>Home city</city>
<state>Home state</state>
<zipcode>12434</zipcode>
<country>USA</country>
</address>
</contact>
<contact>
<name>Sharon</name>
<phone type="work">234 567 8901</phone>
<phone>234 567 8901</phone>
<address type="home">
<road>Other home road</road>
<city>Other city</city>
<state>Other state</state>
<zipcode>39487</zipcode>
<country>USA</country>
</address>
<address type="work>
<road>Work building, work road</road>
<city>Work city</city>
<state>Work state</state>
<zipcode>12347</zipcode>
<country>USA</country>
</address>
</contact>
</contacts>

Finding the problem by hand is tedious. Instead, you can run the file through xmllint, a free tool that checks the content and structure of an XML file. Its output is shown in Listing 4.

Listing 4. Output from running xmllint on Listing 3

                
$ xmllint contacts.xml
contacts.xml:27: parser error : Unescaped '<' not allowed in attributes values
<road>Work building, work road</road>
^
contacts.xml:27: parser error : attributes construct error
<road>Work building, work road</road>
^
contacts.xml:27: parser error : Couldn't find end of Start Tag address line 26
<road>Work building, work road</road>
^
contacts.xml:32: parser error : Opening and ending tag mismatch: contact line 15 and address
</address>
^
contacts.xml:33: parser error : Opening and ending tag mismatch: contacts line 1 and contact
</contact>
^
contacts.xml:34: parser error : Extra content at the end of the document
</contacts>

Although the output looks far more complicated than the original problem (one attribute value is missing its closing quotation mark), it gives you a starting point for locating the problem.
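
A similar well-formedness check can be scripted. The following is a minimal sketch that uses Python's standard xml.etree.ElementTree module rather than xmllint; the function name well_formedness_error and the two sample fragments are assumptions made for illustration. The parser's messages are worded differently from xmllint's.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def well_formedness_error(xml_text: str) -> Optional[str]:
    """Return None if xml_text is well-formed, otherwise the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message includes the line and column where parsing stopped.
        return str(err)
    return None


good = '<address type="work"><road>Work road</road></address>'
# The same fragment with the closing quote of the attribute missing, as in Listing 3.
bad = '<address type="work>\n<road>Work road</road>\n</address>'

print(well_formedness_error(good))
print(well_formedness_error(bad))
```

Like xmllint, the parser reports the point where it gave up, which is usually at or shortly after the real mistake.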

xmllint also supports a range of command-line options that control how it diagnoses a file and what it reports. One of the most useful is --noout, which stops xmllint from echoing the file content as it parses. For short files the echoed content does not matter, but for large files it gets in the way.

If you are using a DTD, the --postvalid option tells xmllint to validate the content against the DTD, so that the content is checked not only for being well-formed XML but also for matching the structure in the DTD. If you add the DTD from Listing 2 to the file and correct the unterminated attribute, xmllint reports a different set of errors, as shown in Listing 5.

Listing 5. xmllint reporting different errors

                
$ xmllint --noout --postvalid contacts.xml
contacts.xml:9: element address: validity error : Element zipcode is not declared in address list of possible children
contacts.xml:21: element address: validity error : Element zipcode is not declared in address list of possible children
contacts.xml:28: element address: validity error : Element zipcode is not declared in address list of possible children
Document contacts.xml does not validate

With xmllint, you can quickly and easily determine whether the structure of a document is valid. xmllint is part of the libxml2 toolkit, which is commonly bundled with Linux, UNIX, and Mac OS X; on Windows it must be downloaded separately. For more information about xmllint and libxml2, see the libxml2 documentation.




Validation does not always solve the problem

Validating an XML file with xmllint or a similar tool, especially against a DTD, is a good way to check the file. However, the approach has its limits. For example, what about the actual content of the elements?

With a DTD, you can constrain the content of an attribute: an attribute can be declared as a string, as an ID, or as one of a restricted list of allowed values. You cannot control or limit the text content of an element in the same way. An XSD can go further and restrict element text with datatypes and patterns, but it still cannot express every rule that an application cares about.

For example, in the contact example, a phone element should contain only digits and spaces. However, nothing stops you from adding letters or other characters to the element. Validation with xmllint against the DTD does not report an error, and neither do editors or other XML-aware tools. An application, on the other hand, might fail because the data is not of the type it expects.

In short, validation ensures that the structure is correct, but it does not guarantee that the data itself makes sense to the application.

The simplest way to solve this problem is to write a parser-based checker that reads the XML file and verifies the data content itself. Do not over-validate the content, though: just make sure that the data meets the requirements of the application.
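
As a rough illustration of such a check, the Python sketch below reads the contacts and reports phone numbers that break an application rule. The rule itself (digits separated by single spaces), the sample data, and the name invalid_phones are assumptions made for this example, not requirements taken from the contact DTD.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Assumed application rule: groups of digits separated by single spaces.
PHONE_PATTERN = re.compile(r"\d+( \d+)*")

# Sample data: structurally valid, but the second number contains a letter.
SAMPLE = """<contacts>
  <contact>
    <name>Martin</name>
    <phone type="home">123 456 7890</phone>
    <phone type="mobile">123 456 78x0</phone>
  </contact>
</contacts>"""


def invalid_phones(xml_text: str) -> List[str]:
    """Return the text of every phone element that breaks the assumed rule."""
    root = ET.fromstring(xml_text)
    bad = []
    for phone in root.iter("phone"):
        number = (phone.text or "").strip()
        if not PHONE_PATTERN.fullmatch(number):
            bad.append(number)
    return bad


print(invalid_phones(SAMPLE))
```

The check is deliberately narrow: it tests only what the application relies on and leaves structural checks to the DTD or XSD.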




XML structure and attributes

There are different opinions on whether to use attributes or elements to describe the information that you want to present in an XML file.

The general practice is to use elements (that is, the data between tags) to hold the information in the file, and to use attributes to qualify the data being described.

Elements and attributes each have drawbacks. For example, an attribute cannot be repeated within a tag, which is a typical case where an element is the better choice. Conversely, qualifying data with elements alone is sometimes cumbersome.

The phone numbers in the contact example show the advantage of attributes. In Listing 6, an attribute qualifies the type of each phone number (home, mobile, or work).

Listing 6. Qualifying the phone number type with an attribute

                
<phone type="home">123 456 7890</phone>
<phone type="mobile">123 456 7890</phone>
<phone type="work">123 456 7890</phone>

With this structure, you can easily treat the numbers as a group (ignoring the attributes) or select a specific type of phone number (using the attributes).

Compare this with the structure in Listing 7, which uses only elements.

Listing 7. Using only elements to qualify phone numbers

                
<phone>
<type>home</type>
<number>123 456 7890</number>
</phone>
<phone>
<type>mobile</type>
<number>123 456 7890</number>
</phone>
<phone>
<type>work</type>
<number>123 456 7890</number>
</phone>

With this structure, it is harder to tell the number itself apart from the node that qualifies it. In theory, any XML parser or a suitable XPath expression can still extract the information you need, but the element-only design brings little benefit and makes the XML document harder to read.
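
The difference shows up in the code that reads the data. The sketch below uses Python's xml.etree.ElementTree to query both designs; the sample documents and variable names are invented for illustration, and the phone numbers are made distinct so that the results are easy to tell apart.

```python
import xml.etree.ElementTree as ET

# Attribute design (as in Listing 6), with sample numbers.
WITH_ATTRIBUTES = """<contact>
  <phone type="home">123 456 7890</phone>
  <phone type="mobile">234 567 8901</phone>
</contact>"""

# Element-only design (as in Listing 7), with the same sample numbers.
ELEMENTS_ONLY = """<contact>
  <phone><type>home</type><number>123 456 7890</number></phone>
  <phone><type>mobile</type><number>234 567 8901</number></phone>
</contact>"""

attr_root = ET.fromstring(WITH_ATTRIBUTES)
elem_root = ET.fromstring(ELEMENTS_ONLY)

# Attribute design: the number is the element's text and the type is a filter.
all_numbers = [p.text for p in attr_root.findall("phone")]
mobile_attr = [p.text for p in attr_root.findall("phone[@type='mobile']")]

# Element-only design: the filter and the value are sibling child elements.
mobile_elem = [n.text for n in elem_root.findall("phone[type='mobile']/number")]

print(all_numbers, mobile_attr, mobile_elem)
```

Both queries work, but the element-only version has to step into each phone element twice, once to test the type and once to reach the number.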




Use XPath to find information

When you process XML data, finding the information you need can be complicated. You can write a parser to pick out the information, but in some cases you only need to find a small piece of information in the file quickly.

For example, to list all countries in the contact XML file and see how your contacts are distributed around the world, you can use XPath to select the information.

By using the structure of the XML file as part of the query, XPath enables you to extract data from the XML file. For example, by providing a path for a specific element in an XML file, you can extract the data of the element:

$ xpath contacts.xml '//contact/address/country'

You can break the expression down as follows:

  • The double slash (//) at the beginning means to search for the named contact element anywhere in the document.
  • The next slash and element name specify the next element to match (address), that is, an address element inside a contact element.
  • The last slash repeats the process, this time matching the country element.

Note: This example does not specify the type of address from which to select the information, so all addresses are selected. You can see the results of the XPath query in Listing 8.

Listing 8. XPath query results

                
$ xpath contacts.xml '//contact/address/country'
Found 3 nodes:
-- NODE --
<country>USA</country>-- NODE --
<country>USA</country>-- NODE --
<country>USA</country>

To select more specific data, you can specify the content of an element or attribute to match. For example, to select only mobile phone numbers, you must specify the type attribute and its value. Use the at sign (@) to refer to an attribute and give the value to match, as shown in Listing 9.

Listing 9. Selecting only the mobile phone number

                
$ xpath contacts.xml '//contact/phone[@type="mobile"]'
Found 1 nodes:
-- NODE --
<phone type="mobile">123 456 7890</phone>

Listings 8 and 9 both use a command-line tool. Many XML toolkits also provide native methods for evaluating XPath expressions, so you can use an XPath expression to extract data and use it directly in an application, without writing parsing code to get at the information.
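
As one example of such a toolkit, Python's standard xml.etree.ElementTree module supports a limited subset of XPath. The sketch below runs the two queries from Listings 8 and 9 against a cut-down copy of the contact data; the sample document is abbreviated for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample of the contact data.
CONTACTS = """<contacts>
  <contact>
    <name>Martin</name>
    <phone type="home">123 456 7890</phone>
    <phone type="mobile">123 456 7890</phone>
    <address type="home"><country>USA</country></address>
  </contact>
  <contact>
    <name>Sharon</name>
    <phone type="work">234 567 8901</phone>
    <address type="home"><country>USA</country></address>
    <address type="work"><country>USA</country></address>
  </contact>
</contacts>"""

root = ET.fromstring(CONTACTS)

# ElementTree paths are relative to an element, so '//' is written './/'.
countries = [c.text for c in root.findall(".//contact/address/country")]
mobiles = [p.text for p in root.findall(".//contact/phone[@type='mobile']")]

print(countries)
print(mobiles)
```

ElementTree implements only part of XPath. For full XPath support, a dedicated library is needed.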




It is not always necessary to use a parser to extract information

It might seem surprising, but you do not always need a full XML parser built on SAX, DOM, or other technologies (such as XPath or XQuery) to extract the information you need from an XML file.

XML files hold data in a structured format, and sometimes you can rely on your own knowledge of that format to get at the information. To find a piece of information quickly, a simpler solution is often enough.

In many cases, grep, Perl, or a similar tool can extract the data you need without interpreting the structure or content of the document as XML at all.

For example, you can use grep to select the phone numbers (see Listing 10).

Listing 10. Using grep to select phone numbers

                
$ grep '<phone' contacts.xml

<phone type="home">123 456 7890</phone>
<phone type="mobile">123 456 7890</phone>
<phone type="work">123 456 7890</phone>
<phone type="work">234 567 8901</phone>
<phone>234 567 8901</phone>

grep selected the information you needed, with no regard for the fact that the information is in XML or for the structure of the document.

When you only need to find short pieces of information, this simpler technique finds them and avoids the overhead of a traditional parsing approach.
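
The same line-oriented approach is easy to script. The Python sketch below treats the file as plain text, much as grep does; the sample data and the names grep_lines and PHONE_TEXT are made up for illustration. It relies on the assumption that each phone element sits on a single line, which is exactly the kind of knowledge of the format that this technique depends on.

```python
import re
from typing import List

# Abbreviated sample of the contact data, one element per line.
CONTACTS = """<contacts>
<contact>
<name>Martin</name>
<phone type="home">123 456 7890</phone>
<phone type="mobile">123 456 7890</phone>
</contact>
<contact>
<name>Sharon</name>
<phone type="work">234 567 8901</phone>
<phone>234 567 8901</phone>
</contact>
</contacts>"""

# Captures the text between an opening and closing phone tag on one line.
PHONE_TEXT = re.compile(r"<phone\b[^>]*>([^<]*)</phone>")


def grep_lines(text: str, needle: str) -> List[str]:
    """Return every line that contains needle, like a fixed-string grep."""
    return [line for line in text.splitlines() if needle in line]


phone_lines = grep_lines(CONTACTS, "<phone")
numbers = [m.group(1) for line in phone_lines for m in PHONE_TEXT.finditer(line)]

print(phone_lines)
print(numbers)
```

If the file were reformatted so that an element spanned several lines, this shortcut would silently miss data, so it suits quick lookups better than production code.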




When to use SAX instead of DOM parsing

When you build a parser to get information out of a document, it is often hard to decide when to use a SAX-based handler and when to use a DOM-based one.

The simplest way to decide is to consider the complexity of the document and what you want to do with the information. If you want to convert a document, or if the document is very large, SAX is the best choice.

SAX parses the document one element at a time and calls a method or function each time it identifies an element. If you want to convert an XML document to another format, such as XML to HTML, SAX is the most efficient method. You do not need to load the entire document into memory; you only respond to the elements and structures as they are identified.

The disadvantage of SAX shows when you need to retain or record the structure, or to understand the entire document and select a single item from it (for example, one contact out of all the records). In that case you must build a complex handler that loads the data, records it in a structure of your own, and then identifies the target elements for output.
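
A small conversion shows the event-driven style. The sketch below uses Python's standard xml.sax module to turn contact names into an HTML list; the handler class NamesToHtml and the sample data are invented for this example and are only one way to structure such a handler.

```python
import xml.sax
from typing import List
from xml.sax.saxutils import escape


class NamesToHtml(xml.sax.ContentHandler):
    """Emit one <li> per <name> element as the parser streams past it."""

    def __init__(self) -> None:
        super().__init__()
        self.html: List[str] = ["<ul>"]
        self._in_name = False
        self._buffer: List[str] = []

    def startElement(self, name, attrs):
        if name == "name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "name":
            self.html.append("<li>" + escape("".join(self._buffer)) + "</li>")
            self._in_name = False

    def endDocument(self):
        self.html.append("</ul>")


# Abbreviated sample of the contact data.
CONTACTS = (
    b"<contacts>"
    b"<contact><name>Martin</name></contact>"
    b"<contact><name>Sharon</name></contact>"
    b"</contacts>"
)

handler = NamesToHtml()
xml.sax.parseString(CONTACTS, handler)
print("\n".join(handler.html))
```

The handler keeps only the text of the element it is currently inside, so memory use stays flat however many contacts the document holds.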




When to use DOM instead of SAX parsing

DOM loads the entire document and its structure into memory and lets you reference and work with the XML document structure inside your application. For example, in the contact example, you can read the entire contact database into memory, iterate over the contacts, and then iterate over the phone numbers within each contact.

Because DOM retains the structure, it is the better choice when understanding and processing the structure matters, and you can easily work on the structure as a whole or on its individual parts. In the contact example, inserting a new contact with SAX is very complicated. With DOM, you only need to insert a new XML element that represents the new contact into the existing XML document.
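
Inserting a contact with DOM might look like the following sketch, which uses Python's standard xml.dom.minidom module. The helper name add_contact and the sample data are assumptions made for illustration.

```python
from xml.dom import minidom

# Abbreviated sample of the contact data.
CONTACTS = (
    "<contacts>"
    '<contact><name>Martin</name><phone type="home">123 456 7890</phone></contact>'
    "</contacts>"
)

doc = minidom.parseString(CONTACTS)


def add_contact(document, name, phone, phone_type):
    """Append a new contact element to the document's root element."""
    contact = document.createElement("contact")

    name_el = document.createElement("name")
    name_el.appendChild(document.createTextNode(name))
    contact.appendChild(name_el)

    phone_el = document.createElement("phone")
    phone_el.setAttribute("type", phone_type)
    phone_el.appendChild(document.createTextNode(phone))
    contact.appendChild(phone_el)

    document.documentElement.appendChild(contact)
    return contact


add_contact(doc, "Sharon", "234 567 8901", "work")

# The same in-memory tree can now be queried without parsing again.
names = [n.firstChild.data for n in doc.getElementsByTagName("name")]
print(names)
print(doc.documentElement.toxml())
```

Because the whole tree is in memory, the new element can be added and the document queried or written back out in the same pass.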

The drawback of DOM is that it is cumbersome for stream-style processing, such as conversion to HTML, because you must walk through every element in the structure to process the document.

In addition, because DOM loads the entire XML file into memory during parsing, a DOM parser can be slow and memory-hungry for large documents. The DOM approach also has advantages, however. For example, you can process an XML document several times after parsing it with DOM only once. With SAX, you must parse the document again each time to achieve the same effect.

See the references to learn more about using DOM and SAX.




Use a good XML editor

If you write and work with XML regularly, a good XML editor is essential. Unlike a standard text editor, an XML editor understands the structure and layout of XML. Its features make working with XML easier, and include:

  • Element completion: start typing an element, and the editor fills in the rest for you.
  • Content completion: if the XML file uses a DTD, the editor can fill in and format part of the content for you. For example, in the contacts DTD, the type attribute of the phone element is required. A smart XML editor adds this attribute (with an empty value) to the text automatically when you create a phone tag.
  • Inline formatting: the editor lays out your XML so that it is easier to read and understand, either as you type or through a separate format command. The result is XML that is easier to follow and faster to scan.
  • Built-in validation: as you type, the editor checks the XML document for errors and highlights problems immediately, so that you know what needs fixing.
  • Built-in translation and transformation: some XML editors include XPath and XQuery interfaces and, in some cases, XSLT and other transformation interfaces, so you can view the results of a transformation inside the editing environment.
  • Structure learning: sometimes you create an XML structure before its DTD. In that case, the editor can read the XML file, learn its structure, and create a DTD for validation, which saves you a lot of time and effort.

Good XML editors include Eclipse and Oxygen XML Editor, but there are many other options.




Conclusion

Developing good XML habits makes a real difference, from using the basic facilities that XML provides, through validation, to processing and parsing documents correctly. This article has covered 10 good habits that can make your work with XML documents and data more efficient.

Source: http://www.ibm.com/developerworks/cn/xml/x-tengoodxmlhabits?S_tact=105agx52&s_cmp=tec-csdn
