Brief introduction
People value XML for its flexibility and interoperability, and a few techniques make it even easier to work with XML and with the tools that process it. Developing good habits when working with XML helps you get the most out of your XML documents and applications.
Use 10 good habits
Here are 10 of the best XML habits:
Defining XML and encoding
Using DTDs or XSD
Remember to validate
Validation does not always solve the problem
XML Structure and attributes
Using XPath to find information
It is not always necessary to extract information using the parser
When to use SAX rather than DOM parsing
When to use DOM rather than SAX parsing
Use a good XML editor
Defining XML and encoding
When you create an XML document quickly, you tend to build only a basic structure and skip some common requirements, including the XML declaration and the encoding of the data that the document contains.
Consider the XML document shown in Listing 1.
Listing 1. An XML document without an XML declaration or encoding
<phrases>
<phrase lang="en">Hello</phrase>
<phrase lang="it">Buongiorno</phrase>
<phrase lang="fr">Salut!</phrase>
</phrases>
A person can look at the document and recognize it as XML, but a computer has a harder time making that judgment. Adding an XML declaration at the top of the file makes the format explicit and easy to identify. This single line states that the document is XML and gives the XML version and the character encoding used by the data. For example:
<?xml version="1.0" encoding="us-ascii"?>
The encoding named in the declaration must also be correct. The XML parser relies on it to decode each character of the document properly. For example, if you add a Russian entry to the phrases in Listing 1, a problem occurs: the us-ascii encoding currently specified does not cover the extended character set needed to write the Russian phrase for Hello.
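The following is a minimal sketch of that problem, written in Python with the standard library's xml.etree.ElementTree module (the original article names no particular parser, so this choice is an assumption). The Russian entry "Привет", the lang="ru" value, and the helper names are hypothetical additions for illustration. The document bytes are stored as UTF-8 in both runs; only the declared encoding changes.

```python
import xml.etree.ElementTree as ET

# Hypothetical document body: part of Listing 1 plus an assumed Russian entry.
BODY = (
    '<phrases>'
    '<phrase lang="en">Hello</phrase>'
    '<phrase lang="ru">Привет</phrase>'
    '</phrases>'
)


def parse_with_declared_encoding(declared):
    """Prepend a declaration naming `declared`, store as UTF-8 bytes, and parse."""
    declaration = '<?xml version="1.0" encoding="%s"?>' % declared
    raw = (declaration + BODY).encode("utf-8")
    root = ET.fromstring(raw)
    return [phrase.text for phrase in root.findall("phrase")]


def try_parse(declared):
    """Return the parsed phrases, or a short error string if parsing fails."""
    try:
        return parse_with_declared_encoding(declared)
    except ET.ParseError as err:
        return "ParseError: %s" % err


# The declaration claims us-ascii, but the Russian bytes fall outside ASCII.
ascii_result = try_parse("us-ascii")
# The declaration matches the bytes actually stored.
utf8_result = try_parse("UTF-8")

print(ascii_result)
print(utf8_result)
```

With this parser, the us-ascii run should be rejected with a parse error, while the UTF-8 run returns both phrases. Other parsers may report the mismatch differently.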
Specifying the wrong encoding means that the parser cannot handle the document correctly. For example, reading a multi-byte extended character as a sequence of single-byte characters results in corrupted data and bad output.
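To make the corruption concrete, here is a small sketch in Python using the standard library's xml.etree.ElementTree module (an assumed parser, not one named by the original article). The Russian phrase and the iso-8859-1 declaration are hypothetical choices for illustration. The bytes are stored as UTF-8, where each Cyrillic letter takes two bytes, but the declaration tells the parser that every byte is one character.

```python
import xml.etree.ElementTree as ET

phrase = "Привет"  # assumed Russian entry; each letter is two bytes in UTF-8

# The bytes are UTF-8, but the declaration wrongly claims a single-byte encoding.
raw = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    '<phrase lang="ru">%s</phrase>' % phrase
).encode("utf-8")

# The parser trusts the declaration and decodes every byte as its own character.
misread = ET.fromstring(raw).text

# The same misreading, reproduced directly on the bytes for comparison.
expected_garbage = phrase.encode("utf-8").decode("iso-8859-1")

print(len(phrase), len(misread))
print(misread == phrase)
print(misread == expected_garbage)
```

No error is raised in this case. The parser quietly returns twice as many characters as the original phrase had, which is why a wrong declaration can be harder to notice than a missing one.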