When should you not use XML (1)

Source: Internet
Author: User

Today's computing world tends to use XML for every formal specification and data description. The author of this article, a staunch supporter of XML, raises a blasphemous question: "Is XML totalitarianism a good idea?" In this opinion piece, Terence Parr, co-founder of jGuru, demonstrates that XML makes a poor human-computer interface. He also poses questions that let you decide for yourself whether XML is even appropriate for your project's program-to-program interface.

Remember what life was like before the clipboard? (As my friend Gary Funck said, "If you're not old enough to remember ... good for you.") Each program stored data in its own way, and you could rarely move large amounts of data to another application, certainly not to another running program. In modern operating systems, the paste buffer holds data in a standard form, and each program is free to interpret the buffer data as it sees fit. For example, you can cut a piece of data from a database program and paste it meaningfully into a graphics program.

Similarly, XML is a standard way of sharing data between programs on the Internet and between machines. Without XML or a similar standard, two programs would not be able to share information: for data to be portable, the basic syntax for formatting it must be the same. Of course, you may not be able to interpret the data, but at least you can read it in. Take a look at technologies such as SOAP and Xbean to understand how XML facilitates interoperability (see Resources).

Now that we agree that XML is, or should be, the common language of program data interchange, I would like to turn to the question: when does it make sense to use XML? First, I'll remind you what XML looks like and how it differs from other data formats. With that in place, I propose a series of questions that can prove useful when you choose a data format for your project. Finally, I'll demonstrate my main point: XML makes a bad human-machine interface.

You said "Date-uh" and I said "Dat-uh."
XML is a way of marking up the structure of data so that computer programs can recognize it more easily. Of course, it's not the first data format. The old comma-separated values (CSV) format has been used for decades to describe rows of data. For example, here are three lines of integers that describe three date records:


8, 17, 1964
12, 30, 1975
9, 1, 1970
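
As a minimal sketch of how little code this format needs, the three records above can be read with Python's standard csv module. The month, day, year field order is an assumption, and the names `DateRecord` and `parse_dates` are illustrative only.

```python
import csv
import io
from typing import List, NamedTuple


class DateRecord(NamedTuple):
    month: int
    day: int
    year: int


def parse_dates(text: str) -> List[DateRecord]:
    """Read one month, day, year record per CSV line."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    # Position alone decides meaning: field 0 is the month,
    # field 1 the day, field 2 the year. Blank lines are skipped.
    return [DateRecord(*(int(field) for field in row)) for row in reader if row]


SAMPLE = "8, 17, 1964\n12, 30, 1975\n9, 1, 1970\n"
records = parse_dates(SAMPLE)
print(records[0].year)
```

The parser is short precisely because nothing in the data says what a field means; swapping two columns would silently produce wrong dates.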



CSV is hard to beat in terms of readability and simplicity (you probably had to write a program that reads CSV data in your first computer class). The problem is that CSV imposes a strict order on the data and cannot easily describe nested structures or elements of different types. Adding curly braces to represent nested, aggregate data, as C and VRML do, improves expressiveness while keeping the data readable. For example, to associate each row of data with an identifier, you might write:


{{8, 17, 1964}, instructor}
{{12, 30, 1975}, student}
{{9, 1, 1970}, student}



This format is still fairly easy to parse, but it continues to enforce a strict order on the data. The way around the ordering restriction is to tag every piece of data, such as:


{date={m=8, d=17, y=1964}, title=instructor}
{date={d=30, m=12, y=1975}, title=student}
{title=student, date={m=9, d=1, y=1970}}



Although the data is now position-independent (you can see I've reordered some elements), the redundant tags increase the cost of storage and make the data harder to parse.
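
To make the parsing cost concrete, here is a hedged sketch of a small recursive-descent reader for the tagged brace notation. The grammar is inferred from the three sample lines only, and `tokenize`, `parse_group`, and `parse_record` are hypothetical names; values are kept as strings and malformed input is not handled gracefully.

```python
import re

# A token is a brace, '=', ',' or a run of other non-space characters.
TOKEN = re.compile(r"\s*([{}=,]|[^\s{}=,]+)")


def tokenize(text):
    """Split the input into braces, '=', ',' and bare words."""
    return TOKEN.findall(text)


def parse_group(tokens, pos):
    """Parse '{' key=value (',' key=value)* '}' starting at tokens[pos]."""
    if tokens[pos] != "{":
        raise ValueError("expected '{'")
    pos += 1
    result = {}
    while tokens[pos] != "}":
        key = tokens[pos]
        if tokens[pos + 1] != "=":
            raise ValueError("expected '=' after " + key)
        pos += 2
        if tokens[pos] == "{":
            value, pos = parse_group(tokens, pos)  # nested aggregate
        else:
            value, pos = tokens[pos], pos + 1      # bare word or number
        result[key] = value
        if tokens[pos] == ",":
            pos += 1
    return result, pos + 1  # skip the closing '}'


def parse_record(line):
    """Parse one tagged record into a (possibly nested) dict."""
    record, _ = parse_group(tokenize(line), 0)
    return record


first = parse_record("{date={m=9, d=1, y=1970}, title=student}")
second = parse_record("{title=student, date={y=1970, m=9, d=1}}")
print(first == second)
```

Because fields are looked up by name, the two differently ordered lines yield the same dictionary, but the reader is already several times longer than a CSV split.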

Currently, most of us encode data in XML as follows:


<record>
<date><m>8</m> <d>17</d> <y>1964</y></date> <title>instructor</title>
</record>
<record>
<date><d>30</d> <m>12</m> <y>1975</y></date> <title>student</title>
</record>
<record>
<title>student</title> <date><m>9</m> <d>1</d> <y>1970</y></date>
</record>
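
For comparison, a minimal sketch of reading such records with Python's standard xml.etree.ElementTree module follows. The wrapping `<records>` element and the helper name `load_records` are assumptions added for illustration; the article's fragment has no single root element, which a well-formed XML document requires.

```python
import xml.etree.ElementTree as ET

XML_RECORDS = """
<record>
<date><m>8</m> <d>17</d> <y>1964</y></date> <title>instructor</title>
</record>
<record>
<title>student</title> <date><m>9</m> <d>1</d> <y>1970</y></date>
</record>
"""


def load_records(fragment):
    """Return one dict per <record>, looking fields up by tag name."""
    # Wrap the fragment in a hypothetical <records> root so that it
    # parses as a single well-formed document.
    root = ET.fromstring("<records>" + fragment + "</records>")
    rows = []
    for record in root.findall("record"):
        rows.append({
            "month": int(record.findtext("date/m")),
            "day": int(record.findtext("date/d")),
            "year": int(record.findtext("date/y")),
            "title": record.findtext("title").strip(),
        })
    return rows


rows = load_records(XML_RECORDS)
print(rows[1]["title"])
```

The element order inside each record does not matter here, since every field is found by its tag path rather than by position.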



My point is that there are countless description formats (languages), with varying degrees of readability, ease of implementation, efficiency, generality, expressiveness, and so on. Even so, we have quickly converged on the total dominance of the XML format.

XML complexity and inter-program communication
The growth of the Web and general familiarity with HTML led us to XML, which is really just another structured data format. Unfortunately, the more we tag data, the greater the cost in computer time and space. On the other hand, the data is in a standard form that any program with an XML parser can read. Different situations call for different trade-offs between simplicity and standardization. When you can't make up your mind, you might as well use XML. Another obvious rule: if your data is highly structured, use XML. For example, when we export jGuru FAQ content (see Resources) to our partners, we provide an XML data file. Conversely, if the data is not so complex, XML may not be the best choice. By asking a few simple questions, you can usually decide between XML and another data format for your program. If you answer yes to any of the four questions in this section, consider replacing standard XML with a simpler format.

Is the XML parser far larger than the rest of the program?
If the programming task and its data are fairly simple, why burden yourself with a large XML parser and the glue code needed to pull data out of the resulting tree? Keep in mind that your goal is to complete the task, not to fiddle with a DOM tree. The more components your program has, the more likely it is to fail. On the other hand, if you already have an XML parser in your program, you might as well use it for the sake of consistency.

Also recall that most programming languages have small, built-in parsers for non-XML data formats. For example, Java has a standard way to read a properties file, as well as a StreamTokenizer class that makes it easy to extract data from a simple data stream.
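
The same holds outside Java. As a rough Python analogue (an assumption of this sketch, not the Java API named above), simple key=value lines and space-separated tokens can be read with the standard library alone. `load_properties` is a hypothetical helper that ignores the escape and continuation rules of real Java properties files, and the settings shown are invented.

```python
import shlex


def load_properties(text):
    """Read simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


CONFIG = """
# hypothetical settings, for illustration only
log.file = login.log
max.users = 5000
"""

props = load_properties(CONFIG)

# shlex.split plays a role loosely similar to a stream tokenizer:
# it breaks a string into words and keeps quoted text together.
tokens = shlex.split('login 1290 "Paris, France"')
print(props["log.file"], tokens)
```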

Does the program run on a small machine or on a large data set?
If the data is not highly structured or nested and is very large relative to your machine, you should avoid XML because of the additional disk and memory it requires. Back in 1993, I worked at a supercomputing center where physicists routinely stored terabyte-sized files (1 trillion bytes per file!). Adding XML tags to that data would have made the files impossibly large. I dare say that holding the entire tree in memory would be a challenge even for today's parallel supercomputers. The extra processing time for parsing XML tags can also become prohibitive. To be sure, you rarely need to simulate high-energy lasers combined with fluid dynamics, but even ordinary files such as logs can grow large.
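
A back-of-the-envelope sketch of the storage overhead, using one date record from earlier in this article: the compact and tagged encodings below are illustrative choices, and real ratios depend on tag names and value lengths.

```python
# One date record, first as a compact CSV line, then as tagged XML.
csv_record = "8,17,1964\n"
xml_record = "<record><date><m>8</m><d>17</d><y>1964</y></date></record>\n"

csv_bytes = len(csv_record.encode("utf-8"))
xml_bytes = len(xml_record.encode("utf-8"))
ratio = xml_bytes / csv_bytes


def estimate_xml_size(raw_bytes):
    """Scale a raw data size by the per-record overhead measured above."""
    return raw_bytes * ratio


print(csv_bytes, xml_bytes, round(ratio, 1))
print(estimate_xml_size(10**12))  # a terabyte of such records, tagged
```

For short numeric fields like these, the tags outweigh the data several times over, which is why the overhead matters most for large, flat data sets.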

Does the XML data format prevent you from using line-based tools such as grep, sed, awk, and wc?
If you type ls instead of dir, you are probably familiar with line-based Unix tools such as grep and wc. I can say for sure that without grep I am a wreck, and without sed, awk, and wc I feel very helpless (believe me, that would be bad). Storing information as position-dependent data, one record per line, has a great advantage over tagged XML data: you can perform stunning transformations and operations on the data from the command line or with simple scripts. I don't need to set up a development environment and write a program that uses an XML parser just to inspect the data.

Consider the jGuru Web site, which generates a number of logs and transaction records. I decided to write one record per line without any tag information; the position of an item on the line determines what it is. For example, a site login event is written to a file in this form:

timestamp:user-id

Because the format is so simple, I can manipulate the data with all the Unix tools. If I want to know how many times user 1290 has logged in today, I can use the following command:


$ grep 1290 login.log | wc -l



This command filters the log file for any lines that contain 1290 and then counts the lines in the result. If I want a histogram of all the users who logged in today, I can use the following command, which prints the last colon-separated field of each line (the user id) and counts how often each value occurs:


$ awk -F: '{print $NF}' login.log | sort | uniq -c | sort -r -n



Not everyone is so familiar with Unix data tools, but choosing a simple data format certainly lets me manipulate the data without resorting to writing a program.
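
For readers less at home with grep and awk, here is a rough Python equivalent of the two pipelines, assuming each line has the form timestamp:user-id with the user id as the last colon-separated field. Unlike a plain grep for 1290, `count_logins` compares the user-id field exactly, so a line for user 12901 is not counted. The sample log lines and function names are invented for illustration.

```python
from collections import Counter

LOG_LINES = [
    "2001-04-06:1290",
    "2001-04-06:77",
    "2001-04-06:1290",
    "2001-04-06:12901",
]


def user_id(line):
    """Return the last colon-separated field of a log line."""
    return line.rstrip("\n").rsplit(":", 1)[-1]


def count_logins(lines, wanted):
    """Count lines whose user-id field equals `wanted` exactly."""
    return sum(1 for line in lines if user_id(line) == wanted)


def login_histogram(lines):
    """Return (user_id, count) pairs, most frequent first."""
    return Counter(user_id(line) for line in lines).most_common()


print(count_logins(LOG_LINES, "1290"))
print(login_histogram(LOG_LINES))
```

The one-line shell commands remain the quicker option; the sketch only spells out what they compute.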

Suppose instead that this data were stored as XML records like the following:


<login><timestamp>2001-04-06</timestamp><id>1290</id></login>



Then it would be hard to manipulate the data without writing a program.
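
To give a sense of what such a program involves, here is a minimal sketch that counts one user's logins in XML records of this shape, using Python's standard xml.etree.ElementTree module. The wrapping `<log>` element, the sample records, and the name `count_xml_logins` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XML_LOG = """\
<login><timestamp>2001-04-06</timestamp><id>1290</id></login>
<login><timestamp>2001-04-06</timestamp><id>77</id></login>
<login><timestamp>2001-04-06</timestamp><id>1290</id></login>
"""


def count_xml_logins(fragment, wanted):
    """Count <login> elements whose <id> text equals `wanted`."""
    # The log is a sequence of elements with no single root, so it is
    # wrapped in a hypothetical <log> element before parsing.
    root = ET.fromstring("<log>" + fragment + "</log>")
    return sum(1 for login in root.iter("login")
               if login.findtext("id") == wanted)


print(count_xml_logins(XML_LOG, "1290"))
```

Note that the whole log is parsed into a tree in memory before anything is counted, which ties back to the earlier concern about large data sets.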

Is your program "disposable", a truly one-off application?
You can never predict what will happen to programs or data in the future (just ask COBOL programmers from the '60s), but occasionally you do write a program for a task that will never occur again. For example, if a database schema changes as part of a software upgrade, the schema converter may never be used again, and that is fine (although some of its code may be borrowed later). When you dump information from a database in order to load it into a new schema, you should optimize for execution speed and ease of implementation rather than for consistency with XML.

Syntax is not semantics
Finally, on the subject of communication between programs, I would like to remind you that syntax is not the same as semantics. The semantics (meaning) of the data depend entirely on the application. The parser handles only syntax (format). Consider that all human languages use exactly the same data format: strings of characters (sentences) or streams of sound (speech). But if you've ever asked for directions in Paris without speaking French, you know that communicating takes far more than being able to talk. A useful tip for travellers: talk with your hands; it seems to help.


