Convert HTML files to XML

In his first article for Builder Australia, Java expert Michael Geisler shows readers how to use JTidy to convert HTML files into XML.

One of Java's greatest strengths is the breadth of its standard libraries, which include a good deal of support for XML. For some specific needs, however, the standard library offers no direct support.



When that happens, you have two choices:

Build something entirely by yourself. This is generally painful and time-consuming.
Go to the community and see whether someone has already had the same problem (which is very likely) and is happy to share the results with you.

For converting HTML to XML, SourceForge hosts a very useful small project called JTidy. The JTidy project page is at http://sourceforge.net/projects/jtidy/.

JTidy provides HTML syntax checking and HTML pretty printing, but it can also take an HTML file as input and convert it into XML. JTidy reads the input file, corrects any closing tags that are mismatched or missing, and writes out a well-formed XML document.
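To make "well-formed" concrete, the following sketch uses only the XML parser in the Java standard library (no JTidy) to compare a typical HTML fragment with a hand-corrected equivalent. The class name WellFormedCheck, the method isWellFormed, and both sample strings are made up for this illustration; the corrected string is written by hand and is not actual JTidy output.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of JTidy.
public class WellFormedCheck {

    // Returns true if the text parses as well-formed XML, false otherwise.
    public static boolean isWellFormed(String text) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the text, so it is not well-formed.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Common HTML: <br> and <p> are never closed.
        String html = "<html><body><p>First<br><p>Second</body></html>";
        // The same content with every tag closed.
        String xml = "<html><body><p>First<br/></p><p>Second</p></body></html>";

        System.out.println(isWellFormed(html)); // prints "false"
        System.out.println(isWellFormed(xml));  // prints "true"
    }
}
```

Browsers accept the first string without complaint, but an XML parser rejects it, which is the gap a conversion tool has to close.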

As the example code below shows, using JTidy is fairly straightforward: configure the Tidy instance to output XML, supply an input URL, an output file, and an error file, and then start the conversion.

import java.net.URL;
import java.io.*;
import org.w3c.tidy.Tidy;

public class TestHTML2XML {
    private String url;
    private String outFileName;
    private String errOutFileName;

    public TestHTML2XML(String url, String outFileName, String errOutFileName) {
        this.url = url;
        this.outFileName = outFileName;
        this.errOutFileName = errOutFileName;
    }

    public void convert() {
        URL u;
        BufferedInputStream in;
        FileOutputStream out;

        Tidy tidy = new Tidy();

        // Tell Tidy to convert HTML to XML
        tidy.setXmlOut(true);

        try {
            // Set file for error messages
            tidy.setErrout(new PrintWriter(new FileWriter(errOutFileName), true));
            u = new URL(url);

            // Create input and output streams
            in = new BufferedInputStream(u.openStream());
            out = new FileOutputStream(outFileName);

            // Convert files
            tidy.parse(in, out);

            // Clean up
            in.close();
            out.close();
        } catch (IOException e) {
            System.out.println(this.toString() + e.toString());
        }
    }

    public static void main(String[] args) {
        /*
         * Parameters are:
         * URL of HTML file
         * Filename of output file
         * Filename of error file
         */
        TestHTML2XML t = new TestHTML2XML(args[0], args[1], args[2]);
        t.convert();
    }
}
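Once the conversion has written its output file, the XML support in the standard library can read it. The sketch below is an illustration added here, not part of the original example: the class name LinkLister and the method listLinks are hypothetical, and it assumes the converted file uses lowercase a elements and contains no undeclared named entities (such as &nbsp;), which a plain XML parser would reject.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; uses only the standard library.
public class LinkLister {

    // Parses XML from the stream and returns the href value of every
    // <a> element that has one, in document order.
    public static List<String> listLinks(InputStream xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(xml);

            List<String> links = new ArrayList<>();
            NodeList anchors = doc.getElementsByTagName("a");
            for (int i = 0; i < anchors.getLength(); i++) {
                Element anchor = (Element) anchors.item(i);
                if (anchor.hasAttribute("href")) {
                    links.add(anchor.getAttribute("href"));
                }
            }
            return links;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not readable XML", e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // args[0]: path of the XML file produced by the conversion
            try (InputStream in = new FileInputStream(args[0])) {
                listLinks(in).forEach(System.out::println);
            }
        } else {
            // No file given: run on a small hand-written sample instead.
            String sample = "<html><body><p><a href=\"http://sourceforge.net/projects/jtidy/\">JTidy</a></p>"
                    + "<a name=\"top\">no href</a></body></html>";
            InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
            listLinks(in).forEach(System.out::println);
        }
    }
}
```

Run with the output file name from the conversion as the only argument. If parsing fails, the error file written during conversion is a reasonable first place to look.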




