Converting HTML documents into well-formed XML documents

Source: Internet
Author: User

There are several ways to parse HTML files and extract data on the .NET platform. The simplest and most reliable is to use a tool that tidies the HTML document into a well-formed XML document, and then work with the data through the XML DOM or XPath. SgmlReader is a class library for this kind of HTML tidying:
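To make the second half of that workflow concrete, here is a minimal sketch of querying an already well-formed document with XPath through the XML DOM. The sample markup and the names XPathDemo and ExtractLinks are made up for illustration; only System.Xml from the .NET base class library is used.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathDemo
{
    // Returns the href value of every <a> element in the given XML text.
    public static List<string> ExtractLinks(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        List<string> links = new List<string>();
        // "//a[@href]" selects all <a> elements that carry an href attribute.
        foreach (XmlNode node in doc.SelectNodes("//a[@href]"))
        {
            links.Add(node.Attributes["href"].Value);
        }
        return links;
    }

    public static void Main()
    {
        // Hypothetical sample input, already well-formed.
        string xml = "<html><body><p><a href=\"/one\">One</a></p>"
                   + "<a href=\"/two\">Two</a></body></html>";
        foreach (string link in ExtractLinks(xml))
        {
            Console.WriteLine(link);
        }
    }
}
```

Once the HTML has been tidied into XML, the same kind of XPath query should apply to real pages, provided the expression matches the element names the tidying step produces.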

Chris Lovett, an XML expert at Microsoft, developed an SGML parser called SgmlReader. It can parse HTML files and even convert them into well-formed XML. SgmlReader derives from XmlReader, which means you can parse an HTML file the same way you would parse an XML file with a class such as XmlTextReader.
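The practical consequence of deriving from XmlReader is that code written against XmlReader does not need to know which concrete reader it receives. The sketch below illustrates this with a helper that accepts any XmlReader. The names ReaderDemo and ListElementNames and the sample markup are hypothetical, and a standard XmlReader stands in here so the example runs without the SgmlReader assembly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ReaderDemo
{
    // Works with any XmlReader; an SgmlReader instance could be passed the same way.
    public static List<string> ListElementNames(XmlReader reader)
    {
        List<string> names = new List<string>();
        // Pull-style parsing: advance node by node and keep the element names.
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                names.Add(reader.Name);
            }
        }
        return names;
    }

    public static void Main()
    {
        // Hypothetical sample input.
        string xml = "<html><head><title>Demo</title></head>"
                   + "<body><p>Hi</p></body></html>";
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            Console.WriteLine(string.Join(", ", ListElementNames(reader)));
        }
    }
}
```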

Here is some sample code:

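    using System.IO;
    using System.Xml;
    using Sgml; // namespace of the SgmlReader library

    public static XmlDocument ConvertHtmlToXml(string html)
    {
        using (SgmlReader sgmlReader = new SgmlReader())
        {
            // Treat the input as HTML so the reader applies the HTML DTD.
            sgmlReader.DocType = "HTML";
            sgmlReader.InputStream = new StringReader(html);

            using (StringWriter stringWriter = new StringWriter())
            {
                using (XmlTextWriter xmlWriter = new XmlTextWriter(stringWriter))
                {
                    // Copy every node from the SGML reader to the XML writer.
                    while (!sgmlReader.EOF)
                    {
                        xmlWriter.WriteNode(sgmlReader, true);
                    }
                }

                // Load the tidied XML text into a DOM document while the
                // StringWriter is still in scope.
                XmlDocument xmlDoc = new XmlDocument();
                xmlDoc.LoadXml(stringWriter.ToString());
                return xmlDoc;
            }
        }
    }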
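The sample above goes through an intermediate string before building the DOM. Because SgmlReader is an XmlReader, it should also be possible to hand it directly to XmlDocument.Load, which accepts any XmlReader. The following is a sketch under that assumption, not a tested replacement. The names DirectLoadDemo and LoadHtml are hypothetical, and the Sgml namespace is assumed to be the one exposed by the SgmlReader assembly.

```csharp
using System.IO;
using System.Xml;
using Sgml; // assumption: namespace exposed by the SgmlReader assembly

public static class DirectLoadDemo
{
    // Builds a DOM straight from the SgmlReader, skipping the string round trip.
    public static XmlDocument LoadHtml(string html)
    {
        using (SgmlReader sgmlReader = new SgmlReader())
        {
            sgmlReader.DocType = "HTML";
            sgmlReader.InputStream = new StringReader(html);

            XmlDocument doc = new XmlDocument();
            // XmlDocument.Load(XmlReader) consumes nodes until the reader is exhausted.
            doc.Load(sgmlReader);
            return doc;
        }
    }
}
```

This variant avoids holding the serialized XML in memory, while the string-based version makes it easy to inspect or log the tidied markup before loading it.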

Home: http://code.msdn.microsoft.com/SgmlReader

Language: English; License: Open source

Related website:

http://msdn.microsoft.com/en-us/library/aa302299.aspx

Downloads: Download page (SourceForge), SgmlReader 1.8, MSDN Code Gallery
