Use XPath to parse HTML to get webpage content

Source: Internet
Author: User
Tags: xml, xpath

Sometimes the applications we develop need to capture the content of web pages for their own use, such as the weather information or news on the QQ website. Unlike the crawling done by search engines such as Google, the target page is known to the developer in advance. That gives us good reason to avoid the tedious work of analyzing the page with too many regular expressions. It would be nice to fetch the HTML of the target page and then parse it through a DOM. There are two problems here. First, DOM operations on HTML are usually performed on the client, in browser scripting languages such as JavaScript or VBScript. Second, HTML is not a strict format: pages are often not well-formed, so you cannot process them with XML tools such as XSL. However, since this blog post was written, there must be a solution :)

Many thanks to Chris Lovett, an XML expert at Microsoft, for the open-source SgmlReader project. XML is a subset of SGML, and HTML is an application of SGML. SgmlReader can therefore read HTML and turn it into well-formed markup (call it "well-formed HTML"; that is not an official name, but it will do for now), after which XPath syntax can be used to read data from the page. With the .NET Framework, the problem becomes easy.
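The same idea (turn loose HTML into a well-formed tree, then query it with XPath) can be sketched in a few lines. This is not SgmlReader: it is a rough stand-in that uses only the Python standard library, and the names HtmlTreeBuilder, html_to_tree, and SAMPLE_HTML, as well as the sample markup, are made up for illustration. Note that ElementTree supports only a subset of XPath, and that this simple builder does not repair every kind of broken markup (for example, an unclosed <li> ends up nested inside the previous one).

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HtmlTreeBuilder(HTMLParser):
    """Builds a well-formed ElementTree from loosely written HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")  # synthetic root element
        self.stack = [self.root]            # currently open elements

    def handle_starttag(self, tag, attrs):
        # Attribute values may be None (e.g. <input disabled>); store "" instead.
        elem = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID_TAGS:
            self.stack.append(elem)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: add the element but do not open it.
        ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})

    def handle_endtag(self, tag):
        # Close back to the nearest open element with this name;
        # stray end tags that match nothing are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # Text after a child element is stored in that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_tree(html):
    """Parse an HTML string and return the root of a well-formed tree."""
    builder = HtmlTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


# Hypothetical page fragment: unquoted attribute, unclosed <br>, missing </html>.
SAMPLE_HTML = """
<html><body>
<div class=weather>
  <span class="city">Beijing</span><br>
  <span class="temp">21</span>
</div>
</body>
"""

tree = html_to_tree(SAMPLE_HTML)
city = tree.findtext(".//div[@class='weather']/span[@class='city']")
temp = tree.findtext(".//div[@class='weather']/span[@class='temp']")
print(city, temp)
```

Once the markup is well-formed, reading a value is a single XPath lookup, which is the point of going through a converter rather than writing regular expressions against the raw HTML.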

I wrote a simple example program that captures weather information from the QQ website; you can get different content from the page by changing the city name and the XPath expression.
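The original program is not reproduced here, so the following is only a sketch of its general shape: a lookup driven by two inputs, a city name and an XPath expression. The PAGES dictionary, its contents, and the read_values function are hypothetical placeholders; a real program would download the page for the chosen city and convert it to well-formed markup before querying it.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for pages that were already downloaded and
# converted to well-formed markup, keyed by city name.
PAGES = {
    "beijing": (
        "<html><body><div id='weather'>"
        "<span class='temp'>21</span><span class='sky'>Sunny</span>"
        "</div></body></html>"
    ),
    "shanghai": (
        "<html><body><div id='weather'>"
        "<span class='temp'>24</span><span class='sky'>Cloudy</span>"
        "</div></body></html>"
    ),
}


def read_values(city, xpath):
    """Return the stripped text of every node that xpath selects on the city's page."""
    root = ET.fromstring(PAGES[city])
    return [(node.text or "").strip() for node in root.findall(xpath)]


# Changing the city picks a different page; changing the XPath picks different data.
temps = read_values("beijing", ".//span[@class='temp']")
skies = read_values("shanghai", ".//span[@class='sky']")
print(temps, skies)
```

Keeping the XPath expression as a parameter means that a change in the page layout only requires a new expression, not new parsing code.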



PS: Besides SgmlReader, Simon Mourier's Html Agility Pack for .NET offers similar functionality.
