Parse HTML data in Windows Phone 7


In my previous article, I introduced GB2312 decoding for Windows Phone 7:

Http://www.cnblogs.com/qingci/archive/2011/11/25/2263124.html

This article describes how to parse HTML data in Windows Phone 7 to extract the data you want.

First, I will introduce the HtmlAgilityPack class library (the same library was used for decoding in the previous article). The library's DLL is provided along with the demo.

I will use a Sina News article as the example.

 

First, let's look at the Sina News article in a browser:

Http://news.sina.com.cn/w/sd/2011-11-27/070023531646.shtml

Then let's look at its page source.

The news content turns out to be structured as follows:


Most of these elements also have an ID attribute, which makes them convenient to parse.

Now let's start parsing.

Step 1: Add a reference to HtmlAgilityPack.dll.

Step 2: Use the WebClient or WebRequest class to download the HTML page and read it into a string.


 // Requires: using System; using System.IO; using System.Net; using System.Threading; using System.Windows;
 public delegate void CallbackEvent(object sender, DownloadEventArgs e);
 public event CallbackEvent DownloadCallbackEvent;

 public void HttpWebRequestDownloadGet(string url)
 {
     // Start the request from a background thread
     Thread _thread = new Thread(delegate()
     {
         Uri _uri = new Uri(url, UriKind.RelativeOrAbsolute);
         HttpWebRequest _httpWebRequest = (HttpWebRequest)WebRequest.Create(_uri);
         _httpWebRequest.Method = "GET"; // HTTP method names are upper case

         _httpWebRequest.BeginGetResponse(new AsyncCallback(delegate(IAsyncResult result)
         {
             // EndGetResponse throws a WebException if the request fails; wrap it in try/catch in real code
             HttpWebRequest _httpWebRequestCallback = (HttpWebRequest)result.AsyncState;
             HttpWebResponse _httpWebResponseCallback = (HttpWebResponse)_httpWebRequestCallback.EndGetResponse(result);
             Stream _streamCallback = _httpWebResponseCallback.GetResponseStream();

             // Decode the GB2312 response body into a string
             StreamReader _streamReader = new StreamReader(_streamCallback, new HtmlAgilityPack.Gb2312Encoding());
             string _stringCallback = _streamReader.ReadToEnd();

             // The callback runs off the UI thread; marshal back to it before raising the event
             Deployment.Current.Dispatcher.BeginInvoke(new Action(() =>
             {
                 if (DownloadCallbackEvent != null)
                 {
                     DownloadEventArgs _downloadEventArgs = new DownloadEventArgs();
                     _downloadEventArgs._DownloadStream = _streamCallback; // already read to the end
                     _downloadEventArgs._DownloadString = _stringCallback;
                     DownloadCallbackEvent(this, _downloadEventArgs);
                 }
             }));
         }), _httpWebRequest);
     });
     _thread.Start();
 }
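The download code above relies on a `DownloadEventArgs` class that the article does not show. The following is a hypothetical sketch inferred only from how its two fields are used here; the class in the demo project may well differ.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical sketch: field names copied from the download code,
// everything else is an assumption about the demo's real class.
public class DownloadEventArgs : EventArgs
{
    // The response stream; the downloader has already read it to the end
    public Stream _DownloadStream;

    // The decoded HTML text of the page
    public string _DownloadString;
}

public static class DownloadEventArgsDemo
{
    public static void Main()
    {
        // Fill the event args the same way the download callback does
        const string html = "<html></html>";
        DownloadEventArgs args = new DownloadEventArgs();
        args._DownloadStream = new MemoryStream(Encoding.UTF8.GetBytes(html));
        args._DownloadString = html;

        Console.WriteLine(args._DownloadString);
    }
}
```

Because the stream has already been consumed, handlers will normally only use `_DownloadString`.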






O(∩_∩)O My version above is a bit complicated; in short, all it does is download the HTML data.

Here is a simpler download method:


 WebClient webClient = new WebClient();

 webClient.Encoding = new HtmlAgilityPack.Gb2312Encoding(); // Add this line to set the encoding

 // Subscribe to the completion event before starting the download
 webClient.DownloadStringCompleted += new DownloadStringCompletedEventHandler(webClient_DownloadStringCompleted);

 webClient.DownloadStringAsync(new Uri("http://news.sina.com.cn/s/2011-11-25/120923524756.shtml", UriKind.RelativeOrAbsolute));







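Now process the downloaded HTML in the callback. With WebClient the page text is in e.Result; with the HttpWebRequest version above it is in e._DownloadString: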



 // Requires: using System; using HtmlAgilityPack;
 // Note: GetElementbyId returns null when no element has the given id,
 // so add null checks if the page layout might change.
 string _result = e._DownloadString; // with WebClient, use e.Result instead

 HtmlDocument _doc = new HtmlDocument(); // Instantiate an HtmlAgilityPack.HtmlDocument object
 _doc.LoadHtml(_result); // Load the HTML

 HtmlNode _htmlNode01 = _doc.GetElementbyId("artibodyTitle"); // Div holding the news title
 string _title = _htmlNode01.InnerText;

 HtmlNode _htmlNode02 = _doc.GetElementbyId("artibody"); // Div holding the article body
 string _content = _htmlNode02.InnerText;

 // Drop the trailing text that starts with ".blkComment";
 // IndexOf returns -1 if it is absent, and Substring(0, -1) would throw
 int _divIndex = _content.IndexOf(".blkComment");
 if (_divIndex >= 0)
 {
     _content = _content.Substring(0, _divIndex);
 }

 #region Sina tags
 HtmlNode _htmlNode03 = _doc.GetElementbyId("art_source");
 string _www = _htmlNode03.FirstChild.InnerText;
 string _wwwInt = _htmlNode03.FirstChild.Attributes[0].Value;
 #endregion

 #region Release time
 HtmlNode _htmlNode04 = _doc.GetElementbyId("pub_date");
 string _pub_date = _htmlNode04.InnerText;
 #endregion

 #region Source site information
 HtmlNode _htmlNode05 = _doc.GetElementbyId("media_name");
 string _media_name = _htmlNode05.FirstChild.InnerText;
 string _media_source = _htmlNode05.FirstChild.Attributes[0].Value; // first attribute of the link, used as its URL
 #endregion

 Media_nameHyperlinkButton.Content = _pub_date + " " + _media_name;
 Media_nameHyperlinkButton.NavigateUri = new Uri(_media_source, UriKind.RelativeOrAbsolute);
 TitleTextBlock.Text = _title;
 ContentTextBlock.Text = _content;







The result looks like this:

However, most tags on web pages have no ID attribute. Fortunately, HtmlAgilityPack supports XPath, so you can locate the matching nodes with an XPath expression instead.

XPath tutorial: http://www.w3school.com.cn/xpath/index.asp
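To illustrate the XPath syntax, here is a minimal sketch that uses the .NET `System.Xml.XPath` extension methods on a hypothetical, simplified and well-formed page fragment, so it runs without HtmlAgilityPack. The class names `blkContainer` and `body` are made up for the example. Real pages are rarely well-formed XML, which is why HtmlAgilityPack is used in practice; its `HtmlNode.SelectSingleNode` and `SelectNodes` methods accept the same kind of expression.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathDemo
{
    // Hypothetical, simplified markup: no id attributes, only tags and classes
    public const string SampleMarkup =
        "<html><body>" +
        "<div class='blkContainer'><h1>Sample title</h1>" +
        "<div class='body'><p>First paragraph.</p><p>Second paragraph.</p></div>" +
        "</div></body></html>";

    // Selects the <h1> directly under the div whose class is 'blkContainer'
    public static string SelectTitle(string markup)
    {
        XDocument doc = XDocument.Parse(markup);
        XElement title = doc.XPathSelectElement("//div[@class='blkContainer']/h1");
        return title == null ? string.Empty : title.Value;
    }

    // Selects the text of every <p> inside the div whose class is 'body'
    public static List<string> SelectParagraphs(string markup)
    {
        XDocument doc = XDocument.Parse(markup);
        return doc.XPathSelectElements("//div[@class='body']/p")
                  .Select(p => p.Value)
                  .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(SelectTitle(SampleMarkup));
        foreach (string text in SelectParagraphs(SampleMarkup))
        {
            Console.WriteLine(text);
        }
    }
}
```

The `[@class='...']` predicate is what replaces the id lookup: it picks an element by tag name and attribute value, and the `/` steps then walk down to the child you want.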

 

Sample download:

http://115.com/file/dn87dl2d#
MyFramework_Test.zip
