Parsing HTML data with the open-source HTMLParser code: HTMLNode.m, HTMLNode.h, HTMLParser.m, HTMLParser.h
The source code can be found at: https://github.com/
There are three steps to complete before you can parse your data:
1: Add the libxml2 library to your project.
2: Add /usr/include/libxml2 to the Header Search Paths.
3: Add the open-source files to the project, and import the header file where you use it.
After that we can start parsing HTML data.
First, we download some HTML data. (This is only an example, so a simple synchronous download is used; in your own application, use an asynchronous download.)
NSURL *url = [NSURL URLWithString:@"http://vip.astro.sina.com.cn/iframe/astro/view/cancer/day/"];
NSString *htmlstr = [[NSString alloc] initWithContentsOfURL:url encoding:NSUTF8StringEncoding error:nil];
NSError *error = nil;
// Parse the HTML document: create an HTMLParser object from the downloaded string
HTMLParser *parser = [[HTMLParser alloc] initWithString:htmlstr error:&error];
if (error) {
    NSLog(@"%@", error);
    return;
}
This puts the HTML data into an HTMLParser object.
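For the asynchronous download recommended above, a minimal sketch could look like the following. It assumes ARC and a UTF-8 page; FetchAndParse is a hypothetical helper name introduced here for illustration, not part of HTMLParser. The download uses Foundation's NSURLSession, and the parsing step is the same initWithString:error: call as in the synchronous version.

```objective-c
#import <Foundation/Foundation.h>
#import "HTMLParser.h"

// Hypothetical helper (not part of HTMLParser): downloads the page without
// blocking the calling thread, then hands a parser to the completion block.
// The completion block runs on NSURLSession's background queue, so switch to
// the main queue before touching any UI.
static void FetchAndParse(NSURL *url,
                          void (^completion)(HTMLParser *parser, NSError *error)) {
    NSURLSessionDataTask *task =
        [[NSURLSession sharedSession] dataTaskWithURL:url
                                    completionHandler:^(NSData *data,
                                                        NSURLResponse *response,
                                                        NSError *downloadError) {
            if (downloadError != nil || data == nil) {
                completion(nil, downloadError);
                return;
            }
            // Assumption: the page is UTF-8, as in the synchronous example.
            NSString *html = [[NSString alloc] initWithData:data
                                                   encoding:NSUTF8StringEncoding];
            if (html == nil) {
                NSError *encodingError =
                    [NSError errorWithDomain:NSCocoaErrorDomain
                                        code:NSFileReadInapplicableStringEncodingError
                                    userInfo:nil];
                completion(nil, encodingError);
                return;
            }
            NSError *parseError = nil;
            HTMLParser *parser = [[HTMLParser alloc] initWithString:html
                                                              error:&parseError];
            completion(parseError == nil ? parser : nil, parseError);
        }];
    [task resume];
}
```

A caller would pass the same URL as above and read [parser body] inside the completion block, logging the error when the parser is nil.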
The content of this HTML data is shown below (the parts that are of no use to us have been removed):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<meta http-equiv="Content-type" content="text/html; charset=utf-8"/>
<title>Cancer _ Daily Horoscope _ Constellation Channel _ Sina</title>
<link href="http://vip.astro.sina.com.cn/app/astro/css/mindcity_utf8.css" rel="stylesheet" type="text/css"/>
<div id="West"><div id="Middle2"><div class="IoT">
<div class="Left" id="weiboimg"><img src="http://image2.sina.com.cn/ast/2007index/tmp/star_php/cancer_b.gif" border="0" align="left"/><span> Cancer <em> ./ A- -/ A</em></span></div>
<cite>author Yu-Love Sina exclusive writer</cite></div>
<div class="Clear"></div>
<ul class="Daysnav">
<li class="Buton"><a href='/astro/view/cancer/day/20140816'> Today's horoscope </a></li>
<li class="Butof"><a href='/astro/view/cancer/day/20140817'> Tomorrow's horoscope </a></li>
<li class="Datea"> Valid date: the- ,- -</li>
</ul>
<div class="lotconts"> A gorgeous appearance also adds sexy charm to your inner self; combined with intelligence, the effect is multiplied: not just beautiful, but beautiful and smart.
Today, women in particular can expect good feedback in areas that call for sensibility, such as handicrafts and hobbies that need skill and care.
It is a period when hobbies may produce good work. </div></div><!--Horoscope Content End--><div class="Clear"></div></div></div></body>
Now we can use the methods in HTMLParser to analyze the data step by step.
// Get the body part of the HTML
HTMLNode *node = [parser body];
// Starting from node, find the summary text.
// findChildOfClass: finds a node whose class attribute is "lotconts"
HTMLNode *sum = [node findChildOfClass:@"lotconts"];
// Starting from node, find the valid date
HTMLNode *effectdate = [node findChildOfClass:@"Datea"];
// Starting from node, find the constellation name.
// findChildTag: finds a node whose tag is "span"
HTMLNode *name = [node findChildTag:@"span"];
// Starting from name, find the constellation time period
HTMLNode *time = [name findChildTag:@"em"];
// Get the link to the constellation picture
HTMLNode *image = [node findChildTag:@"img"];
NSString *pic = [image getAttributeNamed:@"src"];
NSURL *url1 = [NSURL URLWithString:pic];
Get the content of a node:
[name contents]; // returns a string; here the content is: Cancer
Similarly:
[sum contents]; // the content is: A gorgeous appearance also adds sexy charm to your inner self; combined with intelligence, the effect is multiplied: not just beautiful, but beautiful and smart.
// Today, women in particular can expect good feedback in areas that call for sensibility, such as handicrafts and hobbies that need skill and care.
// It is a period when hobbies may produce good work.
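When more than one element matches, such as the two horoscope links in the sample page, HTMLNode's findChildTags: returns every match as an array. The sketch below combines it with getAttributeNamed: and contents; LinkDescriptions is a hypothetical helper name used only for illustration, and ARC is assumed.

```objective-c
#import <Foundation/Foundation.h>
#import "HTMLParser.h"

// Hypothetical helper (not part of HTMLParser): returns one "text -> href"
// string for every <a> element found below root.
static NSArray *LinkDescriptions(HTMLNode *root) {
    NSMutableArray *result = [NSMutableArray array];
    for (HTMLNode *link in [root findChildTags:@"a"]) {
        NSString *href = [link getAttributeNamed:@"href"];
        if (href == nil) {
            continue; // skip anchors that carry no href attribute
        }
        NSString *text = [link contents];
        if (text == nil) {
            text = @""; // contents is nil when the element has no leading text
        }
        [result addObject:[NSString stringWithFormat:@"%@ -> %@", text, href]];
    }
    return result;
}
```

Calling LinkDescriptions([parser body]) on the sample page should yield one entry per link in the navigation list.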
HTML data is parsed by tag and by attribute: we use the methods provided with HTMLParser to find the child nodes we need.
Remember that what we find in the end are child nodes; to get their content we still have to call the -(NSString *)contents method.
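The lookups above can be gathered into one place, as in the following sketch. It relies on the fact that messaging nil in Objective-C returns nil, so a missing node simply produces an empty string instead of a crash. CleanText and HoroscopeFields are hypothetical helper names introduced here, the class names are the ones from the sample page, and ARC is assumed.

```objective-c
#import <Foundation/Foundation.h>
#import "HTMLParser.h"

// Hypothetical helper: trims surrounding whitespace and maps nil to @"",
// so the result is always safe to store in a dictionary literal.
static NSString *CleanText(NSString *text) {
    if (text == nil) {
        return @"";
    }
    return [text stringByTrimmingCharactersInSet:
                     [NSCharacterSet whitespaceAndNewlineCharacterSet]];
}

// Hypothetical helper: collects the fields used in this article.
// Any node that is not found ends up as an empty string.
static NSDictionary *HoroscopeFields(HTMLNode *body) {
    HTMLNode *name = [body findChildTag:@"span"];
    HTMLNode *image = [body findChildTag:@"img"];
    return @{
        @"name":    CleanText([name contents]),
        @"period":  CleanText([[name findChildTag:@"em"] contents]),
        @"date":    CleanText([[body findChildOfClass:@"Datea"] contents]),
        @"summary": CleanText([[body findChildOfClass:@"lotconts"] contents]),
        @"image":   CleanText([image getAttributeNamed:@"src"]),
    };
}
```

Passing [parser body] to HoroscopeFields gives a dictionary whose values can be shown directly in labels, with the image value converted to an NSURL as before.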