Objective-C code can parse HTML or XML with either of the parsers the system provides: libxml2 or NSXMLParser. However, both require a lot of code to process the captured content, and neither is very intuitive.
A better option is hpple, a lightweight wrapper library that solves this problem well. It uses XPath to locate and parse nodes in HTML or XML.
Installation steps:
- Add the libxml2 headers to your project:
  Menu: Project -> Edit Project Settings
  Search for "Header Search Paths"
  Add a new search path "${SDKROOT}/usr/include/libxml2"
  Enable the recursive option
- Add the libxml2 library to your project:
  Menu: Project -> Edit Project Settings
  Search for "Other Linker Flags"
  Add a new linker flag "-lxml2"
- Add the following hpple source files to your project:
  TFHpple.h
  TFHpple.m
  TFHppleElement.h
  TFHppleElement.m
  XPathQuery.h
  XPathQuery.m
- To learn XPath, see the tutorial at http://www.w3schools.com/XPath/default.asp
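If the project keeps its build settings in an .xcconfig file rather than editing them in the Xcode UI, the two libxml2 settings from the steps above could be sketched as follows. This is only an illustrative fragment: it assumes the project already uses an xcconfig file, and it shows the two settings named above and nothing else.

```xcconfig
// Let the compiler find the libxml2 headers shipped with the SDK.
HEADER_SEARCH_PATHS = $(inherited) $(SDKROOT)/usr/include/libxml2

// Link against libxml2.
OTHER_LDFLAGS = $(inherited) -lxml2
```

Keeping $(inherited) in both values preserves any paths or flags set at the project level.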
Sample Code:
    #import "TFHpple.h"

    NSData *data = [[NSData alloc] initWithContentsOfFile:@"example.html"];

    // Create the parser
    TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:data];

    // Get all the cells of the 2nd row of the 3rd table
    NSArray *elements = [xpathParser search:@"//table[3]/tr[2]/td"];

    // Access the first cell
    TFHppleElement *element = [elements objectAtIndex:0];

    // Get the text within the cell tag
    NSString *content = [element content];

    // Manual reference counting: release what was alloc'ed
    [xpathParser release];
    [data release];
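The sample reads only the first cell, and objectAtIndex: 0 raises an exception if the query matches nothing. A slightly more defensive variant is sketched below. The helper name CellTextsInFile is hypothetical, not part of hpple. The sketch also assumes a version of hpple that provides the autoreleased convenience constructor hppleWithHTMLData:, which lets it avoid explicit release calls.

```objective-c
#import <Foundation/Foundation.h>
#import "TFHpple.h"
#import "TFHppleElement.h"

// Hypothetical helper (not part of hpple): collects the text content of
// every element matched by an XPath query in an HTML file.
static NSArray *CellTextsInFile(NSString *path, NSString *xpath) {
    NSData *data = [NSData dataWithContentsOfFile:path];
    if (data == nil) {
        return [NSArray array];  // file missing or unreadable
    }

    // Assumes hpple's convenience constructor is available in your copy.
    TFHpple *parser = [TFHpple hppleWithHTMLData:data];
    NSArray *elements = [parser search:xpath];

    NSMutableArray *texts = [NSMutableArray array];
    for (TFHppleElement *element in elements) {
        NSString *content = [element content];
        if (content != nil) {    // skip elements with no text content
            [texts addObject:content];
        }
    }
    return texts;
}
```

A call such as CellTextsInFile(@"example.html", @"//table[3]/tr[2]/td") would return the text of each cell in that row, or an empty array when the file cannot be read or nothing matches.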
A similar library is also available for reference:
ElementParser http://github.com/Objective3/ElementParser