1. Third-party library: the TFHpple package.
3. Use TFHpple.
1) Import the header: #import "TFHpple.h"
2) Fetch the page data from the URL
NSString *urlString = @"http://www.weiphone.com/apple/news/index_1.shtml";
// initWithContentsOfURL: blocks the calling thread, so avoid calling it on the main thread.
NSData *htmlData = [[NSData alloc] initWithContentsOfURL:[NSURL URLWithString:urlString]];
If the page is not UTF-8 encoded, the parser may not read it correctly, so convert the data to UTF-8 first. Check the charset declared in the <meta> tag of the page source to find out which encoding to convert from.
// NSData *toHtmlData = [self toUTF8:htmlData];
- (NSData *)toUTF8:(NSData *)sourceData {
    // Decode the raw bytes as GB 18030; the result is NULL if the bytes are not valid in that encoding.
    CFStringRef gbkStr = CFStringCreateWithBytes(NULL, [sourceData bytes], [sourceData length], kCFStringEncodingGB_18030_2000, false);
    if (gbkStr == NULL) {
        return nil;
    }
    // CFBridgingRelease transfers ownership of the CFString, so it is not leaked.
    NSString *gbkString = CFBridgingRelease(gbkStr);
    // Rewrite the meta tag so the parser treats the re-encoded page as UTF-8.
    NSString *utf8_string = [gbkString stringByReplacingOccurrencesOfString:@"META http-equiv=\"X-UA-Compatible\" content=\"IE=EmulateIE7\""
                                                                 withString:@"META http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\""];
    return [utf8_string dataUsingEncoding:NSUTF8StringEncoding];
}
3) Create a TFHpple object from the data
TFHpple *xpathparser = [[TFHpple alloc] initWithHTMLData:htmlData];
4) Write an XPath query for the nodes you want; the search returns an array of the matching elements
NSArray *array1 = [xpathparser searchWithXPathQuery:@"//div[@id='news']//div//div[2]//h3//a[1]"];
// firstObject returns nil instead of throwing when nothing matches.
NSLog(@"%@", [array1 firstObject]);
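The array returned by searchWithXPathQuery: holds TFHppleElement objects, so logging the first entry only prints its description. The sketch below shows one way to pull the link text and href out of each match. The function name LinksMatchingQuery is hypothetical, and the code assumes a TFHpple version whose TFHppleElement provides the text and objectForKey: accessors.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: returns one dictionary per matched element,
// holding its text and its href attribute.
static NSArray *LinksMatchingQuery(NSData *htmlData, NSString *xpathQuery) {
    TFHpple *parser = [[TFHpple alloc] initWithHTMLData:htmlData];
    NSArray *elements = [parser searchWithXPathQuery:xpathQuery];

    NSMutableArray *links = [NSMutableArray array];
    for (TFHppleElement *element in elements) {
        // Either accessor can return nil, so fall back to an empty string.
        NSString *title = [element text] ?: @"";
        NSString *href = [element objectForKey:@"href"] ?: @"";
        [links addObject:@{ @"title": title, @"href": href }];
    }
    return links;
}
```

Calling LinksMatchingQuery(htmlData, @"//div[@id='news']//div//div[2]//h3//a[1]") with the data fetched in step 2 would give an array that is empty when the query matches nothing, which avoids indexing into an empty result.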
4. Learn the XPath syntax.
XPath uses path expressions to select nodes or node sets in an XML document. Nodes are selected by following a path or a sequence of steps.
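Sample XML document
We will use this XML document in the examples below.
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>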
Selecting nodes
XPath uses path expressions to select nodes in an XML document. A node is selected by following a path or a sequence of steps.
The most useful path expressions are listed below:
Expression | Description
nodename | Selects the child elements of the current node that have this name.
/ | Selects from the root node.
// | Selects matching nodes anywhere below the current node, regardless of their position in the document.
. | Selects the current node.
.. | Selects the parent of the current node.
@ | Selects attributes.
Examples
The following table lists some path expressions and their results:
Path expression | Result
bookstore | Selects the bookstore elements that are children of the current node.
/bookstore | Selects the root element bookstore. Note: a path that starts with a forward slash (/) is always an absolute path to an element.
bookstore/book | Selects all book elements that are children of bookstore.
//book | Selects all book elements, regardless of their location in the document.
bookstore//book | Selects all book elements that are descendants of the bookstore element, regardless of where they sit below it.
//@lang | Selects all attributes named lang.
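These path expressions can be tried directly with TFHpple by parsing XML instead of HTML. The sketch below is only an illustration: the helper TagNamesForQuery and the constant kPathSampleXML are hypothetical names, the book titles in the embedded document are assumed, and only absolute paths are used because the result of a relative path depends on the context node the parser starts from.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Two-book sample with the prices 29.99 and 39.95; the titles are assumed.
static NSString * const kPathSampleXML =
    @"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
    @"<bookstore>"
    @"<book><title lang=\"eng\">Harry Potter</title><price>29.99</price></book>"
    @"<book><title lang=\"eng\">Learning XML</title><price>39.95</price></book>"
    @"</bookstore>";

// Hypothetical helper: tag name of every element the query selects.
static NSArray *TagNamesForQuery(NSString *query) {
    NSData *xmlData = [kPathSampleXML dataUsingEncoding:NSUTF8StringEncoding];
    TFHpple *parser = [[TFHpple alloc] initWithXMLData:xmlData];

    NSMutableArray *names = [NSMutableArray array];
    for (TFHppleElement *element in [parser searchWithXPathQuery:query]) {
        [names addObject:element.tagName ?: @""];
    }
    return names;
}
```

With this document, TagNamesForQuery(@"/bookstore") should select the single root element, while @"/bookstore/book", @"//book" and @"/bookstore//title" should each select two elements.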
Predicates
A predicate is used to find a specific node or a node that contains a specific value.
Predicates are written inside square brackets.
Examples
The following table lists some path expressions with predicates and their results:
Path expression | Result
/bookstore/book[1] | Selects the first book element that is a child of bookstore.
/bookstore/book[last()] | Selects the last book element that is a child of bookstore.
/bookstore/book[last()-1] | Selects the second-to-last book element that is a child of bookstore.
/bookstore/book[position()<3] | Selects the first two book elements that are children of bookstore.
//title[@lang] | Selects all title elements that have a lang attribute.
//title[@lang='eng'] | Selects all title elements whose lang attribute has the value eng.
/bookstore/book[price>35.00] | Selects all book elements of bookstore whose price element has a value greater than 35.00.
/bookstore/book[price>35.00]/title | Selects the title elements of all book elements of bookstore whose price element has a value greater than 35.00.
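The predicates in the table can be checked the same way. The following sketch assumes a TFHpple version in which TFHppleElement has a text accessor. TextsForQuery and kPredicateSampleXML are hypothetical names, and the book titles in the embedded document are assumed.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Two-book sample with the prices 29.99 and 39.95; the titles are assumed.
static NSString * const kPredicateSampleXML =
    @"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
    @"<bookstore>"
    @"<book><title lang=\"eng\">Harry Potter</title><price>29.99</price></book>"
    @"<book><title lang=\"eng\">Learning XML</title><price>39.95</price></book>"
    @"</bookstore>";

// Hypothetical helper: text of every element the query selects
// (an empty string when an element has no text child).
static NSArray *TextsForQuery(NSString *query) {
    NSData *xmlData = [kPredicateSampleXML dataUsingEncoding:NSUTF8StringEncoding];
    TFHpple *parser = [[TFHpple alloc] initWithXMLData:xmlData];

    NSMutableArray *texts = [NSMutableArray array];
    for (TFHppleElement *element in [parser searchWithXPathQuery:query]) {
        [texts addObject:[element text] ?: @""];
    }
    return texts;
}
```

For example, TextsForQuery(@"/bookstore/book[price>35.00]/title") should return a single entry, the title of the book priced at 39.95, and @"/bookstore/book[position()<3]/title" should return both titles. Because this sample has only two books, last()-1 selects the first one.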
Selecting unknown nodes
XPath wildcards can be used to select XML elements whose names are not known in advance.
Wildcard | Description
* | Matches any element node.
@* | Matches any attribute node.
node() | Matches any node of any kind.
Examples
The following table lists some path expressions and their results:
Path expression | Result
/bookstore/* | Selects all child elements of the bookstore element.
//* | Selects all elements in the document.
//title[@*] | Selects all title elements that have at least one attribute.
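A quick way to see what a wildcard matches is to count the selected nodes. This sketch again runs the queries through TFHpple against a two-book bookstore document. CountForQuery and kWildcardSampleXML are hypothetical names, and the book titles are assumed.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Two-book sample with the prices 29.99 and 39.95; the titles are assumed.
static NSString * const kWildcardSampleXML =
    @"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
    @"<bookstore>"
    @"<book><title lang=\"eng\">Harry Potter</title><price>29.99</price></book>"
    @"<book><title lang=\"eng\">Learning XML</title><price>39.95</price></book>"
    @"</bookstore>";

// Hypothetical helper: number of elements the query selects.
static NSUInteger CountForQuery(NSString *query) {
    NSData *xmlData = [kWildcardSampleXML dataUsingEncoding:NSUTF8StringEncoding];
    TFHpple *parser = [[TFHpple alloc] initWithXMLData:xmlData];
    return [[parser searchWithXPathQuery:query] count];
}
```

In this document /bookstore/* should match the two book elements, //* should match all seven elements (one bookstore, two book, two title, two price), and //title[@*] should match the two title elements, since both carry a lang attribute.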
Selecting several paths
You can select several paths at once by combining path expressions with the "|" operator.
Examples
The following table lists some path expressions and their results:
Path expression | Result
//book/title | //book/price | Selects all title and price elements of all book elements.
//title | //price | Selects all title and price elements in the document.
/bookstore/book/title | //price | Selects all title elements of the book elements of bookstore, plus all price elements in the document.
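The union operator can be tried with TFHpple as well. The sketch below tallies the tag names in a union result, so it does not rely on the order in which the nodes are returned. TagCountsForQuery and kUnionSampleXML are hypothetical names, and the book titles in the embedded document are assumed.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Two-book sample with the prices 29.99 and 39.95; the titles are assumed.
static NSString * const kUnionSampleXML =
    @"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
    @"<bookstore>"
    @"<book><title lang=\"eng\">Harry Potter</title><price>29.99</price></book>"
    @"<book><title lang=\"eng\">Learning XML</title><price>39.95</price></book>"
    @"</bookstore>";

// Hypothetical helper: how many selected elements carry each tag name.
static NSCountedSet *TagCountsForQuery(NSString *query) {
    NSData *xmlData = [kUnionSampleXML dataUsingEncoding:NSUTF8StringEncoding];
    TFHpple *parser = [[TFHpple alloc] initWithXMLData:xmlData];

    NSCountedSet *counts = [NSCountedSet set];
    for (TFHppleElement *element in [parser searchWithXPathQuery:query]) {
        [counts addObject:element.tagName ?: @""];
    }
    return counts;
}
```

For @"//book/title | //book/price" the counted set should hold two title and two price entries. A node matched by both sides of a union appears only once, because the result is a node set.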