[Mobile platform project learning and analysis] Using the TFHpple package for HTML parsing


1. Third-party library: the TFHpple package.

3. Use TFHpple.

1) Import the header file: #import "TFHpple.h"

2) Wrap the address in an NSURL and download the page data

 

    NSString *urlString = @"http://www.weiphone.com/apple/news/index_1.shtml";
    NSData *htmlData = [[NSData alloc] initWithContentsOfURL:[NSURL URLWithString:urlString]];

If the web page is not UTF-8 encoded, the parser may not read it correctly, so the data has to be converted first. Check the charset declared in the META header of the page source to find out which encoding to convert from.

 

 

    // Usage: NSData *toHtmlData = [self toUTF8:htmlData];
    - (NSData *)toUTF8:(NSData *)sourceData
    {
        // Decode the raw bytes as GB 18030.
        CFStringRef gbkStr = CFStringCreateWithBytes(NULL, [sourceData bytes], [sourceData length], kCFStringEncodingGB_18030_2000, false);
        if (gbkStr == NULL) {
            return nil;
        }
        // Hand ownership of the CFString to ARC so it is not leaked.
        NSString *gbkString = (__bridge_transfer NSString *)gbkStr;
        // Swap in a META tag that declares UTF-8 (adjust the search string to the page's actual tag).
        NSString *utf8_string = [gbkString stringByReplacingOccurrencesOfString:@"META http-equiv=\"X-UA-Compatible\" content=\"IE=EmulateIE7\""
                                                                     withString:@"META http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\""];
        return [utf8_string dataUsingEncoding:NSUTF8StringEncoding];
    }
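The conversion can also be sketched with Foundation alone. This is only an illustration, not the original author's method: the function name UTF8DataFromGB18030Data is made up for this example, it assumes the page really is GB 18030 (or GBK/GB2312) encoded, and it does not touch the META tag.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of TFHpple): re-encodes GB 18030 bytes as UTF-8.
// Returns nil when the bytes cannot be decoded as GB 18030.
static NSData *UTF8DataFromGB18030Data(NSData *sourceData) {
    // Map the CoreFoundation encoding constant to an NSStringEncoding value.
    NSStringEncoding gbEncoding =
        CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGB_18030_2000);
    NSString *decoded = [[NSString alloc] initWithData:sourceData encoding:gbEncoding];
    if (decoded == nil) {
        return nil;
    }
    return [decoded dataUsingEncoding:NSUTF8StringEncoding];
}
```

If the page declares its charset in a META tag, that declaration may still need to be rewritten as in the method above, since the parser can take the declared charset into account.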

3) Create a TFHpple object from the data

 

 

    TFHpple *xpathparser = [[TFHpple alloc] initWithHTMLData:htmlData];

4) Write an XPath query that matches the nodes you want; the search returns them as an array

 

 

    NSArray *array1 = [xpathparser searchWithXPathQuery:@"//div[@id='news']//div//div[2]//h3//a[1]"];
    // firstObject is nil (instead of raising an exception) when nothing matched.
    NSLog(@"%@", [array1 firstObject]);
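Steps 3) and 4) can be combined into one small function. The sketch below is illustrative only: LinkSummariesFromHTMLData is a hypothetical name, and it assumes a TFHpple release in which TFHppleElement provides the text and objectForKey: methods.

```objective-c
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: returns one "text -> href" string per element matched by xpath.
static NSArray<NSString *> *LinkSummariesFromHTMLData(NSData *htmlData, NSString *xpath) {
    TFHpple *parser = [[TFHpple alloc] initWithHTMLData:htmlData];
    NSArray *elements = [parser searchWithXPathQuery:xpath];
    NSMutableArray<NSString *> *summaries = [NSMutableArray array];
    for (TFHppleElement *element in elements) {
        // text is the content of the element's first text node;
        // objectForKey: looks up an attribute by name.
        NSString *title = [element text] ?: @"";
        NSString *href = [element objectForKey:@"href"] ?: @"";
        [summaries addObject:[NSString stringWithFormat:@"%@ -> %@", title, href]];
    }
    return summaries;
}
```

Called with the htmlData downloaded in step 2) and the query from step 4), it would list the text and link target of every matched a element rather than only the first one.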

4. Learn the XPath syntax.

 

 

XPath uses path expressions to select nodes or node sets in an XML document. A node is selected by following a path, which is made up of steps.

Example XML document

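The examples below use this XML document.

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <bookstore>
      <book>
        <title lang="eng">Harry Potter</title>
        <price>29.99</price>
      </book>
      <book>
        <title lang="eng">Learning XML</title>
        <price>39.95</price>
      </book>
    </bookstore>
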
Selecting nodes

XPath uses path expressions to select nodes in an XML document. A node is selected by following a path or its steps.

The most useful path expressions are listed below:

Expression    Description
nodename      Selects the child elements of the current node that are named nodename.
/             Selects from the root node.
//            Selects matching nodes anywhere below the current node, regardless of their position in the document.
.             Selects the current node.
..            Selects the parent of the current node.
@             Selects attributes.
Examples

The following table lists some path expressions and the results they produce:

Path expression    Result
bookstore          Selects the bookstore elements that are children of the current node.
/bookstore         Selects the root element bookstore. Note: a path that starts with a forward slash (/) is always an absolute path to an element.
bookstore/book     Selects all book elements that are children of bookstore.
//book             Selects all book elements, regardless of their position in the document.
bookstore//book    Selects all book elements that are descendants of the bookstore element, wherever they sit beneath it.
//@lang            Selects all attributes named lang.
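The path expressions above can be tried against the sample document with TFHpple's XML initializer. This is a sketch under stated assumptions: kBookstoreXML and TagNamesMatching are names invented for this example, and the title text in the embedded XML is illustrative. Only absolute paths are used, because the result of a relative path such as bookstore/book depends on the context node the query starts from.

```objective-c
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Sample document (hypothetical constant; the title text is illustrative).
static NSString *const kBookstoreXML =
    @"<bookstore>"
    @"<book><title lang=\"eng\">Harry Potter</title><price>29.99</price></book>"
    @"<book><title lang=\"eng\">Learning XML</title><price>39.95</price></book>"
    @"</bookstore>";

// Hypothetical helper: returns the tag name of every element matched by xpath.
static NSArray<NSString *> *TagNamesMatching(NSString *xpath) {
    NSData *xmlData = [kBookstoreXML dataUsingEncoding:NSUTF8StringEncoding];
    TFHpple *parser = [[TFHpple alloc] initWithXMLData:xmlData];
    NSMutableArray<NSString *> *names = [NSMutableArray array];
    for (TFHppleElement *element in [parser searchWithXPathQuery:xpath]) {
        [names addObject:element.tagName];
    }
    return names;
}
```

For instance, TagNamesMatching(@"/bookstore") should match the single root element, while TagNamesMatching(@"//book") and TagNamesMatching(@"/bookstore//book") should each match both book elements.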
Predicates

Predicates are used to find a specific node, or a node that contains a specific value.

Predicates are always written inside square brackets.

Examples

The following table lists some path expressions with predicates and the results they produce:

Path expression                       Result
/bookstore/book[1]                    Selects the first book element that is a child of bookstore.
/bookstore/book[last()]               Selects the last book element that is a child of bookstore.
/bookstore/book[last()-1]             Selects the second-to-last book element that is a child of bookstore.
/bookstore/book[position()<3]         Selects the first two book elements that are children of bookstore.
//title[@lang]                        Selects all title elements that have a lang attribute.
//title[@lang='eng']                  Selects all title elements whose lang attribute has the value eng.
/bookstore/book[price>35.00]          Selects all book elements under bookstore whose price element has a value greater than 35.00.
/bookstore/book[price>35.00]/title    Selects the title elements of all book elements under bookstore whose price element has a value greater than 35.00.
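A few of these predicates can be checked in the same way. Again a sketch only: kPredicateSampleXML and TextsMatching are hypothetical names, the title text is illustrative, and the code assumes a TFHpple release where TFHppleElement has a text method.

```objective-c
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Sample document (hypothetical constant; the title text is illustrative).
static NSString *const kPredicateSampleXML =
    @"<bookstore>"
    @"<book><title lang=\"eng\">Harry Potter</title><price>29.99</price></book>"
    @"<book><title lang=\"eng\">Learning XML</title><price>39.95</price></book>"
    @"</bookstore>";

// Hypothetical helper: returns the text of every element matched by xpath.
static NSArray<NSString *> *TextsMatching(NSString *xpath) {
    NSData *xmlData = [kPredicateSampleXML dataUsingEncoding:NSUTF8StringEncoding];
    TFHpple *parser = [[TFHpple alloc] initWithXMLData:xmlData];
    NSMutableArray<NSString *> *texts = [NSMutableArray array];
    for (TFHppleElement *element in [parser searchWithXPathQuery:xpath]) {
        // Elements without a text node contribute an empty string.
        [texts addObject:[element text] ?: @""];
    }
    return texts;
}
```

With this data, TextsMatching(@"/bookstore/book[1]/title") should yield only the first title, and TextsMatching(@"/bookstore/book[price>35.00]/title") only the title of the book priced at 39.95.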
Selecting unknown nodes

XPath wildcards can be used to select unknown XML elements.

Wildcard    Description
*           Matches any element node.
@*          Matches any attribute node.
node()      Matches any node of any kind.

Examples

The following table lists some path expressions and the results they produce:

Path expression    Result
/bookstore/*       Selects all child elements of the bookstore element.
//*                Selects all elements in the document.
//title[@*]        Selects all title elements that have at least one attribute.
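The wildcard expressions can be explored with a helper that reports how many attributes each match carries. This is an illustrative sketch: DescribeMatches is a hypothetical name, the XML string is passed in by the caller, and the output format is arbitrary.

```objective-c
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: returns "tagName/attributeCount" for every element
// that xpath matches in the given XML string.
static NSArray<NSString *> *DescribeMatches(NSString *xmlString, NSString *xpath) {
    NSData *xmlData = [xmlString dataUsingEncoding:NSUTF8StringEncoding];
    TFHpple *parser = [[TFHpple alloc] initWithXMLData:xmlData];
    NSMutableArray<NSString *> *descriptions = [NSMutableArray array];
    for (TFHppleElement *element in [parser searchWithXPathQuery:xpath]) {
        // attributes is a dictionary of attribute names to values.
        NSUInteger attributeCount = element.attributes.count;
        [descriptions addObject:[NSString stringWithFormat:@"%@/%lu",
                                 element.tagName, (unsigned long)attributeCount]];
    }
    return descriptions;
}
```

Run against the bookstore document, //* should match seven elements (bookstore, two book, two title, two price), and //title[@*] only the two title elements, each with one attribute.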
Selecting several paths

You can select several paths by using the "|" operator in a path expression.

Examples

The following table lists some path expressions and the results they produce:

Path expression                    Result
//book/title | //book/price        Selects all title and price elements of all book elements.
//title | //price                  Selects all title and price elements in the document.
/bookstore/book/title | //price    Selects all title elements of the book elements under bookstore, plus all price elements in the document.
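To see what a union returns, the matches can be tallied by tag name. The sketch below is illustrative: TagCountsMatching is a hypothetical name, and a counted set is used because the order of the nodes in a union result should not be relied on.

```objective-c
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: counts, per tag name, the elements that xpath
// matches in the given XML string.
static NSCountedSet *TagCountsMatching(NSString *xmlString, NSString *xpath) {
    NSData *xmlData = [xmlString dataUsingEncoding:NSUTF8StringEncoding];
    TFHpple *parser = [[TFHpple alloc] initWithXMLData:xmlData];
    NSCountedSet *counts = [NSCountedSet set];
    for (TFHppleElement *element in [parser searchWithXPathQuery:xpath]) {
        [counts addObject:element.tagName];
    }
    return counts;
}
```

For the bookstore document, //title | //price should give a count of two for title and two for price.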

 



 
