Parsing HTML on iOS


There are plenty of libraries for parsing XML and JSON, but how do we parse HTML?

TFHpple is a small wrapper that can be used to parse HTML. It is built on top of libxml, and its query syntax is XPath.

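Today I came across an article that parses HTML directly with libxml; see http://www.cocoanetics.com/2011/09/taming-html-parsing-with-libxml-1/#comment-3090 The diagram there makes the structure clear at a glance and is well worth bookmarking. The source code in that article does not traverse the whole HTML document, so I modified it slightly to walk the document and print it: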

#import <libxml/HTMLparser.h>

// data     : NSData containing the document data
// encoding : the NSStringEncoding of the data
// baseURL  : the document's base URL, i.e. its location

CFStringEncoding cfenc = CFStringConvertNSStringEncodingToEncoding(encoding);
CFStringRef cfencstr = CFStringConvertEncodingToIANACharSetName(cfenc);
const char *enc = CFStringGetCStringPtr(cfencstr, 0);

// htmlReadMemory takes an explicit length, so the NSData bytes
// do not have to be NUL-terminated
htmlDocPtr _htmlDocument = htmlReadMemory((const char *)[data bytes],
                                          (int)[data length],
                                          [[baseURL absoluteString] UTF8String],
                                          enc,
                                          HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING);

// The original snippet compared against nodes->nodeTab[0], which it never
// defines; the document node is used as the traversal root here instead.
xmlNodePtr rootNode = (xmlNodePtr)_htmlDocument;
xmlNodePtr currentNode = rootNode;

while (currentNode)
{
    if (currentNode->type == XML_ELEMENT_NODE)
    {
        // output the opening tag with its attributes
        NSMutableArray *attrArray = [NSMutableArray array];
        for (xmlAttrPtr attrNode = currentNode->properties; attrNode; attrNode = attrNode->next)
        {
            // attributes without a value (e.g. "disabled") have no children
            xmlNodePtr contents = attrNode->children;
            const char *value = (contents && contents->content) ? (const char *)contents->content : "";
            [attrArray addObject:[NSString stringWithFormat:@"%s='%s'", attrNode->name, value]];
        }

        NSString *attrString = [attrArray componentsJoinedByString:@" "];
        if ([attrString length])
        {
            attrString = [@" " stringByAppendingString:attrString];
        }
        NSLog(@"<%s%@>", currentNode->name, attrString);
    }
    else if (currentNode->type == XML_TEXT_NODE)
    {
        NSLog(@"%@", [NSString stringWithCString:(const char *)currentNode->content
                                        encoding:NSUTF8StringEncoding]);
    }
    else if (currentNode->type == XML_COMMENT_NODE)
    {
        // the comment text lives in content; name is always "comment"
        NSLog(@"/* %s */", currentNode->content);
    }

    if (currentNode->children)
    {
        // descend first
        currentNode = currentNode->children;
    }
    else
    {
        // no children: climb until a node with a next sibling is found,
        // closing every element we leave on the way up
        while (currentNode && currentNode != rootNode && !currentNode->next)
        {
            currentNode = currentNode->parent;
            if (currentNode && currentNode->type == XML_ELEMENT_NODE)
            {
                NSLog(@"</%s>", currentNode->name);
            }
        }

        if (!currentNode || currentNode == rootNode)
        {
            break; // back at the root: traversal finished
        }
        currentNode = currentNode->next;
    }
}

// free the document only after the traversal is done
if (_htmlDocument)
{
    xmlFreeDoc(_htmlDocument);
}

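The loop above depends only on three links of each node: parent, first child, and next sibling. As a minimal sketch of that idea, separate from libxml, the following plain C version walks a tree the same way. DemoNode, demo_add_child, and walk_tree are hypothetical names invented for this illustration, and unlike the code above it also writes a closing tag for elements that have no children.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for xmlNode: only the links the walk needs. */
typedef struct DemoNode {
    const char *name;
    struct DemoNode *parent;
    struct DemoNode *children; /* first child */
    struct DemoNode *next;     /* next sibling */
} DemoNode;

/* Link child as the last child of parent. */
static void demo_add_child(DemoNode *parent, DemoNode *child) {
    child->parent = parent;
    child->next = NULL;
    if (!parent->children) {
        parent->children = child;
        return;
    }
    DemoNode *last = parent->children;
    while (last->next) {
        last = last->next;
    }
    last->next = child;
}

/* Append s to the NUL-terminated string in out, truncating at cap - 1 chars. */
static void append_str(char *out, size_t cap, const char *s) {
    size_t used = strlen(out);
    if (used + 1 < cap) {
        strncat(out, s, cap - used - 1);
    }
}

static void append_tag(char *out, size_t cap, const char *open, const char *name) {
    append_str(out, cap, open);
    append_str(out, cap, name);
    append_str(out, cap, ">");
}

/* Walk the subtree under root in document order without recursion,
   writing "<name>" on entry and "</name>" on exit into out. */
static void walk_tree(DemoNode *root, char *out, size_t cap) {
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    DemoNode *cur = root;
    while (cur) {
        append_tag(out, cap, "<", cur->name);
        if (cur->children) {
            cur = cur->children; /* descend first */
            continue;
        }
        append_tag(out, cap, "</", cur->name); /* leaf: close it right away */
        /* climb while there is no next sibling, closing each parent we return to */
        while (cur != root && !cur->next) {
            cur = cur->parent;
            append_tag(out, cap, "</", cur->name);
        }
        if (cur == root) {
            break; /* never step onto the root's own siblings */
        }
        cur = cur->next;
    }
}
```

Stopping as soon as the walk is back at root is what keeps it inside the subtree, which is the role the root comparison plays in the Objective-C loop.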
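Still, I prefer TFHpple because it is simple and easy to use, although its feature set is incomplete. For example, it cannot return child nodes, so I wrote two methods: one that returns the child nodes and one that returns all of the contents. In addition, the key for an attribute's content is the same as the key for a node's content, both @"nodeContent"; the key for an attribute's content should properly be @"attributeContent".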

So I wrote the following function, which also changes the content key used for node attributes.

NSDictionary *DictionaryForNode2(xmlNodePtr currentNode, NSMutableDictionary *parentResult)
{
    NSMutableDictionary *resultForNode = [NSMutableDictionary dictionary];

    if (currentNode->name)
    {
        NSString *currentNodeContent =
            [NSString stringWithCString:(const char *)currentNode->name encoding:NSUTF8StringEncoding];
        [resultForNode setObject:currentNodeContent forKey:@"nodeName"];
    }

    if (currentNode->content)
    {
        NSString *currentNodeContent =
            [NSString stringWithCString:(const char *)currentNode->content encoding:NSUTF8StringEncoding];

        if (currentNode->type == XML_TEXT_NODE)
        {
            // text inside an element is stored on the parent as its node content
            if (currentNode->parent->type == XML_ELEMENT_NODE)
            {
                [parentResult setObject:currentNodeContent forKey:@"nodeContent"];
                return nil;
            }

            // text inside an attribute is stored under the separate attribute key
            if (currentNode->parent->type == XML_ATTRIBUTE_NODE)
            {
                [parentResult setObject:[currentNodeContent stringByTrimmingCharactersInSet:
                                            [NSCharacterSet whitespaceAndNewlineCharacterSet]]
                                 forKey:@"attributeContent"];
                return nil;
            }
        }
    }

    xmlAttr *attribute = currentNode->properties;
    if (attribute)
    {
        NSMutableArray *attributeArray = [NSMutableArray array];
        while (attribute)
        {
            NSMutableDictionary *attributeDictionary = [NSMutableDictionary dictionary];
            NSString *attributeName =
                [NSString stringWithCString:(const char *)attribute->name encoding:NSUTF8StringEncoding];
            if (attributeName)
            {
                [attributeDictionary setObject:attributeName forKey:@"attributeName"];
            }

            if (attribute->children)
            {
                NSDictionary *childDictionary = DictionaryForNode2(attribute->children, attributeDictionary);
                if (childDictionary)
                {
                    [attributeDictionary setObject:childDictionary forKey:@"attributeContent"];
                }
            }

            if ([attributeDictionary count] > 0)
            {
                [attributeArray addObject:attributeDictionary];
            }
            attribute = attribute->next;
        }

        if ([attributeArray count] > 0)
        {
            [resultForNode setObject:attributeArray forKey:@"nodeAttributeArray"];
        }
    }

    // recurse into the child nodes and collect them under nodeChildArray
    xmlNodePtr childNode = currentNode->children;
    if (childNode)
    {
        NSMutableArray *childContentArray = [NSMutableArray array];
        while (childNode)
        {
            NSDictionary *childDictionary = DictionaryForNode2(childNode, resultForNode);
            if (childDictionary)
            {
                [childContentArray addObject:childDictionary];
            }
            childNode = childNode->next;
        }

        if ([childContentArray count] > 0)
        {
            [resultForNode setObject:childContentArray forKey:@"nodeChildArray"];
        }
    }

    return resultForNode;
}

Two key constants were added to TFHppleElement.m:

NSString * const TFHppleNodeAttributeContentKey  = @"attributeContent";
NSString * const TFHppleNodeChildArrayKey        = @"nodeChildArray";

The method that returns the attributes was changed to:

- (NSDictionary *)attributes
{
    NSMutableDictionary *translatedAttributes = [NSMutableDictionary dictionary];
    for (NSDictionary *attributeDict in [node objectForKey:TFHppleNodeAttributeArrayKey])
    {
        NSString *name = [attributeDict objectForKey:TFHppleNodeAttributeNameKey];
        NSString *content = [attributeDict objectForKey:TFHppleNodeAttributeContentKey];
        // skip attributes without a value; setObject:forKey: throws on nil
        if (name && content)
        {
            [translatedAttributes setObject:content forKey:name];
        }
    }
    return translatedAttributes;
}

Methods for getting the child nodes were added:

- (BOOL)hasChildren
{
    return [node objectForKey:TFHppleNodeChildArrayKey] != nil;
}

- (NSArray *)children
{
    if ([self hasChildren])
        return [node objectForKey:TFHppleNodeChildArrayKey];
    return nil;
}

Finally, I also added a method that returns all of the content:

- (NSString *)contentsAt:(NSString *)xPathOrCss;

See the source code for the details.
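The implementation of contentsAt: is not shown here. One plausible reading of "all of the content" is the concatenated text of a node and every descendant, in document order. The plain C sketch below illustrates only that reading; ContentNode and collect_contents are hypothetical names for this illustration and are not part of TFHpple or libxml.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified node: text is non-NULL only for text nodes. */
typedef struct ContentNode {
    const char *text;
    struct ContentNode *children; /* first child */
    struct ContentNode *next;     /* next sibling */
} ContentNode;

/* Append the text of node and all of its descendants to out, in document
   order. out must already hold a NUL-terminated string; the result is
   truncated at cap - 1 characters. */
static void collect_contents(const ContentNode *node, char *out, size_t cap) {
    assert(out != NULL && cap > 0);
    if (node == NULL) {
        return;
    }
    if (node->text) {
        size_t used = strlen(out);
        if (used + 1 < cap) {
            strncat(out, node->text, cap - used - 1);
        }
    }
    /* visit the children left to right so the text keeps its document order */
    for (const ContentNode *child = node->children; child; child = child->next) {
        collect_contents(child, out, cap);
    }
}
```

For a tree shaped like `<p>Hello <b>world</b></p>`, calling collect_contents on the p node with an empty buffer leaves "Hello world" in the buffer.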

See also: http://giles-wang.blogspot.com/2011/08/iphoneansi.html
