Recently I needed to crawl some web pages and parse the downloaded HTML. After searching around online, asking a friend, and drawing on open-source material, I got htmlcxx working.
Environment: Visual Studio 2013.
The first step is to download htmlcxx:
https://github.com/dhoerl/htmlcxx
Or search Baidu for a copy.
Next, unzip the archive, open htmlcxx.vcproj with VS2013, and click Build.
Once the build succeeds you are done; clicking Debug will produce errors, but there is no need to run it under the debugger.
Then create a Win32 console project and click Finish.
Next, copy htmlcxx.lib and the css and html folders into the project directory you just created.
Then add the following code to the project's source .cpp file:
#include "stdafx.h"
#include <string.h>
#include <iostream>
#include <fstream>
#include "html/ParserDom.h"
#include "html/utils.h"

#if defined(WIN32) && defined(_DEBUG)
char* locale = setlocale(LC_ALL, ".OCP");
#endif

#pragma comment(lib, "htmlcxx.lib")

#define STRCASECMP _stricmp

using namespace std;
using namespace htmlcxx;

int _tmain(int argc, _TCHAR* argv[])
{
	// Parse a section of HTML code (put the HTML string to parse here)
	string html = "";
	HTML::ParserDom parser;
	tree<HTML::Node> dom = parser.parseTree(html);

	// Print the entire DOM tree
	cout << dom << endl;

	// Print all hyperlink nodes in the tree
	tree<HTML::Node>::iterator it = dom.begin();
	tree<HTML::Node>::iterator end = dom.end();
	for (; it != end; ++it)
	{
		if (STRCASECMP(it->tagName().c_str(), "a") == 0)
		{
			it->parseAttributes();
			cout << it->attribute("href").second << endl;
		}
	}

	// Print all text nodes
	it = dom.begin();
	end = dom.end();
	for (; it != end; ++it)
	{
		if ((!it->isTag()) && (!it->isComment()))
		{
			cout << it->text();
		}
	}

	cout << endl;
	cin.get();
	return 0;
}
Result: the program prints the DOM tree, then the extracted hyperlinks, then the text nodes.
That's it.
Thanks to these bloggers:
http://www.cnblogs.com/zhanglanyun/archive/2011/10/21/2220647.html
http://www.cppblog.com/luonjtu/archive/2009/03/13/76332.html
http://blog.csdn.net/farcall/article/details/20378475
And thanks to everyone who shares their work as open source.