Title: C++ Extract Web Content, Series Four
Author: itdef
Link: http://www.cnblogs.com/itdef/p/4173833.html
Reprinting is welcome; please keep the text complete and credit the source.
Once the content of a web page has been downloaded into a string or a local file, we can start searching it for the information we need.
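If the page was saved to a local file, it has to be loaded back into a std::string before it can be searched. The sketch below shows one way to do that with standard streams; the helper name ReadFileToString is hypothetical, and it simply returns an empty string when the file cannot be opened.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: load a saved web page into a std::string so it can be
// passed to regex_search. Returns "" if the file cannot be opened.
std::string ReadFileToString(const std::string& path)
{
    // Binary mode keeps the bytes exactly as they were downloaded.
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in)
        return std::string();
    std::ostringstream buffer;
    buffer << in.rdbuf();   // copy the whole file in one step
    return buffer.str();
}
```

Reading the whole file at once keeps the later regex code unchanged: it works on a string no matter where the page came from.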
For regular expressions we use the TR1 library (the pre-standard library extensions) that ships with VS2008. With a C++11 or later compiler, the same classes are available directly as std::regex.
Header file:
/*******************************************************************************
* @file
* @author def< QQ group:324164944 >
* @blog http://www.cnblogs.com/itdef/
* @brief
/*******************************************************************************/
#include <regex>
using namespace std::tr1;   // VS2008 TR1; with C++11 or later, std::regex needs no tr1
using namespace std;
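If the same source has to build both with VS2008 and with a newer compiler, a namespace alias can hide the tr1/std difference. This is a sketch under one assumption: that VS2008 (_MSC_VER 1500) exposes the regex classes only under std::tr1, while every other compiler targeted is C++11 or later and provides them in std. The alias name rx and the Contains helper are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Assumption: VS2008 (_MSC_VER 1500) only has the TR1 regex in std::tr1;
// C++11 and later compilers provide std::regex.
#if defined(_MSC_VER) && _MSC_VER <= 1500
namespace rx = std::tr1;
#else
namespace rx = std;
#endif

// Hypothetical helper: does pattern occur anywhere in text?
bool Contains(const std::string& text, const std::string& pattern)
{
    return rx::regex_search(text, rx::regex(pattern));
}
```

The rest of the code can then write rx::regex and rx::smatch without caring which library is underneath.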
Recommended regular expression tutorials:
Regular Expressions 30-minute introductory tutorial
http://www.cnblogs.com/deerchao/... zhongjiaocheng.html
C++: regex regular expressions
http://blog.sina.com.cn/s/blog_ac9fdc0b0101oow9.html
Let's start with a simple example.
#include <string>
#include <iostream>
#include <regex>
// VS2008: the TR1 regex classes live in std::tr1, so also add
//   using namespace std::tr1;
// A C++11 (or later) compiler provides them in std, so it is not needed there.
using namespace std;

string strContent = "onclick=\"verycd.trackevent('base', 'home big push', 'Condor');";

void Test1()
{
    string strText = strContent;
    string strRegex = "home big push";
    regex regExpress(strRegex);
    smatch ms;

    cout << "*****************************" << endl;
    cout << "Test 1" << endl << endl;
    // Search repeatedly: after each match, continue in the text that follows it.
    while (regex_search(strText, ms, regExpress))
    {
        // ms[0] is the whole match, ms[1..] are the () capture groups.
        for (string::size_type i = 0; i < ms.size(); ++i)
        {
            cout << ms.str(i) << endl;
        }
        strText = ms.suffix().str();
    }
    cout << "*****************************" << endl << endl;
}

void Test2()
{
    string strText = strContent;
    string strRegex = "home big push.*'(.*)'";
    regex regExpress(strRegex);
    smatch ms;

    cout << "*****************************" << endl;
    cout << "Test 2" << endl << endl;
    while (regex_search(strText, ms, regExpress))
    {
        for (string::size_type i = 0; i < ms.size(); ++i)
        {
            cout << ms.str(i) << endl;
        }
        strText = ms.suffix().str();
    }
    cout << "*****************************" << endl << endl;
}

// The VS2008 console wizard generates int _tmain(int argc, _TCHAR* argv[]);
// plain main() is the portable equivalent.
int main()
{
    Test1();
    Test2();
    return 0;
}
In Test1 we simply search for a fixed string and print each match. In Test2 we use the pattern home big push.*'(.*)'
Here . matches any single character except a newline, and * means the preceding item may repeat any number of times (zero or more).
The parentheses do not form a character set; they mark a capture group (a subexpression), which holds the part we actually want to extract.
In this pattern the group captures the text between the last pair of single quotes that follows "home big push", that is, Condor.
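The behavior of . and * described above can be checked directly. This sketch assumes a C++11 compiler (std::regex with its default ECMAScript grammar, which matches VS2008's TR1 default); FirstMatch is a hypothetical helper that returns ms[0], the whole matched text, or an empty string when nothing matches.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: the whole text matched by pattern (ms[0]), or "".
std::string FirstMatch(const std::string& text, const std::string& pattern)
{
    std::smatch ms;
    if (std::regex_search(text, ms, std::regex(pattern)))
        return ms.str(0);
    return std::string();
}
```

For example, FirstMatch("abc\ndef", "a.*") returns "abc" because . stops at the newline, and FirstMatch("ac", "ab*c") returns "ac" because * also allows zero repetitions.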
The result is as follows:
Notice the order in which ms is printed: ms[0] is the whole string that matched the pattern, followed by the substring captured by each pair of parentheses (ms[1], ms[2], and so on).
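To make that ordering concrete, the sketch below collects every element of ms for the first match into a vector, in the same order the article's loop prints them. It assumes C++11 std::regex; the name Submatches is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: ms[0], ms[1], ... for the first match of pattern.
// Index 0 is the whole match; each later index is one () group, left to right.
std::vector<std::string> Submatches(const std::string& text, const std::string& pattern)
{
    std::vector<std::string> parts;
    std::smatch ms;
    if (std::regex_search(text, ms, std::regex(pattern)))
    {
        for (std::size_t i = 0; i < ms.size(); ++i)
            parts.push_back(ms.str(i));
    }
    return parts;
}
```

With the Test2 string and pattern, the vector has two entries: "home big push', 'Condor'" and "Condor".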
Next, let's analyze a more complex string in depth:
string strContent0 = "alt=\"Naruto\"/><div class=\"play_ico_middle\"></div><div class=\"cv-title\" style=\"width:85px;\">Updated to episode 612</div>";
The regular expression we use is string strRegex = "alt=\"([^\"]*)\".*width:85px;\">(.*)</div>";
Note that it contains two pairs of parentheses: the first captures the text between the two double quotes after alt=, and the second captures the text between width:85px;\"> and </div>.
Note also that inside a C++ string literal a double quote character must be written as \".
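With a C++11 compiler (not available in VS2008) a raw string literal avoids the backslashes entirely. The sketch compares both spellings of the same pattern; note that the raw string needs a custom delimiter (re here, an arbitrary choice) because the pattern itself contains the sequence )" which would otherwise end a plain R"(...)" literal. Groups is a hypothetical helper.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Escaped form, as written in the article (works with VS2008 TR1 too).
const std::string kEscaped = "alt=\"([^\"]*)\".*width:85px;\">(.*)</div>";

// C++11 raw string: quotes need no backslash; "re" is the delimiter.
const std::string kRaw = R"re(alt="([^"]*)".*width:85px;">(.*)</div>)re";

// Hypothetical helper: "group1|group2" for the first match, or "".
std::string Groups(const std::string& text)
{
    std::smatch ms;
    if (std::regex_search(text, ms, std::regex(kRaw)))
        return ms.str(1) + "|" + ms.str(2);
    return std::string();
}
```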
Now let's examine the two groups, ([^\"]*) and (.*).
(.*) needs little explanation: it matches any run of characters other than newlines, here the text between width:85px;\"> and </div>.
([^\"]*) matches any character that is not a double quote, repeated zero or more times; this group captures the text between the two double quotes after alt=.
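Why not just write alt="(.*)"? Because .* is greedy: it runs on to the last double quote in the line instead of stopping at the first one. The sketch below (C++11 std::regex; Group1 is a hypothetical helper) runs both forms on the article's string to show the difference.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: capture group 1 of the first match, or "".
std::string Group1(const std::string& text, const std::string& pattern)
{
    std::smatch ms;
    if (std::regex_search(text, ms, std::regex(pattern)) && ms.size() > 1)
        return ms.str(1);
    return std::string();
}
```

On strContent0, Group1(strContent0, "alt=\"([^\"]*)\"") gives Naruto, while the greedy "alt=\"(.*)\"" captures everything up to the quote after width:85px;, including the intervening div tags.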
The results are as follows (to avoid printing too much, Test3 does not print ms[0], the full match; it prints only the substrings captured by the parentheses):
/*******************************************************************************
* @file
* @author def< QQ group:324164944 >
* @blog http://www.cnblogs.com/itdef/
* @brief
/*******************************************************************************/
#include <string>
#include <iostream>
#include <regex>
// VS2008: also add "using namespace std::tr1;" (C++11 and later: not needed).
using namespace std;

string strContent = "onclick=\"verycd.trackevent('base', 'home big push', 'Condor');";
string strContent0 = "alt=\"Naruto\"/><div class=\"play_ico_middle\"></div><div class=\"cv-title\" style=\"width:85px;\">Updated to episode 612</div>";

void Test1()
{
    string strText = strContent;
    string strRegex = "home big push";
    regex regExpress(strRegex);
    smatch ms;

    cout << "*****************************" << endl;
    cout << "Test 1" << endl << endl;
    while (regex_search(strText, ms, regExpress))
    {
        for (string::size_type i = 0; i < ms.size(); ++i)
        {
            cout << ms.str(i) << endl;
        }
        strText = ms.suffix().str();   // continue after this match
    }
    cout << "*****************************" << endl << endl;
}

void Test2()
{
    string strText = strContent;
    string strRegex = "home big push.*'(.*)'";
    regex regExpress(strRegex);
    smatch ms;

    cout << "*****************************" << endl;
    cout << "Test 2" << endl << endl;
    while (regex_search(strText, ms, regExpress))
    {
        for (string::size_type i = 0; i < ms.size(); ++i)
        {
            cout << ms.str(i) << endl;
        }
        strText = ms.suffix().str();
    }
    cout << "*****************************" << endl << endl;
}

void Test3()
{
    string strText = strContent0;
    string strRegex = "alt=\"([^\"]*)\".*width:85px;\">(.*)</div>";
    regex regExpress(strRegex);
    smatch ms;

    cout << "*****************************" << endl;
    cout << "Test 3" << endl << endl;
    while (regex_search(strText, ms, regExpress))
    {
        // Start at 1: skip ms[0] (the full match), print only the () groups.
        for (string::size_type i = 1; i < ms.size(); ++i)
        {
            cout << ms.str(i) << endl;
        }
        strText = ms.suffix().str();
    }
    cout << "*****************************" << endl << endl;
}

// VS2008 wizard form: int _tmain(int argc, _TCHAR* argv[])
int main()
{
    Test1();
    Test2();
    Test3();
    return 0;
}
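A real listing page usually contains many such items in one string. The Test3 pattern is greedy, so on two concatenated items it produces a single match that pairs the first title with the last episode text. The sketch below, assuming C++11 std::regex, walks all matches with std::sregex_iterator and compares the article's pattern with a lazy variant (.*? and (.*?)) that stops inside the current item. ExtractAll, kGreedyPattern and kLazyPattern are hypothetical names, and the second item in the test data is invented for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// The article's Test3 pattern (greedy) and a lazy variant of it.
const std::string kGreedyPattern = "alt=\"([^\"]*)\".*width:85px;\">(.*)</div>";
const std::string kLazyPattern   = "alt=\"([^\"]*)\".*?width:85px;\">(.*?)</div>";

// Hypothetical helper: (group 1, group 2) for every non-overlapping match.
std::vector<std::pair<std::string, std::string> >
ExtractAll(const std::string& page, const std::string& pattern)
{
    const std::regex re(pattern);   // must outlive the iterators below
    std::vector<std::pair<std::string, std::string> > items;
    for (std::sregex_iterator it(page.begin(), page.end(), re), end; it != end; ++it)
        items.push_back(std::make_pair(it->str(1), it->str(2)));
    return items;
}
```

The iterator replaces the manual strText = ms.suffix().str() loop: each step resumes the search right after the previous match.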