This is not really a full web crawler ("spider"). It only crawls one social network to build the adjacency matrix of its users' friendships, along with each user's avatar and other profile information.
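Here is a minimal sketch of the end product. It assumes each crawled user is identified by the numeric `id` in their profile URL, and that friendship is mutual, so the matrix is symmetric. `SocialGraph.BuildAdjacency` is a hypothetical helper, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: turns crawled friend lists into a 0/1 adjacency matrix.
// Assumes friendship is mutual, so every edge is written in both directions.
public static class SocialGraph
{
    // ids fixes the row/column order; friends maps a user ID to the IDs
    // found on that user's friend page.
    public static int[,] BuildAdjacency(IList<long> ids, IDictionary<long, List<long>> friends)
    {
        // Map each user ID to its row/column index
        var index = new Dictionary<long, int>();
        for (int i = 0; i < ids.Count; i++)
        {
            index[ids[i]] = i;
        }

        var matrix = new int[ids.Count, ids.Count];
        foreach (var entry in friends)
        {
            // Skip friend lists of users that are not part of the crawled set
            if (!index.TryGetValue(entry.Key, out int row)) continue;
            foreach (long friendId in entry.Value)
            {
                // Friends who were never crawled have no row/column, so ignore them
                if (!index.TryGetValue(friendId, out int col)) continue;
                matrix[row, col] = 1;
                matrix[col, row] = 1;
            }
        }
        return matrix;
    }
}
```

For example, with `ids = {101, 202, 303}` and friend lists `101 → {202}` and `202 → {303, 999}`, the matrix has 1 at (0,1), (1,0), (1,2) and (2,1), and 0 everywhere else. ID 999 is skipped because that user was not crawled.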
The main steps are:
1. Fetch the pages.
2. Extract the information with regular expressions.
For regular expressions, I mainly used this tutorial as a reference: http://deerchao.net/tutorials/regex/regex.htm
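The patterns in this post rely on two features covered in that tutorial. Zero-width lookbehind `(?<=...)` and lookahead `(?=...)` check the surrounding text without including it in `Match.Value`, and the lazy quantifier `.*?` stops at the first place the lookahead succeeds. The sketch below shows both on made-up HTML; `LookaroundDemo.AllBetween` is a hypothetical helper.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Returns every piece of text that sits between prefix and suffix.
    // lazy = true uses .*? (shortest run), lazy = false uses .* (longest run).
    public static string[] AllBetween(string input, string prefix, string suffix, bool lazy)
    {
        string body = lazy ? ".*?" : ".*";
        // (?<=...) and (?=...) test the context but are not part of Match.Value;
        // Regex.Escape makes prefix and suffix match literally
        string pattern = "(?<=" + Regex.Escape(prefix) + ")" + body + "(?=" + Regex.Escape(suffix) + ")";
        MatchCollection matches = Regex.Matches(input, pattern);
        string[] result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Value;
        }
        return result;
    }
}
```

On `<img src="a.jpg" width="50" /><img src="b.jpg" width="50" />` with prefix `src="` and suffix `" width`, the lazy form returns `a.jpg` and `b.jpg`. The greedy form returns a single match running from `a.jpg` to `b.jpg`, because `.*` extends to the last suffix on the line.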
Pages are fetched with WebClient, which is more concise than working with HttpWebRequest and HttpWebResponse directly.
The main difficulty is getting past the site's login check. I handled it by saving the cookies from a logged-in session and sending them with every request.
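The crawler code sends a cookie string copied from a logged-in browser as a request header. A common alternative is to subclass `WebClient` so that every request shares one `CookieContainer`. Cookies the server sets on later responses are then stored and sent back automatically. This is sketched here as an assumption, not what the original does. `CookieAwareWebClient` is a hypothetical name; `WebClient.GetWebRequest` and `HttpWebRequest.CookieContainer` are standard .NET members.

```csharp
using System;
using System.Net;

// Hypothetical WebClient subclass: all requests share one CookieContainer,
// so cookies the server sets on responses are kept and sent back later.
public class CookieAwareWebClient : WebClient
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests expose a CookieContainer property
        if (request is HttpWebRequest http)
        {
            http.CookieContainer = Cookies;
        }
        return request;
    }
}
```

To use it, create the client and add each saved cookie with `client.Cookies.Add(new Uri(siteRoot), new Cookie(name, value))`. Then call `client.DownloadString(url)` as often as needed. Here `siteRoot`, `name` and `value` are placeholders for your own values.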
Crawler code:
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public class Crawler
{
    // Fetch a page, sending the saved login cookies with the request
    public static string GetCont(string url)
    {
        string cookies = "_r01_=; depovince=bj; p=; ap=; t=; societyguester=55a777838c4286ab5f657382dbd25c736; id=; xnsid=";
        using (WebClient webc = new WebClient())
        {
            // The standard request header is "Cookie", not "cookies"
            webc.Headers.Add(HttpRequestHeader.Cookie, cookies);
            byte[] webpa = webc.DownloadData(url);
            return Encoding.UTF8.GetString(webpa);
        }
    }
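    // Download a user's small avatar, naming the file after the user
    public static void GetImag(string imgUrl, string userName)
    {
        // Skip empty slots, e.g. unused entries in the result arrays
        if (string.IsNullOrEmpty(imgUrl) || string.IsNullOrEmpty(userName)) return;
        // Use the first run of word characters (\w+) in the user name as the file name
        Match nameMatch = Regex.Match(userName, @"\w+");
        string imageFileName = nameMatch.Value + ".jpg";
        string imageFilePath = "D:/picture/" + imageFileName;
        using (WebClient myClient = new WebClient())
        {
            try
            {
                myClient.DownloadFile(imgUrl, imageFilePath);
            }
            catch (WebException)
            {
                // Skip avatars that fail to download
            }
        }
    }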
}
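`GetImag` names each file after the first run of word characters in the user name. Two friends whose names start with the same word end up writing to the same file, as do names that contain no word characters at all, and `DownloadFile` overwrites the earlier avatar. One possible fix is to put the numeric user ID in the file name, assuming that ID (from the profile URL) is available when downloading. `AvatarNaming.FileNameFor` is a hypothetical helper.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AvatarNaming
{
    // Hypothetical naming scheme: "<userId>_<first word of name>.jpg",
    // falling back to "<userId>.jpg" when the name has no word characters.
    public static string FileNameFor(long userId, string userName)
    {
        // Same rule as GetImag: keep the first run of word characters
        string name = Regex.Match(userName ?? "", @"\w+").Value;
        return name.Length > 0 ? $"{userId}_{name}.jpg" : $"{userId}.jpg";
    }
}
```

`FileNameFor(42, "Zhang San")` gives `42_Zhang.jpg`, and `FileNameFor(7, "!!!")` falls back to `7.jpg`. The result can replace `imageFileName` when building the download path.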
Regular expression code:
using System.Text;
using System.Text.RegularExpressions;

public class MyRegex
{
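    // Extract friends' profile URLs from a friend-list page
    public static string[] GetAddr(string pageHtml)
    {
        // A profile link is followed by '" title=' in the page markup
        Regex regex = new Regex(@"http://www..com/profile.do\?portal=\w*&id=\d+(?=""\stitle=)");
        MatchCollection urlMatches = regex.Matches(pageHtml);
        // Size the array to the actual number of matches
        string[] pageUrl = new string[urlMatches.Count];
        for (int i = 0; i < urlMatches.Count; i++)
        {
            pageUrl[i] = urlMatches[i].Value;
        }
        return pageUrl;
    }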
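    // Extract friends' avatar image URLs
    public static string[] GetImgAddr(string pageHtml)
    {
        // The URL sits between 'stats="pf_friend" src="' and '" width="50" />';
        // the lazy .*? stops at the first such suffix, so each image is matched separately
        Regex regex = new Regex(@"(?<=stats=""pf_friend""\ssrc="").*?(?=""\swidth=""50""\s/>)");
        MatchCollection imgMatches = regex.Matches(pageHtml);
        string[] imgAddr = new string[imgMatches.Count];
        for (int i = 0; i < imgMatches.Count; i++)
        {
            imgAddr[i] = imgMatches[i].Value;
        }
        return imgAddr;
    }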
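    // Extract friends' display names from the link titles
    public static string[] GetUsName(string pageHtml)
    {
        // The name sits between 'title="view ' and ' Personal homepage">' followed by the link text;
        // adjust the literal title text to what the site actually renders
        Regex regex = new Regex(@"(?<=title=""view ).*?(?= Personal homepage"">\w)");
        MatchCollection usNameMatches = regex.Matches(pageHtml);
        string[] usName = new string[usNameMatches.Count];
        for (int i = 0; i < usNameMatches.Count; i++)
        {
            usName[i] = usNameMatches[i].Value;
        }
        return usName;
    }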
}
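The three methods each return one array. If the page lists every friend's link, avatar and name in the same order, the arrays are parallel and can be zipped into one record per friend. That ordering is an assumption. The sketch also parses the numeric user ID out of the `id=` parameter of each profile URL. `FriendRecord` and `FriendPage` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical record for one friend entry on a friend-list page
public class FriendRecord
{
    public long? Id { get; set; }
    public string ProfileUrl { get; set; }
    public string AvatarUrl { get; set; }
    public string Name { get; set; }
}

public static class FriendPage
{
    // Captures the digits after "?id=" or "&id=" into a group named "id"
    private static readonly Regex IdPattern = new Regex(@"[?&]id=(?<id>\d+)");

    // Returns the numeric user ID in a profile URL, or null if there is none
    public static long? ExtractId(string profileUrl)
    {
        if (profileUrl == null) return null;
        Match m = IdPattern.Match(profileUrl);
        if (m.Success && long.TryParse(m.Groups["id"].Value, out long id))
        {
            return id;
        }
        return null;
    }

    // Zips the parallel arrays into records. Returns an empty list when the
    // lengths differ, since index-based pairing is then unreliable, and skips
    // slots where any of the three values is missing.
    public static List<FriendRecord> Combine(string[] urls, string[] avatars, string[] names)
    {
        var records = new List<FriendRecord>();
        if (urls.Length != avatars.Length || urls.Length != names.Length)
        {
            return records;
        }
        for (int i = 0; i < urls.Length; i++)
        {
            if (urls[i] == null || avatars[i] == null || names[i] == null) continue;
            records.Add(new FriendRecord
            {
                Id = ExtractId(urls[i]),
                ProfileUrl = urls[i],
                AvatarUrl = avatars[i],
                Name = names[i]
            });
        }
        return records;
    }
}
```

Each record's `AvatarUrl` and `Name` can then go to `GetImag`, and its `Id` can serve as the user's row in the adjacency matrix. Its `ProfileUrl` can be fetched with `GetCont` to crawl the next level of friends.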