using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Net;
1. Obtain the page's original HTML source
Uri url = new Uri("http://www.blogjava.net/wujun");
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream stream = response.GetResponseStream();
StreamReader sr = new StreamReader(stream);
string str = sr.ReadToEnd();
sr.Close();
stream.Close();
response.Close();
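The same download step can also be written with using blocks, so that the response and stream are released even if reading throws. This is only a sketch: PageFetcher, ReadAll and DownloadHtml are hypothetical names that do not appear in the original code, and the UTF-8 encoding is an assumption (the page may declare something else). On recent .NET versions HttpWebRequest is marked obsolete and HttpClient is the usual choice, but the older API is kept here to match the code above.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class PageFetcher
{
    // Hypothetical helper: reads a whole stream as text.
    // UTF-8 is an assumption; a real page may declare a different encoding.
    public static string ReadAll(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Hypothetical helper: downloads the page at the given address and returns its HTML.
    public static string DownloadHtml(string address)
    {
        var request = (HttpWebRequest)WebRequest.Create(new Uri(address));
        // The using blocks dispose the response and its stream even if reading throws.
        using (var response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        {
            return ReadAll(stream);
        }
    }
}
```

With this helper, the nine lines above reduce to a single call to PageFetcher.DownloadHtml("http://www.blogjava.net/wujun").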
Once you have the page's HTML source, scan it for every <a href="url"> tag and extract the URL that follows href.
The regular expression:
Regex regexFindHref = new Regex(
    @"<a\s+([^>]*\s*)?href\s*=\s*(?:""(?<1>[/\a-z0-9_][^""]*)""|'(?<1>[/\a-z0-9_][^']*)'" +
    @"|(?<1>[/\a-z0-9_]\S*))(\s[^>]*)?>(?<2>.*?)</a>",
    RegexOptions.Singleline | RegexOptions.IgnoreCase | RegexOptions.Compiled);
Loop over the matches and read each link address:
for (Match m = regexFindHref.Match(str); m.Success; m = m.NextMatch())
{
    textBox1.Text += m.Groups[1].ToString() + "\n";
}
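The loop above appends every match to the text box as it is found. A possible variation, sketched below, collects the links into a list instead, resolves relative addresses against the page address, and skips duplicates. LinkExtractor and ExtractLinks are hypothetical names, and the pattern is a simplified stand-in with a named group, not the exact expression shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkExtractor
{
    // Simplified pattern (an assumption, not the post's exact regex): the named
    // group "url" captures a double-quoted, single-quoted, or unquoted href value.
    static readonly Regex HrefPattern = new Regex(
        @"<a\s+(?:[^>]*?\s)?href\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)'|(?<url>[^\s>]+))",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Hypothetical helper: returns each distinct link as an absolute URL, in page order.
    public static List<string> ExtractLinks(string html, Uri baseUri)
    {
        var seen = new HashSet<string>();
        var links = new List<string>();
        for (Match m = HrefPattern.Match(html); m.Success; m = m.NextMatch())
        {
            Uri absolute;
            // Resolve relative hrefs against the page address; skip values that
            // are not valid URIs or that were already collected.
            if (Uri.TryCreate(baseUri, m.Groups["url"].Value, out absolute)
                && seen.Add(absolute.AbsoluteUri))
            {
                links.Add(absolute.AbsoluteUri);
            }
        }
        return links;
    }
}
```

The result could then be shown in one assignment, for example textBox1.Text = string.Join("\n", LinkExtractor.ExtractLinks(str, url)), which also avoids rebuilding the text box string on every iteration.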
After running, textBox1 displays every link found on the page:
http://www.dotlucene.net/
http://www.castleproject.org/
http://www.codeplex.com/
http://www.codeproject.com/
http://www.asp.net/
http://www.nhibernate.org/
http://www.blogjava.net/wujun/CommentsRSS.aspx
http://www.blogjava.net/wujun/archive/2006/10/23/47150.html#76745
http://www.blogjava.net/wujun/archive/2006/10/23.html
http://www.blogjava.net/wujun/archive/2006/10/23/76769.html
http://www.blogjava.net/wujun/archive/2006/10/23/76769.html
http://www.blogjava.net/wujun/archive/2006/10/23/76769.html#FeedBack
http://www.blogjava.net/wujun/admin/EditPosts.aspx?postid=76769
http://www.blogjava.net/wujun/AddToFavorite.aspx?id=76769
http://www.blogjava.net/wujun/archive/2006/10/20.html
...... and so on