Crawler
Also known as a spider, a crawler is a program that fetches resources from other websites. In C#/.NET, a basic crawler method looks like this:
// Requires: using System.IO; using System.Net; using System.Text;
protected string GetPageHtml(string url)
{
    string pageInfo;
    try
    {
        // Issue the request and read the response body as GB2312-encoded text.
        WebRequest myReq = WebRequest.Create(url);
        WebResponse myRep = myReq.GetResponse();
        StreamReader reader = new StreamReader(myRep.GetResponseStream(), Encoding.GetEncoding("gb2312"));
        pageInfo = reader.ReadToEnd();
    }
    catch
    {
        // On any failure, return an empty string instead of throwing.
        pageInfo = "";
    }
    return pageInfo;
}
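As a quick usage sketch (the URL below is only a placeholder, and GetPageHtml is assumed to be callable from the same class):

// Hypothetical usage; http://example.com/ is a placeholder URL.
string html = GetPageHtml("http://example.com/");
if (html != "")
    Console.WriteLine(html);
else
    Console.WriteLine("Request failed or page was empty.");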
With the method above, you can obtain the source HTML of the page at a given URL. However, some websites block crawlers, so you need to simulate the requests a browser would make. The specific code is as follows:
protected string GetPageHtml(string url)
{
    string pageInfo;
    try
    {
        HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create(url);
        // Send the Accept and User-Agent headers a typical browser would send,
        // so the server treats the request as coming from IE rather than a bot.
        myReq.Accept = "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, "
            + "application/x-shockwave-flash, application/vnd.ms-excel, "
            + "application/vnd.ms-powerpoint, application/msword, */*";
        myReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)";
        HttpWebResponse myRep = (HttpWebResponse)myReq.GetResponse();
        Stream myStream = myRep.GetResponseStream();
        StreamReader sr = new StreamReader(myStream, Encoding.Default);
        pageInfo = sr.ReadToEnd(); // ReadToEnd already returns a string; no ToString needed
    }
    catch
    {
        pageInfo = "";
    }
    return pageInfo;
}
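One caveat in both versions: the response and reader are never disposed, so the underlying connection may be held open longer than necessary. A more robust variant (a sketch, not the original code above) wraps them in using statements so they are closed even when reading fails:

protected string GetPageHtml(string url)
{
    try
    {
        HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create(url);
        myReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)";
        // using blocks dispose the response and reader automatically.
        using (HttpWebResponse myRep = (HttpWebResponse)myReq.GetResponse())
        using (StreamReader sr = new StreamReader(myRep.GetResponseStream(), Encoding.Default))
        {
            return sr.ReadToEnd();
        }
    }
    catch
    {
        return "";
    }
}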