Writing a Crawler in C#, Version 2.0

Last Update: 2016-06-21 Source: Internet

Author: User


This version mainly targets Baidu Images and implements only the most basic download function. It still has many defects, which will be addressed in future versions.

Open Baidu Images with the browser's developer tools open. You will see that the images are loaded through an AJAX request like the following:

http://image.baidu.com/search/index?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=result&fr=&sf=1&fmq=1466428638972_R&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=UTF-8&word=%E5%94%90%E5%AB%A3&f=3&oq=tangyan&rsp=0

Here, we only need to know that the "word" parameter carries our search keyword, which keeps things simple. By reusing part of the V1.0 code, this version can be developed quickly; the principle is similar to V1.0.
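As a minimal sketch of that idea, the helper below appends a URL-encoded keyword to the query prefix taken from the request captured above. The names SearchUrlBuilder and BuildSearchUrl are hypothetical and do not appear in the original program, which concatenates the raw keyword directly.

```csharp
using System;

public static class SearchUrlBuilder
{
    // Query prefix copied from the captured request, up to and including "word=".
    private const string BaseUrl =
        "http://image.baidu.com/search/index?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1" +
        "&fm=result&fr=&sf=1&fmq=1466428638972_R&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0" +
        "&width=&height=&face=0&istype=2&ie=UTF-8&word=";

    // Returns the search URL for the given keyword (hypothetical helper).
    public static string BuildSearchUrl(string keyword)
    {
        if (keyword == null) throw new ArgumentNullException(nameof(keyword));
        // Percent-encode the keyword as UTF-8 so non-ASCII text is safe in the query string.
        return BaseUrl + Uri.EscapeDataString(keyword.Trim());
    }
}
```

For example, SearchUrlBuilder.BuildSearchUrl("唐嫣") yields a URL ending in word=%E5%94%90%E5%AB%A3, the same encoded keyword seen in the captured request.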

The code-behind is as follows:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using Newtonsoft.Json.Linq;
using Newtonsoft.Json;
using System.Text.RegularExpressions;

namespace DynamicWebCrawlerForBaiduImages
{
    public partial class Form1 : Form
    {
        // Total number of images processed so far
        static int count = 0;

        public Form1()
        {
            InitializeComponent();
        }

        private void btnDo_Click(object sender, EventArgs e)
        {
            int pageCount = 2;
            string keyword = this.keyWords.Text;
            // Note: the URL does not depend on i, so every iteration requests the same page.
            for (int i = 0; i < pageCount; i++)
            {
                HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("http://image.baidu.com/search/index?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=result&fr=&sf=1&fmq=1466307565574_R&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=UTF-8&word=" + keyword);
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                {
                    if (response.StatusCode == HttpStatusCode.OK)
                    {
                        using (Stream stream = response.GetResponseStream())
                        {
                            try
                            {
                                // Download all images on the specified page
                                DownloadPage(stream);
                            }
                            catch (Exception ex)
                            {
                                // Cross-thread access to the UI thread's txtLogs
                            }
                        }
                    }
                    else
                    {
                        // MessageBox.Show("Get page " + pageCount + " failed: " + response.StatusCode);
                    }
                }
            }
            MessageBox.Show("Executed successfully, total " + count.ToString() + " images");
        }

        private static string[] getLinks(string html)
        {
            const string pattern = @"http://([\w\-]+\.)+[\w\-]+(/[\w\-./?%&=]*)?";
            Regex r = new Regex(pattern, RegexOptions.IgnoreCase); // Create the regular expression
            MatchCollection m = r.Matches(html); // Obtain the matches
            // The array is sized for all matches, so trailing entries stay null when some are filtered out.
            string[] links = new string[m.Count];
            int count = 0;
            for (int i = 0; i < m.Count; i++)
            {
                if (isValiable(m[i].ToString()))
                {
                    links[count] = m[i].ToString(); // Keep the image link
                    count++;
                }
            }
            return links;
        }

        private void DownloadPage(Stream stream)
        {
            using (StreamReader reader = new StreamReader(stream))
            {
                string r1;
                StringBuilder sb = new StringBuilder();
                while ((r1 = reader.ReadLine()) != null)
                {
                    sb.Append(r1);
                }
                FileStream aFile = new FileStream("../../txt.txt", FileMode.OpenOrCreate);
                StreamWriter sw = new StreamWriter(aFile); // Store the web page in a text file
                sw.WriteLine(sb.ToString());
                sw.Close();
                string[] s;
                s = getLinks(sb.ToString());
                int i = 0;
                for (i = 0; i < s.Count(); i++)
                {
                    // Skip the unused (null or empty) slots returned by getLinks
                    if (s[i] != null && s[i] != "")
                    {
                        count++;
                        savePicture(s[i]);
                    }
                }
                this.label2.Text = count.ToString();
            }
        }

        private static bool isValiable(string url)
        {
            if (url.Contains(".jpg") || url.Contains(".gif") || url.Contains(".png"))
            {
                return true; // The URL points to an image resource
            }
            return false;
        }

        private static void savePicture(string path)
        {
            DataClasses1DataContext db = new DataClasses1DataContext();
            Uri url = new Uri(path);
            HttpWebRequest webRequest = (HttpWebRequest)HttpWebRequest.Create(url);
            webRequest.Referer = "http://image.baidu.com";
            HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse();
            if (isValiable(path)) // Store the resource in the database only if it is an image
            {
                Bitmap myImage = new Bitmap(webResponse.GetResponseStream());
                MemoryStream ms = new MemoryStream();
                myImage.Save(ms, System.Drawing.Imaging.ImageFormat.Jpeg);
                var p = new pictureUrl { pictureUrl1 = ms.ToArray() };
                db.pictureUrl.InsertOnSubmit(p);
                db.SubmitChanges();
            }
        }
    }
}
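The link-extraction step can also be sketched in isolation. The version below is an illustration rather than the article's exact method: it applies the same URL pattern and the same extension check, but returns a list with no empty slots and skips URLs it has already seen. ImageLinkExtractor, IsImageUrl, and ExtractImageLinks are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImageLinkExtractor
{
    // Same URL pattern as getLinks, with the hyphen escaped inside each character class.
    private static readonly Regex UrlPattern = new Regex(
        @"http://([\w\-]+\.)+[\w\-]+(/[\w\-./?%&=]*)?",
        RegexOptions.IgnoreCase);

    private static readonly string[] ImageExtensions = { ".jpg", ".gif", ".png" };

    // True when the URL contains one of the image extensions (case-sensitive, like isValiable).
    public static bool IsImageUrl(string url)
    {
        foreach (string ext in ImageExtensions)
        {
            if (url.Contains(ext)) return true;
        }
        return false;
    }

    // Returns each distinct image URL found in the HTML, in order of first appearance.
    public static List<string> ExtractImageLinks(string html)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var links = new List<string>();
        foreach (Match match in UrlPattern.Matches(html))
        {
            string url = match.Value;
            // HashSet.Add returns false for duplicates, so each URL is kept only once.
            if (IsImageUrl(url) && seen.Add(url))
            {
                links.Add(url);
            }
        }
        return links;
    }
}
```

Returning a List<string> means the caller no longer needs the null check on each entry, and de-duplication avoids downloading the same image twice when a page repeats a URL.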

Demo result:

This program only solves the basic problem. Many issues remain, and I will continue to address them in future versions.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
