Writing a crawler in C#, version V1.0

Last Update: 2016-06-11 Source: Internet

Author: User


I was reading about the basic data types in SQL Server and found that the image type is rather special.

So I wrote a program that stores images as binary streams. Collecting the images by hand is obviously not desirable, so in this case I wrote a simple crawler in C#. The site we crawl is the astronomy site http://www.tianwenwang.cn/

The principle of the program is to use WebRequest and WebResponse to request the target website (if these are unfamiliar, all I can say is 0.0), and then use StreamWriter to save the page source to a txt file.

In that file we can see a pattern: the image addresses look like http://p.tianwenwang.cn/upload/150318/68181426648163.jpg!list.jpg and http://p.tianwenwang.cn/upload/150312/58341426094069.jpg!list.jpg. We can therefore use a regular expression to pull every http:// address out of the file and put them in a string array.

Finally, each address is checked for a typical image suffix such as jpg, gif, or png (this check is the biggest defect of V1.0). If the address contains one, the image is stored in the database.
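The extraction and filtering steps described above could be sketched roughly as follows. This is only an illustration under assumptions, not the author's code: the class name ImageLinks and its methods Extract and IsImage are hypothetical, the URL pattern is a simplified variant, and testing the extension of the URL path (instead of searching the whole address for ".jpg") is just one possible way to tighten the check.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper: pulls http:// addresses out of page source
// and decides whether an address points at an image.
public static class ImageLinks
{
    // Simplified URL pattern (an assumption for this sketch).
    private static readonly Regex UrlPattern = new Regex(
        @"http://([\w-]+\.)+[\w-]+(/[\w\-./?%&=]*)?",
        RegexOptions.IgnoreCase);

    // Extensions treated as images; compared case-insensitively.
    private static readonly HashSet<string> ImageExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".jpg", ".jpeg", ".gif", ".png"
        };

    // Returns every http:// address found in the HTML, in document order.
    public static List<string> Extract(string html)
    {
        var links = new List<string>();
        foreach (Match m in UrlPattern.Matches(html))
        {
            links.Add(m.Value);
        }
        return links;
    }

    // True when the path part of the URL ends in a known image extension.
    public static bool IsImage(string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
        {
            return false;
        }
        return ImageExtensions.Contains(Path.GetExtension(uri.AbsolutePath));
    }
}
```

With this check, IsImage("http://example.com/page.jpg.html") returns false because the path ends in .html, whereas a plain Contains(".jpg") test would accept it.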

The background code is as follows:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace WebCrawler
{
    public partial class Form1 : Form
    {
        // Returns every http:// address found in the page source.
        private static string[] getLinks(string html)
        {
            const string pattern = @"http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";
            Regex r = new Regex(pattern, RegexOptions.IgnoreCase); // create the regular expression
            MatchCollection m = r.Matches(html); // get the matches
            string[] links = new string[m.Count];
            for (int i = 0; i < m.Count; i++)
            {
                links[i] = m[i].ToString(); // extract each result
            }
            return links;
        }

        // Treats an address as an image if it contains a typical image suffix.
        private static bool isValiable(string url)
        {
            if (url.Contains(".jpg") || url.Contains(".gif") || url.Contains(".png"))
            {
                return true; // the address points at an image resource
            }
            return false;
        }

        // Downloads the resource and, if it is an image, stores its bytes in the database.
        private static void savePicture(string path)
        {
            DataClasses1DataContext db = new DataClasses1DataContext();
            Uri url = new Uri(path);
            WebRequest webRequest = WebRequest.Create(url);
            WebResponse webResponse = webRequest.GetResponse();
            if (isValiable(path)) // store it in the database only if it is an image
            {
                Bitmap myImage = new Bitmap(webResponse.GetResponseStream());
                MemoryStream ms = new MemoryStream();
                myImage.Save(ms, System.Drawing.Imaging.ImageFormat.Jpeg);
                var p = new pictureUrl { pictureUrl1 = ms.ToArray() };
                db.pictureUrl.InsertOnSubmit(p);
                db.SubmitChanges();
            }
        }

        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            string rl;
            string path = this.textBox1.Text;
            Uri url = new Uri(path);
            WebRequest webRequest = WebRequest.Create(url);
            WebResponse webResponse = webRequest.GetResponse();
            Stream resStream = webResponse.GetResponseStream();
            StreamReader sr = new StreamReader(resStream, Encoding.UTF8);
            StringBuilder sb = new StringBuilder();
            while ((rl = sr.ReadLine()) != null)
            {
                sb.Append(rl);
            }
            FileStream aFile = new FileStream("../../txt.txt", FileMode.OpenOrCreate);
            StreamWriter sw = new StreamWriter(aFile); // save the page to the txt file
            sw.WriteLine(sb.ToString());
            sw.Close();
            string[] s;
            s = getLinks(sb.ToString());
            int i = 0;
            foreach (string sl in s)
            {
                i++;
                savePicture(sl);
            }
        }
    }
}
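The download-and-save step can also be factored into small helpers. The following is a minimal sketch under assumptions, not the author's implementation: PageFetcher, ReadAll, Download, and Save are hypothetical names. It wraps the response and the reader in using blocks so they are released, reads the page with ReadToEnd so line breaks are preserved, and writes with File.WriteAllText, which replaces any existing file content. Note that on recent .NET versions WebRequest is marked obsolete in favor of HttpClient; it is used here only to match the article.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper for fetching a page and saving its source.
public static class PageFetcher
{
    // Reads the whole stream as text; the reader and the stream
    // are disposed when the using block exits.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Requests the address and returns the page source decoded as UTF-8.
    public static string Download(string address)
    {
        WebRequest request = WebRequest.Create(new Uri(address));
        using (WebResponse response = request.GetResponse())
        {
            return ReadAll(response.GetResponseStream(), Encoding.UTF8);
        }
    }

    // Writes the source to a file, overwriting whatever was there before.
    public static void Save(string html, string filePath)
    {
        File.WriteAllText(filePath, html, Encoding.UTF8);
    }
}
```

A click handler could then call PageFetcher.Download(textBox1.Text), pass the result to PageFetcher.Save, and hand the same string to the link extraction step.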

This version only crawls websites similar to the astronomy site. I will upgrade the crawler later and try to build a general-purpose one!

This article is an English version of an article originally published in Chinese on aliyun.com.
