Writing a Crawler in C#, Version V1.0

Source: Internet
Author: User

A while ago, while looking at the basic data types in SQL Server, I found the image type to be quite special.

So I wrote a program that stores an image as a binary stream (http://www.cnblogs.com/JsonZhangAA/p/5568575.html). Now, if I want to bulk-store the pictures from a website, do I have to write out every address by hand? Obviously that is not practical. To handle this situation, I wrote a simple crawler in C#; the crawl target is the astronomy site http://www.tianwenwang.cn/.

The principle of the program is to fetch the target site with WebRequest and WebResponse, then use a StreamWriter to save the page's source into a txt file. Looking through that file, we can observe a pattern: the image addresses all look like http://p.tianwenwang.cn/upload/150318/68181426648163.jpg!list.jpg or http://p.tianwenwang.cn/upload/150312/58341426094069.jpg!list.jpg. So we can use a regular expression to extract every http:// link into a string array, and finally store an address in the database only if it contains a typical picture suffix such as .jpg, .gif, or .png (this substring-based suffix check is the biggest flaw of V1.0).
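The extraction step described above can be sketched as a small, self-contained console program. The HTML fragment below is illustrative (not actual page source), and the regex pattern is the same one the crawler uses:

```csharp
using System;
using System.Text.RegularExpressions;

class LinkDemo
{
    static void Main()
    {
        // Illustrative HTML fragment; not real page source from the site.
        string html =
            "<img src=\"http://p.tianwenwang.cn/upload/150318/68181426648163.jpg!list.jpg\">" +
            "<a href=\"http://www.tianwenwang.cn/about.html\">about</a>";

        // The same pattern the crawler uses to pull out http:// links.
        // Note the character class does not include '!', so the match
        // stops just before the "!list.jpg" suffix.
        const string pattern = @"http://([\w-]+\.)+[\w-]+(/[\w-./?%&=]*)?";

        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            // The V1.0 suffix check: a plain substring test.
            bool isImage = m.Value.Contains(".jpg")
                        || m.Value.Contains(".gif")
                        || m.Value.Contains(".png");
            Console.WriteLine(m.Value + (isImage ? "  [image]" : ""));
        }
    }
}
```

Running this prints both URLs, with only the first one flagged as an image; a URL such as http://example.com/jpg-tutorial.html would also pass the substring test, which is exactly the V1.0 flaw mentioned above.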

The code-behind is as follows:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace WebCrawler
{
    public partial class Form1 : Form
    {
        private static string[] GetLinks(string html)
        {
            const string pattern = @"http://([\w-]+\.)+[\w-]+(/[\w-./?%&=]*)?";
            Regex r = new Regex(pattern, RegexOptions.IgnoreCase); // build the regular expression
            MatchCollection m = r.Matches(html);                   // collect all matches
            string[] links = new string[m.Count];
            for (int i = 0; i < m.Count; i++)
            {
                links[i] = m[i].ToString();                        // extract each matched URL
            }
            return links;
        }

        private static bool IsAvailable(string url)
        {
            if (url.Contains(".jpg") || url.Contains(".gif") || url.Contains(".png"))
            {
                return true; // the URL points to an image resource
            }
            return false;
        }

        private static void SavePicture(string path)
        {
            DataClasses1DataContext db = new DataClasses1DataContext();
            Uri url = new Uri(path);
            WebRequest webRequest = WebRequest.Create(url);
            WebResponse webResponse = webRequest.GetResponse();
            if (IsAvailable(path)) // only store image URLs in the database
            {
                Bitmap myImage = new Bitmap(webResponse.GetResponseStream());
                MemoryStream ms = new MemoryStream();
                myImage.Save(ms, System.Drawing.Imaging.ImageFormat.Jpeg);
                var p = new PictureUrl { PictureUrl1 = ms.ToArray() };
                db.PictureUrl.InsertOnSubmit(p);
                db.SubmitChanges();
            }
        }

        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            string rl;
            string path = this.textBox1.Text;
            Uri url = new Uri(path);
            WebRequest webRequest = WebRequest.Create(url);
            WebResponse webResponse = webRequest.GetResponse();
            Stream resStream = webResponse.GetResponseStream();
            StreamReader sr = new StreamReader(resStream, Encoding.UTF8);
            StringBuilder sb = new StringBuilder();
            while ((rl = sr.ReadLine()) != null)
            {
                sb.Append(rl);
            }
            FileStream aFile = new FileStream("../../txt.txt", FileMode.OpenOrCreate);
            StreamWriter sw = new StreamWriter(aFile); // dump the page source to a txt file
            sw.WriteLine(sb.ToString());
            sw.Close();
            string[] s = GetLinks(sb.ToString());
            int i = 0;
            foreach (string sl in s)
            {
                i++;
                SavePicture(sl);
            }
        }
    }
}

This version can only crawl sites similar to the astronomy network; in follow-up versions I will strive to make a generic crawler o(∩_∩)o~!
