Use regular expressions in C# to filter out image URLs and download the images to a local folder

Source: Internet
Author: User

This program mainly draws on the following:

1. Regular expressions

2. Downloading files in C#

3. Generic collections

4. Basic process operations (ending the current program)

Here is a brief look at how these pieces are used. First, what the program does: a text file already exists whose contents were copied from the source code of a web page. The task is to pick out the image URLs that begin with http, https, or ftp and end with .jpg, .png, or .gif, and then download the image behind each of those links. After some analysis, I decided to filter the URLs with a regular expression and to use the WebClient class for downloading. The code is as follows:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.IO;
    using System.Net;
    using System.Text.RegularExpressions;

    namespace UrlRegex
    {
        class Program
        {
            // Extract every image URL found in the given text.
            public static List<string> GetUrl(string data)
            {
                // Generic collection that holds the matched URLs
                List<string> strUrl = new List<string>();
                // Scheme, host, path, then an image extension
                string regexStr = @"(http|ftp|https)://([\w-]+\.)+[\w-]+(/[\w./?%&=-]*)+\.(png|jpg|gif)";
                Regex reg = new Regex(regexStr, RegexOptions.IgnoreCase);
                MatchCollection mc = reg.Matches(data);

                if (mc.Count <= 0) // no valid URL was found
                {
                    Console.WriteLine("No matching URL was found. Press any key to exit the program.");
                    Console.ReadKey();
                    Process.GetCurrentProcess().Kill();
                }

                for (int i = 0; i < mc.Count; i++)
                {
                    strUrl.Add(mc[i].Groups[0].Value); // Groups[0] is the whole match
                }
                return strUrl;
            }

            // Download every URL into a "photos" folder under the current directory.
            public static void DownLoad(List<string> tempUrl)
            {
                string currentPath = Environment.CurrentDirectory;
                string currentPathPhotos = Path.Combine(currentPath, "photos");
                Directory.CreateDirectory(currentPathPhotos);

                WebClient myDownload = new WebClient();
                int i = 1; // used to name the saved images: 1.jpg, 2.png, ...

                // The dot is escaped and the pattern is anchored to the end of the URL.
                Regex regJpg = new Regex(@"\.jpg$", RegexOptions.IgnoreCase | RegexOptions.RightToLeft);
                Regex regPng = new Regex(@"\.png$", RegexOptions.IgnoreCase | RegexOptions.RightToLeft);

                foreach (string temp in tempUrl)
                {
                    string extension;
                    if (regJpg.IsMatch(temp))
                    {
                        extension = ".jpg";
                    }
                    else if (regPng.IsMatch(temp))
                    {
                        extension = ".png";
                    }
                    else
                    {
                        extension = ".gif"; // GetUrl only returns .jpg, .png, or .gif URLs
                    }

                    string filePath = Path.Combine(currentPathPhotos, i + extension);
                    try
                    {
                        myDownload.DownloadFile(temp, filePath);
                        Console.WriteLine("Download Successful");
                        i++;
                    }
                    catch (WebException)
                    {
                        Console.WriteLine("Download Failed");
                    }
                }

                myDownload.Dispose();
                // Show the results in Explorer as soon as the downloads finish
                Process.Start("explorer", currentPathPhotos);
            }

            public static void Main()
            {
                string currentPath = Environment.CurrentDirectory;
                // Read the saved page source
                string source = File.ReadAllText(Path.Combine(currentPath, "test.txt"));
                List<string> temp = GetUrl(source); // filter the URLs

                Console.WriteLine("The filtered URL addresses are as follows:");
                foreach (string t in temp)
                {
                    Console.WriteLine(t);
                }

                Console.WriteLine("Downloading pictures ...");
                DownLoad(temp);
                Console.WriteLine("\nDownload finished. Press any key to exit.");
                Console.ReadKey();
            }
        }
    }
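To look at the filtering step on its own, here is a minimal sketch that runs the same kind of pattern over a short string. The class name UrlFilterDemo, the method name Extract, and the example.com and example.org addresses are made up for illustration and are not part of the original program. The hyphen is placed last inside the character class so that it is clearly read as a literal character.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlFilterDemo
{
    // Scheme, host, path, then an image extension (matched case-insensitively).
    private static readonly Regex ImageUrl = new Regex(
        @"(http|https|ftp)://([\w-]+\.)+[\w-]+(/[\w./?%&=-]*)+\.(png|jpg|gif)",
        RegexOptions.IgnoreCase);

    // Collect every image URL found in the text into a generic list.
    public static List<string> Extract(string text)
    {
        var urls = new List<string>();
        foreach (Match m in ImageUrl.Matches(text))
        {
            urls.Add(m.Value);
        }
        return urls;
    }

    public static void Main()
    {
        // Hypothetical page fragment: two image links and one ordinary link.
        string html =
            "<img src=\"http://example.com/img/a.jpg\"> " +
            "<a href=\"https://example.com/page.html\">x</a> " +
            "<img src=\"ftp://files.example.org/pics/b.PNG\">";

        foreach (string url in Extract(html))
        {
            Console.WriteLine(url);
        }
    }
}
```

With this input, only the two image links should be printed; the .html link is skipped because it does not end in one of the three extensions.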

The difficult parts were:

1. Building the regular expression. I had only just started working with regular expressions and was not familiar with how to construct them, so I looked up a lot of material on Baidu and studied similar expressions written by others before writing this one.

2. Exception handling, for example a file that fails to open, a download that fails, or no valid URL being found. (Solution: wrap these operations in try and catch, and end the current process in the catch block.)
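The try and catch idea can be sketched as two small helpers that return false instead of letting the exception escape. This is only one possible arrangement, not the original author's code: the names SafeIoDemo, TryReadSource, and TryDownload are hypothetical. WebClient is used because the program above uses it; newer .NET versions mark it as obsolete in favor of HttpClient, so expect a compiler warning there.

```csharp
using System;
using System.IO;
using System.Net;

public static class SafeIoDemo
{
    // Returns false instead of throwing when the file cannot be read.
    public static bool TryReadSource(string path, out string text)
    {
        try
        {
            text = File.ReadAllText(path);
            return true;
        }
        catch (IOException ex) // missing file or directory, file in use, ...
        {
            Console.WriteLine("Could not read " + path + ": " + ex.Message);
        }
        catch (UnauthorizedAccessException ex) // no permission to read the file
        {
            Console.WriteLine("Access denied for " + path + ": " + ex.Message);
        }
        text = string.Empty;
        return false;
    }

    // Returns false instead of throwing when a download fails.
    public static bool TryDownload(WebClient client, string url, string targetPath)
    {
        try
        {
            client.DownloadFile(url, targetPath);
            return true;
        }
        catch (WebException ex) // network error, HTTP error status, ...
        {
            Console.WriteLine("Download failed for " + url + ": " + ex.Message);
            return false;
        }
    }

    public static void Main()
    {
        string source;
        if (!TryReadSource("test.txt", out source))
        {
            // The caller decides what to do next, for example ending the program.
            Console.WriteLine("Nothing to process.");
            return;
        }
        Console.WriteLine("Read " + source.Length + " characters.");
    }
}
```

Returning a flag lets the calling code choose between skipping one failed image and stopping the whole program, so a single bad link does not have to end the run.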
