The main knowledge points used in this program are as follows:
1. Regular expressions
2. Implementing file download in C#
3. Using generic collections
4. Simple process operations (ending the current program)
Here is a brief explanation of how these points are used. First, what the program does: an existing text file contains source code copied from a web page. The task is to filter out the image URLs that begin with http, https, or ftp and end with .jpg, .png, or .gif, and then download the corresponding images from those URLs. After some analysis, I decided to filter the URLs with a regular expression and to use the WebClient class to implement the download. The code is as follows:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

namespace UrlRegex
{
    class Program
    {
        // Get the image URLs contained in the page source.
        public static List<string> GetUrl(string data)
        {
            List<string> strUrl = new List<string>(); // generic collection that holds the crawled URLs
            string regexStr = @"(http|ftp|https)://([\w-]+\.)+[\w-]+(/[\w-./?%&=]*)+\.(png|jpg|gif)"; // regular expression that finds image URLs
            Regex reg = new Regex(regexStr, RegexOptions.IgnoreCase); // instantiate the Regex class
            MatchCollection mc = reg.Matches(data); // run the match
            if (mc.Count <= 0) // no valid URL was crawled
            {
                Console.WriteLine("No qualifying URL was crawled, press any key to exit the program");
                Console.ReadKey();
                Process.GetCurrentProcess().Kill();
            }
            for (int i = 0; i < mc.Count; i++)
            {
                strUrl.Add(mc[i].Groups[0].Value); // load each match into the generic collection
            }
            return strUrl; // return the generic collection
        }

        // Download every image in the list into a photos folder.
        public static void DownLoad(List<string> tempUrl)
        {
            string currentPath = Environment.CurrentDirectory; // get the current directory
            string currentPathPhotos = currentPath + @"\photos\"; // path of the photos folder
            Directory.CreateDirectory(currentPathPhotos); // create the photos folder under the current directory

            using (WebClient myDownload = new WebClient()) // WebClient performs the download
            {
                int i = 1; // used to name the pictures
                foreach (string temp in tempUrl) // traverse the picture URLs, download and save each one
                {
                    // Every URL matched by GetUrl ends in .png, .jpg, or .gif,
                    // so the extension is the text from the last dot onward.
                    string extension = temp.Substring(temp.LastIndexOf('.')).ToLower();
                    string filePath = currentPathPhotos + i + extension;
                    try
                    {
                        myDownload.DownloadFile(temp, filePath);
                        Console.WriteLine("Download successful");
                        i++;
                    }
                    catch
                    {
                        Console.WriteLine("Download failed");
                    }
                }
            }
            Process.Start("explorer", currentPathPhotos); // show the results immediately after completion
        }

        public static void Main()
        {
            string currentPath = Environment.CurrentDirectory;
            string source = File.ReadAllText(currentPath + @"\test.txt"); // read in the file
            List<string> temp = GetUrl(source); // filter the URLs
            Console.WriteLine("The filtered URL addresses are as follows:");
            foreach (string t in temp)
            {
                Console.WriteLine(t); // output the URL
            }
            Console.WriteLine("Downloading pictures ...");
            DownLoad(temp); // download the images
            Console.WriteLine("\nDownload finished, press any key to exit");
            Console.ReadKey();
        }
    }
}
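The regular expression can also be tried on its own, without reading a file or downloading anything. The following is only a minimal sketch: the class name UrlFilterSketch and the method name ExtractImageUrls are hypothetical names chosen for illustration, and the method returns an empty list when nothing matches instead of ending the process.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlFilterSketch
{
    // The same pattern as in the program: scheme, dotted host name,
    // one or more path segments, and a .png/.jpg/.gif ending.
    private static readonly Regex ImageUrl = new Regex(
        @"(http|ftp|https)://([\w-]+\.)+[\w-]+(/[\w-./?%&=]*)+\.(png|jpg|gif)",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns every image URL found in the text.
    public static List<string> ExtractImageUrls(string text)
    {
        List<string> urls = new List<string>();
        foreach (Match m in ImageUrl.Matches(text))
        {
            urls.Add(m.Value); // m.Value is the whole match, same as Groups[0].Value
        }
        return urls;
    }
}
```

Given a string that contains one .jpg link, one .html link, and one link written as HTTPS://...PNG, this should return only the two image links, because RegexOptions.IgnoreCase lets the uppercase scheme and extension match.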
The difficulties were:
1. Constructing the regular expression. I had only just started working with regular expressions and was not familiar with how to build them, so I searched Baidu, read a lot of material, and looked at similar regular expressions written by others before writing this one.
2. Handling exceptions, for example a file that fails to open, a download that fails, or no correct URL being found. (Solution: add try and catch, and end the current process in the catch.)
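The exception cases listed above can be wrapped in small helpers. This is a sketch under assumptions, not the original program's code: SafeIoSketch, TryReadSource, and TryDownload are hypothetical names, and the helpers return false rather than ending the process, so the caller can decide whether to call Process.GetCurrentProcess().Kill() or simply continue.

```csharp
using System;
using System.IO;
using System.Net;

public static class SafeIoSketch
{
    // Hypothetical helper: reads the whole file, or reports why it could not.
    public static bool TryReadSource(string path, out string text)
    {
        try
        {
            text = File.ReadAllText(path);
            return true;
        }
        catch (IOException ex) // covers a missing file or a missing directory
        {
            Console.WriteLine("Could not read " + path + ": " + ex.Message);
        }
        catch (UnauthorizedAccessException ex)
        {
            Console.WriteLine("Access denied for " + path + ": " + ex.Message);
        }
        text = null;
        return false;
    }

    // Hypothetical helper: downloads one file and reports a failure instead of throwing.
    public static bool TryDownload(WebClient client, string url, string filePath)
    {
        try
        {
            client.DownloadFile(url, filePath);
            return true;
        }
        catch (WebException ex) // raised by DownloadFile when the download fails
        {
            Console.WriteLine("Download failed for " + url + ": " + ex.Message);
            return false;
        }
    }
}
```

Catching specific exception types, rather than using a bare catch, makes it possible to print a message that says which step failed and why.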
Using regular expressions in C# to filter out picture URLs and download the pictures from those URLs to the local machine