Write Your Own Tools: A Baidu Image Batch Downloader


Opening: Sometimes we want to save the pictures that a Baidu image search turns up, but downloading and saving them one by one is tedious and time-consuming. Is there a way to cut down the workload and still let us browse plenty of nice pictures comfortably, even offline? Naturally, we think of using web crawling to download the images for us and save them to a folder we specify. Let's look at how to design and develop such a batch image downloader.

First, about web crawling and web spiders

The main job of a web spider is to continuously download network resources from the Internet. Its basic idea is to start from one or more entry URLs, download and analyze the network resources those URLs point to, extract the URLs contained in those resources to obtain more URLs, and repeat the process until no new URLs are available.

Implementing a web spider can generally be broken down into the following steps (a minimal code sketch of this loop follows the list):

(1) Specify one (or more) entry URLs (such as http://www.xx.com) and add them to the download queue (at this point the download queue contains only these entry URLs).
(2) A thread responsible for downloading network resources takes one or more URLs from the download queue and downloads the resources they point to locally (before downloading, it generally checks whether the URL has already been downloaded and, if so, ignores it). If the download queue is empty and all download threads are idle, then every network resource reachable from the entry URLs has been downloaded; the spider reports that the download is complete and stops.
(3) Analyze the resources downloaded in step 2 but not yet parsed (generally HTML code) and extract the URLs they contain (for example, the href attribute of <a> tags).
(4) Add the URLs obtained in step 3 to the download queue and go back to step 2.
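
As a rough illustration of this loop, here is a minimal, single-threaded sketch in C#. The class and method names (SimpleSpider, Crawl) are made up for illustration, and a real spider would also need multiple download threads, duplicate filtering beyond exact URLs, politeness delays, and robots.txt handling:

    using System.Collections.Generic;
    using System.Net;
    using System.Text.RegularExpressions;

    class SimpleSpider
    {
        // Crawls outward from entryUrl until the queue is empty or maxPages pages have been fetched.
        public static void Crawl(string entryUrl, int maxPages)
        {
            var queue = new Queue<string>();       // step (1): the download queue
            var visited = new HashSet<string>();   // URLs that have already been downloaded
            queue.Enqueue(entryUrl);

            using (var client = new WebClient())
            {
                while (queue.Count > 0 && visited.Count < maxPages)
                {
                    string url = queue.Dequeue();
                    if (!visited.Add(url)) continue;           // step (2): skip already-downloaded URLs

                    string html;
                    try { html = client.DownloadString(url); } // step (2): download the resource
                    catch (WebException) { continue; }         // unreachable resources are simply skipped

                    // step (3): pull href values out of the HTML ...
                    // step (4): ... and feed them back into the download queue
                    foreach (Match m in Regex.Matches(html, "href=\"(http[^\"]+)\""))
                        queue.Enqueue(m.Groups[1].Value);
                }
            }
        }
    }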

Second, about the image batch downloader

2.1 Manual downloading is a heavy workload

Normally, we go to Baidu image search, look for pictures, and save them locally to browse or reuse later. However, if we need a large number of pictures on the same subject, downloading and saving them one at a time by hand is very inefficient. At that point, we can't help wanting to find a way to make the computer do this job for us!

At first, I racked my brain without finding a way. Then we opened the F12 developer tools and spotted an AJAX request that looked rather interesting:

Looking at the HTTP messages for this AJAX request, we found that it returned a large string of JSON data. Copying it into an online JSON viewer (http://www.bejson.com/jsonview2/) showed that the entire list of picture information is returned to the browser in this JSON.

2.2 Bulk downloading for comfortable browsing

(1) After seeing the request above, we have a rough idea of what to do. Let's start by analyzing the address of that AJAX request:

Request URL:    http://image.baidu.com/i?tn=resultjsonavatarnew&ie=utf-8&word=%e5%ae%8b%e6%99%ba%e5%ad%9d&cg=star&pn=60&rn=60&z=&itg=0&fr=&width=&height=&lm=-1&ic=0&s=0
Request Method: GET
Status Code:    200 OK

① This AJAX request is sent with the GET method, so all parameters are passed in the query string appended to the URL, including the search term we entered, the page size, the current page number, and so on;

② Next, look at the parameters after the request address and pick out the important ones. word is the search keyword, simply URL-encoded; rn is the page capacity (that is, the page size, or how many pictures are on one page; the default here is 60); and pn indicates the total number of images requested so far, so the current page can be obtained from pn/rn. For example, here pn=60 and rn=60, so the first page is being requested (a small sketch of how these parameters combine into a request URL follows).
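
To make the roles of word, rn and pn concrete, here is a small sketch that builds the request URL for a given keyword and page number. It follows the parameters captured above; the endpoint and its parameters belong to Baidu and may of course change at any time:

    using System;

    static class BaiduUrlBuilder
    {
        // pageNumber is 1-based; per the captured request, the first page has pn = 60 when rn = 60,
        // i.e. pn / rn gives the page number.
        public static string BuildPageUrl(string keyword, int pageNumber, int rn = 60)
        {
            int pn = pageNumber * rn;
            return "http://image.baidu.com/i?tn=resultjsonavatarnew&ie=utf-8"
                   + "&word=" + Uri.EscapeDataString(keyword)   // search term, URL-encoded
                   + "&pn=" + pn                                // which block of rn images we want
                   + "&rn=" + rn                                // page capacity (images per page)
                   + "&itg=0&lm=-1&ic=0&s=0";
        }
    }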

(2) Now let's lay out the workflow of this downloader:

(3) And here is what the finished image downloader looks like:

Third, implementation of the key code

3.1 Declare an asynchronous delegate to perform the picture download, separate from the UI thread, so the interface does not freeze
    // Declare an asynchronous delegate to handle the picture download operation
    Action downloadAction = new Action(() =>
    {
        ProcessDownload(keyword);
    });

    // Declare a callback function for when the download is complete
    AsyncCallback callBack = new AsyncCallback(asyncResult =>
    {
        downloadAction.EndInvoke(asyncResult);
        progressBar.BeginInvoke(new Action(() =>
        {
            progressBar.Value = progressBar.Maximum;
        }));
        txtLogs.BeginInvoke(new Action(() =>
        {
            txtLogs.AppendText("Download picture operation is over!" + Environment.NewLine);
        }));
        btnStart.BeginInvoke(new Action(() =>
        {
            btnStart.Enabled = true;
        }));
    });

    // Execute the asynchronous delegate
    IAsyncResult result = downloadAction.BeginInvoke(callBack, null);

    // The main thread continues with its own work
    txtLogs.AppendText("Downloading pictures ..." + Environment.NewLine);

The key to using an asynchronous delegate is to set its callback function: in the callback we end the thread's work with EndInvoke, and we update the UI safely across threads through each control's BeginInvoke (similar in spirit to working with delegates in general).
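
For comparison, the same flow written against the newer Task API (.NET 4.5 and later) would look roughly like the sketch below; keyword, ProcessDownload and the control names are the ones used above, and the fragment is assumed to run on the UI thread (for example inside the start button's click handler):

    // Requires: using System.Threading.Tasks;
    txtLogs.AppendText("Downloading pictures ..." + Environment.NewLine);

    // Run the download off the UI thread, then marshal the finishing UI updates back onto it.
    Task.Run(() => ProcessDownload(keyword))
        .ContinueWith(t =>
        {
            progressBar.Value = progressBar.Maximum;
            txtLogs.AppendText("Download picture operation is over!" + Environment.NewLine);
            btnStart.Enabled = true;
        }, TaskScheduler.FromCurrentSynchronizationContext());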

3.2 Use HttpWebRequest to make HTTP requests to the specified server
    private void ProcessDownload(string keyword)
    {
        int pageCount = (int)numPageCount.Value;
        sumCount = pageCount * 60;   // 60 images per page (rn = 60)
        for (int i = 0; i < pageCount; i++)
        {
            // Per the analysis above, pn / rn gives the page number, so page i + 1 needs pn = (i + 1) * 60
            HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(
                "http://image.baidu.com/i?tn=resultjsonavatarnew&ie=utf-8&word="
                + Uri.EscapeDataString(keyword)
                + "&pn=" + ((i + 1) * 60)
                + "&cg=girl&rn=60&itg=0&lm=-1&ic=0&s=0");
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                if (response.StatusCode == HttpStatusCode.OK)
                {
                    using (Stream stream = response.GetResponseStream())
                    {
                        try
                        {
                            // Download all pictures of the specified page
                            DownloadPage(stream);
                        }
                        catch (Exception ex)
                        {
                            // Access the txtLogs control on the UI thread across threads
                            txtLogs.BeginInvoke(new Action(() =>
                            {
                                txtLogs.AppendText(ex.Message + Environment.NewLine);
                            }));
                        }
                    }
                }
                else
                {
                    MessageBox.Show("Failed to get page " + (i + 1) + ": " + response.StatusCode);
                }
            }
        }
    }

Here, try...catch is used to write any exception encountered during the download into the txtLogs text box.

3.3 Parse the JSON data with a third-party JSON component
    private void DownloadPage(Stream stream)
    {
        using (StreamReader reader = new StreamReader(stream))
        {
            string jsonData = reader.ReadToEnd();

            // Parse the JSON returned by the server
            JObject objectRoot = JsonConvert.DeserializeObject(jsonData) as JObject;
            JArray imgsArray = objectRoot["imgs"] as JArray;
            for (int i = 0; i < imgsArray.Count; i++)
            {
                JObject img = imgsArray[i] as JObject;
                string objUrl = (string)img["objURL"];
                // txtLogs.AppendText(objUrl + Environment.NewLine); // test: print the picture path
                try
                {
                    // Download the specific picture
                    DownloadImage(objUrl);

                    // Update the progress bar
                    progressBar.BeginInvoke(new Action(() =>
                    {
                        progressBar.Value = i * 100 / sumCount;
                    }));

                    // Update the text box
                    txtLogs.BeginInvoke(new Action(() =>
                    {
                        txtLogs.AppendText("Downloaded: " + objUrl + Environment.NewLine);
                    }));
                }
                catch (Exception ex)
                {
                    // Access the txtLogs control on the UI thread across threads
                    txtLogs.BeginInvoke(new Action(() =>
                    {
                        txtLogs.AppendText("Exception: " + ex.Message + Environment.NewLine);
                    }));
                }
            }
        }
    }

The Newtonsoft.Json component is used here: in the returned JSON data, find the imgs collection, traverse it, read each objURL, and download the image locally.
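
If you prefer strongly typed access over JObject indexing, Newtonsoft.Json can map the same response onto small DTO classes. The property names below ("imgs", "objURL") are simply the fields this downloader relies on, as observed in the captured response; Baidu may change them without notice:

    using System.Collections.Generic;
    using Newtonsoft.Json;

    // Minimal view of the response: only the fields the downloader actually uses.
    class BaiduImageResult
    {
        [JsonProperty("imgs")]
        public List<BaiduImage> Imgs { get; set; }
    }

    class BaiduImage
    {
        [JsonProperty("objURL")]
        public string ObjUrl { get; set; }
    }

    // Usage:
    //   var result = JsonConvert.DeserializeObject<BaiduImageResult>(jsonData);
    //   foreach (var img in result.Imgs) DownloadImage(img.ObjUrl);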

3.4 Forge the Referer and save the image locally with FileStream
    private void DownloadImage(string objUrl)
    {
        string destFileName = Path.Combine(destDir, Path.GetFileName(objUrl));
        HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(objUrl);

        // Spoof the Referer so the server treats this as an on-site request
        request.Referer = "http://image.baidu.com";

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            if (response.StatusCode == HttpStatusCode.OK)
            {
                using (Stream stream = response.GetResponseStream())
                {
                    using (FileStream fileStream = new FileStream(destFileName, FileMode.Create))
                    {
                        stream.CopyTo(fileStream);
                    }
                }
            }
            else
            {
                throw new Exception("Download " + objUrl + " failed, error code: " + response.StatusCode);
            }
        }
    }

Here we forge the Referer on the client side so the server mistakenly believes the request comes from its own site (our forged request isn't cheating it of any traffic), and then save the returned image response stream to the specified folder through a FileStream.
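
The same trick can also be expressed with HttpClient instead of HttpWebRequest. The sketch below makes the same assumptions as above: destDir is the destination folder chosen elsewhere in the form, and the client is created once and reused:

    using System;
    using System.IO;
    using System.Net.Http;
    using System.Threading.Tasks;

    static class ImageFetcher
    {
        // Created once; the Referer header again presents us as an on-site request.
        private static readonly HttpClient Client = CreateClient();

        private static HttpClient CreateClient()
        {
            var client = new HttpClient();
            client.DefaultRequestHeaders.Referrer = new Uri("http://image.baidu.com");
            return client;
        }

        public static async Task DownloadImageAsync(string objUrl, string destDir)
        {
            string destFileName = Path.Combine(destDir, Path.GetFileName(objUrl));
            // GetByteArrayAsync throws HttpRequestException for non-success status codes.
            byte[] bytes = await Client.GetByteArrayAsync(objUrl);
            File.WriteAllBytes(destFileName, bytes);
        }
    }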

Fourth, personal development summary

4.1 Running result demo

Here we bulk download one page (60 photos) of "Beauty" pictures to the specified folder and see whether the downloader really downloaded the images for us:

(1) The process of running the program:

(2) After downloading the picture folder:

4.2 Changing the search term

Here we changed "Beauty" to "Song Ji Hyo", found that the downloader did not successfully download the picture. After analysis, the original Baidu image search, each search word generated by the Ajax requests are different, so this downloader is not universal, that is, each time the replacement of the search term needs to change the code, mainly to change HttpWebRequest that URL address.

(1) Change the code at the URL:

(2) The process of running the program:

(3) Download the image file:
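
One way to soften that limitation is to build the request URL from whatever the user typed into the form instead of hard-coding it. The sketch below is only an assumption about what such wiring could look like: txtKeyword and cmbCategory are hypothetical control names, and the category values ("girl", "star", ...) are just the ones seen in the captured requests, so they may not cover every search term:

    // Hypothetical event handler: read the search term and category from the form so that
    // changing the keyword no longer requires editing and recompiling the code.
    private void btnStart_Click(object sender, EventArgs e)
    {
        string keyword = txtKeyword.Text.Trim();
        string category = cmbCategory.SelectedItem as string ?? "";

        string url = "http://image.baidu.com/i?tn=resultjsonavatarnew&ie=utf-8"
                     + "&word=" + Uri.EscapeDataString(keyword)
                     + (category.Length > 0 ? "&cg=" + category : "")
                     + "&pn=60&rn=60&itg=0&lm=-1&ic=0&s=0";

        // ... hand the URL to ProcessDownload / HttpWebRequest as before ...
    }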

4.3 A summary that isn't a summary

This time we implemented a small tool that downloads the images we search for into the specified pictures folder, so we can comfortably browse them offline. Designing and developing such a tool mostly comes down to: analyzing HTTP messages, parsing the returned data, creating and synchronizing threads, asynchronous operations, file streams, progress-bar updates (cross-thread calls), and so on; this project touched on a bit of each. Of course, there are still many shortcomings: the tool is not very general, the code has to be changed whenever the search term changes, and little of it is configurable. A demo of my implementation is attached below; interested readers are welcome to modify and extend it themselves.

Resources

(1) Yang Zhengko, "Write your own beautiful picture downloader": http://www.rupeng.com/courses/index/14

(2) Frozen Heart, "Implementing a web spider to crawl network resources in C# 2.0": http://www.cnblogs.com/yibinboy/articles/1236356.html

Attachment download

Mypicturedownloader V1.0:HTTP://PAN.BAIDU.COM/S/1KTVFLJP

Zhou Xurong

Source: http://edisonchou.cnblogs.com/

The copyright of this article is shared by the author and the blog park (cnblogs). Reprinting is welcome, but without the author's consent you must retain this statement and provide a link to the original article in a prominent place on the page.
