Batch retrieving and downloading image links from web pages


Back when I used Windows, I often visited wallpaper sites, QQ photo albums, and other image galleries, and downloading meant right-clicking and choosing "Save as" again and again. Run into a large image set and that kind of repetition quickly kills any motivation to keep downloading. Later I used a Firefox plug-in, which I believe was DownThemAll (I don't remember exactly, but it batch-downloads the links on a page and can filter which images to fetch), and it greatly improved download efficiency, though when a page has too many small images, filtering and deleting the junk files still takes a long time. I have been on Ubuntu for years now, with no Windows and no Thunder (Xunlei), so how do I batch-download images from web pages? In Chrome I kept hoping to find such a plug-in, but the only thing I found was IMG inspector. It works by defining a reference URL with placeholders, then setting a step size and loop range to regenerate the links and preview them. I have to say this feature is far too weak: even the URLs within one image set are not necessarily regular, so the approach is neither desirable nor practical.
That got me looking at Chrome extension development, but I never had the motivation to actually do it. I also did not know whether an extension could really solve the download problem: would it have to call an external download client, or could it use the browser's own native downloading? Either way I would have to study the more advanced Chrome APIs, the development cost would go up, and so I gave up. Still, the idea kept turning over in my head, so I broke it down into a few smaller questions (environment: Ubuntu + Chrome/Firefox):

1) How to obtain the image addresses of the current page?

The simplest approach is to run a script in the Chrome console or in Firebug. I also considered crawler-style tools such as Simple HTML DOM, a powerful open-source framework (if you are familiar with jQuery, it is very convenient because it lets you use jQuery-like selectors on the server side to grab tags), but execution would likely be slower and the complexity higher.
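For reference, a minimal sketch of this kind of console script (not the author's exact code), assuming jQuery is available on the page:

```javascript
// Paste into the Chrome console or Firebug to list every image address
// on the current page. Assumes the page already includes jQuery; if it
// does not, inject it first, e.g.:
//   var s = document.createElement('script');
//   s.src = 'https://code.jquery.com/jquery-1.12.4.min.js';
//   document.head.appendChild(s);
var urls = $('img').map(function () { return this.src; }).get();
console.log(urls.join('\n'));
```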

2) How can I tell whether the images on the current page suit my "appetite"?

There are two cases to consider. Web images generally come as thumbnails or as source images. A thumbnail usually carries a link to its source image; that is, the img tag is wrapped in an <a> tag, and what we want is the href of the <a> tag, not the src of the img tag. A source image is usually just a bare img tag, so those can be filtered directly by the width and height of the Image object. For thumbnails, you could assign the link target to a new Image object's src and then filter by its width and height, but I usually do not bother filtering the linked images that way, because they tend to be large anyway. What really needs filtering are the img elements sitting inside <a> tags themselves, because many logos and buttons are exactly that: small images wrapped in links.
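Extending the sketch above, a hedged example of how that filtering might look (the 200-pixel threshold and the extension check are arbitrary illustration values, not the author's):

```javascript
// Collect image URLs, preferring the link target for <a>-wrapped
// thumbnails and filtering out small images (logos, buttons) by the
// element's on-page size. MIN_SIZE is an arbitrary example threshold.
var MIN_SIZE = 200;
var urls = [];

$('img').each(function () {
    var $img = $(this);
    var link = $img.closest('a').attr('href');

    if (link && /\.(jpe?g|png|gif)(\?.*)?$/i.test(link)) {
        // Thumbnail wrapped in a link to the source image: filter the
        // thumbnail itself by size, then keep the href, not the src.
        if ($img.width() >= MIN_SIZE / 2 && $img.height() >= MIN_SIZE / 2) {
            urls.push(link);
        }
    } else if ($img.width() >= MIN_SIZE && $img.height() >= MIN_SIZE) {
        // Bare <img>: filter directly by its own dimensions.
        urls.push(this.src);
    }
});

console.log(urls.join('\n'));
```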

3) How to download the images?

The two steps above are done with a console script, and the whole thing is no more than ten lines of code (including the code that loads jQuery). So with almost no effort you end up with the filtered image addresses of the current page. Unfortunately, at this point the addresses by themselves are useless; the genuinely tedious part is how to get all of these images onto the local machine in one go. If I knew how to develop Chrome extensions, knew how Chrome can invoke system programs (in fact I am not even sure it can; if the browser's security restrictions are strict enough, it certainly cannot), and were familiar with the powerful download command wget, this would be easy to solve. Unfortunately I was not familiar with the first two, but no matter: all roads lead to Rome, and there had to be another way.

4) Flying over the console

At this point my thinking was stuck at the Chrome console: I had a pile of image links in hand and no way to download them (and in fact only the links from the current window). I had always harbored some fantasies about Chrome extension development, but never the motivation to learn it, and I kept doubting whether a browser as strict about security as Chrome would even let JavaScript talk to local programs.

So I started to think in reverse: if I cannot download the files directly, I can at least store the addresses somewhere local where I can read them later. That train of thought led to HTML5 localStorage and local databases, and I also considered the local database in Google Gears, but each option turned out to be either too complicated or simply not feasible. The idea then drifted toward something simpler, back to jQuery. That's right: $.getJSON(). If I can send the image addresses cross-domain to a local website and have it do the downloading in the background, isn't the problem solved? So I immediately set up a site with CodeIgniter and added a single PHP controller with a single method to it; the code is again no more than ten lines, and all it does is append the image links to a file (urls.txt).
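A minimal sketch of the browser side of that hand-off (the endpoint path and parameter name are hypothetical stand-ins for the author's CodeIgniter controller):

```javascript
// Send each collected image URL to the local site via JSONP; the
// 'callback=?' parameter is what lets $.getJSON cross domains here.
// The controller is assumed to append each URL it receives to urls.txt.
urls.forEach(function (url) {
    $.getJSON('http://localhost/index.php/grab/save?callback=?', { url: url });
});
```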

5) The invincible downloader

This is almost the end. Someone may ask: you still haven't actually downloaded a single image, have you? Well, here comes the star of the show: wget -b -i urls.txt. Go to the website directory in a terminal and run this command, and wget downloads every address listed in the text file, line by line, in the background (-i reads the URLs from a file, -b puts wget in the background).

PS: A few days ago I was not yet familiar with this command. I ran an experiment against an adult image site, misused a parameter, and wget silently downloaded about 1 GB of images in the background before I noticed and force-killed the process. In short, this command can pull down an entire site's content; on Linux it is a rather wicked tool if you are in the mood to misbehave!

6) Can we take it further?

This workflow is actually pretty satisfying, but a few things could still be smoother:

Tip 1: The full flow is a) deploy the local website, b) open the Chrome console or Firebug, c) copy the script, d) paste it, e) press Enter, f) open a terminal, g) run wget. For a single page with many images (thumbnails and full-size shots) this is already very convenient. But if I open ten pages, every one of them needs steps b), d), and e). If the script were embedded in the browser as a Chrome extension, at least two steps could be dropped: just open the page and click the extension icon, or have the script run automatically. That would greatly improve usability.

Tip 2: It would be nice to skip the wget step and have the PHP side kick off the download in the background itself. That means using PHP to call a system command on Ubuntu (for example via exec()), which I am not familiar with yet and still need to study.

Tip 3: The only reason I deployed a website at all is that I had no way to gather the image addresses from several pages in one place. I considered cookies, but the size limit is a problem, since image URLs can be fairly long (especially when they contain Chinese characters), and I would also have to work out how to read the stored addresses back out and feed them to the downloader.
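For completeness, a minimal sketch of the localStorage idea from step 4) that this tip circles back to (the key name is arbitrary, and this is not what the author ended up using):

```javascript
// Accumulate this page's URLs under one key and read them all back later.
// Caveat: localStorage is scoped per origin, so this only gathers pages
// from the same site -- one reason the idea was set aside.
var stored = JSON.parse(localStorage.getItem('img_urls') || '[]');
localStorage.setItem('img_urls', JSON.stringify(stored.concat(urls)));

// Later, dump everything collected so far (e.g. to paste into urls.txt):
console.log(JSON.parse(localStorage.getItem('img_urls')).join('\n'));
```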

I have not worked through these issues yet; for now I am only thinking about implementing Tip 2. Honestly, nothing in this whole exercise pushed the technology itself very far, but when I looked back over the process afterwards I found that my way of thinking about and approaching problems had improved noticeably. Comments and discussion are welcome!


From Hurry's column
