While browsing the Web, we often come across good-looking pictures that we would like to download and keep, whether as desktop wallpaper or as design material.
The most common approach is to right-click and choose Save As. But some pictures have no Save As option in the right-click menu. You could capture them with a screenshot tool, but that reduces the sharpness of the picture. All right~! Actually, you can do better than that: right-click and view the page's source code.
We can use Python to implement a simple crawler that fetches the images we want and saves them locally. Let's look at how to implement this in Python.
1. Get the data of the entire page
First, we fetch the entire page that contains the pictures we want to download.
getjpg.py
# coding=utf-8
import urllib

def gethtml(url):
    page = urllib.urlopen(url)  # open the URL like a local file
    html = page.read()          # read the whole page into a string
    return html

html = gethtml("http://tieba.baidu.com/p/2738151262")
print html
The urllib module provides an interface for reading Web page data: we can read data over WWW and FTP as if it were a local file. First, we define a gethtml() function:
The urllib.urlopen() method opens a URL.
The read() method reads the data at that URL. Passing a URL to gethtml() downloads the entire page, and running the program prints the whole page.
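The code above is Python 2. In Python 3, urlopen() lives in urllib.request and read() returns bytes rather than a string, so a rough Python 3 equivalent might look like the sketch below. The name get_html and the UTF-8 default are assumptions for illustration; a Tieba page may declare a different charset (GBK, for example), in which case pass that encoding instead.

```python
from urllib.request import urlopen


def get_html(url, encoding="utf-8"):
    """Download the page at `url` and return it as text.

    Sketch only: assumes the page is encoded in `encoding` (UTF-8 by default).
    Bytes that cannot be decoded are replaced instead of raising an error.
    """
    with urlopen(url) as page:  # open the URL like a local file
        raw = page.read()       # read() returns bytes in Python 3
    return raw.decode(encoding, errors="replace")
```

Calling print(get_html("http://tieba.baidu.com/p/2738151262")) would then print the page, assuming that thread is still online.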
2. Filter the desired data from the page
Python provides very powerful regular expressions, and we first need to know a little about how they work in Python:
http://www.cnblogs.com/fnng/archive/2013/05/20/3089816.html
Suppose we find a few beautiful wallpapers on Baidu Tieba. Viewing the page source as described in the previous section, we find image addresses such as: src="http://imgsrc.baidu.com/forum......jpg" pic_ext="JPEG"
Modify the code as follows:
import re
import urllib

def gethtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

def getimg(html):
    # capture everything between src=" and the next .jpg that is followed by pic_ext
    reg = r'src="(.+?\.jpg)" pic_ext'
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    return imglist

html = gethtml("http://tieba.baidu.com/p/2460150866")
print getimg(html)
We also created a getimg() function that filters the desired image links out of the page we fetched. The re module does the regular-expression work:
re.compile() compiles a regular expression into a regular expression object.
re.findall() returns every match of imgre (the compiled regular expression) in html; because the pattern has one capture group, each item in the list is the captured image URL.
Run the script to get the URLs of all the pictures on the page.
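To see what the pattern captures without touching the network, the sketch below runs the same compile/findall steps over a small hand-written HTML string. The sample markup and the example.com image URLs are hypothetical; they only imitate the src="..." pic_ext= shape described above. It also shows why the non-greedy .+? matters: a greedy .+ would run from the first src=" to the last .jpg" pic_ext on the line and return one long, useless match.

```python
import re

# Same pattern as getimg(): a non-greedy group up to ".jpg", followed by pic_ext
IMG_RE = re.compile(r'src="(.+?\.jpg)" pic_ext')


def get_img(html):
    """Return every image URL captured by IMG_RE, in page order."""
    return IMG_RE.findall(html)


# Hypothetical markup imitating two Tieba images on a single line
SAMPLE = ('<img src="http://example.com/a.jpg" pic_ext="jpeg">'
          '<img src="http://example.com/b.jpg" pic_ext="jpeg">')

if __name__ == "__main__":
    # Non-greedy: one URL per image
    print(get_img(SAMPLE))
    # Greedy: a single match spanning from the first image to the last
    print(re.findall(r'src="(.+\.jpg)" pic_ext', SAMPLE))
```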
3. Save the filtered data locally
Loop over the filtered image addresses with a for loop and save each one locally. The code is as follows:
# coding=utf-8
import urllib
import re

def gethtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

def getimg(html):
    reg = r'src="(.+?\.jpg)" pic_ext'
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    x = 0
    for imgurl in imglist:
        # save each picture as 0.jpg, 1.jpg, ... in the current directory
        urllib.urlretrieve(imgurl, '%s.jpg' % x)
        x += 1
    return imglist

html = gethtml("http://tieba.baidu.com/p/2460150866")
print getimg(html)
The core here is the urllib.urlretrieve() method, which downloads remote data directly to a local file.
The for loop iterates over the image links. To give the files tidier names, each picture is renamed using the variable x, which is incremented by 1 for every picture. Because the file names are relative paths, the files are saved in the current working directory, which is normally the directory you run the program from.
After the program runs, you will see the downloaded files in that directory.
Implementing a simple crawler in Python