While browsing the web, we often come across beautiful pictures that we would like to download and save, whether to use as desktop wallpaper or as design material.
The most common approach is to right-click and choose "Save as". However, on some pages the right-click menu has no "Save as" option. Capturing the picture with a screenshot tool is one workaround, but it reduces the picture's clarity. If you are a bit more technical, you can instead right-click and view the page source.
We can use Python to implement a simple crawler that downloads the content we want to the local machine. Let's look at how to do it.
First, get the entire page data
To download the pictures, we first need to fetch the entire page.
getjpg.py
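#coding=utf-8
# Python 2 code: it relies on urllib.urlopen and the print statement
import urllib

def gethtml(url):
    page = urllib.urlopen(url)   # open the URL
    html = page.read()           # read the whole response body
    return html

html = gethtml("http://tieba.baidu.com/p/2738151262")
print html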
The urllib module provides an interface for reading web page data, so we can read data from the WWW and FTP much as we would read a local file. First, we define a gethtml() function:
The urllib.urlopen() method opens a URL.
The read() method reads the data at that URL. Passing a URL to gethtml() downloads the entire page, and running the program prints the whole page source.
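The listing above targets Python 2. As a rough sketch of the same fetch step for Python 3, where urlopen lives in urllib.request and read() returns bytes, something like the following should work. The UTF-8 decoding is an assumption about the page's encoding, not something the article states.

```python
from urllib.request import urlopen


def gethtml(url):
    """Fetch a URL and return the response body as text."""
    with urlopen(url) as page:
        raw = page.read()  # bytes in Python 3
    # Assumption: the page is UTF-8; undecodable bytes are replaced
    return raw.decode("utf-8", errors="replace")


# Example call (needs network access), using the URL from the article:
# html = gethtml("http://tieba.baidu.com/p/2738151262")
# print(html)
```

Decoding to text here means a later step can search the result with an ordinary string regular expression.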
Second, filter the data you want from the page
Python has powerful regular expression support, and we need to know a little about Python regular expressions for this step.
Suppose we find a few beautiful wallpapers on Baidu Tieba. Inspecting the page with the browser's developer tools reveals the address of a picture, for example: src="http://imgsrc.baidu.com/forum......jpg" pic_ext="JPEG"
Modify the code as follows:
import re
import urllib

def gethtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

def getimg(html):
    # Capture the shortest text ending in .jpg between src=" and " pic_ext
    reg = r'src="(.+?\.jpg)" pic_ext'
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    return imglist

html = gethtml("http://tieba.baidu.com/p/2460150866")
print getimg(html)
We added a getimg() function that filters the picture links out of the fetched page. The re module provides regular expression support:
re.compile() compiles a regular expression string into a regular expression object.
re.findall() returns every part of html that matches imgre (the compiled regular expression). Because the pattern contains one capturing group, only the captured picture address is returned for each match.
Running the script prints the URLs of all the pictures found on the page.
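To see what the pattern captures without fetching a live page, here is a minimal sketch that runs it against a made-up HTML snippet. The sample markup and the example.com addresses are assumptions for illustration only, and the code is written for Python 3.

```python
import re

# Hypothetical markup imitating the src="..." pic_ext="..." layout
sample_html = (
    '<img class="BDE_Image" src="http://imgsrc.example.com/forum/a1.jpg" pic_ext="jpeg">'
    '<img class="BDE_Image" src="http://imgsrc.example.com/forum/b2.jpg" pic_ext="jpeg">'
    '<img src="http://imgsrc.example.com/icon.png">'
)


def getimg(html):
    """Return the .jpg addresses that are followed by a pic_ext attribute."""
    imgre = re.compile(r'src="(.+?\.jpg)" pic_ext')
    return re.findall(imgre, html)


imglist = getimg(sample_html)
print(imglist)
# ['http://imgsrc.example.com/forum/a1.jpg', 'http://imgsrc.example.com/forum/b2.jpg']
```

One caveat with this pattern: because . matches almost any character, a src=" that is not followed by pic_ext can make the capture run on until the next .jpg" pic_ext. A stricter group such as ([^"]+?\.jpg) would avoid that.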
Third, save the filtered pictures locally
A for loop traverses the filtered picture addresses and saves each picture locally. The code is as follows:
#coding=utf-8
import urllib
import re

def gethtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

def getimg(html):
    reg = r'src="(.+?\.jpg)" pic_ext'
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    x = 0
    for imgurl in imglist:
        # Save each picture as 0.jpg, 1.jpg, ... in the current directory
        urllib.urlretrieve(imgurl, '%s.jpg' % x)
        x += 1
    return imglist   # return the addresses so the print below shows them

html = gethtml("http://tieba.baidu.com/p/2460150866")
print getimg(html)
The core here is the urllib.urlretrieve() method, which downloads remote data directly to a local file.
A for loop iterates over the collected picture links. To give the files tidier names, each picture is named after the x variable, which is incremented by 1 after every download, so the files are saved as 0.jpg, 1.jpg, and so on. By default they are saved in the current working directory, which is usually the directory the program is run from.
When the program runs, you will see the files being downloaded to the local directory.
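For readers on Python 3, the download loop could be sketched roughly as below. urlretrieve is available there as urllib.request.urlretrieve. The save_images name and the dest_dir parameter are hypothetical additions for illustration and are not part of the original script.

```python
import os
from urllib.request import urlretrieve


def save_images(imglist, dest_dir="."):
    """Download each address in imglist to dest_dir as 0.jpg, 1.jpg, ..."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for x, imgurl in enumerate(imglist):
        filename = os.path.join(dest_dir, "%s.jpg" % x)
        urlretrieve(imgurl, filename)  # fetch the remote data into the file
        saved.append(filename)
    return saved


# Example call (needs network access), given a list of picture addresses:
# save_images(imglist, "wallpapers")
```

Using enumerate() replaces the manual x counter, and returning the saved paths makes it easy to check what was written.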