Crawler Learning--Download images
1. The urllib and re libraries do most of the work.
2. Use the urllib.urlopen() function to get the page source code.
3. Use a regular expression to match the image URLs; the more precisely the pattern fits the page markup, the more images are found and downloaded.
4. Download each image with urllib.urlretrieve() and name the file using %s string formatting.
5. The site seems to impose some restrictions, so not every picture can be downloaded, but the result is good enough.
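Steps 3 and 4 can be tried offline before touching the network. The snippet below is only a sketch: SAMPLE_HTML is a made-up page fragment, find_images and make_filename are hypothetical helper names, and the pattern is a slightly tightened variant of the one in the source (it uses [^"] instead of . so that a match cannot run across a closing quote into the next tag).

```python
import re

# Hypothetical page fragment; the real markup on the target page may differ.
SAMPLE_HTML = (
    '<img class="BDE_Image" src="https://example.com/a/first.jpg" size="1024">'
    '<img src="https://example.com/logo.png" size="10">'
    '<img class="BDE_Image" src="https://example.com/b/second.jpg" size="2048">'
)

# Capture a .jpg URL inside src="..." that is followed by a size attribute.
# [^"]+ keeps the capture inside one pair of quotes.
IMG_PATTERN = re.compile(r'src="([^"]+\.jpg)" size')


def find_images(html):
    """Return every .jpg URL matched by IMG_PATTERN, in page order."""
    return IMG_PATTERN.findall(html)


def make_filename(index):
    """Build the local file name with %s formatting, e.g. 0_hhh.jpg."""
    return '%s_hhh.jpg' % index


for i, url in enumerate(find_images(SAMPLE_HTML)):
    print(make_filename(i), url)
```

With the looser (.*?\.jpg) capture, the .png tag in this sample would be swallowed into one long bogus match that ends at the next .jpg, which is one way a less accurate pattern ends up downloading fewer usable images.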
URL Analysis:
Source:
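# coding=utf-8
# Written for Python 2, where urlopen and urlretrieve live directly in urllib.
import re
import urllib


def gethtml(url):
    # Fetch the page and return its source code.
    page = urllib.urlopen(url)
    html = page.read()
    return html


def getImage(html):
    # Capture every .jpg URL that appears as src="..." followed by a size attribute.
    reg = r'src="(.*?\.jpg)" size'
    imgre = re.compile(reg)
    imgelist = re.findall(imgre, html)
    x = 0
    for image in imgelist:
        # Save the images as 0_hhh.jpg, 1_hhh.jpg, ...
        urllib.urlretrieve(image, '%s_hhh.jpg' % x)
        x += 1


html = gethtml("https://tieba.baidu.com/p/5256641773")
getImage(html)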
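The source above targets Python 2. In Python 3 the same two functions are found in urllib.request, and read() returns bytes that have to be decoded before the regular expression can be applied. The following is a rough port under a few assumptions: the page is treated as UTF-8, the function names (get_html, get_image_urls, download_images, main) are my own, and the retrieve parameter is added only so the download loop can be exercised without network access.

```python
import re
import urllib.request


def get_html(url):
    """Fetch a page and return its source as text."""
    with urllib.request.urlopen(url) as page:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        return page.read().decode('utf-8', errors='replace')


def get_image_urls(html):
    """Return the .jpg URLs matched by the same pattern as the source."""
    return re.findall(r'src="(.*?\.jpg)" size', html)


def download_images(image_urls, retrieve=urllib.request.urlretrieve):
    """Save each URL as <index>_hhh.jpg and return the file names used."""
    names = []
    for x, image in enumerate(image_urls):
        name = '%s_hhh.jpg' % x
        retrieve(image, name)
        names.append(name)
    return names


def main():
    # Same thread URL as in the source; running this needs network access.
    html = get_html("https://tieba.baidu.com/p/5256641773")
    download_images(get_image_urls(html))
```

Calling main() reproduces what the original script does; whether the site still serves the page or the images to a plain script is not something this sketch can guarantee.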