The code is rough. This is mainly a memo of the spots where I tend to make mistakes, kept for my own reference later.
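# Image download
import re
import urllib.request  # in Python 3 the module name differs from Python 2.x (urllib)

site = 'https://world.taobao.com/item/530762904536.htm?spm=a21bp.7806943.topsale_XX.4.jcjxZC'
page = urllib.request.urlopen(site)
html = page.read()
html = html.decode('utf-8')  # the page source comes back as bytes and must be decoded as UTF-8
reg = r'src="//(gd.*?jpg)"'  # capture the part of each image address after the leading //
imgre = re.compile(reg)
imglist = re.findall(imgre, html)
trueurls = []
for i in imglist:
    # Every match starts with 'gd', so prefixing the scheme gives the full URL.
    # (replace('gd', 'http://gd') would also rewrite any later 'gd' in the path.)
    trueurls.append('http://' + i)
trueurls[2] = 'http://wlgsad.com.jpg'  # deliberately invalid link, to exercise the except branch
print(trueurls)
x = 200  # number used for the first saved file
for j in trueurls:
    try:
        urllib.request.urlretrieve(j, '%s.jpg' % x)
    except Exception:  # except Exception as e:
        pass  # print(e)
        # print('invalid link')
    x = x + 1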
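The regular-expression step in the script can be tried without touching the network. The sketch below runs the pattern on a made-up HTML fragment. The markup and the `example.com` host names are my assumptions for illustration, not what the real page contains, and the pattern assumes the image addresses are written as `src="//gd..."`.

```python
import re

# Hypothetical page fragment; the real markup on the site may differ.
sample_html = '''
<img src="//gd1.example.com/a/first.jpg" alt="">
<img src="//gd2.example.com/b/second.jpg" alt="">
<img src="//other.example.com/c/third.jpg" alt="">
'''

# Capture everything after the leading // for addresses that start with "gd".
img_pattern = re.compile(r'src="//(gd.*?jpg)"')
hosts_and_paths = img_pattern.findall(sample_html)

# Put the scheme back in front of each captured address.
image_urls = ['http://' + part for part in hosts_and_paths]
print(image_urls)
```

Only the two `gd` addresses are kept; the third image is ignored because its address does not start with `gd`.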
You can also print a hint in the except clause instead of silently passing.
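As a rough sketch of what such a hint could look like, the helper below catches the more specific exceptions that `urllib` raises and turns each one into a short message. The function name `try_download` and the wording of the messages are mine, not part of the original script.

```python
import urllib.error
import urllib.request


def try_download(url, filename):
    """Download url to filename; return None on success, or a hint string on failure."""
    try:
        urllib.request.urlretrieve(url, filename)
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 404.
        # HTTPError is a subclass of URLError, so it has to be caught first.
        return 'HTTP error %s for %s' % (e.code, url)
    except urllib.error.URLError as e:
        # The server could not be reached (unknown host, refused connection, ...).
        return 'cannot reach %s: %s' % (url, e.reason)
    except ValueError as e:
        # The string is not a usable URL at all, for example it has no scheme.
        return 'malformed URL %s: %s' % (url, e)
    return None


# A string without a scheme fails before any network access happens.
print(try_download('not-a-url', 'unused.jpg'))
```

Catching `Exception` as in the script above is simpler. The split version only helps when you want to know why a link failed.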
When downloading images, if some of the links are invalid, you can wrap the download in try/except so that a bad link is skipped and the loop continues with the next image.
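The same skip-and-continue idea, packaged as a function so the failed links are not lost. This is only a sketch: the name `download_all`, its `start` and `fetch` parameters, and the returned lists are my own additions. `fetch` defaults to `urllib.request.urlretrieve` and exists mainly so the loop can be tried with a stand-in function.

```python
import urllib.request


def download_all(urls, start=200, fetch=urllib.request.urlretrieve):
    """Try every URL in turn; return (saved_filenames, failed_urls)."""
    saved, failed = [], []
    for number, url in enumerate(urls, start):
        filename = '%s.jpg' % number
        try:
            fetch(url, filename)
        except Exception as e:
            # Report the bad link and carry on with the next one.
            print('skipping invalid link %s: %s' % (url, e))
            failed.append(url)
        else:
            saved.append(filename)
    return saved, failed


# Malformed URLs fail immediately, so this call needs no network access.
saved, failed = download_all(['not-a-url', 'also-not-a-url'])
print(saved, failed)
```

As in the original loop, the file number advances even when a link fails, so the saved names may have gaps.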
Python 3 web crawler: image download, handling invalid links with try/except