While browsing Douban I came across some pictures and was too lazy to save them one by one. I had previously written C# and Python versions of a picture downloader, so I took the old Python code, reworked it into a Douban version, and am sharing it here for anyone who wants to use it.
# -*- coding: utf8 -*-
# Python 2 script: bulk-download photos from a Douban album.
import urllib2, urllib, socket
from lxml import etree
import os, time

DEFAULT_DOWNLOAD_TIMEOUT = 30


class AppURLopener(urllib.FancyURLopener):
    # Pretend to be a browser, otherwise Douban throttles the download speed.
    version = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT)"


def check_save_path(save_path):
    if not os.path.exists(save_path):
        os.makedirs(save_path)


def get_image_name(image_link):
    file_name = os.path.basename(image_link)
    return file_name


def save_image1(image_link, save_path):
    # Variant using urllib.urlretrieve; needs the custom opener and a
    # socket-level timeout to behave reasonably.
    file_name = get_image_name(image_link)
    file_path = save_path + "\\" + file_name
    print("preparing to download {0} to {1}".format(image_link, file_path))
    try:
        urllib._urlopener = AppURLopener()
        socket.setdefaulttimeout(DEFAULT_DOWNLOAD_TIMEOUT)
        # Bug fix: download to the full file path, not the directory.
        urllib.urlretrieve(url=image_link, filename=file_path)
        return True
    except Exception, ex:
        print(ex.args)
        print("Error downloading file: {0}".format(ex.message))
        return False


def save_image(image_link, save_path):
    # Variant using urllib2, which supports a per-request timeout directly.
    file_name = get_image_name(image_link)
    file_path = save_path + "\\" + file_name
    print("preparing to download {0} to {1}".format(image_link, file_path))
    try:
        file_handler = open(file_path, "wb")
        image_data = urllib2.urlopen(url=image_link, timeout=DEFAULT_DOWNLOAD_TIMEOUT).read()
        file_handler.write(image_data)
        file_handler.close()
        return True
    except Exception, ex:
        print("Error downloading file: {0}".format(ex.message))
        return False


def get_thumb_picture_link(thumb_page_link):
    # Fetch one album page and extract the thumbnail image URLs via XPath.
    try:
        html_content = urllib2.urlopen(url=thumb_page_link, timeout=DEFAULT_DOWNLOAD_TIMEOUT).read()
        html_tree = etree.HTML(html_content)
        link_tmp_list = html_tree.xpath('//div[@class="photo_wrap"]/a[@class="photolst_photo"]/img/@src')
        page_link_list = []
        for link_tmp in link_tmp_list:
            page_link_list.append(link_tmp)
        return page_link_list
    except Exception, ex:
        print(ex.message)
        return []


def download_pictures(album_link, max_page_id, picture_count_per_page, save_path):
    # Walk the album page by page (?start=0, 18, 36, ...); for each thumbnail,
    # try the "large" size first and fall back to the medium "photo" size.
    check_save_path(save_path)
    min_page_id = 0
    while min_page_id < max_page_id:
        thumb_page_link = album_link + "?start={0}".format(min_page_id * picture_count_per_page)
        thumb_picture_links = get_thumb_picture_link(thumb_page_link)
        for thumb_picture_link in thumb_picture_links:
            full_picture_link = thumb_picture_link.replace("photo/thumb", "photo/large")
            save_flag = save_image(image_link=full_picture_link, save_path=save_path)
            if not save_flag:
                full_picture_link = thumb_picture_link.replace("photo/thumb", "photo/photo")
                save_image(image_link=full_picture_link, save_path=save_path)
            time.sleep(1)
        min_page_id += 1
    print("Download complete")


# set up a local folder for picture saving
save_path = "J:\\douban\\meiren2"
# set the album address, and note the trailing slash
album_link = "https://www.douban.com/photos/album/43697061/"
# set the total number of pages in the album
max_page_id = 9
# set the number of pictures per page, 18 by default
picture_count_per_page = 18
download_pictures(album_link, max_page_id, picture_count_per_page, save_path)
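The two pieces of URL manipulation the script relies on — paging the album with `?start=` and rewriting thumbnail links to full-size links — can be illustrated in isolation. A minimal Python 3 sketch (the helper names `build_page_links` and `thumb_to_full` are mine, not from the original script):

```python
def build_page_links(album_link, max_page_id, pictures_per_page=18):
    """Build the paged album URLs the scraper walks (?start=0, 18, 36, ...)."""
    return [album_link + "?start={0}".format(page * pictures_per_page)
            for page in range(max_page_id)]

def thumb_to_full(thumb_link, fallback=False):
    """Rewrite a thumbnail URL to the 'large' size; if that download fails,
    the script retries with the medium 'photo' size instead."""
    target = "photo/photo" if fallback else "photo/large"
    return thumb_link.replace("photo/thumb", target)
```

For example, `build_page_links(album_link, 9)` yields the nine page URLs ending in `?start=0` through `?start=144`, matching the loop in `download_pictures`.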
=============================================================
Compared with urllib2, urllib has far more pitfalls: if you don't set a User-Agent, downloads are extremely slow, and you also have to go through the socket module just to set a timeout. It's a hassle, and you may hit other traps as well — for example, Douban temporarily "blocked" me — so I recommend using urllib2 instead.
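For readers on Python 3, where urllib2 was merged into `urllib.request`, the same two fixes (browser-like User-Agent plus an explicit timeout) look roughly like this — a minimal sketch, with the helper name `make_request` being my own:

```python
import urllib.request

def make_request(url, user_agent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT)"):
    """Build a request with a browser-like User-Agent so the server
    does not throttle the download."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

# To actually download, pass a per-request timeout instead of touching socket:
# data = urllib.request.urlopen(make_request(image_url), timeout=30).read()
```

Note that `urlopen(..., timeout=30)` replaces the global `socket.setdefaulttimeout` call needed with the old `urllib.urlretrieve` approach.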
Related Reference Links:
http://www.jb51.net/article/57239.htm
http://www.crifan.com/use_python_urllib-urlretrieve_download_picture_speed_too_slow_add_user_agent_for_urlretrieve/comment-page-1/
=============================================================
It's the last day of the National Day holiday — I wish everyone a happy National Day!
Python -- Bulk Download Douban Pictures