Python: Bulk Download Douban Pictures


I was browsing Douban and came across some nice photo albums, but was too lazy to save the pictures one by one. I had already written C# and Python versions of an image downloader before, so I took the old Python code, reworked it a bit, and put together a Douban version for anyone who wants to use it.

# -*- coding: utf8 -*-
import urllib2, urllib, socket
import re
import requests
from lxml import etree
import os, time

default_download_timeout = 30


class AppURLopener(urllib.FancyURLopener):
    # custom User-Agent for urllib downloads
    version = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT)"


def check_save_path(save_path):
    if not os.path.exists(save_path):
        os.makedirs(save_path)


def get_image_name(image_link):
    file_name = os.path.basename(image_link)
    return file_name


def save_image1(image_link, save_path):
    # urllib version: needs a custom opener and a global socket timeout
    file_name = get_image_name(image_link)
    file_path = save_path + "\\" + file_name
    print("preparing to download {0} to {1}".format(image_link, file_path))
    try:
        urllib._urlopener = AppURLopener()
        socket.setdefaulttimeout(default_download_timeout)
        urllib.urlretrieve(url=image_link, filename=file_path)
        return True
    except Exception as ex:
        print(ex.args)
        print("Error downloading file: {0}".format(ex))
        return False


def save_image(image_link, save_path):
    # urllib2 version: timeout is passed directly to urlopen
    file_name = get_image_name(image_link)
    file_path = save_path + "\\" + file_name
    print("preparing to download {0} to {1}".format(image_link, file_path))
    try:
        with open(file_path, "wb") as file_handler:
            image_data = urllib2.urlopen(url=image_link, timeout=default_download_timeout).read()
            file_handler.write(image_data)
        return True
    except Exception as ex:
        print("Error downloading file: {0}".format(ex))
        return False


def get_thumb_picture_link(thumb_page_link):
    # parse one album page and collect the thumbnail image links
    try:
        html_content = urllib2.urlopen(url=thumb_page_link, timeout=default_download_timeout).read()
        html_tree = etree.HTML(html_content)
        # print(str(html_tree))
        link_tmp_list = html_tree.xpath('//div[@class="photo_wrap"]/a[@class="photolst_photo"]/img/@src')
        page_link_list = []
        for link_tmp in link_tmp_list:
            page_link_list.append(link_tmp)
        return page_link_list
    except Exception as ex:
        print(ex)
        return []


def download_pictures(album_link, max_page_id, picture_count_per_page, save_path):
    check_save_path(save_path)
    min_page_id = 0
    while min_page_id < max_page_id:
        thumb_page_link = album_link + "?start={0}".format(min_page_id * picture_count_per_page)
        thumb_picture_links = get_thumb_picture_link(thumb_page_link)
        for thumb_picture_link in thumb_picture_links:
            # replace the thumbnail path with the large-picture path
            full_picture_link = thumb_picture_link.replace("photo/thumb", "photo/large")
            save_flag = save_image(image_link=full_picture_link, save_path=save_path)
            if not save_flag:
                # fall back to the medium-size picture path
                full_picture_link = thumb_picture_link.replace("photo/thumb", "photo/photo")
                save_image(image_link=full_picture_link, save_path=save_path)
            time.sleep(1)
        min_page_id += 1
    print("Download complete")


# set up a local folder for saving pictures
save_path = "J:\\douban\\meiren2"
# set the album address; note the trailing slash
album_link = "https://www.douban.com/photos/album/43697061/"
# set the total number of pages in the album
max_page_id = 9
# set the number of pictures per page, 18 by default
picture_count_per_page = 18

download_pictures(album_link, max_page_id, picture_count_per_page, save_path)
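Incidentally, the script imports requests but never uses it. For reference, here is a minimal sketch of the same save step done with requests instead; the function name, header value, and 30-second timeout are just assumptions carried over from the script above, not part of the original code.

import os
import requests

def save_image_requests(image_link, save_path):
    # build the local file path from the image URL
    file_path = os.path.join(save_path, os.path.basename(image_link))
    # requests takes the User-Agent and the timeout per call, no global setup needed
    headers = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT)"}
    response = requests.get(image_link, headers=headers, timeout=30)
    response.raise_for_status()
    with open(file_path, "wb") as file_handler:
        file_handler.write(response.content)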

=============================================================

Compared with urllib2, urllib is full of pitfalls: if you don't set a User-Agent, downloads are painfully slow, and you have to go through the socket module just to set a timeout. It's a hassle, and you may well hit other problems too; for example, I got blocked by Douban partway through. So I recommend using urllib2.
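To make the difference concrete, here is a minimal sketch (Python 2; the sample image URL is a placeholder, not a real Douban link) of the two approaches: with urllib you install a custom opener for the User-Agent and set the timeout globally through socket, whereas with urllib2 the header travels on a Request object and urlopen takes the timeout directly.

import socket
import urllib
import urllib2

image_link = "https://img1.doubanio.com/view/photo/large/public/p0000000001.jpg"  # placeholder URL

# urllib: global opener + global socket timeout
class AppURLopener(urllib.FancyURLopener):
    version = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT)"

urllib._urlopener = AppURLopener()
socket.setdefaulttimeout(30)
urllib.urlretrieve(image_link, "picture_via_urllib.jpg")

# urllib2: per-request header and per-call timeout
request = urllib2.Request(image_link, headers={"User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT)"})
data = urllib2.urlopen(request, timeout=30).read()
with open("picture_via_urllib2.jpg", "wb") as file_handler:
    file_handler.write(data)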

Related Reference Links:

http://www.jb51.net/article/57239.htm

http://www.crifan.com/use_python_urllib-urlretrieve_download_picture_speed_too_slow_add_user_agent_for_urlretrieve/comment-page-1/

=============================================================

It's the last day of the National Day holiday. Happy National Day, everyone!
