No More Neihan Duanzi to Browse? Use Python to Crawl Pictures and Short Videos from the Duanyou Home (段友之家) Tieba Forum (Source Included)

Last Update: 2018-06-05 Source: Internet

Author: User


After the latest storm of video-app rectification, the Neihan Duanzi app was forced to shut down, leaving its huge community of fans without a home. Recently, though, I found an app called "Duanyou" that is updated very frequently and is calling all those fans back home; if you're interested, download it and take a look. (PS: this is not an ad, and I'm not getting paid for it.)

Around the same time, a former colleague sent me a link to a Tieba forum where Duanzi fans have gathered, so let me post it right away:
Duanyou Home (段友之家): https://tieba.baidu.com/f?ie= ...

Sure enough, there are lots of Duanzi fans active there, so I decided to crawl their pictures and short videos. Hence the topic of this article.

Crawling website data with Python is really basic stuff and not difficult, but I still want to share it with everyone for learning and discussion.

The main modules used to crawl the data are bs4, requests, and os, all of which are commonly used.

The idea is to request the page's HTML with the requests module, parse the response with bs4's BeautifulSoup, and then use CSS selectors to find the addresses of the pictures and short videos. The main implementation code is as follows:

```python
def download_file(web_url):
    """Get the URLs of the resources on a page."""
    # Download the page
    print('Downloading page: %s ...' % web_url)
    result = requests.get(web_url)
    soup = bs4.BeautifulSoup(result.text, "html.parser")
    # Find picture resources
    img_list = soup.select('.vpic_wrap img')
    if not img_list:
        print('No picture resources found!')
    else:
        # Resources found, start writing
        for img_info in img_list:
            file_url = img_info.get('bpic')
            write_file(file_url, 1)
    # Find video resources
    video_list = soup.select('.threadlist_video a')
    if not video_list:
        print('No video resources found!')
    else:
        # Resources found, start writing
        for video_info in video_list:
            file_url = video_info.get('data-video')
            write_file(file_url, 2)
    print('Finished downloading resources from:', web_url)
    # Follow the "next page" link, if there is one
    next_link = soup.select('#frs_list_pager .next')
    if not next_link:
        print('Finished downloading data!')
    else:
        url = next_link[0].get('href')
        download_file('https:' + url)
```
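To see what the CSS selectors in `download_file` actually pick out without hitting Baidu, here is a minimal offline sketch. The HTML snippet is a hypothetical, heavily simplified stand-in for a Tieba list page (the real markup is much larger and may have changed since 2018); only the class names, ids, and attributes (`vpic_wrap`, `bpic`, `threadlist_video`, `data-video`, `frs_list_pager`, `next`) come from the code above, and `extract_resources` is a hypothetical helper name.

```python
import bs4

# Hypothetical, heavily simplified markup built around the selectors above;
# the real Tieba list page is far larger and may look different today.
SAMPLE_HTML = """
<div class="vpic_wrap"><img bpic="https://example.com/a.jpg?w=1"></div>
<div class="vpic_wrap"><img src="thumb.jpg"></div>
<div class="threadlist_video"><a data-video="https://example.com/c.mp4">play</a></div>
<div id="frs_list_pager"><a class="next" href="//example.com/f?pn=50">next</a></div>
"""


def extract_resources(html):
    """Return (image URLs, video URLs, next-page href or None) for one page."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    # Same CSS selectors and attributes as download_file; tags that lack
    # the attribute are skipped instead of passing None to write_file.
    images = [img["bpic"] for img in soup.select(".vpic_wrap img") if img.has_attr("bpic")]
    videos = [a["data-video"] for a in soup.select(".threadlist_video a") if a.has_attr("data-video")]
    next_link = soup.select("#frs_list_pager .next")
    next_href = next_link[0].get("href") if next_link else None
    return images, videos, next_href


images, videos, next_href = extract_resources(SAMPLE_HTML)
print(images)     # ['https://example.com/a.jpg?w=1']
print(videos)     # ['https://example.com/c.mp4']
print(next_href)  # //example.com/f?pn=50
```

The next-page `href` in this sample is protocol-relative (`//...`), which is why the crawler prepends `'https:'` before following it.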

Getting the addresses of the images and videos is not enough, of course: we also need to save these resources locally. The code reads each remote file as binary data and writes it into a local folder chosen by file type. The main implementation code is as follows:

```python
def write_file(file_url, file_type):
    """Write a remote resource to a local file."""
    res = requests.get(file_url)
    res.raise_for_status()
    # Write into a sub-folder chosen by file type
    if file_type == 1:
        file_folder = os.path.join('nhdz', 'jpg')
    elif file_type == 2:
        file_folder = os.path.join('nhdz', 'mp4')
    else:
        file_folder = os.path.join('nhdz', 'other')
    # Create the folder if it does not exist yet
    if not os.path.exists(file_folder):
        os.makedirs(file_folder)
    # Derive the file name from the URL, dropping any query string
    file_name = os.path.basename(file_url)
    str_index = file_name.find('?')
    if str_index > 0:
        file_name = file_name[:str_index]
    file_path = os.path.join(file_folder, file_name)
    print('Writing resource file:', file_path)
    # Stream the response body to disk in binary mode, 100,000 bytes at a time
    with open(file_path, 'wb') as image_file:
        for chunk in res.iter_content(100000):
            image_file.write(chunk)
    print('Write complete!')
```
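The path handling and binary writing in `write_file` can be pulled apart from the network request so they can be tried offline. The sketch below is one possible split, not the author's code: `local_path_for`, `save_chunks`, and the `FOLDERS` mapping are hypothetical names, and it swaps the `find('?')` trick for `urllib.parse.urlsplit`, which also drops a `#fragment` from the file name.

```python
import os
import tempfile
from urllib.parse import urlsplit

# Hypothetical mapping that mirrors the file_type codes in write_file:
# 1 -> images, 2 -> videos, anything else -> "other".
FOLDERS = {1: "jpg", 2: "mp4"}


def local_path_for(file_url, file_type, root="nhdz"):
    """Build the local save path for a remote resource URL."""
    folder = os.path.join(root, FOLDERS.get(file_type, "other"))
    # urlsplit separates the path from the query string and fragment,
    # so ".../a.jpg?w=1" yields the file name "a.jpg".
    file_name = os.path.basename(urlsplit(file_url).path)
    return os.path.join(folder, file_name)


def save_chunks(chunks, file_path):
    """Write an iterable of byte chunks to file_path, creating folders as needed."""
    os.makedirs(os.path.dirname(file_path) or ".", exist_ok=True)
    with open(file_path, "wb") as f:  # "wb": binary mode, as in write_file
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)


# Offline demo: fake chunks stand in for res.iter_content(100000).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = local_path_for("https://example.com/v/c.mp4?x=1", 2, root=tmp)
    save_chunks([b"ab", b"", b"cd"], demo_path)
    with open(demo_path, "rb") as f:
        demo_bytes = f.read()

print(demo_bytes)  # b'abcd'
```

Inside the crawler, the tail of `write_file` could then shrink to `save_chunks(res.iter_content(100000), local_path_for(file_url, file_type))`, since `iter_content` already yields the body as byte chunks.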

Finally, here is the complete code. Otherwise people will say I only told half the story and promised goodies without delivering them in full, which wouldn't be very nice. Here it is:

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Crawl pictures and videos from the Baidu Tieba forum Duanyou Home (段友之家)
author: cuizy
time: 2018-05-19
"""
import os

import bs4
import requests


def write_file(file_url, file_type):
    """Write a remote resource to a local file."""
    res = requests.get(file_url)
    res.raise_for_status()
    # Write into a sub-folder chosen by file type
    if file_type == 1:
        file_folder = os.path.join('nhdz', 'jpg')
    elif file_type == 2:
        file_folder = os.path.join('nhdz', 'mp4')
    else:
        file_folder = os.path.join('nhdz', 'other')
    # Create the folder if it does not exist yet
    if not os.path.exists(file_folder):
        os.makedirs(file_folder)
    # Derive the file name from the URL, dropping any query string
    file_name = os.path.basename(file_url)
    str_index = file_name.find('?')
    if str_index > 0:
        file_name = file_name[:str_index]
    file_path = os.path.join(file_folder, file_name)
    print('Writing resource file:', file_path)
    # Stream the response body to disk in binary mode
    with open(file_path, 'wb') as image_file:
        for chunk in res.iter_content(100000):
            image_file.write(chunk)
    print('Write complete!')


def download_file(web_url):
    """Get the URLs of the resources on a page."""
    # Download the page
    print('Downloading page: %s ...' % web_url)
    result = requests.get(web_url)
    soup = bs4.BeautifulSoup(result.text, "html.parser")
    # Find picture resources
    img_list = soup.select('.vpic_wrap img')
    if not img_list:
        print('No picture resources found!')
    else:
        for img_info in img_list:
            file_url = img_info.get('bpic')
            write_file(file_url, 1)
    # Find video resources
    video_list = soup.select('.threadlist_video a')
    if not video_list:
        print('No video resources found!')
    else:
        for video_info in video_list:
            file_url = video_info.get('data-video')
            write_file(file_url, 2)
    print('Finished downloading resources from:', web_url)
    # Follow the "next page" link, if there is one
    next_link = soup.select('#frs_list_pager .next')
    if not next_link:
        print('Finished downloading data!')
    else:
        url = next_link[0].get('href')
        download_file('https:' + url)


# Main program entry
if __name__ == '__main__':
    web_url = 'https://tieba.baidu.com/f?ie=utf-8&kw=段友之家'
    download_file(web_url)
```
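`download_file` follows the "next page" link by calling itself, so every page adds a stack frame; CPython's default recursion limit (typically 1000) would eventually raise `RecursionError` on a very long forum. Below is a sketch of one way to walk the pages with a loop instead. It is an alternative, not the author's code: `crawl_pages`, the injected `fetch_html` callable, and `max_pages` are hypothetical, chosen so the loop can be exercised offline with fake pages. In the real script `fetch_html` could be `lambda u: requests.get(u).text`.

```python
import bs4


def crawl_pages(start_url, fetch_html, max_pages=None):
    """Yield (url, soup) for each list page, following "next" links iteratively."""
    url = start_url
    seen = set()  # guards against a "next" link that points back to a visited page
    count = 0
    while url and url not in seen:
        if max_pages is not None and count >= max_pages:
            break
        seen.add(url)
        soup = bs4.BeautifulSoup(fetch_html(url), "html.parser")
        yield url, soup
        count += 1
        # Same pager selector as download_file; hrefs are protocol-relative
        next_link = soup.select("#frs_list_pager .next")
        href = next_link[0].get("href") if next_link else None
        url = "https:" + href if href else None


# Hypothetical fake pages so the loop runs without network access.
FAKE_PAGES = {
    "https://example.com/p1": '<div id="frs_list_pager"><a class="next" href="//example.com/p2">next</a></div>',
    "https://example.com/p2": '<div id="frs_list_pager"><a class="next" href="//example.com/p3">next</a></div>',
    "https://example.com/p3": '<div id="frs_list_pager"></div>',
}

visited = [u for u, _ in crawl_pages("https://example.com/p1", FAKE_PAGES.__getitem__)]
print(visited)  # ['https://example.com/p1', 'https://example.com/p2', 'https://example.com/p3']
```

Each yielded `soup` could then go through the same picture and video selection that `download_file` performs, with the stack depth staying constant no matter how many pages the forum has.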

This article is an English version of an article originally written in Chinese on aliyun.com.