Building a web crawler in Python with requests and BeautifulSoup
This article shows how to use Python's requests and BeautifulSoup libraries to build a simple web crawler. The steps are as follows.
Function Description
In Python, the requests module can fetch an HTML page from a URL, and BeautifulSoup can then parse the returned HTML.
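As a minimal sketch of that division of labour: requests fetches the raw HTML, and BeautifulSoup turns it into a searchable tree. Here the network step is replaced by a hard-coded snippet so the example runs offline (the title and href are made up for illustration):

```python
from bs4 import BeautifulSoup

# In a real crawler this string would come from requests.get(url);
# a hard-coded snippet stands in here so the example runs offline.
html = '<p class="name"><a href="/films/1">Example Movie</a></p>'

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("p", attrs={"class": "name"})
print(tag.a.string)    # the link text: Example Movie
print(tag.a["href"])   # the link target: /films/1
```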
Case
Suppose you want to scrape the Top 100 movie board on Maoyan at http://maoyan.com/board/4, for example to obtain each movie's title and URL.
Install requests and BeautifulSoup
Install both packages with pip:
pip install requests
pip install beautifulsoup4
Program
```python
# -*- coding: utf-8 -*-
__author__ = 'Qian yang'

import requests
from bs4 import BeautifulSoup


def get_one_page(url):
    # Request the page; return its body only on HTTP 200
    response = requests.get(url)
    if response.status_code == 200:
        # Re-encode the body from UTF-8 to GBK, ignoring
        # characters that cannot be converted
        return response.content.decode("utf8", "ignore").encode("gbk", "ignore")


# Parse the HTML and collect each movie's title and URL
def bs4_paraser(html):
    all_value = []
    value = {}
    soup = BeautifulSoup(html, 'html.parser')
    # One <div class="movie-item-info"> block per movie
    all_div_item = soup.find_all('div', attrs={'class': 'movie-item-info'})
    for r in all_div_item:
        # Get the movie name and URL from <p class="name"><a href="...">
        title = r.find_all(name="p", attrs={"class": "name"})[0].string
        movie_url = r.find_all('p', attrs={'class': 'name'})[0].a['href']
        value['title'] = title
        value['movie_url'] = movie_url
        all_value.append(value)
        value = {}
    return all_value


def main():
    url = 'http://maoyan.com/board/4'
    html = get_one_page(url)
    all_value = bs4_paraser(html)
    print(all_value)


if __name__ == '__main__':
    main()
```
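The parsing step can be exercised without hitting the network by feeding the same logic a static snippet. The class names below mirror the ones the program searches for; the movie names and links are made up:

```python
from bs4 import BeautifulSoup

def bs4_paraser(html):
    # Same parsing logic as the program above: collect title and URL
    # from every <div class="movie-item-info"> block.
    all_value = []
    soup = BeautifulSoup(html, "html.parser")
    for r in soup.find_all("div", attrs={"class": "movie-item-info"}):
        p = r.find_all("p", attrs={"class": "name"})[0]
        all_value.append({"title": p.string, "movie_url": p.a["href"]})
    return all_value

# Made-up HTML standing in for the Maoyan board page
html = (
    '<div class="movie-item-info"><p class="name"><a href="/films/1">Movie A</a></p></div>'
    '<div class="movie-item-info"><p class="name"><a href="/films/2">Movie B</a></p></div>'
)
print(bs4_paraser(html))
```

Each dictionary in the printed list holds one movie's title and movie_url, matching the format the real crawler produces.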
The code has been tested and runs correctly, printing the list of movie titles and URLs.
Summary
That covers this example of building a crawler in Python with requests and BeautifulSoup. I hope it is helpful; if you are interested, you can refer to other related topics on this site. If anything is missing, please leave a message. Thank you for your support!