# -*- coding: utf-8 -*-

import requests
from bs4 import BeautifulSoup

r = requests.get('http://cie.szu.edu.cn/szucie/index.php/category/jsfc/')
# The returned 'r' is a Response object that holds everything that came back
# over HTTP: status code, headers, body, and so on.

html = r.content
# Get the page source (the raw response body).

soup = BeautifulSoup(html, 'html.parser')  # 'html.parser' is the parser to use

# Next, extract the part of the page we are interested in.
# First locate the block of markup that contains the teacher list:
div_people_list = soup.find('div', attrs={'class': 'col-mb-12 col-8 Detail'})
# This uses the find method of the BeautifulSoup object. It looks for a 'div'
# tag whose class attribute matches the given value (the string must match the
# page's markup exactly, including case). If there is more than one match,
# find returns only the first.

a_s = div_people_list.find_all('a')
# This uses the find_all method to collect every 'a' tag inside that div and
# returns them as a list. The 'href' attribute of each 'a' tag is the URL of
# the teacher's profile page, and the text inside the tag is the teacher's name.

for a in a_s:
    url = a['href']
    name = a.get_text()
    print(name, url)
Running the script prints one line per teacher: the name, followed by the URL of that teacher's profile page.
Python crawler: a small example that collects the URLs of teacher profile pages from the College of Information Engineering, Shenzhen University
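The find and find_all steps can also be tried without network access. The sketch below is only an illustration: the HTML fragment, the class name people_list, the base URL http://example.com/faculty/, and the helper extract_links are all made up for the example and are not taken from the real page. It also assumes the href values may be relative, and resolves them with urljoin from the standard library.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; names and paths are made up.
SAMPLE_HTML = """
<div class="nav"><a href="/home">Home</a></div>
<div class="people_list">
  <a href="/teacher/1">Teacher A</a>
  <a href="teacher/2">Teacher B</a>
</div>
<div class="people_list"><a href="/teacher/3">Teacher C</a></div>
"""

BASE_URL = "http://example.com/faculty/"  # placeholder base URL, not the real site


def extract_links(html, css_class, base_url):
    """Return (name, absolute_url) pairs from the first div with css_class."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns only the first matching tag, or None if nothing matches.
    container = soup.find("div", attrs={"class": css_class})
    if container is None:
        return []
    links = []
    # find_all() returns every matching tag inside the container;
    # href=True skips anchors that have no href attribute.
    for a in container.find_all("a", href=True):
        name = a.get_text(strip=True)
        links.append((name, urljoin(base_url, a["href"])))
    return links


links = extract_links(SAMPLE_HTML, "people_list", BASE_URL)
for name, url in links:
    print(name, url)
```

Because find stops at the first matching div, "Teacher C" in the second people_list block is not returned. The None check is worth keeping in a real crawler: if the class name does not match the page, find returns None and calling find_all on it raises an AttributeError.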