Crawling lawyers' phone numbers across the country with Python

Source: Internet
Author: User
Tags: xpath

[This article is from the Sky Cloud-owned blog Park]

This script collects lawyers' phone numbers nationwide from the 64365 website, using Python's lxml library to parse the HTML page content. The goal is to extract each lawyer's name and phone number from the listing pages.
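The name and phone extraction can be tried offline before touching the live site. The following is a minimal sketch, not the author's code: it uses the standard library's xml.etree.ElementTree as a stand-in for lxml (so it only handles well-formed markup), and the SAMPLE markup, its class names, and the extract_pairs helper are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for one listing page (not real site markup).
SAMPLE = """
<html><body>
  <div class="fl"><p><a href="/lawyer/1">Zhang San</a></p></div>
  <span class="law-tel">13800000000</span>
  <div class="fl"><p><a href="/lawyer/2">Li Si</a></p></div>
  <span class="law-tel"></span>
</body></html>
""".strip()


def extract_pairs(markup):
    """Return (name, phone) tuples; phone is "" when none is listed."""
    root = ET.fromstring(markup)
    names = root.findall(".//div[@class='fl']/p/a")
    phones = root.findall(".//span[@class='law-tel']")
    # Pairing by position is only safe when both lists have the same length.
    if len(names) != len(phones):
        raise ValueError("name/phone count mismatch")
    return [(n.text, "".join(p.itertext()).strip()) for n, p in zip(names, phones)]


pairs = extract_pairs(SAMPLE)
for name, phone in pairs:
    print(name, phone or "(no phone)")
```

With lxml the same idea would use html.xpath(...) on a parsed page and text_content() on each phone element; checking the two lengths first guards against names and phones drifting out of step.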

The code is as follows:

# coding: utf-8
import os

import lxml.html
import requests


class MyError(Exception):
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return repr(self.value)


def get_lawyers_info(url):
    # Fetch one listing page and pair each lawyer's name with a phone number.
    r = requests.get(url)
    html = lxml.html.fromstring(r.content)
    phones = html.xpath('//span[@class="law-tel"]')
    names = html.xpath('//div[@class="fl"]/p/a')
    if len(phones) == len(names):
        phone_infos = [(names[i].text, phones[i].text_content()) for i in range(len(names))]
    else:
        error = "Lawyers amount is not equal to the amount of phone_nums: " + url
        raise MyError(error)
    phone_infos_list = []
    for phone_info in phone_infos:
        if phone_info[1] == "":
            info = phone_info[0] + ": " + u"no phone number left.\n"
        else:
            info = phone_info[0] + ": " + phone_info[1] + "\r\n"
        print(info)
        phone_infos_list.append(info)
    return phone_infos_list


def get_pages_num(url):
    # Read the total page count from the second-to-last link of the pager.
    r = requests.get(url)
    html = lxml.html.fromstring(r.content)
    result = html.xpath('//div[@class="u-page"]/a[last()-1]')
    if result and result[0].text and result[0].text.isdigit():
        return result[0].text


def get_all_lawyers(cities):
    dir_path = os.path.abspath(os.path.dirname(__file__))
    print(dir_path)
    file_path = os.path.join(dir_path, "Lawyers_info.txt")
    print(file_path)
    # Start from a clean file on every run.
    if os.path.exists(file_path):
        os.remove(file_path)
    with open(file_path, "ab") as file:
        for city in cities:
            pages_num = get_pages_num("http://www.64365.com/" + city + "/lawyer/page_1.aspx")
            if pages_num:
                for i in range(int(pages_num)):
                    url = "http://www.64365.com/" + city + "/lawyer/page_" + str(i + 1) + ".aspx"
                    info = get_lawyers_info(url)
                    for each in info:
                        file.write(each.encode("GBK"))


if __name__ == '__main__':
    cities = ['beijing', 'shanghai', 'guangdong', 'guangzhou', 'shenzhen', 'wuhan',
              'hangzhou', 'ningbo', 'tianjin', 'nanjing', 'jiangsu', 'zhengzhou',
              'jinan', 'changsha', 'shenyang', 'chengdu', 'chongqing', 'xian']
    get_all_lawyers(cities)
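The pagination step reads the page count from the pager and then builds one URL per page. Below is a rough offline sketch of that idea. It again uses xml.etree.ElementTree in place of lxml, and the PAGER markup (with the last page number in the second-to-last link, followed by a "Next" link) is an assumption inferred from the last()-1 XPath, not a copy of the real page. The helper names last_page_number and page_urls are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical pager markup: the second-to-last link is assumed to hold
# the last page number, and the final link is a "Next" button.
PAGER = """
<body>
  <div class="u-page">
    <a>Prev</a><a>1</a><a>2</a><a>3</a><a>12</a><a>Next</a>
  </div>
</body>
""".strip()


def last_page_number(markup):
    """Return the page count as an int, or None if it cannot be read."""
    root = ET.fromstring(markup)
    links = root.findall(".//div[@class='u-page']/a[last()-1]")
    if links and links[0].text and links[0].text.isdigit():
        return int(links[0].text)
    return None


def page_urls(city, total):
    """Build the listing URLs for pages 1..total of one city."""
    template = "http://www.64365.com/{}/lawyer/page_{}.aspx"
    return [template.format(city, page) for page in range(1, total + 1)]


last_page = last_page_number(PAGER)
urls = page_urls("beijing", last_page)
print(last_page, urls[0], urls[-1])
```

Returning None when the pager cannot be read lets the caller skip a city instead of failing on an unexpected page layout.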

The script crawls the major cities listed in the code and saves the results to the "Lawyers_info.txt" file in the current directory.
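Because the records are written as GBK-encoded bytes in append mode, the output file has to be decoded with the same encoding when it is read back. The following is a small sketch of that round trip on a temporary file; the helper names, the demo file name, and the sample records are made up for illustration.

```python
import io
import os
import tempfile


def append_records(path, records, encoding="gbk"):
    """Append each record to the file as encoded bytes."""
    with open(path, "ab") as f:
        for record in records:
            f.write(record.encode(encoding))


def read_records(path, encoding="gbk"):
    """Decode the file with the same encoding and split it into lines."""
    with io.open(path, "r", encoding=encoding, newline="") as f:
        return f.read().splitlines()


# Sample records in the "name: phone" shape used by the script.
records = [u"张三: 13800000000\r\n", u"Li Si: no phone number left.\n"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lawyers_demo.txt")
    append_records(path, records[:1])   # first write creates the file
    append_records(path, records[1:])   # second write appends to it
    lines = read_records(path)

print(lines)
```

Opening the file with a different encoding, such as UTF-8, would typically fail or garble any Chinese names, so the reader and writer need to agree on GBK.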
