CSI (China Securities Index) main board market page: http://www.csindex.com.cn/zh-CN/downloads/index-information
The goal is to get the table's 'index abbreviation | static P/E | rolling P/E | P/B ratio | dividend yield | static P/E at the end of last year | rolling P/E at the end of last year | P/B ratio at the end of last year' columns and other data.
The data is loaded from a JS file rather than embedded in the HTML, so it cannot be found in the web page's source code, and the crawling approach has to change accordingly.
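Because the numbers sit in a JS file as string assignments, one way to read them is to treat the file as plain text and pull the assignments out with a regular expression. The sketch below does that on a made-up sample string; the `zsgz<digits>= "..."` layout is an assumption modeled on the regex in the post's code, not a confirmed description of show_zsgz.js, and `parse_js_assignments` is a hypothetical helper name.

```python
import re

# Hypothetical sample: the real show_zsgz.js layout is an assumption here,
# modeled on the zsgz<digits>= "..." pattern the post's regex targets.
SAMPLE_JS = '''
var zsgz11= "Index A";
var zsgz12= "12.34";
var zsgz21= "Index B";
'''

# Matches an assignment like  zsgz11= "some text"  and captures name and value.
ASSIGN_RE = re.compile(r'(zsgz\d+)= "(.*?)"')


def parse_js_assignments(js_text):
    """Return {variable_name: string_value} for every zsgz* assignment found."""
    return {name: value for name, value in ASSIGN_RE.findall(js_text)}


if __name__ == "__main__":
    print(parse_js_assignments(SAMPLE_JS))
```

A dict keyed by variable name keeps every value addressable, which makes it easy to check which groups the file actually contains before deciding how to split them into rows.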
```python
#HANBB# Come on!!!
import csv
import re

import requests
from bs4 import BeautifulSoup


# First fetch the link. Note: this URL points to a JS file, not to a web page
def get_html_text(url):
    try:
        r = requests.get(url, timeout=10)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""


# Page parsing; not actually used here
def getinfo(url):
    html = get_html_text(url)
    soup = BeautifulSoup(html, "html.parser")
    return soup


# Use a regular expression to pull out the zsgz<i><digit>= "..." assignments
def re_find(re_express, url):
    zhishu_info = re.findall(r'zsgz{}\d= ".+?"'.format(re_express), get_html_text(url))
    return zhishu_info


# Data storage
def save(filename, info):
    # Append mode; without newline='' the CSV gets extra blank lines between rows
    with open('E:\\download2\\{}.csv'.format(filename), 'a', newline='') as file:
        writerfile = csv.writer(file)  # create the CSV writer
        writerfile.writerow(info)      # write one row


if __name__ == '__main__':
    url = "http://www.csindex.com.cn/data/js/show_zsgz.js?str=nG1Rum4NumQpqwaW"
    print(get_html_text(url))
    for i in range(1, 12):
        szzs = re_find(i, url)
        # print(szzs)
        list2 = []
        for j in szzs:
            # Take the quoted string out of each match and collect them into one list
            info = re.findall(r'"(.+?)"', j)[0]
            list2.append(info)
        print(list2)
        save('zhubanzhishu', list2)
```
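The main loop calls `re_find`, which downloads the JS file again for each of the 11 groups. A sketch of an alternative, assuming the same `zsgz<i><digit>` naming: fetch the text once, then parse every group from that one string. `extract_rows` and `write_rows` are hypothetical helper names, the sample JS string is made up, and unlike the original loop, groups with no matches are skipped instead of being written as empty rows.

```python
import csv
import io
import re


def extract_rows(js_text, groups=range(1, 12)):
    """For each group number i, collect the quoted values of variables named
    zsgz<i><one digit>, using the same pattern as the post's re_find()."""
    rows = []
    for i in groups:
        pattern = r'zsgz{}\d= "(.+?)"'.format(i)
        values = re.findall(pattern, js_text)
        if values:  # skip groups with no matching variables
            rows.append(values)
    return rows


def write_rows(rows, fileobj):
    """Write every row with csv.writer; fileobj should be opened with newline=''."""
    writer = csv.writer(fileobj)
    writer.writerows(rows)


# Made-up sample text standing in for the downloaded JS file
SAMPLE_JS = 'zsgz10= "Index A"; zsgz11= "12.3"; zsgz20= "Index B"; zsgz21= "15.6";'

if __name__ == "__main__":
    buf = io.StringIO()
    write_rows(extract_rows(SAMPLE_JS), buf)
    print(buf.getvalue())
```

Against the live site this would be roughly `js_text = get_html_text(url)` once, then `with open(path, 'a', newline='') as f: write_rows(extract_rows(js_text), f)`. Whether the live file still uses this variable naming has not been checked here.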
Reference on handling JS-loaded data: https://zhuanlan.zhihu.com/p/24838761
It uses crawling Toutiao (Jinri Toutiao) as its example. Well worth a read!!
Crawling main board index P/E and other data (handling JS-like formats, with notes on JS processing)