Crawling basic data on listed companies with Python 3

Source: Internet
Author: User

As of 2018/05/31, there were 3,524 companies listed on the Shanghai and Shenzhen A-share markets. Being able to pull the basic information of these companies from a financial website would be a real help for research work. Before doing that, let's sort out what data we need and where it comes from.

First, what basic company information do we need?

According to the data provided by CNINFO (Juchao Information Network, www.cninfo.com.cn), a company's basic information includes its full name, English name, registered address, company abbreviation, legal representative, company secretary, registered capital (million), industry category, postal code, company telephone, company fax, company website, listing date, IPO date, issue quantity (million shares), issue price (yuan), issue P/E ratio (times), issuance method, lead underwriter, listing recommender, and sponsoring institution: 21 fields in total.

Our goal is to fetch all 21 fields, or whichever subset we need, in bulk.

Second, where does the data come from?

I previously tried crawling this data from Tianyancha, but its anti-crawling mechanism is a real headache. In my experience, the structure of the CNINFO site is fairly regular, and the site has no anti-crawling measures such as Tianyancha's tyc-num, which saves a lot of time.

So the data source is CNINFO.

Third, how do we fetch the data?

With the goal and the source settled, the next step is fetching the data, which is the key part. First, what the program does: given one or more stock codes, it fetches the basic information of those companies in bulk. Second, the input: the stock codes can be your own target companies, or a selection by industry, concept, region, and so on (the demo code uses Tushare to get all companies listed on the Shanghai and Shenzhen A-share markets). Third, the bulk fetching itself, which is the Python crawler part. The basic idea is to use Tushare to obtain the stock codes, build the link to each company's information page from its code, fetch the page source with requests, parse the page structure with bs4 to find the information we need, and then print or save it.
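The first two steps (stock codes in, page links out) can be sketched without touching the network. This is only an illustration, not the article's code: normalize_code, brief_url, brief_urls, and BRIEF_URL_TEMPLATE are hypothetical names, and the only URL pattern used is the shmb one that appears in the demo code. Codes are kept as strings so that leading zeros are not lost.

```python
# URL pattern copied from the demo code. Only the "shmb" path appears there,
# so nothing is assumed here about the paths used for other boards.
BRIEF_URL_TEMPLATE = "http://www.cninfo.com.cn/information/brief/shmb{code}.html"


def normalize_code(code):
    """Return a stock code as a six-character, zero-padded string."""
    text = str(code).strip()
    if not text.isdigit() or len(text) > 6:
        raise ValueError("not a valid stock code: %r" % (code,))
    return text.zfill(6)


def brief_url(code):
    """Build the company-brief link for one stock code (hypothetical helper)."""
    return BRIEF_URL_TEMPLATE.format(code=normalize_code(code))


def brief_urls(codes):
    """Map an iterable of stock codes to (code, url) pairs."""
    return [(normalize_code(c), brief_url(c)) for c in codes]


if __name__ == "__main__":
    # The input can be a hand-picked list; it does not have to come from Tushare.
    for code, url in brief_urls(["600000", 600004]):
        print(code, url)
```

Keeping the link-building step separate from the download step makes it easy to swap in a different source of stock codes, or to check the generated links before any request is sent.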

To sum up, this uses the basics of Tushare, requests, and bs4. The demo code prints the information directly to the console; you can also try saving it to a file such as Excel or CSV.
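Saving to CSV instead of printing might look like the sketch below, which uses only the standard csv module. The column names in FIELDS and the sample record are made up for illustration; in practice they would be whichever of the 21 fields you decide to keep.

```python
import csv
import io

# Hypothetical column names, chosen only for this illustration.
FIELDS = ("code", "full_name", "registered_address", "legal_representative")


def write_rows(handle, rows, fields=FIELDS):
    """Write a list of dicts to an open text handle as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        # Missing values become empty cells instead of raising an error.
        writer.writerow({name: row.get(name, "") for name in fields})


def rows_to_csv_text(rows, fields=FIELDS):
    """Convenience wrapper that returns the CSV as a string."""
    buffer = io.StringIO(newline="")
    write_rows(buffer, rows, fields)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [{"code": "600000", "full_name": "Example Co., Ltd."}]  # made-up record
    print(rows_to_csv_text(sample))
```

To write to disk, pass a file opened with open("companies.csv", "w", newline="", encoding="utf-8-sig") to write_rows; the utf-8-sig encoding usually helps Excel display Chinese text correctly.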

The code follows. It has its imperfections, and criticism is welcome.

from bs4 import BeautifulSoup
import requests
import tushare as ts


# Fetch the page source
def check_link(url):
    try:
        r = requests.get(url)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        print('Cannot link server!')


# Parse and extract.
# Print section: this prints 9 fields; change the indices in the print call
# to get any of the 21 fields.
def get_contents(ulist, rurl):
    soup = BeautifulSoup(rurl, 'lxml')
    trs = soup.find_all('tr')
    for tr in trs:
        ui = []
        # Iterating over a row yields all of its child nodes, not only the cells
        for td in tr:
            ui.append(td.string)
        ulist.append(ui)
    print(ulist[1][3], ulist[3][3], ulist[5][3], ulist[7][3], ulist[8][3],
          ulist[13][3], ulist[15][3], ulist[16][3], ulist[17][3])


# Main function
def main():
    df = ts.get_stock_basics()
    for i in df.index:
        # Note: int() drops leading zeros, so codes that start with 0
        # do not produce a six-digit link here.
        index_int = int(i)
        urli = []
        url = "http://www.cninfo.com.cn/information/brief/shmb%d.html" % index_int
        try:
            rs = check_link(url)
            get_contents(urli, rs)
        except Exception:
            # Skip companies whose page is missing or has an unexpected layout
            pass


main()
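The remark about changing the print parameters can be made concrete with a small helper that takes the row positions as an argument. This is a sketch under assumptions: select_fields is a hypothetical name, the default positions are the ones printed by the demo code, and the sample rows are made up, with a shape (padding, label, padding, value) that is only a guess at what the parsed rows look like.

```python
# Row positions printed by the demo code, and the cell position it reads in each row.
DEMO_ROW_INDICES = (1, 3, 5, 7, 8, 13, 15, 16, 17)
VALUE_COLUMN = 3


def select_fields(rows, row_indices=DEMO_ROW_INDICES, column=VALUE_COLUMN):
    """Pick one cell from each requested row; use None where a row or cell is missing."""
    picked = []
    for index in row_indices:
        if index < len(rows) and column < len(rows[index]):
            picked.append(rows[index][column])
        else:
            picked.append(None)
    return picked


if __name__ == "__main__":
    # Made-up rows; the real layout of the parsed page is an assumption.
    fake_rows = [["\n", "label %d" % i, "\n", "value %d" % i] for i in range(18)]
    print(select_fields(fake_rows))                         # the 9 demo positions
    print(select_fields(fake_rows, row_indices=range(18)))  # every row
```

Returning None for a missing cell, rather than raising IndexError, lets one malformed page show up as gaps in the output instead of being skipped entirely.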

  
