Crawling all SSE/SZSE stock quotes on a schedule every trading day and storing them in a database

I. The project consists of the following four steps:
    1. Configure database information
    2. Write the crawler script
    3. Configure the Jenkins scheduled task
    4. View the collection results
II. Detailed procedure

1. Configure database information

The CREATE TABLE statement, using a subset of the fields as an example:

-- The varchar lengths were lost from the original statement; 20 and 50 are placeholders.
CREATE TABLE `stockmarket` (
  `date` varchar(20) NOT NULL DEFAULT '' COMMENT 'Time',
  `stockcode` varchar(20) NOT NULL DEFAULT '' COMMENT 'Stock code',
  `stockname` varchar(50) DEFAULT NULL COMMENT 'Stock name',
  `close` decimal(19,2) DEFAULT NULL COMMENT 'Closing price',
  `high` decimal(19,2) DEFAULT NULL COMMENT 'Highest price',
  `low` decimal(19,2) DEFAULT NULL COMMENT 'Lowest price',
  `amplituderatio` decimal(19,2) DEFAULT NULL COMMENT 'Amplitude',
  `turnoverratio` decimal(19,2) DEFAULT NULL COMMENT 'Turnover rate',
  `preclose` decimal(19,2) DEFAULT NULL COMMENT 'Previous close',
  `open` decimal(19,2) DEFAULT NULL COMMENT 'Opening price',
  PRIMARY KEY (`date`, `stockcode`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Put the database connection settings in a .json file so that the script can read them:

{
    "Stockmarket": {
        "Host": "localhost",
        "Port": 3326,
        "User": "root",
        "Password": "Password",
        "Database": "Stockmarket",
        "CharSet": "UTF8"
    }
}
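The author's connect_database helper module is not shown in this post, so the following is only a minimal sketch of how such a file could be read and handed to pymysql. The function names load_db_conf and connect_db are hypothetical, and the sketch assumes the file is a JSON object containing a "Stockmarket" section with the keys shown above.

```python
import json


def load_db_conf(path, section="Stockmarket"):
    """Read one section of the JSON config and map it to pymysql.connect() keyword arguments."""
    with open(path, encoding="utf-8") as fh:
        conf = json.load(fh)[section]
    return {
        "host": conf["Host"],
        "port": int(conf["Port"]),
        "user": conf["User"],
        "password": conf["Password"],
        "database": conf["Database"],
        # Fall back to utf8 if the (assumed optional) CharSet key is missing.
        "charset": conf.get("CharSet", "utf8").lower(),
    }


def connect_db(path, section="Stockmarket"):
    """Open a connection and a cursor; this needs a reachable MySQL server."""
    import pymysql  # imported here so load_db_conf works without the driver installed

    conn = pymysql.connect(**load_db_conf(path, section))
    return conn, conn.cursor()
```

Keeping the file parsing separate from the connection makes it possible to check the configuration without a running database.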

2. Write the crawler script

Python libraries used:

import re, pymysql, json, time, requests

The code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Author: Torre Yang Edit with Python3.6
# @Email: [Email protected]
# @Time: 2018/6/28 10:50
# Regularly crawl daily stock market data
import getsoup            # the author's own helper module (not shown)
import pymysql
import os
import re
import json
import requests
import connect_database   # the author's own helper module (not shown)
import time

# DB connection
connectdb = connect_database.connectdatabase()
get_conf = connectdb.get_conf('Databases_conf.json')
conn, cur = connectdb.connect_db(get_conf["Stockmarket"]["Host"],
                                 get_conf["Stockmarket"]["User"],
                                 get_conf["Stockmarket"]["Password"],
                                 get_conf["Stockmarket"]["Database"],
                                 get_conf["Stockmarket"]["Port"])

# Step 1: get the stock codes of all Shanghai/Shenzhen shares from Eastmoney and store them in a list
url = ''  # the listing URL is missing from the original
soup = getsoup.getsoup(url)
uls = soup.select('div#quotesearch li')
# Regular expression that captures every stock code
# (the URL prefix in front of the capture group was lost from the original)
re1 = re.compile(r'href="(.+?)\.html"')
stockcodes = re1.findall(str(uls))
# print(stockcodes)

# Step 2: append each stock code to the quote query URL
stockvalues = []
for stockcode in stockcodes:
    # url = '' + stockcode + '.html'
    url = 'https:// =json&stock_code=' + stockcode  # URL truncated in the original
    # print(url)
    # url = ' Format=json&stock_code=sh201003 '
    response = requests.get(url)
    response.raise_for_status()
    res = response.content
    try:
        # json.loads() no longer accepts an encoding argument, so decode first
        jsondatas = json.loads(res.decode('utf-8'))
    except ValueError:
        print('Response could not be parsed, skipping')
        continue
    datas = jsondatas['Data']
    for data in datas:
        # Add today's date (the trading day)
        date = time.strftime("%Y-%m-%d", time.localtime())
        stockcode = data['Stockcode']
        stockname = data['StockName']
        close = data['Close']
        high = data['High']
        low = data['Low']
        amplituderatio = data['Amplituderatio']
        turnoverratio = data['Turnoverratio']
        preclose = data['Preclose']
        open_price = data['Open']
        values = [date, stockcode, stockname, close, high, low,
                  amplituderatio, turnoverratio, preclose, open_price]
        sql = ('INSERT INTO stockmarket (date,stockcode,stockname,close,high,low,'
               'amplituderatio,turnoverratio,preclose,open) VALUES ("'
               + '","'.join(str(v) for v in values) + '")')
        print(sql)
        if 'None' in sql:
            print('Incomplete record, skipping')
        else:
            try:
                connectdb.get_fetch(conn, cur, sql)
            except Exception:
                print('Data exception, skipping')
print('Data acquisition completed')

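The script above builds each INSERT statement by string concatenation, which breaks as soon as a stock name contains a quote character. As an alternative, here is a rough sketch of the same insert using pymysql's %s placeholders. The names INSERT_SQL, FIELDS, build_rows and save_rows are hypothetical, the field keys are taken as they appear in the script, and the sketch calls the cursor directly instead of the author's get_fetch helper, whose signature is not shown.

```python
INSERT_SQL = (
    "INSERT INTO stockmarket "
    "(date, stockcode, stockname, close, high, low, "
    "amplituderatio, turnoverratio, preclose, open) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
)

# Keys of one quote record, in the same order as the columns after `date`.
FIELDS = ("Stockcode", "StockName", "Close", "High", "Low",
          "Amplituderatio", "Turnoverratio", "Preclose", "Open")


def build_rows(datas, date):
    """Turn the parsed records into parameter tuples, skipping incomplete ones."""
    rows = []
    for data in datas:
        values = [data.get(name) for name in FIELDS]
        if any(v is None for v in values):
            continue  # same effect as the 'None' check in the script
        rows.append((date, *values))
    return rows


def save_rows(conn, cur, rows):
    """Insert all rows in one batch; the driver escapes the values."""
    cur.executemany(INSERT_SQL, rows)
    conn.commit()
```

With this variant the inner loop would reduce to something like save_rows(conn, cur, build_rows(datas, date)), assuming cur is a regular pymysql cursor.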
3. Configure Jenkins

Configure the remote SSH host and the scheduled task. (Tip: schedule the collection for the evening, or at least after the market closes, because the quotes keep changing during trading hours.)

Jenkins > System Configuration > SSH remote hosts (I use a CentOS 7 virtual machine with JDK, Python 3, MySQL, Tomcat and other common software already installed.)
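Whatever schedule the Jenkins job uses, the script itself can also refuse to run at the wrong time. The following is only a sketch: the function name should_collect is hypothetical, the 15:00 closing time and the UTC+8 offset are assumptions about the SSE/SZSE trading day, and public holidays are not handled.

```python
from datetime import datetime, time as dtime, timedelta, timezone

CHINA_TZ = timezone(timedelta(hours=8))  # assumed: China Standard Time, no DST
MARKET_CLOSE = dtime(15, 0)              # assumed closing time of SSE/SZSE


def should_collect(now=None):
    """Return True on a weekday after the assumed market close (holidays not handled)."""
    now = (now or datetime.now(CHINA_TZ)).astimezone(CHINA_TZ)
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and now.time() >= MARKET_CLOSE
```

The crawler could call should_collect() at the top and exit early when it returns False, so an accidental manual build during trading hours does not store intermediate prices.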

4. Verify the results
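One simple way to check a run is to count the rows stored for the trading day. This is a sketch only: count_rows is a hypothetical helper, and it assumes a DB-API cursor such as the one pymysql returns (the placeholder argument exists only so the same function can be tried against drivers with a different parameter style).

```python
def count_rows(cur, date, placeholder="%s"):
    """Count the rows stored in stockmarket for one date (format YYYY-MM-DD)."""
    # Only the placeholder token is concatenated; the date itself is passed as a parameter.
    cur.execute("SELECT COUNT(*) FROM stockmarket WHERE date = " + placeholder, (date,))
    return cur.fetchone()[0]
```

A count close to the number of stock codes found in step 1 suggests the run completed; a count of zero usually means the job did not run or every record was skipped.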

Source Address: Https://

