I. The project consists of the following four steps:
- Configure the database information
- Write the crawler script
- Configure the Jenkins timed task
- View the collection results
II. Detailed procedure
1. Configure the database information
The CREATE TABLE statement, with some of the fields as an example:
CREATE TABLE `stockmarket` (
  `Date` varchar() NOT NULL DEFAULT '' COMMENT 'Time',
  `stockcode` varchar() NOT NULL DEFAULT '' COMMENT 'Stock code',
  `stockname` varchar() DEFAULT NULL COMMENT 'Stock name',
  `Close` decimal(19,2) DEFAULT NULL COMMENT 'Price',
  `High` decimal(19,2) DEFAULT NULL COMMENT 'Highest',
  `Low` decimal(19,2) DEFAULT NULL COMMENT 'Lowest',
  `Amplituderatio` decimal(19,2) DEFAULT NULL COMMENT 'Amplitude',
  `Turnoverratio` decimal(19,2) DEFAULT NULL COMMENT 'Turnover rate',
  `Preclose` decimal(19,2) DEFAULT NULL COMMENT 'Previous close',
  `Open` decimal(19,2) DEFAULT NULL COMMENT 'Open price',
  PRIMARY KEY (`Date`, `stockcode`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Put the database connection settings in a .json file; the script reads the connection information from it:
{
  "Stockmarket": {
    "Host": "localhost",
    "Port": 3326,
    "User": "Root",
    "Password": "Password",
    "Database": "Stockmarket",
    "CharSet": "UTF8"
  }
}
2. Write the crawler script
Python libraries used:
import re, pymysql, json, time, requests
The script:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Author: Torre Yang  Edit with Python3.6
# @Email: [email protected]
# @Time: 2018/6/28 10:50
# Regularly crawl daily stock market data.

import getsoup
import pymysql
import os
import re
import json
import requests
import connect_database
import time

# DB connection
connectdb = connect_database.connectdatabase()
get_conf = connectdb.get_conf('Databases_conf.json')
db_conf = get_conf["Stockmarket"]
conn, cur = connectdb.connect_db(db_conf["Host"], db_conf["User"], db_conf["Password"],
                                 db_conf["Database"], db_conf["Port"])

# Step 1: get the stock codes of all Shanghai/Shenzhen shares from Eastmoney and store them in a list
url = 'http://quote.eastmoney.com/stocklist.html#'
soup = getsoup.getsoup(url)
uls = soup.select('div#quotesearch li')
# Regular expression that extracts all the stock codes
re1 = re.compile(r'href="http://quote.eastmoney.com/(.+?).html"')
stockcodes = re1.findall(str(uls))
# print(stockcodes)

# Step 2: append each stock code to the stock search URL
stockvalues = []
for stockcode in stockcodes:
    # url = 'https://gupiao.baidu.com/stock/' + stockcode + '.html'
    url = 'https://gupiao.baidu.com/api/rails/stockbasicbatch?from=pc&os_ver=1&cuid=xxx&vv=100&format=json&stock_code=' + stockcode
    # print(url)
    # url = 'https://gupiao.baidu.com/api/rails/stockbasicbatch?from=pc&os_ver=1&cuid=xxx&vv=100&format=json&stock_code=sh201003'
    response = requests.get(url)
    response.raise_for_status()
    res = response.content
    try:
        jsondatas = json.loads(res)
    except ValueError:
        # The response is not valid JSON: skip this stock code
        print('resolve to Empty')
        continue
    datas = jsondatas['Data']
    for data in datas:
        # Add today's date (trading day)
        date = time.strftime("%Y-%m-%d", time.localtime())
        values = [date, data['Stockcode'], data['StockName'], data['Close'], data['High'],
                  data['Low'], data['Amplituderatio'], data['Turnoverratio'],
                  data['Preclose'], data['Open']]
        sql = ('INSERT INTO stockmarket (Date,stockcode,stockname,close,high,low,'
               'amplituderatio,turnoverratio,preclose,open) VALUES ("'
               + '","'.join(str(v) for v in values) + '")')
        print(sql)
        if 'None' in sql:
            # At least one field is missing: skip this record
            print('Jump this data')
        else:
            try:
                connectdb.get_fetch(conn, cur, sql)
            except Exception:
                print('data exceptions, skipping')
print('Data acquisition completed')
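The script assembles its INSERT statement by string concatenation and detects missing fields by looking for the text 'None' in the finished SQL. A possible alternative, sketched here under the assumption that a DB-API cursor such as PyMySQL's is available, passes the values as query parameters and checks for None directly. The names FIELD_KEYS, build_params and insert_record are hypothetical, and the response keys are copied from the script above rather than verified against the API.

```python
# Column order of the INSERT statement used in the script.
COLUMNS = ("Date", "stockcode", "stockname", "close", "high", "low",
           "amplituderatio", "turnoverratio", "preclose", "open")

# Keys of one record in the API response, as they appear in the script (unverified).
FIELD_KEYS = ("Stockcode", "StockName", "Close", "High", "Low",
              "Amplituderatio", "Turnoverratio", "Preclose", "Open")

# One %s placeholder per column; the driver escapes the values.
INSERT_SQL = "INSERT INTO stockmarket ({}) VALUES ({})".format(
    ", ".join(COLUMNS), ", ".join(["%s"] * len(COLUMNS)))


def build_params(date, data, keys=FIELD_KEYS):
    """Return the parameter tuple for one record, or None if a field is missing."""
    values = [date] + [data.get(key) for key in keys]
    if any(value is None for value in values):
        return None
    return tuple(values)


def insert_record(cur, date, data):
    """Insert one record through a DB-API cursor; return False if it was skipped."""
    params = build_params(date, data)
    if params is None:
        return False
    cur.execute(INSERT_SQL, params)
    return True
```

Unlike the text search for 'None', this variant skips a record only when a field is actually missing, so a stock name that happens to contain "None" would still be stored.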
3. Configure Jenkins
Configure the remote SSH host, then set up the timed task. Tip: schedule the collection for the evening (or after the market closes), because stock data keeps changing during trading hours.
Jenkins > System Configuration > SSH remote hosts (I use a CentOS 7 virtual machine with JDK, Python 3, MySQL, Tomcat, and other common software already installed)
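Because the data should only be collected once trading has finished, the script could also guard itself in case the timed task fires at the wrong moment. The following is a minimal sketch, assuming the market closes at 15:00 China Standard Time and ignoring public holidays. The function name should_collect is hypothetical and is not part of the original script.

```python
from datetime import datetime, time, timedelta, timezone

CHINA_TZ = timezone(timedelta(hours=8))  # China Standard Time, no daylight saving
MARKET_CLOSE = time(15, 0)               # assumed closing time of the trading session


def should_collect(now=None):
    """Return True on a weekday at or after the assumed market close.

    `now` must be a timezone-aware datetime; it defaults to the current time.
    Public holidays are not taken into account.
    """
    if now is None:
        now = datetime.now(CHINA_TZ)
    now = now.astimezone(CHINA_TZ)
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and now.time() >= MARKET_CLOSE
```

The crawler could call should_collect() at startup and exit early when it returns False.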
4. Verify the results
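One simple check is to count how many rows were stored for a given date. The sketch below assumes a DB-API cursor such as the one PyMySQL returns, and a date string in the same format the script writes to the Date column. The function name count_rows_for_date is hypothetical.

```python
def count_rows_for_date(cur, date):
    """Return how many rows the stockmarket table holds for one date string."""
    cur.execute("SELECT COUNT(*) FROM stockmarket WHERE Date = %s", (date,))
    return cur.fetchone()[0]
```

A count of zero after a run on a trading day suggests that the crawl or the inserts failed.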
Source code: https://github.com/Testworm/stockMarket.git
Timed crawl of all SSE/SZSE stock market data on each trading day, stored in a database