Using the ARIMA algorithm to forecast a time series


This article uses the Hongyong China fund as an example: it crawls the fund's net value data and then uses the ARIMA algorithm to forecast the time series.

Crawl data


# Crawl the net value history of the Hongyong China fund
from bs4 import BeautifulSoup
import requests

# The Cookie value was copied from a browser session and is session-specific
headers = {'Accept': 'text/javascript, application/javascript, */*; q=0.01',
           'Accept-Encoding': 'gzip, deflate',
           'Accept-Language': 'zh-CN,zh;q=0.8',
           'Connection': 'keep-alive',
           'Cookie': 'vjuids=148cf0186.15e03abf2ac.0.c311af0ddaa6c; Advs=358187b0bd1a65; asl=17431,000pn,7010519170105191; JRJ_UID=15060593555978DJCIWMVNB; jrj_z3_newsid=723; ADVC=35686F6CAEEDF3; wt_fpc=id=2ef30c6a0af7eaf3a501506059355507:lv=1506063782501:ss=1506063782501; Channelcode=3763bexx; ylbcode=24s2az96; vjlast=1503300154.1506059356.23; hm_lvt_a07bde197b7bf109a325eebaee445939=1506059356; hm_lpvt_a07bde197b7bf109a325eebaee445939=1506063783',
           'Host': 'fund.jrj.com.cn',
           'Referer': 'http://fund.jrj.com.cn/archives,968006,jjjz.shtml',
           'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36',
           'X-Requested-With': 'XMLHttpRequest'}

# Query parameters are case-sensitive; the camelCase of 'fundCode' is an assumption,
# because the source text lost its original capitalization
params = {'fundCode': '968006',
          'obj': 'obj',
          'date': 2017}

r = requests.get('http://fund.jrj.com.cn/json/archives/history/netvalue', params=params, headers=headers)
r.encoding = 'utf-8'
mydata = r.text

Storing data


import json
from pymongo import MongoClient

# Drop the 8-character prefix so that only the standard JSON text remains
table = mydata[8:]

# Parse the string as JSON instead of parsing it by hand
myjson = json.loads(table)

# Extract the net value records
# (the key's camelCase is an assumption; the source text lost its capitalization)
records = myjson['fundHistoryNetValue']

db = MongoClient('localhost', 27017)['Fund']
collect = db.get_collection('hjhy')
# insert_many replaces Collection.insert, which was removed in PyMongo 4
collect.insert_many(records)
print('Done')
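
The fixed slice at position 8 assumes the JSON payload always starts at the same character. A minimal sketch of a less position-dependent alternative is to cut from the first opening brace to the last closing brace. The helper name extract_json and the sample string are hypothetical and do not reproduce the site's actual response.

```python
import json


def extract_json(text):
    """Return the object parsed from the outermost {...} span in text."""
    start = text.find('{')
    end = text.rfind('}')
    if start == -1 or end < start:
        raise ValueError('no JSON object found in response text')
    return json.loads(text[start:end + 1])


# Made-up sample: a JSON object wrapped in an 8-character prefix and a trailing ';'
sample = 'var obj={"fundHistoryNetValue": [{"enddate": "2017-09-21", "accum_net": "1.4050"}]};'
parsed = extract_json(sample)
print(parsed['fundHistoryNetValue'][0]['accum_net'])
```

If the response format changes so that the prefix length differs, this version keeps working as long as the payload is a single JSON object.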

Extract & Process data


from pymongo import MongoClient
import pandas as pd
import datetime

db = MongoClient('localhost', 27017)['Fund']
data = dict()

# Map each record's end date (parsed into a datetime) to its accumulated net value
for item in db.get_collection('hjhy').find():
    data[datetime.datetime.strptime(item['enddate'], '%Y-%m-%d')] = item['accum_net']
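
The format string matters when parsing the dates: %Y matches a four-digit year, while %y matches only two digits. A short sketch with a made-up date string (assumed to have the same shape as the stored enddate field) shows the difference:

```python
import datetime

date_text = '2017-09-21'  # made-up example in the same shape as item['enddate']

parsed = datetime.datetime.strptime(date_text, '%Y-%m-%d')
print(parsed)

# With '%y' only the first two digits are read as the year, so parsing fails
try:
    datetime.datetime.strptime(date_text, '%y-%m-%d')
    two_digit_ok = True
except ValueError:
    two_digit_ok = False
print(two_digit_ok)
```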

Forecasting with the ARIMA model


1. Build the time series

# Build the time series; the dict keys (dates) become the index
my_series = pd.Series(data)

# Convert the values from str to float
my_series = my_series.astype(float)

# Sort chronologically by date
my_series = my_series.sort_index()
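
The three steps above can be sketched on a few made-up records. The values are illustrative, not real fund data, and the records are assumed to carry the same enddate and accum_net fields as the stored documents.

```python
import pandas as pd

# Made-up records, deliberately out of order, with values stored as strings
records = [
    {'enddate': '2017-09-21', 'accum_net': '1.4050'},
    {'enddate': '2017-09-19', 'accum_net': '1.3980'},
    {'enddate': '2017-09-20', 'accum_net': '1.4010'},
]

series = pd.Series(
    [r['accum_net'] for r in records],
    index=pd.to_datetime([r['enddate'] for r in records]),
)

# Convert the strings to floats, then sort by date
series = series.astype(float).sort_index()
print(series)
```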


2. View the trend chart

The chart shows how the fund's price has changed since the fund was established.

%pylab
# plot(my_series)
my_series.plot()

Calling plot(my_series) directly draws an extra line that connects the first and last points. Use my_series.plot() instead, which calls the object's own plot method.


3. Compute the differences

from matplotlib import pyplot as plt

# First-order difference
fig = plt.figure()
diff1 = my_series.diff(1)
diff1.plot()

# Second-order difference: the difference of the first difference
# (diff(2) would instead give the lag-2 difference x[t] - x[t-2])
fig = plt.figure()
diff2 = my_series.diff().diff()
diff2.plot()
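
One detail worth checking is the argument passed to diff. In pandas, diff(n) subtracts the value n rows earlier (a lag-n difference), whereas a second-order difference is the difference of the first difference. A small sketch on a made-up series of squares makes the distinction visible:

```python
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0, 16.0, 25.0])

first_order = s.diff()           # x[t] - x[t-1]
second_order = s.diff().diff()   # difference of the first difference
lag_two = s.diff(2)              # x[t] - x[t-2], not a second-order difference

print(first_order.tolist())
print(second_order.tolist())
print(lag_two.tolist())
```

For a quadratic sequence the second-order difference is constant, while the lag-2 difference still grows, so the two operations are not interchangeable.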


4. First-order difference



5. Second-order difference



6. View descriptive statistics

# Descriptive statistics of the first-order difference
diff1.dropna(inplace=True)
diff1.describe()

Each differencing pass produces NA values at the start of the series, so remember to drop them. The output is the descriptive statistics for diff1.

# Descriptive statistics of the second-order difference
diff2.dropna(inplace=True)
diff2.describe()

The output is the descriptive statistics for diff2.

So differencing once is sufficient.


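7. Determine the p and q parameters


import statsmodels.api as sm
from matplotlib import pyplot as plt

fig = plt.figure()

# Autocorrelation (ACF) of the first-order difference in the upper panel
ax0 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(diff1, lags=30, ax=ax0)

# Partial autocorrelation (PACF) in the lower panel
ax1 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(diff1, lags=30, ax=ax1)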


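These are the autocorrelation (ACF) and partial autocorrelation (PACF) plots of the first-order difference. The first-order difference is slightly more stationary than the second-order difference, but the orders still have to be read from where the plots cut off: the ACF of an MA(q) process cuts off after lag q, and the PACF of an AR(p) process cuts off after lag p.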


The second-order difference is therefore used. Its autocorrelation and partial autocorrelation plots are shown below:


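8. Forecast

# statsmodels.tsa.arima_model was removed in statsmodels 0.13
from statsmodels.tsa.arima.model import ARIMA

# ARIMA needs a full (p, d, q) order; the source shows only "(2, 1)", so d = 2 is an
# assumption based on the second-order difference chosen above
history_price = my_series
model = ARIMA(history_price, order=(2, 2, 1)).fit()

# forecast() returns the predicted values directly in the current API
model.forecast(10)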

Actual value

Forecast value

array([1.41013409, 1.4134152, 1.41570651, 1.41638723, 1.42131414, 1.42299673, 1.42647455, 1.42795939, 1.43099336, 1.43316138])


