While looking into time series analysis recently, I found that a package called pandas keeps coming up, so I set aside some time to learn it on its own.
See the official pandas documentation: http://pandas.pydata.org/pandas-docs/stable/index.html
and a related blog post: http://www.cnblogs.com/chaosimple/p/4153083.html
Pandas introduction
pandas is a Python data analysis package. Development began at AQR Capital Management in April 2008, and the package was open-sourced at the end of 2009. It is now developed and maintained as part of the PyData project by the PyData development team, which focuses on Python data packages. pandas was originally built as a financial data analysis tool, which is why it offers good support for time series analysis.
pandas is built on NumPy and provides higher-level data structures and tools. Just as NumPy is built around ndarray, pandas is built around two core data structures: Series and DataFrame.
Series and DataFrame correspond to one-dimensional sequences and two-dimensional table structures, respectively.
The required packages are usually imported as follows:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Series
A Series can be regarded as a fixed-length ordered dictionary, and almost any one-dimensional data can be used to construct one. You can create a Series by passing a list; pandas creates an integer index by default.
>>> s = pd.Series([1, 2, 3.0, 'a', 'bc'])
>>> s
0     1
1     2
2     3
3     a
4    bc
dtype: object
A Series object has two main attributes, index and values, which correspond to the left and right columns in the previous example. Because a list was passed to the constructor, the index is a sequence of integers increasing from 0. If a dictionary of key-value pairs is passed in instead, a Series with the corresponding index-value pairs is generated. The index can also be specified explicitly with a keyword argument at construction time.
>>> s = pd.Series(data=[1, 2, 3.0, 'a', 'bc'], index=['a', 'b', 'c', 'd', 'e'])
>>> s.name = 'example'
>>> s.index.name = 'new_index'
>>> s
a     1
b     2
c     3
d     a
e    bc
Name: example, dtype: object
>>> s.index
Index([u'a', u'b', u'c', u'd', u'e'], dtype='object')
>>> s.values
array([1, 2, 3.0, 'a', 'bc'], dtype=object)
The elements of a Series are built strictly according to the index given. If the data argument is a set of key-value pairs, only the keys that appear in index are used; if a key listed in index is missing from data, it is still added, with NaN as its value.
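The index-driven construction described above can be sketched as follows. The dictionary `prices` and its values are made-up illustration data, not part of the original example.

```python
import pandas as pd

# Hypothetical data, for illustration only.
prices = {'a': 1.0, 'b': 2.0, 'c': 3.0}

# Only the keys listed in `index` are used: 'c' is dropped, and 'd'
# (which is absent from the dictionary) is filled with NaN.
s = pd.Series(prices, index=['a', 'b', 'd'])

print(s)
print(s.isnull())  # True only for the 'd' entry
```

Checking `isnull()` after construction is a quick way to spot labels that had no matching key in the data.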
Note that although each entry of a Series index corresponds to an element of values, this is not the same as a dictionary mapping. The index and the values are still stored as separate ndarray arrays, so Series objects perform well.
The greatest benefit of this key-value style of data structure is that indexes are aligned automatically when arithmetic is performed between Series.
Both a Series and its index have a name attribute.
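The automatic index alignment mentioned above can be illustrated with a minimal sketch. The two Series and their labels are hypothetical and chosen only so that the indexes partly overlap.

```python
import pandas as pd

# Hypothetical data: the labels 'b' and 'c' are shared, 'a' and 'd' are not.
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Values are matched by label, not by position. Labels present in only
# one operand ('a' and 'd') produce NaN in the result.
total = s1 + s2
print(total)
```

Because of the NaN entries, the result is a floating-point Series even though both inputs hold integers.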
DataFrame
A DataFrame is a tabular data structure that contains an ordered set of columns (the column labels behave much like an index). Each column can hold a different value type, unlike an ndarray, which has a single dtype. You can think of a DataFrame as a collection of Series that share the same index.
A DataFrame is constructed in much the same way as a Series, except that it can accept several one-dimensional data sources at once, each of which becomes a separate column.
You can create a DataFrame by passing a dictionary whose values can be converted to series-like structures:
>>> data_dic = {'A': 1.,
...             'B': pd.Timestamp('20130102'),
...             'C': pd.Series(1, index=list(range(4)), dtype='float32'),
...             'D': np.array([3] * 4, dtype='int32'),
...             'E': pd.Categorical(["test", "train", "test", "train"]),
...             'F': 'foo'}
>>> df = pd.DataFrame(data_dic)
>>> df
   A          B  C  D      E    F
0  1 2013-01-02  1  3   test  foo
1  1 2013-01-02  1  3  train  foo
2  1 2013-01-02  1  3   test  foo
3  1 2013-01-02  1  3  train  foo
Although data_dic is a dictionary, its keys (A, B, C, D and so on) do not play the role of the DataFrame's index. Instead, each key becomes the name attribute of the corresponding Series, that is, a column label. The index of the DataFrame generated here is 0, 1, 2, 3.
>>> df.index
Int64Index([0, 1, 2, 3], dtype='int64')
>>> df.B
0   2013-01-02
1   2013-01-02
2   2013-01-02
3   2013-01-02
Name: B, dtype: datetime64[ns]
The individual Series represented by A, B, C, D and so on can be accessed with attribute syntax such as df.B.
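The relationship between dictionary keys, column labels and the index can be checked with a small sketch. The frame below and its column names (`x`, `y`, `label`) are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical columns, for illustration only.
frame = pd.DataFrame({
    'x': np.arange(3),                 # integer column
    'y': pd.Series([0.5, 1.5, 2.5]),   # float column
    'label': 'foo',                    # a scalar is repeated on every row
})

print(list(frame.columns))  # the dictionary keys become column labels
print(list(frame.index))    # a default integer index is generated

# Attribute access and bracket access return the same Series,
# and the name of that Series is the column label.
col = frame.y
print(col.name)
print(col.equals(frame['y']))
```

Bracket access (`frame['y']`) is the safer form when a column label is not a valid Python identifier or clashes with a DataFrame method name.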
You can also create a DataFrame by passing a NumPy array together with a time index and column labels:
>>> dates = pd.date_range('20130101', periods=6)
>>> df2 = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
>>> df2
                   A         B         C         D
2013-01-01 -0.941915 -1.304691 -0.837790 -0.805101
2013-01-02 -0.665522 -2.935955  1.249425  0.902390
2013-01-03 -0.419268  0.750735 -0.547377 -0.075151
2013-01-04  1.362527 -1.059686 -1.564129 -1.267506
2013-01-05  0.719452 -0.152727  0.319914 -0.448535
2013-01-06 -0.863264 -0.548317  0.277112  1.233825
>>> df.index
Int64Index([0, 1, 2, 3], dtype='int64')
>>> df.values
array([[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo']], dtype=object)
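A reproducible variant of the array-plus-time-index construction might look like the sketch below. The seeded generator and the date slice are additions for illustration; the seed value is arbitrary and not part of the original example.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)

# A seeded generator keeps the random values reproducible (assumption:
# seed 0 is arbitrary and chosen only for this sketch).
rng = np.random.RandomState(0)
df2 = pd.DataFrame(rng.randn(6, 4), index=dates, columns=list('ABCD'))

print(df2.index)         # DatetimeIndex with one entry per day
print(df2.columns)       # the four column labels
print(df2.values.shape)  # shape of the underlying 2-D ndarray

# With a time index, rows can be selected by date label;
# both endpoints of the slice are included.
print(df2.loc['2013-01-02':'2013-01-04'])
```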
Basic operations
1. head() and tail(): view the first and last few rows
>>> df.head(2)
   A          B  C  D      E    F
0  1 2013-01-02  1  3   test  foo
1  1 2013-01-02  1  3  train  foo
>>> df.tail(2)
   A          B  C  D      E    F
2  1 2013-01-02  1  3   test  foo
3  1 2013-01-02  1  3  train  foo
2. describe(): quick summary statistics
>>> df.describe()
       A  C  D
count  4  4  4
mean   1  1  3
std    0  0  0
min    1  1  3
25%    1  1  3
50%    1  1  3
75%    1  1  3
max    1  1  3
3. Transpose
>>> df.T
                     0                    1                    2                    3
A                    1                    1                    1                    1
B  2013-01-02 00:00:00  2013-01-02 00:00:00  2013-01-02 00:00:00  2013-01-02 00:00:00
C                    1                    1                    1                    1
D                    3                    3                    3                    3
E                 test                train                 test                train
F                  foo                  foo                  foo                  foo
4. Sort by axis
>>> df.sort_index(axis=1, ascending=True)
   A          B  C  D      E    F
0  1 2013-01-02  1  3   test  foo
1  1 2013-01-02  1  3  train  foo
2  1 2013-01-02  1  3   test  foo
3  1 2013-01-02  1  3  train  foo
>>> df.sort_index(axis=1, ascending=False)
     F      E  D  C          B  A
0  foo   test  3  1 2013-01-02  1
1  foo  train  3  1 2013-01-02  1
2  foo   test  3  1 2013-01-02  1
3  foo  train  3  1 2013-01-02  1
5. Sort by value (sort_values() replaces the sort(columns=...) method of older pandas releases)
>>> df2.sort_values(by='B', ascending=True)
                   A         B         C         D
2013-01-02 -0.665522 -2.935955  1.249425  0.902390
2013-01-01 -0.941915 -1.304691 -0.837790 -0.805101
2013-01-04  1.362527 -1.059686 -1.564129 -1.267506
2013-01-06 -0.863264 -0.548317  0.277112  1.233825
2013-01-05  0.719452 -0.152727  0.319914 -0.448535
2013-01-03 -0.419268  0.750735 -0.547377 -0.075151
>>> df2.sort_values(by='B', ascending=False)
                   A         B         C         D
2013-01-03 -0.419268  0.750735 -0.547377 -0.075151
2013-01-05  0.719452 -0.152727  0.319914 -0.448535
2013-01-06 -0.863264 -0.548317  0.277112  1.233825
2013-01-04  1.362527 -1.059686 -1.564129 -1.267506
2013-01-01 -0.941915 -1.304691 -0.837790 -0.805101
2013-01-02 -0.665522 -2.935955  1.249425  0.902390
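The basic operations listed above can be tried together on a small deterministic frame. The frame `scores`, its labels and its values are hypothetical; the sketch assumes a pandas release that provides `sort_values`.

```python
import pandas as pd

# Hypothetical data, for illustration only. The columns are deliberately
# given out of alphabetical order so that sorting by axis has an effect.
scores = pd.DataFrame({'B': [3, 1, 2], 'A': [0.1, 0.3, 0.2]},
                      index=['x', 'y', 'z'])

print(scores.head(2))      # first two rows
print(scores.tail(1))      # last row
print(scores.describe())   # count, mean, std, min, quartiles, max
print(scores.T)            # rows and columns swapped

# Sort the columns by label, then sort the rows by the values in column B.
print(scores.sort_index(axis=1))
print(scores.sort_values(by='B', ascending=False))
```

None of these calls modifies `scores` in place; each returns a new object, so the result has to be assigned if it is needed later.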