Quickly learn the pandas of Python data analysis packages

Last Update:2015-12-24 Source: Internet

Author: User


I have recently been looking into time series analysis, where a package called pandas is commonly used, so I took some time to learn it on its own.

See the official pandas documentation: http://pandas.pydata.org/pandas-docs/stable/index.html

and a related blog post: http://www.cnblogs.com/chaosimple/p/4153083.html

Pandas introduction

Pandas is a Python data analysis package. AQR Capital Management began developing it in April 2008 and open-sourced it at the end of 2009. It is currently developed and maintained by the PyData development team, which focuses on Python package development, as part of the PyData project. Because pandas was originally built as a financial data analysis tool, it provides good support for time series analysis.

Pandas is built on NumPy and provides higher-level data structures and tools. Just as NumPy is built around its core ndarray, pandas is built around two core data structures: Series and DataFrame.

Series and DataFrame correspond to one-dimensional sequences and two-dimensional table structures, respectively.
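A minimal sketch of the difference in dimensionality, assuming a reasonably recent pandas version: a Series has one dimension, a DataFrame has two, and selecting a single column of a DataFrame gives back a Series. The variable names and values are illustrative only.

```python
import pandas as pd

# A Series: one-dimensional labeled data
s = pd.Series([10, 20, 30])

# A DataFrame: two-dimensional table with labeled rows and columns
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

print(s.ndim)    # 1
print(df.ndim)   # 2
print(df.shape)  # (3, 2): 3 rows, 2 columns

# Selecting one column of a DataFrame returns a Series
col = df["x"]
print(type(col).__name__)  # Series
```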

The required packages are usually imported as follows:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Series

A Series can be regarded as a fixed-length ordered dictionary, and almost any one-dimensional data can be used to construct a Series object. You can create a Series by passing a list; pandas will then create a default integer index.

>>> s = pd.Series([1, 2, 3.0, 'a', 'bc'])
>>> s
0      1
1      2
2    3.0
3      a
4     bc
dtype: object

A Series object has two main attributes, index and values, which correspond to the left and right columns in the previous example. Because a list was passed to the constructor, the index consists of integers counting up from 0. If a dictionary of key-value pairs is passed instead, the keys become the index and the dictionary values become the values. You can also specify the index explicitly with the index keyword argument at initialization time.
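A small sketch of the two constructors just described, assuming Python 3.7+ and a pandas version that preserves dictionary insertion order (0.23 or later). The keys and numbers are placeholders.

```python
import pandas as pd

# From a dict: the keys become the index, the values become the values
d = {"x": 1.5, "y": 2.5, "z": 3.5}
s1 = pd.Series(d)
print(s1.index.tolist())  # ['x', 'y', 'z']
print(s1["y"])            # 2.5

# From a list, with the index given explicitly via the index keyword
s2 = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s2["b"])            # 20
```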

>>> s = pd.Series(data=[1, 2, 3.0, 'a', 'bc'], index=['a', 'b', 'c', 'd', 'e'])
>>> s.name = 'example'
>>> s.index.name = 'New_index'
>>> s
New_index
a      1
b      2
c    3.0
d      a
e     bc
Name: example, dtype: object

>>> s.index
Index(['a', 'b', 'c', 'd', 'e'], dtype='object', name='New_index')
>>> s.values
array([1, 2, 3.0, 'a', 'bc'], dtype=object)

The elements of a Series object are constructed strictly according to the given index. This means that if data is a dictionary, only the keys contained in index are used, and if a key in index is missing from data, that key is still added, with the value NaN.
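A minimal sketch of that rule with placeholder data: key 'c' is dropped because it is not in index, and 'd' is added as NaN because it is missing from data. Because NaN is a float, pandas upcasts the integer values to float64.

```python
import pandas as pd

data = {"a": 1, "b": 2, "c": 3}

# Only keys listed in index are kept ('c' is dropped);
# 'd' is not in data, so it is added with the value NaN
s = pd.Series(data, index=["a", "b", "d"])

print(s.index.tolist())  # ['a', 'b', 'd']
print(s["a"])            # 1.0 (upcast to float because of the NaN)
print(pd.isna(s["d"]))   # True
```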

Note that although each index label corresponds to an element of values, this is not the same as a dictionary mapping: index and values are stored as separate array-backed objects, so operations on a Series run over whole arrays and perform well.
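A quick check of that layout, assuming a Series with a plain NumPy dtype such as float64 (extension dtypes like Categorical are not backed by an ndarray in the same way): values is a NumPy array, index is a separate pandas Index, and arithmetic is applied to the whole array at once.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

# values is a plain NumPy array; index is a separate pandas Index object
print(type(s.values))                 # <class 'numpy.ndarray'>
print(isinstance(s.index, pd.Index))  # True

# The operation is vectorized over the underlying array; labels are kept
doubled = s * 2
print(doubled.tolist())               # [2.0, 4.0, 6.0]
```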

The greatest benefit of this key-value structure is that indexes are automatically aligned when arithmetic operations are performed between Series.
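A minimal sketch of automatic alignment with placeholder values: elements are matched by label, not by position, and a label present in only one operand produces NaN. Series.add with fill_value treats a missing side as the given value instead.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Addition matches elements by label; 'a' and 'd' appear on one side only -> NaN
total = s1 + s2
print(total)

# fill_value=0 treats a missing label on either side as 0 before adding
filled = s1.add(s2, fill_value=0)
print(filled.tolist())  # [1.0, 12.0, 23.0, 30.0]
```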

Both a Series and its index have a name attribute.

DataFrame

A DataFrame is a tabular data structure containing an ordered set of columns (the column labels are themselves an Index). Each column can have a different value type, unlike an ndarray, which can have only one dtype. You can basically think of a DataFrame as a collection of Series that share the same index.
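A small sketch of both points, using a hypothetical three-column frame: each column keeps its own dtype, and every column Series shares the DataFrame's row index.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["x", "y", "z"],   # strings
    "count": [1, 2, 3],        # integers -> int64
    "score": [0.5, 1.5, 2.5],  # floats -> float64
})

# Each column keeps its own dtype
print(df.dtypes)

# Every column is a Series that shares the DataFrame's index
for label in df.columns:
    assert df[label].index.equals(df.index)
```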

A DataFrame is constructed in a similar way to a Series, except that it can accept several one-dimensional data sources at once, each of which becomes a separate column.

You can create a DataFrame by passing a dictionary whose values can each be converted into a Series-like column:

>>> data_dic = {'A': 1.,
...             'B': pd.Timestamp('20130102'),
...             'C': pd.Series(1, index=list(range(4)), dtype='float32'),
...             'D': np.array([3] * 4, dtype='int32'),
...             'E': pd.Categorical(["test", "train", "test", "train"]),
...             'F': 'foo'}
>>> df = pd.DataFrame(data_dic)
>>> df
   A          B  C  D      E    F
0  1 2013-01-02  1  3   test  foo
1  1 2013-01-02  1  3  train  foo
2  1 2013-01-02  1  3   test  foo
3  1 2013-01-02  1  3  train  foo

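Although data_dic is a dictionary, its keys A through F do not act as the DataFrame's index; they become the column labels, that is, the name attribute of each column Series. The index of the DataFrame generated here is 0 1 2 3, taken from the Series in column C; the scalar values are broadcast to match it.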

>>> df.index
Index([0, 1, 2, 3], dtype='int64')
>>> df.B
0   2013-01-02
1   2013-01-02
2   2013-01-02
3   2013-01-02
Name: B, dtype: datetime64[ns]

Each column can be accessed as an attribute, as in df.B above, which returns the Series for that column; the same works for A, C, D, E and F.
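A minimal sketch of column access on a hypothetical two-column frame: the dictionary keys become the column labels and each column's name, df.B and df["B"] return the same column, and a label that is not a valid Python identifier (here "my col") can only be reached with brackets.

```python
import pandas as pd

df = pd.DataFrame({"B": [1, 2], "my col": [3, 4]})

# The dict keys became the column labels; each column Series carries its label as .name
print(df.columns.tolist())    # ['B', 'my col']
print(df["B"].name)           # B

# Attribute access and bracket access return the same column
print(df.B.equals(df["B"]))   # True

# Dot syntax needs a label that is a valid Python identifier (and does not
# clash with an existing DataFrame attribute such as df.T); use brackets otherwise
print(df["my col"].tolist())  # [3, 4]
```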

You can also create a DataFrame by passing a NumPy array together with a datetime index and a list of column labels (the numbers are random, so your output will differ):

>>> dates = pd.date_range('20130101', periods=6)
>>> df2 = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
>>> df2
                   A         B         C         D
2013-01-01 -0.941915 -1.304691 -0.837790 -0.805101
2013-01-02 -0.665522 -2.935955  1.249425  0.902390
2013-01-03 -0.419268  0.750735 -0.547377 -0.075151
2013-01-04  1.362527 -1.059686 -1.564129 -1.267506
2013-01-05  0.719452 -0.152727  0.319914 -0.448535
2013-01-06 -0.863264 -0.548317  0.277112  1.233825

The index and values of the earlier df:

>>> df.index
Index([0, 1, 2, 3], dtype='int64')
>>> df.values
array([[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo']],
      dtype=object)
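A reproducible variant of the df2 construction, as a sketch: it swaps np.random.randn for a seeded np.random.default_rng(0) generator (an assumption, not what the session above used) so repeated runs produce the same numbers, and checks the resulting shape and index type.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)  # 6 consecutive days
rng = np.random.default_rng(0)                 # seeded so runs are reproducible
df2 = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

print(df2.shape)                 # (6, 4)
print(type(df2.index).__name__)  # DatetimeIndex
print(df2.index[0])              # 2013-01-01 00:00:00
print(df2.dtypes.unique())       # every column is float64
```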

Basic operations

1. head() and tail(): view the first and last few rows

>>> df.head(2)
   A          B  C  D      E    F
0  1 2013-01-02  1  3   test  foo
1  1 2013-01-02  1  3  train  foo
>>> df.tail(2)
   A          B  C  D      E    F
2  1 2013-01-02  1  3   test  foo
3  1 2013-01-02  1  3  train  foo
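A quick sketch on a hypothetical 10-row frame showing the default row count (5) and how n selects rows from each end.

```python
import pandas as pd

df = pd.DataFrame({"n": range(10)})

print(len(df.head()))            # 5 (the default n=5)
print(df.head(3)["n"].tolist())  # [0, 1, 2]
print(df.tail(2)["n"].tolist())  # [8, 9]
```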

2. describe(): quick summary statistics

>>> df.describe()
       A  C  D
count  4  4  4
mean   1  1  3
std    0  0  0
min    1  1  3
25%    1  1  3
50%    1  1  3
75%    1  1  3
max    1  1  3
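The output above covers only A, C and D because describe() summarizes numeric columns by default; text columns such as E and F are skipped (recent pandas versions also summarize datetime columns such as B). A minimal sketch with made-up values, including include="all" to summarize non-numeric columns too:

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "label": ["a", "b", "a", "b"]})

stats = df.describe()           # by default, only numeric columns are summarized
print(stats.columns.tolist())   # ['x']
print(stats.loc["mean", "x"])   # 2.5
print(stats.loc["50%", "x"])    # 2.5 (the median)

# include="all" also summarizes non-numeric columns (count, unique, top, freq)
print(df.describe(include="all").loc["unique", "label"])  # 2
```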

3. Transpose

>>> df.T
                     0                    1                    2                    3
A                    1                    1                    1                    1
B  2013-01-02 00:00:00  2013-01-02 00:00:00  2013-01-02 00:00:00  2013-01-02 00:00:00
C                    1                    1                    1                    1
D                    3                    3                    3                    3
E                 test                train                 test                train
F                  foo                  foo                  foo                  foo
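A small sketch of what df.T does, on a hypothetical all-integer frame: the old columns become the index and the old index becomes the columns. With a single dtype, transposing twice returns an equal frame; with mixed dtypes, as in df above, the transposed columns become object dtype.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["r1", "r2"])
t = df.T

print(t.index.tolist())    # ['A', 'B']   -> old columns become the index
print(t.columns.tolist())  # ['r1', 'r2'] -> old index becomes the columns
print(t.loc["B", "r1"])    # 3
```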

4. Sort by axis labels with sort_index()

>>> df.sort_index(axis=1, ascending=True)
   A          B  C  D      E    F
0  1 2013-01-02  1  3   test  foo
1  1 2013-01-02  1  3  train  foo
2  1 2013-01-02  1  3   test  foo
3  1 2013-01-02  1  3  train  foo
>>> df.sort_index(axis=1, ascending=False)
     F      E  D  C          B  A
0  foo   test  3  1 2013-01-02  1
1  foo  train  3  1 2013-01-02  1
2  foo   test  3  1 2013-01-02  1
3  foo  train  3  1 2013-01-02  1
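The session above only sorts columns (axis=1). A sketch on a hypothetical frame with unordered row and column labels shows both axes; note that whole rows move together when sorting by the index.

```python
import pandas as pd

df = pd.DataFrame({"b": [1, 2, 3], "a": [4, 5, 6]}, index=[2, 0, 1])

# axis=0 (the default) sorts the rows by index label; rows keep their values
by_rows = df.sort_index()
print(by_rows.index.tolist())  # [0, 1, 2]
print(by_rows["b"].tolist())   # [2, 3, 1]

# axis=1 sorts the columns by column label
print(df.sort_index(axis=1).columns.tolist())                   # ['a', 'b']
print(df.sort_index(axis=1, ascending=False).columns.tolist())  # ['b', 'a']
```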

5. Sort by value with sort_values() (the older df2.sort(columns=...) call has been removed from pandas)

>>> df2.sort_values(by='B', ascending=True)
                   A         B         C         D
2013-01-02 -0.665522 -2.935955  1.249425  0.902390
2013-01-01 -0.941915 -1.304691 -0.837790 -0.805101
2013-01-04  1.362527 -1.059686 -1.564129 -1.267506
2013-01-06 -0.863264 -0.548317  0.277112  1.233825
2013-01-05  0.719452 -0.152727  0.319914 -0.448535
2013-01-03 -0.419268  0.750735 -0.547377 -0.075151
>>> df2.sort_values(by='B', ascending=False)
                   A         B         C         D
2013-01-03 -0.419268  0.750735 -0.547377 -0.075151
2013-01-05  0.719452 -0.152727  0.319914 -0.448535
2013-01-06 -0.863264 -0.548317  0.277112  1.233825
2013-01-04  1.362527 -1.059686 -1.564129 -1.267506
2013-01-01 -0.941915 -1.304691 -0.837790 -0.805101
2013-01-02 -0.665522 -2.935955  1.249425  0.902390
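sort_values() also accepts a list of columns, with a matching list for ascending, which sorts by the first column and breaks ties with the next. A minimal sketch on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"group": ["x", "y", "x", "y"], "value": [3, 1, 2, 4]})

# Sort by a single column
print(df.sort_values(by="value")["value"].tolist())  # [1, 2, 3, 4]

# Sort by group ascending, then by value descending within each group
out = df.sort_values(by=["group", "value"], ascending=[True, False])
print(out["group"].tolist())  # ['x', 'x', 'y', 'y']
print(out["value"].tolist())  # [3, 2, 4, 1]
```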

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
