Quickly learning pandas, the Python data analysis package

Source: Internet
Author: User

I have recently been studying time series analysis, and much of the tooling it relies on comes from a package called pandas, so I set aside some time to learn it on its own.

See the official pandas documentation: http://pandas.pydata.org/pandas-docs/stable/index.html

and a related blog post: http://www.cnblogs.com/chaosimple/p/4153083.html

Pandas introduction

Pandas is a Python data analysis package. AQR Capital Management began developing it in April 2008 and open-sourced it at the end of 2009. It is now developed and maintained by the PyData development team, which focuses on Python package development, as part of the PyData project. Because pandas was originally built as a financial data analysis tool, it has good support for time series analysis.

Pandas is a data analysis package built on NumPy that provides higher-level data structures and tools. NumPy is built around its core ndarray, and in the same way pandas is built around two core data structures: Series and DataFrame.

Series and DataFrame correspond to one-dimensional sequences and two-dimensional table structures, respectively.

The required packages are usually imported as follows:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Series

A Series can be thought of as a fixed-length, ordered dictionary, and almost any one-dimensional data can be used to construct one. If you create a Series by passing a list, pandas creates a default integer index:
>>> s = pd.Series([1, 2, 3.0, 'a', 'bc'])
>>> s
0     1
1     2
2     3
3     a
4    bc
dtype: object

A Series object has two main attributes, index and values. They correspond to the left and right columns in the previous example. A list was passed to the constructor, so the index consists of integers counting up from 0. If a dictionary is passed instead, its keys become the index and its values become the data. You can also specify the index explicitly with the index keyword argument at construction time:

>>> s = pd.Series(data=[1, 2, 3.0, 'a', 'bc'], index=['a', 'b', 'c', 'd', 'e'])
>>> s.name = 'example'
>>> s.index.name = 'new_index'
>>> s
new_index
a     1
b     2
c     3
d     a
e    bc
Name: example, dtype: object

>>> s.index
Index([u'a', u'b', u'c', u'd', u'e'], dtype='object')
>>> s.values
array([1, 2, 3.0, 'a', 'bc'], dtype=object)
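The dictionary form mentioned above can be sketched as follows. The keys `'x'`, `'y'` and `'z'` and their values are arbitrary example data. The dictionary keys become the index, and the dictionary values become the data.

```python
import pandas as pd

# Hypothetical example data: keys are labels, values are the data
population = {'x': 10, 'y': 20, 'z': 30}
s = pd.Series(population)

print(s.index.tolist())   # ['x', 'y', 'z']
print(s.values.tolist())  # [10, 20, 30]
print(s['y'])             # 20
```

Dictionary insertion order is preserved here in Python 3.7+ with a recent pandas version.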

The elements of a Series are constructed strictly according to the given index. If the data argument is a dictionary, only the keys contained in index are used. If a label in index has no matching key in data, that label is still added, with the value NaN.
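A minimal sketch of this rule, using made-up labels `'a'` to `'d'`. Key `'c'` is dropped because it is not in the index. Label `'d'` is added as NaN because it is missing from the data.

```python
import pandas as pd

data = {'a': 1, 'b': 2, 'c': 3}

# Only labels listed in `index` are kept: 'c' is dropped,
# and 'd', which is missing from data, is added with NaN
s = pd.Series(data, index=['a', 'b', 'd'])
print(s)
# a    1.0
# b    2.0
# d    NaN
# dtype: float64
```

The NaN forces the whole Series to a float dtype, which is why 1 and 2 are shown as 1.0 and 2.0.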

Note that although each index label corresponds to an element of values, a Series is not a dictionary mapping. The index and the values are stored as two separate arrays, so Series operations remain efficient.
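A quick check of this point, with arbitrary example labels and values. `values` is a plain NumPy array and `index` is a separate pandas `Index` object. Arithmetic runs on the whole array at once, while label lookup goes through the index.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5], index=['a', 'b', 'c'])

# The data and the labels live in two separate objects
print(isinstance(s.values, np.ndarray))  # True
print(isinstance(s.index, pd.Index))     # True

# Arithmetic is applied to the whole underlying array at once
doubled = s * 2
print(doubled['b'])                      # 5.0
```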

The greatest benefit of this key-value structure is automatic alignment. When you perform arithmetic between Series, values are matched by index label.
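A minimal sketch of index alignment, using two hypothetical Series whose labels only partly overlap. Labels present in only one operand produce NaN. `Series.add` with `fill_value` treats a missing label as 0 instead.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Values are matched by label, not by position
total = s1 + s2
print(total)
# a     NaN
# b    12.0
# c    23.0
# d     NaN
# dtype: float64

# Treat a label missing from one side as 0 instead of producing NaN
filled = s1.add(s2, fill_value=0)
print(filled.tolist())  # [1.0, 12.0, 23.0, 30.0]
```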

Both a Series and its index have a name attribute.

DataFrame

A DataFrame is a tabular data structure containing an ordered set of columns; the column labels behave much like an index. Each column can hold a different value type, whereas an ndarray has only one dtype. You can think of a DataFrame as a collection of Series that share the same index.
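A minimal sketch, with hypothetical column names `count` and `price`. Each column keeps its own dtype, and selecting a column returns a Series that shares the DataFrame's index.

```python
import pandas as pd

# Each column keeps its own dtype; all columns share one index
df = pd.DataFrame(
    {'count': [1, 2, 3], 'price': [0.5, 1.5, 2.5]},
    index=['r1', 'r2', 'r3'],
)
print(df.dtypes)  # count is an integer column, price a float column

col = df['count']                   # selecting a column returns a Series
print(isinstance(col, pd.Series))   # True
print(col.index.equals(df.index))   # True
```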

A DataFrame is constructed much like a Series, except that it can accept several one-dimensional data sources at once. Each source becomes a separate column.

You can create a DataFrame by passing a dictionary whose values can each be converted into a Series-like structure:

data_dic = {'A': 1.,
            'B': pd.Timestamp('20130102'),
            'C': pd.Series(1, index=list(range(4)), dtype='float32'),
            'D': np.array([3] * 4, dtype='int32'),
            'E': pd.Categorical(["test", "train", "test", "train"]),
            'F': 'foo'}
df = pd.DataFrame(data_dic)
>>> df
   A          B  C  D      E    F
0  1 2013-01-02  1  3   test  foo
1  1 2013-01-02  1  3  train  foo
2  1 2013-01-02  1  3   test  foo
3  1 2013-01-02  1  3  train  foo

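Although data_dic is a dictionary, its keys A, B, C, D, E and F do not act as the DataFrame's index. Instead they become the column labels, that is, the name attribute of each column Series. The index of the resulting DataFrame is 0 1 2 3, taken from the index of the Series in column C. Scalar entries such as 'F': 'foo' are broadcast to every row.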

>>> df.index
Int64Index([0, 1, 2, 3], dtype='int64')
>>> df.B
0   2013-01-02
1   2013-01-02
2   2013-01-02
3   2013-01-02
Name: B, dtype: datetime64[ns]
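A smaller sketch of the same behaviour, with made-up column data. The dictionary keys become the column labels, and each column Series carries its key as its name. Because no Series with its own index is supplied here, pandas creates a default integer index.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0], 'B': ['x', 'y']})

# The dict keys became the column labels ...
print(list(df.columns))  # ['A', 'B']
# ... and each column Series carries its key as its name
print(df['A'].name)      # A
# With no index supplied, a default integer index is created
print(list(df.index))    # [0, 1]
```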

You can access the Series for each column (A, B, C, D and so on) with attribute syntax such as df.B.
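A minimal sketch comparing attribute access with bracket access. The column names are arbitrary examples. Attribute access only works when the label is a valid Python identifier that does not clash with an existing DataFrame attribute, so a label containing a space needs brackets.

```python
import pandas as pd

df = pd.DataFrame({'B': [1, 2], 'my col': [3, 4]})

# Attribute access and bracket access return the same Series
print(df.B.equals(df['B']))    # True

# 'my col' is not a valid identifier, so only brackets work here
print(df['my col'].tolist())   # [3, 4]
```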

You can also create a DataFrame by passing a NumPy array together with a datetime index and column labels. The data here are random, so your output will differ:

dates = pd.date_range('20130101', periods=6)
df2 = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
>>> df2
                   A         B         C         D
2013-01-01 -0.941915 -1.304691 -0.837790 -0.805101
2013-01-02 -0.665522 -2.935955  1.249425  0.902390
2013-01-03 -0.419268  0.750735 -0.547377 -0.075151
2013-01-04  1.362527 -1.059686 -1.564129 -1.267506
2013-01-05  0.719452 -0.152727  0.319914 -0.448535
2013-01-06 -0.863264 -0.548317  0.277112  1.233825
>>> df.index
Int64Index([0, 1, 2, 3], dtype='int64')
>>> df.values
array([[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo']], dtype=object)
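The df.values output above has dtype=object. That follows from the ndarray's single-dtype restriction: mixed column types cannot share one specific dtype. A minimal sketch contrasting an all-float frame with a mixed frame, using arbitrary example data:

```python
import numpy as np
import pandas as pd

# All-float frame: the underlying array keeps a float dtype
numeric = pd.DataFrame(np.arange(6, dtype=float).reshape(3, 2), columns=['A', 'B'])
print(numeric.values.dtype)  # float64

# Mixed column types must share one ndarray, so NumPy
# falls back to the generic object dtype
mixed = pd.DataFrame({'A': [1.0, 2.0], 'E': ['test', 'train']})
print(mixed.values.dtype)    # object
```

Recent pandas versions also provide `DataFrame.to_numpy()`, which the pandas documentation recommends over `.values`.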

Basic operations

1. head() and tail(): view the first and last few rows

>>> df.head(2)
   A          B  C  D      E    F
0  1 2013-01-02  1  3   test  foo
1  1 2013-01-02  1  3  train  foo
>>> df.tail(2)
   A          B  C  D      E    F
2  1 2013-01-02  1  3   test  foo
3  1 2013-01-02  1  3  train  foo
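A minimal sketch on a hypothetical 10-row frame, showing that head() returns 5 rows when called without an argument and that tail(n) returns the last n rows:

```python
import pandas as pd

df = pd.DataFrame({'n': range(10)})

print(len(df.head()))             # 5: head() defaults to 5 rows
print(df.head(2)['n'].tolist())   # [0, 1]
print(df.tail(3)['n'].tolist())   # [7, 8, 9]
```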

2. describe(): quick summary statistics

>>> df.describe()
       A  C  D
count  4  4  4
mean   1  1  3
std    0  0  0
min    1  1  3
25%    1  1  3
50%    1  1  3
75%    1  1  3
max    1  1  3
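In the example above every value in a column is the same, so the statistics are not very informative. A minimal sketch on made-up data with varying values makes them easier to read. The string column is left out by default.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4], 'label': ['a', 'b', 'c', 'd']})
desc = df.describe()

print(list(desc.columns))      # ['x']: the string column is skipped
print(desc.loc['mean', 'x'])   # 2.5
print(desc.loc['25%', 'x'])    # 1.75 (linear interpolation)
print(desc.loc['max', 'x'])    # 4.0
```

Passing `include='all'` to `describe()` summarizes every column, including non-numeric ones.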

3. Transpose

>>> df.T
                     0                    1                    2                    3
A                    1                    1                    1                    1
B  2013-01-02 00:00:00  2013-01-02 00:00:00  2013-01-02 00:00:00  2013-01-02 00:00:00
C                    1                    1                    1                    1
D                    3                    3                    3                    3
E                 test                train                 test                train
F                  foo                  foo                  foo                  foo
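A minimal sketch of what transposing does, on an arbitrary all-integer frame. Rows and columns swap, the column labels become row labels, and transposing twice gives back the original frame.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
t = df.T

print(df.shape, t.shape)   # (3, 2) (2, 3)
print(list(t.index))       # ['A', 'B']: column labels became row labels
print(t.T.equals(df))      # True
```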

4. Sort by axis labels

>>> df.sort_index(axis=1, ascending=True)
   A          B  C  D      E    F
0  1 2013-01-02  1  3   test  foo
1  1 2013-01-02  1  3  train  foo
2  1 2013-01-02  1  3   test  foo
3  1 2013-01-02  1  3  train  foo
>>> df.sort_index(axis=1, ascending=False)
     F      E  D  C          B  A
0  foo   test  3  1 2013-01-02  1
1  foo  train  3  1 2013-01-02  1
2  foo   test  3  1 2013-01-02  1
3  foo  train  3  1 2013-01-02  1
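The examples above sort the column labels with axis=1. A minimal sketch of sorting the row labels instead, using the default axis=0 on a hypothetical frame whose index is out of order:

```python
import pandas as pd

df = pd.DataFrame({'v': [30, 10, 20]}, index=['c', 'a', 'b'])

# axis=0 (the default) sorts the row labels; rows move with their labels
print(df.sort_index().index.tolist())                # ['a', 'b', 'c']
print(df.sort_index(ascending=False)['v'].tolist())  # [30, 20, 10]
```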

5. Sort by values

>>> df2.sort_values(by='B', ascending=True)
                   A         B         C         D
2013-01-02 -0.665522 -2.935955  1.249425  0.902390
2013-01-01 -0.941915 -1.304691 -0.837790 -0.805101
2013-01-04  1.362527 -1.059686 -1.564129 -1.267506
2013-01-06 -0.863264 -0.548317  0.277112  1.233825
2013-01-05  0.719452 -0.152727  0.319914 -0.448535
2013-01-03 -0.419268  0.750735 -0.547377 -0.075151
>>> df2.sort_values(by='B', ascending=False)
                   A         B         C         D
2013-01-03 -0.419268  0.750735 -0.547377 -0.075151
2013-01-05  0.719452 -0.152727  0.319914 -0.448535
2013-01-06 -0.863264 -0.548317  0.277112  1.233825
2013-01-04  1.362527 -1.059686 -1.564129 -1.267506
2013-01-01 -0.941915 -1.304691 -0.837790 -0.805101
2013-01-02 -0.665522 -2.935955  1.249425  0.902390
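Older pandas versions sorted by column values with `DataFrame.sort(columns=...)`. That method has since been removed, and `sort_values(by=...)` replaces it. A minimal, deterministic sketch with hypothetical `name` and `score` columns shows that whole rows, index labels included, move with the sorted values.

```python
import pandas as pd

df = pd.DataFrame({'name': ['x', 'y', 'z'], 'score': [2.0, 3.0, 1.0]})

# Sort rows by one column; the index labels travel with their rows
by_score = df.sort_values(by='score', ascending=False)
print(by_score['name'].tolist())  # ['y', 'x', 'z']
print(by_score.index.tolist())    # [1, 0, 2]
```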
