
Pandas detailed A

Last Update:2018-07-24 Source: Internet

Author: User

Tags scalar


Pandas Introduction

Pandas is a NumPy-based library created for data analysis tasks. It brings together a number of libraries and standard data models, and it provides the functions and methods needed to manipulate large datasets quickly and efficiently.

Series: A one-dimensional array, similar to a one-dimensional array in NumPy. Both resemble Python's basic list data structure. The difference is that a list can hold elements of different data types, while an array or Series normally stores a single data type, which uses memory more efficiently and speeds up computation.
Time-series: A Series that is indexed by time.
DataFrame: A two-dimensional tabular data structure. Many of its features are similar to data.frame in R. A DataFrame can be understood as a container of Series. The following content is mainly based on DataFrame.
Panel: A three-dimensional array that can be understood as a container of DataFrames. (Panel was deprecated and has been removed from pandas since version 0.25.)

Series

A Series is an object similar to a one-dimensional array. It consists of a set of data (of any NumPy data type) and a set of associated labels (the index).

Create Series

In most cases a Series is obtained directly from a DataFrame, but we can also create one ourselves. The syntax is as follows:

s = pd.Series(data, index=index)

Here data can be any of the following:
a dict
an ndarray
a scalar value

index is the list of axis labels. What you pass for it depends on the kind of data, as described below.

From ndarray

If data is an ndarray, index must be the same length as data. If no index is passed, one is created with the values [0, ..., len(data) - 1].

>>> import numpy as np
>>> import pandas as pd
>>> ser = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
>>> ser
a   -0.063364
b    0.907505
c   -0.862125
d   -0.696292
e    0.000751
dtype: float64
>>> ser.index
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
>>> ser.index[[True, False, True, True, True]]
Index(['a', 'c', 'd', 'e'], dtype='object')
>>> pd.Series(np.random.randn(5))
0   -0.854075
1   -0.152620
2   -0.719542
3   -0.219185
4    1.206630
dtype: float64
>>> np.random.seed(...)  # the seed argument is missing in the source
>>> ser = pd.Series(np.random.rand(7))
>>> ser
0    0.543405
1    0.278369
2    0.424518
3    0.844776
4    0.004719
5    0.121569
6    0.670749
dtype: float64
>>> import calendar as cal
>>> monthnames = [cal.month_name[i] for i in np.arange(1, 6)]
>>> monthnames
['January', 'February', 'March', 'April', 'May']
>>> months = pd.Series(np.arange(1, 6), index=monthnames)
>>> months
January     1
February    2
March       3
April       4
May         5
dtype: int32
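The length rule for ndarray input can be checked with a minimal sketch. The variable names below are illustrative and not from the original article; the sketch assumes a reasonably recent pandas, where a mismatched index length raises ValueError.

```python
import numpy as np
import pandas as pd

values = np.array([10.0, 20.0, 30.0])

# Without an index, pandas generates the labels 0 .. len(data) - 1.
default_indexed = pd.Series(values)
print(list(default_indexed.index))

# With an explicit index, its length must match the data.
labelled = pd.Series(values, index=["a", "b", "c"])
print(labelled["b"])

# An index of the wrong length is rejected.
try:
    pd.Series(values, index=["a", "b"])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)
```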

From dict

If data is a dict and an index is passed, the values in data that correspond to the labels in the index are pulled out. Otherwise, the index is built from the dict keys (sorted in older pandas versions; in insertion order in pandas 0.23 and later on Python 3.6+).

>>> d = {'a': 0., 'b': 1., 'c': 2.}
>>> pd.Series(d)
a    0.0
b    1.0
c    2.0
dtype: float64
>>> pd.Series(d, index=['b', 'c', 'd', 'a'])
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
>>> stockprices = {'GOOG': 1180.97, 'FB': 62.57, 'TWTR': 64.50, 'AMZN': 358.69, 'AAPL': 500.6}
>>> stockpriceseries = pd.Series(stockprices, index=['GOOG', 'FB', 'YHOO', 'TWTR', 'AMZN', 'AAPL'], name='stockprices')
>>> stockpriceseries
GOOG    1180.97
FB        62.57
YHOO        NaN
TWTR      64.50
AMZN     358.69
AAPL     500.60
Name: stockprices, dtype: float64

Note: NaN (not a number) is a standard missing data marker for pandas.

>>> stockpriceseries.name
'stockprices'
>>> stockpriceseries.index
Index(['GOOG', 'FB', 'YHOO', 'TWTR', 'AMZN', 'AAPL'], dtype='object')
>>> dogseries = pd.Series('chihuahua', index=['breed', 'countryOfOrigin', 'name', 'gender'])
>>> dogseries
breed              chihuahua
countryOfOrigin    chihuahua
name               chihuahua
gender             chihuahua
dtype: object
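As a small sketch of how a passed index selects from a dict (the dict and names here are illustrative only), labels found in the dict keep their values, labels not found become NaN, and isna() can be used to locate the missing entries:

```python
import pandas as pd

prices = {"GOOG": 1180.97, "FB": 62.57}

# Labels in the index select values from the dict; unknown labels become NaN.
series = pd.Series(prices, index=["FB", "YHOO", "GOOG"], name="prices")
print(series)

# isna() flags the entries that had no matching key.
missing_mask = series.isna()
print(missing_mask.tolist())
```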

From scalar value

If data is a scalar value, an index must be provided. The value is repeated to match the length of the index.

>>> pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])
a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64

In addition to the above, array-like objects such as lists are converted to an ndarray when a Series is created:

>>> ser = pd.Series([5, 4, 2, -3, True])
>>> ser
0       5
1       4
2       2
3      -3
4    True
dtype: object
>>> ser.values
array([5, 4, 2, -3, True], dtype=object)
>>> ser.index
RangeIndex(start=0, stop=5, step=1)
>>> ser2 = pd.Series([5, 4, 2, -3, True], index=['b', 'e', 'c', 'a', 'd'])
>>> ser2
b       5
e       4
c       2
a      -3
d    True
dtype: object
>>> ser2.index
Index(['b', 'e', 'c', 'a', 'd'], dtype='object')
>>> ser2.values
array([5, 4, 2, -3, True], dtype=object)
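The dtype of a Series built from a list depends on what the list contains. The following sketch (with made-up values) contrasts a homogeneous list with a mixed one; it uses pd.api.types.is_integer_dtype rather than a fixed integer width, since the exact width can vary by platform and version.

```python
import numpy as np
import pandas as pd

# A list of integers is stored with an integer dtype.
homogeneous = pd.Series([5, 4, 2, -3])

# A list mixing integers and a string falls back to the generic object dtype.
mixed = pd.Series([5, 4, "two"])

print(homogeneous.dtype, mixed.dtype)

# The underlying data is exposed as a NumPy array.
as_array = homogeneous.values
print(type(as_array))
```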

Indexing

Series is ndarray-like

A Series behaves much like an ndarray and is a valid argument to most NumPy functions. It also supports ndarray-style indexing operations such as slicing.

>>> ser = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
>>> ser
a   -0.231872
b    0.207976
c    0.935808
d    0.179578
e   -0.577162
dtype: float64
>>> ser[0]  # positional access; recent pandas versions prefer ser.iloc[0]
-0.2318721969038312
>>> ser[:3]
a   -0.231872
b    0.207976
c    0.935808
dtype: float64
>>> ser[ser > 0]
b    0.207976
c    0.935808
d    0.179578
dtype: float64
>>> ser[ser > ser.median()]
b    0.207976
c    0.935808
dtype: float64
>>> ser[ser > ser.median()] = 1
>>> ser
a   -0.231872
b    1.000000
c    1.000000
d    0.179578
e   -0.577162
dtype: float64
>>> ser[[4, 3, 1]]  # positional access; recent pandas versions prefer ser.iloc[[4, 3, 1]]
e   -0.577162
d    0.179578
b    1.000000
dtype: float64
>>> np.exp(ser)
a    0.793047
b    2.718282
c    2.718282
d    1.196713
e    0.561490
dtype: float64
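The ndarray-like operations can be sketched with fixed values instead of random ones, so the results are reproducible. The data below is invented for illustration, and .iloc is used for positional access because plain integer keys on a labelled Series are deprecated in recent pandas versions.

```python
import numpy as np
import pandas as pd

ser = pd.Series([-1.0, 0.5, 2.0, 0.25], index=["a", "b", "c", "d"])

first = ser.iloc[0]                      # positional access
head = ser.iloc[:2]                      # positional slice, keeps labels a and b
positive = ser[ser > 0]                  # boolean mask
above_median = ser[ser > ser.median()]   # median of these values is 0.375
exponentials = np.exp(ser)               # NumPy ufuncs return a Series with the same index

print(first)
print(list(head.index), list(positive.index), list(above_median.index))
print(type(exponentials).__name__)
```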

Series is dict-like

A Series is also like a fixed-size dict: you can get and set values by index label.

>>> ser['a']
-0.2318721969038312
>>> ser['e'] = 12.
>>> ser
a    -0.231872
b     1.000000
c     1.000000
d     0.179578
e    12.000000
dtype: float64
>>> 'e' in ser
True
>>> 'f' in ser
False

Note: Referencing a label that is not in the index raises an exception (a KeyError).

With the get method, a missing label returns None or a specified default value instead, as with dict.get.

>>> print(ser.get('f'))
None
>>> ser.get('f', np.nan)
nan
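The difference between bracket access and get can be sketched as follows. The Series contents are invented for the example.

```python
import pandas as pd

ser = pd.Series([1.0, 2.0], index=["a", "b"])

# Bracket access with an unknown label raises KeyError.
try:
    ser["f"]
    raised = False
except KeyError:
    raised = True

# get returns None, or the supplied default, instead of raising.
missing_default = ser.get("f")
missing_custom = ser.get("f", -1.0)

# Membership tests look at the index labels, as with dict keys.
has_a = "a" in ser

print(raised, missing_default, missing_custom, has_a)
```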

Vectorized operations and label alignment

In data analysis, looping over values one by one is usually unnecessary; use vectorized operations instead.

>>> ser + ser
a    -0.463744
b     2.000000
c     2.000000
d     0.359157
e    24.000000
dtype: float64
>>> ser * 2
a    -0.463744
b     2.000000
c     2.000000
d     0.359157
e    24.000000
dtype: float64
>>> np.exp(ser)
a         0.793047
b         2.718282
c         2.718282
d         1.196713
e    162754.791419
dtype: float64

A major difference between Series and ndarray is that operations between Series automatically align the data by label.

>>> ser
a    -0.231872
b     1.000000
c     1.000000
d     0.179578
e    12.000000
dtype: float64
>>> ser[1:] + ser[:-1]
a         NaN
b    2.000000
c    2.000000
d    0.359157
e         NaN
dtype: float64

The result of an operation between Series is indexed by the union of the labels involved. If a label is not found in one of the Series, the result for that label is marked as NaN.
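The alignment rule can be illustrated with a minimal sketch using fixed values (chosen here only for the example). The two operands share the labels b and c; a and d each appear on one side only.

```python
import pandas as pd

ser = pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])

# ser.iloc[1:] has labels b, c, d; ser.iloc[:-1] has labels a, b, c.
aligned = ser.iloc[1:] + ser.iloc[:-1]
print(aligned)

# Labels present on only one side come out as NaN; dropna removes them.
complete = aligned.dropna()
print(complete)
```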

Note: By default, operations between differently indexed objects yield the union of the indexes, to avoid losing information. A label is often important information for a computation even when its data is missing. You can, of course, drop the labels with missing data using the dropna method.

Attributes

Name attribute:

>>> s = pd.Series(np.random.randn(5), name='something')
>>> s
0   -0.533373
1   -0.225402
2   -0.314919
3    0.422997
4   -0.438827
Name: something, dtype: float64
>>> s.name
'something'

In many cases the Series name is assigned automatically, for example when taking a 1D slice of a DataFrame (DataFrame operations are explained later).

>>> s2 = s.rename("different")
>>> s2
0   -0.533373
1   -0.225402
2   -0.314919
3    0.422997
4   -0.438827
Name: different, dtype: float64

Note that s and s2 refer to different objects.
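A short sketch (with placeholder data) makes the point concrete: rename with a scalar argument returns a new Series and leaves the original name untouched.

```python
import pandas as pd

s = pd.Series([1, 2, 3], name="something")
s2 = s.rename("different")

# The original keeps its name; the renamed Series is a separate object.
print(s.name, s2.name)
print(s2 is s)
```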

The index is available through the index attribute:

>>> s2.index
RangeIndex(start=0, stop=5, step=1)

The Index object also has a name attribute:

>>> s.index.name = "index_name"
>>> s
index_name
0   -0.533373
1   -0.225402
2   -0.314919
3    0.422997
4   -0.438827
Name: something, dtype: float64

The values are available through the values attribute:

>>> s.values
array([-0.53337271, -0.22540212, -0.31491934,  0.42299678, -0.43882681])

DataFrame

A DataFrame is a two-dimensional, indexed data structure whose columns can have different types. It is similar to a SQL table or a dict of Series.

Create DataFrame

DataFrame is the most commonly used pandas object. Like Series, it accepts many different kinds of input when it is created.

From dict of Series or dicts

The resulting row index is the union of the indexes of the individual Series.

>>> d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
...      'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
>>> df = pd.DataFrame(d)
>>> df
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0
>>> pd.DataFrame(d, index=['d', 'b', 'a'])
   one  two
d  NaN  4.0
b  2.0  2.0
a  1.0  1.0
>>> pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])
   two three
d  4.0   NaN
b  2.0   NaN
a  1.0   NaN

The row and column labels can be accessed through the index and columns attributes, respectively.

>>> df.index
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> df.columns
Index(['one', 'two'], dtype='object')
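As a sketch of how a dict of Series is combined (reusing the shape of the example data, with variable names chosen for illustration), the row index covers every label seen in any input Series, and requesting a row or column that does not exist yields NaN rather than an error:

```python
import pandas as pd

d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}

# The row index covers the labels of both Series; "one" has no value for d.
df = pd.DataFrame(d)
print(df.index.tolist(), df.columns.tolist())

# Passing index and columns selects and reorders; unknown names are filled with NaN.
subset = pd.DataFrame(d, index=["d", "b", "a"], columns=["two", "three"])
print(subset)
```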

From dict of ndarrays / lists

The ndarrays must all be the same length. If an index is passed, it must be the same length as the arrays. If no index is passed, the index is range(n), where n is the array length.

>>> d = {'one': [1., 2., 3., 4.],
...      'two': [4., 3., 2., 1.]}
>>> pd.DataFrame(d)
   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0
>>> pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
   one  two
a  1.0  4.0
b  2.0  3.0
c  3.0  2.0
d  4.0  1.0
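The same-length requirement can be checked with a minimal sketch. The data is illustrative, and the sketch assumes that unequal list lengths raise ValueError, as they do in current pandas releases.

```python
import pandas as pd

d = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}

# Without an index, rows are labelled 0 .. n - 1.
default_df = pd.DataFrame(d)

# With an index of matching length, rows take those labels.
labelled_df = pd.DataFrame(d, index=["a", "b", "c", "d"])

# Lists of different lengths cannot be combined into columns.
try:
    pd.DataFrame({"one": [1.0, 2.0], "two": [1.0]})
    unequal_rejected = False
except ValueError:
    unequal_rejected = True

print(default_df.index.tolist(), labelled_df.index.tolist(), unequal_rejected)
```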

From structured or record array

This case is handled the same way as a dict of arrays.

Type character codes:

'b'       boolean
'i'       (signed) integer
'u'       unsigned integer
'f'       floating-point
'c'       complex floating-point
'm'       timedelta
'M'       datetime
'O'       (Python) object
'S', 'a'  (byte-)string
'U'       Unicode
'V'       raw data (void)

# Examples:
>>> dt = np.dtype('f8')    # 64-bit floating-point; the 8 is the size in bytes
>>> dt = np.dtype('c16')   # 128-bit complex
>>> dt = np.dtype("a3, 3u8, (3,4)a10")  # 3-byte string, sub-array of three 64-bit unsigned integers, 3x4 sub-array of 10-byte strings
>>> dt = np.dtype((np.void, 10))        # 10-byte raw data block
>>> dt = np.dtype((str, 35))            # 35-character string
>>> dt = np.dtype(('U', 10))            # 10-character Unicode string
>>> dt = np.dtype((np.int32, (2, 2)))   # 2x2 integer sub-array
>>> dt = np.dtype(('S10', 1))           # 10-character string
>>> dt = np.dtype(('i4, (2,3)f8, f4', (2, 3)))  # 2x3 structured sub-array
# Use astype to convert between types; do not assign to the array's dtype attribute directly.
>>> b = np.array([1., 2., 3., 4.])
>>> b.dtype
dtype('float64')
>>> c = b.astype(int)
>>> c
array([1, 2, 3, 4])
>>> c.shape
(4,)
>>> c.dtype  # platform dependent; int64 on most 64-bit Linux/macOS builds
dtype('int32')
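The advice to use astype can be illustrated by contrasting conversion with reinterpretation. In this sketch, ndarray.view stands in for "changing the dtype directly"; it is not part of the original example and is shown only to make the difference visible.

```python
import numpy as np

b = np.array([1.0, 2.0, 3.0, 4.0])  # float64, 8 bytes per element

# astype converts each value; the shape is unchanged.
converted = b.astype(np.int32)
print(converted, converted.shape)

# view reinterprets the same raw bytes as 4-byte integers:
# four 8-byte floats become eight 4-byte integers with unrelated values.
reinterpreted = b.view(np.int32)
print(reinterpreted.shape)

# The number in a type string is the item size in bytes.
print(np.dtype("f8").itemsize, np.dtype("c16").itemsize)
```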

>>> data = np.zeros((2,), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'a10')])
# i4 is a 4-byte (4 * 8 = 32-bit) integer; the '<' in the output marks little-endian byte order
>>> data
array([(0, 0., b''), (0, 0., b'')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])
>>> data.shape
(2,)
>>> data[:] = [(1, 2., 'Hello'), (2, 3., "World")]
>>> data
array([(1, 2., b'Hello'), (2, 3., b'World')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])
>>> pd.DataFrame(data, index=['first', 'second'])
        A    B         C
first   1  2.0  b'Hello'
second  2  3.0  b'World'
>>> pd.DataFrame(data, columns=['C', 'A', 'B'])
          C  A    B
0  b'Hello'  1  2.0
1  b'World'  2  3.0
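The mapping from structured-array fields to DataFrame columns can be sketched as follows. This variant uses the 'S10' type code and bytes literals for the string field, which is an assumption made here for portability across NumPy versions rather than something the original prescribes.

```python
import numpy as np
import pandas as pd

# Each field of the structured dtype becomes one DataFrame column.
data = np.zeros((2,), dtype=[("A", "i4"), ("B", "f4"), ("C", "S10")])
data[:] = [(1, 2.0, b"Hello"), (2, 3.0, b"World")]

df = pd.DataFrame(data, index=["first", "second"])
print(df.columns.tolist())

# The columns argument selects and reorders the fields.
reordered = pd.DataFrame(data, columns=["C", "A", "B"])
print(reordered.columns.tolist())
```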

Note: A DataFrame does not behave exactly like a 2-dimensional NumPy ndarray.

There are many other ways to construct a DataFrame besides those above, but in practice a DataFrame is most often obtained by reading tabular data from a file, so the other construction methods are not listed here.

>>> d = {' One ': PD. Series ([1., 2

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
