"Data analysis using Python" reading notes--fifth Chapter pandas Introduction

http://www.cnblogs.com/batteryhp/p/5006274.html

 

pandas is the library of choice for the rest of this book. pandas can meet the following needs:

Data structures with automatic or explicit axis-based data alignment. This prevents many common errors caused by misaligned data and by data from different sources that are indexed differently.
Integrated time series functionality
Data structures that handle both time series and non-time-series data
Arithmetic operations and reductions (such as summing along an axis) that pass on metadata (axis labels)
Flexible handling of missing data
Merge and other relational operations found in common databases (such as SQL-based ones)
1. Introduction to pandas data structures

Two data structures: Series and DataFrame. A Series is an object similar to a one-dimensional NumPy array. It consists of a set of data (of any NumPy data type) and an associated set of data labels (the index). The index and values attributes return the index and the values, respectively. If no index is specified, an integer index from 0 to N-1 is created automatically.

#-*-encoding: utf-8-*-
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

# A Series can be given an index and looked up by label, a bit like a dictionary
obj = Series([1, 2, 3], index=['a', 'b', 'c'])
#print obj['a']
# Accordingly, a Series can be created directly from a dictionary

dic = {'a': 1, 'b': 2, 'c': 3}
dic = Series(dic)
# A list of labels can be used to set a new index
key1 = ['a', 'b', 'c', 'd']
# The statement below pulls the matching values out of the Series object; labels with no match ('d') become NaN. A plain dictionary cannot do this
dic1 = Series(obj, index=key1)
#print dic
#print dic1
# isnull and notnull are used to detect missing data
#print pd.isnull(dic1)
# An important feature of Series is automatic alignment by index label
dic2 = Series([10, 20, 30, 40], index=['a', 'b', 'c', 'e'])
#print dic1 + dic2
# The name attribute: both the Series and its index can be named
dic1.name = 's1'
dic1.index.name = 'key1'
# The index of a Series can be modified in place by assignment
dic1.index = ['x', 'y', 'z', 'w']
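The label alignment described in the listing above can be sketched as follows. This is a minimal illustration written for Python 3 and a recent pandas release (the notes themselves use Python 2 syntax); the variable names are made up for the example.

```python
import pandas as pd

# Two Series whose indexes only partly overlap
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "e"])

# Addition aligns on labels; a label present in only one operand gives NaN
total = s1 + s2
print(total)

# isnull flags the positions that had no partner
print(total.isnull())
```

Only the label 'e' is missing from one side, so it is the only NaN in the result.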
A DataFrame is a tabular data structure containing an ordered set of columns, each of which can hold a different data type. A DataFrame has both a row index and a column index, and can be thought of as a dictionary of Series sharing a common index. Compared with other similar data structures (such as data.frame in R), row-oriented and column-oriented operations on a DataFrame are treated roughly symmetrically. Internally, the data is stored as one or more two-dimensional blocks (rather than as lists, dictionaries, or some other collection of one-dimensional data).

#-*-encoding: utf-8-*-
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

# A DataFrame can be built directly from a dictionary of equal-length lists or of Series
# Lists of unequal length raise an error
data = {'a': [1, 2, 3],
        'c': [4, 5, 6],
        'b': [7, 8, 9]
}
# Note that the columns are sorted by name
frame = DataFrame(data)
#print frame
# If columns is specified, the columns follow the given order
frame = DataFrame(data, columns=['a', 'c', 'b'])
print frame
# A column with no data ('d') is allowed and is filled with NaN; index gives the row labels
frame1 = DataFrame(data, columns=['a', 'b', 'c', 'd'], index=['one', 'two', 'three'])
print frame1
# Retrieve a column dictionary-style or as an attribute
print frame['a']
print frame.b
# A column can be modified by selecting it and assigning to it
# A row can be selected by label or by position
print frame1.ix['two']
# Assigning a Series to a column aligns it on the index, so each value lands on its matching row
frame1['d'] = Series([100, 200, 300], index=['two', 'one', 'three'])
print frame1
# Delete a column with the del keyword
del frame1['d']
# Warning: a column selected by name is a view on the underlying data, not a copy. Use the Series copy method to get a copy
Another common form of data is a nested dictionary, that is, a dictionary of dictionaries. When it is passed to DataFrame, the outer keys become the columns and the inner keys become the row index.

#-*-encoding: utf-8-*-
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
# The keys of the inner dictionaries are unioned and sorted to form the final index
pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame3 = DataFrame(pop)
#print frame3
# The index and columns of a DataFrame each have a name attribute, and a DataFrame also has a values attribute
frame3.index.name = 'year'
frame3.columns.name = 'state'
print frame3
print frame3.values
The following is a list of data that the DataFrame constructor can accept.
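Some of the input types the constructor accepts can be sketched as below. This is an illustrative selection, not a complete list, written for Python 3 and a recent pandas release; all variable names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# Dictionary of equal-length lists: keys become column names
from_lists = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Dictionary of Series: the indexes are unioned, gaps become NaN
from_series = pd.DataFrame({
    "x": pd.Series([1.0, 2.0], index=["r1", "r2"]),
    "y": pd.Series([3.0, 4.0], index=["r2", "r3"]),
})

# Nested dictionary: outer keys are columns, inner keys are row labels
from_nested = pd.DataFrame({"Nevada": {2001: 2.4, 2002: 2.9},
                            "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6}})

# Two-dimensional ndarray with optional row and column labels
from_array = pd.DataFrame(np.arange(6).reshape(2, 3),
                          index=["r1", "r2"], columns=["a", "b", "c"])

# List of dictionaries: each dictionary becomes one row
from_records = pd.DataFrame([{"a": 1, "b": 2}, {"a": 3, "c": 4}])

for df in (from_lists, from_series, from_nested, from_array, from_records):
    print(df, end="\n\n")
```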

Index object

#-*-encoding: utf-8-*-
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
# The pandas Index object is responsible for managing axis labels and other metadata. When a Series or DataFrame is constructed, any array or other sequence of labels is converted to an Index
obj = Series(range(3), index=['a', 'b', 'c'])
index = obj.index
#print index
# An Index object cannot be modified. This matters, because it lets Index objects be shared safely between data structures
index1 = pd.Index(np.arange(3))
obj2 = Series([1.5, -2.5, 0], index=index1)
print obj2.index is index1

# Besides being array-like, an Index also behaves like a fixed-size set
# (frame3 is the DataFrame built in the previous listing)
print 'Ohio' in frame3.columns
print 2003 in frame3.index
Index is a class in pandas, and the book lists the main Index classes and when each is used.

The book also lists the methods and properties of Index. It is worth noting that an Index is not an array.
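A few Index methods and properties can be tried out as in the sketch below. It is written for Python 3 and a recent pandas release, the selection of methods is my own rather than the book's table, and the labels are made up.

```python
import pandas as pd

idx1 = pd.Index(["a", "b", "c"])
idx2 = pd.Index(["b", "c", "d"])

# Set-like operations return new Index objects
print(idx1.union(idx2))         # labels found in either
print(idx1.intersection(idx2))  # labels found in both
print(idx1.difference(idx2))    # labels found only in idx1

# Membership tests and uniqueness
print("a" in idx1)
print(idx1.isin(["a", "z"]))    # boolean array, one entry per label
print(idx1.is_unique)

# An Index is immutable: item assignment raises TypeError
try:
    idx1[0] = "z"
    modified = True
except TypeError as exc:
    modified = False
    print("cannot modify Index:", exc)
```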

2. Basic functionality

The basic ways of working with data in a Series or DataFrame are introduced below. The first is reindexing:

#-*-encoding: utf-8-*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas import Series, DataFrame

# Series has a reindex method that rearranges the data to match a new index, so the order of the elements changes

obj = Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
# Note that reindex does not change obj; it returns a new object
# fill_value is the value used for index labels that have no data
#print obj.reindex(['a', 'c', 'd', 'b', 'e'], fill_value=0)
#print obj
obj2 = Series(['red', 'blue'], index=[0, 4])
# method='ffill' fills forward, carrying the previous value onward
obj3 = obj2.reindex(range(6), method='ffill')
#print obj3

# For a DataFrame, reindex can change the rows, the columns, or both
frame = DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])
# Passing just a sequence reindexes the rows, because the first parameter of reindex is index (my guess)
frame2 = frame.reindex(['a', 'b', 'c', 'd'])
#print frame2
# A label that is not in the original index gives a row of NaN
#frame3 = frame.reindex(['e'])
#print frame3
states = ['Texas', 'Utah', 'California']
# This reindexes rows and columns at the same time
# Note: method fills along the index (the rows) only; the columns are not filled, wherever method appears in the call
frame4 = frame.reindex(index=['a', 'b', 'c', 'd'], columns=states, method='ffill')
#print frame4

# With ix label indexing, reindexing becomes more concise
print frame.ix[['a', 'd', 'c', 'b'], states]
ix is an indexing attribute of DataFrame; see http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.ix.html.
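In later pandas releases ix was removed, so the sketch below shows the same reindexing ideas with reindex and the label indexer loc. It assumes Python 3 and a recent pandas release and reuses the example frame from the listing above.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=["a", "c", "d"],
                     columns=["Ohio", "Texas", "California"])
states = ["Texas", "Utah", "California"]

# reindex returns a new object; labels absent from the original become NaN
reindexed = frame.reindex(index=["a", "b", "c", "d"], columns=states)
print(reindexed)

# Forward fill along the rows: the new row 'b' copies row 'a'
filled = frame.reindex(index=["a", "b", "c", "d"], method="ffill")
print(filled)

# loc selects by label, but every requested label must already exist
subset = frame.loc[["a", "d", "c"], ["Texas", "California"]]
print(subset)
```

Unlike the old ix call with a missing label, loc raises an error when a label is absent, so reindex is the tool to use when new labels should appear as NaN.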

Dropping entries from an axis

#-*-encoding: utf-8-*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas import Series, DataFrame
# The drop method removes entries (rows or columns) from an axis
obj = Series(np.arange(3.), index=['a', 'b', 'c'])
# The original Series is not modified; drop returns a new object
obj.drop('b')
#print obj
# Note: rows can be dropped directly, but dropping columns needs axis=1
print frame.drop(['a'])
print frame.drop(['Ohio'], axis=1)
Indexing, selection, and filtering

#-*-encoding: utf-8-*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas import Series, DataFrame

obj = Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
frame = DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])

# Series slicing and indexing
#print obj[obj < 2]
# Note: unlike ordinary Python slices, slices with labels include both endpoints
print obj['b':'c']
# For a DataFrame, a column can be selected directly by name
print frame['Ohio']
# Special cases: slicing and boolean indexing select rows
print frame[:2]
print frame[frame['Ohio'] != 0]
# The form below works on every element of the frame rather than on rows or columns, in the same way as boolean indexing on a numpy.ndarray
print frame[frame < 5], type(frame[frame < 5])
frame[frame < 5] = 0
print frame

# For label indexing on a DataFrame, use ix
print frame.ix[['a', 'd'], ['Ohio', 'Texas']]
print frame.ix[2]  # note that a single integer selects a row by default
# The boolean mask below also selects rows by default
print frame.ix[frame.Ohio > 0]
# Note that what follows the comma selects columns
print frame.ix[frame.Ohio > 0, :2]
The following are common indexing options:
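The main forms can be sketched as below. Since ix is no longer available in recent pandas releases, this sketch uses loc (labels) and iloc (positions) instead; it assumes Python 3 and the example frame from the listing above, and the result names are made up.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=["a", "c", "d"],
                     columns=["Ohio", "Texas", "California"])

col = frame["Ohio"]                       # one column, as a Series
cols = frame[["Ohio", "Texas"]]           # several columns
first_rows = frame[:2]                    # row slice by position
by_label = frame.loc["a":"c"]             # label slice, both ends included
by_position = frame.iloc[2]               # third row, by position
mask_rows = frame.loc[frame["Ohio"] > 0]  # boolean row selection
block = frame.loc[frame["Ohio"] > 0, ["Ohio", "Texas"]]  # rows and columns

print(block)
```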

Arithmetic operations and data alignment

# An important feature of pandas is automatic alignment on the index; where the indexes do not overlap, the result is NaN
s1 = Series([1, 2, 3], ['a', 'b', 'c'])
s2 = Series([4, 5, 6], ['b', 'c', 'd'])
#print s1 + s2
df1 = DataFrame(np.arange(12.).reshape(3, 4), columns=list('abcd'))
df2 = DataFrame(np.arange(20.).reshape(4, 5), columns=list('abcde'))
#print df1 + df2
# Use the add method and pass in a fill value. Note that fill_value fills the missing operand first and then adds, rather than adding first and then replacing the resulting NaN
#print df1.add(df2, fill_value=1000)
#df1.reindex(columns=df2.columns, fill_value=0)
In addition to add, there are other methods:
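The sibling methods sub, mul and div take the same fill_value argument. A minimal sketch, assuming Python 3 and a recent pandas release, with the two frames from the listing above:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12.0).reshape(3, 4), columns=list("abcd"))
df2 = pd.DataFrame(np.arange(20.0).reshape(4, 5), columns=list("abcde"))

# The plain operator leaves NaN wherever only one frame has the cell
plain = df1 + df2

# The method forms accept fill_value, which stands in for a missing operand
added = df1.add(df2, fill_value=0)
subtracted = df1.sub(df2, fill_value=0)
multiplied = df1.mul(df2, fill_value=1)
divided = df1.div(df2, fill_value=1)

print(plain)
print(added)
```

The fill value is substituted before the operation, so a cell that exists in only one frame behaves as if the other frame held the fill value there.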

Operations between DataFrame and Series

# Look at how arithmetic between a DataFrame and a Series works
arr = DataFrame(np.arange(12.).reshape((3, 4)), columns=list('abcd'))
# The result below shows that the subtraction is applied row by row, which is called broadcasting
# Note: by default, arithmetic between a DataFrame and a Series matches the index of the Series against the columns of the DataFrame and then broadcasts down the rows
# Note: writing arr - arr[0] is wrong here, because arr[0] looks up a column; only the label indexer ix followed by a number selects a row
print arr - arr.ix[0]
Series2 = Series(range(3), index=list('cdf'))
# By the same rule, columns with no match become NaN
print arr + Series2
# To match on the rows and broadcast across the columns, use one of the arithmetic methods
Series3 = arr['d']
# axis is the axis to match on
print arr.sub(Series3, axis=0)
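The two broadcasting directions in the listing above can be compared side by side. This sketch assumes Python 3 and a recent pandas release, so it uses iloc where the notes use ix; the result names are made up.

```python
import numpy as np
import pandas as pd

arr = pd.DataFrame(np.arange(12.0).reshape((3, 4)), columns=list("abcd"))

# Default: the Series index is matched against the columns,
# and the subtraction is repeated for every row
by_row = arr - arr.iloc[0]

# axis=0 matches the Series index against the row index instead,
# so column 'd' is subtracted from every column
by_col = arr.sub(arr["d"], axis=0)

print(by_row)
print(by_col)
```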
Function application and mapping

#-*-encoding: utf-8-*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas import Series, DataFrame

# NumPy's element-wise array functions also work on pandas objects
frame = DataFrame(np.random.randn(4, 3), columns=list('abc'), index=['Ut', 'Oh', 'Te', 'Or'])
print frame
# Absolute value:
#print np.abs(frame)
# Another common operation is applying a function to each row or column with the apply method, similar to R
fun = lambda x: x.max() - x.min()
# By default the function is applied to each column
print frame.apply(fun)
# With axis=1 it is applied to each row
print frame.apply(fun, axis=1)
# Many statistical functions do not need apply at all; just call the method directly
print frame.sum()
# Besides a scalar, the function passed to apply can return a Series of several values
def f(x):
    return Series([x.min(), x.max()], index=['min', 'max'])
#print frame.apply(f)
# Element-wise Python functions can be used too, through the applymap method
format = lambda x: '%.2f' % x
print frame.applymap(format)
# Why the name applymap? Because Series has a map method for applying element-wise functions
# Here map is used on a single column
print frame['b'].map(format)
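Because the listing above uses random data, here is a small deterministic sketch of the same apply and map calls, assuming Python 3 and a recent pandas release. The frame contents and function names are made up for the example.

```python
import pandas as pd

frame = pd.DataFrame({"a": [1.0, -2.0, 3.5], "b": [0.5, 4.0, -1.0]},
                     index=["x", "y", "z"])

def spread(values):
    """Range of a row or column: max minus min."""
    return values.max() - values.min()

per_column = frame.apply(spread)        # one result per column
per_row = frame.apply(spread, axis=1)   # one result per row

def min_max(values):
    """Return several values at once as a labelled Series."""
    return pd.Series([values.min(), values.max()], index=["min", "max"])

summary = frame.apply(min_max)          # rows 'min' and 'max', one column each

# Element-wise formatting of a single column with Series.map
formatted = frame["b"].map(lambda v: "%.2f" % v)

print(per_column, per_row, summary, formatted, sep="\n\n")
```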
Sorting and ranking

#-*-encoding: utf-8-*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas import Series, DataFrame
# Use the sort_index method to sort by the row or column index
obj = Series(range(4), index=['d', 'a', 'b', 'c'])
print obj.sort_index()

frame = DataFrame(np.arange(8).reshape((2, 4)), index=['three', 'one'], columns=['d', 'a', 'b', 'c'])
# By default the row index is sorted; pass axis=1 to sort the column index
print frame.sort_index()
print frame.sort_index(axis=1)
print frame.sort_index(axis=1, ascending=False)

# To sort a Series by its values, use the order method. Any missing values are placed at the end
print obj.order()
# NumPy's sort can also be used
print np.sort(obj)
# To sort a DataFrame by the values in its columns, the method is still sort_index, with the extra parameter by
frame = DataFrame({'b': [4, 7, -3, 2], 'a': [0, 1, 0, 1]})
print frame.sort_index(by=['a', 'b'])

# The rank method returns each value's rank in ascending order. For ties, rank breaks the tie by assigning each group the mean rank
# Ranks start from 1
obj = Series([7, -5, 7, 4, 2, 0, 4])
print obj.rank()
# NumPy's argsort is a little odd: it returns the positions that would sort the data, and positions start from 0
print np.argsort(obj)
# The printed result is 1, 5, 4, 3, 6, 0, 2. Indexing with these positions gives the values from smallest to largest, see below
print obj[np.argsort(obj)]
# The rank method has a method option that specifies how ties are ranked

print obj.rank(method='first', ascending=False)
print obj.rank(method='max', ascending=False)
print obj.rank(method='min', ascending=False)

# For a DataFrame, rank ranks within each column by default; axis=1 ranks within each row
print frame.rank()
print frame.rank(axis=1)
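In later pandas releases, order and sort_index(by=...) were replaced by sort_values. The sketch below repeats the value sorting and ranking from the listing above under that assumption (Python 3, recent pandas); the result names are made up.

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

by_value = obj.sort_values()            # sort a Series by its values
avg_rank = obj.rank()                   # ties share the mean rank
first_rank = obj.rank(method="first")   # ties ranked in order of appearance

frame = pd.DataFrame({"b": [4, 7, -3, 2], "a": [0, 1, 0, 1]})
by_columns = frame.sort_values(by=["a", "b"])   # sort rows by 'a', then 'b'

print(by_value, avg_rank, first_rank, by_columns, sep="\n\n")
```

The two 7s occupy sorted positions 6 and 7, so the default method gives both the rank 6.5, while method="first" gives 6 and 7 in order of appearance.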
Axis index with duplicate values

#-*-encoding: utf-8-*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas import Series, DataFrame

# Although many pandas functions (such as reindex) require unique labels, uniqueness is not mandatory
obj = Series(range(5), index=list('aabbc'))
print obj
# Use the index's is_unique property to check whether the labels are unique
print obj.index.is_unique
# Selecting a duplicated label returns a Series; selecting a unique label returns a scalar
print obj['a']
# The same holds for a DataFrame
df = DataFrame(np.random.randn(4, 3), index=list('aabb'))
print df
print df.ix['b']
# When importing your own data, check that the index is unique before processing it; when building your own DataFrame, take care not to create duplicate labels
3. Summarizing and computing descriptive statistics

#-*-encoding: utf-8-*-
import numpy as np
import os
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
import time

# pandas objects have a set of common mathematical and statistical methods, most of which are reductions or summary statistics. They extract a single value from a Series, or a Series of values from the rows or columns of a DataFrame
# Note: unlike the equivalent NumPy array methods, these are built to handle missing data, that is, they skip missing values automatically
df = DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]], index=list('abcd'), columns=['one', 'two'])
print df.sum()
print df.sum(axis=1)
# Some further methods: idxmin and idxmax return the index label of the minimum or maximum
print df.idxmin()
print df.idxmin(axis=1)
# Cumulative methods
print df.cumsum()
# The describe method, basically the same as the describe function in R
print df.describe()
# For non-numeric data, look at the result below

obj = Series(['c', 'a', 'a', 'b', 'd'] * 4)
print obj.describe()
'''
The result is:
count     20
unique     4
top        a
freq       8
Here freq is the number of times the most frequent value occurs
'''
#-*-encoding: utf-8-*-
import numpy as np
import os
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
import time

# Look at the cummin method below
# Note: cummin gives the minimum seen so far, not the minimum of a running sum
frame = DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [-10, 11, 12, -13]], index=list('abc'), columns=['one', 'two', 'three', 'four'])
print frame.cummin()
print frame
>>>
   one  two  three  four
a    1    2      3     4
b    1    2      3     4
c  -10    2      3   -13
   one  two  three  four
a    1    2      3     4
b    5    6      7     8
c  -10   11     12   -13
Correlation coefficient and covariance

Some summary statistics (such as correlation and covariance) are computed from pairs of arguments. I could not get the data for this section because I had no network access.
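The pairwise statistics can still be tried without a download. The sketch below uses synthetic random data as a stand-in for the book's stock data, so the numbers mean nothing in themselves; it assumes Python 3 and a recent pandas release, and the column names are made up.

```python
import numpy as np
import pandas as pd

# Synthetic data standing in for the downloaded data set (an assumption)
rng = np.random.RandomState(0)
returns = pd.DataFrame(rng.randn(100, 3), columns=["A", "B", "C"])
returns["D"] = 2 * returns["A"]            # perfectly correlated with A

pair_corr = returns["A"].corr(returns["D"])   # correlation of two Series
pair_cov = returns["A"].cov(returns["B"])     # covariance of two Series

corr_matrix = returns.corr()                  # all pairwise correlations
cov_matrix = returns.cov()                    # all pairwise covariances

# Correlation of every column with one Series
with_a = returns.corrwith(returns["A"])

print(corr_matrix)
print(with_a)
```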

Unique values, value counts, and membership

#-*-encoding: utf-8-*-
import numpy as np
import os
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt

obj = Series(['a', 'a', 'b', 'f', 'e'])
uniques = obj.unique()
uniques.sort()  # remember that this sorts in place
#print uniques
# Count occurrences; note that the result is in descending order of frequency
#print obj.value_counts()
# value_counts is also a top-level pandas function that works on any array or sequence
#print obj.values
#print pd.value_counts(obj.values, sort=False)
# Finally, isin performs a vectorized set-membership test, which is useful for selecting a subset of a Series or of a DataFrame column
mask = obj.isin(['b', 'c'])
print mask
print obj[mask]

data = DataFrame({'Qu1': [1, 3, 4, 3, 4],
                  'Qu2': [2, 3, 1, 2, 3],
                  'Qu3': [1, 5, 2, 4, 4]})
print data
print data.apply(pd.value_counts).fillna(0)
The above functions are really very useful!

4. Handling missing data

#-*-encoding: utf-8-*-
import numpy as np
import os
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
import time
from numpy import nan as NA

# pandas is designed to skip missing values automatically
# Both NaN and None are treated as missing values
str_data = Series(['a', np.nan, 'b', 'c'])
str_data[0] = None
print str_data.isnull()
print str_data.notnull()
>>>
0     True
1     True
2    False
3    False
0    False
1    False
2     True
3     True
# NumPy's data types lack a true NA data type or bit pattern (?)
 
Filter out missing data

#-*-encoding: utf-8-*-
import numpy as np
import os
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
import time
from numpy import nan as NA

data = Series([1, NA, 3.5, 7, NA])
# Note that the result keeps the original index labels of the non-NA values; it is not renumbered after the removal
# The reset_index method can reset the index: with drop=True it discards the original index and sets a new 0-based one. It is available on both Series and DataFrame
print data.dropna()
# The following gives the same result
print data[data.notnull()]
data1 = DataFrame([[1, 2, 3], [NA, 2.3, 4], [NA, NA, NA]])
# Note: by default, DataFrame.dropna drops every row that contains any NA
print data1.dropna()
# Passing how='all' drops only the rows that are entirely NA. The parameter name how is a bit arbitrary, haha
print data1.dropna(how='all')
# Drop columns instead
print data1.dropna(how='all', axis=1)
# One more parameter: thresh
data2 = DataFrame(np.random.randn(7, 3))
data2.ix[:4, 1] = NA
data2.ix[:2, 2] = NA
#print data2
# thresh keeps only the rows (or columns) that have at least that many non-NA values
print data2.dropna(thresh=2)
print data2.dropna(thresh=4, axis=1)
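The effect of how and thresh is easier to see on a small fixed frame than on random data. A minimal sketch, assuming Python 3 and a recent pandas release; the frame contents are made up for the example.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1.0, 2.0, 3.0],
                     [np.nan, 2.3, 4.0],
                     [np.nan, np.nan, np.nan],
                     [np.nan, np.nan, 5.0]])

any_na = data.dropna()                 # drop rows containing any NA
all_na = data.dropna(how="all")        # drop only rows that are entirely NA
at_least_two = data.dropna(thresh=2)   # keep rows with at least 2 non-NA values

# The surviving rows keep their original labels unless the index is reset
renumbered = all_na.reset_index(drop=True)

print(any_na, all_na, at_least_two, renumbered, sep="\n\n")
```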
Fill missing data

#-*-encoding: utf-8-*-
import numpy as np
import os
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
import time
from numpy import nan as NA

# Missing values are mainly filled with the fillna method
data2 = DataFrame(np.random.randn(7, 3))
data2.ix[:4, 1] = NA
data2.ix[:2, 2] = NA
# fillna returns a new object; pass inplace=True to fill in place
print data2.fillna(0)
#print data2.fillna(0, inplace=True)
#print data2
# A dictionary gives a different fill value for each column
print data2.fillna({1: 0.5, 3: -1})
# The interpolation methods that work for reindex can also be used with fillna; see the reindex section above
df = DataFrame(np.random.randn(6, 3))
df.ix[2:, 1] = NA
df.ix[4:, 2] = NA
print df.fillna(method='ffill', limit=2)
# With a little thought, other values such as the mean can be used to fill the NA
data = Series([1.2, NA, 4, NA])
print data.fillna(data.mean())
The parameters of fillna are as follows:
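The most commonly used options can be exercised as in the sketch below. It assumes Python 3 and a recent pandas release, where forward filling is written with the ffill method instead of fillna(method='ffill') as in the listing above; the frame contents are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, np.nan, np.nan],
                   "y": [np.nan, 2.0, np.nan, 4.0]})

zeros = df.fillna(0)                         # one value everywhere
per_column = df.fillna({"x": -1, "y": 0.5})  # dictionary: one value per column
forward = df.ffill(limit=2)                  # forward fill, at most 2 in a row
means = df.fillna(df.mean())                 # fill with each column's mean

print(zeros, per_column, forward, means, sep="\n\n")
```

None of these calls changes df itself; each returns a new object.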

5. Hierarchical indexing

Hierarchical indexing is an important feature of pandas that lets you have two or more index levels on a single axis. Put abstractly, it lets you work with higher-dimensional data in a lower-dimensional form.

#-*-encoding: utf-8-*-
import numpy as np
import os
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
import time

data = Series(np.random.randn(10), index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'], [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])
#print data
# The following shows how to select by index

print data.index
print data['b']
print data['b':'c']
print data.ix[['b', 'd']]
# Here is how to select on the inner level
print data[:, 2]
# Hierarchical indexing plays an important role in reshaping data and in group-based operations (such as building a pivot table). For example, unstack rearranges the data into a DataFrame:
print data.unstack()
# stack is the inverse of unstack
print data.unstack().stack()

# For a DataFrame, either axis can have a hierarchical index
frame = DataFrame(np.arange(12).reshape((4, 3)), index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]], columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']])
#print frame
# Note: each level of each axis can be given a name
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']
#print frame
#print frame['Ohio']

# A MultiIndex can be created separately and reused
# The MultiIndex below is created with from_arrays; note how it is built
columns = pd.MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']], names=['state', 'color'])
frame1 = DataFrame(np.arange(12).reshape((4, 3)), columns=columns)
print frame1
# Reorder the index levels
print frame.swaplevel('key1', 'key2')
# sortlevel sorts the data by the values in a single level. It is often used together with swaplevel, so that the result is sorted
# Note that you get a copy, not an in-place modification
print frame.sortlevel(1)
print frame.swaplevel(0, 1).sortlevel(0)
print frame

# Many descriptive and summary statistics on DataFrame and Series have a level option that specifies the level to aggregate by on an axis
print frame.sum(level='key2')
# Without level, each column is summed over all rows
print frame.sum()
print frame.sum(level='color', axis=1)
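In recent pandas releases the level argument of sum is, as far as I know, no longer available; grouping on an index level gives the same aggregation. The sketch below assumes Python 3 and a recent pandas release and rebuilds the frame from the listing above.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
                     columns=[["Ohio", "Ohio", "Colorado"],
                              ["Green", "Red", "Green"]])
frame.index.names = ["key1", "key2"]
frame.columns.names = ["state", "color"]

# Partial selection on the outer level of each axis
ohio = frame["Ohio"]
row_a = frame.loc["a"]

# Aggregate rows by one index level
by_key2 = frame.groupby(level="key2").sum()

# Aggregate columns by one level: transpose, group, transpose back
by_color = frame.T.groupby(level="color").sum().T

print(by_key2)
print(by_color)
```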
#-*-encoding: utf-8-*-
import numpy as np
import os
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
import time
# People often want to use one or more columns of a DataFrame as the row index, or to turn the row index into columns of the DataFrame
frame = DataFrame({'a': range(7), 'b': range(7, 0, -1), 'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'], 'd': [0, 1, 2, 0, 1, 2, 3]})
print frame
# The set_index method of DataFrame converts one or more columns into the row index
frame2 = frame.set_index(['c', 'd'])
print frame2  # in effect, the 3rd and 4th columns are used to group the rows
# With drop=False the columns are kept in the frame as well
frame3 = frame.set_index(['c', 'd'], drop=False)
# The opposite of set_index is the reset_index method
print frame2.reset_index()
# A quick test below
frame4 = DataFrame([[0, 7], [1, 6], [2, 5], [3, 4], [4, 3], [5, 2], [6, 1]], index=[['one', 'one', 'one', 'two', 'two', 'two', 'two'], [0, 1, 2, 0, 1, 2, 3]], columns=['a', 'b'])
frame4.index.names = ['c', 'd']
print frame4
print frame4.reset_index().sort_index(axis=1)
Other topics about pandas

#-*-encoding: utf-8-*-
import numpy as np
import os
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
import pandas.io.data as web
# Here are some awkward issues: integer indexes and integer labels
ser = Series(np.arange(3.))
#print ser[-1]  # raises an error: with an integer index it is ambiguous whether -1 is a label or a position
ser2 = Series(np.arange(3.), index=['a', 'b', 'c'])
print ser2[-1]  # works, because the index is not made of integers
# ix is always label-oriented
print ser.ix[:1]
# For reliable position-based indexing regardless of the index type, use the iget_value method of Series and the irow and icol methods of DataFrame
ser3 = Series(range(3), index=[-5, 1, 3])
print ser3.iget_value(2)
frame = DataFrame(np.arange(6).reshape(3, 2), index=[2, 0, 1])
print frame.irow(0)

# pandas has a Panel data structure (not a main topic here) that can be viewed as a three-dimensional DataFrame. Most multidimensional data in pandas can be handled with hierarchical indexing instead
# A Panel can be created from a dictionary of DataFrame objects or from a three-dimensional ndarray
pdata = pd.Panel(dict((stk, web.get_data_yahoo(stk, '1/1/2009', '6/1/2012')) for stk in ['AAPL', 'GOOG', 'MSFT', 'DELL']))
# Network error, so no data
# Each item of a Panel is a DataFrame.
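The label versus position distinction from the listing above can be sketched with the loc and iloc indexers, which in later pandas releases took over from iget_value, irow and icol. This assumes Python 3 and a recent pandas release; the result names are made up.

```python
import numpy as np
import pandas as pd

ser3 = pd.Series(range(3), index=[-5, 1, 3])
frame = pd.DataFrame(np.arange(6).reshape(3, 2), index=[2, 0, 1])

by_label = ser3.loc[1]        # the element whose label is 1
by_position = ser3.iloc[2]    # the third element, whatever its label
first_row = frame.iloc[0]     # first row by position (its label is 2)
labelled_row = frame.loc[0]   # row whose label is 0 (the second row)
first_col = frame.iloc[:, 0]  # first column by position

print(by_label, by_position)
print(first_row.tolist(), labelled_row.tolist(), first_col.tolist())
```

Being explicit about loc or iloc avoids the ambiguity that integer labels cause with plain square-bracket indexing.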