Installing and Configuring PyCharm
Official Download Address: http://www.jetbrains.com/pycharm/
CSDN Download Address: http://download.csdn.net/download/coofly/6637569
Adjusting the color scheme:
File -> Settings -> Editor -> Colors & Fonts, then select Monokai
Showing line numbers:
File -> Settings -> Editor -> General -> Appearance -> Show line numbers
Installing the Pandas packages
Install Setuptools and Pip before installing Pandas. The packages needed are:
Setuptools
Pip
Pandas
NumPy
Matplotlib
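Once the packages are installed, a quick way to confirm that the interpreter can see them is to ask Python whether each module can be found. This is only a minimal sketch: the lowercase import names below are my assumption of how the packages listed above are imported, and the check does not verify versions.

```python
import importlib.util

# Import names for the packages listed above (assumed lowercase module names).
packages = ['setuptools', 'pip', 'pandas', 'numpy', 'matplotlib']

# find_spec returns None when a top-level module cannot be located.
installed = {name: importlib.util.find_spec(name) is not None for name in packages}

for name, ok in installed.items():
    print(name, 'OK' if ok else 'missing')
```

Any package reported as missing can then be added through the project interpreter settings.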
File -> Settings -> pythonproj -> Project Interpreter
Series
Reference: http://pda.readthedocs.org/en/latest/chp5.html
Pandas has two important data structures: Series and DataFrame.
A Series is a one-dimensional, array-like object that contains an array of data (of any NumPy data type) and an associated array of data labels, called its index.
For example:
from pandas import Series
ser = Series([5, 4, 3, 2, -1])
print(ser)
Output:
0    5
1    4
2    3
3    2
4   -1
The index is on the left and the value is on the right. Because we did not specify the index for the data, a default index containing integers 0 to N-1 (where N is the length of the data) was created.
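The two halves of a Series described above can also be inspected separately. The following is a small sketch, not part of the original walkthrough; the variable names `index_labels` and `values` are only illustrative.

```python
import pandas as pd

# Build a Series without specifying an index; pandas assigns 0..N-1.
ser = pd.Series([5, 4, 3, 2, -1])

# A Series is made of index labels plus the underlying values.
index_labels = list(ser.index)   # default integer labels
values = ser.values.tolist()     # the data as a plain Python list

print(index_labels)
print(values)
```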
Create a Series with an index that labels each data point:
ser1 = Series([5, 4, 3, 2, -1], index=['a', 'b', 'c', 'd', 'e'])
print(ser1)
Output:
a    5
b    4
c    3
d    2
e   -1
Retrieving data by index label:
print(ser1['c'])
Output:
3
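Label-based lookup is not limited to a single label. As a rough sketch of a few related operations (the names `single`, `subset`, `has_c` and `has_z` are mine, chosen for illustration):

```python
import pandas as pd

ser1 = pd.Series([5, 4, 3, 2, -1], index=['a', 'b', 'c', 'd', 'e'])

single = ser1['c']            # one label returns a scalar
subset = ser1[['a', 'e']]     # a list of labels returns a smaller Series
has_c = 'c' in ser1           # membership tests check the index labels
has_z = 'z' in ser1

print(single)
print(subset)
print(has_c, has_z)
```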
If you have data in a Python dictionary, you can create a Series from it by passing the dictionary to Series.
Create a Series from a dictionary:
sdata = {}
sdata['a'] = 5
sdata['c'] = 10
sdata['b'] = 4
sdata['d'] = -2
ser2 = Series(sdata)
print(ser2)
Output:
a     5
b     4
c    10
d    -2
In the pandas version used here, the index of the resulting Series is sorted by dictionary key. Recent pandas versions keep the dictionary's insertion order instead (a, c, b, d).
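Because the resulting order depends on the pandas version, it can help to make the order explicit. The sketch below shows two options under that assumption: passing an `index` argument, and calling `sort_index()`. The label 'x' is a made-up key that is not in the dictionary, included to show what happens to missing labels.

```python
import pandas as pd

sdata = {'a': 5, 'c': 10, 'b': 4, 'd': -2}

# An explicit index fixes the order; labels missing from the dict get NaN.
ser2 = pd.Series(sdata, index=['a', 'b', 'c', 'd', 'x'])

# sort_index() gives alphabetical order regardless of the pandas version.
sorted_ser = pd.Series(sdata).sort_index()

print(ser2)
print(sorted_ser)
```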
Deleting data
# drop
ser3 = ser2.drop(['a'])
print(ser3)
Output:
b     4
c    10
d    -2
DataFrame
A DataFrame represents a tabular, spreadsheet-like data structure containing an ordered collection of columns. There are many ways to build a DataFrame, but one of the most common is from a dictionary of equal-length lists or NumPy arrays:
from pandas import DataFrame
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = DataFrame(data)
print(frame)
Output (the pandas version used here sorts the columns alphabetically; recent versions keep the dictionary's order):
   pop   state  year
0  1.5    Ohio  2000
1  1.7    Ohio  2001
2  3.6    Ohio  2002
3  2.4  Nevada  2001
4  2.9  Nevada  2002
Setting the column order:
frame = DataFrame(data, columns=['year', 'state', 'pop'])
print(frame)
Output:
   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9
Adding an index:
frame1 = DataFrame(data, columns=['year', 'state', 'pop'], index=['a', 'b', 'c', 'd', 'e'])
print(frame1)
Output:
   year   state  pop
a  2000    Ohio  1.5
b  2001    Ohio  1.7
c  2002    Ohio  3.6
d  2001  Nevada  2.4
e  2002  Nevada  2.9
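To check that a DataFrame was built with the intended column order and row labels, its `shape`, `columns` and `index` attributes can be inspected. This is a short sketch reusing the same sample data.

```python
import pandas as pd

data = {
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
}

# columns= fixes the column order; index= supplies the row labels.
frame1 = pd.DataFrame(
    data,
    columns=['year', 'state', 'pop'],
    index=['a', 'b', 'c', 'd', 'e'],
)

print(frame1.shape)          # (rows, columns)
print(list(frame1.columns))  # column labels in the requested order
print(list(frame1.index))    # row labels
```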
Retrieving data:
ser3 = frame1['year']
print(ser3['a'])
Selecting a column from a DataFrame returns a Series.
Output:
2000
Conditional retrieval:
# use a new name so that frame1 keeps its labeled index
filtered = frame[frame['pop'] > 2]
print(filtered)
Output:
   year   state  pop
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9
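Conditional retrieval works by building a boolean Series (a mask) and using it to select rows. The sketch below makes the mask explicit and also combines two conditions with `&`. The second condition is my own illustration and does not appear in the walkthrough above.

```python
import pandas as pd

frame = pd.DataFrame({
    'year': [2000, 2001, 2002, 2001, 2002],
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
})

# The comparison produces one True/False value per row.
mask = frame['pop'] > 2
filtered = frame[mask]

# Conditions are combined with & (and) or | (or); each needs parentheses.
both = frame[(frame['pop'] > 2) & (frame['state'] == 'Nevada')]

print(mask.tolist())
print(filtered)
print(both)
```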
Retrieving a row by index label:
print(frame1.loc['e'])  # .ix in older pandas; it was removed in pandas 1.0
Output:
year       2002
state    Nevada
pop         2.9
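In current pandas versions, rows are retrieved with `.loc` (by label) or `.iloc` (by integer position), since `.ix` is no longer available. A minimal sketch comparing the two on the same sample data:

```python
import pandas as pd

frame1 = pd.DataFrame(
    {
        'year': [2000, 2001, 2002, 2001, 2002],
        'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
    },
    index=['a', 'b', 'c', 'd', 'e'],
)

row_by_label = frame1.loc['e']       # label-based lookup
row_by_position = frame1.iloc[4]     # position-based lookup (fifth row)
cell = frame1.loc['e', 'state']      # row label plus column label

print(row_by_label)
print(cell)
```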
Adding data:
newval = Series([-1.2, -1.5, -1.7], index=['a', 'c', 'e'])
frame1['debt'] = newval
print(frame1)
Output:
   year   state  pop  debt
a  2000    Ohio  1.5  -1.2
b  2001    Ohio  1.7   NaN
c  2002    Ohio  3.6  -1.5
d  2001  Nevada  2.4   NaN
e  2002  Nevada  2.9  -1.7
Index labels that have no value in newval are filled with NaN (a missing value) in the result.
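The missing values produced by this alignment can be detected with `isna()` and replaced with `fillna()`. The sketch below assumes that 0.0 is an acceptable placeholder; that choice is mine and depends on what the column means.

```python
import pandas as pd

frame1 = pd.DataFrame(
    {'year': [2000, 2001, 2002, 2001, 2002]},
    index=['a', 'b', 'c', 'd', 'e'],
)

# Assignment aligns on the index: only 'a', 'c' and 'e' receive values.
newval = pd.Series([-1.2, -1.5, -1.7], index=['a', 'c', 'e'])
frame1['debt'] = newval

missing = frame1['debt'].isna()        # True where no value was supplied
filled = frame1['debt'].fillna(0.0)    # hypothetical placeholder value

print(missing.tolist())
print(filled.tolist())
```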
Deleting data
fram2 = frame1.drop(['year', 'state'], axis=1)
print(fram2)
Output:
   pop  debt
a  1.5  -1.2
b  1.7   NaN
c  3.6  -1.5
d  2.4   NaN
e  2.9  -1.7
drop() returns a new object; the original object is not changed.
fram2 = frame1.drop(['a', 'b'])
print(fram2)
Output:
   year   state  pop  debt
c  2002    Ohio  3.6  -1.5
d  2001  Nevada  2.4   NaN
e  2002  Nevada  2.9  -1.7
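That drop() leaves the original untouched can be confirmed by checking the original's shape afterwards. A small sketch with a reduced version of the sample data:

```python
import pandas as pd

frame1 = pd.DataFrame(
    {'year': [2000, 2001, 2002, 2001, 2002], 'pop': [1.5, 1.7, 3.6, 2.4, 2.9]},
    index=['a', 'b', 'c', 'd', 'e'],
)

without_year = frame1.drop(['year'], axis=1)   # axis=1 drops columns
without_ab = frame1.drop(['a', 'b'])           # default axis=0 drops rows

# frame1 still has all five rows and both columns.
print(frame1.shape)
print(list(without_year.columns))
print(list(without_ab.index))
```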
Important Pandas functions
Take the first two rows
Take the last three rows
Transpose
Sort by axis
Sort by value
Merge
Append
groupby: group and sum
Reference: http://www.cnblogs.com/chaosimple/p/4153083.html
Take the first two rows:
print(frame1.head(2))
Output:
   year state  pop  debt
a  2000  Ohio  1.5  -1.2
b  2001  Ohio  1.7   NaN
Take the last three rows:
print(frame1.tail(3))
Output:
   year   state  pop  debt
c  2002    Ohio  3.6  -1.5
d  2001  Nevada  2.4   NaN
e  2002  Nevada  2.9  -1.7
Transpose:
print(frame1.T)
Output:
          a     b     c       d       e
year   2000  2001  2002    2001    2002
state  Ohio  Ohio  Ohio  Nevada  Nevada
pop     1.5   1.7   3.6     2.4     2.9
debt   -1.2   NaN  -1.5     NaN    -1.7
Sort by axis:
print(frame1.sort_index(axis=1, ascending=True))
Output:
   debt  pop   state  year
a  -1.2  1.5    Ohio  2000
b   NaN  1.7    Ohio  2001
c  -1.5  3.6    Ohio  2002
d   NaN  2.4  Nevada  2001
e  -1.7  2.9  Nevada  2002
Sort by value:
print(frame1.sort_values(by='year'))  # frame1.sort(columns='year') in older pandas
Output:
   year   state  pop  debt
a  2000    Ohio  1.5  -1.2
b  2001    Ohio  1.7   NaN
d  2001  Nevada  2.4   NaN
c  2002    Ohio  3.6  -1.5
e  2002  Nevada  2.9  -1.7
Merge: concat joins multiple DataFrames by rows; put them into a list first.
import numpy as np
import pandas as pd
frame2 = DataFrame(np.random.rand(4, 3))
# slice the DataFrame
frame3 = [frame2[:2], frame2[3:4]]
# merge the pieces
_frame3 = pd.concat(frame3)
print(_frame3)
Output:
          0         1         2
0  0.449950  0.556051  0.811427
1  0.312357  0.429655  0.725275
3  0.558072  0.747375  0.803023
Append:
frame4 = pd.concat([frame2, _frame3], ignore_index=True)  # frame2.append(_frame3, ignore_index=True) in older pandas
print(frame4)
Output:
          0         1         2
0  0.449950  0.556051  0.811427
1  0.312357  0.429655  0.725275
2  0.287861  0.464538  0.744888
3  0.558072  0.747375  0.803023
4  0.449950  0.556051  0.811427
5  0.312357  0.429655  0.725275
6  0.558072  0.747375  0.803023
groupby: group and sum
group_data = frame.groupby('state').sum()
print(group_data)
Output:
        year  pop
state
Nevada  4003  5.3
Ohio    6003  6.8
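The sorting, merging and grouping steps above can be tried together on the small state/year/pop table instead of random numbers, which makes the results predictable. This is only a sketch using calls available in current pandas (`sort_values` and `pd.concat`). The `kind='mergesort'` argument is my addition so that rows with equal years keep their original relative order.

```python
import pandas as pd

frame = pd.DataFrame({
    'year': [2000, 2001, 2002, 2001, 2002],
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
})

# Sort rows by a column's values (stable sort keeps ties in input order).
by_year = frame.sort_values(by='year', kind='mergesort')

# Merge: concat stacks a list of DataFrames row-wise and keeps their labels.
parts = [frame[:2], frame[3:4]]
combined = pd.concat(parts)

# Append: ignore_index=True renumbers the rows 0..N-1.
appended = pd.concat([frame, combined], ignore_index=True)

# Group by state and sum the population column.
totals = frame.groupby('state')['pop'].sum()

print(by_year)
print(combined)
print(appended)
print(totals)
```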
Importing and saving data
Read a CSV file:
# raw strings (r'...') keep the \t in the Windows path from being read as a tab
csv = pd.read_csv(r'E:\zhangqx\test.csv', parse_dates=True)
Write a CSV file:
csv.to_csv(r'E:\zhangqx\test1.csv')
Supported formats:
CSV, Excel, HDF, SQL, JSON, HTML
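To try the CSV round trip without depending on a particular file path, an in-memory text buffer can stand in for the file. This is a sketch under that assumption; `io.StringIO` replaces the E:\ paths used above, and `index=False` is my choice so that the row labels are not written as an extra column.

```python
import io

import pandas as pd

frame = pd.DataFrame({'state': ['Ohio', 'Nevada'], 'pop': [1.5, 2.4]})

# Write the DataFrame as CSV text into an in-memory buffer.
buffer = io.StringIO()
frame.to_csv(buffer, index=False)

# Rewind the buffer and read the CSV text back into a new DataFrame.
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored)
```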