Original English: 04-lesson
In this lesson, we go back to some basic concepts. We'll use a small dataset so the ideas are easy to follow. We will add columns, delete columns, and slice the data in several different ways. Enjoy!
# Import the required libraries
import pandas as pd
import sys
print('Python version ' + sys.version)
print('Pandas version: ' + pd.__version__)
Python version 3.6.1 | packaged by conda-forge | (default, Mar 2017, 21:57:00)
[GCC 4.2.1 compatible Apple LLVM 6.1.0 (clang-602.0.53)]
Pandas version: 0.19.2
# Our small dataset
d = [0,1,2,3,4,5,6,7,8,9]
# Create a DataFrame
df = pd.DataFrame(d)
df
   0
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9
# Let's change the column name
df.columns = ['Rev']
df
   Rev
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
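Assigning to df.columns replaces every column name at once. As a minimal sketch of two alternatives (the variable names named and renamed are only illustrative), the column can be named when the DataFrame is built, or a single column can be renamed with rename. The key 0 is the default integer label pandas gives an unnamed column.

```python
import pandas as pd

# Same ten values as the lesson's dataset
d = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
df = pd.DataFrame(d)

# Option 1: name the column when the DataFrame is created
named = pd.DataFrame(d, columns=['Rev'])

# Option 2: rename one existing column; rename returns a new DataFrame
renamed = df.rename(columns={0: 'Rev'})

print(list(named.columns))
print(list(renamed.columns))
```

rename is handy when a DataFrame has many columns and only some of them need new names.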
# Let's add a column
df['NewCol'] = 5
df
   Rev  NewCol
0    0       5
1    1       5
2    2       5
3    3       5
4    4       5
5    5       5
6    6       5
7    7       5
8    8       5
9    9       5
# Modify the values of the newly added column
df['NewCol'] = df['NewCol'] + 1
df
   Rev  NewCol
0    0       6
1    1       6
2    2       6
3    3       6
4    4       6
5    5       6
6    6       6
7    7       6
8    8       6
9    9       6
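The two steps above rely on the same idea: a scalar assigned to a column is broadcast to every row, and arithmetic on a column is applied element by element. Here is a small sketch of that behaviour. The Double column is a hypothetical addition used only for illustration and is not part of the lesson's dataset.

```python
import pandas as pd

df = pd.DataFrame({'Rev': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]})

# A scalar is broadcast to every row of the new column
df['NewCol'] = 5

# Element-wise update; same result as df['NewCol'] = df['NewCol'] + 1
df['NewCol'] += 1

# Hypothetical extra column computed from an existing one
df['Double'] = df['Rev'] * 2

print(df)
```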
# We can delete columns
del df['NewCol']
df
   Rev
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
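del removes the column in place. As a sketch of two other options that pandas also provides, drop returns a new DataFrame and leaves the original untouched, while pop removes the column in place and hands it back as a Series. The variable names trimmed and removed are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Rev': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'NewCol': 6})

# drop returns a copy without the column; df itself still has both columns
trimmed = df.drop('NewCol', axis=1)
print(list(trimmed.columns), list(df.columns))

# pop removes the column from df and returns it as a Series
removed = df.pop('NewCol')
print(list(df.columns))
```

drop is the safer choice when the original DataFrame is still needed later.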
# Let's add a couple more columns.
# Translator's note: assigning to a column name that does not exist yet makes the DataFrame add it as a new column.
df['test'] = 3
df['col'] = df['Rev']
df
   Rev  test  col
0    0     3    0
1    1     3    1
2    2     3    2
3    3     3    3
4    4     3    4
5    5     3    5
6    6     3    6
7    7     3    7
8    8     3    8
9    9     3    9
# If needed, you can change the index labels
i = ['a','b','c','d','e','f','g','h','i','j']
df.index = i
df
   Rev  test  col
a    0     3    0
b    1     3    1
c    2     3    2
d    3     3    3
e    4     3    4
f    5     3    5
g    6     3    6
h    7     3    7
i    8     3    8
j    9     3    9
Using loc, we can select parts of the DataFrame by index label.
df.loc['a']
Rev     0
test    3
col     0
Name: a, dtype: int64
# df.loc[start label (inclusive):end label (inclusive)]
df.loc['a':'d']
   Rev  test  col
a    0     3    0
b    1     3    1
c    2     3    2
d    3     3    3
# df.iloc[start position (inclusive):end position (exclusive)]
# Note: .iloc is strictly integer-position based. It has been available since [version 0.11.0](http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#v0-11-0-april-22-2013).
df.iloc[0:3]
   Rev  test  col
a    0     3    0
b    1     3    1
c    2     3    2
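The difference between the two slices is easy to miss, so here is a minimal side-by-side sketch. It rebuilds the lesson's DataFrame from a dict, which is an assumption made only to keep the example self-contained.

```python
import pandas as pd

# Rebuild the lesson's DataFrame with the letter index
df = pd.DataFrame({'Rev': range(10), 'test': 3, 'col': range(10)},
                  index=list('abcdefghij'))

# Label-based slice: the end label 'd' is included
by_label = df.loc['a':'d']

# Position-based slice: position 3 is excluded, like a normal Python slice
by_position = df.iloc[0:3]

print(list(by_label.index))
print(list(by_position.index))
```

The label slice returns four rows and the position slice returns three, even though both appear to stop at the fourth row.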
You can also select a column's values by using the column name.
df['Rev']
a    0
b    1
c    2
d    3
e    4
f    5
g    6
h    7
i    8
j    9
Name: Rev, dtype: int64
df[['Rev', 'test']]
   Rev  test
a    0     3
b    1     3
c    2     3
d    3     3
e    4     3
f    5     3
g    6     3
h    7     3
i    8     3
j    9     3
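The two selections above return different kinds of objects, which matters when the result is passed on to other code. A short sketch, again rebuilding the lesson's DataFrame from a dict for self-containment:

```python
import pandas as pd

df = pd.DataFrame({'Rev': range(10), 'test': 3, 'col': range(10)},
                  index=list('abcdefghij'))

# A single column name returns a Series
single = df['Rev']

# A list of names returns a DataFrame, even when the list has one entry
as_frame = df[['Rev']]

print(type(single).__name__)
print(type(as_frame).__name__)
```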
# df.ix[row range, column range]
df.ix[0:3,'Rev']
a    0
b    1
c    2
Name: Rev, dtype: int64
df.ix[5:,'col']
f    5
g    6
h    7
i    8
j    9
Name: col, dtype: int64
df.ix[:3,['col', 'test']]  # Translator's note: use a list of column names to select multiple columns
   col  test
a    0     3
b    1     3
c    2     3
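The .ix indexer works in the pandas 0.19.2 used for this lesson, but it was deprecated in pandas 0.20 and removed in pandas 1.0. For readers on a newer version, the following is one possible way to get the same three results with iloc and column selection. It is a sketch, not the only equivalent, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Rev': range(10), 'test': 3, 'col': range(10)},
                  index=list('abcdefghij'))

# Like df.ix[0:3, 'Rev']: first three rows of one column
first_three_rev = df['Rev'].iloc[0:3]

# Like df.ix[5:, 'col']: from the sixth row to the end
from_sixth_col = df['col'].iloc[5:]

# Like df.ix[:3, ['col', 'test']]: first three rows of two columns
first_three_two_cols = df.iloc[:3][['col', 'test']]

print(first_three_rev)
print(from_sixth_col)
print(first_three_two_cols)
```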
There are also some convenient ways to select the first or last records.
# Select the top N records (default is 5)
df.head()
   Rev  test  col
a    0     3    0
b    1     3    1
c    2     3    2
d    3     3    3
e    4     3    4
# Select the bottom N records (default is 5)
df.tail()
   Rev  test  col
f    5     3    5
g    6     3    6
h    7     3    7
i    8     3    8
j    9     3    9
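Both methods accept the number of records as an argument, which is what the "default is 5" in the comments refers to. A brief sketch with explicit counts, using the same rebuilt DataFrame as an assumption for self-containment:

```python
import pandas as pd

df = pd.DataFrame({'Rev': range(10), 'test': 3, 'col': range(10)},
                  index=list('abcdefghij'))

# First three records
top3 = df.head(3)

# Last two records
bottom2 = df.tail(2)

print(top3)
print(bottom2)
```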