Sometimes we need to sort or rank a Series or DataFrame by its index or by its values.
First, sorting
pandas provides the sort_index method, which sorts by the row or column index in dictionary (lexicographic) order.
A, Series sorting
1, sorted by index
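import numpy as np
from pandas import Series, DataFrame

# Define a Series
s = Series([1, 2, 3], index=["a", "c", "b"])
# Sort the Series by its index; the default is ascending
print(s.sort_index())
'''
a    1
b    3
c    2
'''
# Sort the index in descending order
print(s.sort_index(ascending=False))
'''
c    2
b    3
a    1
'''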
2, sorted by value
s = Series([np.nan, 1, 7, 2, 0], index=["a", "c", "e", "b", "d"])
# Sort the Series by value; the default is ascending
print(s.sort_values())
'''
d    0.0
c    1.0
b    2.0
e    7.0
a    NaN
'''
# Sort the Series by value in descending order
print(s.sort_values(ascending=False))
'''
e    7.0
b    2.0
c    1.0
d    0.0
a    NaN
'''
Whether the values are sorted in ascending or descending order, missing values (NaN) are placed at the end by default.
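Where NaN ends up can be changed. The snippet below is a minimal sketch, assuming the na_position parameter of sort_values ("last" is its default). It recreates the Series from the example above under the illustrative name s_nan so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Same data as the example above; s_nan is only an illustrative name
s_nan = pd.Series([np.nan, 1, 7, 2, 0], index=["a", "c", "e", "b", "d"])

# Default behaviour: NaN goes last, even when sorting in descending order
default_order = list(s_nan.sort_values(ascending=False).index)

# na_position="first" moves NaN to the top instead
nan_first = s_nan.sort_values(na_position="first")
nan_first_order = list(nan_first.index)

print(default_order)    # ['e', 'b', 'c', 'd', 'a']
print(nan_first_order)  # ['a', 'd', 'c', 'b', 'e']
```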
B, DataFrame sorting
1, sorted by index
a = np.arange(9).reshape(3, 3)
data = DataFrame(a, index=["0", "2", "1"], columns=["c", "a", "b"])
# Sort by the row index; sorting rows in ascending order is the default
print(data.sort_index())
'''
   c  a  b
0  0  1  2
1  6  7  8
2  3  4  5
'''
# Sort by the row index in descending order
print(data.sort_index(ascending=False))
'''
   c  a  b
2  3  4  5
1  6  7  8
0  0  1  2
'''
# Sort by the column index in ascending order
print(data.sort_index(axis=1))
'''
   a  b  c
0  1  2  0
2  4  5  3
1  7  8  6
'''
# Sort by the column index in descending order
print(data.sort_index(axis=1, ascending=False))
'''
   c  b  a
0  0  2  1
2  3  5  4
1  6  8  7
'''
2, sorted by value
a = [[9, 3, 1], [1, 2, 8], [1, 0, 5]]
data = DataFrame(a, index=["0", "2", "1"], columns=["c", "a", "b"])
# Sort by the values of the specified column
print(data.sort_values(by="c"))
'''
   c  a  b
2  1  2  8
1  1  0  5
0  9  3  1
'''
# Sort by column c first, then by column a to break ties
print(data.sort_values(by=["c", "a"]))
'''
   c  a  b
1  1  0  5
2  1  2  8
0  9  3  1
'''
# Sort by the values of the specified row
print(data.sort_values(by="0", axis=1))
'''
   b  a  c
0  1  3  9
2  8  2  1
1  5  0  1
'''
Note: When sorting a DataFrame by value, you must use the by parameter to specify one or more columns (or rows). If by is omitted, a TypeError is raised: sort_values() missing 1 required positional argument: 'by'. When by lists several columns (rows), the first one in the list takes priority and the later ones only break ties among equal values in the earlier ones. The later keys may therefore appear to have no effect, because the data cannot be in ascending order by the first column (row) and by the second column (row) at the same time. When sorting by the values of a row, you must set axis=1. Otherwise a KeyError is raised, because by default by is looked up among the column labels and the row label is not found there. axis=1 means that by refers to a row label.
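The points in this note can be checked with a short sketch. It rebuilds the DataFrame from the example above under the illustrative name df, and assumes that ascending accepts a list with one flag per key, so that each key can have its own direction.

```python
import pandas as pd

# Same data as the example above; df is only an illustrative name
df = pd.DataFrame([[9, 3, 1], [1, 2, 8], [1, 0, 5]],
                  index=["0", "2", "1"], columns=["c", "a", "b"])

# The second key only breaks ties in the first key:
# rows "2" and "1" tie on c == 1, so column a (descending) decides their order
by_c_then_a = df.sort_values(by=["c", "a"], ascending=[True, False])
print(list(by_c_then_a.index))  # ['2', '1', '0']

# Sorting by the values of a row requires axis=1
by_row = df.sort_values(by="0", axis=1)
print(list(by_row.columns))  # ['b', 'a', 'c']

# Without axis=1, "0" is looked up among the column labels and is not found
try:
    df.sort_values(by="0")
    missing_label_error = None
except KeyError as exc:
    missing_label_error = exc
print(type(missing_label_error).__name__)  # KeyError
```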
Second, ranking
Ranking is closely related to sorting. It assigns each element a rank, from 1 up to the number of valid values in the array. This is similar to the indirect sort indices returned by numpy.argsort, except that rank can break ties according to a chosen rule.
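To make the comparison with numpy.argsort concrete, here is a small sketch. The names values, order and ranks_from_argsort are illustrative. Applying argsort twice with a stable sort gives 0-based ranks in which ties are broken by order of appearance; this corresponds to rank(method="first"), whereas the default rank averages tied positions.

```python
import numpy as np
import pandas as pd

values = pd.Series([1, 3, 2, 1, 6])

# argsort returns the positions that would sort the data ...
order = np.argsort(values.to_numpy(), kind="stable")
# ... and applying argsort to that permutation gives each element's 0-based rank
ranks_from_argsort = np.argsort(order, kind="stable") + 1

print(ranks_from_argsort.tolist())           # [1, 4, 3, 2, 5]
print(values.rank(method="first").tolist())  # [1.0, 4.0, 3.0, 2.0, 5.0]
print(values.rank().tolist())                # [1.5, 4.0, 3.0, 1.5, 5.0]
```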
A, Series ranking
s = Series([1, 3, 2, 1, 6], index=["a", "c", "d", "b", "e"])
# By default, values are ranked by size and ties receive the average rank
'''
1 is the smallest value, so the first 1 would be ranked 1 and the second 1 would be ranked 2;
because the average rank is used, both are ranked 1.5
'''
print(s.rank())
'''
a    1.5
c    4.0
d    3.0
b    1.5
e    5.0
'''
# Break ties by the order in which the values appear in the data
print(s.rank(method="first"))
'''
a    1.0
c    4.0
d    3.0
b    2.0
e    5.0
'''
Besides first, which ranks tied values by the order in which they appear in the original data, the method parameter accepts min (every tied value gets the lowest rank of its group), max (every tied value gets the highest rank of its group), and average (every tied value gets the mean rank of its group; this is the default). You can also use the ascending parameter to rank in ascending or descending order.
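The four methods can be compared side by side on the Series from the example above. This is a minimal sketch; s_rank, ranks and descending are illustrative names.

```python
import pandas as pd

# Same data as the example above; s_rank is only an illustrative name
s_rank = pd.Series([1, 3, 2, 1, 6], index=["a", "c", "d", "b", "e"])

# Rank the same data under each tie-breaking method
ranks = {m: s_rank.rank(method=m).tolist()
         for m in ["average", "min", "max", "first"]}
for name, result in ranks.items():
    print(name, result)
# average [1.5, 4.0, 3.0, 1.5, 5.0]
# min [1.0, 4.0, 3.0, 1.0, 5.0]
# max [2.0, 4.0, 3.0, 2.0, 5.0]
# first [1.0, 4.0, 3.0, 2.0, 5.0]

# ascending=False gives rank 1 to the largest value
descending = s_rank.rank(ascending=False).tolist()
print(descending)  # [4.5, 2.0, 3.0, 4.5, 1.0]
```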
B, DataFrame ranking
a = [[9, 3, 1], [1, 2, 8], [1, 0, 5]]
data = DataFrame(a, index=["0", "2", "1"], columns=["c", "a", "b"])
print(data)
'''
   c  a  b
0  9  3  1
2  1  2  8
1  1  0  5
'''
# By default, values are ranked within each column
print(data.rank())
'''
     c    a    b
0  3.0  3.0  1.0
2  1.5  2.0  3.0
1  1.5  1.0  2.0
'''
# Rank within each row
print(data.rank(axis=1))
'''
     c    a    b
0  3.0  2.0  1.0
2  1.0  2.0  3.0
1  2.0  1.0  3.0
'''
The method and ascending parameters work the same way as for a Series.
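As a brief sketch of that last point, the same DataFrame (rebuilt here under the illustrative name df_rank) can be ranked column by column from largest to smallest, with tied values sharing the lowest rank of their group.

```python
import pandas as pd

# Same data as the example above; df_rank is only an illustrative name
df_rank = pd.DataFrame([[9, 3, 1], [1, 2, 8], [1, 0, 5]],
                       index=["0", "2", "1"], columns=["c", "a", "b"])

# Rank each column in descending order; ties take the minimum rank
ranked = df_rank.rank(method="min", ascending=False)

print(ranked["c"].tolist())  # [1.0, 2.0, 2.0]
print(ranked["a"].tolist())  # [1.0, 2.0, 3.0]
print(ranked["b"].tolist())  # [3.0, 1.0, 2.0]
```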