Pandas sorting and ranking

Source: Internet
Author: User

Sometimes we need to sort or rank a Series or DataFrame by its index or by its values.

First, sorting

Pandas provides a sort_index method that sorts by the row or column index in lexicographic order.

A, Series sorting

1, sorted by index

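    import numpy as np
    from pandas import Series, DataFrame

    # Define a Series
    s = Series([1, 2, 3], index=["a", "c", "b"])
    # Sort the Series by index; the default is ascending
    print(s.sort_index())
    '''
    a    1
    b    3
    c    2
    '''
    # Sort the index in descending order
    print(s.sort_index(ascending=False))
    '''
    c    2
    b    3
    a    1
    '''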
2, sorted by value

    s = Series([np.nan, 1, 7, 2, 0], index=["a", "c", "e", "b", "d"])
    # Sort the Series by value; the default is ascending
    print(s.sort_values())
    '''
    d    0.0
    c    1.0
    b    2.0
    e    7.0
    a    NaN
    '''
    # Sort the Series values in descending order
    print(s.sort_values(ascending=False))
    '''
    e    7.0
    b    2.0
    c    1.0
    d    0.0
    a    NaN
    '''
When sorting by value, whether ascending or descending, missing values (NaN) are placed at the end by default.
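
The position of missing values can be changed with the na_position parameter of sort_values. The following is a minimal sketch of that behavior; the variable names are only illustrative and are not part of the original example.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1, 7, 2, 0], index=["a", "c", "e", "b", "d"])

# Default behavior: NaN goes last for both ascending and descending sorts
asc_labels = list(s.sort_values().index)
desc_labels = list(s.sort_values(ascending=False).index)

# na_position="first" moves the missing value to the top instead
nan_first_labels = list(s.sort_values(na_position="first").index)

print(asc_labels)        # ['d', 'c', 'b', 'e', 'a']
print(desc_labels)       # ['e', 'b', 'c', 'd', 'a']
print(nan_first_labels)  # ['a', 'd', 'c', 'b', 'e']
```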

B, DataFrame sorting

1, sorted by index

    a = np.arange(9).reshape(3, 3)
    data = DataFrame(a, index=["0", "2", "1"], columns=["c", "a", "b"])
    # Sort by row index; sorting rows in ascending order is the default
    print(data.sort_index())
    '''
       c  a  b
    0  0  1  2
    1  6  7  8
    2  3  4  5
    '''
    # Sort by row index in descending order
    print(data.sort_index(ascending=False))
    '''
       c  a  b
    2  3  4  5
    1  6  7  8
    0  0  1  2
    '''
    # Sort by column index in ascending order
    print(data.sort_index(axis=1))
    '''
       a  b  c
    0  1  2  0
    2  4  5  3
    1  7  8  6
    '''
    # Sort by column index in descending order
    print(data.sort_index(axis=1, ascending=False))
    '''
       c  b  a
    0  0  2  1
    2  3  5  4
    1  6  8  7
    '''
2, sorted by value

    a = [[9, 3, 1], [1, 2, 8], [1, 0, 5]]
    data = DataFrame(a, index=["0", "2", "1"], columns=["c", "a", "b"])
    # Sort by the values of the specified column
    print(data.sort_values(by="c"))
    '''
       c  a  b
    2  1  2  8
    1  1  0  5
    0  9  3  1
    '''
    # Sort by several columns
    print(data.sort_values(by=["c", "a"]))
    '''
       c  a  b
    1  1  0  5
    2  1  2  8
    0  9  3  1
    '''
    # Sort by the values of the specified row
    print(data.sort_values(by="0", axis=1))
    '''
       b  a  c
    0  1  3  9
    2  8  2  1
    1  5  0  1
    '''
Note: When sorting a DataFrame by value, you must use by to specify one or more columns (or rows). If by is omitted, pandas raises TypeError: sort_values() missing 1 required positional argument: 'by'. When by lists several columns (rows), the first one in the list takes precedence; the later ones are only used to break ties among equal values of the earlier ones, so they may appear to have no effect. When sorting by the values of a row, you must set axis=1. Otherwise pandas raises an error (a KeyError), because by default by is looked up among the column labels and the row label is not found there; axis=1 means that by refers to row index labels.
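
The points in the note can be checked with a short sketch. It reuses the data from the example above; the variable names are illustrative, and passing a list to ascending (one direction per key) is an addition that the original text does not cover.

```python
import pandas as pd

a = [[9, 3, 1], [1, 2, 8], [1, 0, 5]]
data = pd.DataFrame(a, index=["0", "2", "1"], columns=["c", "a", "b"])

# Omitting `by` raises TypeError
try:
    data.sort_values()
    missing_by_error = None
except TypeError as exc:
    missing_by_error = type(exc).__name__

# "c" is the primary key; "a" only breaks the tie between the rows where c == 1
by_c_then_a = list(data.sort_values(by=["c", "a"]).index)

# One sort direction per key: "c" ascending, "a" descending
mixed = list(data.sort_values(by=["c", "a"], ascending=[True, False]).index)

# "0" is a row label, not a column, so without axis=1 the lookup fails
try:
    data.sort_values(by="0")
    row_label_error = None
except KeyError as exc:
    row_label_error = type(exc).__name__

# With axis=1 the columns are reordered by the values in row "0"
by_row = list(data.sort_values(by="0", axis=1).columns)

print(missing_by_error, by_c_then_a, mixed, row_label_error, by_row)
```
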
Second, ranking

Ranking is similar to sorting. Ranking assigns each element a rank value (from 1 up to the number of valid data points in the array). This resembles the indirect sort indices returned by numpy.argsort, except that ranking can break ties according to a chosen rule.
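
To make the comparison with numpy.argsort concrete, here is a minimal sketch. It assumes a stable sort is requested so that ties are broken by position; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

values = [1, 3, 2, 1, 6]

# argsort returns the positions that would sort the array
order = np.argsort(values, kind="stable")
# argsort of that result gives each element's zero-based rank,
# with ties broken by position in the original data
zero_based_rank = np.argsort(order, kind="stable")

# rank(method="first") breaks ties the same way, but counts from 1
first_rank = pd.Series(values).rank(method="first")
# The default rule gives tied values their average rank instead
average_rank = pd.Series(values).rank()

print(order.tolist())                  # [0, 3, 2, 1, 4]
print((zero_based_rank + 1).tolist())  # [1, 4, 3, 2, 5]
print(first_rank.tolist())             # [1.0, 4.0, 3.0, 2.0, 5.0]
print(average_rank.tolist())           # [1.5, 4.0, 3.0, 1.5, 5.0]
```
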
A, Series ranking

    s = Series([1, 3, 2, 1, 6], index=["a", "c", "d", "b", "e"])
    # By default, tied values receive the average of their ranks
    '''
    1 is the smallest value, so the first 1 takes rank 1 and the second 1 takes rank 2;
    because the average rank is used, both 1s are ranked 1.5
    '''
    print(s.rank())
    '''
    a    1.5
    c    4.0
    d    3.0
    b    1.5
    e    5.0
    '''
    # Rank tied values by the order in which they appear in the data
    print(s.rank(method="first"))
    '''
    a    1.0
    c    4.0
    d    3.0
    b    2.0
    e    5.0
    '''
Besides first (rank tied values in the order they appear in the original data), the method parameter accepts min (use the smallest rank for the whole group of ties), max (use the largest rank for the whole group), and average (use the average rank, which is the default). You can also set the ascending parameter to rank in ascending or descending order.
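
The effect of each setting can be compared side by side. This is a small sketch using the same Series as above; the ranks DataFrame and its column names are only there to lay out the results.

```python
import pandas as pd

s = pd.Series([1, 3, 2, 1, 6], index=["a", "c", "d", "b", "e"])

# One column per tie-breaking rule, plus a descending ranking
ranks = pd.DataFrame({
    "average": s.rank(),                    # ties share the mean rank
    "min": s.rank(method="min"),            # ties share the lowest rank
    "max": s.rank(method="max"),            # ties share the highest rank
    "first": s.rank(method="first"),        # ties ranked by order of appearance
    "descending": s.rank(ascending=False),  # largest value gets rank 1
})
print(ranks)
```

Only the rows for a and b differ between the first four columns, because they are the only tied values.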

B, DataFrame ranking

    a = [[9, 3, 1], [1, 2, 8], [1, 0, 5]]
    data = DataFrame(a, index=["0", "2", "1"], columns=["c", "a", "b"])
    print(data)
    '''
       c  a  b
    0  9  3  1
    2  1  2  8
    1  1  0  5
    '''
    # By default, rank within each column
    print(data.rank())
    '''
         c    a    b
    0  3.0  3.0  1.0
    2  1.5  2.0  3.0
    1  1.5  1.0  2.0
    '''
    # Rank within each row
    print(data.rank(axis=1))
    '''
         c    a    b
    0  3.0  2.0  1.0
    2  1.0  2.0  3.0
    1  2.0  1.0  3.0
    '''
The method and ascending parameters are set in the same way as for a Series.
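
As a brief illustration, the sketch below applies method and ascending to the same DataFrame; the variable names col_min and row_desc are illustrative.

```python
import pandas as pd

a = [[9, 3, 1], [1, 2, 8], [1, 0, 5]]
data = pd.DataFrame(a, index=["0", "2", "1"], columns=["c", "a", "b"])

# Rank within each column; tied values share the smallest rank
col_min = data.rank(method="min")

# Rank within each row, with the largest value ranked 1
row_desc = data.rank(axis=1, ascending=False)

print(col_min["c"].tolist())       # [3.0, 1.0, 1.0]
print(row_desc.loc["0"].tolist())  # [1.0, 2.0, 3.0]
```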




