Python array, list, and DataFrame index slicing operations (July 19, 2016)

Source: Internet
Author: User


Lists, one-dimensional and two-dimensional arrays, DataFrame, loc, iloc, and ix

NumPy array indexing and slicing: an introduction
We begin with basic list indexing. Here is the code and its result:

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
a[:5:-1]   # step < 0, so start defaults to 9
a[0:5:-1]  # start = 0 is given explicitly
a[1::-1]   # step < 0, so the slice runs back to index 0

Output:

[9, 8, 7, 6]
[]
[1, 0]

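A list slice has up to two ":" delimiters inside "[]", and its meaning is [start:stop:step]. In the example above the step is -1, so the output is in reverse order. When start or stop is omitted, the default depends on the sign of the step: with a positive step, start defaults to the beginning of the list and stop to the end; with a negative step, start defaults to the last element and the slice runs back through the first. The default value of step is 1, and step cannot be 0.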
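The defaults described above can be checked with slice.indices(), a standard method that resolves a slice against a given length. This is only a minimal sketch; the ten-element list a is assumed for illustration.

```python
# Resolve omitted start/stop values for a sequence of length 10.
a = list(range(10))  # assumed example data: [0, 1, ..., 9]

# slice.indices(length) returns the concrete (start, stop, step) that will be used.
forward = slice(None, None, 1).indices(len(a))    # positive step
backward = slice(None, None, -1).indices(len(a))  # negative step

print(forward)   # (0, 10, 1): start at the first element, stop after the last
print(backward)  # (9, -1, -1): start at the last element, stop before the first

reversed_tail = a[:5:-1]  # start defaults to index 9, stops before index 5
print(reversed_tail)      # [9, 8, 7, 6]

# A step of 0 is rejected with a ValueError.
try:
    a[::0]
    step_error = ""
except ValueError as exc:
    step_error = str(exc)
print(step_error)
```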

a[10:20]  # elements 11 through 20
a[:10:2]  # the first 10 elements, taking every second one
a[::5]    # all elements, taking every fifth one

Advanced slicing operations in Python:
How slicing works:
List slicing internally calls the __getitem__, __setitem__, and __delitem__ methods together with the slice() function. The slice() function takes start, stop, and step arguments in the same way as range().
The key passed to these methods is a special slice object whose attributes (start, stop, step) describe the requested slice. The following session shows what each slice operation means:

>>> List4 = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
>>> List4
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>>> x = List4[1:10]  # x = List4.__getitem__(slice(1, 10, None))
>>> x
[2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> List4[1:5] = [100, 111, 122]  # List4.__setitem__(slice(1, 5, None), [100, 111, 122])
>>> List4
[1, 100, 111, 122, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>>> del List4[1:4]  # List4.__delitem__(slice(1, 4, None))
>>> List4
[1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
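The correspondence between the bracket syntax and the special methods can be tried directly. A minimal sketch, assuming a list named numbers that mirrors the session above:

```python
numbers = list(range(1, 21))

key = slice(1, 10)                    # the object built for numbers[1:10]
print(key.start, key.stop, key.step)  # 1 10 None

# The bracket form and the explicit method call return the same elements.
same = numbers[1:10] == numbers.__getitem__(key)
print(same)                           # True

numbers.__setitem__(slice(1, 5, None), [100, 111, 122])  # numbers[1:5] = [...]
numbers.__delitem__(slice(1, 4, None))                   # del numbers[1:4]
print(numbers[:3])                    # [1, 6, 7]
```

In everyday code the bracket form is the one to use; the explicit calls are shown only to make the mechanism visible.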

Slice boundary problems:

s = [1, 2, 3, 4]  # s has four elements: the lower bound is 0 and the upper bound is 4
s[-100:100]   # returns [1, 2, 3, 4]: -100 is below the lower bound and 100 is above the upper bound, so this is equivalent to s[0:4]
s[-100:-200]  # returns []: -100 and -200 are both below the lower bound, which is used automatically, so this is equivalent to s[0:0]
s[100:200]    # returns []: 100 and 200 are both above the upper bound, which is used automatically, so this is equivalent to s[4:4]
s[:100]       # returns [1, 2, 3, 4]: an omitted start value means the slice starts from element 0
s[0:]         # returns [1, 2, 3, 4]: an omitted end value means the slice runs to the last element
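Slices are clamped to the valid range, whereas plain indexing is not. A minimal sketch, assuming a four-element list s:

```python
s = [1, 2, 3, 4]  # assumed example data

whole = s[-100:100]       # both bounds clamped: same as s[0:4]
empty_low = s[-100:-200]  # both bounds before the start: same as s[0:0]
empty_high = s[100:200]   # both bounds past the end: same as s[4:4]
print(whole, empty_low, empty_high)

# slice.indices() shows the clamped values directly.
clamped = slice(-100, 100).indices(len(s))
print(clamped)            # (0, 4, 1)

# Indexing a single out-of-range position raises instead of clamping.
try:
    s[100]
    raised = False
except IndexError:
    raised = True
print(raised)             # True
```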

Further notes on slices:

>>> id(List4)
140115516658320
# Direct assignment, List5 = List4, binds both names to the same object, so the memory address is unchanged (140115516658320). Deleting the elements through either List4 or List5 therefore empties both.
>>> List5 = List4
>>> List5
[1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>>> List4
[1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>>> id(List5)
140115516658320
# A list created by slicing, however, lives at a different memory address: 140115516658320 != 140115516604784
>>> List6 = List5
>>> id(List6)
140115516658320
>>> List6 = List4[:]
>>> id(List6)
140115516604784
# the address has changed

A further example:

>>> listOfRows = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
>>> li = listOfRows
>>> id(listOfRows)
206425904L
>>> id(li)  # the two IDs are the same: both names refer to one object
206425904L
>>> listOfRows[:] = [[row[0], row[3], row[2]] for row in listOfRows]
>>> listOfRows
[[1, 4, 3], [5, 8, 7], [9, 12, 11]]
>>> li  # slice assignment has the intended effect: the shared object changes for li as well
[[1, 4, 3], [5, 8, 7], [9, 12, 11]]
>>> id(listOfRows)
206425904L
>>> id(li)  # both IDs are unchanged, which shows that slice assignment modifies the original object in place
206425904L
>>> listOfRows = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
>>> li
[[1, 4, 3], [5, 8, 7], [9, 12, 11]]
>>> id(li)  # li has not changed
206425904L
>>> id(listOfRows)  # a different ID shows that listOfRows is now bound to a new object
206412488L
>>> listOfRows
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

Writing "listOfRows = ..." directly binds the name to a new object, whereas "listOfRows[:] = ..." modifies the existing one. Simply put, slice assignment changes the contents of the original object instead of creating a new object.
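The difference between an alias, a slice copy, and slice assignment can be checked with the is operator. A minimal sketch; the names rows, alias, and shallow are assumptions made for this illustration.

```python
rows = [[1, 2, 3, 4], [5, 6, 7, 8]]
alias = rows      # second name for the same list object
shallow = rows[:] # new outer list (a shallow copy)

print(alias is rows)            # True
print(shallow is rows)          # False
inner_shared = shallow[0] is rows[0]
print(inner_shared)             # True: the inner lists are still shared

# Slice assignment changes the existing object, so the alias sees the change.
rows[:] = [[r[0], r[3], r[2]] for r in rows]
print(alias)                    # [[1, 4, 3], [5, 8, 7]]
print(shallow)                  # [[1, 2, 3, 4], [5, 6, 7, 8]]

# Plain assignment rebinds the name; the alias keeps the old object.
rows = [[0]]
rebound = alias is not rows
print(rebound)                  # True
```

Note that the slice copy is shallow: only the outer list is new, so changes made inside an inner list would still be visible through both names.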
A sequence is a data structure in Python whose elements are accessed by index.
Python 2 contains six built-in sequence types: list, tuple, str, unicode, buffer, and xrange. xrange is special: it produces its values lazily, and it supports only indexing, iteration, and len(), so some features of the other sequence types do not apply to it. Generally, indexing, len, max, min, in, +, *, and slicing can be used on sequence types.
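The shared operations can be tried on a few sequence types. This sketch uses Python 3, where the text type is str; the sample values are assumptions.

```python
samples = [[1, 2, 3], (1, 2, 3), "123"]

results = []
for seq in samples:
    first = seq[0]       # indexing
    middle = seq[1:2]    # slicing keeps the sequence type
    doubled = seq * 2    # repetition
    joined = seq + seq   # concatenation
    results.append((
        type(middle) is type(seq),
        len(doubled),
        joined == doubled,
        first in seq,
        max(seq),
        min(seq),
    ))

for row in results:
    print(row)
```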
A slice that uses the third element is called a step slice; its syntax is sequence[start:stop:step]. A useful mnemonic is "include the head, exclude the tail": the element at start is included and the element at stop is not. If the first index is 0, it can be omitted.
When Python evaluates the slice syntax, it creates a slice object. The extended slice syntax allows several kinds of index slicing, including step slices, multi-dimensional slices, and slices with omitted parts. The syntax of multi-dimensional slicing is sequence[start1:end1, start2:end2], or with the ellipsis, sequence[..., start1:end1]. A slice object can also be created with the built-in function slice().
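What the extended syntax passes to an object can be inspected without NumPy. The class ShowKey below is a hypothetical helper, written only for this illustration, that returns the key it is given.

```python
class ShowKey:
    """Hypothetical helper: echoes the key received by the [] operator."""

    def __getitem__(self, key):
        return key


show = ShowKey()

multi = show[1:3, ::2]      # a tuple of two slice objects
with_dots = show[..., 0:2]  # Ellipsis followed by a slice

print(multi)      # (slice(1, 3, None), slice(None, None, 2))
print(with_dots)  # (Ellipsis, slice(0, 2, None))

# The built-in slice() builds the same object that the colon syntax does.
evens = slice(None, None, 2)
letters = "abcdefg"
same = letters[evens] == letters[::2]
print(same)       # True
```

Built-in lists do not accept the tuple form; it is libraries such as NumPy that interpret a tuple of slices as one slice per dimension.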

Selecting from two-dimensional arrays:
The syntax of multi-dimensional array slicing is sequence[start1:end1, start2:end2, ..., startn:endn]. We use a 3x3 two-dimensional array to demonstrate selection:

>>> b = np.arange(9).reshape(3,3)
>>> b
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Array subscripts start from 0. For the array b, b[m, n] selects a single element. The positions are laid out as follows:

[(0,0),(0,1),(0,2)]
[(1,0),(1,1),(1,2)]
[(2,0),(2,1),(2,2)]

The syntax for a two-dimensional slice is sequence[start1:end1, start2:end2].

>>> b[1:, :2]
# The part before the comma, 1:, selects the rows from index 1 onward:
# [(1,0),(1,1),(1,2)] and [(2,0),(2,1),(2,2)]
# The part after the comma, :2, then selects from those rows the columns before index 2, which gives:
array([[3, 4],
       [6, 7]])
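The two-step reading can be checked by slicing the rows first and the columns second. A sketch, assuming NumPy is installed and imported as np:

```python
import numpy as np

b = np.arange(9).reshape(3, 3)

rows = b[1:]         # step 1: rows from index 1 onward
block = rows[:, :2]  # step 2: columns before index 2

print(block)
matches = np.array_equal(block, b[1:, :2])
print(matches)       # True

# Unlike a list slice, a basic NumPy slice is a view onto the same data.
shares = np.shares_memory(block, b)
print(shares)        # True
```

Because the slice is a view, writing into block would also change b; call .copy() on the slice when an independent array is needed.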

Once step slices are understood, two-dimensional and three-dimensional slices are just as easy to follow, and they are no more complicated than step slices.
You can also assign values to the sliced elements.

>>> b[1:, :2] = 1  # broadcast assignment
>>> b
array([[0, 1, 2],
       [1, 1, 5],
       [1, 1, 8]])
>>> b[1:, :2].shape
(2L, 2L)
>>> b[1:, :2] = np.arange(2, 6).reshape(2, 2)  # element-by-element assignment
>>> b
array([[0, 1, 2],
       [2, 3, 5],
       [4, 5, 8]])
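The three kinds of right-hand side that a slice assignment accepts can be compared side by side. A sketch, assuming NumPy is installed; the values 10 and 20 are arbitrary illustration data.

```python
import numpy as np

b = np.arange(9).reshape(3, 3)

b[1:, :2] = 1                   # a scalar is broadcast to the whole 2x2 block
after_scalar = b.copy()

b[1:, :2] = np.array([10, 20])  # a length-2 row is broadcast down both rows
after_row = b.copy()

b[1:, :2] = np.arange(2, 6).reshape(2, 2)  # same shape: element by element
after_block = b.copy()

# A shape that cannot be broadcast to (2, 2) is rejected.
try:
    b[1:, :2] = np.arange(3)
    mismatch = None
except ValueError as exc:
    mismatch = exc

print(after_scalar)
print(after_row)
print(after_block)
print(type(mismatch).__name__)
```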

The same applies in three dimensions: slices are written sequence[start1:end1, start2:end2, start3:end3], and a single value is selected with a[l, m, n].
An omitted dimension is treated as [:], meaning all elements along that dimension.

>>> b = np.arange(24).reshape(2,3,4)
>>> b[1,]
array([[12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])
>>> b[1,2]
array([20, 21, 22, 23])
>>> b[1,2,3]
23
>>> b[1,:,3]
array([15, 19, 23])
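That omitted dimensions behave like ":" can be confirmed by comparing the short and the explicit forms. A sketch, assuming NumPy is installed:

```python
import numpy as np

b = np.arange(24).reshape(2, 3, 4)

# Trailing dimensions that are left out behave like ':'.
first = np.array_equal(b[1], b[1, :, :])
second = np.array_equal(b[1, 2], b[1, 2, :])
print(first, second)          # True True

# Ellipsis stands for as many ':' as are needed.
third = np.array_equal(b[..., 3], b[:, :, 3])
print(third)                  # True

column = b[1, :, 3]
print(column)                 # [15 19 23]
print(b[..., 3].shape)        # (2, 3)
```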

For a pandas DataFrame, iloc lets us slice a df in the same way as a multi-dimensional array.

>>> b = np.arange(9).reshape(3, 3)
>>> df = pd.DataFrame(b)
>>> df.iloc[1, 2]
5
>>> df.iloc[1:, 2]
1    5
2    8
Name: 2, dtype: int32
>>> df.iloc[1:, :2]
   0  1
1  3  4
2  6  7
>>> df.iloc[1:, :2] = 1  # same broadcast assignment
>>> df
   0  1  2
0  0  1  2
1  1  1  5
2  1  1  8
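That iloc follows array positions and ignores labels can be checked against the underlying array. A sketch, assuming NumPy and pandas are installed; the labels x, y, z and a, b, c are made up for the illustration.

```python
import numpy as np
import pandas as pd

b = np.arange(9).reshape(3, 3)
df = pd.DataFrame(b)

# The same positional slice on the array and on the frame.
same_values = np.array_equal(df.iloc[1:, :2].values, b[1:, :2])
print(same_values)   # True
print(df.iloc[1, 2]) # 5

# iloc ignores labels, so renaming rows and columns changes nothing.
named = df.copy()
named.index = ["x", "y", "z"]
named.columns = ["a", "b", "c"]
named_block = named.iloc[1:, :2].values.tolist()
print(named_block)   # [[3, 4], [6, 7]]
```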

(So you can slice a df this way without worry.)

Now for loc. loc selects by the labels in index and columns. It is the recommended way to assign values into a df.

When index and columns are integers starting from 0, let's compare loc and iloc:

>>> b = np.arange(36).reshape(6,6)
>>> b
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])
>>> df = pd.DataFrame(b)
>>> df
    0   1   2   3   4   5
0   0   1   2   3   4   5
1   6   7   8   9  10  11
2  12  13  14  15  16  17
3  18  19  20  21  22  23
4  24  25  26  27  28  29
5  30  31  32  33  34  35
>>> df.loc[1:,:2]
    0   1   2
1   6   7   8
2  12  13  14
3  18  19  20
4  24  25  26
5  30  31  32
>>> df.iloc[1:,:2]
    0   1
1   6   7
2  12  13
3  18  19
4  24  25
5  30  31
>>> df.iloc[1,2]
8
>>> df.loc[1,2]
8

We can see that df.loc[1:, :2] selects the columns up to and including the one labeled 2. Unlike range(), a loc slice includes its end label. It is not a numeric comparison either: loc walks the column labels in their stored order, starting from the first column, and stops as soon as it has taken the column labeled 2.

>>> df.columns = [2,1,3,4,0,5]
>>> df
    2   1   3   4   0   5
0   0   1   2   3   4   5
1   6   7   8   9  10  11
2  12  13  14  15  16  17
3  18  19  20  21  22  23
4  24  25  26  27  28  29
5  30  31  32  33  34  35
>>> df.loc[1,2]
6
>>> df.iloc[1,2]
8
>>> df.iloc[1:,:2]
    2   1
1   6   7
2  12  13
3  18  19
4  24  25
5  30  31
>>> df.loc[1:,:2]
    2
1   6
2  12
3  18
4  24
5  30
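The contrast between label-based and position-based slicing can be reduced to the column labels each call returns. A sketch, assuming NumPy and pandas are installed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(36).reshape(6, 6))

# Default labels 0..5: loc includes the end label, iloc excludes the end position.
loc_default = list(df.loc[1:, :2].columns)
iloc_default = list(df.iloc[1:, :2].columns)
print(loc_default, iloc_default)    # [0, 1, 2] [0, 1]

# Shuffle the labels: loc stops right after the column labeled 2, which is now first.
df.columns = [2, 1, 3, 4, 0, 5]
loc_shuffled = list(df.loc[1:, :2].columns)
iloc_shuffled = list(df.iloc[1:, :2].columns)
print(loc_shuffled, iloc_shuffled)  # [2] [2, 1]

print(df.loc[1, 2], df.iloc[1, 2])  # 6 8
```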

One advantage of loc is that you can rearrange the column order.

>>> df.loc[:,(1,2,3,4)]
    1   2   3   4
0   1   0   2   3
1   7   6   8   9
2  13  12  14  15
3  19  18  20  21
4  25  24  26  27
5  31  30  32  33
>>> df.iloc[:,(1,2,3,4)]
    1   3   4   0
0   1   2   3   4
1   7   8   9  10
2  13  14  15  16
3  19  20  21  22
4  25  26  27  28
5  31  32  33  34
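The same reordering can be written with lists, which is the form that current pandas documentation uses for selecting several labels or positions. A sketch, assuming NumPy and pandas are installed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(36).reshape(6, 6), columns=[2, 1, 3, 4, 0, 5])

by_label = df.loc[:, [1, 2, 3, 4]]      # columns named 1, 2, 3, 4, in that order
by_position = df.iloc[:, [1, 2, 3, 4]]  # columns at positions 1 to 4

print(list(by_label.columns))     # [1, 2, 3, 4]
print(list(by_position.columns))  # [1, 3, 4, 0]
print(by_label.iloc[0].tolist())  # [1, 0, 2, 3]
```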

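Impressive, and hard to do with iloc, which only looks at positions. Once the labels are changed to letters, however, loc can no longer be used with integers.

ix solves the problem of mixed selection (ix exists only in older pandas; it was deprecated in 0.20 and removed in 1.0):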

>>> df.ix[:,(1,2,3,4)]
    1   2   3   4
0   1   0   2   3
1   7   6   8   9
2  13  12  14  15
3  19  18  20  21
4  25  24  26  27
5  31  30  32  33
>>> df.ix[:,:2]
    2
0   0
1   6
2  12
3  18
4  24
5  30

Put simply: when index and columns are numbers, ix behaves like loc. When they are letters, ix automatically treats integers inside [] as positions, while the [row, column] order stays the same.

# In the outputs below the row index has been relabeled as well; its first two labels are 2 and 1.
>>> df.loc[:2,:2]
   2
2  0
>>> df.iloc[:2,:2]
   2  1
2  0  1
1  6  7
>>> df.ix[:2,:2]
   2
2  0
>>> df.index = ['a', 'c', 'd', 'b', 'e', 'f']
>>> df.ix[:2,:2]
   2
a  0
c  6
>>> df.iloc[:2,:2]
   2  1
a  0  1
c  6  7
>>> df.loc[:2,:2]  # Here, loc raises an error because the index no longer contains numeric labels
Traceback (most recent call last):
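On pandas versions without ix, the same mixed selection (rows by position, columns by label) can be written with loc or iloc by translating one side first. This is a sketch of one possible approach, not the only one; it relies on Index.get_loc and on indexing df.index by position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(36).reshape(6, 6),
    index=["a", "c", "d", "b", "e", "f"],
    columns=[2, 1, 3, 4, 0, 5],
)

# Rows by position, columns by label: translate positions to labels for loc ...
via_loc = df.loc[df.index[:2], :2]

# ... or translate the end label to a position for iloc (the end position is exclusive).
stop = df.columns.get_loc(2) + 1
via_iloc = df.iloc[:2, :stop]

print(via_loc)
agree = via_loc.equals(via_iloc)
print(agree)  # True
```

Both forms keep the choice between labels and positions explicit, which avoids the guessing that ix performed.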

With all of this covered, these slicing operations should now make sense.
