Python array, list, and DataFrame index slicing operations: July 22, July 19, 2016, Zhi Lang document
Lists, one- and two-dimensional arrays, DataFrame, loc, iloc, and ix
Introduction to NumPy array indexing and slicing:
Let's start from basic list indexing, with the code and its results:
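a = list(range(10))   # example list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
a[:5:-1]    # step < 0, so start defaults to the last index (9)
a[0:5:-1]   # start = 0 is given explicitly; moving backwards from 0 never reaches 5
a[1::-1]    # step < 0, so stop defaults to "before index 0", and index 0 is included
Output:
[9, 8, 7, 6]
[]
[1, 0]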
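A list slice contains up to two ":" separators inside "[]", meaning [start:stop:step]. In the examples above the step is -1, so the elements come out in reverse order. If start or stop is omitted, the default depends on the sign of the step: with a positive step, start defaults to 0 and stop to the end of the list; with a negative step, start defaults to the last element and stop to just before the first. The step defaults to 1 and cannot be 0 (a zero step raises ValueError).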
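The defaults can be checked with the built-in slice object's indices(length) method, which reports the start, stop, and step Python actually uses for a sequence of that length. A minimal sketch, assuming the ten-element list a from the example above:

```python
a = list(range(10))

# slice.indices(len) fills in omitted start/stop/step for a sequence of that length.
print(slice(None, None, None).indices(len(a)))  # (0, 10, 1): positive step, start 0, stop at the end
print(slice(None, 5, -1).indices(len(a)))       # (9, 5, -1): negative step, start defaults to the last index
print(slice(1, None, -1).indices(len(a)))       # (1, -1, -1): negative step, stop defaults to "before index 0"

# Feeding those numbers to range() reproduces the positions the slice visits.
print([a[i] for i in range(*slice(None, 5, -1).indices(len(a)))])  # [9, 8, 7, 6]

# A step of 0 is rejected.
try:
    a[::0]
except ValueError as exc:
    print("ValueError:", exc)
```

Passing the three numbers to range() is what ties slice objects to range(): both are described by the same start, stop, and step.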
With a longer list, for example a = list(range(100)):
a[10:20]   # the 11th through 20th numbers
a[:10:2]   # the first 10 numbers, taking every second one
a[::5]     # all numbers, taking every fifth one
Advanced slicing in Python:
How slicing works:
Internally, slicing a list calls the special methods __getitem__, __setitem__, and __delitem__ with a slice object as the key. A slice object takes the same start, stop, and step arguments as range().
The slice object describes the requested range through its start, stop, and step attributes. Here is what each slice operation does and which method call it corresponds to:
>>> List4 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>>> List4
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>>> x = List4[1:10]                # same as List4.__getitem__(slice(1, 10, None))
>>> x
[2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> List4[1:5] = [100, 111, 122]   # same as List4.__setitem__(slice(1, 5, None), [100, 111, 122])
>>> List4
[1, 100, 111, 122, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>>> del List4[1:4]                 # same as List4.__delitem__(slice(1, 4, None))
>>> List4
[1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
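To watch these calls happen, a class can define the three special methods and record the key it receives. A sketch; SliceRecorder is a hypothetical name used only for illustration:

```python
class SliceRecorder:
    """Hypothetical helper that records the key each special method receives."""

    def __init__(self):
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(("get", key))
        return key

    def __setitem__(self, key, value):
        self.calls.append(("set", key, value))

    def __delitem__(self, key):
        self.calls.append(("del", key))


r = SliceRecorder()
r[1:10]                    # calls __getitem__(slice(1, 10, None))
r[1:5] = [100, 111, 122]   # calls __setitem__(slice(1, 5, None), [100, 111, 122])
del r[1:4]                 # calls __delitem__(slice(1, 4, None))

for call in r.calls:
    print(call)
# ('get', slice(1, 10, None))
# ('set', slice(1, 5, None), [100, 111, 122])
# ('del', slice(1, 4, None))

# The slice object exposes the parts of the request as attributes.
key = r[2:8:3]
print(key.start, key.stop, key.step)  # 2 8 3
```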
Out-of-range slice bounds:
s = [1, 2, 3, 4]   # a four-element list (values illustrative): valid bounds run from 0 (lower) to 4 (upper)
s[-100:100]    # returns [1, 2, 3, 4]: -100 is below the lower bound and 100 above the upper bound, so this is s[0:4]
s[-100:-200]   # returns []: both are below the lower bound and are clamped to 0, so this is s[0:0]
s[100:200]     # returns []: both are above the upper bound and are clamped to 4, so this is s[4:4]
s[:100]        # returns [1, 2, 3, 4]: an omitted start means start from index 0
s[0:]          # returns [1, 2, 3, 4]: an omitted end means go through the end of the list
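A sketch contrasting this with plain indexing, assuming the same four-element list: slice.indices() shows the clamped bounds, while an out-of-range integer index raises IndexError instead of being clamped.

```python
s = [1, 2, 3, 4]  # illustrative four-element list

# Slices clamp out-of-range bounds; slice.indices() shows the clamped values.
print(slice(-100, 100).indices(len(s)))   # (0, 4, 1): same as s[0:4]
print(slice(-100, -200).indices(len(s)))  # (0, 0, 1): same as s[0:0]
print(slice(100, 200).indices(len(s)))    # (4, 4, 1): same as s[4:4]

# A plain index is not clamped: it raises IndexError.
try:
    s[100]
except IndexError as exc:
    print("IndexError:", exc)
```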
More about slices:
>>> id(List4)
140115516658320
# Assigning one name to another (List5 = List4) does not copy the list: both names refer to the same
# object at the same address (140115516658320). Deleting elements through List4 or List5 therefore
# removes them from both, because there is only one list.
>>> List5 = List4
>>> List5
[1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>>> List4
[1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>>> id(List5)
140115516658320
>>> List6 = List5
>>> id(List6)
140115516658320
# A list created by slicing, however, is a new object at a different address: 140115516658320 != 140115516604784
>>> List6 = List4[:]
>>> id(List6)
140115516604784
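The slice copy is a shallow copy: a new outer list whose items are the same objects as before. A small sketch with a hypothetical nested list shows the difference between an alias, a slice copy, and shared inner objects:

```python
list4 = [1, 6, 7, [8, 9]]  # hypothetical list with one nested list

alias = list4        # same object, no copy
shallow = list4[:]   # new outer list

print(alias is list4)     # True
print(shallow is list4)   # False
print(shallow == list4)   # True: equal contents

# The copy is shallow: the nested list is shared by both outer lists.
shallow[3].append(10)
print(list4)              # [1, 6, 7, [8, 9, 10]]

# Replacing a top-level slot in the copy does not affect the original.
shallow[0] = 100
print(list4[0])           # 1
```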
A further example:
>>> listOfRows = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
>>> li = listOfRows
>>> id(listOfRows)
206368904L
>>> id(li)   # the two ids are the same: both names refer to one object
206368904L
>>> listOfRows[:] = [[row[0], row[3], row[2]] for row in listOfRows]
>>> listOfRows
[[1, 4, 3], [5, 8, 7], [9, 12, 11]]
>>> li   # slice assignment gives the expected effect: the shared object changed too
[[1, 4, 3], [5, 8, 7], [9, 12, 11]]
>>> id(listOfRows)
206425904L
>>> id(li)   # both names still report the same id, so the slice assignment modified the original object
206425904L
>>> listOfRows = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
>>> li
[[1, 4, 3], [5, 8, 7], [9, 12, 11]]
>>> id(li)   # li has not changed
206425904L
>>> id(listOfRows)   # a different id: listOfRows is now bound to a new object
206412488L
>>> listOfRows
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
Writing "listOfRows = ..." binds the name to a new object, whereas "listOfRows[:] = ..." replaces the contents of the existing object. In short, slice assignment modifies the contents of the original object instead of creating a new one.
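Slice assignment can also change the list's length, but an extended (stepped) slice requires the right-hand side to have exactly as many items as the slice selects. A short sketch with hypothetical data:

```python
rows = [0, 1, 2, 3, 4, 5]
same = rows

# A plain slice assignment may grow or shrink the list in place.
rows[1:3] = ["a", "b", "c", "d"]
print(rows)          # [0, 'a', 'b', 'c', 'd', 3, 4, 5]
print(same is rows)  # True: still the same object

# An extended (stepped) slice must receive exactly as many items as it selects.
rows[::2] = [None] * 4
print(rows)          # [None, 'a', None, 'c', None, 3, None, 5]
try:
    rows[::2] = [None]   # 1 item for a 4-item slice
except ValueError as exc:
    print("ValueError:", exc)
```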
A sequence is a Python data structure whose elements are retrieved by index.
Python 2 has six built-in sequence types: list, tuple, str, unicode, buffer, and xrange. xrange is a special case: it produces its numbers lazily (it behaves much like a generator, although it is really a lazy sequence object), so some operations that work on other sequences, such as slicing, + and *, do not apply to it. In general, sequence types support indexing, len(), max(), min(), in, +, *, and slicing. (In Python 3, xrange became range, which does support slicing.)
A slice that uses the third element is called a stepped slice. Its syntax is sequence[start:stop:step]. A handy mnemonic is "keep the head, drop the tail": the start index is included and the stop index is excluded. If the start index is 0, it can be omitted.
Whenever the slice syntax is used, Python creates a slice object. The extended slice syntax supports several kinds of indexing: stepped slices, multi-dimensional slices, and ellipsis slices. A multi-dimensional slice is written sequence[start1:end1, start2:end2], and an ellipsis slice sequence[..., start1:end1]. Built-in lists do not accept these two forms; they are meant for containers such as NumPy arrays. A slice object can also be created directly with the built-in slice() function.
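To see what key the extended syntax produces, a class can simply return whatever __getitem__ receives. KeyEcho below is a hypothetical helper; the point is that a comma-separated index arrives as a tuple of slice objects (with Ellipsis standing for ...), and it is up to the container, NumPy for example, to interpret it:

```python
class KeyEcho:
    """Hypothetical class whose __getitem__ returns the key it receives."""

    def __getitem__(self, key):
        return key


k = KeyEcho()
print(k[1:3, ::2])     # (slice(1, 3, None), slice(None, None, 2))
print(k[..., 1:3])     # (Ellipsis, slice(1, 3, None))
print(k[slice(1, 3)])  # slice(1, 3, None): slice() builds the same object as 1:3

# Built-in lists accept only integers or a single slice, so a tuple key fails.
nums = [0, 1, 2]
try:
    nums[0:1, 0:1]
except TypeError as exc:
    print("TypeError:", exc)
```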
Selection of two-dimensional arrays:
Multi-dimensional array slicing uses the syntax sequence[start1:end1, start2:end2, ..., startN:endN]. We'll use a 3x3 two-dimensional array to demonstrate selection:
>>> import numpy as np
>>> b = np.arange(9).reshape(3, 3)
>>> b
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
Array indices start at 0. For a two-dimensional array b, b[m, n] selects a single element. The (m, n) position of each element is:
[(0, 0), (0, 1), (0, 2)]
[(1, 0), (1, 1), (1, 2)]
[(2, 0), (2, 1), (2, 2)]
A two-dimensional slice is written sequence[start1:end1, start2:end2]:
>>> b[1:, :2]
array([[3, 4],
       [6, 7]])
The part before the comma, 1:, selects rows 1 onward, i.e. the positions [(1, 0), (1, 1), (1, 2)] and [(2, 0), (2, 1), (2, 2)]. The part after the comma, :2, then keeps only columns 0 and 1 of those rows (the stop index 2 is excluded), which gives [[3, 4], [6, 7]].
Once stepped slices make sense, two- and three-dimensional slices are just as easy to understand; they are no more complicated than stepping.
You can also assign values to a slice:
>>> b[1:, :2] = 1   # broadcast assignment
>>> b
array([[0, 1, 2],
       [1, 1, 5],
       [1, 1, 8]])
>>> b[1:, :2].shape
(2L, 2L)
>>> b[1:, :2] = np.arange(2, 6).reshape(2, 2)   # element-by-element assignment
>>> b
array([[0, 1, 2],
       [2, 3, 5],
       [4, 5, 8]])
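Assignment through a slice works because a basic NumPy slice is a view onto the original array, not a copy. A sketch, assuming NumPy 1.11 or later for np.shares_memory:

```python
import numpy as np

b = np.arange(9).reshape(3, 3)

# Basic slicing returns a view that shares memory with b.
view = b[1:, :2]
print(np.shares_memory(view, b))  # True
view[:] = 1                       # writing through the view changes b
print(b)
# [[0 1 2]
#  [1 1 5]
#  [1 1 8]]

# .copy() gives an independent array; changing it leaves b alone.
independent = b[1:, :2].copy()
independent[:] = 99
print(b[1, 0])                    # 1
```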
Similarly, a three-dimensional array is sliced with sequence[start1:end1, start2:end2, start3:end3], and a single value is selected with b[l, m, n].
Indices omitted at the end are treated as :, meaning all elements along those dimensions:
>>> b = np.arange(24).reshape(2, 3, 4)
>>> b[1,]
array([[12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])
>>> b[1, 2]
array([20, 21, 22, 23])
>>> b[1, 2, 3]
23
>>> b[1, :, 3]
array([15, 19, 23])
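A quick check, assuming the same 2x3x4 array, that omitted trailing indices, explicit : and the ellipsis ... all select the same data:

```python
import numpy as np

b = np.arange(24).reshape(2, 3, 4)

# Omitted trailing indices behave like ":" for every remaining axis.
print(np.array_equal(b[1], b[1, :, :]))        # True
# "..." stands for as many ":" as needed to fill the remaining axes.
print(np.array_equal(b[1], b[1, ...]))         # True
print(np.array_equal(b[..., 3], b[:, :, 3]))   # True
print(b[..., 3])
# [[ 3  7 11]
#  [15 19 23]]
```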
A pandas DataFrame can be sliced the same way with iloc, treating the df as a multi-dimensional array:
>>> import pandas as pd
>>> b = np.arange(9).reshape(3, 3)
>>> df = pd.DataFrame(b)
>>> df.iloc[1, 2]
5
>>> df.iloc[1:, 2]
1    5
2    8
Name: 2, dtype: int32
>>> df.iloc[1:, :2]
   0  1
1  3  4
2  6  7
>>> df.iloc[1:, :2] = 1   # the same broadcast assignment
>>> df
   0  1  2
0  0  1  2
1  1  1  5
2  1  1  8
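A sketch comparing iloc with plain NumPy slicing on the same data (to_numpy() assumes pandas 0.24 or later):

```python
import numpy as np
import pandas as pd

b = np.arange(9).reshape(3, 3)
df = pd.DataFrame(b)

# iloc uses the same positional, end-exclusive rules as NumPy slicing.
print(np.array_equal(df.iloc[1:, :2].to_numpy(), b[1:, :2]))  # True
print(df.iloc[1, 2] == b[1, 2])                                # True

# Broadcast assignment through iloc, mirroring b[1:, :2] = 1.
df.iloc[1:, :2] = 1
print(df.to_numpy().tolist())  # [[0, 1, 2], [1, 1, 5], [1, 1, 8]]
```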
(Don't mind how I've sliced the df here.)
Next, loc. loc selects by the labels in index and columns, and it is the recommended way to assign values in a DataFrame.
When index and columns are integers starting from 0, compare the two:
>>> b = np.arange(36).reshape(6, 6)
>>> b
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])
>>> df = pd.DataFrame(b)
>>> df
    0   1   2   3   4   5
0   0   1   2   3   4   5
1   6   7   8   9  10  11
2  12  13  14  15  16  17
3  18  19  20  21  22  23
4  24  25  26  27  28  29
5  30  31  32  33  34  35
>>> df.loc[1:, :2]
    0   1   2
1   6   7   8
2  12  13  14
3  18  19  20
4  24  25  26
5  30  31  32
>>> df.iloc[1:, :2]
    0   1
1   6   7
2  12  13
3  18  19
4  24  25
5  30  31
>>> df.iloc[1, 2]
8
>>> df.loc[1, 2]
8
We can see that df.loc[1:, :2] includes column 2. Unlike range() or iloc, a loc slice includes its end label. It is not a numeric comparison such as "all columns whose label is <= 2" either: loc finds where the label 2 sits in the columns and returns every column from the start up to and including that position, in stored order. Reordering the column labels makes this visible:
>>> df.columns = [2, 1, 3, 4, 0, 5]
>>> df
    2   1   3   4   0   5
0   0   1   2   3   4   5
1   6   7   8   9  10  11
2  12  13  14  15  16  17
3  18  19  20  21  22  23
4  24  25  26  27  28  29
5  30  31  32  33  34  35
>>> df.loc[1, 2]
6
>>> df.iloc[1, 2]
8
>>> df.iloc[1:, :2]
    2   1
1   6   7
2  12  13
3  18  19
4  24  25
5  30  31
>>> df.loc[1:, :2]
    2
1   6
2  12
3  18
4  24
5  30
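The rule can be checked by listing the columns each slice returns. A minimal sketch that rebuilds the same 6x6 frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(36).reshape(6, 6))

# loc slices by label and includes the end label; iloc slices by position and excludes it.
print(list(df.loc[1:, :2].columns))   # [0, 1, 2]
print(list(df.iloc[1:, :2].columns))  # [0, 1]

# With reordered labels, loc stops where label 2 sits, not at "labels <= 2".
df.columns = [2, 1, 3, 4, 0, 5]
print(list(df.loc[:, :2].columns))    # [2]
print(list(df.loc[:, 1:4].columns))   # [1, 3, 4]
```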
One advantage of loc is that you can list column labels in any order, which rearranges the columns:
>>> df.loc[:, [1, 2, 3, 4]]
    1   2   3   4
0   1   0   2   3
1   7   6   8   9
2  13  12  14  15
3  19  18  20  21
4  25  24  26  27
5  31  30  32  33
>>> df.iloc[:, [1, 2, 3, 4]]
    1   3   4   0
0   1   2   3   4
1   7   8   9  10
2  13  14  15  16
3  19  20  21  22
4  25  26  27  28
5  31  32  33  34
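A sketch of the same comparison, plus the case of letter labels, using lists of labels or positions (lists are the usual way to pass several keys to loc and iloc):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(36).reshape(6, 6), columns=[2, 1, 3, 4, 0, 5])

# loc picks columns by label, in the order they are listed.
print(list(df.loc[:, [1, 2, 3, 4]].columns))   # [1, 2, 3, 4]
# iloc picks columns by position; positions 1-4 hold labels 1, 3, 4, 0.
print(list(df.iloc[:, [1, 2, 3, 4]].columns))  # [1, 3, 4, 0]

# With letter labels, loc needs labels and iloc needs positions.
df.columns = list("abcdef")
print(list(df.loc[:, ["c", "a"]].columns))     # ['c', 'a']
print(list(df.iloc[:, [2, 0]].columns))        # ['c', 'a']
```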
iloc cannot do this by label, because it only understands positions. Conversely, once the column names are letters, loc can no longer be used with integer positions.
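ix solves the problem of mixed (label and position) selection. Note that .ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the ix examples here need an older pandas version.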
>>> df.ix[:, (1, 2, 3, 4)]
    1   2   3   4
0   1   0   2   3
1   7   6   8   9
2  13  12  14  15
3  19  18  20  21
4  25  24  26  27
5  31  30  32  33
>>> df.ix[:, :2]
    2
0   0
1   6
2  12
3  18
4  24
5  30
Put simply, when the index and column labels are integers, ix behaves like loc. When they are not integers (letters, for example), ix treats integer keys as positions, like iloc. It makes this decision separately for each axis of df.ix[row, column], as the next example shows.
>>> df.loc[:2, :2]
    2
0   0
1   6
2  12
>>> df.iloc[:2, :2]
   2  1
0  0  1
1  6  7
>>> df.ix[:2, :2]
    2
0   0
1   6
2  12
>>> df.index = ['a', 'c', 'd', 'b', 'e', 'f']
>>> df.ix[:2, :2]
   2
a  0
c  6
>>> df.iloc[:2, :2]
   2  1
a  0  1
c  6  7
>>> df.loc[:2, :2]   # loc raises an error here, because the row index no longer has integer labels
Traceback (most recent call last):
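Since .ix is not available in current pandas, here is a sketch of one way to express the same mixed selection today, assuming the df built above: convert positions to labels with df.index[...] (or labels to positions with df.columns.get_loc) and then make a single loc or iloc call.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(36).reshape(6, 6), columns=[2, 1, 3, 4, 0, 5])
df.index = ["a", "c", "d", "b", "e", "f"]

# Rows by position (the first two), columns by label (up to and including label 2).
mixed = df.loc[df.index[:2], :2]
print(mixed)

# The same selection expressed purely by position.
col_end = df.columns.get_loc(2) + 1   # label 2 sits at position 0
print(df.iloc[:2, :col_end].equals(mixed))  # True
```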
With all of this covered, these kinds of slicing should now be clear.