Array, list, DataFrame: Indexing and Slicing Operations, July 19, 2016--smart wave document
A brief discussion of lists, one- and two-dimensional arrays, DataFrames, and loc, iloc, and ix.
Indexing and slicing NumPy arrays:
Let's begin with the most basic case, indexing a list, by looking at some code and its result:
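a = [0,1,2,3,4,5,6,7,8,9]
a[:5:-1]   # step < 0, so start defaults to the last index, 9
a[0:5:-1]  # start = 0 is given explicitly; walking backwards from 0 never reaches 5
a[1::-1]   # step < 0, so stop defaults to just before the first element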
Output:
[9, 8, 7, 6]
[]
[1, 0]
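A list slice is written inside "[]" with up to two ":" delimiters, meaning [start:stop:step]. In the example above the step is -1, so the elements come out in reverse order. An omitted start or stop defaults to the corresponding end of the list: with a positive step, start defaults to 0 and stop to the length of the list; with a negative step, start defaults to the last element and stop to just before the first. Step defaults to 1 and cannot be 0.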
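a[10:20]  # elements 11 through 20 (empty if the list has only 10 elements)
a[:10:2]  # the first 10 elements, taking every second one
a[::5]    # all elements, taking every fifth one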
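To check which bounds Python fills in when start or stop is omitted, the standard slice.indices() method can be asked directly. This is a minimal sketch that assumes the same ten-element list a; the helper name chosen_indexes is made up for illustration.

```python
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

def chosen_indexes(s, length):
    """Return the indexes that slice s selects from a sequence of the given length."""
    start, stop, step = s.indices(length)  # fills in omitted bounds, clamps the others
    return list(range(start, stop, step))

# a[:5:-1]: start is omitted and step < 0, so start becomes the last index (9)
print(slice(None, 5, -1).indices(len(a)))           # (9, 5, -1)
print(chosen_indexes(slice(None, 5, -1), len(a)))   # [9, 8, 7, 6]

# a[0:5:-1]: walking backwards from 0 never reaches 5, so nothing is selected
print(chosen_indexes(slice(0, 5, -1), len(a)))      # []

# a[1::-1]: stop is omitted and step < 0, so the slice runs through index 0
print(chosen_indexes(slice(1, None, -1), len(a)))   # [1, 0]
```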
Advanced operations in Python slices:
How slicing works:
Slicing a list internally calls __getitem__, __setitem__, or __delitem__ with a slice object as the key. The slice() function takes the same start, stop, and step arguments as range().
The key passed to these methods is a special slice object whose attributes (start, stop, step) describe the requested slice. A demonstration:
>>> List4 = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
>>> List4
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>>> x = List4[1:10]  # x = List4.__getitem__(slice(1, 10, None))
>>> x
[2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> List4[1:5] = [100,111,122]  # List4.__setitem__(slice(1, 5, None), [100, 111, 122])
>>> List4
[1, 100, 111, 122, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>>> del List4[1:4]  # List4.__delitem__(slice(1, 4, None))
>>> List4
[1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>>>
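The correspondence between bracket syntax and the special methods can be checked by calling the methods explicitly. This is only an illustrative sketch (everyday code should use the brackets), and the variable names are arbitrary.

```python
nums = list(range(1, 21))  # [1, 2, ..., 20]

# Reading: nums[1:10] is nums.__getitem__(slice(1, 10, None))
assert nums[1:10] == nums.__getitem__(slice(1, 10, None))

# Writing: the slice covers 4 elements but only 3 are assigned, so the list shrinks by one
with_brackets = list(nums)
with_brackets[1:5] = [100, 111, 122]
with_method = list(nums)
with_method.__setitem__(slice(1, 5, None), [100, 111, 122])
assert with_brackets == with_method
print(with_brackets[:6])  # [1, 100, 111, 122, 6, 7]

# Deleting: del x[1:4] is x.__delitem__(slice(1, 4, None))
del with_brackets[1:4]
with_method.__delitem__(slice(1, 4, None))
assert with_brackets == with_method
print(with_brackets[:4])  # [1, 6, 7, 8]
```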
Boundary problem for slices:
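s = [1,2,3,4]  # slice positions for s run from 0 (lowest) to 4 (highest)
s[-100:100]    # returns [1,2,3,4]: -100 is below the lowest position and 100 is above the highest; equivalent to s[0:4]
s[-100:-200]   # returns []: -100 and -200 are both below the lowest position and are clamped to it; equivalent to s[0:0]
s[100:200]     # returns []: 100 and 200 are both above the highest position and are clamped to it; equivalent to s[4:4]
s[:100]        # returns [1,2,3,4]: an omitted start means starting from element 0
s[0:]          # returns [1,2,3,4]: an omitted stop means running through the last element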
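This clamping applies only to slices. A short check, assuming the same four-element list s, contrasts it with plain indexing, where an out-of-range position raises IndexError:

```python
s = [1, 2, 3, 4]

# Out-of-range slice bounds are clamped to the ends of the list
assert s[-100:100] == s[0:4] == [1, 2, 3, 4]
assert s[-100:-200] == s[0:0] == []
assert s[100:200] == s[4:4] == []

# A single out-of-range index is not clamped; it raises IndexError
try:
    s[100]
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)
```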
Further notes on slicing:
>>> id(List4)
140115516658320
# Plain assignment, List5 = List4, binds a second name to the same object; the address
# is unchanged (140115516658320), so removing elements through either List4 or List5
# changes the list seen through both names.
>>> List5 = List4
>>> List5
[1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>>> List4
[1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>>> id(List5)
140115516658320
# A list created by slicing, however, lives at a different address:
# 140115516658320 != 140115516604784
>>> List6 = List5
>>> id(List6)
140115516658320
>>> List6 = List4[:]
>>> id(List6)
140115516604784
>>> # the address has changed
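One caveat worth adding here: a slice copy is shallow. The sketch below (with made-up variable names) shows that [:] creates a new outer list, while nested objects are still shared between the original and the copy.

```python
original = [[1, 2], [3, 4]]

alias = original        # second name for the same list object
copied = original[:]    # new outer list holding the same inner lists

assert alias is original
assert copied is not original and copied == original

# Changes to the outer list of the copy do not affect the original ...
copied.append([5, 6])
assert len(original) == 2

# ... but the inner lists are shared, so mutating one is visible through both
copied[0].append(99)
print(original)  # [[1, 2, 99], [3, 4]]
```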
One more addition to the above:
>>> listofrows = [[1,2,3,4], [5,6,7,8], [9,10,11,12]]
>>> li = listofrows
>>> id(listofrows)
206368904L
>>> id(li)  # the ids match: both names reference the same object
206368904L
>>> listofrows[:] = [[row[0], row[3], row[2]] for row in listofrows]
>>> listofrows
[[1, 4, 3], [5, 8, 7], [9, 12, 11]]
>>> li  # slice assignment has the desired effect: the same object, with the changes
[[1, 4, 3], [5, 8, 7], [9, 12, 11]]
>>> id(listofrows)
206368904L
>>> id(li)  # neither id has changed: slice assignment modifies the original object
206368904L
>>> listofrows = [[1,2,3,4], [5,6,7,8], [9,10,11,12]]
>>> li
[[1, 4, 3], [5, 8, 7], [9, 12, 11]]
>>> id(li)  # li is unchanged
206368904L
>>> id(listofrows)  # the ids now differ: listofrows is bound to a new object
206412488L
>>> listofrows
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
Using "listofrows =" directly binds the name to a new object, whereas the "listofrows[:] =" notation does not. Simply put, slice assignment modifies the contents of the original object instead of creating a new object.
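The difference matters most when a list is passed to a function. Here is a minimal sketch with two hypothetical helpers: one rebinds its local name and leaves the caller's list alone, the other uses slice assignment and changes the caller's list in place.

```python
def keep_columns_rebind(rows):
    # Rebinding: 'rows' now names a new list; the caller's list is untouched
    rows = [[row[0], row[3], row[2]] for row in rows]
    return rows

def keep_columns_in_place(rows):
    # Slice assignment: the contents of the caller's list are replaced
    rows[:] = [[row[0], row[3], row[2]] for row in rows]

table = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
table_id = id(table)

keep_columns_rebind(table)
print(table)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

keep_columns_in_place(table)
print(table)  # [[1, 4, 3], [5, 8, 7], [9, 12, 11]]
assert id(table) == table_id  # still the same object
```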
A sequence is a data structure in Python whose elements are retrieved by index.
Python 2 has six built-in sequence types: list, tuple, str, unicode, buffer, and xrange. xrange is a special case: it produces its values lazily, and several operations that the other types support (such as slicing, + and *) do not apply to it. In general, sequence types support indexing, len, max, min, in, +, *, and slicing.
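A quick sketch of these shared operations on three sequence types. It is written for Python 3, where the unicode, buffer, and xrange types no longer exist under those names; the variable names are arbitrary.

```python
samples = [[1, 2, 3], (1, 2, 3), "abc"]

for seq in samples:
    first = seq[0]            # indexing
    size = len(seq)           # len
    largest = max(seq)        # max (min works the same way)
    found = first in seq      # membership test with 'in'
    doubled = seq + seq       # concatenation with +
    repeated = seq * 2        # repetition with *
    tail = seq[1:]            # slicing returns the same type as the original
    assert doubled == repeated and type(tail) is type(seq)
    print(type(seq).__name__, first, size, largest, found, tail)
```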
A slice with a third element is called a stepped slice; its syntax is sequence[start index:end index:step value]. The rule of thumb is "include the head, exclude the tail": the element at the start index is included and the element at the end index is not. If the start index is 0, you can omit it.
When Python evaluates slice syntax, it creates a slice object. Extended slice syntax supports several kinds of indexing: stepped slices, multidimensional slices, and slices with an ellipsis. The syntax for a multidimensional slice is sequence[start1:end1, start2:end2], or, with the ellipsis, sequence[..., start1:end1]. Built-in sequences such as lists do not accept multidimensional keys; types such as NumPy arrays do. A slice object can also be created with the built-in function slice().
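What the interpreter actually passes for each of these forms can be inspected with a small probe class. KeyProbe is a hypothetical helper written only for this illustration; its __getitem__ simply returns the key it receives.

```python
class KeyProbe:
    """Hypothetical helper: indexing it returns the key object Python built."""
    def __getitem__(self, key):
        return key

probe = KeyProbe()

# A stepped slice arrives as one slice object with start, stop and step attributes
key = probe[1:10:2]
print(key, key.start, key.stop, key.step)  # slice(1, 10, 2) 1 10 2

# The same object can be built explicitly with the built-in slice()
assert key == slice(1, 10, 2)

# A multidimensional slice arrives as a tuple of slice objects
assert probe[0:2, 1:3] == (slice(0, 2, None), slice(1, 3, None))

# The ellipsis arrives as the built-in Ellipsis object
assert probe[..., 1:3] == (Ellipsis, slice(1, 3, None))

# Built-in lists accept only a single integer or slice, not a tuple key
try:
    [1, 2, 3][0:2, 0:1]
except TypeError:
    print("lists do not support multidimensional keys")
```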
Selection of two-dimensional arrays:
As mentioned above, the syntax for slicing a multidimensional array is sequence[start1:end1, start2:end2, ..., startn:endn]. We use a 3x3 two-dimensional array to illustrate selection:
>>> import numpy as np
>>> b = np.arange(9).reshape(3,3)
>>> b
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
Array subscripts start from 0. For an array a, a[m,n] selects a single element. The positions correspond as follows:
[(0,0),(0,1),(0,2)]
[(1,0),(1,1),(1,2)]
[(2,0),(2,1),(2,2)]
For slicing in two dimensions, the syntax is sequence[start1:end1, start2:end2]:
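>>> b[1:,:2]
# The part before the comma selects the rows from 1 onward, that is,
# [(1,0),(1,1),(1,2)] and [(2,0),(2,1),(2,2)].
# The part after the comma is then applied to that result along the second
# dimension, keeping the columns up to (but not including) 2. The output is:
array([[3, 4],
       [6, 7]])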
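The two-step reading can be verified directly. This is a small sketch, assuming NumPy is installed and using the same 3x3 array b: selecting rows first and columns second gives the same result as the combined slice.

```python
import numpy as np

b = np.arange(9).reshape(3, 3)

rows = b[1:]            # first dimension: rows 1 and 2
block = rows[:, :2]     # second dimension: columns 0 and 1 of those rows

assert np.array_equal(block, b[1:, :2])
print(block.tolist())   # [[3, 4], [6, 7]]

# With a list of lists, the same selection needs an explicit loop over the rows
nested = b.tolist()
assert [row[:2] for row in nested[1:]] == block.tolist()
```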
Once stepped slices are understood, two- and three-dimensional slices are just as easy to follow, and no more complicated than stepping.
You can also assign to the elements of a slice:
>>> b[1:,:2] = 1  # broadcast assignment
>>> b
array([[0, 1, 2],
       [1, 1, 5],
       [1, 1, 8]])
>>> b[1:,:2].shape
(2L, 2L)
>>> b[1:,:2] = np.arange(2,6).reshape(2,2)  # element-by-element assignment
>>> b
array([[0, 1, 2],
       [2, 3, 5],
       [4, 5, 8]])
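Broadcast assignment also accepts a one-dimensional array whose length matches the last dimension of the target, while a shape that cannot be broadcast raises ValueError. A brief sketch, assuming NumPy and a fresh copy of the 3x3 array:

```python
import numpy as np

b = np.arange(9).reshape(3, 3)

b[1:, :2] = 1                    # a scalar is broadcast to every selected element
assert b.tolist() == [[0, 1, 2], [1, 1, 5], [1, 1, 8]]

b[1:, :2] = np.array([10, 20])   # shape (2,) is broadcast across both selected rows
assert b.tolist() == [[0, 1, 2], [10, 20, 5], [10, 20, 8]]

b[1:, :2] = np.arange(2, 6).reshape(2, 2)  # matching shape: element by element
assert b.tolist() == [[0, 1, 2], [2, 3, 5], [4, 5, 8]]

try:
    b[1:, :2] = np.arange(3)     # shape (3,) cannot be broadcast to (2, 2)
except ValueError:
    print("shape mismatch: assignment rejected")
```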
Three dimensions work the same way: sequence[start1:end1, start2:end2, start3:end3]. A single value is selected with a[l,m,n].
A bare ":" in the nth position takes all elements along the nth dimension.
>>> b = np.arange(24).reshape(2,3,4)
>>> b[1,]
array([[12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])
>>> b[1,2]
array([20, 21, 22, 23])
>>> b[1,2,3]
23
>>> b[1,:,3]
array([15, 19, 23])
>>>
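A few equivalences make the omitted forms concrete. This sketch assumes NumPy and the same 2x3x4 array; it checks that left-out trailing dimensions and the ellipsis both expand to full ":" slices.

```python
import numpy as np

b = np.arange(24).reshape(2, 3, 4)

# Trailing dimensions that are left out are taken in full
assert np.array_equal(b[1], b[1, :, :])
assert np.array_equal(b[1, 2], b[1, 2, :])

# An ellipsis stands for as many full ':' slices as are needed
assert np.array_equal(b[..., 3], b[:, :, 3])
assert np.array_equal(b[1, ...], b[1, :, :])

# Each integer index removes one dimension; each slice keeps it
print(b[1, :, 3].shape)     # (3,)
print(b[1:, :, 3].shape)    # (1, 3)
print(b[1, :, 3].tolist())  # [15, 19, 23]
```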
For a pandas DataFrame, we can use iloc to slice a df just like a multidimensional array:
>>> import pandas as pd
>>> b = np.arange(9).reshape(3,3)
>>> df = pd.DataFrame(b)
>>> df.iloc[1,2]
5
>>> df.iloc[1:,2]
1    5
2    8
Name: 2, dtype: int32
>>> df.iloc[1:,:2]
   0  1
1  3  4
2  6  7
>>> df.iloc[1:,:2] = 1  # the same broadcast assignment
>>> df
   0  1  2
0  0  1  2
1  1  1  5
2  1  1  8
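The parallel between iloc and array indexing can be checked side by side. This is a minimal sketch, assuming NumPy and pandas are installed; the frame is built from a copy of the array so that the two can be compared independently.

```python
import numpy as np
import pandas as pd

b = np.arange(9).reshape(3, 3)
df = pd.DataFrame(b.copy())  # copy so that changing df cannot affect b

# iloc takes the same positional keys as the array
assert df.iloc[1, 2] == b[1, 2]
assert df.iloc[1:, 2].tolist() == b[1:, 2].tolist()
assert df.iloc[1:, :2].values.tolist() == b[1:, :2].tolist()

# Broadcast assignment through iloc
df.iloc[1:, :2] = 1
print(df.values.tolist())  # [[0, 1, 2], [1, 1, 5], [1, 1, 8]]
```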
(Mom doesn't have to worry about my df slice)
Next, loc. loc selects by the labels in index and columns, and it is the recommended way to assign into a df.
When index and columns are both numeric and start from 0, we can compare the two:
>>> b = np.arange(36).reshape(6,6)
>>> b
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])
>>> df = pd.DataFrame(b)
>>> df
    0   1   2   3   4   5
0   0   1   2   3   4   5
1   6   7   8   9  10  11
2  12  13  14  15  16  17
3  18  19  20  21  22  23
4  24  25  26  27  28  29
5  30  31  32  33  34  35
>>> df.loc[1:,:2]
    0   1   2
1   6   7   8
2  12  13  14
3  18  19  20
4  24  25  26
5  30  31  32
>>> df.iloc[1:,:2]
    0   1
1   6   7
2  12  13
3  18  19
4  24  25
5  30  31
You can see that df.loc[1:,:2] also selected column 2: unlike range(0,2), a loc slice includes its end label. loc does not count positions. It walks the column labels in order and stops right after it reaches the label 2.
>>> df.columns = [2,1,3,4,0,5]
>>> df
    2   1   3   4   0   5
0   0   1   2   3   4   5
1   6   7   8   9  10  11
2  12  13  14  15  16  17
3  18  19  20  21  22  23
4  24  25  26  27  28  29
5  30  31  32  33  34  35
>>> df.loc[1,2]
6
>>> df.iloc[1,2]
8
>>> df.iloc[1:,:2]
    2   1
1   6   7
2  12  13
3  18  19
4  24  25
5  30  31
>>> df.loc[1:,:2]
    2
1   6
2  12
3  18
4  24
5  30
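The label-versus-position distinction can be condensed into a few assertions. This sketch assumes NumPy and pandas and rebuilds the same 6x6 frame, first with the default column labels and then with the relabeled columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(36).reshape(6, 6))

# Default labels 0..5: loc includes the end label, iloc excludes the end position
assert df.loc[:, :2].columns.tolist() == [0, 1, 2]
assert df.iloc[:, :2].columns.tolist() == [0, 1]

# After relabeling, loc follows the labels and iloc follows the positions
df.columns = [2, 1, 3, 4, 0, 5]
assert df.loc[:, :2].columns.tolist() == [2]       # stops at the label 2, the first column
assert df.iloc[:, :2].columns.tolist() == [2, 1]   # the first two positions
assert df.loc[1, 2] == 6 and df.iloc[1, 2] == 8
```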
One advantage of loc is that you can select columns by label in any order you like:
>>> df.loc[:,(1,2,3,4)]
    1   2   3   4
0   1   0   2   3
1   7   6   8   9
2  13  12  14  15
3  19  18  20  21
4  25  24  26  27
5  31  30  32  33
>>> df.iloc[:,(1,2,3,4)]
    1   3   4   0
0   1   2   3   4
1   7   8   9  10
2  13  14  15  16
3  19  20  21  22
4  25  26  27  28
5  31  32  33  34
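The same comparison as a runnable sketch, assuming NumPy and pandas. It uses lists rather than tuples as the column selectors, on the assumption that a list is the safer spelling for selecting several labels in current pandas versions; the small letters frame is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(36).reshape(6, 6), columns=[2, 1, 3, 4, 0, 5])

by_label = df.loc[:, [1, 2, 3, 4]]      # columns looked up by label, in the order given
by_position = df.iloc[:, [1, 2, 3, 4]]  # columns looked up by position

print(by_label.columns.tolist())        # [1, 2, 3, 4]
print(by_position.columns.tolist())     # [1, 3, 4, 0]
print(by_label.iloc[0].tolist())        # [1, 0, 2, 3]

# With letter labels, loc can put the columns in any order
letters = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["a", "b", "c"])
print(letters.loc[:, ["c", "a"]].values.tolist())  # [[2, 0], [5, 3]]
```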
Quite magical. This is not easy to do with iloc, and when the column names are letters, loc lets you arrange them however you like.
ix handles mixed selection:
>>> df.ix[:,(1,2,3,4)]
    1   2   3   4
0   1   0   2   3
1   7   6   8   9
2  13  12  14  15
3  19  18  20  21
4  25  24  26  27
5  31  30  32  33
>>> df.ix[:,:2]
    2
0   0
1   6
2  12
3  18
4  24
5  30
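Put simply: when the row and column labels are numbers, ix behaves like loc. ix decides automatically how to read the values in []: on an axis whose labels are all letters, integers are treated as positions, as in iloc. The "row, column" order of the arguments does not change.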
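# Note: the first three outputs imply that the row index had been relabeled
# (its first two labels being 2 and 1) in a step that is not shown here.
>>> df.loc[:2,:2]
   2
2  0
>>> df.iloc[:2,:2]
   2  1
2  0  1
1  6  7
>>> df.ix[:2,:2]
   2
2  0
>>> df.index = ['a','c','d','b','e','f']
>>> df.ix[:2,:2]
   2
a  0
c  6
>>> df.iloc[:2,:2]
   2  1
a  0  1
c  6  7
>>> df.loc[:2,:2]  # loc raises an error here, because the row index no longer contains numeric labels
Traceback (most recent call last):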
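ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the ix calls above fail on current versions. The sketch below shows one way the mixed selection can be spelled explicitly instead, chaining iloc for positions and loc for labels. It assumes NumPy and pandas, and rebuilds the frame with the letter index and the relabeled columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(36).reshape(6, 6),
    index=["a", "c", "d", "b", "e", "f"],
    columns=[2, 1, 3, 4, 0, 5],
)

# Rows by position (the first two rows), columns by label (up to and including label 2)
mixed = df.iloc[:2].loc[:, :2]
print(mixed.index.tolist(), mixed.columns.tolist())  # ['a', 'c'] [2]
print(mixed[2].tolist())                             # [0, 6]

# The same selection in one loc call, translating positions into labels first
same = df.loc[df.index[:2], :2]
assert same.equals(mixed)
```

Spelling out which axis is positional and which is label-based removes the guessing that ix used to do.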
There is still a lot more to learn about this.