Using NumPy arrays for data processing

Last Update: 2018-04-06 Source: Internet

Author: User

Tags python list


Expressing conditional logic as array operations

numpy.where() is a vectorized version of the ternary expression x if condition else y.

    xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
    yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
    condi = np.array([True, False, True, True, False])

Suppose we have the three arrays above and want to build a new array that takes the value from xarr wherever condi is True and the value from yarr otherwise. An ordinary list comprehension looks like this:

    result = [(x if c else y) for x, y, c in zip(xarr, yarr, condi)]

This approach has two drawbacks. It is slow on large amounts of data, because the loop runs element by element in the Python interpreter, and it cannot be applied to multidimensional arrays.

Using np.where is much simpler.

    result = np.where(condi, xarr, yarr)
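As a quick check of the claim above, the following sketch runs both versions on the same three arrays and then applies np.where to a small 2-D case. The 2-D arrays (x2d, y2d, c2d) are made up for illustration and are not part of the original example.

```python
import numpy as np

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
condi = np.array([True, False, True, True, False])

# Pure-Python version: one interpreter-level step per element.
result_list = [(x if c else y) for x, y, c in zip(xarr, yarr, condi)]

# Vectorized version: the selection happens inside NumPy.
result_where = np.where(condi, xarr, yarr)

print(result_where)
print(np.array_equal(result_where, np.array(result_list)))

# Illustrative 2-D case: np.where works elementwise on any shape.
x2d = np.zeros((2, 3))
y2d = np.ones((2, 3))
c2d = np.array([[True, False, True], [False, True, False]])
result_2d = np.where(c2d, x2d, y2d)
print(result_2d)
```

Both versions give the same values; only the np.where form extends unchanged to the 2-D input.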

The second and third arguments of numpy.where do not have to be arrays; they can also be scalars.

Suppose we want to generate a new array based on condi that holds 1 wherever condi is True and 0 otherwise.

    res = np.where(condi, 1, 0)
    res
    # array([1, 0, 1, 1, 0])

The same works on a multidimensional array. Here positive values are replaced with "+" and all other values with "-":

    arr = np.random.randn(4, 4)
    arr
    # array([[-0.33641281, -0.56924078,  0.25727917, -0.35087934],
    #        [-0.00734107, -0.47985579, -1.35289703, -1.31366566],
    #        [-0.71342875, -0.21957414, -1.25596815,  0.0859283 ],
    #        [-0.93246019, -0.61227975, -0.87573005,  1.4124276 ]])

    np.where(arr > 0, "+", "-")
    # array([['-', '-', '+', '-'],
    #        ['-', '-', '-', '-'],
    #        ['-', '-', '-', '+'],
    #        ['-', '-', '-', '+']], dtype='<U1')
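Scalars and arrays can also be mixed in the same call. The sketch below uses a small fixed array (chosen here for reproducibility, instead of random data) and replaces only the positive entries while leaving the rest untouched.

```python
import numpy as np

# Fixed values instead of np.random.randn so the result is reproducible.
arr = np.array([[-1.5, 0.5],
                [2.0, -0.25]])

# Positive entries become 2.0; all other entries keep their original value.
capped = np.where(arr > 0, 2.0, arr)

# Same condition, but with two scalar string choices.
signs = np.where(arr > 0, "+", "-")

print(capped)
print(signs)
```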

Nested where calls can also express multi-condition logic:

    # cond1 and cond2 are boolean arrays of the same shape
    np.where(cond1 & cond2, 0, np.where(cond1, 1, np.where(cond2, 2, 3)))

    # equivalent to:
    li = []
    for x, y in zip(cond1, cond2):
        if x and y:
            li.append(0)
        elif x:
            li.append(1)
        elif y:
            li.append(2)
        else:
            li.append(3)
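To make the equivalence concrete, here is a minimal sketch with two small boolean arrays that cover all four combinations. The values of cond1 and cond2 are assumptions chosen for illustration. np.select is shown as an alternative that is not used in the original text; it picks the first matching condition, so the order of the condition list matters.

```python
import numpy as np

# Illustrative inputs covering every True/False combination.
cond1 = np.array([True, True, False, False])
cond2 = np.array([True, False, True, False])

# Nested np.where, as in the text.
nested = np.where(cond1 & cond2, 0,
                  np.where(cond1, 1,
                           np.where(cond2, 2, 3)))

# Explicit Python loop with the same branching.
li = []
for x, y in zip(cond1, cond2):
    if x and y:
        li.append(0)
    elif x:
        li.append(1)
    elif y:
        li.append(2)
    else:
        li.append(3)

# Alternative: np.select evaluates conditions in order, first match wins.
selected = np.select([cond1 & cond2, cond1, cond2], [0, 1, 2], default=3)

print(nested, li, selected)
```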

Mathematical and statistical methods

sum, mean, and std can be called either as array methods or as top-level NumPy functions.

    arr = np.arange(15).reshape(3, 5)
    arr
    # array([[ 0,  1,  2,  3,  4],
    #        [ 5,  6,  7,  8,  9],
    #        [10, 11, 12, 13, 14]])

    # called as array methods
    arr.sum()
    # 105
    arr.mean()
    # 7.0

    # called as a top-level function
    np.mean(arr)
    # 7.0
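The session above shows sum and mean; the short sketch below adds std and checks that the method form and the function form agree on the same array.

```python
import numpy as np

arr = np.arange(15).reshape(3, 5)

# Each statistic computed twice: as a method and as a top-level function.
pairs = {
    "sum": (arr.sum(), np.sum(arr)),
    "mean": (arr.mean(), np.mean(arr)),
    "std": (arr.std(), np.std(arr)),
}

for name, (as_method, as_function) in pairs.items():
    print(name, as_method, as_function)
```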

Functions such as mean and sum accept an axis parameter and compute the statistic along that axis. The result is an array with one fewer dimension.

    arr = np.arange(60).reshape(3, 4, 5)
    arr
    # array([[[ 0,  1,  2,  3,  4],
    #         [ 5,  6,  7,  8,  9],
    #         [10, 11, 12, 13, 14],
    #         [15, 16, 17, 18, 19]],
    #
    #        [[20, 21, 22, 23, 24],
    #         [25, 26, 27, 28, 29],
    #         [30, 31, 32, 33, 34],
    #         [35, 36, 37, 38, 39]],
    #
    #        [[40, 41, 42, 43, 44],
    #         [45, 46, 47, 48, 49],
    #         [50, 51, 52, 53, 54],
    #         [55, 56, 57, 58, 59]]])

    # The axis value is an index into arr.shape; see the NumPy basics post if shape is unfamiliar.
    arr.sum(axis=1)
    # array([[ 30,  34,  38,  42,  46],
    #        [110, 114, 118, 122, 126],
    #        [190, 194, 198, 202, 206]])

sum(axis=1) adds up the elements along the specified dimension and collapses it, so here the shape goes from (3, 4, 5) to (3, 5).
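One way to see what axis does is to compare result shapes. The sketch below reuses the same (3, 4, 5) array and records the shape left after reducing over each axis; the tuple form axis=(1, 2) is an extra illustration not covered in the text.

```python
import numpy as np

arr = np.arange(60).reshape(3, 4, 5)

# Reducing over one axis removes that entry from the shape.
shape_axis0 = arr.sum(axis=0).shape   # axis of length 3 is removed
shape_axis1 = arr.sum(axis=1).shape   # axis of length 4 is removed
shape_axis2 = arr.sum(axis=2).shape   # axis of length 5 is removed

# A tuple of axes reduces over several dimensions at once.
mean_per_block = arr.mean(axis=(1, 2))

print(shape_axis0, shape_axis1, shape_axis2)
print(mean_per_block)
```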

Other methods, such as cumsum and cumprod, do not aggregate. Instead they produce an array of intermediate results:

    arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
    arr
    # array([[0, 1, 2],
    #        [3, 4, 5],
    #        [6, 7, 8]])

    arr.cumsum()
    # array([ 0,  1,  3,  6, 10, 15, 21, 28, 36], dtype=int32)

    arr.cumsum(0)
    # array([[ 0,  1,  2],
    #        [ 3,  5,  7],
    #        [ 9, 12, 15]], dtype=int32)

    arr.cumsum(1)
    # array([[ 0,  1,  3],
    #        [ 3,  7, 12],
    #        [ 6, 13, 21]], dtype=int32)

    arr.cumprod(1)
    # array([[  0,   0,   0],
    #        [  3,  12,  60],
    #        [  6,  42, 336]], dtype=int32)

They can also be used as top-level functions:

    np.cumsum(arr)
    # array([ 0,  1,  3,  6, 10, 15, 21, 28, 36], dtype=int32)

    np.cumsum(arr, axis=0)
    # array([[ 0,  1,  2],
    #        [ 3,  5,  7],
    #        [ 9, 12, 15]], dtype=int32)
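A common use of these intermediate results is a running statistic. The sketch below, with made-up input values, derives a running mean from cumsum and shows that the last cumulative value equals the plain aggregate.

```python
import numpy as np

# Illustrative data, not taken from the article.
values = np.array([2.0, 4.0, 6.0, 8.0])

running_total = np.cumsum(values)

# Divide each running total by the number of elements seen so far.
counts = np.arange(1, values.size + 1)
running_mean = running_total / counts

running_product = np.cumprod(np.array([1, 2, 3, 4]))

print(running_total)
print(running_mean)
print(running_product)
```

The final element of running_total is the same number values.sum() would return, which is one way to sanity-check a cumulative computation.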

Methods for Boolean arrays: sum, any, and all

    bools = np.array([True, False, True, True, False])

    bools.sum()
    # 3
    bools.any()
    # True
    bools.all()
    # False

    # as top-level functions
    np.all(bools)
    # False
    np.sum(bools)
    # 3
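In practice the Boolean array usually comes from a comparison. The following sketch, using a small made-up array, counts the positive values and applies any and all along an axis.

```python
import numpy as np

# Illustrative data, not taken from the article.
data = np.array([[-1.0, 2.5, 0.0],
                 [3.0, -0.5, 4.0]])

positive = data > 0                     # Boolean array, same shape as data

n_positive = positive.sum()             # True counts as 1, False as 0
per_row = positive.sum(axis=1)          # number of positives in each row
any_per_column = positive.any(axis=0)   # is at least one value positive?
all_per_column = positive.all(axis=0)   # are all values positive?

print(n_positive, per_row, any_per_column, all_per_column)
```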

Sorting

The sort method works much like the sort method of a Python list: it sorts the array in place.

    arr = np.random.randn(8)
    arr
    # array([-2.97429771,  0.37645009, -0.04291609, -0.61994895, -0.26251303,
    #        -1.1557209 , -0.19910847, -0.11393288])

    arr.sort()
    arr
    # array([-2.97429771, -1.1557209 , -0.61994895, -0.26251303, -0.19910847,
    #        -0.11393288, -0.04291609,  0.37645009])

For multidimensional arrays, you can pass an axis argument to sort along that axis.

    arr = np.random.randn(4, 5)
    arr
    # array([[-0.78510617, -0.02370449, -0.12615757, -0.15039283, -1.00503264],
    #        [ 0.24344011, -1.91231612,  0.80572501, -0.6740432 , -1.62471378],
    #        [-0.09096377,  1.79134715, -0.28566318, -0.8119145 , -0.20454602],
    #        [ 0.02648784,  0.57795444, -0.53447708, -0.74497177, -0.04684859]])

    arr.sort(1)   # sort each row
    arr
    # array([[-1.00503264, -0.78510617, -0.15039283, -0.12615757, -0.02370449],
    #        [-1.91231612, -1.62471378, -0.6740432 ,  0.24344011,  0.80572501],
    #        [-0.8119145 , -0.28566318, -0.20454602, -0.09096377,  1.79134715],
    #        [-0.74497177, -0.53447708, -0.04684859,  0.02648784,  0.57795444]])

    arr = np.random.randn(4, 5)
    arr
    # array([[-0.99257127,  0.36384095,  1.14265096,  0.23094948,  1.42900315],
    #        [ 0.07606583,  1.53456921,  1.15069057, -0.78014895,  0.24934741],
    #        [ 0.63191444,  0.23237672,  0.4590821 ,  0.01904812,  1.63680472],
    #        [-1.24936364, -0.44730791, -0.30612594, -1.05307121,  1.28685507]])

    arr.sort(0)   # sort each column
    arr
    # array([[-1.24936364, -0.44730791, -0.30612594, -1.05307121,  0.24934741],
    #        [-0.99257127,  0.23237672,  0.4590821 , -0.78014895,  1.28685507],
    #        [ 0.07606583,  0.36384095,  1.14265096,  0.01904812,  1.42900315],
    #        [ 0.63191444,  1.53456921,  1.15069057,  0.23094948,  1.63680472]])

Note that the top-level function np.sort returns a sorted copy of the array, whereas the sort method modifies the array itself.

    arr = np.random.randn(4, 5)
    arr_repeat = np.sort(arr, axis=1)
    arr_repeat
    # array([[-0.64056336,  0.14082859,  0.44317426,  0.60988308,  0.77472024],
    #        [-1.63521891,  0.39869871,  0.55635461,  0.58039867,  0.59073797],
    #        [-1.62714899, -0.66642289, -0.16457651,  0.09046719,  0.5139126 ],
    #        [-0.79493979,  0.12287039,  0.50570075,  1.08870126,  1.34838367]])

    arr   # the original array is unchanged
    # array([[ 0.60988308,  0.44317426,  0.14082859,  0.77472024, -0.64056336],
    #        [ 0.59073797,  0.55635461,  0.58039867, -1.63521891,  0.39869871],
    #        [-0.16457651, -1.62714899, -0.66642289,  0.5139126 ,  0.09046719],
    #        [ 0.50570075,  1.34838367,  0.12287039,  1.08870126, -0.79493979]])
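The difference between the two forms is easiest to see on a small fixed array. This sketch (the values are chosen for illustration) also shows that the in-place method returns None, which is a frequent source of bugs when its result is assigned to a variable.

```python
import numpy as np

arr = np.array([3, 1, 2])

# np.sort returns a new sorted array and leaves arr untouched.
sorted_copy = np.sort(arr)
after_copy = arr.copy()        # snapshot: still in the original order

# ndarray.sort sorts in place and returns None.
returned = arr.sort()
after_inplace = arr.copy()     # snapshot: now sorted

# Descending order via a reversed view of the sorted copy.
descending = np.sort(arr)[::-1]

print(sorted_copy, after_copy, returned, after_inplace, descending)
```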

sort also accepts two further parameters, kind and order. kind selects the sorting algorithm: 'quicksort' (the default), 'mergesort', or 'heapsort'. order is a string or a list of strings naming the field or fields to sort by when the array has a structured dtype.

    >>> import numpy as np
    >>> dtype = [('name', 'S10'), ('height', float), ('age', int)]
    >>> values = [('Li', 1.8, 41), ('Wang', 1.9, 38), ('Duan', 1.7, 38)]
    >>> a = np.array(values, dtype=dtype)
    >>> np.sort(a, order='height')  # sort by the height field; the argument is a string
    array([('Duan', 1.7, 38), ('Li', 1.8, 41), ('Wang', 1.9, 38)],
          dtype=[('name', '|S10'), ('height', '<f8'), ('age', '<i4')])
    >>> np.sort(a, order=['age', 'height'])  # sort by age first, then by height when ages are equal; the argument is a list
    array([('Duan', 1.7, 38), ('Wang', 1.9, 38), ('Li', 1.8, 41)],
          dtype=[('name', '|S10'), ('height', '<f8'), ('age', '<i4')])
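The choice of kind matters mostly when equal keys must keep their original relative order. The sketch below is an illustration under assumed data: the ages and names arrays are made up, and argsort with kind='mergesort' is used because merge sort is a stable algorithm.

```python
import numpy as np

# Hypothetical data: two people share the age 38.
ages = np.array([38, 41, 38, 25])
names = np.array(["Wang", "Li", "Duan", "Zhao"])

# A stable sort keeps "Wang" (index 0) ahead of "Duan" (index 2),
# because that is their order in the input.
order = np.argsort(ages, kind="mergesort")

print(order)
print(names[order])
print(ages[order])
```

With the default kind the relative order of the two 38-year-olds is not guaranteed, so a stable kind is the safer choice when ties carry meaning.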

Unique values and other set logic

Uniqueness here simply means removing duplicates. The function for this is numpy.unique(), which returns the sorted unique values of an array.

    my_list = np.array([1, 3, 4, 6, 7, 4, 3, 1, 2])
    np.unique(my_list)
    # array([1, 2, 3, 4, 6, 7])

Note: The array itself does not have a unique method.
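The heading also mentions other set operations, which the text does not demonstrate. The following sketch shows a few of NumPy's set routines on the same my_list array; the second array, other, is an assumption introduced for illustration.

```python
import numpy as np

my_list = np.array([1, 3, 4, 6, 7, 4, 3, 1, 2])
other = np.array([2, 3, 5])   # hypothetical second array

# Unique values together with how often each one occurs.
values, counts = np.unique(my_list, return_counts=True)

common = np.intersect1d(my_list, other)     # values present in both
combined = np.union1d(my_list, other)       # values present in either
only_first = np.setdiff1d(my_list, other)   # values in my_list but not in other
membership = np.isin(other, my_list)        # per-element membership test

print(values, counts)
print(common, combined, only_first, membership)
```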


This article is an English version of an article originally published in Chinese on aliyun.com.
