Python Data Analysis Basics Tutorial: NumPy Learning Guide

Source: Internet
Author: User
Chapter 2 NumPy Foundation
2.6 Changing the dimension of an array

ravel(), flatten(): flatten a multidimensional array (ravel returns a view where possible, flatten always returns a copy)

b.transpose(): matrix transpose, equivalent to b.T; one-dimensional arrays are left unchanged

reshape(): change the dimensions of the array
2.8 Combining arrays

hstack((a, b)): horizontal combination, equivalent to concatenate((a, b), axis=1)
vstack((a, b)): vertical combination, equivalent to concatenate((a, b), axis=0)

column_stack((a, b)): column-wise combination; for two-dimensional arrays it is equivalent to hstack
row_stack((a, b)): row-wise combination; for two-dimensional arrays it is equivalent to vstack
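
The equivalences listed above can be checked directly. The following is a minimal sketch; the arrays a and b are made-up inputs, not data from the original text.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = 2 * a

h = np.hstack((a, b))          # side by side, shape (3, 6)
v = np.vstack((a, b))          # stacked on top of each other, shape (6, 3)

same_h = np.array_equal(h, np.concatenate((a, b), axis=1))
same_v = np.array_equal(v, np.concatenate((a, b), axis=0))
same_c = np.array_equal(np.column_stack((a, b)), h)   # 2-D case only

# For 1-D inputs, column_stack and hstack give different shapes
one_d = np.arange(2)
cols = np.column_stack((one_d, 2 * one_d))   # shape (2, 2)
flat = np.hstack((one_d, 2 * one_d))         # shape (4,)
print(h.shape, v.shape, cols.shape, flat.shape)
```

The last two lines show why the "two-dimensional" qualifier matters: with one-dimensional inputs column_stack builds columns, while hstack simply joins the arrays end to end.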
2.10 Splitting arrays

In: a
Out:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In: hsplit(a, 3)    # horizontal split, equivalent to split(a, 3, axis=1)
Out:
[array([[0],
        [3],
        [6]]),
 array([[1],
        [4],
        [7]]),
 array([[2],
        [5],
        [8]])]

vsplit(a, 3): vertical split, equivalent to split(a, 3, axis=0)
2.11 Array properties

ndim: the number of array dimensions (axes)

size: the total number of elements

itemsize: the number of bytes each element occupies in memory

nbytes: the total storage the array occupies, equal to itemsize * size

b = array([1.j + 1, 2.j + 3])  # an array of complex numbers
b.real: the real part of a complex array; b.imag: the imaginary part

The flat property returns a numpy.flatiter object, which lets us iterate over any multidimensional array as if it were one-dimensional.

In: b = arange(4).reshape(2, 2)
In: b
Out:
array([[0, 1],
       [2, 3]])
In: f = b.flat
In: f
Out: <numpy.flatiter object at 0x103013e00>
In: for item in f: print(item)
   ...:
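
As a small self-contained sketch of the properties in this section, the snippet below checks the nbytes relation and collects the items produced by flat. The variable names items, real_part and imag_part are illustrative only.

```python
import numpy as np

b = np.arange(4).reshape(2, 2)
print(b.ndim, b.size)                    # number of axes, number of elements
storage_matches = b.nbytes == b.itemsize * b.size

# flat walks the 2-D array in row-major order as if it were 1-D
items = [int(item) for item in b.flat]

z = np.array([1.j + 1, 2.j + 3])
real_part, imag_part = z.real, z.imag
print(items, real_part, imag_part)
```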
2.12 Array Conversions

The tolist function converts a NumPy array into a Python list.

In: b
Out: array([1.+1.j, 3.+2.j])
In: b.tolist()
Out: [(1+1j), (3+2j)]

The astype function converts an array to a specified data type.

In: b
Out: array([1.+1.j, 3.+2.j])
In: b.astype(int)
/usr/local/bin/ipython:1: ComplexWarning: Casting complex values to real discards the imaginary part
  #!/usr/bin/python
Out: array([1, 3])
# The imaginary part is lost; converting with b.astype('complex') does not trigger the warning.
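
The two conversions can be reproduced in a script as follows. This is a sketch: the warning is silenced with the standard warnings module purely to keep the demo quiet, and taking b.real first is one possible way to make the loss of the imaginary part explicit.

```python
import warnings
import numpy as np

b = np.array([1.j + 1, 2.j + 3])
as_list = b.tolist()                   # a list of Python complex numbers

with warnings.catch_warnings():
    warnings.simplefilter("ignore")    # hide the ComplexWarning for this demo
    as_int = b.astype(int)             # the imaginary part is discarded

explicit = b.real.astype(int)          # same values, no warning raised
print(as_list, as_int, explicit)
```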
3.2 Read and write files

savetxt

import numpy as np
i2 = np.eye(2)
np.savetxt("eye.txt", i2)
3.4 Reading CSV files
# AAPL,28-01-2011, 344.17,344.4,333.53,336.1,21144800

c, v = np.loadtxt('data.csv', delimiter=',', usecols=(6, 7), unpack=True)  # column indices start at 0
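
To try loadtxt without a data file, an in-memory buffer can stand in for data.csv. The rows below are hypothetical (symbol, date, open, high, low, close, volume), so the column indices differ from the call above; adjust usecols to the layout of the real file.

```python
import io
import numpy as np

# Hypothetical rows: symbol, date, open, high, low, close, volume
sample = io.StringIO(
    "XYZ,28-01-2011,10.0,12.0,9.0,11.0,1000\n"
    "XYZ,31-01-2011,11.0,13.0,10.0,12.5,1500\n"
)

# usecols picks the close (index 5) and volume (index 6) columns;
# unpack=True returns one array per selected column
c, v = np.loadtxt(sample, delimiter=",", usecols=(5, 6), unpack=True)
print(c, v)
```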
3.6.1 Arithmetic mean

np.mean(c) is equivalent to np.average(c)
3.6.2 Weighted average

t = np.arange(len(c))
np.average(c, weights=t)
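
A minimal sketch of what the weighted average computes, using made-up closing prices. With weights t = 0, 1, 2, ... later prices count more, and the result equals sum(c * t) / sum(t).

```python
import numpy as np

c = np.array([336.1, 339.32, 345.03, 344.32])   # hypothetical closing prices
t = np.arange(len(c))                            # weights 0, 1, 2, 3

mean = np.mean(c)
wavg = np.average(c, weights=t)
manual = np.sum(c * t) / np.sum(t)               # the same average written out
print(mean, wavg, manual)
```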
3.8 Extreme values

np.min(c)
np.max(c)

np.ptp(c): the difference between the maximum and the minimum value
3.10 Statistical analysis

np.median(c): median
np.msort(c): ascending sort (removed in NumPy 2.0; np.sort(c) gives the same result for a one-dimensional array)
np.var(c): variance
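
The statistics above can be cross-checked against their definitions. The sample values are made up for illustration.

```python
import numpy as np

c = np.array([3.0, 1.0, 4.0, 1.0, 5.0])    # hypothetical values

spread = np.max(c) - np.min(c)             # what np.ptp(c) returns
median = np.median(c)
ordered = np.sort(c)                       # ascending sort
variance = np.var(c)
manual_var = np.mean((c - c.mean()) ** 2)  # variance from its definition
print(spread, median, ordered, variance)
```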
3.12 Analyzing stock returns

np.diff(c) returns an array of the differences between adjacent array elements.

returns = np.diff(arr) / arr[:-1]  # the array returned by diff has one element fewer than the closing-price array

np.std(c): standard deviation

Logarithmic returns

logreturns = np.diff(np.log(c))  # check the input array first to make sure it contains no zeros or negative numbers

The where function returns the indices of all array elements that satisfy a given condition.
posretindices = np.where(returns > 0)
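
Put together on a tiny made-up price series, the return calculations look like this. Note that np.where returns a tuple of index arrays, so [0] is used here to get the plain index array.

```python
import numpy as np

c = np.array([100.0, 102.0, 101.0, 105.0])   # hypothetical closing prices

returns = np.diff(c) / c[:-1]                # simple returns, one element fewer than c
logreturns = np.diff(np.log(c))              # requires strictly positive prices
posretindices = np.where(returns > 0)[0]     # positions of the positive returns
volatility = np.std(returns)
print(returns, logreturns, posretindices, volatility)
```

A handy property of log returns is that they add up: their sum equals the log of the last price divided by the first.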

np.sqrt(1. / 252.): square root of a floating-point number
3.14 Analyzing date data

# aapl,28-01-2011, 344.17,344.4,333.53,336.1,21144800

import datetime

def datestr2num(s):
    return datetime.datetime.strptime(s, "%d-%m-%Y").date().weekday()
# Monday 0
# Tuesday 1
# Wednesday 2
# Thursday 3
# Friday 4
# Saturday 5
# Sunday 6

dates, close = np.loadtxt('data.csv', delimiter=',', usecols=(1, 6), converters={1: datestr2num}, unpack=True)
print("dates =", dates)

#output
dates = [ 4.  0.  1.  2.  3.  4.  0.  1.  2.  3.  4.  0.  1.  2.  3.  4.  1.  2.  4.  0.  1.  2.  3.  4.  0.
  1.  2.  3.  4.]
averages = np.zeros(5)
for i in range(5):
    indices = np.where(dates == i)
    prices = np.take(close, indices)
    # take operates element-wise on the indices and produces an array as output.
    averages[i] = np.mean(prices)  # average closing price for weekday i

>>> a = [4, 3, 5, 7, 6, 8]
>>> indices = [0, 1, 4]
>>> np.take(a, indices)
array([4, 3, 6])

np.argmax(c)  # returns the index of the largest element in the array
np.argmin(c)  # returns the index of the smallest element in the array
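
The weekday analysis can be tried end to end without a CSV file. In this sketch the (date, close) pairs are hypothetical, and weekdays with no data are left as NaN so that np.nanargmax can skip them; that handling is an assumption, not something the original text shows.

```python
import datetime
import numpy as np

def datestr2num(s):
    """Map a 'dd-mm-YYYY' string to a weekday number (Monday is 0)."""
    return datetime.datetime.strptime(s, "%d-%m-%Y").date().weekday()

# Hypothetical (date, close) pairs
rows = [("24-01-2011", 10.0), ("25-01-2011", 11.0), ("28-01-2011", 13.0),
        ("31-01-2011", 12.0), ("04-02-2011", 15.0)]
dates = np.array([datestr2num(d) for d, _ in rows])
close = np.array([p for _, p in rows])

averages = np.full(5, np.nan)            # NaN marks weekdays without data
for i in range(5):
    indices = np.where(dates == i)
    prices = np.take(close, indices)
    if prices.size:
        averages[i] = prices.mean()

best_day = np.nanargmax(averages)        # weekday with the highest average close
print(dates, averages, best_day)
```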
3.16 Summarizing data

# AAPL,28-01-2011, 344.17,344.4,333.53,336.1,21144800
# Find the first Monday and the last Friday
first_monday = np.ravel(np.where(dates == 0))[0]
last_friday = np.ravel(np.where(dates == 4))[-1]
# Create an array holding the index of every day in the three weeks
weeks_indices = np.arange(first_monday, last_friday + 1)
# Use the split function to cut the array into 3 sub-arrays of 5 elements each
weeks_indices = np.split(weeks_indices, 3)
#output
[array([1, 2, 3, 4, 5]), array([ 6,  7,  8,  9, 10]), array([11, 12, 13, 14, 15])]

def summarize(a, o, h, l, c):  # open, high, low, close
    monday_open = o[a[0]]
    week_high = np.max(np.take(h, a))
    week_low = np.min(np.take(l, a))
    friday_close = c[a[-1]]
    return ("APPL", monday_open, week_high, week_low, friday_close)

weeksummary = np.apply_along_axis(summarize, 1, weeks_indices, open, high, low, close)
np.savetxt("weeksummary.csv", weeksummary, delimiter=",", fmt="%s")
# savetxt is given the file name, the array to save, the delimiter (here a comma), and the format used to store floating-point numbers.

The format string begins with a percent sign. Next comes an optional flag character: - left-aligns the result, 0 pads it on the left with zeros, and + always prints the sign (+ or -). The third part is an optional width, the minimum number of characters to output. The fourth part is the precision, which begins with "." followed by an integer. Last comes the type specifier, which in the example is s, the string type.

numpy.apply_along_axis(func1d, axis, arr, *args, **kwargs)

>>> def my_func(a):
...     """Average first and last element of a 1-D array"""
...     return (a[0] + a[-1]) * 0.5
>>> b = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> np.apply_along_axis(my_func, 0, b)  # along axis 0: operates on column slices
array([ 4.,  5.,  6.])
>>> np.apply_along_axis(my_func, 1, b)  # along axis 1: operates on row slices
array([ 2.,  5.,  8.])


>>> b = np.array([[8,1,7], [4,3,9], [5,2,6]])
>>> np.apply_along_axis(sorted, 1, b)
array([[1, 7, 8],
       [3, 4, 9],
       [2, 5, 6]])
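
The extra positional arguments of apply_along_axis are passed through to the function, which is how a weekly summary function can receive the price arrays. Below is a numeric-only sketch with made-up open, high, low and close arrays for two five-day weeks; it is a simplified variant, not the exact function from the text.

```python
import numpy as np

# Hypothetical data for two five-day weeks
o = np.arange(10.0)     # opens
h = o + 2.0             # highs
l = o - 1.0             # lows
c = o + 1.0             # closes
weeks_indices = np.arange(10).reshape(2, 5)   # one row of day indices per week

def summarize(a, o, h, l, c):
    """Return (Monday open, week high, week low, Friday close) for index row a."""
    return (o[a[0]], np.max(np.take(h, a)), np.min(np.take(l, a)), c[a[-1]])

# summarize is called once per row; o, h, l, c are forwarded unchanged
weeksummary = np.apply_along_axis(summarize, 1, weeks_indices, o, h, l, c)
print(weeksummary)
```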
3.20 Calculating the simple moving average

(1) Use the ones function to create an array of length N with every element initialized to 1, then divide the whole array by N to get the weights, as shown below:

import sys
N = int(sys.argv[1])
weights = np.ones(N) / N
print("Weights", weights)

With N = 5, the output is as follows:

Weights [ 0.2  0.2  0.2  0.2  0.2]  # all weights are equal

(2) Using these weights, call the convolve function:

c = np.loadtxt('data.csv', delimiter=',', usecols=(6,), unpack=True)
# Convolution is an important operation in mathematical analysis, defined as the integral of the product of one function with a reversed and shifted copy of another.
sma = np.convolve(weights, c)[N-1:-N+1]

from matplotlib.pyplot import plot, show
t = np.arange(N - 1, len(c))   # plotting
plot(t, c[N-1:], lw=1.0)
plot(t, sma, lw=2.0)
show()
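
The slice [N-1:-N+1] keeps only the positions where the window fully overlaps the data. For N greater than 1 this matches what np.convolve returns with mode="valid", as the following sketch on a made-up series suggests.

```python
import numpy as np

N = 3
c = np.arange(10.0)                     # hypothetical prices 0, 1, ..., 9
weights = np.ones(N) / N

sma = np.convolve(weights, c)[N - 1:-N + 1]         # slicing the full convolution
sma_valid = np.convolve(weights, c, mode="valid")   # fully overlapping windows only
print(sma)
print(sma_valid)
```

Each output value is the mean of N consecutive prices, so the result has len(c) - N + 1 elements.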
3.22 Calculating the exponential moving average

The exponential moving average uses weights that decay exponentially. The weight given to older data points decreases exponentially but never reaches 0.

x = np.arange(5)
print("Exp", np.exp(x))
#output
Exp [  1.           2.71828183   7.3890561   20.08553692  54.59815003]

linspace returns an array of values evenly spaced over the specified range.

print("Linspace", np.linspace(-1, 0, 5))  # start value, stop value, optional number of elements
#output
Linspace [-1.   -0.75 -0.5  -0.25  0.  ]

(1) Computing the weights

N = int(sys.argv[1])
weights = np.exp(np.linspace(-1., 0., N))

(2) Normalizing the weights

weights /= weights.sum()
print("Weights", weights)
#output
Weights [ 0.11405072  0.14644403  0.18803785  0.24144538  0.31002201]

(3) Calculating and plotting

c = np.loadtxt('data.csv', delimiter=',', usecols=(6,), unpack=True)
ema = np.convolve(weights, c)[N-1:-N+1]
t = np.arange(N - 1, len(c))
plot(t, c[N-1:], lw=1.0)
plot(t, ema, lw=2.0)
show()
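
One detail worth checking when convolving with unequal weights is their orientation. Convolution reverses the kernel relative to the data, so with the call above the largest weight lands on the oldest price in each window. The sketch below, on made-up prices, compares that with a reversed kernel that puts the largest weight on the newest price; which of the two is wanted is a modeling choice, not something the original text settles.

```python
import numpy as np

N = 3
c = np.array([1.0, 2.0, 4.0, 8.0, 16.0])     # hypothetical, steadily rising prices
weights = np.exp(np.linspace(-1.0, 0.0, N))
weights /= weights.sum()                      # normalized, increasing weights

ema = np.convolve(weights, c)[N - 1:-N + 1]   # as in the text
ema_recent = np.convolve(weights[::-1], c, mode="valid")   # largest weight on newest price

# Explicit windows, oldest price first, for comparison
windows = np.array([c[i:i + N] for i in range(len(c) - N + 1)])
print(ema, ema_recent)
```

On rising prices, ema_recent sits above ema because it leans on the most recent (highest) value in each window.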
3.26 Predicting prices with a linear model
(x, residuals, rank, s) = np.linalg.lstsq(A, b)  # the coefficient vector x, the residuals array, the rank of A, and the singular values of A
print(x, residuals, rank, s)
# compute the next predicted value
print(np.dot(b, x))
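
The text does not show how A and b are built, so the following is only one possible setup, stated as an assumption: each value is modeled as a linear combination of the N values before it, every row of A holds one window of N consecutive values, and b holds the value that follows each window. The series is made up so that the exact coefficients are known.

```python
import numpy as np

N = 2
# Hypothetical series in which each value is the sum of the previous two
c = np.array([1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0])

# Row i of A is the window c[i:i+N]; b[i] is the value right after that window
A = np.array([c[i:i + N] for i in range(len(c) - N)])
b = c[N:]

# rcond=None selects NumPy's current default cutoff for small singular values
x, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)

# Predict the next value from the most recent window
next_value = np.dot(c[-N:], x)
print(x, rank, next_value)
```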
3.28 Drawing trend lines
>>> x = np.arange(6)
>>> x = x.reshape((2, 3))
>>> x
array([[0, 1, 2],
       [3, 4, 5]])
>>> np.ones_like(x)   # an array of the same shape filled with 1
array([[1, 1, 1],
       [1, 1, 1]])

Similar functions
zeros_like
empty_like
zeros
ones
empty
3.30 Array clipping and compression

a = np.arange(5)
print("a =", a)
print("Clipped", a.clip(1, 2))  # elements larger than the given maximum are set to the maximum; elements smaller than the given minimum are set to the minimum
#output
a = [0 1 2 3 4]
Clipped [1 1 2 2 2]
a = np.arange(4)
print(a)
print("Compressed", a.compress(a > 2))  # returns the array filtered by the given condition
#output
[0 1 2 3]
Compressed [3]

b = np.arange(1, 9)
print("b =", b)
print("Factorial", b.prod())  # product of all elements, here the factorial 8!
#output
b = [1 2 3 4 5 6 7 8]
Factorial 40320

print("Factorials", b.cumprod())
#output
Factorials [    1     2     6    24   120   720  5040 40320]  # cumulative products, i.e. the factorials 1! through 8!
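
For reference, the four methods in one runnable snippet. It also hints that compress with a boolean condition gives the same result as boolean indexing, which is a common alternative spelling.

```python
import numpy as np

a = np.arange(5)
clipped = a.clip(1, 2)                  # values forced into the range [1, 2]
compressed = a.compress(a > 2)          # keeps elements where the condition holds
same_as_mask = np.array_equal(compressed, a[a > 2])

b = np.arange(1, 9)
factorial = b.prod()                    # 1 * 2 * ... * 8
factorials = b.cumprod()                # running products
print(clipped, compressed, factorial, factorials)
```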
4.2 Stock Correlation Analysis

covariance = np.cov(a, b)

Detailed explanation of covariance and covariance matrices

Get the diagonal elements
covariance.diagonal()
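
The diagonal of the covariance matrix holds the variances of the two inputs, and together with the off-diagonal term it gives the correlation coefficient. A small sketch with made-up return series (the names corr and variances are illustrative):

```python
import numpy as np

a = np.array([0.01, -0.02, 0.03, 0.00])   # hypothetical returns of stock A
b = np.array([0.02, -0.01, 0.02, 0.01])   # hypothetical returns of stock B

covariance = np.cov(a, b)                 # 2 x 2 covariance matrix
variances = covariance.diagonal()         # sample variances of a and b

# Correlation = covariance divided by the product of the standard deviations
corr = covariance[0, 1] / np.sqrt(variances[0] * variances[1])
print(covariance, corr)
```

np.cov uses the sample (n - 1) normalization by default, so the diagonal matches np.var(..., ddof=1) rather than plain np.var.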
4.4 Polynomial fitting

bhp = np.loadtxt('bhp.csv', delimiter=',', usecols=(6,), unpack=True)
vale = np.loadtxt('vale.csv', delimiter=',', usecols=(6,), unpack=True)
t = np.arange(len(bhp))
poly = np.polyfit(t, bhp - vale, int(sys.argv[1]))  # sys.argv[1] is 3, i.e. fit a third-degree polynomial to the data
print("Polynomial fit", poly)

#output
Polynomial fit [  1.11655581e-03  -5.28581762e-02   5.80684638e-01   5.79791202e+01]
# predict the next value
print("Next value", np.polyval(poly, t[-1] + 1))

Use the polyder function to differentiate the polynomial (in order to find its extrema)

der = np.polyder(poly)
print("Derivative", der)
#output
Derivative [ 0.00334967 -0.10571635  0.58068464]

Find the roots of the derivative, which are the extremum points of the original polynomial

print("Extremas", np.roots(der))
#output
Extremas [ 24.47820054   7.08205278]
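
The same chain of calls can be verified on data generated from a known cubic, where the expected coefficients and extrema are easy to work out by hand. The data below is synthetic, chosen only for that purpose.

```python
import numpy as np

t = np.arange(10.0)
y = t**3 - 6 * t**2 + 9 * t + 1           # synthetic data from a known cubic

poly = np.polyfit(t, y, 3)                # should recover about [1, -6, 9, 1]
next_value = np.polyval(poly, t[-1] + 1)  # value of the fitted polynomial at t = 10

der = np.polyder(poly)                    # derivative: about 3t^2 - 12t + 9
extrema = np.sort(np.roots(der).real)     # stationary points of the cubic
print(poly, next_value, extrema)
```

Here the derivative factors as 3(t - 1)(t - 3), so the fitted polynomial has its local maximum near t = 1 and its local minimum near t = 3.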

Note: The book suggests that a third-degree polynomial does not fit the data well enough, and that a higher-degree polynomial is worth trying.
4.6 Calculating OBV (on-balance volume)

The diff function calculates the difference between each pair of adjacent elements in an array and returns an array of these differences.
change = np.diff(c)

The sign function returns the sign of each element in the array: -1 when the element is negative, 1 when it is positive, and 0 otherwise.
np.sign(change)

The piecewise function can also be used to get the signs of the array elements. It is called with the array, a list of conditions, and the corresponding return values:

pieces = np.piecewise(change, [change < 0, change > 0], [-1, 1])
print("Pieces", pieces)

Check that the two results agree
np.array_equal(a, b)
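
The two ways of getting the signs can be compared on a short made-up series. The last line, which multiplies the volumes by the signs, is one common way to express the per-day on-balance volume contribution; the volume array v does not appear in the text above and is assumed here.

```python
import numpy as np

c = np.array([10.0, 11.0, 11.0, 9.0, 12.0])        # hypothetical closing prices
v = np.array([100.0, 150.0, 120.0, 200.0, 180.0])  # hypothetical volumes

change = np.diff(c)
signs = np.sign(change)
pieces = np.piecewise(change, [change < 0, change > 0], [-1, 1])
same = np.array_equal(signs, pieces)     # both approaches give the same signs

obv = v[1:] * signs                      # volume counted as positive, negative or zero
print(signs, pieces, same, obv)
```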

Using np.vectorize in place of a loop

>>> def myfunc(a, b):
...     "Return a-b if a>b, otherwise return a+b"
...     if a > b:
...         return a - b
...     else:
...         return a + b

>>> vfunc = np.vectorize(myfunc)
>>> vfunc([1, 2, 3, 4], 2)
array([3, 4, 1, 2])
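
For a simple branch like this one, the same result can be written with array operations only. The sketch below compares the vectorized function with np.where; treating np.where as the replacement is a suggestion, not something the original text prescribes.

```python
import numpy as np

def myfunc(a, b):
    """Return a-b if a>b, otherwise return a+b."""
    return a - b if a > b else a + b

vfunc = np.vectorize(myfunc)
x = np.array([1, 2, 3, 4])

looped = vfunc(x, 2)                     # calls myfunc once per element
direct = np.where(x > 2, x - 2, x + 2)   # whole-array version of the same rule
print(looped, direct)
```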

The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
4.10 Smoothing data with the hanning function

(1) Call the hanning function to compute the weights, generating a window of length N (in this example N is 8)

N = int(sys.argv[1])
weights = np.hanning(N)
print("Weights", weights)

#output
Weights [ 0.          0.1882551   0.61126047  0.95048443  0.95048443  0.61126047  0.1882551   0.        ]

bhp = np.loadtxt('bhp.csv', delimiter=',', usecols=(6,), unpack=True)   # data for one stock
bhp_returns = np.diff(bhp) / bhp[:-1]  # compute the stock returns
smooth_bhp = np.convolve(weights / weights.sum(), bhp_returns)[N-1:-N+1]  # smooth the returns using the weights

# plotting
t = np.arange(N - 1, len(bhp_returns))
plot(t, bhp_returns[N-1:], lw=1.0)
plot(t, smooth_bhp, lw=2.0)
show()
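
To see the smoothing effect without the stock files, random numbers can stand in for the returns. This is a sketch on synthetic data (seeded for repeatability), not the bhp series from the text.

```python
import numpy as np

N = 8
weights = np.hanning(N)                  # symmetric window, zero at both ends

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, 50)      # synthetic noisy "returns"

# Normalize the window so the smoothed values stay on the same scale
smooth = np.convolve(weights / weights.sum(), returns)[N - 1:-N + 1]
print(len(returns), len(smooth))
print(np.std(returns), np.std(smooth))
```

The smoothed series is shorter by N - 1 points and fluctuates less than the raw one, which is the purpose of the window.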

Subtracting one polynomial from another
poly_sub = np.polysub(a, b)

The select function
numpy.select(condlist, choicelist, default=0)

>>> x = np.arange(10)
>>> condlist = [x<3, x>5]
# two boolean arrays: [True, ..., False, ...] and [False, ..., True]
>>> choicelist = [x, x**2]
>>> np.select(condlist, choicelist)
array([ 0,  1,  2,  0,  0,  0, 36, 49, 64, 81])
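
The default argument controls what is returned where no condition matches. A short sketch using -1 as an arbitrary marker value:

```python
import numpy as np

x = np.arange(10)
condlist = [x < 3, x > 5]
choicelist = [x, x**2]

# Where neither condition holds (x = 3, 4, 5), the default value is used
selected = np.select(condlist, choicelist, default=-1)
print(selected)
```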

The trim_zeros function removes the zeros at the beginning and end of a one-dimensional array:
np.trim_zeros(a)
5.2 Creating matrices (omitted)
5.4 Creating a new matrix from existing matrices (omitted)
5.6 Creating universal functions (omitted)
5.7 Universal function methods (omitted)
5.8 Calling universal function methods on add (omitted)
5.10 Array division

In NumPy, the basic arithmetic operators +, - and * are implicitly mapped to the universal functions add, subtract, and multiply.
In other words, when you use these operators on NumPy arrays, the corresponding universal function is invoked automatically. Division is more involved: it concerns three universal functions, divide, true_divide, and floor_divide, and two corresponding operators, / and //.

a = np.array([2, 6, 5])
b = np.array([1, 2, 3])
print("Divide", np.divide(a, b), np.divide(b, a))
#output (integer division, as produced under Python 2; under Python 3, np.divide behaves like np.true_divide)
Divide [2 3 1] [0 0 0]

print("True Divide", np.true_divide(a, b), np.true_divide(b, a))
#output
True Divide [ 2.          3.          1.66666667] [ 0.5         0.33333333  0.6       ]

# floor_divide always returns an integer-valued result; it is equivalent to calling divide and then floor.
print("Floor Divide", np.floor_divide(a, b), np.floor_divide(b, a))
c = 3.14 * b
# The floor function rounds floating-point numbers down to the nearest integer.
print("Floor Divide 2", np.floor_divide(c, b), np.floor_divide(b, c))

#output
Floor Divide [
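
The link between the operators and the universal functions can be checked directly. This sketch assumes Python 3, where / maps to true division and // to floor division.

```python
import numpy as np

a = np.array([2, 6, 5])
b = np.array([1, 2, 3])

true_q = np.true_divide(a, b)      # floating-point quotient
floor_q = np.floor_divide(a, b)    # quotient rounded down

slash_matches = np.array_equal(a / b, true_q)          # / calls true_divide
double_slash_matches = np.array_equal(a // b, floor_q) # // calls floor_divide
floor_of_true = np.array_equal(np.floor(true_q), floor_q)
print(true_q, floor_q)
```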
