Common functions of Python3 NumPy
1. TXT file
(1) The identity matrix is a square matrix whose elements on the main diagonal are 1 and whose remaining elements are 0.
In NumPy, you can create this two-dimensional array with the eye function. We just need to pass one parameter that specifies the number of 1 elements in the matrix, that is, its size.
For example, create a 3x3 array:
import numpy as np
I2 = np.eye(3)
print(I2)
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
(2) Use the savetxt function to store the data in a file. We need to specify the file name and the array to save.
np.savetxt('eye.txt', I2)  # create the file eye.txt to store the data in I2
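As a quick check that the saved file really holds the array, the sketch below writes the identity matrix and reads it back with loadtxt. The temporary directory and the variable names are assumptions made for illustration, so the example leaves no file behind.

```python
import os
import tempfile

import numpy as np

identity = np.eye(3)

# Write into a temporary directory so the example cleans up after itself.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "eye.txt")
    np.savetxt(path, identity)    # plain text, one matrix row per line
    restored = np.loadtxt(path)   # read the text file back into an array

print(restored.shape)
print(np.array_equal(identity, restored))
```

Because the values 0 and 1 are represented exactly in the text format, the restored array compares equal to the original.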
2. CSV file
- CSV (comma-separated values) is a common file format;
- Typically, the dump file of a database is in CSV format, and each field in the file corresponds to a column in a database table;
- Spreadsheet software, such as Microsoft Excel, can handle CSV files.
Note: The loadtxt function in NumPy makes it easy to read CSV files: it automatically splits the fields and loads the data into NumPy arrays.
The examples below load their data from data.csv:
c, v = np.loadtxt('data.csv', delimiter=',', usecols=(6, 7), unpack=True)
# usecols takes a tuple; here it selects the 7th and 8th fields
# unpack=True stores the columns separately, so the closing prices and the volumes are assigned to c and v respectively
print(c)
[336.1 339.32 345.03 344.32 343.44 346.5 351.88 355.2 358.16 354.54 356.85 359.18 359.9 363.13 358.3 350.56 338.61 342.62 342.88 348.16 353.21 349.31 352.12 359.56 360. 355.36 355.76 352.47 346.67 351.99]
print(v)
[21144800. 13473000. 15236800. 9242600. 14064100. 11494200. 17322100. 13608500. 17240800. 33162400. 13127500. 11086200. 10149000. 17184100. 18949000. 29144500. 31162200. 23994700. 17853500. 13572000. 14395400. 16290300. 21521000. 17885200. 16188000. 19504300. 12718000. 16192700. 18138800. 16824200.]
print(type(c))
print(type(v))
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
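To try usecols and unpack without the original file, here is a minimal sketch that reads a small in-memory CSV. The column layout and the placeholder values in the first six columns are assumptions for illustration. Only the last two columns (closing price and volume) reuse the first three values printed above.

```python
import io

import numpy as np

# Hypothetical rows: the first six columns are placeholders; columns 6 and 7
# (0-based) hold the closing price and the volume.
csv_text = (
    "SYM,day1,x,0.0,0.0,0.0,336.1,21144800\n"
    "SYM,day2,x,0.0,0.0,0.0,339.32,13473000\n"
    "SYM,day3,x,0.0,0.0,0.0,345.03,15236800\n"
)

# unpack=True returns one array per selected column.
close, volume = np.loadtxt(io.StringIO(csv_text), delimiter=",",
                           usecols=(6, 7), unpack=True)

# Without unpack, the same call returns a single 2-D array (rows x columns).
table = np.loadtxt(io.StringIO(csv_text), delimiter=",", usecols=(6, 7))

print(close)
print(volume)
print(table.shape)
```

Since only the columns named in usecols are converted to numbers, the text in the other columns does not get in the way.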
3. Volume-weighted average price: the average() function
- VWAP overview:
VWAP (volume-weighted average price) is a very important quantity in economics.
It represents the "average" price of a financial asset.
The higher the volume traded at a price, the greater the weight of that price.
VWAP is a weighted average calculated with volume as the weight, and is often used in algorithmic trading.
vwap = np.average(c, weights=v)
print('Volume-weighted average price vwap =', vwap)
Volume-weighted average price vwap = 350.5895493532009
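The weighted average can also be written out by hand as the sum of price times volume divided by the total volume. The sketch below compares the two on a three-element sample; the short arrays and variable names are assumptions for illustration.

```python
import numpy as np

# First three closing prices and volumes from the output shown earlier.
prices = np.array([336.1, 339.32, 345.03])
volumes = np.array([21144800.0, 13473000.0, 15236800.0])

vwap_builtin = np.average(prices, weights=volumes)
vwap_by_hand = np.sum(prices * volumes) / np.sum(volumes)

print(vwap_builtin)
print(vwap_by_hand)
```

Both values agree up to floating-point rounding, and the result always lies between the lowest and the highest price.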
4. Arithmetic mean: the mean() function
The mean function in NumPy calculates the arithmetic mean of the elements of an array.
print('Arithmetic mean of the elements of c: {}'.format(np.mean(c)))
Arithmetic mean of the elements of c: 351.0376666666667
5. Time-weighted average price
- TWAP overview:
In economics, TWAP (time-weighted average price) is another indicator of the "average" price. Now that we have calculated the VWAP, let's calculate the TWAP. In fact, TWAP is just a variant: the basic idea is that recent prices are more important, so we should give them a higher weight. The simplest method is to use the arange function to create a sequence of natural numbers that starts from 0 and has as many elements as there are closing prices. Of course, this is not necessarily the right way to calculate TWAP.
t = np.arange(len(c))
print('Time-weighted average price twap =', np.average(c, weights=t))
Time-weighted average price twap = 352.4283218390804
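One consequence of starting the weights at 0 is that the first price gets no weight at all. The sketch below contrasts that with weights starting at 1. The second variant is an assumption offered for comparison, not a method taken from the article, and the sample prices are just the first four closing prices.

```python
import numpy as np

prices = np.array([336.1, 339.32, 345.03, 344.32])

weights_from_zero = np.arange(len(prices))         # [0, 1, 2, 3]: first price is ignored
weights_from_one = np.arange(1, len(prices) + 1)   # [1, 2, 3, 4]: every price counts

twap_zero = np.average(prices, weights=weights_from_zero)
twap_one = np.average(prices, weights=weights_from_one)

print(twap_zero)
print(twap_one)
```

With a long price series the two results are close, but on short series the choice of starting weight is noticeable.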
6. Maximum value and minimum value
h, l = np.loadtxt('data.csv', delimiter=',', usecols=(4, 5), unpack=True)
print('h data: \n{}'.format(h))
print('-' * 10)
print('l data: \n{}'.format(l))
h data: 
[344.4 340.04 345.65 345.25 344.24 346.7 353.25 355.52 359. 360. 357.8 359.48 359.97 364.9 360.27 359.5 345.4 344.64 345.15 348.43 355.05 355.72 354.35 359.79 360.29 361.67 357.4 354.76 349.77 352.32]
----------
l data: 
[333.53 334.3 340.98 343.55 338.55 343.51 347.64 352.15 354.87 348. 353.54 356.71 357.55 360.5 356.52 349.52 337.72 338.61 338.37 344.8 351.12 347.68 348.4 355.92 357.75 351.31 352.25 350.6 344.9 345. ]
print('Maximum of h: {}'.format(np.max(h)))
print('Minimum of l: {}'.format(np.min(l)))
Maximum of h: 364.9
Minimum of l: 333.53
- NumPy has a ptp function that calculates the range of the values in an array
- This function returns the difference between the maximum and minimum values of the array elements
- In other words, the return value equals max(array) - min(array)
print('Range (max - min) of h: \n{}'.format(np.ptp(h)))
print('Range (max - min) of l: \n{}'.format(np.ptp(l)))
Range (max - min) of h: 
24.859999999999957
Range (max - min) of l: 
26.970000000000027
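The equivalence between ptp and max minus min can be checked directly. This is a small sketch on a hand-picked subset of the high prices; the subset and the variable names are assumptions for illustration.

```python
import numpy as np

# A few of the daily highs, including the overall maximum and minimum of h.
highs = np.array([344.4, 340.04, 345.65, 364.9])

spread = np.ptp(highs)
spread_by_hand = np.max(highs) - np.min(highs)

print(spread)
print(spread_by_hand)
```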
7. Statistical analysis
- Median:
We could use thresholds to get rid of outliers, but there is a better way: the median.
Arrange the values in order of size to form a sequence; the number in the middle of the sequence is the median.
For example, given the 5 values 1, 2, 3, 4, 5, the median is the middle value, 3.
m = np.loadtxt('data.csv', delimiter=',', usecols=(6,), unpack=True)
print('Median of m: {}'.format(np.median(m)))
Median of m: 352.055
# Sort the array, then find the median
sorted_m = np.sort(m)
print('Sorted m: \n{}'.format(sorted_m))
N = len(m)
print('Median of m: {}'.format((sorted_m[N//2] + sorted_m[(N-1)//2]) / 2))
Sorted m: 
[336.1 338.61 339.32 342.62 342.88 343.44 344.32 345.03 346.5 346.67 348.16 349.31 350.56 351.88 351.99 352.12 352.47 353.21 354.54 355.2 355.36 355.76 356.85 358.16 358.3 359.18 359.56 359.9 360. 363.13]
Median of m: 352.055
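The index expression used above works for both odd and even lengths: for an odd length both indices point at the same middle element, and for an even length they point at the two middle elements. A minimal sketch follows; the helper name median_by_sorting and the sample arrays are hypothetical.

```python
import numpy as np


def median_by_sorting(values):
    """Median via sorting, using the same index arithmetic as the text."""
    ordered = np.sort(values)
    n = len(ordered)
    # Odd n: both indices are the middle one. Even n: the two middle ones.
    return (ordered[n // 2] + ordered[(n - 1) // 2]) / 2


odd_sample = np.array([5, 1, 4, 2, 3])
even_sample = np.array([4, 1, 3, 2])

print(median_by_sorting(odd_sample), np.median(odd_sample))
print(median_by_sorting(even_sample), np.median(even_sample))
```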
- Variance:
The variance is the sum of the squared differences between each data point and the arithmetic mean of all the data, divided by the number of data points.
print('variance =', np.var(m))
variance = 50.126517888888884
var_hand = np.mean((m - m.mean())**2)
print('var =', var_hand)
var = 50.126517888888884
Note: The sample variance and the population variance are calculated differently. The population variance divides the sum of squared deviations by the number of data points, while the sample variance divides it by the number of data points minus 1. This value, n-1, is called the degrees of freedom. The reason for the difference is to ensure that the sample variance is an unbiased estimator.
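In NumPy the divisor is controlled by the ddof parameter of var: the default ddof=0 divides by n, and ddof=1 divides by n-1. The sketch below shows both on a small sample; the sample values are the first five closing prices and the variable names are assumptions.

```python
import numpy as np

sample = np.array([336.1, 339.32, 345.03, 344.32, 343.44])
n = len(sample)

population_var = np.var(sample)       # divides by n (ddof=0 is the default)
sample_var = np.var(sample, ddof=1)   # divides by n - 1

print(population_var)
print(sample_var)
```

The two results differ exactly by the factor n / (n - 1), so the gap shrinks as the number of data points grows.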
8. Stock return rate
In academic literature, the analysis of closing prices is often based on simple returns and logarithmic returns.
The simple return is the rate of change between two adjacent prices, while the logarithmic return is the difference between each pair of adjacent values after taking the logarithm of all prices.
As we learned in high school, the logarithm of a minus the logarithm of b equals the logarithm of a divided by b. Therefore, the logarithmic return can also be used to measure the rate of change in price.
Note that because the return is a ratio (for example, we divide dollars by dollars, or by another currency unit), it is dimensionless.
In short, investors are most interested in the variance or the standard deviation of the returns, as this represents the size of the investment risk.
(1) First, let's calculate the simple return. The diff function in NumPy returns an array of the differences between adjacent array elements. This is somewhat analogous to differentiation in calculus. To calculate the return, we also need to divide each difference by the price of the preceding day. Note, however, that diff returns an array with one fewer element than the closing price array.
returns = np.diff(arr) / arr[:-1]
Notice that we did not use the last value in the closing price array as a divisor. Next, use the std function to calculate the standard deviation:
print("Standard deviation =", np.std(returns))
(2) The logarithmic return is even simpler to calculate. We first use the log function to get the logarithm of each closing price, and then apply the diff function to the result.
logreturns = np.diff(np.log(c))
In general, we should check the input array to ensure that it does not contain 0 or negative numbers. Otherwise, NumPy emits a runtime warning and the result contains -inf or nan values. In our case, however, the stock price is always positive, so the check can be omitted.
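The input check mentioned above, and the relation between the two kinds of return, can be sketched as follows. The helper name log_returns and the choice to raise ValueError are assumptions for illustration, not part of NumPy.

```python
import numpy as np


def log_returns(prices):
    """Logarithmic returns; rejects zero or negative prices up front."""
    prices = np.asarray(prices, dtype=float)
    if np.any(prices <= 0):
        raise ValueError("prices must be strictly positive")
    return np.diff(np.log(prices))


prices = np.array([336.1, 339.32, 345.03, 344.32])
simple = np.diff(prices) / prices[:-1]
logret = log_returns(prices)

# log(a) - log(b) == log(a / b), so each log return is log(1 + simple return).
print(np.allclose(logret, np.log(prices[1:] / prices[:-1])))
print(np.allclose(logret, np.log1p(simple)))
```

For small price changes the two kinds of return are nearly equal, which is why they are often used interchangeably.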
(3) We are likely to be very interested in which trading days have positive returns.
After completing the previous steps, we just need the where function. The where function returns the indices of all array elements that satisfy the specified condition.
Enter the following code:
posretindices = np.where(returns > 0)
print("Indices with positive returns", posretindices)
This outputs the indices of all positive elements in the array.
Indices with positive returns (array([0, 1, 4, 5, 6, 7, 9, 10, 11, 12, 16, 17, 18, 19, 21, 22, 23, 25, 28]),)
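As the output shows, where with a single condition returns a tuple holding one index array per dimension. A minimal sketch of unpacking it, with hypothetical return values chosen for illustration:

```python
import numpy as np

returns = np.array([0.01, -0.02, 0.03, 0.0, 0.005])

# For a 1-D input the tuple has exactly one element: the index array.
(positive_idx,) = np.where(returns > 0)

# A boolean mask selects the matching values directly.
positive_values = returns[returns > 0]

print(positive_idx)
print(positive_values)
```

Unpacking the tuple gives a plain index array, which is usually more convenient for further indexing or counting.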
(4) In investment studies, volatility is a measure of price movement. Historical volatility can be calculated from historical price data. Calculating historical volatility, such as annual or monthly volatility, requires the logarithmic returns. The annual volatility equals the standard deviation of the logarithmic returns divided by their mean, then divided by the square root of the reciprocal of the number of trading days in a year, usually taken to be 252.
With the std and mean functions, the code looks like this:
annual_volatility = np.std(logreturns) / np.mean(logreturns)
annual_volatility = annual_volatility / np.sqrt(1./252.)
(5) Note the division inside the sqrt call. In Python 2, integer division and floating-point division behave differently (Python 3 changed this), so there we must use floating-point numbers to get the correct result. The monthly volatility is calculated in a similar way:
annual_volatility * np.sqrt(1./12.)
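The division point can be illustrated in Python 3, where / is always true division and // is floor division. Floor division reproduces what 1/252 gave in Python 2. The variable names below are assumptions for illustration.

```python
import numpy as np

true_quotient = 1 / 252     # Python 3 true division: a small positive float
floor_quotient = 1 // 252   # floor division: 0, as Python 2's 1/252 would give

print(true_quotient == 1. / 252.)
print(floor_quotient)

# The scaling factor used for the annual volatility.
scale = np.sqrt(true_quotient)
print(scale)
```

With the floor-division result the square root would be 0, and dividing by it would no longer give a meaningful volatility.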
c = np.loadtxt('data.csv', delimiter=',', usecols=(6,), unpack=True)
returns = np.diff(c) / c[:-1]
print('Standard deviation of returns: {}'.format(np.std(returns)))
logreturns = np.diff(np.log(c))
posretindices = np.where(returns > 0)
print('Indices of positive returns: \n{}'.format(posretindices))
annual_volatility = np.std(logreturns) / np.mean(logreturns)
annual_volatility = annual_volatility / np.sqrt(1/252)
print('Annual volatility: {}'.format(annual_volatility))
print('Monthly volatility: {}'.format(annual_volatility * np.sqrt(1/12)))
Standard deviation of returns: 0.012922134436826306
Indices of positive returns: 
(array([ 0, 1, 4, 5, 6, 7, 9, 10, 11, 12, 16, 17, 18, 19, 21, 22, 23, 25, 28], dtype=int64),)
Annual volatility: 129.27478991115132
Monthly volatility: 37.318417377317765
This article is based on the book Python Data Analysis Basic Tutorial: NumPy Learning Guide.