Data Analysis Learning Notes (VII): Stock Price Analysis

Source: Internet
Author: User
Tags arithmetic diff square root unpack stock prices

This example uses NumPy to analyze stock prices, starting with reading and writing a CSV file.

CSV (comma-separated values) is a common file format. Database dumps are often CSV files, with each field in the file corresponding to a column in a database table.

The examples in this article all use the data in one CSV file, data.csv, whose structure is described below.

In each row, the first column is the stock code that identifies the stock (Apple's stock code is AAPL), the second column is the date in DD-MM-YYYY format, and the third column is empty. These are followed by the opening price, the highest price, the lowest price and the closing price, and the last column is the day's trading volume.

We can read and write such files with NumPy's loadtxt() and savetxt() functions.

import numpy as np

# Read data from a file
# 'data.csv': file name
# delimiter: field separator
# usecols: indices of the columns to read
# unpack: if True, return one array per column
# Read the closing price and volume data
c, v = np.loadtxt('data.csv', delimiter=',', usecols=(6, 7), unpack=True)

Write

x = y = z = np.arange(0.0, 5.0, 1.0)
np.savetxt('test.csv', x, delimiter=',')   # x is a one-dimensional array
np.savetxt('test.csv', (x, y, z))   # x, y, z are one-dimensional arrays of equal length
np.savetxt('test.csv', x, fmt='%1.4e')   # use exponential notation
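As a minimal sketch of the read and write calls together, the snippet below writes two made-up rows in the column layout described earlier to a temporary file, reads back the close and volume columns, and saves them again. The rows, file names and variable names are assumptions for illustration only and are not taken from data.csv.

```python
import os
import tempfile

import numpy as np

# Made-up rows: code, date (DD-MM-YYYY), empty column, open, high, low, close, volume
rows = [
    "AAPL,03-01-2011, ,100.0,102.0,99.0,101.0,1000",
    "AAPL,04-01-2011, ,101.5,104.0,101.0,103.5,1500",
]
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w") as f:
    f.write("\n".join(rows) + "\n")

# usecols selects column 6 (close) and column 7 (volume);
# unpack=True returns one array per selected column
close, volume = np.loadtxt(path, delimiter=",", usecols=(6, 7), unpack=True)

# Save the two columns side by side, one row per day, then read them back
out_path = os.path.join(os.path.dirname(path), "out.csv")
np.savetxt(out_path, np.column_stack((close, volume)), delimiter=",", fmt="%.2f")
roundtrip = np.loadtxt(out_path, delimiter=",")
```

Only the columns listed in usecols are converted to numbers, which is why the text columns (stock code, date, empty field) do not cause a parsing error here.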
Volume-Weighted Average Price

First read the closing price and volume data

c, v = np.loadtxt('data.csv', delimiter=',', usecols=(6, 7), unpack=True)

# Calculate the volume-weighted average price
vwap = np.average(c, weights=v)
v_mean = np.average(c)
print('Arithmetic average: {}, weighted average: {}'.format(v_mean, vwap))
''' Arithmetic average: 351.0376666666667, weighted average: 350.5895493532009 '''

# Time-weighted average price
t = np.arange(len(c))
twap = np.average(c, weights=t)
print('Time-weighted average: {}'.format(twap))
''' Time-weighted average: 352.4283218390804 '''
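To make the weighting explicit, here is a small sketch on made-up numbers (not the values in data.csv) showing that np.average with weights computes sum(w * x) / sum(w). The array names are illustrative.

```python
import numpy as np

# Made-up closing prices and volumes, for illustration only
close = np.array([10.0, 11.0, 12.0])
volume = np.array([100.0, 300.0, 600.0])

# Volume-weighted average: sum(volume * close) / sum(volume)
vwap = np.average(close, weights=volume)
vwap_manual = np.sum(close * volume) / np.sum(volume)

# Time weights 0, 1, 2, ...: later prices count more,
# and the first price gets weight 0
t = np.arange(len(close))
twap = np.average(close, weights=t)
```

Because the time weights start at 0, the first observation does not contribute to the time-weighted average at all.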
Maximum and Minimum Values

Find the highest price and the lowest price

# Read the highest and lowest prices
h, l = np.loadtxt('data.csv', delimiter=',', usecols=(4, 5), unpack=True)
# Maximum of the highest prices
h.max()     # 364.9
# Minimum of the lowest prices
l.min()     # 333.53

# Range (maximum minus minimum): np.ptp()
h_ptp = np.ptp(h)
l_ptp = np.ptp(l)
print('The range of the highest price is: {:.2f}'.format(h_ptp))
print('The range of the lowest price is: {:.2f}'.format(l_ptp))
'''
The range of the highest price is: 24.86
The range of the lowest price is: 26.97
'''
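As a quick check of what np.ptp() returns, this sketch uses a made-up array and compares the result with the maximum minus the minimum. The values are illustrative only.

```python
import numpy as np

# Made-up daily highs, for illustration only
high = np.array([12.0, 15.5, 14.0])

# "Peak to peak": the maximum minus the minimum
spread = np.ptp(high)
spread_manual = high.max() - high.min()
```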
Simple statistical analysis

Statistical analysis of the closing price

c = np.loadtxt('data.csv', delimiter=',', usecols=(6,), unpack=True)
print('Raw data: {}'.format(c))
'''
Raw data: [336.1  339.32 345.03 344.32 343.44 346.5  351.88 355.2  358.16 354.54
 356.85 359.18 359.9  363.13 358.3  350.56 338.61 342.62 342.88 348.16
 353.21 349.31 352.12 359.56 360.   355.36 355.76 352.47 346.67 351.99]
'''
# Median
print('Median: {}'.format(np.median(c)))
''' Median: 352.055 '''
# Verify by sorting
sorted_c = np.sort(c)
print('Sorted data is: {}'.format(sorted_c))
'''
Sorted data is: [336.1  338.61 339.32 342.62 342.88 343.44 344.32 345.03 346.5  346.67
 348.16 349.31 350.56 351.88 351.99 352.12 352.47 353.21 354.54 355.2
 355.36 355.76 356.85 358.16 358.3  359.18 359.56 359.9  360.   363.13]
'''
# The median is the mean of the two middle values, 351.99 and 352.12

# Variance
print('The variance of the stock price is: {}'.format(np.var(c)))
''' The variance of the stock price is: 50.126517888888884 '''
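The median and variance can also be computed from their definitions. The sketch below does so on a made-up array with an even number of elements; note that np.var() divides by the number of elements (the population variance) by default.

```python
import numpy as np

# Made-up prices, for illustration only
prices = np.array([4.0, 1.0, 3.0, 2.0])

# Median: with an even count, the mean of the two middle sorted values
ordered = np.sort(prices)
n = len(ordered)
median_manual = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# Variance: the mean squared deviation from the mean (divides by n)
variance_manual = np.mean((prices - prices.mean()) ** 2)
```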
Stock Return Rate

Stock return: the difference between consecutive closing prices divided by the previous closing price

# diff() subtracts each item from the next one, so the result has one element fewer;
# dividing by c[:-1] therefore leaves out the last closing price
returns = np.diff(c) / c[:-1]
print('Returns: {}'.format(returns))
'''
Returns: [ 0.00958048  0.01682777 -0.00205779 -0.00255576  0.00890985  0.0155267
  0.00943503  0.00833333 -0.01010721  0.00651548  0.00652935  0.00200457
  0.00897472 -0.01330102 -0.02160201 -0.03408832  0.01184253  0.00075886
  0.01539897  0.01450483 -0.01104159  0.00804443  0.02112916  0.00122372
 -0.01288889  0.00112562 -0.00924781 -0.0164553   0.01534601]
'''
print('Standard deviation of returns: {}'.format(np.std(returns)))
''' Standard deviation of returns: 0.012922134436826306 '''
# Find the positive returns
posretindices = np.where(returns > 0)
returns_value = returns[posretindices]
print('Positive return positions: {}, values: {}'.format(posretindices, returns_value))
'''
Positive return positions: (array([ 0,  1,  4,  5,  6,  7,  9, 10, 11, 12, 16, 17, 18, 19, 21, 22, 23, 25, 28]),), values: [0.00958048 0.01682777 0.00890985 0.0155267
 0.00943503 0.00833333 0.00651548 0.00652935 0.00200457 0.00897472 0.01184253 0.00075886 0.01539897 0.01450483 0.00804443 0.02112916 0.00122372
 0.00112562 0.01534601]
'''
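A tiny worked example may help here. Using three made-up prices, the sketch shows how np.diff() and the c[:-1] slice line up, and how np.where() returns the positions of the positive returns.

```python
import numpy as np

# Made-up closing prices, for illustration only
prices = np.array([100.0, 110.0, 99.0])

# diff gives [10, -11]; prices[:-1] gives [100, 110]
# so returns are [10/100, -11/110] = [0.1, -0.1]
returns = np.diff(prices) / prices[:-1]

# np.where returns a tuple of index arrays; [0] takes the first (only) axis
positive_positions = np.where(returns > 0)[0]
positive_values = returns[positive_positions]
```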
Return Volatility

Return volatility measures the uncertainty of an asset's return and reflects the risk level of a financial asset. The higher the volatility, the more sharply the asset price fluctuates and the greater the uncertainty of the return; the lower the volatility, the smoother the price movement and the more certain the return.

Formula: standard deviation of the log returns / mean of the log returns / square root of the reciprocal of the number of trading days

# Log returns; note that any 0 values must be removed from the data, because 0 has no logarithm
logreturns = np.diff(np.log(c))
print('Log returns: {}'.format(logreturns))
# Excluding weekends and holidays, a year has about 252 trading days
annual_volatility = np.std(logreturns) / np.mean(logreturns) / np.sqrt(1. / 252.)
print('Annual volatility is: {}'.format(annual_volatility))
''' Annual volatility is: 129.27478991115132 '''
month_volatility = annual_volatility * np.sqrt(1. / 12.)
print('Monthly volatility is: {}'.format(month_volatility))
''' Monthly volatility is: 37.318417377317765 '''
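The following sketch applies the same formula to a short made-up price series and shows how log returns relate to simple returns: each log return equals log(1 + simple return), and log returns add up across days. The prices and variable names are assumptions for illustration.

```python
import numpy as np

# Made-up closing prices, for illustration only (all strictly positive)
prices = np.array([100.0, 110.0, 99.0, 104.0])

simple_returns = np.diff(prices) / prices[:-1]
log_returns = np.diff(np.log(prices))

# The formula used in this article, with 252 trading days per year
annual = np.std(log_returns) / np.mean(log_returns) / np.sqrt(1.0 / 252.0)
monthly = annual * np.sqrt(1.0 / 12.0)

# Log returns sum to the log of the overall price ratio
total_log_return = np.sum(log_returns)
```

Note that the formula divides by the mean log return, so it is undefined when the first and last prices are equal.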
Data Rollup

Analysis by Day of the Week

We group the closing prices by day of the week and compute the average closing price for each weekday.

First, to make processing easier, we convert the dates in the data to numeric weekday indices

from datetime import datetime
'''
Monday: 0
Tuesday: 1
Wednesday: 2
Thursday: 3
Friday: 4
Saturday: 5
Sunday: 6
'''
# Date conversion function
def datestr2num(s):
    # Older NumPy versions pass bytes to converters; convert bytes to str if needed
    if isinstance(s, bytes):
        s = s.decode('utf-8')
    return datetime.strptime(s, '%d-%m-%Y').date().weekday()
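As a standalone check of the conversion, the sketch below defines a helper with the same logic under the hypothetical name weekday_index and applies it to two dates in DD-MM-YYYY format, once as bytes and once as str.

```python
from datetime import datetime


def weekday_index(s):
    """Hypothetical helper: return 0 (Monday) to 6 (Sunday) for a DD-MM-YYYY date."""
    if isinstance(s, bytes):
        s = s.decode("utf-8")
    return datetime.strptime(s, "%d-%m-%Y").date().weekday()


friday = weekday_index(b"28-01-2011")  # 28 January 2011 was a Friday
monday = weekday_index("31-01-2011")   # 31 January 2011 was a Monday
```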

Read the date and closing price data

# converters maps a column index to a conversion function; here column 1 is processed by datestr2num
dates, close = np.loadtxt('data.csv', delimiter=',', usecols=(1, 6), converters={1: datestr2num}, unpack=True)
print('dates: {}'.format(dates))
'''
dates: [4. 0. 1. 2. 3. 4. 0. 1. 2. 3. 4. 0. 1. 2. 3. 4. 1. 2. 3. 4. 0. 1. 2. 3.
 4. 0. 1. 2. 3. 4.]
'''
# The numbers 0-4 correspond to Monday through Friday

Average by Day of the Week

# Create an array for the 5 weekday averages
averages = np.zeros(5)
for i in range(5):
    # Find the positions of each weekday
    indexs = np.where(dates == i)
    # Take out the corresponding prices
    prices = np.take(close, indices=indexs)  # selects the same values as close[indexs]
    avg = np.mean(prices)  # compute the mean
    print('Day {} prices: {} average: {:.2f}'.format(i, prices, avg))
    averages[i] = avg  # store the result
'''
Day 0 prices: [[339.32 351.88 359.18 353.21 355.36]] average: 351.79
Day 1 prices: [[345.03 355.2  359.9  338.61 349.31 355.76]] average: 350.64
Day 2 prices: [[344.32 358.16 363.13 342.62 352.12 352.47]] average: 352.14
Day 3 prices: [[343.44 354.54 358.3  342.88 359.56 346.67]] average: 350.90
Day 4 prices: [[336.1  346.5  356.85 350.56 348.16 360.   351.99]] average: 350.02
'''

While we are at it, find the highest and lowest of the 5 weekday averages

print('The highest average is: {:.2f}, on weekday {}'.format(np.max(averages), np.argmax(averages)))
print('The lowest average is: {:.2f}, on weekday {}'.format(np.min(averages), np.argmin(averages)))

'''
The highest average is: 352.14, on weekday 2
The lowest average is: 350.02, on weekday 4
'''
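The same grouping can be written with boolean masks instead of np.where() and np.take(). This is a sketch on made-up weekday indices and prices, not the values from data.csv.

```python
import numpy as np

# Made-up weekday indices (0 = Monday ... 4 = Friday) and closing prices
days = np.array([4, 0, 1, 2, 3, 4, 0])
close = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 16.0, 13.0])

# days == d is a boolean mask selecting the prices that fall on weekday d
averages = np.array([close[days == d].mean() for d in range(5)])

# argmax / argmin return the position of the first largest / smallest value
best_day = int(np.argmax(averages))
worst_day = int(np.argmin(averages))
```

When two weekdays tie, argmax and argmin report the first one, so here Monday (index 0) is reported as the lowest even though Tuesday has the same average.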
Weekly Summary

Sometimes we want a statistical summary by week. For example, for the first three weeks we want to see the Monday opening price, the Friday closing price, and the highest and lowest prices of each week.

The first step is again to read the data we need

dates, open, high, low, close = np.loadtxt('data.csv', delimiter=',', usecols=(1, 3, 4, 5, 6), converters={1: datestr2num}, unpack=True)

Next, create the indices for the first three weeks. For each of the three weeks, the summary will contain the stock name, the Monday opening price, the highest price of the week, the lowest price of the week, and the Friday closing price

# split() divides an array into equal parts
weeks_indices = np.split(np.arange(0, 15), 3)
print('Weeks indices: {}'.format(weeks_indices))
''' Weeks indices: [array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9]), array([10, 11, 12, 13, 14])] '''
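One detail worth knowing is that np.split() only accepts an integer number of sections when the array divides evenly. The sketch below shows this on index arrays of length 15 and 16, and uses np.array_split() as the variant that tolerates uneven sizes.

```python
import numpy as np

# 15 indices divide evenly into 3 weeks of 5 trading days
weeks = np.split(np.arange(15), 3)

# 16 indices do not divide evenly, so np.split raises ValueError
try:
    np.split(np.arange(16), 3)
    split_failed = False
except ValueError:
    split_failed = True

# np.array_split allows sections of unequal length
uneven = np.array_split(np.arange(16), 3)
uneven_sizes = [len(part) for part in uneven]
```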

Here is a brief introduction to the apply_along_axis() function

numpy.apply_along_axis(func1d, axis, arr, *args, **kwargs)
Required parameters: func1d, axis, arr. func1d is our custom function; it receives a one-dimensional slice of arr taken along the given axis and returns the result for that slice.
axis is the axis of arr along which func1d is applied. For a two-dimensional array, 1 means along each row (horizontal) and 0 means along each column (vertical).
Optional parameters: *args, **kwargs. These are extra arguments passed on to func1d.
Return value: numpy.apply_along_axis() returns an array built from the results of func1d along the given axis.

Example:

def my_func(a):
    # Mean of the first and last elements of a one-dimensional slice
    return (a[0] + a[-1]) * 0.5

b = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

res = np.apply_along_axis(my_func, 0, b)   # vertical: one result per column

print(res)  # [5. 6. 7. 8.]

res = np.apply_along_axis(my_func, 1, b)   # horizontal: one result per row

print(res)  # [ 2.5  6.5 10.5]

Back to the task: we define a summarize() function to compute the weekly summary

def summarize(a, o, h, l, c):
    monday_open = o[a[0]]  # Monday open
    week_high = np.max(np.take(h, a))  # highest price of the week
    week_low = np.min(np.take(l, a))  # lowest price of the week
    friday_close = c[a[-1]]  # Friday close
    return ("APPL", monday_open, week_high, week_low, friday_close)

# Call apply_along_axis(), passing in the relevant arguments
weeksummary = np.apply_along_axis(summarize, 1, weeks_indices, open, high, low, close)
print('Week summary: {}'.format(weeksummary))
'''
Week summary: [['APPL' '344.17' '345.65' '333.53' '343.44']
 ['APPL' '343.61' '360.0' '343.51' '354.54']
 ['APPL' '354.75' '364.9' '353.54' '358.3']]
'''
# Save the processed data locally
np.savetxt('weeksummary.csv', weeksummary, delimiter=',', fmt='%s')

In the project directory, locate the weeksummary.csv file and open it to check the result.
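Because summarize() returns a tuple that mixes a string with numbers, the summary above comes back as an array of strings. As an alternative sketch, assuming only the numeric columns are needed, a plain loop over the week indices keeps everything as floats. The prices here are made up and the name open_ is used only to avoid shadowing Python's built-in open.

```python
import numpy as np

# Made-up prices for 10 trading days (two 5-day weeks), for illustration only
open_ = 100.0 + np.arange(10.0)
high = open_ + 2.0
low = open_ - 2.0
close = open_ + 1.0

weeks = np.split(np.arange(10), 2)

# One row per week: Monday open, weekly high, weekly low, Friday close
summary = np.array([
    [open_[w[0]], high[w].max(), low[w].min(), close[w[-1]]]
    for w in weeks
])
```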

Average True Range (ATR)

Introduction to ATR

1. The average true range (ATR) was introduced by J. Welles Wilder. The indicator is mainly used to measure the volatility of security prices, so it does not directly reflect the price direction or the stability of a trend; it only indicates the magnitude of price fluctuations.

2. Extremely high or low ATR values can be viewed as a sign of a reversal of the price trend or the start of the next trend. A lower ATR indicates a quieter market, while a higher ATR indicates more active trading. A long period of low ATR likely means the market is building up strength before the next price trend begins, while a very high ATR is usually caused by a sharp rise or fall in prices over a short period and usually cannot stay that high for long.

3. Traders can also use ATR to set their stop-loss and take-profit prices. Since ATR measures the true range of a currency pair over a certain period, that range can serve as a basis for calculating stop-loss and take-profit levels.

TR: True Range

TR = the largest of |highest price - lowest price|, |highest price - previous close| and |previous close - lowest price|
ATR = the N-day simple moving average of TR; the parameter N defaults to 14 days

First we calculate TR

h, l, c = np.loadtxt('data.csv', delimiter=',', usecols=(4, 5, 6), unpack=True)
# First compute the TR of all the data
# Note: this uses the same day's close c, so TR reduces to h - l here; the definition above uses the previous day's close
# np.maximum compares two arrays at a time, so the calls are nested
truerange = np.maximum(np.maximum(np.fabs(h - l), np.fabs(h - c)), np.fabs(c - l))
print('True range: {}'.format(truerange))
'''
True range: [10.87  5.74  4.67  1.7   5.69  3.19  5.61  3.37  4.13 12.
  4.26  2.77  2.42  4.4   3.75  9.98  7.68  6.03  6.78  3.63
  3.93  8.04  5.95  3.87  2.54 10.36  5.15  4.16  4.87  7.32]
'''

Generally speaking, the average true range (ATR) is calculated over 7 or 14 periods. A period can be an intraday interval, a day, or even a week or a month. The first ATR is usually the simple arithmetic mean of TR over the first 7 or 14 days.

Calculate the first ATR, taking the period N = 14

# Period
N = 14
atr0 = np.mean(truerange[:N])
print('The first ATR is: {}'.format(atr0))
''' The first ATR is: 5.058571428571428 '''
# Number of remaining days for which ATR is calculated
n = len(truerange) - N
print('The number of days to calculate ATR is: {}'.format(n))
''' The number of days to calculate ATR is: 16 '''
# Create an array of n zeros to store the ATR values
atr = np.zeros(n)
# atr[0] is atr0
atr[0] = atr0
# The remaining ATR values are computed as follows:
'''
1. Multiply the previous ATR (covering the preceding 14 days) by 13
2. Add the new day's TR to the result of step 1
3. Divide the result of step 2 by 14
'''
for i in range(1, n):
    # Note: truerange[i] reproduces the output below; the TR of each new day after the first N days would be truerange[N + i - 1]
    atr[i] = (N - 1) * atr[i - 1] + truerange[i]
    atr[i] /= N
print('atr: {}'.format(atr))
'''
atr: [5.05857143 5.1072449  5.07601312 4.83486933 4.89595009 4.77409651
 4.8338039  4.72924648 4.68644316 5.20884008 5.14106579 4.97170394
 4.78943938 4.76162228 4.68936354 5.06726615]
'''
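For comparison, here is a minimal sketch of the TR and ATR steps exactly as the definition states them: TR uses the previous day's close, and each smoothing step adds the TR of the next new day. The function name average_true_range, its period parameter and the sample prices are assumptions for illustration, so its results will differ from the output printed above.

```python
import numpy as np


def average_true_range(high, low, close, period=14):
    """Hypothetical helper: TR with the previous close, then Wilder-style smoothing."""
    high, low, close = (np.asarray(a, dtype=float) for a in (high, low, close))
    prev_close = close[:-1]          # yesterday's close for days 1..end
    h, l = high[1:], low[1:]         # today's high and low for days 1..end
    # Element-wise maximum of the three spreads
    tr = np.maximum.reduce([h - l, np.abs(h - prev_close), np.abs(prev_close - l)])
    atr = np.empty(len(tr) - period + 1)
    atr[0] = tr[:period].mean()      # first ATR: simple mean of the first `period` TRs
    for i in range(1, len(atr)):
        # (previous ATR * (period - 1) + the new day's TR) / period
        atr[i] = ((period - 1) * atr[i - 1] + tr[period + i - 1]) / period
    return tr, atr


# Made-up prices; on day 1 the price gaps up, so |high - previous close| exceeds high - low
high = [10.0, 12.0, 13.0, 12.0]
low = [8.0, 11.0, 11.0, 10.0]
close = [9.0, 11.5, 12.0, 11.0]
tr, atr = average_true_range(high, low, close, period=2)
```

The first day has no previous close, so this sketch yields one TR value fewer than the number of input days.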
Moving Averages: convolve()

Simple Moving Average

A simple moving average is typically used to analyze time series data, for example the N-day moving average of a stock's closing price.

import matplotlib.pyplot as plt
# Read data
c = np.loadtxt('data.csv', delimiter=',', usecols=(6,), unpack=True)
N = 5

# Define the moving window and weights
weights = np.ones(N) / N
''' array([0.2, 0.2, 0.2, 0.2, 0.2]) '''
# Call the convolution function convolve and keep only the part where the window
# fully overlaps the closing prices, so the result has len(c) - N + 1 elements
sample = np.convolve(weights, c)[N-1:-(N-1)]
# Use time as the x-axis coordinate
time = np.arange(N - 1, len(c))
# Plot the actual closing price in red
plt.plot(time, c[N-1:], linewidth=1.0, color='red')
# Plot the simple moving average in green
plt.plot(time, sample, linewidth=1.0, color='green')
plt.show()

As the figure shows, the red line is the actual closing price and the green line is the simple moving average. The moving average lags behind the actual values and is smoother.
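To see what the [N-1:-(N-1)] slice keeps, this sketch compares it on a made-up series with np.convolve(..., mode='valid') and with the window means computed directly. All three give the same values.

```python
import numpy as np

# Made-up closing prices, for illustration only
c = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
N = 3
weights = np.ones(N) / N

# Full convolution has len(c) + N - 1 values; the slice drops the N - 1
# partially overlapping values at each end
sliced = np.convolve(weights, c)[N - 1:-(N - 1)]

# mode='valid' returns only the fully overlapping part directly
valid = np.convolve(weights, c, mode='valid')

# The same values as plain means over each window of N prices
manual = np.array([c[i:i + N].mean() for i in range(len(c) - N + 1)])
```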

For more on the convolution function convolve, see the related article: Understanding convolve in NumPy

Exponential Moving Average

Compared with the simple moving average, the weights of the exponential moving average decay exponentially: the further a data point lies from the current time, the smaller its weight.
We use the exp() function to compute the exponential values and the linspace() function to get an evenly spaced array within the specified range

# Compute the weights of the exponential moving average
N = 5
weights = np.exp(np.linspace(0, 1, N))
# Normalize so that the weights sum to 1
weights = weights / np.sum(weights)
print('Weights: {}'.format(weights))
''' Weights: [0.11405072 0.14644403 0.18803785 0.24144538 0.31002201] '''

The weights are no longer all 0.2: the closer a data point is to the current time, the larger its weight, and the further away, the smaller.

On top of the simple moving average plot, we add the exponential moving average

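# Convolution applies the kernel in reverse order, so the weights are reversed here
# to give the largest weight to the most recent price
exponent = np.convolve(weights[::-1], c)[N-1:-(N-1)]
# Plot the exponential moving average in blue
plt.plot(time, exponent, linewidth=1.0, color='blue')
plt.show()  # call show() last, after all lines have been plotted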

The blue line in the figure is the exponential moving average.
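One point to watch with convolve() is the direction in which the weights are applied: convolution runs the kernel in reverse over the data. The sketch below, on a made-up series, shows that passing the reversed weights makes the largest weight fall on the newest price in each window, while passing them unreversed puts the largest weight on the oldest price.

```python
import numpy as np

# Made-up closing prices, for illustration only
c = np.array([1.0, 2.0, 3.0, 4.0])
N = 3

# Increasing, normalized weights: the last weight is the largest
w = np.exp(np.linspace(0, 1, N))
w = w / w.sum()

# Direct definition: w[0] on the oldest price, w[-1] on the newest price of each window
manual = np.array([np.dot(w, c[i:i + N]) for i in range(len(c) - N + 1)])

# Reversed weights passed to convolve reproduce the direct definition
newest_heavy = np.convolve(w[::-1], c, mode='valid')

# Unreversed weights put the largest weight on the oldest price instead
oldest_heavy = np.convolve(w, c, mode='valid')
```

For a rising series like this one, the newest-heavy average is higher than the oldest-heavy one, which is a quick way to tell the two apart.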

Other related article: Drawing stock moving averages with Python

Bollinger Bands

A band used to describe the range within which a stock price moves
Composition: three lines: upper band, middle band, lower band
The middle band is the simple moving average. The upper band is the simple moving average plus two times the N-day standard deviation, and the lower band is the simple moving average minus two times the N-day standard deviation

First, read the data and calculate the simple moving average.

N = 5
weights = np.ones(N) / N
c = np.loadtxt('data.csv', delimiter=',', usecols=(6,), unpack=True)
# Simple moving average
sample = np.convolve(weights, c)[N-1:-(N-1)]
# List of standard deviations
deviation = []

Calculate the N-day standard deviation

for i in range(N - 1, len(c)):
    # dev holds the closing prices of the N days ending at day i
    dev = c[i - (N - 1): i + 1]

    # Create an array of N zeros
    averages = np.zeros(N)
    # Fill the array with the simple moving average for this window;
    # sample is indexed from 0, so the matching entry is sample[i - (N - 1)]
    averages.fill(sample[i - (N - 1)])
    # Compute the standard deviation
    dev = dev - averages
    dev = dev ** 2
    dev = np.sqrt(np.mean(dev))
    deviation.append(dev)

# Two times the standard deviation
deviation = 2 * np.array(deviation)

The data for the upper and lower bands are then:

# Upper band
upperBB = sample + deviation
# Lower band
lowerBB = sample - deviation

Draw the Bollinger Bands

time = np.arange(N - 1, len(c))
# Actual closing price
plt.plot(time, c[N-1:], linewidth=1.0, color='yellow')
# Simple moving average
plt.plot(time, sample, linewidth=1.0, color='green')
# Upper band
plt.plot(time, upperBB, linewidth=2.0, color='red', linestyle='--')
# Lower band
plt.plot(time, lowerBB, linewidth=2.0, color='green', linestyle='--')
plt.show()

As the figure shows, the two dashed lines are the Bollinger Bands, the solid green line in the middle is the simple moving average, and the yellow line is the actual closing price.
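The band calculation can also be written without an explicit loop. This is a sketch on made-up prices that assumes NumPy 1.20 or later for sliding_window_view; the names middle, upper and lower are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Made-up closing prices, for illustration only
c = np.array([1.0, 2.0, 3.0, 4.0, 6.0])
N = 3

# Each row is one window of N consecutive closing prices
windows = sliding_window_view(c, N)

# Middle band: the simple moving average of each window
middle = windows.mean(axis=1)

# Two times the N-day standard deviation of each window
# (std divides by N by default, like np.sqrt(np.mean(...)) in a manual loop)
deviation = 2 * windows.std(axis=1)

upper = middle + deviation
lower = middle - deviation
```

The middle band computed this way matches the convolve-based simple moving average, and each band has len(c) - N + 1 points.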
