Stock market data analysis with Python: an essential skill for investors

Last Update:2018-08-22 Source: Internet

Author: User

Tags benchmark stock prices


Trading strategy

A position is a trade that stays open until some expected condition is met, at which point it is closed. A long position makes a profit when the price of the financial instrument rises, and a short position makes a profit when the price falls. When trading stocks directly, a long position is a bullish bet and a short position is a bearish bet. The two need not go together, though: a bullish view does not have to be expressed with a long position, nor a bearish view with a short one. (This is very typical in stock options trading.)


We want to design a trading strategy that includes the signals that trigger trades, a rule for deciding how much to stake on each trade, and a complete exit strategy. Our goal is to design and evaluate such a strategy.

Assume that the amount staked on each trade is a fixed proportion (10%) of the total portfolio. Assume also that we exit any position whose loss exceeds 20% of the value of the trade. What remains is to decide when to enter a position and when to exit it so that the strategy is profitable.

Here I will demonstrate a moving average crossover strategy. It uses two moving averages, one fast and one slow. The strategy is to:

Open a trade when the fast moving average crosses the slow moving average
Close the trade when the fast moving average crosses the slow moving average again

A long trade opens when the fast moving average rises above the slow one and closes when the fast average falls back below the slow one. A short trade is the opposite: it opens when the fast moving average falls below the slow one and closes when the fast average rises back above it.

We now have a complete strategy, but we need to test it before using it. Backtesting is a common way to do this: it uses historical data to see whether the strategy would have been profitable. For example, look at the chart of Apple's stock price. If the 20-day moving average is the fast line and the 50-day moving average is the slow line, the strategy does not look very profitable, at least not if you only take long positions.

Let's automate the backtesting process. First we want to identify when the 20-day average is below the 50-day average, and when it is above.

apple["20d-50d"] = apple["20d"] - apple["50d"]
apple.tail()
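The "20d" and "50d" columns used above are not built in this section. As a minimal sketch of how such columns could be computed with pandas rolling means, the example below uses a synthetic random-walk price series in place of real Apple data; the `prices` frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices stand in for real data (assumption: any
# DataFrame with a "Close" column indexed by date works the same way).
dates = pd.date_range("2016-01-01", periods=120, freq="B")
rng = np.random.default_rng(0)
prices = pd.DataFrame({"Close": 100 + rng.normal(0, 1, len(dates)).cumsum()},
                      index=dates)

# Fast (20-day) and slow (50-day) simple moving averages.
prices["20d"] = prices["Close"].rolling(window=20).mean()
prices["50d"] = prices["Close"].rolling(window=50).mean()

# Positive when the fast average is above the slow one, negative when below.
prices["20d-50d"] = prices["20d"] - prices["50d"]
print(prices.tail())
```

The first 49 rows of the difference are NaN because the 50-day window is not yet full, which is worth remembering when the sign of the difference is turned into a regime.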

We call the sign of this difference the regime. When the fast moving average is above the slow moving average the regime is bullish, and when it is below the regime is bearish. The following code identifies the regimes.

# np.where() is a vectorized if-else function: a condition is checked for each component of a vector, and the first argument passed is used when the condition holds, the other when it does not
apple["Regime"] = np.where(apple["20d-50d"] > 0, 1, 0)
# We have 1's for bullish regimes and 0's for everything else. Below I replace bearish regimes' values with -1, and to maintain the rest of the vector, the second argument is apple["Regime"]
apple["Regime"] = np.where(apple["20d-50d"] < 0, -1, apple["Regime"])
apple.loc["2016-01-01":"2016-08-07", "Regime"].plot(ylim=(-2, 2)).axhline(y=0, color="black", lw=2)

apple["Regime"].plot(ylim=(-2, 2)).axhline(y=0, color="black", lw=2)

apple["Regime"].value_counts()

 1    966
-1    663
 0     50
Name: Regime, dtype: int64

The output above shows that Apple stock was in a bullish regime for 966 days and in a bearish regime for 663 days, and that the market was neutral for 50 days. (Translator's note: the original text had bull and bear reversed and gave different counts; both are corrected here to match the code output.)
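To make the labelling concrete, here is a small sketch on made-up difference values. It applies the same two-step np.where() labelling and compares it with np.sign(), which gives the same labels in one call; the `diff` values are invented for the example.

```python
import numpy as np
import pandas as pd

# Made-up values of the fast-minus-slow difference.
diff = pd.Series([0.8, 0.1, 0.0, -0.3, -0.5, 0.2])

# Two-step labelling: 1 for bullish, then -1 for bearish, 0 otherwise.
regime = np.where(diff > 0, 1, 0)
regime = np.where(diff < 0, -1, regime)

# np.sign gives the same labels here. One difference: a NaN input stays
# NaN under np.sign, whereas the np.where version labels it 0.
regime_sign = np.sign(diff)

print(regime.tolist())
print(regime_sign.tolist())
```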

Trading signals appear at regime changes. When a bullish regime begins, a buy signal is triggered, and when it ends, a sell signal is triggered. Likewise, a sell signal is triggered when a bearish regime begins and a buy signal when it ends. (The bearish signals are only of interest if you short stocks or use some other instrument, such as stock options, to bet against the market.)

# To ensure that all trades close out, I temporarily change the regime of the last row to 0
last = apple.index[-1]  # .ix was removed from pandas; use label-based .loc instead
regime_orig = apple.loc[last, "Regime"]
apple.loc[last, "Regime"] = 0
apple["Signal"] = np.sign(apple["Regime"] - apple["Regime"].shift(1))
# Restore original regime data
apple.loc[last, "Regime"] = regime_orig
apple.tail()

apple["Signal"].plot(ylim=(-2, 2))

apple["Signal"].value_counts()

 0.0    1637
-1.0      21
 1.0      20
Name: Signal, dtype: float64

We would buy Apple stock 20 times and sell it 21 times. (Translator's note: the original figures did not match the code output and are corrected here to match it.) If we only take long positions in Apple stock, only 21 trades take place over the six years; if we switch from long to short every time a bullish regime ends, we would also take part in 21 trades. (Keep in mind that more trades are not necessarily better; after all, trades are not free.)
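The signal calculation is easy to follow on a toy regime series. The sketch below uses invented regime values and the same steps as the code above: force the last regime to 0, difference the series, and take the sign.

```python
import numpy as np
import pandas as pd

# Invented regime labels for seven consecutive business days.
regime = pd.Series([0, 1, 1, -1, -1, 1, 1],
                   index=pd.date_range("2016-01-04", periods=7, freq="B"))

# Force the final regime to 0 so that any open position is closed out.
closed = regime.copy()
closed.iloc[-1] = 0

# +1 marks a buy (regime rose), -1 a sell (regime fell), 0 no change.
# The first value is NaN because there is no previous day to compare with.
signal = np.sign(closed - closed.shift(1))
print(signal)
print(signal.value_counts())
```

Note that a jump straight from 1 to -1 produces a single -1 signal: the sale that closes the long trade is also the one that would open a short trade.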

You may have noticed that the system is not very robust. The fast moving average triggers a trade as soon as it rises above the slow one, even if it stays there only for a moment, so a trade can be closed almost immediately after it is opened (which is bad, because in the real world the fee charged on every trade quickly eats into profits). In addition, every bullish regime turns directly into a bearish one, so if you allow both long and short trades you get the odd situation that closing one trade automatically triggers another in the opposite direction. A better system would require more evidence that the market is moving in a particular direction, but we will not pursue that detail here.

Let's look at the stock price at each buy and each sell.

apple.loc[apple["Signal"] == 1, "Close"]

Date
2010-03-16    224.449997
2010-06-18    274.070011
2010-09-20    283.230007
2011-05-12    346.569988
2011-07-14    357.770004
2011-12-28    402.640003
2012-06-25    570.770020
2013-05-17    433.260010
2013-07-31    452.529984
2013-10-16    501.110001
2014-03-26    539.779991
2014-04-25    571.939980
2014-08-18     99.160004
2014-10-28    106.739998
2015-02-05    119.940002
2015-04-28    130.559998
2015-10-27    114.550003
2016-03-11    102.260002
2016-07-01     95.889999
2016-07-25     97.339996
Name: Close, dtype: float64

apple.loc[apple["Signal"] == -1, "Close"]

Date
2010-06-11    253.509995
2010-07-22    259.020000
2011-03-30    348.630009
2011-03-31    348.510006
2011-05-27    337.409992
2011-11-17    377.410000
2012-05-09    569.180023
2012-10-17    644.610001
2013-06-26    398.069992
2013-10-03    483.409996
2014-01-28    506.499977
2014-04-22    531.700020
2014-06-11     93.860001
2014-10-17     97.669998
2015-01-05    106.250000
2015-04-16    126.169998
2015-06-25    127.500000
2015-12-18    106.029999
2016-05-05     93.239998
2016-07-08     96.680000
2016-09-01    106.730003
Name: Close, dtype: float64

# Create a DataFrame with trades, including the price at the trade and the regime under which the trade is made.
apple_signals = pd.concat([
    pd.DataFrame({"Price": apple.loc[apple["Signal"] == 1, "Close"],
                  "Regime": apple.loc[apple["Signal"] == 1, "Regime"],
                  "Signal": "Buy"}),
    # Sell rows take their price from the days on which the signal is -1
    pd.DataFrame({"Price": apple.loc[apple["Signal"] == -1, "Close"],
                  "Regime": apple.loc[apple["Signal"] == -1, "Regime"],
                  "Signal": "Sell"}),
])
apple_signals.sort_index(inplace=True)
apple_signals

# Let's see the profitability of long trades
# A long trade opens at a "Buy" made under a bullish regime; the row right after it is the trade that closes it
long_open = (apple_signals["Signal"] == "Buy") & (apple_signals["Regime"] == 1)
long_close = apple_signals.loc[long_open.shift(1, fill_value=False)].index
apple_long_profits = pd.DataFrame({
    "Price": apple_signals.loc[long_open, "Price"],
    "Profit": (apple_signals["Price"] - apple_signals["Price"].shift(1)).loc[long_close].tolist(),
    "End Date": long_close
})
apple_long_profits
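The pairing of each buy with the sell that closes it can be hard to read in the code above. The following sketch shows the same idea on four invented signals. It looks one row ahead with shift(-1) rather than one row back, which is an alternative formulation chosen for readability, not the article's exact code.

```python
import pandas as pd

# Invented alternating signals: each "Buy" is followed by the "Sell" that closes it.
signals = pd.DataFrame(
    {"Price": [100.0, 110.0, 105.0, 95.0],
     "Signal": ["Buy", "Sell", "Buy", "Sell"]},
    index=pd.to_datetime(["2016-01-04", "2016-02-01", "2016-03-01", "2016-04-01"]))

# For every row, look one row ahead to find the closing price and date.
next_price = signals["Price"].shift(-1)
next_date = pd.Series(signals.index, index=signals.index).shift(-1)

is_buy = signals["Signal"] == "Buy"
long_trades = pd.DataFrame({"Price": signals.loc[is_buy, "Price"],
                            "Profit": (next_price - signals["Price"])[is_buy],
                            "End Date": next_date[is_buy]})
print(long_trades)
```

Each row of `long_trades` is one completed long trade: the entry price, the profit per share, and the date the trade was closed.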

From the table above it looks as though Apple's stock price fell sharply on May 17, 2013, and that our system would have done very badly. But that drop was not caused by a major shock to Apple; it was simply a stock split. Dividend payments are less conspicuous than a split, but they may also be affecting the measured performance of the system.

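# Let's see the result over the whole period for which we have Apple data
# (the stick width was garbled in this copy of the article, so it is left as a placeholder to fill in)
pandas_candlestick_ohlc(apple, stick=..., otherseries=["20d", "50d", "200d"])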

We do not want the performance of our trading system to be affected by dividends and splits. One solution is to obtain historical dividend and split data and build them into the trading system, which would reflect the real behavior of the market most faithfully and help us find the best solution, but this approach is more complicated. The other option is to adjust the stock prices for dividends and splits.

Yahoo! Finance only provides an adjusted closing price, but that is all we need to adjust the opening, high, and low prices as well. Each price is scaled by the same ratio that turns the close into the adjusted close:

adjusted price = price × (adjusted close / close)

Let's go back to the beginning, adjust the stock prices first, and then evaluate our trading system.

def ohlc_adj(dat):
    """
    :param dat: pandas DataFrame with stock data, including "Open", "High", "Low", "Close", and "Adj Close", with "Adj Close" containing adjusted closing prices

    :return: pandas DataFrame with adjusted stock data

    This function adjusts stock data for splits, dividends, etc., returning a data frame with
    "Open", "High", "Low" and "Close" columns. The input DataFrame is similar to that returned
    by pandas Yahoo! Finance API.
    """
    return pd.DataFrame({"Open": dat["Open"] * dat["Adj Close"] / dat["Close"],
                         "High": dat["High"] * dat["Adj Close"] / dat["Close"],
                         "Low": dat["Low"] * dat["Adj Close"] / dat["Close"],
                         "Close": dat["Adj Close"]})

apple_adj = ohlc_adj(apple)
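A worked example may help show what the adjustment does. The quotes below are hypothetical, chosen so that the arithmetic is easy to check: they imitate the day before and the day of a 7-for-1 split, where the raw pre-split prices are seven times the adjusted ones.

```python
import pandas as pd

# Hypothetical quotes around a 7-for-1 split (not real market data).
raw = pd.DataFrame({"Open": [630.0, 92.0],
                    "High": [651.0, 94.0],
                    "Low": [623.0, 91.0],
                    "Close": [644.0, 93.0],
                    "Adj Close": [92.0, 93.0]},
                   index=pd.to_datetime(["2014-06-06", "2014-06-09"]))

# The ratio is 1/7 before the split and 1 after it.
ratio = raw["Adj Close"] / raw["Close"]

# Scale open, high and low by the same ratio; the close becomes the adjusted close.
adjusted = pd.DataFrame({"Open": raw["Open"] * ratio,
                         "High": raw["High"] * ratio,
                         "Low": raw["Low"] * ratio,
                         "Close": raw["Adj Close"]})
print(adjusted.round(2))
```

After the adjustment the two days sit on the same scale, so a moving average computed across the split date no longer sees an artificial collapse in price.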

# This next code repeats all the earlier analysis we did, on the adjusted data
apple_adj["20d"] = np.round(apple_adj["Close"].rolling(window=20, center=False).mean(), 2)
apple_adj["50d"] = np.round(apple_adj["Close"].rolling(window=50, center=False).mean(), 2)
apple_adj["200d"] = np.round(apple_adj["Close"].rolling(window=200, center=False).mean(), 2)

apple_adj["20d-50d"] = apple_adj["20d"] - apple_adj["50d"]
# np.where() is a vectorized if-else function: a condition is checked for each component of a vector, and the first argument passed is used when the condition holds, the other when it does not
apple_adj["Regime"] = np.where(apple_adj["20d-50d"] > 0, 1, 0)
# We have 1's for bullish regimes and 0's for everything else. Below I replace bearish regimes' values with -1, and to maintain the rest of the vector, the second argument is apple_adj["Regime"]
apple_adj["Regime"] = np.where(apple_adj["20d-50d"] < 0, -1, apple_adj["Regime"])
# To ensure that all trades close out, I temporarily change the regime of the last row to 0
last = apple_adj.index[-1]  # .ix was removed from pandas; use label-based .loc instead
regime_orig = apple_adj.loc[last, "Regime"]
apple_adj.loc[last, "Regime"] = 0
apple_adj["Signal"] = np.sign(apple_adj["Regime"] - apple_adj["Regime"].shift(1))
# Restore original regime data
apple_adj.loc[last, "Regime"] = regime_orig

# Create a DataFrame with trades, including the price at the trade and the regime under which the trade is made.
apple_adj_signals = pd.concat([
    pd.DataFrame({"Price": apple_adj.loc[apple_adj["Signal"] == 1, "Close"],
                  "Regime": apple_adj.loc[apple_adj["Signal"] == 1, "Regime"],
                  "Signal": "Buy"}),
    pd.DataFrame({"Price": apple_adj.loc[apple_adj["Signal"] == -1, "Close"],
                  "Regime": apple_adj.loc[apple_adj["Signal"] == -1, "Regime"],
                  "Signal": "Sell"}),
])
apple_adj_signals.sort_index(inplace=True)

# A long trade opens at a "Buy" made under a bullish regime; the row right after it closes the trade
long_open = (apple_adj_signals["Signal"] == "Buy") & (apple_adj_signals["Regime"] == 1)
long_close = apple_adj_signals.loc[long_open.shift(1, fill_value=False)].index
apple_adj_long_profits = pd.DataFrame({
    "Price": apple_adj_signals.loc[long_open, "Price"],
    "Profit": (apple_adj_signals["Price"] - apple_adj_signals["Price"].shift(1)).loc[long_close].tolist(),
    "End Date": long_close
})

# (the stick width was garbled in this copy of the article, so it is left as a placeholder to fill in)
pandas_candlestick_ohlc(apple_adj, stick=..., otherseries=["20d", "50d", "200d"])

apple_adj_long_profits

You can see that the price chart looks very different once it is adjusted for dividends and splits. From here on we use the adjusted data.

Suppose we have 1 million in the stock market. Let's see how our system performs under the following conditions:

Trade with 10% of the portfolio each time
Exit the position if the loss reaches 20% of the value of the trade

When simulating, keep in mind:

Shares are traded in batches of 100
Our stop-loss rule sells the stock once its price falls to a certain level. We therefore need to check whether the low price during the trade fell far enough to trigger the stop-loss. In reality we cannot guarantee that we can sell at the stop-loss price unless we buy a put option. For simplicity we will assume that the stop-loss price is the price at which we sell.
Every trade incurs a commission paid to the broker. We ignore it here.
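The arithmetic behind these rules can be checked on a single trade. The sketch below sizes one trade for a hypothetical entry price of 96.0; the price is an assumption made for the example, while the other numbers are the ones stated above.

```python
import numpy as np

cash = 1_000_000    # total portfolio value
port_value = 0.1    # at most 10% of the portfolio on any one trade
batch = 100         # shares are bought in batches of 100
stoploss = 0.2      # exit once the price is 20% below the entry price
price = 96.0        # hypothetical entry price

# Number of whole 100-share batches that fit into 10% of the portfolio.
batches = np.floor(cash * port_value) // np.ceil(batch * price)
trade_val = batches * batch * price      # money actually put on the line
stop_price = (1 - stoploss) * price      # price level that triggers the exit

print(batches, trade_val, round(stop_price, 2))
```

Because shares come in whole batches, the amount staked is usually a little under the 10% limit.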

The following code demonstrates how to implement the backtest:

# We need to get the low of the price during each trade.
tradeperiods = pd.DataFrame({"Start": apple_adj_long_profits.index,
                             "End": apple_adj_long_profits["End Date"]})
apple_adj_long_profits["Low"] = tradeperiods.apply(lambda x: min(apple_adj.loc[x["Start"]:x["End"], "Low"]), axis=1)
apple_adj_long_profits

# Now we have all the information needed to simulate this strategy in apple_adj_long_profits
cash = 1000000
port_value = .1  # Max proportion of portfolio bet on any trade
batch = 100      # Number of shares bought per batch
stoploss = .2    # % of trade loss that would trigger a stoploss
trades = []      # One dict per trade; DataFrame.append() was removed in pandas 2.0
for index, row in apple_adj_long_profits.iterrows():
    batches = np.floor(cash * port_value) // np.ceil(batch * row["Price"])  # Maximum number of batches of stocks invested in
    trade_val = batches * batch * row["Price"]  # How much money is put on the line with each trade
    if row["Low"] < (1 - stoploss) * row["Price"]:  # Account for the stop-loss
        # Sold at the stop price, so the loss per share is 20% of the entry price
        share_profit = np.round(-stoploss * row["Price"], 2)
        stop_trig = True
    else:
        share_profit = row["Profit"]
        stop_trig = False
    profit = share_profit * batches * batch  # Compute profits
    # Record the results of the trade
    trades.append({
        "Start Port. Value": cash,
        "End Port. Value": cash + profit,
        "End Date": row["End Date"],
        "Shares": batch * batches,
        "Share Price": row["Price"],
        "Trade Value": trade_val,
        "Profit per Share": share_profit,
        "Total Profit": profit,
        "Stop-Loss Triggered": stop_trig
    })
    cash = max(0, cash + profit)
apple_backtest = pd.DataFrame(trades, index=apple_adj_long_profits.index)
apple_backtest

apple_backtest["End Port. Value"].plot()

Our portfolio's value grew by 10% in six years. That is not a bad result, given that only 10% of the portfolio is ever staked on a single trade.

Note also that this strategy never triggered the stop-loss order. Does that mean we don't need it? That is hard to say. After all, whether it triggers depends entirely on the threshold we chose.

A stop-loss order fires automatically and takes no account of the overall market trend. In other words, both a genuine decline and a momentary fluctuation will trigger it. The latter is what we need to watch for, because in reality a stop-loss triggered by a passing fluctuation not only costs you a transaction fee but also does not guarantee that you sell at the price you set.

Opinions differ on whether stop-loss orders are worth using, and in what follows I will not ask our backtesting system to use them. This simplifies the system but is not very realistic (I believe an industrial-strength system should include a stop-loss rule).

In reality we would not stake 10% of the portfolio on a single stock; we would invest in a variety of stocks. Trades in different companies can be open at the same time, and most of the portfolio should be in stocks, not cash. We will now invest in multiple stocks (translator's note: the original says "stops", presumably a typo for "stocks") and exit a position when the two moving averages cross (no stop-loss is used). The backtesting code needs to change: we use a pandas DataFrame to hold the orders for all the stocks, and the loop above needs to track more information.

The following functions generate the buy and sell orders and run the new backtest.

The code for these functions is long and is not reproduced here; see the "read the original" link in the web version.
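Since the functions themselves are not shown, here is a rough sketch of what an order generator of this kind might look like. It is not the author's ma_crossover_orders(); the name `crossover_orders_sketch`, its simplifications (unrounded averages, the last regime left at 0), and the synthetic demo data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def crossover_orders_sketch(stocks, fast=20, slow=50):
    """Hypothetical stand-in: buy/sell orders from a moving average crossover.

    stocks: list of (symbol, DataFrame) pairs; each frame needs a "Close" column.
    Returns a DataFrame indexed by (date, Symbol) with Price, Regime and Signal.
    """
    frames = []
    for symbol, df in stocks:
        diff = df["Close"].rolling(fast).mean() - df["Close"].rolling(slow).mean()
        regime = np.sign(diff).fillna(0)   # 1 bullish, -1 bearish, 0 undetermined
        regime.iloc[-1] = 0                # close any open trade on the last day
        signal = np.sign(regime.diff())    # +1 buy, -1 sell, 0 or NaN no trade
        orders = pd.DataFrame({"Price": df["Close"],
                               "Regime": regime,
                               "Signal": signal.map({1.0: "Buy", -1.0: "Sell"})})
        orders = orders.dropna(subset=["Signal"])   # keep only days with an order
        orders["Symbol"] = symbol
        frames.append(orders.set_index("Symbol", append=True))
    return pd.concat(frames).sort_index()

# Synthetic random-walk prices for two made-up tickers.
dates = pd.date_range("2015-01-01", periods=300, freq="B")
rng = np.random.default_rng(1)
demo = [(s, pd.DataFrame({"Close": 50 + rng.normal(0, 1, 300).cumsum()}, index=dates))
        for s in ("AAA", "BBB")]
signals_demo = crossover_orders_sketch(demo)
print(signals_demo.head())
```

Indexing the orders by date and symbol keeps all stocks in one table, so a backtest can walk through the dates in order and see every order placed on a given day.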

signals = ma_crossover_orders([("AAPL", ohlc_adj(apple)),
                               ("MSFT", ohlc_adj(microsoft)),
                               ("GOOG", ohlc_adj(google)),
                               ("FB", ohlc_adj(facebook)),
                               ("TWTR", ohlc_adj(twitter)),
                               ("NFLX", ohlc_adj(netflix)),
                               ("AMZN", ohlc_adj(amazon)),
                               ("YHOO", ohlc_adj(yahoo)),
                               ("SNY", ohlc_adj(sony)),  # the source repeats yahoo here; a sony frame loaded like the others is assumed
                               ("NTDOY", ohlc_adj(nintendo)),
                               ("IBM", ohlc_adj(ibm)),
                               ("HPQ", ohlc_adj(hp))],
                              fast=20, slow=50)
signals

475 rows × 3 columns

bk = backtest(signals, 1000000)
bk

475 rows × 9 columns

bk["Portfolio Value"].groupby(level=0).apply(lambda x: x.iloc[-1]).plot()

A more realistic portfolio that can invest in any of 12 stocks achieves a gain of about 100%. This looks good, but we can do better.

Benchmarking

Benchmarking lets us judge how good a trading strategy is. It means comparing the strategy with other (well-known) strategies to evaluate its performance.

Every time you evaluate a trading system, compare it with buy and hold, in particular buying and holding SPY. Apart from a few mutual funds and investment managers, almost no strategy beats it most of the time. The efficient market hypothesis claims that it is all but impossible to beat the market, so everyone should simply buy an index fund, because it reflects the composition of the market as a whole. SPY is an exchange-traded fund (a fund that can be traded like a stock) whose price effectively tracks the S&P 500. Buying and holding SPY means that you try to match the market's rate of return rather than beat it.

Here is the SPY data; let's look at the return from simply buying and holding SPY:

spyder = web.DataReader("SPY", "yahoo", start, end)
spyder.iloc[[0, -1], :]

batches = 1000000 // np.ceil(100 * spyder["Adj Close"].iloc[0])  # Maximum number of batches of stocks invested in
trade_val = batches * batch * spyder["Adj Close"].iloc[0]  # How much money is used to buy SPY
final_val = batches * batch * spyder["Adj Close"].iloc[-1] + (1000000 - trade_val)  # Final value of the portfolio
final_val

2180977.0

# We see that the buy-and-hold strategy beats the strategy we developed earlier. I would also like to see a plot.
ax_bench = (spyder["Adj Close"] / spyder["Adj Close"].iloc[0]).plot(label="SPY")
ax_bench = (bk["Portfolio Value"].groupby(level=0).apply(lambda x: x.iloc[-1]) / 1000000).plot(ax=ax_bench, label="Portfolio")
ax_bench.legend(ax_bench.get_lines(), [l.get_label() for l in ax_bench.get_lines()], loc="best")
ax_bench

Buying and holding SPY does better than our current trading system, and our system has not even accounted for transaction fees yet. Given the opportunity cost and the expense of running the strategy, we should not use it.
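The buy-and-hold calculation can be wrapped in a small helper so that any price series can serve as a benchmark. This is a sketch under the same assumptions as the SPY calculation (whole batches of 100 shares, leftover cash earns nothing); the function name and the demo prices are made up.

```python
import numpy as np
import pandas as pd

def buy_and_hold_value(adj_close, cash=1_000_000, batch=100):
    """Final portfolio value from buying as many whole batches as possible
    on the first day and holding them to the last day."""
    first, last = adj_close.iloc[0], adj_close.iloc[-1]
    batches = cash // np.ceil(batch * first)   # whole batches we can afford
    trade_val = batches * batch * first        # money spent on shares
    return batches * batch * last + (cash - trade_val)

# Made-up prices, not real SPY quotes: the price doubles over the period.
demo_prices = pd.Series([100.0, 120.0, 150.0, 200.0])
print(buy_and_hold_value(demo_prices))
```

Dividing a strategy's final portfolio value by this number gives a quick ratio: anything below 1 means the strategy lost to simply holding the benchmark.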

How can we improve the system? For a start, we could diversify more. All of our stocks come from technology companies, so a slump in the technology sector shows up directly in our portfolio. We could design a system that takes short positions and profits from bearish regimes, so that we can make money whichever way the market moves. We could also look for better ways to predict how high a stock is expected to go. Whatever we do, though, the system has to beat SPY; otherwise it carries an opportunity cost and is useless.

Other benchmark strategies exist as well. If our system beats "buy and hold SPY", we can go on to compare it with those.

The basic rule remains: do not use a complex trading system with lots of active trading if it cannot beat a simple strategy of buying an index fund and trading infrequently. (In practice this standard is very hard to meet.)

One last point to emphasize: even if your trading system beats every benchmark in backtesting, that does not mean it will predict the future well. Backtests are easy to overfit, so backtest performance cannot be taken as a forecast of future performance.

Conclusion

Although the final conclusion of this lecture is not very optimistic, remember that the efficient market hypothesis has its critics. My personal view is that the more trading depends on algorithms, the harder it becomes to beat the market. As the saying goes: even mutual funds are unlikely to beat the market, so your system beating the market is only a possibility. (Part of the reason mutual funds perform so poorly is the fees they charge, which index funds do not have.)

This lecture only briefly described one trading strategy, based on moving averages. Many other trading strategies are not mentioned here, and we did not go into short selling or currency trading. Stock options in particular have a great deal to offer, and they provide different ways of betting on the direction of a stock. You can read more in the book Derivatives Analytics with Python: Data Analysis, Models, Simulation, Calibration and Hedging. (The University of Utah library has this book.)

Another resource is O'Reilly's Python for Finance, also available in the University of Utah library.

Remember that losing money in the stock market is normal. The stock market can also offer high returns that other investments cannot, and every investment strategy should be well thought out. This lecture is intended to encourage students to explore the topic further on their own.

Homework

Question 1

Build a trading system based on moving averages (no stop-loss condition is required). Choose 15 stocks that were listed before January 1, 2010, backtest your system on them, and compare the result with the SPY benchmark. Can your system beat the market?

Question 2

In reality a commission is paid on every trade. Find out how commissions are calculated, then modify your backtest() function so that it can handle different commission schemes (fixed fee, proportional fee, and so on).

Our current moving average crossover system triggers a trade as soon as the two averages cross. Modify the system to make it more robust by requiring the difference between the moving averages to reach a certain number of rolling standard deviations before a trade is triggered.

When you have made these changes, repeat question 1, simulating your system with a realistic commission scheme (taken from a real broker or exchange) and with the standard deviation requirement above.

Question 3

Our trading system cannot handle short selling. Short selling is more complicated because the loss on a short position is unbounded (the largest possible loss on a long position is the total price paid for the stock). Learn how short positions are handled and then modify backtest() so that it can handle short trades. Think about how to implement short trades, including how many short trades to allow and how to deal with open short trades when making other trades. Hint: the size of a short trade can be represented in the function by a negative number.

When you are done, repeat question 1; you may also take into account the factors mentioned in question 2.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
