A Risk-Neutral Deep Learning Stock Selection Strategy


I. The Problem with Data-Driven Machine Learning Models

Popular machine learning methods today, including deep learning, are mostly data-driven: they extract knowledge by learning from training data. Their success rests on the premise that the "knowledge" learned from the training set still applies out of sample.

When machine learning is applied to investing, a model is typically trained on historical data and then applied to the future market. The deep learning multi-factor stock selection strategy likewise builds its prediction model by learning from historical market data. Whether such methods succeed in investing therefore depends on whether the model learned from history remains effective when extrapolated into the future.

In the stock market, however, the large number of participants introduces many influencing factors and much noise, and on top of that styles rotate and industries take turns leading. As a result, the "future" scenarios in which a machine learning model is applied can differ significantly from the historical scenarios on which it was trained, so models that perform well in sample often perform poorly out of sample.

For example, if the stock samples used to train a model come from a market with a small-cap style, the trained model predicts well in a small-cap market and will "believe" that small market capitalization is an important driver of excess returns. Once the market style switches to large caps, a model trained under the small-cap regime may fail.

To address this problem, this report risk-neutralizes the stock samples, reducing the influence of risk-factor rotation and industry rotation on model training and prediction. Note that the "risk neutrality" discussed in this report operates at the level of the machine learning model; it differs from the portfolio-optimization approach to reducing risk exposure in multi-factor stock selection.

This report first reviews the model structure and parameter settings of the deep learning multi-factor stock selection model, then introduces the risk-neutralization method, and finally demonstrates its validity through empirical analysis. The resulting strategy achieves better market performance.

II. Deep Learning Prediction Model

(i) Deep Learning Model Structure

In this report we use a 7-layer deep neural network to build the stock price forecasting model of the deep learning stock selection strategy. It contains an input layer x, an output layer y, and hidden layers H1, H2, ..., H5. The number of nodes in each layer is shown in the table below. This structure was optimized by grid search (see the third report in the Deep Learning series, "New Advances in Deep Learning: Re-mining Alpha Factors").

Here x is the input layer with 156 nodes, representing 156 features of each stock sample: traditional stock-selection factors (such as valuation, size, reversal, liquidity, and volatility factors), price-volume technical indicators (such as MACD and KDJ), and 28 0-1 dummy variables representing the stock's industry.

y is the output layer with 3 nodes, indicating three possible future movements of the stock: up (positive excess return), flat (no excess return), and down (negative excess return). The three output categories are represented by 3-dimensional vectors: y = [1 0 0] indicates an up sample, y = [0 1 0] a flat sample, and y = [0 0 1] a down sample.

The deep neural network fits the relationship between input x and output y to build a predictive model for y. Node j of the 1st hidden layer is

H^(1)_j = σ_h( Σ_i w^(1)_{ji} x_i + b^(1)_j )

node j of the m-th hidden layer (m = 2, 3, 4, 5) is

H^(m)_j = σ_h( Σ_i w^(m)_{ji} H^(m-1)_i + b^(m)_j )

and output-layer node k is

y_k = σ_o( Σ_j w^(o)_{kj} H^(5)_j + b^(o)_k )

where σ_h and σ_o denote the hidden-layer and output-layer activation functions, respectively. Collecting the parameters of all layers into w, the network can be written as y = f(x; w), where w is the parameter set to be optimized. (The accompanying figure illustrates a network with 2 hidden layers H1 and H2.)
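As a concrete illustration, a minimal NumPy forward pass through such a network might look as follows. The hidden-layer widths below are placeholders (the report's node-count table is not reproduced here), and the ReLU/softmax activations match those named later in the report:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def forward(x, weights, biases):
    """Forward pass: five ReLU hidden layers, softmax output."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)          # hidden layers H1..H5
    z = weights[-1] @ h + biases[-1]
    return softmax(z)                # y = [P(up), P(flat), P(down)]

# 156 inputs, five hidden layers, 3 outputs; hidden widths are assumptions.
rng = np.random.default_rng(0)
sizes = [156, 128, 128, 128, 128, 128, 3]
weights = [rng.normal(0, 0.1, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

y = forward(rng.normal(size=156), weights, biases)
```

The output behaves as a probability vector over the three classes, which is what the labeling scheme below relies on.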

The universal approximation theorem shows that neural networks have strong fitting capabilities. Before a network can be used for prediction, its parameters w must be obtained from a large number of training samples by optimization. Specifically, in deep learning the parameters w are optimized over the training set so that the model's predicted output y is as close as possible to the sample's true label t, i.e. by minimizing the following prediction error (loss function):

L(w) = (1/N) Σ_n || f(x_n; w) − t_n ||²

Minimizing this objective is the mean-squared-error formulation. For classification problems, other objectives can be constructed; in particular, the cross-entropy loss is better suited as the objective for a classification neural network:

L(w) = −(1/N) Σ_n Σ_k t_{n,k} log y_{n,k}
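A minimal sketch of the cross-entropy loss for the one-hot 3-class labels used above:

```python
import numpy as np

def cross_entropy(Y_pred, T):
    """Mean cross-entropy over N samples.
    Y_pred: (N, 3) predicted class probabilities (rows sum to 1).
    T:      (N, 3) one-hot true labels."""
    eps = 1e-12                                   # avoid log(0)
    return -np.mean(np.sum(T * np.log(Y_pred + eps), axis=1))

T = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)
Y_perfect = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)
Y_uniform = np.full((2, 3), 1 / 3)

loss_perfect = cross_entropy(Y_perfect, T)        # near 0
loss_uniform = cross_entropy(Y_uniform, T)        # log 3
```

A perfect prediction drives the loss to 0, while a maximally uncertain (uniform) prediction gives log 3, which is why this loss pairs naturally with a softmax output.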

When training deep learning models, we typically use backpropagation to compute gradients and optimize the parameters.

To improve the generalization ability of the model, this report uses dropout: at each parameter update, different hidden nodes are randomly discarded, which pushes each hidden node to learn useful, independent features that do not depend on other nodes. We also use batch normalization to improve training efficiency. For details on dropout and batch normalization, see the report "The Third of the Deep Learning Series: New Advances in Deep Learning, Re-mining Alpha Factors".
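A sketch of dropout as described, in the "inverted" form standard implementations use (survivors are rescaled so expected activations match test time, when nothing is dropped); the 50% drop rate here is illustrative only:

```python
import numpy as np

def dropout(h, p_drop, rng, train=True):
    """Inverted dropout: zero each hidden node with probability p_drop
    during training and rescale survivors by 1/(1 - p_drop)."""
    if not train:
        return h                     # at test time the layer is a no-op
    mask = rng.random(h.shape) >= p_drop
    return h * mask / (1.0 - p_drop)

rng = np.random.default_rng(42)
h = np.ones(100_000)                 # a large activation vector
h_train = dropout(h, 0.5, rng)       # roughly half zeroed, rest scaled to 2.0
```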

(ii) Stock Sample Labeling

In multi-factor stock selection we want to pick a portfolio that generates excess returns and beats the benchmark. The goal of the deep learning prediction model is therefore to identify stocks that generate excess returns.

Accordingly, to build the deep learning model, on each trading day in the sample interval each stock sample is labeled (this label is the model's output y) according to its return over the subsequent period (10 trading days in this report) as "up", "down", or "flat". Stocks labeled "up" have positive excess returns, i.e. returns in the top decile over the next 10 trading days (above the 90th percentile); stocks labeled "down" have negative excess returns, in the bottom decile (below the 10th percentile); stocks labeled "flat" sit in the middle (returns between the 45th and 55th percentiles), as shown in the figure. This makes the sample counts of the three classes equal, which is advantageous for training a classifier.
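The percentile-based labeling rule can be sketched as follows (thresholds per the text; returning `None` for samples outside the three bands is an assumption of this sketch):

```python
import numpy as np

def label_cross_section(returns):
    """Label stocks on one cross-section by their subsequent 10-day return:
    'up'   = above the 90th percentile,
    'down' = below the 10th percentile,
    'flat' = between the 45th and 55th percentiles,
    anything else is left unlabeled (None) and not used for training."""
    lo, mid_lo, mid_hi, hi = np.percentile(returns, [10, 45, 55, 90])
    labels = []
    for r in returns:
        if r > hi:
            labels.append("up")
        elif r < lo:
            labels.append("down")
        elif mid_lo <= r <= mid_hi:
            labels.append("flat")
        else:
            labels.append(None)
    return labels

rng = np.random.default_rng(0)
rets = rng.normal(0, 0.05, 1000)      # simulated 10-day returns
labels = label_cross_section(rets)
```

On a cross-section of 1000 stocks this yields 100 samples in each of the three classes, which is exactly the balance the text describes.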

After labeling and filtering the stock samples, the deep learning prediction model can be trained. At selection time, stocks are ranked by the predicted probability of belonging to the "up" class, P("up" | x); that is, we look for the stocks most likely to rank in the top 10% of returns over the next 10 trading days.

The problem with this labeling scheme is that a stock's label is susceptible to market-style rotation and industry rotation. Under a small-cap style, "up" stocks are mostly small caps and "down" stocks mostly large caps. If the stock samples mostly come from a small-cap market, the model tends to treat market capitalization as an important feature and to classify small-cap stocks as "up". Once the style switches to large caps, the model is at considerable risk of failure. Similarly, if a particular industry sector performs strongly in the current period, its constituents are more likely to be labeled "up", biasing the classifier toward that industry.

(iii) Risk-Neutral Stock Sample Labeling

To make the labeling of samples (that is, the stock-selection target) insensitive to risk factors, the labeling process can be risk-neutralized.

To find the stocks within each industry that generate excess returns, first group all stocks in the market by industry, then rank returns within each industry on each cross-section, and select each industry's "outperformers" and "underperformers", as shown in the figure. This labeling method seeks stocks that generate excess returns within their own industry, so a sample's label y is not affected by industry rotation. We therefore call it the "industry-neutral" labeling method.

Size rotation between large and small caps is a very prominent style-switching phenomenon in A-shares. To find stocks that are not easily affected by size switching, divide the whole market into different market-cap segments and look for stocks that generate excess returns within each segment; that is, rank returns within each segment on each cross-section and select each market-cap range's "outperformers" and "underperformers", as shown in the figure. Under this scheme, a sample's label y is not easily affected by large/small-cap style switching, so it can be called the "market-cap-style-neutral" labeling method.

The above neutralizes a single risk factor. More generally, we need to consider multiple risk factors simultaneously. Assuming K risk factors are considered, they can be stripped out by regression:

r_i = Σ_{k=1}^{K} X_{i,k} f_k + ε_i

where r_i is the stock's return over the subsequent period and X_{i,k} is the stock's exposure to the k-th risk factor. The returns f_k of the K risk factors and each stock's residual return ε_i after stripping out the risk factors are obtained by cross-sectional regression over the market's stocks at time t.

The residuals ε are then sorted on each cross-section t, and stocks are labeled "up", "down", or "flat" accordingly. The goal of the prediction model is no longer to find stocks whose subsequent returns rank in the top 10%, but stocks whose residual returns, after stripping out the risk factors, rank in the top 10%.
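A minimal sketch of the cross-sectional neutralization step, assuming plain OLS for the regression; the single `log_cap` exposure and the simulated returns are purely illustrative:

```python
import numpy as np

def neutralize(returns, exposures):
    """Strip K risk factors from cross-sectional returns via OLS,
    r = X f + eps, and return the residuals eps, which are then
    ranked and labeled in place of the raw returns.
    returns:   (N,)   subsequent-period returns
    exposures: (N, K) risk-factor exposures (e.g. size, industry dummies)"""
    X = np.column_stack([np.ones(len(returns)), exposures])  # add intercept
    f, *_ = np.linalg.lstsq(X, returns, rcond=None)          # factor returns
    return returns - X @ f                                   # residuals eps

rng = np.random.default_rng(1)
N = 500
log_cap = rng.normal(10, 1, N)          # hypothetical size exposure
alpha = rng.normal(0, 0.01, N)          # stock-specific return
r = -0.02 * log_cap + alpha             # a small-cap premium baked in
eps = neutralize(r, log_cap.reshape(-1, 1))
```

By construction the residuals are uncorrelated with the size exposure, so labels built from them no longer reward small caps as such.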

In this report we use industry and market capitalization as the risk factors in the risk-neutralization step.

III. Strategy and Empirical Analysis

(i) The Deep Learning Prediction Model

The workflow of the deep learning multi-factor stock selection strategy is shown in the figure. Both model training and live stock selection require preprocessing of market data. In the training stage, historical market data are standardized into feature data suitable for the deep learning model; in the selection stage, each stock's feature data are computed from current market data, standardized, and fed into the trained model to predict each stock's future movement. Stocks are then selected according to their scores to build the portfolio.

Data preprocessing consists of two steps: computing the stock factors and standardizing them.

First, we extract market data from Wind, Tian Soft (TinySoft), and other financial data terminals and compute stock factors as the features of the machine learning model. In this report we select 156 features, including traditional stock-selection factors (such as valuation, size, reversal, liquidity, and volatility factors), price-volume technical indicators (such as MACD and KDJ), and 28 0-1 dummy variables representing each stock's industry.

Factor standardization proceeds in the following steps:

1. Outlier and missing-value processing

Outliers and missing values in the stock factor data are handled. For example, when a stock's factor value for a period is missing or abnormal, the previous period's value is substituted.

2. Extreme-value boundary treatment

When a stock's factor value deviates significantly from the factor values of its industry peers, boundary thresholds can be set to handle the extremes. For example, set the upper boundary ub to the industry mean of the factor plus 3 standard deviations for the same period, and the lower boundary lb to the industry mean minus 3 standard deviations. When a factor value exceeds the upper boundary (x > ub), set x = ub; when it falls below the lower boundary (x < lb), set x = lb. All factor data then lie between lb and ub.
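A sketch of the 3-standard-deviation boundary treatment, applied here to a single hypothetical industry's factor values:

```python
import numpy as np

def clip_to_industry_bounds(x, n_std=3.0):
    """Clip one industry's factor values to mean ± n_std standard
    deviations (ub = mean + 3*std, lb = mean - 3*std as in the text)."""
    mu, sigma = x.mean(), x.std()
    return np.clip(x, mu - n_std * sigma, mu + n_std * sigma)

rng = np.random.default_rng(0)
x = np.append(rng.normal(0.0, 1.0, 100), 50.0)   # one extreme outlier
x_clipped = clip_to_industry_bounds(x)           # outlier pulled to ub
```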

3. Standardization of factors along the time direction

Standardizing factors along the time direction makes factor values comparable across periods. For example, market turnover in 2015 differs greatly from the current market, so volume-related selection indicators from that period differ greatly from current ones. Volume-related factors can be scaled by the average volume of preceding periods, making factor values comparable across periods. The benefit of standardizing along the time direction is that models trained on historical data can then be used to predict future markets.

4. Standardization of factors along the cross-section

Standardizing factors along the cross-section makes different features comparable, e.g. float market value and turnover ratio: after standardization the two become comparable. Standardization methods include z-score standardization, min-max standardization, rank standardization, and so on.

Suppose at time t the value of factor i for stock k is x^i_(t,k). Z-score normalization transforms the variable to mean 0 and variance 1:

x̃^i_(t,k) = ( x^i_(t,k) − μ^i_t ) / σ^i_t

where μ^i_t and σ^i_t are the cross-sectional mean and standard deviation of factor i at time t.

Min-max normalization transforms the variable to a number between 0 and 1:

x̃^i_(t,k) = ( x^i_(t,k) − min_k x^i_(t,k) ) / ( max_k x^i_(t,k) − min_k x^i_(t,k) )

Rank normalization maps a stock's rank on factor i to a number between 0 and 1: the smallest factor value is normalized to 0, the largest to 1, and the others to numbers strictly between 0 and 1.
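The three standardization methods can be sketched as follows (ties in the rank method are not handled specially in this sketch):

```python
import numpy as np

def z_score(x):
    """Cross-sectional z-score: mean 0, variance 1."""
    return (x - x.mean()) / x.std()

def min_max(x):
    """Min-max: map values into [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

def rank_norm(x):
    """Rank normalization: smallest -> 0, largest -> 1,
    others evenly spaced by rank."""
    order = x.argsort().argsort()        # rank of each element, 0..N-1
    return order / (len(x) - 1.0)

x = np.array([3.0, 1.0, 4.0, 1.5, 5.0])  # one factor on one cross-section
z = z_score(x)
m = min_max(x)
r = rank_norm(x)
```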

5. Adjusting the factor distribution for the machine learning model

Different machine learning models make different assumptions about the input data, and the factor distribution should be adjusted accordingly. Deep neural networks generally make no strong assumptions about the input and can accommodate both continuous inputs and discrete inputs (such as the 0-1 industry dummies). If an autoencoder or restricted Boltzmann machine is used, attention must be paid to the model's assumptions about the input distribution.

After preprocessing, the factor data can be fed into the deep learning prediction model, which outputs prediction scores during stock selection. In the training stage, however, we must first label the stock samples and select the training samples.

For sample labeling and screening we use the method of the previous section: regress the stock samples on the risk factors, strip out the risk factors' influence, and then label the samples and train the model.

Out of sample, the model scores each stock at each prediction date. The top 10% of stocks by "up" score are selected to build the portfolio.

Unlike the hidden layers, which use the ReLU activation function, the output layer uses the softmax activation. At prediction time, if the input vector to the output-layer softmax is z = [z1 z2 z3], the predicted values after the softmax are

y_k = e^{z_k} / ( e^{z_1} + e^{z_2} + e^{z_3} ), k = 1, 2, 3

Each of y1, y2, y3 lies between 0 and 1, and they sum to 1. The value of the first output node is the predicted "up" score, i.e. the probability that the stock falls into the "up" category.

In this report we train the deep learning model on all-market stocks, excluding stocks listed for less than a year, ST shares, and stocks suspended or at their limit-up/limit-down price on the trading day.

The model forecasts returns over the next 10 trading days. For the risk-neutral deep learning model, a cross-sectional regression on the industry and market-capitalization factors strips out the risk factors' influence before the model is trained.

The sample period runs from January 2007 to April 2018. The model is updated every six months, trained each time on the most recent 4 years of market data. Out-of-sample data begin in January 2011 (data from 2007 to 2010 were used to train the first deep learning prediction model).
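The rolling retraining schedule described above can be sketched as follows; representing dates as (year, month) pairs is a simplification of this sketch:

```python
def rolling_windows(start_year=2007, end_year=2018, train_years=4, step_months=6):
    """Yield (train_start, train_end, predict_start) tuples:
    retrain every 6 months on the most recent 4 years of data.
    Dates are (year, month) pairs; the sample ends in April 2018."""
    windows = []
    y, m = start_year + train_years, 1            # first out-of-sample date
    while (y, m) <= (end_year, 4):
        windows.append(((y - train_years, m), (y, m), (y, m)))
        m += step_months
        if m > 12:
            y, m = y + 1, m - 12
    return windows

wins = rolling_windows()
```

The first window trains on 2007-2010 and predicts from January 2011, matching the report's out-of-sample start.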

(ii) The Role of Risk Neutralization

In this report we roll the model forward, retraining it every six months; each training session uses the most recent 4 years of market data.

The IC is computed as the cross-sectional rank correlation coefficient between the stock scores and returns over the next 10 trading days, as shown in the figure. The mean IC is 0.082 with a standard deviation of 0.108; most of the time the model's IC is greater than 0.
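The rank IC is a Spearman correlation; a minimal NumPy version (assuming no tied values on the cross-section) is:

```python
import numpy as np

def rank_ic(scores, future_returns):
    """Rank IC: Spearman correlation between model scores and
    subsequent returns on one cross-section (no-ties assumption)."""
    def ranks(a):
        return a.argsort().argsort().astype(float)
    return np.corrcoef(ranks(scores), ranks(future_returns))[0, 1]

scores = np.array([0.9, 0.1, 0.5, 0.7, 0.3])
rets = np.array([0.08, -0.02, 0.01, 0.05, 0.00])  # same ordering as scores
ic = rank_ic(scores, rets)                        # perfectly aligned -> 1.0
```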

To test the role of risk neutralization, this report analyzes the relationship of the ordinary deep learning strategy and the risk-neutral deep learning strategy, respectively, with the float-market-value factor.

First, consider the relationship between the deep learning model's IC and the float-market-value factor's IC. The figure below shows this relationship for the ordinary deep learning model; a stronger correlation indicates that the deep learning factor's performance is more dependent on the market-value factor's performance. The float-market-value IC is strongly correlated with the ordinary deep learning factor's IC, and negatively so (because market capitalization is a negative factor): the ordinary deep learning factor performs better when the float-market-value factor performs well.

As the right panel shows, after risk-neutralization the correlation between the deep learning factor's IC and the float-market-value factor's IC is significantly reduced. The risk-neutral deep learning factor's performance is thus less correlated with the market-value factor's performance.

Second, consider the relationship between the model's per-period scores and the market-value factor. The left panel shows the cross-sectional correlation coefficient between the deep learning model's scores and float market value in each period. For the ordinary model the average cross-sectional correlation is −0.20, indicating that smaller-cap stocks receive higher scores, i.e. the model is biased toward small caps. For the risk-neutral model the average is −0.12, a significant decline, so it is not as biased toward small caps; and since 2017 the cross-sectional correlation has been greater than 0 most of the time, indicating a large-cap tilt in the model's picks.

The above analysis shows that after risk-neutralization the deep learning model is less susceptible to the market-capitalization factor. Its IC still retains some correlation with the market-value factor's IC, however: when the market-value factor does well, the risk-neutral model also tends to do well. This is because the market-capitalization factor has performed over long stretches of the A-share market, and a deep learning model trained on historical data cannot strip out its influence completely.

(iii) Strategy Backtesting

This report backtests the stock selection strategy with the CSI 500 index constituents as the stock pool. The strategy trades on a 10-day cycle. At each rebalance the stocks are split into 10 deciles by score, and the decile rated highest by the deep learning prediction model is bought equal-weighted. The backtest assumes transaction costs of 0.3%.
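The decile construction at each rebalance can be sketched as follows; the ticker names here are hypothetical:

```python
import numpy as np

def top_decile(tickers, scores):
    """Split the pool into 10 deciles by score and return the top decile
    as the equal-weighted buy list for the 10-day rebalance."""
    order = np.argsort(scores)[::-1]      # best score first
    n_top = len(tickers) // 10
    return [tickers[i] for i in order[:n_top]]

tickers = [f"S{i:03d}" for i in range(500)]   # hypothetical 500-stock pool
rng = np.random.default_rng(0)
scores = rng.random(500)                      # stand-in for model "up" scores
portfolio = top_decile(tickers, scores)       # 50 names, equal-weighted
```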

First, the ordinary deep learning model's performance: as shown, within the CSI 500 constituents it achieves an annualized excess return of 19.71%, a maximum excess-return drawdown of −5.35%, an excess win rate of 69.5%, and an information ratio of 2.47.

By contrast, the risk-neutral deep learning model performs better: as shown, within the CSI 500 constituents it achieves an annualized excess return of 21.95%, a maximum excess-return drawdown of −5.03%, an excess win rate of 74.6%, and an information ratio of 2.92.

The yearly return and drawdown of the hedged strategy are shown in the table below. The strategy earns positive excess returns every year; even in 2017, when traditional market-capitalization and reversal factors underperformed, it gained a 6.93% excess return. (Note: 2011 data start from the end of January 2011; 2018 data end in April 2018.)

Because of the large number of technical indicators among the selection factors, the strategy's turnover is high: the average turnover per rebalance is 70.8%, or about 17.7 times per year.

(iv) Strategy Homogeneity Analysis

The risk-neutral deep learning model presented here differs from the ordinary model in its training objective. The ordinary model separates stocks by return on each cross-section, seeking stocks that generate excess returns; the risk-neutral model separates stocks by residual return after stripping out the risk factors on each cross-section, seeking stocks that generate excess returns once the risk factors are removed.

Strategies trained under different objectives do not behave identically; as shown in the figure, the risk-neutral deep learning strategy performs better over the long term.

The correlation between the two strategies is nonetheless high, as the relationship between the ICs of the ordinary and risk-neutral models shows: in general, the risk-neutral model performs well when the ordinary model performs well. The two strategies exhibit strong homogeneity.

There are still clear differences between the two models' picks. With a portfolio of 50 stocks, the two models' selections overlap by 21 stocks on average per period (41.9%); with 100 stocks, they overlap by 53.3 stocks on average (53.3%), as shown in the figure. When the training target set for the machine learning model differs, the model screens stocks from a different angle.

IV. Summary and Discussion

The essence of machine learning is learning knowledge from data. The training samples of an ordinary deep learning stock selection model are influenced by style rotation within the sample interval: a model trained on samples from a small-cap-style market will be biased toward the small-cap style, and a model trained on samples from a market with large industry divergence will be biased toward the stronger industries. Neutralizing the risk factors during training alleviates, to some extent, the impact of risk-factor rotation on model training, giving the trained model more stable performance.

Through empirical analysis, this report confirms that after risk-factor neutralization the deep learning stock selection model is less affected by the market-capitalization factor. Since 2011, the CSI 500 hedged strategy has achieved an annualized excess return of 21.95%, a maximum drawdown of −5.03%, a win rate of 74.6%, and an information ratio of 2.92.

