Basic methods and practical techniques used in the design of BP neural network

Source: Internet
Author: User

Although research on and applications of neural networks have been very successful, there is still no complete theory to guide network development and design. The main design approach is to combine experience with trial and error on the basis of a thorough understanding of the problem to be solved, and, through a series of improvement experiments, finally to select a better design. The following are the basic methods and practical techniques used in the development of neural networks.

(1) Network information capacity and training sample number

The classification ability of a multilayer neural network is related to its information capacity. If the number of weights and thresholds of the network, N_w, is taken to represent its information capacity, research shows that the number of training samples N and the given training error ε should satisfy the following matching relationship:

N = N_w / ε

The above indicates that there is a reasonable matching relationship between the information capacity of a network and the number of its training samples. When solving practical problems, the number of available training samples often falls short of this requirement. If, for a given number of samples, the network has too few parameters to express all the regularities contained in the samples, underfitting occurs; if it has too many parameters, overfitting may occur, weakening the network's generalization ability. Given the network structure and error requirement, the approximate number of training samples required can therefore be estimated from the relationship above.
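As a rough sketch of this estimate, the relationship N = N_w / ε can be computed directly; counting weights plus biases as the network's free parameters is an assumption here, and the function name is illustrative:

```python
# Rough estimate of training samples needed: N = N_w / epsilon,
# where N_w is the number of free parameters (weights + biases).
def estimate_sample_count(layer_sizes, epsilon):
    """layer_sizes, e.g. [4, 8, 3]; epsilon is the target training error."""
    n_w = sum((layer_sizes[i] + 1) * layer_sizes[i + 1]   # weights + biases
              for i in range(len(layer_sizes) - 1))
    return n_w / epsilon

# A 4-8-3 network has (4+1)*8 + (8+1)*3 = 67 parameters;
# at epsilon = 0.05 that suggests roughly 1340 samples.
print(estimate_sample_count([4, 8, 3], 0.05))  # → 1340.0
```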

(2) Preparation of training sample set

The preparation of training data is the basis of network design and training; the scientific soundness of data selection and the reasonableness of data representation have a very important influence on network performance. Data preparation is divided into several steps: collection of raw data, data analysis, variable selection, and data preprocessing.

Selection of the input and output quantities.

Generally speaking, the output quantities represent the functional goal the system is to realize: often a system performance index, the category to which a sample belongs in a classification problem, or the value of a nonlinear function. The inputs must be variables that have a large influence on the output and that can be detected or extracted, and the input variables should have little or no correlation with one another. If it is unclear whether a variable is appropriate as a network input, two networks can be trained, one with and one without that variable, and their results compared. By nature, inputs and outputs fall into two categories: numerical variables and linguistic variables. Numerical variables are continuous or discrete quantities expressed as numbers. A linguistic variable is a concept expressed in natural language, whose "linguistic values" use natural language to name properties of things, such as color, gender, or size. When a linguistic variable is chosen as a network input or output, its linguistic values must be encoded as discrete numerical quantities.
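One common way to encode such a linguistic variable into numeric network inputs is one-hot coding; this is a minimal sketch, and the variable names and vocabulary are illustrative:

```python
# Encode a "language variable" (e.g. color) as a one-hot numeric vector
# suitable for use as network input components.
def one_hot(value, vocabulary):
    vec = [0.0] * len(vocabulary)
    vec[vocabulary.index(value)] = 1.0
    return vec

colors = ["red", "green", "blue"]
print(one_hot("green", colors))  # → [0.0, 1.0, 0.0]
```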

Extraction and representation of the input quantities.

It is often necessary to use signal processing and feature extraction techniques to extract, from the raw data, characteristic parameters that reflect its features, and to use these as network inputs. Typical cases include text-symbol input, curve input, function-argument input, and image input. Text symbols are usually encoded according to the characteristics of the characters to be recognized, and the codes are used as network inputs. Curves are usually discretized by sampling: either at equal intervals under the premise of satisfying the Shannon sampling theorem, or, following the idea of the wavelet or short-time Fourier transform, sampling densely where the curve changes rapidly and sparsely where it changes slowly. For function fitting, the independent variable of the curve to be fitted is used directly as the network input. Image input seldom uses raw pixel gray values as network inputs; usually some useful characteristic parameters are extracted from the image according to the specific purpose of the recognition and used as inputs. Such feature extraction belongs to the field of image processing.

Representation of the output quantities.

A 0/1 or -1/+1 output represents a two-class classification; multiple finite discrete values represent a multi-class classification; a value in [0,1] represents a probability-like output, as in logistic regression; and a continuous value represents a fitting (regression) output.
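A small sketch of these target-encoding conventions; the function name, `kind` labels, and class lists are all illustrative assumptions:

```python
# Encode a training target per the conventions above.
def encode_target(y, kind, classes=None):
    if kind == "binary":            # two classes -> 0/1
        return float(y == classes[1])
    if kind == "multiclass":        # k classes -> one-hot over [0,1]
        return [float(y == c) for c in classes]
    return float(y)                 # "fit": continuous value, used as-is

print(encode_target("cat", "multiclass", ["cat", "dog", "bird"]))
# → [1.0, 0.0, 0.0]
```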

Preprocessing of input and output data.

Scale transformation, also called normalization. The reason for scale transformation is that the network's input data often have different physical meanings and different dimensions; scaling puts all input components on an equal footing from the start of training. Moreover, since the neurons of a neural network commonly use the sigmoid activation function, whose output lies in the range 0 to 1, transforming the data into this range helps prevent output saturation. If the teacher-signal output data are not transformed, output components with large values will necessarily have large absolute errors and components with small values small absolute errors, resulting in uneven weight adjustment. Scale transformation is a linear transformation: when the sample distribution is unreasonable, it can only unify the range of variation of the sample data, not change the distribution itself. A sample distribution suitable for network training should be fairly uniform, with a relatively flat distribution curve. When the sample distribution is not ideal, the most common remedy is a distribution transformation, such as a logarithm, square root, or cube root. Because such a transformation is nonlinear, it not only compresses the range of the data but also improves its distribution.
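A minimal sketch of the two transformations described above: linear min-max scaling to [0, 1], and a logarithmic distribution transformation applied before scaling. The function names are illustrative:

```python
import math

# Min-max scale each component to [0, 1] (a linear transform), plus an
# optional log transform to even out a skewed sample distribution.
def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def log_then_scale(values):
    return min_max_scale([math.log(v) for v in values])

print(min_max_scale([2.0, 4.0, 6.0]))      # → [0.0, 0.5, 1.0]
print(log_then_scale([1.0, 10.0, 100.0]))  # ≈ [0.0, 0.5, 1.0]
```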

Design of the training set.

Determining the number of training samples. In general, the more training samples, the more correctly the training result reflects the underlying law, but sample collection and collation are often limited by objective conditions. Moreover, once the number of samples reaches a certain level, the network's accuracy becomes difficult to improve further. Practice shows that the number of samples needed for network training depends on the complexity of the input-output mapping: the more complex the mapping and the larger the noise in the samples, the more samples are needed to guarantee mapping accuracy, and the larger the network must be. A rule of thumb can therefore be used: the number of training samples should be 5 to 10 times the total number of network connection weights.

Selection and organization of samples. The laws the network extracts during training are contained in the samples, so the samples must be representative. Sample selection should pay attention to the balance of sample classes, making the number of samples in each class roughly equal as far as possible; even within a class, the samples should be diverse and evenly spread. Samples chosen by this "egalitarian" principle expose the network evenly to all classes during training, avoiding a situation in which the network forms a strong "impression" of classes with many samples and a weak one of classes with few. If samples of one class are presented in a concentrated block, training tends to establish only that class's mapping; when a different class is then input, the weights are adjusted toward the new mapping, negating the previous results. Presenting the classes in concentrated blocks in rotation therefore causes training to oscillate and prolongs training time; the classes should instead be interleaved.
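The rule of thumb and the class-interleaving idea above can both be sketched briefly; the function names are illustrative and the round-robin interleave is just one way to mix classes:

```python
import itertools

# Rule of thumb from the text: training samples = 5-10 x number of weights.
def sample_count_rule_of_thumb(n_weights, factor=10):
    return factor * n_weights

# Interleave samples from each class so no class dominates a stretch of
# training (round-robin over the class lists).
def interleave_classes(*class_samples):
    out = []
    for group in itertools.zip_longest(*class_samples):
        out.extend(s for s in group if s is not None)
    return out

print(interleave_classes(["a1", "a2"], ["b1", "b2"], ["c1"]))
# → ['a1', 'b1', 'c1', 'a2', 'b2']
```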

(3) Design of initial weight value

The initialization of the network weights determines at which point on the error surface training starts, so the initialization method is critical to shortening training time. The transfer functions of the neurons are all symmetric about the zero point; if each node's net input is near zero, its output lies at the midpoint of the transfer function. This is the most sensitive region of the transfer function, which makes network learning fast. There are two ways to place each node's initial net input near zero: one is to make the initial weights small enough; the other is to make the numbers of initial weights equal to +1 and to -1 the same. In application, the first method can be used for the hidden-layer weights and the second for the output layer, because if the output-layer weights are too small, the hidden-layer weights will be adjusted only slightly at the start of training; the second method keeps both the weight magnitudes and the net inputs in consideration.
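A sketch of the two initialization ideas: small random hidden-layer weights, and output-layer weights of +1 and -1 in equal numbers. The layer sizes and the 0.1 scale are illustrative assumptions:

```python
import random

# Hidden layer: small random weights keep each node's net input near zero.
def init_hidden_weights(n_in, n_out, scale=0.1):
    return [[random.uniform(-scale, scale) for _ in range(n_in)]
            for _ in range(n_out)]

# Output layer: equal numbers of +1 and -1 weights per node, shuffled.
def init_output_weights(n_in, n_out):
    half = [1.0] * (n_in // 2) + [-1.0] * (n_in - n_in // 2)
    rows = []
    for _ in range(n_out):
        row = half[:]
        random.shuffle(row)
        rows.append(row)
    return rows
```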

(4) structure design of neural network

Once the training-sample problem is solved, the numbers of input-layer and output-layer nodes are determined. Therefore, neural network structure design mainly addresses how many hidden layers to use and how many nodes to place in each hidden layer. The following are rules of thumb that neural network designers have accumulated through a great deal of practice.

Design of the number of hidden layers.

Theory proves that a perceptron with a single hidden layer can map all continuous functions; two hidden layers are required only when learning discontinuous functions (such as sawtooth waves), so a multilayer perceptron needs at most two hidden layers. When designing a multilayer perceptron, it is generally advisable to start with one hidden layer; when adding many hidden nodes to that layer still cannot improve network performance, consider adding a second hidden layer. Experience shows that when two hidden layers are used, it is advantageous to put more hidden nodes in the first hidden layer and fewer in the second. Moreover, for some practical problems, a double hidden layer may require fewer hidden nodes in total than a single hidden layer would. Therefore, when increasing the number of hidden nodes still cannot significantly reduce the training error, try increasing the number of hidden layers.

Design of the number of hidden nodes. The function of the hidden nodes is to extract and store the intrinsic laws of the samples; each hidden node has several weights, and each weight is a parameter that enhances the network's mapping ability. If the number of hidden nodes is too small, the network's ability to obtain information from the samples is poor, insufficient to summarize and embody the laws in the sample set; if it is too large, the network may also memorize non-regular content such as noise, leading to overfitting and reduced generalization ability. A common method for determining the optimal number of hidden nodes is the trial method: start training with a small number of hidden nodes, then gradually increase the number, training on the same sample set each time, and pick the number of hidden nodes corresponding to the minimum network error. When using the trial method, some empirical formulas for the number of hidden nodes can be used as starting points; the numbers they give are only rough estimates, suitable as initial values for the trial. In the formulas below, m is the number of hidden-layer nodes, n the number of input-layer nodes, l the number of output-layer nodes, and α a constant between 1 and 10.

m = sqrt(n + l) + α

m = log2(n)

m = sqrt(n · l)
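The three empirical formulas can be evaluated together to bracket a starting value for the trial method; the function name and the example sizes are illustrative:

```python
import math

# The three empirical formulas for the hidden-node count m, with n input
# nodes, l output nodes, and alpha a constant in [1, 10].
def hidden_nodes_estimates(n, l, alpha=2):
    return {
        "sqrt(n+l)+alpha": math.sqrt(n + l) + alpha,
        "log2(n)": math.log2(n),
        "sqrt(n*l)": math.sqrt(n * l),
    }

print(hidden_nodes_estimates(n=8, l=2, alpha=2))
# e.g. sqrt(10) + 2 ≈ 5.16, log2(8) = 3.0, sqrt(16) = 4.0
```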

(5) Network training and testing

After the network design is complete, the designed values should be used for training. During training, one pass in which all samples are presented once and the weights are modified accordingly is called one round (epoch) of training. The sample-set data are used repeatedly during training, but it is best not to present the data in a fixed order in every round. Training a network usually takes thousands of rounds.
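The epoch loop with per-round reshuffling can be sketched as follows; `train_step` is a hypothetical stand-in for one weight update, and the seeded generator is just for reproducibility:

```python
import random

# Run n_epochs rounds; each round presents every sample once, in a
# freshly shuffled order, and calls train_step on each sample.
def run_epochs(samples, n_epochs, train_step, seed=0):
    rng = random.Random(seed)
    order_log = []                 # records the presentation order per round
    data = list(samples)
    for _ in range(n_epochs):
        rng.shuffle(data)
        order_log.append(list(data))
        for x in data:
            train_step(x)
    return order_log
```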

The performance of a network depends mainly on whether it has good generalization ability, and testing generalization ability cannot use the training data; it should use a validation set separate from the training data. Only when a trained network performs well on the validation set can it be considered well trained, so training is usually terminated according to the error on the validation set rather than the error on the training set. If the network's error on the training set is very small while its error on the validation set is large, overfitting has probably occurred. As training proceeds, the error on the training set keeps shrinking, but the error on the validation set first decreases and then increases, so there is an optimal number of training rounds. In implementation, the change in validation-set error is often used as the criterion for terminating training. For example, if the validation-set error after one round is more than 20% greater than after the previous round, the previous round's result is taken as the best, and the network weights from that round are retained as the result of training.
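The stopping rule described here can be sketched as a small loop; `train_one_epoch` is a hypothetical callback that runs one round and returns the current weights and validation error:

```python
# Early stopping as described: stop when the validation error exceeds the
# previous round's by more than 20%, keeping the previous round's weights.
def train_with_early_stopping(train_one_epoch, max_epochs=1000):
    best_weights, prev_err = None, float("inf")
    for _ in range(max_epochs):
        weights, val_err = train_one_epoch()
        if val_err > 1.2 * prev_err:
            break                  # the previous round was the best
        best_weights, prev_err = weights, val_err
    return best_weights, prev_err
```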

Copyright notice: this is the blogger's original article; do not reproduce it without the blogger's permission.
