Neural networks were originally invented as an attempt to mimic neurons, or networks of neurons, in the brain. So, to explain how we represent the model hypothesis, let's first look at what a single neuron in the brain is like. Our brains are full of neurons, and two features of these cells are worth noting: first, a neuron has a cell body; second, a neuron has a number of input wires. These input wires, called dendrites, receive information from other neurons; the neuron's output wire, called the axon, transmits signals to other neurons. In short, a neuron is a computational unit that receives some information through its input wires, performs a computation, and sends the result along its axon to other nodes or other neurons in the brain. Neurons communicate with one another using weak pulses of electricity, also called action potentials. When a neuron wants to send a message, it fires a weak current along its axon to other neurons; the axon connects to the dendrites, the input wires, of another neuron, which receives the message, performs its own computation, and may in turn transmit its own message along its axon to still other neurons. This is the model on which all human thinking is based: our neurons compute on the messages they receive and pass information on to other neurons. It is also how our senses and muscles work. If you want to move a muscle, a neuron sends pulses to that muscle and causes it to contract; if a sense organ, say the eye, wants to send a message to the brain, it sends electrical impulses to the brain in the same way.
Neuron model: logistic unit
In a neural network, or rather in the artificial neural network we implement on a computer, we use a very simple model to simulate what a neuron does. We model a neuron as a logistic unit: when we draw a yellow circle (the small yellow circle represents a single neuron, as shown), you should think of it as something that works like a neuron. We feed it some information through its dendrites, or input wires; the neuron does some computation and outputs the result through its output wire, its axon. A diagram like this represents the computation of h_θ(x) = 1 / (1 + e^{-θ^T x}), where x is the vector of inputs and θ is the vector of parameters. This is a simple model, arguably even an overly simplistic one, of a neuron: it takes inputs x1, x2, and x3 and outputs a result computed as above. When drawing a neural network, usually only the input nodes x1, x2, and x3 are drawn, but sometimes an extra node x0 is added. This x0 node is called the bias unit or bias neuron; because x0 always equals 1, it is sometimes drawn and sometimes not, depending on which is more convenient for the example. One last piece of terminology: we sometimes say this is a neuron with a sigmoid, or logistic, activation function. In neural network terminology, "activation function" is just another term for the nonlinearity g(z) = 1 / (1 + e^{-z}). Finally, so far we have called θ the parameters of the model, and we will mostly continue to use that term; in the neural network literature, however, you will sometimes see people talk about the weights of a model, and weights are exactly the same thing as the parameters of the model.
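As a concrete illustration, here is a minimal NumPy sketch of a single logistic unit. The numeric values are made up for this example, and the bias term x0 = 1 is prepended to the feature vector by hand.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) activation: g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(theta, x):
    """Single artificial neuron: h(x) = g(theta^T x).

    x is expected to already include the bias term x0 = 1.
    """
    return sigmoid(theta @ x)

x = np.array([1.0, 0.5, -1.2, 3.0])      # [x0, x1, x2, x3], x0 is the bias unit
theta = np.array([0.1, 0.8, -0.4, 0.2])  # one weight (parameter) per input
print(logistic_unit(theta, x))           # a value in (0, 1)
```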
A neural network is simply a group of these neurons wired together. Concretely, we have input units x1, x2, and x3 (and, as before, we can also draw an additional node x0). Next comes a layer of neurons, which we write as a_1^{(2)}, a_2^{(2)}, and a_3^{(2)}; here too we can add an a_0^{(2)}, an extra bias unit whose value is always 1. Finally, there is a single node in the third and last layer, which outputs the value computed by the hypothesis h(x). A bit more terminology: the first layer of the network is called the input layer, because it is where we feed in our features x1, x2, and x3; the last layer is called the output layer; and the middle layer is called the hidden layer. "Hidden layer" is not a particularly apt term, but the intuition is that in supervised learning you can see the inputs and the correct outputs in the training set, whereas the values in the hidden layer are never observed: they are neither x nor y, which is why we call the layer hidden. We will see later that a neural network can have more than one hidden layer, but in this example we have an input layer (layer 1), a hidden layer (layer 2), and an output layer (layer 3). In general, any layer that is neither the input layer nor the output layer is called a hidden layer.
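To make the architecture concrete, here is a minimal sketch of a forward pass through this 3-3-1 network. The weight matrices Theta1 and Theta2 are filled with random, made-up values, since the text does not specify any; only the shapes and the layer-by-layer flow follow the description above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(3, 4))  # maps layer 1 (input) to layer 2 (hidden)
Theta2 = rng.normal(size=(1, 4))  # maps layer 2 (hidden) to layer 3 (output)

x = np.array([0.5, -1.2, 3.0])    # input features x1, x2, x3
a1 = np.concatenate(([1.0], x))   # input layer, with bias unit x0 = 1
a2 = sigmoid(Theta1 @ a1)         # hidden layer activations a^(2)
a2 = np.concatenate(([1.0], a2))  # add the bias unit a_0^(2) = 1
h = sigmoid(Theta2 @ a2)          # output layer: h(x)
print(h)
```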
To explain the specific computational steps of the network, we need a bit of notation: a_i^{(j)} denotes the activation of unit i in layer j, where the activation of a neuron simply means the value it computes and outputs. For example, a_1^{(2)} is the activation of the first unit in layer 2. Furthermore, the network is parameterized by matrices: Θ^{(j)} is the matrix of weights controlling the mapping from layer j to layer j+1 (for example, from layer 1 to layer 2, or from layer 2 to layer 3). The first hidden unit computes its value like this: a_1^{(2)} = g(Θ^{(1)}_{10} x_0 + Θ^{(1)}_{11} x_1 + Θ^{(1)}_{12} x_2 + Θ^{(1)}_{13} x_3), where g is the sigmoid, or logistic, activation function applied to this linear combination of the inputs. The second hidden unit equals g applied to its own linear combination, a_2^{(2)} = g(Θ^{(1)}_{20} x_0 + Θ^{(1)}_{21} x_1 + Θ^{(1)}_{22} x_2 + Θ^{(1)}_{23} x_3), and likewise for the third. The matrix Θ^{(1)} controls the mapping from the three input units to the three hidden units, so Θ^{(1)} is a 3-by-4 matrix. More generally, if the network has s_j units in layer j and s_{j+1} units in layer j+1, then Θ^{(j)}, the matrix controlling the mapping from layer j to layer j+1, has dimension s_{j+1} × (s_j + 1); that is, Θ^{(j)} has s_{j+1} rows and s_j + 1 columns.
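As a quick sanity check of this dimension rule, assuming the same 3-3-1 layer sizes as in the example above:

```python
# If layer j has s_j units and layer j+1 has s_{j+1} units, then
# Theta^(j) has dimension s_{j+1} x (s_j + 1).
layer_sizes = [3, 3, 1]  # input, hidden, output units for the example network
for j, (s_j, s_next) in enumerate(zip(layer_sizes, layer_sizes[1:]), start=1):
    print(f"Theta^({j}): {s_next} x {s_j + 1}")
# Theta^(1): 3 x 4
# Theta^(2): 1 x 4
```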
Stanford University public class, Machine Learning: Neural Networks - Model Representation (understanding the neural network model and the neural unit)