8 neural network architectures that machine learning researchers need to understand

In this article, I want to share with you the 8 neural network architectures that I believe any machine learning researcher should be familiar with in order to advance their work.

Why do we need machine learning?

Machine learning is needed for tasks that are too complex for humans to code directly. Some tasks are so complex that it is impractical for humans to work out all the nuances and code them explicitly. Instead, we provide a large amount of data to a machine learning algorithm and let the algorithm explore the data and search for a model that achieves what the programmer has set out to achieve.

Let's take a look at these two examples:

It is very difficult to write a program that solves a problem like recognizing a three-dimensional object from a novel viewpoint, under new lighting conditions, in a cluttered scene. We don't know what program to write because we don't know how our brains do it. And even if we did know, the program might be horrendously complicated.

It is difficult to write a program that computes the probability that a credit card transaction is fraudulent. There may not be any rules that are both simple and reliable; we need to combine a large number of weak rules. Fraud is also a moving target: the program needs to keep changing.

Then there is the machine learning approach: instead of writing a program by hand for each specific task, we collect a large number of examples that specify the correct output for a given input. A machine learning algorithm then takes these examples and produces a program that does the job. The program produced by the learning algorithm may look very different from a typical hand-written program; it may contain millions of numbers. If we do it right, the program works on new cases as well as the ones we trained it on. And if the data changes, the program can change too, simply by training on the new data. Note that massive amounts of computation are now cheaper than paying someone to write a task-specific program.

With this in mind, some of the tasks best solved by machine learning include:

Recognizing patterns: objects in real scenes, facial identities or facial expressions, spoken words

Recognizing anomalies: unusual sequences of credit card transactions, unusual patterns of sensor readings in a nuclear power plant

Prediction: future stock prices or currency exchange rates, which movies a person will be interested in

What is a neural network?

Neural networks are a class of models within the general machine learning literature. For example, if you take a Coursera course on machine learning, neural networks are likely to be covered. Neural networks are a specific set of algorithms that have revolutionized the field of machine learning. They are inspired by biological neural networks, and so-called deep neural networks have proven to work very well. Neural networks are themselves general function approximators, which is why they can be applied to almost any machine learning problem in which the task is to learn a complex mapping from the input space to the output space.

Here are three reasons to convince you to learn neural computing:

To understand how the brain actually works: it is very big and very complicated, and it dies when you poke it around, so we need to use computer simulations.

To understand a style of parallel computation inspired by neurons and their adaptive connections: this is a very different style from sequential computation.

To solve practical problems with novel learning algorithms: learning algorithms can be very useful even if they are not how the brain actually works.

After completing Andrew Ng's famous machine learning Coursera course, I became interested in neural networks and deep learning. So I started looking for the best online resources on these topics and found Geoffrey Hinton's Neural Networks for Machine Learning course. If you are a deep learning practitioner, or someone who wants to enter the deep learning/machine learning world, you should really take this course. Geoffrey Hinton is without a doubt the godfather of deep learning, and he offers something special in it. In this blog post, I want to share the eight neural network architectures from the course that I believe every machine learning researcher should be familiar with to advance their work.

Typically, these architectures can be divided into three specific categories:

1. Feedforward neural networks

These are the most common type of neural network in practical applications. The first layer is the input and the last layer is the output. If there is more than one hidden layer, we call them "deep" neural networks. They compute a series of transformations that change the similarities between cases. The activities of the neurons in each layer are a nonlinear function of the activities in the layer below.
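To make this concrete, here is a minimal sketch of a feedforward forward pass in NumPy. The layer sizes and the tanh/softmax choices are my own illustrative assumptions, not something prescribed by the course:

    import numpy as np

    rng = np.random.default_rng(0)

    def layer(x, W, b, nonlinearity):
        # Each layer's activity is a nonlinear function of the layer below.
        return nonlinearity(x @ W + b)

    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    # Input layer (4 units) -> hidden layer (8 units) -> output layer (3 units).
    W1, b1 = rng.normal(size=(4, 8)) * 0.1, np.zeros(8)
    W2, b2 = rng.normal(size=(8, 3)) * 0.1, np.zeros(3)

    x = rng.normal(size=(1, 4))    # one input case
    h = layer(x, W1, b1, np.tanh)  # hidden activities
    y = layer(h, W2, b2, softmax)  # output: class probabilities summing to 1
    print(y)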

2. Recurrent neural networks

These have directed cycles in their connection graph, which means you can sometimes get back to where you started by following the arrows. They can have complicated dynamics, which can make them very difficult to train. They are more biologically realistic.

There is currently a lot of interest in finding efficient ways of training recurrent networks. Recurrent neural networks are a very natural way to model sequential data. They are equivalent to very deep networks with one hidden layer per time slice, except that they use the same weights at every time slice and they receive input at every time slice. They have the ability to remember information in their hidden state for a long time, but it is very hard to train them to use this potential.
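As a rough illustration of that weight sharing, here is a minimal NumPy sketch of a recurrent forward pass over a time series; the sizes and the tanh nonlinearity are illustrative choices of mine:

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid = 3, 5
    Wxh = rng.normal(size=(n_in, n_hid)) * 0.1   # input -> hidden
    Whh = rng.normal(size=(n_hid, n_hid)) * 0.1  # hidden -> hidden (the cycle)
    b = np.zeros(n_hid)

    def rnn_forward(inputs):
        h = np.zeros(n_hid)  # initial hidden state
        for x_t in inputs:   # one step per time slice
            # The SAME weights are reused at every time slice,
            # and input arrives at every time slice.
            h = np.tanh(x_t @ Wxh + h @ Whh + b)
        return h

    sequence = rng.normal(size=(7, n_in))  # a length-7 time series
    print(rnn_forward(sequence))           # hidden state after the last step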

3. Symmetrically connected networks

These are like recurrent networks, but the connections between units are symmetric (they have the same weight in both directions). Symmetric networks are much easier to analyze than recurrent networks, but because they obey an energy function they are also more restricted in what they can do. Symmetrically connected networks without hidden units are called "Hopfield networks"; symmetrically connected networks with hidden units are called "Boltzmann machines."
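To show what "obeying an energy function" means here, below is a small NumPy sketch of a Hopfield network's standard energy function and one asynchronous unit update; the weights and states are made up for illustration:

    import numpy as np

    def energy(s, W, b):
        # E(s) = -1/2 * s^T W s - b^T s, with symmetric W and zero diagonal.
        return -0.5 * s @ W @ s - b @ s

    # Symmetric weights: the same weight in both directions between two units.
    W = np.array([[0.0, 1.0, -2.0],
                  [1.0, 0.0, 3.0],
                  [-2.0, 3.0, 0.0]])
    b = np.zeros(3)
    s = np.array([1.0, -1.0, 1.0])  # binary +/-1 states of three units

    print(energy(s, W, b))
    # An asynchronous update never raises the energy, so repeated updates
    # settle into a local minimum.
    i = 0
    s[i] = 1.0 if W[i] @ s + b[i] >= 0 else -1.0
    print(energy(s, W, b))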

1. Perceptron

Considered the first generation of neural networks, perceptrons are simply computational models of a single neuron. They were popularized by Frank Rosenblatt in the early 1960s. They appeared to have a very powerful learning algorithm, and lots of grand claims were made about what they could learn. In 1969, Minsky and Papert published a book called "Perceptrons" that analyzed what they could do and showed their limitations. Many people took these limitations to apply to all neural network models. Nevertheless, the perceptron learning procedure is still widely used today for tasks with enormous feature vectors that contain millions of features.

In the standard paradigm for statistical pattern recognition, we first convert the raw input vector into a vector of feature activations, defining the features by hand-written programs based on common sense. Next, we learn how to weight each feature activation to get a single scalar quantity. If this quantity is above some threshold, we decide that the input vector is a positive example of the target class.

The standard perceptron architecture follows the feedforward model: the inputs are sent into the neuron, processed, and result in an output. The network is bottom-up: input comes in at the bottom and the output comes out at the top.
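Here is a minimal NumPy sketch of the decision rule and the classic perceptron learning procedure described above, run on made-up, linearly separable data:

    import numpy as np

    def predict(w, x, threshold=0.0):
        # Weighted sum of feature activations; positive class if above threshold.
        return 1 if w @ x > threshold else 0

    def train(X, y, epochs=10):
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for x_i, y_i in zip(X, y):
                # If the output is wrong, add (or subtract) the input vector.
                w += (y_i - predict(w, x_i)) * x_i
        return w

    # Made-up data; the last feature is a constant bias input.
    X = np.array([[0., 0., 1.], [0., 1., 1.], [1., 0., 1.], [1., 1., 1.]])
    y = np.array([0, 0, 0, 1])  # logical AND of the first two features
    w = train(X, y)
    print([predict(w, x) for x in X])  # [0, 0, 0, 1]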

However, perceptrons do have limitations. If you are willing to choose features by hand, and if you use enough features, you can do almost anything: for binary input vectors, we could have a separate feature unit for each of the exponentially many binary vectors, and so we could make any possible discrimination on binary input vectors. But once the hand-coded features have been determined, there are very strong limits on what a perceptron can learn.

This result is devastating for perceptrons, because the whole point of pattern recognition is to recognize patterns despite transformations like translation. Minsky and Papert's "group invariance theorem" says that the learning part of a perceptron cannot learn to do this if the transformations form a group. To deal with such transformations, a perceptron needs to use multiple feature units to recognize transformations of informative sub-patterns. So the tricky part of pattern recognition must be solved by the hand-coded feature detectors, not by the learning procedure.

Networks without hidden units are very limited in the input-output mappings they can learn to model. More layers of linear units do not help; the result is still linear. Fixed output nonlinearities are not enough either. We therefore need multiple layers of adaptive, nonlinear hidden units. But how can we train such networks? We need an efficient way of adapting all the weights, not just the last layer, and this is hard: learning the weights going into hidden units is equivalent to learning features, and it is difficult because nobody tells us directly what the hidden units should do.
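A quick NumPy check of the claim that extra linear layers add no power, since the composition of linear maps is itself linear:

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(4, 8))
    W2 = rng.normal(size=(8, 3))
    x = rng.normal(size=(5, 4))

    two_layers = (x @ W1) @ W2   # two stacked linear layers
    one_layer = x @ (W1 @ W2)    # a single equivalent linear layer
    print(np.allclose(two_layers, one_layer))  # True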

2. Convolutional neural networks

Machine learning research has long focused on object detection problems. Many things make it hard to recognize objects:

Segmentation: Real scenes are cluttered with other objects. It is hard to tell which pieces belong together as parts of the same object, and parts of an object can be hidden behind other objects.

Illumination: The intensities of the pixels are determined as much by the lighting as by the objects themselves.

Deformation: Objects can deform in a variety of non-affine ways. For example, a handwritten 2 can have a large loop or just a cusp.

Affordances: Object classes are often defined by how they are used. Chairs, for example, are things designed for sitting on, so they come in a wide variety of physical shapes.

Viewpoint: Changes in viewpoint cause changes in the image that standard learning methods cannot cope with: information hops between input dimensions (i.e., pixels).

Imagine a medical database in which a patient's age sometimes hops into the input dimension that normally codes for weight! To apply machine learning, we would first have to eliminate this dimension-hopping.

The replicated-features approach is currently the dominant way for neural networks to solve the object detection problem. It uses many different copies of the same feature detector at different positions. Detectors could also be replicated across scale and orientation, but that is tricky and expensive.

Replication greatly reduces the number of free parameters to be learned. It uses several different feature types, each with its own map of replicated detectors, and it allows each patch of the image to be represented in several ways.
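To illustrate, here is a replicated feature detector written directly in NumPy: one small set of weights slid across every position of an image (a 2D cross-correlation). The Sobel-like edge kernel is an illustrative choice of mine:

    import numpy as np

    def conv2d(image, kernel):
        kh, kw = kernel.shape
        oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                # The SAME weights are applied at every position, so the
                # detector has 9 free parameters no matter how big the image is.
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.random.default_rng(0).normal(size=(28, 28))
    kernel = np.array([[1., 0., -1.],
                       [2., 0., -2.],
                       [1., 0., -1.]])  # vertical-edge detector
    feature_map = conv2d(image, kernel)
    print(feature_map.shape)  # (26, 26): one response per image position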

So what does replicating the feature detectors achieve?

Equivariant activities: Replicated features do not make the neural activities invariant to translation; the activities are equivariant.

Invariant knowledge: If a feature is useful in some locations during training, detectors for that feature will be available in all locations during testing.

In 1998, Yann LeCun and his collaborators developed a handwritten digit recognizer called LeNet. It used backpropagation in a feedforward network with many hidden layers, many maps of replicated units in each layer, and pooling of the outputs of nearby replicated units; it was a wide network that could cope with several characters at once even when they overlapped, and it cleverly trained a complete system rather than just a recognizer. Later, this kind of network came to be called a convolutional neural network. Interesting fact: this network was used to read about 10% of the checks in North America.
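For a feel of the architecture, here is a minimal LeNet-style sketch in PyTorch. The layer sizes follow the commonly cited LeNet-5 layout; details of the actual 1998 system (its connection tables, loss function, and training tricks) are omitted, so treat this as an assumption-laden sketch rather than a reproduction:

    import torch
    import torch.nn as nn

    class LeNetStyle(nn.Module):
        def __init__(self, n_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 6, kernel_size=5),   # 6 maps of replicated units
                nn.Tanh(),
                nn.AvgPool2d(2),                  # pool outputs of nearby units
                nn.Conv2d(6, 16, kernel_size=5),  # 16 higher-level feature maps
                nn.Tanh(),
                nn.AvgPool2d(2),
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(16 * 5 * 5, 120),
                nn.Tanh(),
                nn.Linear(120, n_classes),
            )

        def forward(self, x):
            return self.classifier(self.features(x))

    net = LeNetStyle()
    digits = torch.randn(8, 1, 32, 32)  # a batch of 32 x 32 grayscale inputs
    print(net(digits).shape)            # torch.Size([8, 10])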

Convolutional neural networks can be used for anything related to object recognition, from handwritten digits to 3D objects. However, recognizing real objects in color photographs downloaded from the web is much more complicated than recognizing handwritten digits: there are a hundred times as many classes (1,000 vs. 10), a hundred times as many pixels (256 x 256 color vs. 28 x 28 gray), two-dimensional images of three-dimensional scenes, cluttered scenes that require segmentation, and multiple objects in each image. Would the same type of convolutional neural network work?

The ILSVRC-2012 competition on ImageNet, a dataset containing approximately 1.2 million high-resolution training images, put this to the test. The test images were presented with no initial annotation (no segmentation or labels), and the algorithms had to produce labels specifying which objects were present in each image. Some of the best existing computer vision methods, from leading computer vision groups at Oxford, INRIA, XRCE, and elsewhere, were tried on this dataset. Typically, these computer vision systems were complicated multi-stage pipelines whose early stages were hand-tuned by optimizing a few parameters.

The winner of the competition, Alex Krizhevsky (NIPS 2012), developed a very deep convolutional neural network of the type pioneered by Yann LeCun. Its architecture has seven hidden layers, not counting some max-pooling layers. The early layers were convolutional, while the last two layers were globally connected. The activation functions were rectified linear units in every hidden layer; these train much faster and are more expressive than logistic units. In addition, it used competitive normalization to suppress hidden activities when nearby units have stronger activities, which helps with variations in intensity.

Several tricks significantly improve the generalization of such a network:

Randomly extract 224 x 224 patches from the 256 x 256 images to get more training data, and use left-right reflections of the images as well. At test time, combine the opinions of ten different patches: the four 224 x 224 corner patches, the central 224 x 224 patch, and the reflections of those five patches. (Both tricks in this list are sketched in code below.)

Use "dropout" to regularize the weights in the globally connected layers, which contain most of the parameters. Dropout means that, for each training example, half of the hidden units in a layer are randomly removed. This stops hidden units from relying too much on other hidden units.
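Here is a minimal NumPy sketch of both tricks: random 224 x 224 crops with left-right reflections, and dropout of half the hidden units. The shapes and the 0.5 rate follow the text; everything else is an illustrative assumption:

    import numpy as np

    rng = np.random.default_rng(0)

    def random_crop_and_flip(image, size=224):
        # Randomly extract a size x size patch from a larger training image.
        top = rng.integers(0, image.shape[0] - size + 1)
        left = rng.integers(0, image.shape[1] - size + 1)
        patch = image[top:top + size, left:left + size]
        if rng.random() < 0.5:  # use left-right reflections too
            patch = patch[:, ::-1]
        return patch

    def dropout(h, p=0.5):
        # Randomly remove half the hidden units for each training case;
        # scaling by 1/(1-p) keeps the expected activity unchanged.
        mask = rng.random(h.shape) >= p
        return h * mask / (1.0 - p)

    image = rng.normal(size=(256, 256))
    print(random_crop_and_flip(image).shape)  # (224, 224)

    hidden = rng.normal(size=(10,))
    print(dropout(hidden))  # about half the units zeroed out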

In terms of hardware, Alex used a very efficient implementation of convolutional networks on two Nvidia GTX 580 GPUs (over 1,000 fast little cores). GPUs are very good at matrix-matrix multiplication and have very high memory bandwidth. This allowed him to train the network in a week and makes it quick to combine the results from ten patches at test time. We can spread a network over many cores if we can communicate the states fast enough. As cores get cheaper and datasets get bigger, big neural networks will improve faster than old-fashioned computer vision systems.
