Deep Learning Study Notes Series (2)


(Continued from the previous part)

 

Because what we want to learn is the representation of features — or rather, hierarchical features — we need to understand features more deeply. So before we talk about deep learning, it is worth going over features once more (frankly, we came across such a good explanation of features that it would have been a pity not to include it, so we squeezed it in here).

 

IV. Features

Features are the raw material of a machine learning system, and their impact on the final model is beyond doubt. If the data is well represented as features, even a linear model can usually achieve satisfactory accuracy. So what do we need to consider about features?

4.1 Granularity of Feature Representation

At what granularity of feature representation can a learning algorithm actually work? For an image, pixel-level features have almost no value. Take the motorcycle below: at the pixel level, the image carries no information that could distinguish motorcycles from non-motorcycles. Only when the features are structured (or meaningful) — such as whether the image contains a handlebar or a wheel — does telling motorcycles apart become easy, and only then can a learning algorithm do its job.

4.2 Elementary (Shallow) Feature Representation

Since the pixel-level feature representation method does not work, what kind of representation is useful?

Around 1995, Bruno Olshausen and David Field were at Cornell University, trying to study visual problems with both physiological and computational approaches.

They collected many black-and-white landscape photos and extracted 400 small patches from them, each 16×16 pixels. Label these 400 patches S[i], i = 0, ..., 399. Next, randomly extract one more patch from the same set of photos, also 16×16 pixels, and call it T.

The question they posed was: how can we select a set of patches S[k] from these 400 and combine them into a new patch that is as close as possible to the randomly chosen target patch T, while using as few patches as possible? In mathematical terms:

Sum_k (a[k] * S[k]) --> T, where a[k] is the weight coefficient of patch S[k] in the superposition.

To solve this problem, Bruno Olshausen and David Field invented an algorithm called sparse coding.

Sparse coding is an iterative process; each iteration has two steps:

1) Select a set of patches S[k], then adjust the coefficients a[k] so that Sum_k (a[k] * S[k]) is as close as possible to T.

2) Fix the a[k]; among the 400 patches, choose better patches S'[k] to replace the original S[k], so that Sum_k (a[k] * S'[k]) is as close as possible to T.
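The two alternating steps above can be sketched in code. This is only a toy illustration, not the authors' original algorithm: the "patches" here are random vectors rather than real photo fragments, the coefficient step uses least squares, and the swap step greedily tries each candidate patch in turn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the 400 photo patches: each column is a
# flattened 16x16 patch (random values here, not real photos).
S = rng.standard_normal((256, 400))
t = rng.standard_normal(256)          # the target patch T, flattened

k = 5                                  # how many patches to combine
chosen = list(rng.choice(400, size=k, replace=False))

def residual(sel):
    # Step 1: with the patch set fixed, find the best weights a[k]
    # by least squares, and return how far the sum is from T.
    A = S[:, sel]
    a, *_ = np.linalg.lstsq(A, t, rcond=None)
    return t - A @ a, a

r, _ = residual(chosen)
initial_err = float(r @ r)

for _ in range(3):                     # a few outer iterations
    # Step 2: with the set size fixed, try swapping each chosen
    # patch for the single patch that most reduces the error.
    for i in range(k):
        best_j, best_err = chosen[i], np.inf
        for j in range(400):
            if j in chosen and j != chosen[i]:
                continue               # avoid duplicate patches
            trial = chosen.copy()
            trial[i] = j
            r, _ = residual(trial)
            err = float(r @ r)
            if err < best_err:
                best_err, best_j = err, j
        chosen[i] = best_j

r, a = residual(chosen)
final_err = float(r @ r)
print(initial_err, "->", final_err)
```

Because the current patch is always among the candidates tried in step 2, the reconstruction error can only decrease or stay the same across iterations.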

After several iterations, the best combination of S[k] is selected. Surprisingly, the selected S[k] are basically the edge lines of different objects in the photos — line segments of similar shape, differing mainly in orientation.

The results of Bruno Olshausen and David Field's algorithm coincide with the physiological findings of David Hubel and Torsten Wiesel!

That is to say, complex images are often composed of a few basic structures. For example, an image can be linearly represented using 64 orthogonal edges (which can be understood as orthogonal basic structures). In the example, the figure X can be reconstructed from just three of the 64 edges, with weights 0.8, 0.3, and 0.5; the other basic edges contribute nothing, so their weights are all 0.
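This linear representation is easy to verify numerically. In the sketch below the 64 "edges" are simply the columns of a random orthonormal matrix (an assumption for illustration — real edge filters look different), and the three nonzero weights match those quoted above; because the basis is orthonormal, projecting the image back onto it recovers the sparse weights exactly.

```python
import numpy as np

# A toy orthonormal "edge" basis: 64 orthonormal vectors stand in
# for the 64 orthogonal edge filters (random, not real edges).
basis, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((64, 64)))

# Build an image vector from just three basis edges, with the
# weights from the text (0.8, 0.3, 0.5); all other weights are 0.
weights = np.zeros(64)
weights[[4, 17, 42]] = [0.8, 0.3, 0.5]   # edge indices chosen arbitrarily
x = basis @ weights

# Orthonormality means basis.T is the inverse projection, so this
# recovers exactly the sparse weight vector.
recovered = basis.T @ x
print(np.round(recovered[[4, 17, 42]], 3))
```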

 

In addition, this pattern exists not only in images but also in sound. From unlabeled sound data, the researchers found 20 basic sound structures, and the remaining sounds could be synthesized from these 20 basic structures.

4.3 Structural Feature Representation

A small image patch can be composed of basic edges. But how should more structured, more complex, conceptual graphics be represented? This requires higher-level feature representations, such as areas V2 and V4. To V1, raw pixels are the input; to V2, the output of V1 plays that same role — the hierarchy is progressive, with higher-level representations composed of lower-level ones. In technical terms, each level is a basis. The basis extracted by V1 consists of edges; the V2 layer then combines these V1 bases, so V2 obtains a basis one level higher — the result of combining the bases of the layer below, and the layer above that is in turn a combination of that layer's bases... (This is why some experts joke that deep learning is just "playing with bases"; since that sounds inelegant, it goes by the nicer name of deep learning, or unsupervised feature learning.)
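The "higher basis = combination of lower bases" idea can be sketched in a few lines. Everything here is a made-up toy: random vectors stand in for V1 edge filters, and each V2 atom is just a sparse linear combination of two of them.

```python
import numpy as np

rng = np.random.default_rng(2)

# Layer-1 "edges": 8 basis vectors over 16-pixel patches
# (random stand-ins for V1 edge filters).
V1 = rng.standard_normal((16, 8))

# Layer-2 basis: each higher-level atom is a sparse combination
# of V1 edges, mirroring "V2 is a combination of V1 bases".
C = np.zeros((8, 4))
for j in range(4):
    picks = rng.choice(8, size=2, replace=False)
    C[picks, j] = rng.standard_normal(2)

# Each V2 column lives in pixel space but encodes a pair of edges.
V2 = V1 @ C
print(V2.shape)
```

The same construction repeats upward: a third layer would combine V2 atoms, and so on, which is exactly the progressive hierarchy described above.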

 

Intuitively, the idea is to find small patches that make sense, combine them to obtain the features of the next layer up, and learn features recursively in this way.

When training is performed on different object categories, the edge bases obtained are very similar, but the object parts and models are completely different (which then makes it much easier to tell whether an image is a car or a face):

 

In terms of text, what does a doc represent? What unit is suitable for describing a thing? A single character? I think not — a character is the pixel level. At the least it should be a term; in other words, every doc is composed of terms. But is a term enough to express a concept? Probably not; we need to go one level up, to the topic level. With topics, going from topic to doc becomes reasonable. Note, however, that the number of items at each level differs greatly, e.g. the concepts a doc expresses → topics (thousands to tens of thousands) → terms (hundreds of thousands) → words (millions).

When a person reads a doc, the eyes see words; the brain automatically segments these words into terms, organizes them into concepts according to prior learning, obtains topics, and then performs higher-level learning.

 

4.4 How Many Features Are Needed?

We know that features should be built up in layers, from simple to complex, but how many features should each layer have?

The more features there are, the more reference information is available, and the higher the accuracy can be. But more features also mean more complex computation and a larger search space, and the training data becomes sparse over each feature, which causes all sorts of problems. So more features is not necessarily better.

Now we can finally talk about deep learning. We have discussed why we want deep learning (to let machines automatically learn good features, without manual selection, with the human hierarchical visual processing system as a reference). We can conclude that deep learning needs multiple layers to obtain more abstract feature representations. So how many layers are appropriate? What architecture should be used for modeling? And how do we carry out unsupervised training?

 

(To be continued)

Source: http://blog.csdn.net/zouxy09/article/details/8775488
