On 7.27, after the summer vacation, I finished the finance project and started running deep learning code.
I ran the code from Hinton's Nature article for three days, and while debugging changed the batch size from 200 to 20.
Later I started reading papers and got dizzy.
So I turned to the Deep Learning Tutorials, installed Theano and so on, but debugging the Python code was really painful.
Then I went to UFLDL and found that it comes with exercises.
I worked through the nine exercises in five days.
About a year later, I saw on Weibo @Andrew Ng's Deep Learning series (UFLDL), translated by Deng Kan.
At the time I poured cold water on it: I was reading machine learning books in English then and felt English was not a problem, so why translate at all?
I think one should read good books, the classics; everyone knows how uneven the quality of many domestically published books is.
When I want to learn a new direction (e.g. ML, CV, IP), my approach is: first find a medium-length document (a translation is fine) to get familiar with the framework and keywords of the field; then find a few classic books to read (mostly in English, occasionally a good Chinese one); going further requires reading papers and code. Coding can also go along with the reading: when I was working through PRML, after each chapter I did my own derivations and code and gained a lot from it.
Coming back now to the translation organized by Deng (a senior in the field), it fits my needs for getting started with DL especially well, and Ng's exercises are quite demanding.
I was disrespectful before; my apologies, and a salute to the translation team!
==============================================================
1. Sparse Autoencoder
In a neural network, sparsity is achieved by constraining the average activation of the hidden-layer units.
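As a minimal sketch of that constraint (assuming the usual KL-divergence penalty; the activations a2, the target rho and the weight beta below are made-up illustration values):

% Sparsity penalty on the mean hidden activation, KL-divergence form.
a2 = rand(25, 1000);              % hidden activations, hiddenSize x numExamples (made up)
rho = 0.01;                       % target average activation (sparsity parameter)
beta = 3;                         % weight of the sparsity penalty in the cost

rhoHat = mean(a2, 2);             % actual average activation of each hidden unit
klTerm = rho .* log(rho ./ rhoHat) + (1 - rho) .* log((1 - rho) ./ (1 - rhoHat));
sparsityPenalty = beta * sum(klTerm);   % this term gets added to the reconstruction cost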
Exercise result:
This shows each row of the first-layer weight matrix W reshaped to the patch size. It reminds me of eigenfaces...
-------------------------------------------------------------------------------
2. Vectorization
Vectorized programming: in MATLAB, for loops are very slow, so if the cost function contains for loops the optimization runs very slowly.
At first I didn't take this seriously. In the next exercise the program ran very slowly; after optimizing the sparse autoencoder code and removing all the for loops, it got roughly 8x faster.
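To illustrate the kind of change involved (the sizes, weights and the sigmoid activation below are just placeholders, not the actual exercise code):

% Loop version: hidden activations computed one example at a time (slow in MATLAB).
W1 = randn(25, 64); b1 = randn(25, 1);   % placeholder weights
data = randn(64, 10000);                 % placeholder patches, one per column
m = size(data, 2);
a_loop = zeros(25, m);
for i = 1:m
    a_loop(:, i) = 1 ./ (1 + exp(-(W1 * data(:, i) + b1)));
end

% Vectorized version: one matrix multiply over all examples at once.
a_vec = 1 ./ (1 + exp(-bsxfun(@plus, W1 * data, b1)));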
-------------------------------------------------------------------------------
3. PCA and Whitening
Whitening was relatively new to me, but not difficult; PCA is a standard method for dimensionality reduction, and I had explored PCA reconstruction on my own before.
Whitening also shows up in PRML, so I didn't pay too much attention to it. Image preprocessing is still very important.
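A short sketch of PCA/ZCA whitening as the tutorial presents it (the data x and the regularizer epsilon below are made-up illustration values):

x = randn(64, 10000);                     % placeholder data, one patch per column
x = bsxfun(@minus, x, mean(x, 2));        % zero-mean each feature
sigma = x * x' / size(x, 2);              % covariance matrix
[U, S, ~] = svd(sigma);                   % eigenvectors in U, eigenvalues on diag(S)
epsilon = 1e-5;                           % regularizer so we never divide by ~0

xRot = U' * x;                                          % rotate into the PCA basis
xPCAWhite = diag(1 ./ sqrt(diag(S) + epsilon)) * xRot;  % PCA whitening
xZCAWhite = U * xPCAWhite;                              % ZCA whitening (rotate back)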
-------------------------------------------------------------------------------
4. Softmax Regression
A direct generalization of logistic regression to multiclass classification. There is a small story here: back when I was studying LR, I noticed a mistake in a Douban engineer's derivation of softmax regression.
MNIST:
The accuracy is about the same as what I got earlier with one-vs-all logistic regression on Kaggle (91.x%).
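For reference, a sketch of the softmax hypothesis used for prediction (theta and x are invented placeholders; subtracting the max is the usual trick against overflow):

theta = randn(10, 784);            % placeholder weights, numClasses x inputSize
x = rand(784, 100);                % placeholder MNIST-like inputs, one per column

z = theta * x;                                       % class scores
z = bsxfun(@minus, z, max(z, [], 1));                % numerical stability
probs = bsxfun(@rdivide, exp(z), sum(exp(z), 1));    % class probabilities per column
[~, pred] = max(probs, [], 1);                       % predicted label for each example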
-------------------------------------------------------------------------------
5. Self-taught Learning
Use the MNIST digits 5-9 to train a sparse autoencoder and obtain the parameters W1, b1.
Reshape W1:
Use W1, b1 to extract features from the digits 0-4.
Then train a classifier with softmax regression (being lazy, I ran the autoencoder for only 200 iterations).
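The feature extraction step is just one feedforward pass through the trained hidden layer; a sketch with invented shapes (W1, b1 and the data here are placeholders):

W1 = randn(200, 784); b1 = zeros(200, 1);   % pretend these were learned on digits 5-9
labeledData = rand(784, 5000);              % placeholder images of digits 0-4

% Features = hidden-layer activations of the trained autoencoder.
features = 1 ./ (1 + exp(-bsxfun(@plus, W1 * labeledData, b1)));
% The 200 x 5000 feature matrix is then fed into softmax regression.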
-------------------------------------------------------------------------------
6. Implement deep networks for digit classification
The first deep network built in the true sense: the first two layers are trained as sparse autoencoders to learn features I and II, and finally softmax regression classifies on features II.
The number of iterations has to be set by yourself.
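Roughly, the classification path through the stacked network looks like this (all parameters below are placeholders; the real exercise trains the layers one at a time):

sigmoid = @(z) 1 ./ (1 + exp(-z));
W1 = randn(200, 784); b1 = zeros(200, 1);   % first sparse autoencoder (placeholder)
W2 = randn(200, 200); b2 = zeros(200, 1);   % second sparse autoencoder (placeholder)
thetaSoftmax = randn(10, 200);              % softmax classifier on top of features II
x = rand(784, 1);                           % one placeholder test image

f1 = sigmoid(W1 * x + b1);                  % features I
f2 = sigmoid(W2 * f1 + b2);                 % features II
[~, predictedDigit] = max(thetaSoftmax * f2);   % predicted class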
-------------------------------------------------------------------------------
7. Linear decoders with autoencoders
The output range of the sigmoid/tanh function is limited, so the input data x would have to lie within that same range.
Using a linear activation function at the output layer overcomes this problem.
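A sketch of the difference at the output layer (the weights and data below are placeholders; squared-error reconstruction is assumed):

sigmoid = @(z) 1 ./ (1 + exp(-z));
W1 = randn(100, 192); b1 = zeros(100, 1);   % encoder (placeholder)
W2 = randn(192, 100); b2 = zeros(192, 1);   % decoder (placeholder)
x = randn(192, 500);                        % e.g. whitened patches, no longer in [0, 1]

a2 = sigmoid(bsxfun(@plus, W1 * x, b1));            % hidden layer keeps the sigmoid
xHatSigmoid = sigmoid(bsxfun(@plus, W2 * a2, b2));  % squashed into (0, 1): cannot match x
xHatLinear  = bsxfun(@plus, W2 * a2, b2);           % linear decoder: output is unbounded
reconError  = sum(sum((xHatLinear - x).^2)) / (2 * size(x, 2));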
-------------------------------------------------------------------------------
8. Convolution and Pooling
-------------------------------------------------------------------------------
9. Sparse Coding
I won't say much about sparse/low-rank models; I have read more than 20 computer vision papers on them and have coded up several of the algorithms.
Getting straight to the exercise: the analytical solution requires matrix calculus, differentiating trace(AA'), plus the convergence conditions.
Judging from the provided code, the first term of the cost function (the reconstruction error) needs to be divided by the number of patches. In fact the same effect can be obtained by scaling lambda and gamma correspondingly, since const * f(x) has the same optimal solution as f(x).
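A sketch of that cost in the form I believe the exercise uses (A is the basis/weight matrix, s the feature matrix; all values and the lambda/gamma/epsilon settings below are invented). Note the reconstruction term divided by the number of patches m:

A = randn(192, 400);               % basis (weight) matrix, placeholder
s = randn(400, 1000);              % feature (activation) matrix, placeholder
x = randn(192, 1000);              % patches, one per column
m = size(x, 2);                    % number of patches
lambda = 5e-5; gamma = 1e-2; epsilon = 1e-5;   % invented settings

reconCost    = sum(sum((A * s - x).^2)) / m;             % divided by the number of patches
sparsityCost = lambda * sum(sum(sqrt(s.^2 + epsilon)));  % smoothed L1 sparsity penalty
weightCost   = gamma * sum(sum(A.^2));                   % keeps the basis bounded
J = reconCost + sparsityCost + weightCost;
% Multiplying J by a constant does not move the minimizer, so dropping the 1/m
% is equivalent to rescaling lambda and gamma accordingly.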
==============================================================
Thanks again to the translation team; the native language really is easier to understand than English.
Previously I had only coded up a neural network once.
While doing exercise 6 my chest tightened for a moment: I was actually building a multi-layer network!
Next I will read some of the articles on my reading list to round out my knowledge of neural networks.
Autoencoder knowledge: Uf...