Deep Learning Interview Questions _ Machine Learning

Source: Internet
Author: User
Tags: svm

To prepare for interviews, I collected some deep learning interview questions from the internet, along with some questions I ran into in my own interviews.

Questions from my own interviews:

1. SVM derivation; SVM multi-class methods (one-vs-one, one-vs-rest, many-vs-many); derivation of the LR (logistic regression) loss function; the meaning of decision trees.
2. Methods for addressing overfitting; a detailed explanation of L1 and L2 regularization; methods for addressing exploding/vanishing gradients.
3. Common CNNs and an introduction to each; the innovations of each classic model.
4. Your own "alchemy" (hyperparameter tuning) skills.
5. K-means, AdaBoost.
6. Why does e appear in the LR formula? (Logistic regression is a generalized linear model whose response follows an exponential-family distribution, so the e comes from the exponential-family form.)
7. A generative model is solved by modeling the joint probability distribution; a discriminative model does not need the joint probability distribution.
8. Sample imbalance can be addressed by undersampling and oversampling, that is, using only part of the samples from the majority class and reusing samples from the minority class. Alternatively, modify the loss function to change the sample weights, giving minority-class samples a larger weight.

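9. How does a decision tree do regression? Each leaf node predicts the average of the target values of the training samples that fall into it.
10. Newton and quasi-Newton methods: take the second-order Taylor expansion of the loss function and minimize the first-order and second-order terms, that is, set the derivative with respect to the parameters to 0, which gives the iterative formula. Quasi-Newton methods use an approximation of the Hessian in this formula instead of computing it exactly.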
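
The update in item 10 can be sketched on a one-dimensional function. This is only an illustration: the function f(w) = (w - 3)^2 + 1 and the names `newton_minimize`, `grad`, and `hess` are assumptions made for the example, not something from the interviews. A quasi-Newton method would replace the exact `hess` with an approximation built from successive gradients.

```python
def newton_minimize(grad, hess, w0, steps=20, tol=1e-10):
    """Minimize a 1-D function with Newton updates: w <- w - f'(w) / f''(w).

    The update comes from setting the derivative of the second-order
    Taylor expansion around the current w to zero.
    """
    w = w0
    for _ in range(steps):
        g = grad(w)
        if abs(g) < tol:      # derivative is (almost) zero: stop
            break
        w = w - g / hess(w)
    return w


# Hypothetical example: f(w) = (w - 3)^2 + 1, so f'(w) = 2(w - 3), f''(w) = 2.
grad = lambda w: 2.0 * (w - 3.0)
hess = lambda w: 2.0

w_star = newton_minimize(grad, hess, w0=10.0)
print(w_star)
```

Because the example function is exactly quadratic, the second-order expansion is exact and a single Newton step already lands on the minimizer.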

Questions summarized by others:

A summary of the key points from the two links above:
1. Derivation of the BP and SGD formulas. Although I haven't been asked this yet, I think it is very important.
2. What methods are there for addressing overfitting in a network? Why does dropout help with overfitting? What is the idea behind batch normalization? What do you do when the classes are imbalanced? How does the anchor box approach in object detection differ from the sliding-window AdaBoost detection used in face detection? What is the difference between tracking and detection? Which frameworks have you used? Analyze their pros and cons.

Author: Xiao Bai in retreat
Source: Zhihu
Copyright belongs to the author. Commercial reprint please contact the author to obtain authorization, non-commercial reprint please indicate the source.

3. The difference between CPU and GPU.
4. CNN's most successful applications are in CV, so why can many NLP and speech problems also be solved with CNN? Why was CNN also used in AlphaGo? What do these seemingly unrelated problems have in common, and by what means does CNN capture this commonality?

What these seemingly unrelated problems have in common is a relationship between the local and the global: low-level features combine to form high-level features, and there are spatial correlations between different features. For example, low-level features such as lines and curves combine into different shapes, which finally yield a representation of a car.
CNN captures this commonality in four main ways: local connections, weight sharing, pooling operations, and a multi-level architecture. Local connections let the network extract local features of the data. Weight sharing greatly reduces the difficulty of training the network: one filter extracts only one feature and is convolved over the whole image (or speech/text). Pooling, together with the multi-level structure, reduces the dimensionality of the data and combines low-level local features into higher-level features that represent the whole image.
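
A toy one-dimensional version of local connection, weight sharing, and pooling might look like the sketch below. The input `signal`, the two-weight `edge_kernel`, and the helper names `conv1d` and `max_pool1d` are assumptions chosen for illustration; a real CNN layer also has a bias, a nonlinearity, and many filters.

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation.

    Each output looks only at len(kernel) neighbouring inputs (local
    connection), and the same kernel weights are reused at every
    position (weight sharing).
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


signal = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1]   # toy input containing two rising edges
edge_kernel = [-1, 1]                     # one filter, one feature: a rising edge

features = conv1d(signal, edge_kernel)    # [0, 1, 0, 0, -1, 0, 0, 1, 0]
pooled = max_pool1d(features, 3)          # [1, 0, 1]

# Parameter count: shared kernel vs. a fully connected layer of the same shape.
shared_params = len(edge_kernel)              # 2
dense_params = len(signal) * len(features)    # 10 * 9 = 90
print(features, pooled, shared_params, dense_params)
```

The single filter responds wherever the rising-edge pattern appears, and pooling keeps that response while shrinking the output from 9 values to 3.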

5. What kinds of datasets are not suitable for deep learning?
When the dataset is too small and there are too few samples, deep learning has no clear advantage over other machine learning algorithms.
When the dataset has no local correlation. The fields where deep learning currently performs well are mainly image, speech, and natural language processing, and what these fields have in common is local correlation: pixels in an image form objects, phonemes in a speech signal combine into words, and words in text combine into sentences. Once the arrangement of these feature elements is scrambled, the meaning changes too. Datasets without such local correlation are not well suited to deep learning algorithms.

6. What causes the vanishing gradient problem?
When training a neural network, the weights of the neurons are adjusted so that the network's output is as close to the label as possible, which reduces the error. Training commonly uses the BP (backpropagation) algorithm, whose core idea is to compute the loss between the output and the label, then compute the gradient of that loss with respect to each neuron's weights, and update the weights accordingly.
Vanishing gradients cause the weights to update slowly and make the model harder to train. One cause is that many activation functions squash their output into a very small interval, so the gradient is close to 0 at both ends of the activation function's domain, which causes learning to stall.
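
The shrinking effect can be illustrated with the sigmoid function, whose derivative is at most 0.25. The sketch below multiplies the derivative across a chain of layers; treating every weight as 1 and the helper name `chained_gradient` are simplifying assumptions for illustration only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x)), with maximum 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def chained_gradient(pre_activations):
    """Product of sigmoid derivatives along a chain of layers (weights taken as 1)."""
    g = 1.0
    for z in pre_activations:
        g *= sigmoid_grad(z)
    return g


best_case = chained_gradient([0.0] * 10)   # every factor is 0.25
saturated = chained_gradient([5.0] * 10)   # every factor is far below 0.25
print(best_case, saturated)
```

Even in the best case the gradient reaching the first of ten layers is 0.25 to the tenth power, a little under one millionth, and it is far smaller once the units saturate.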

7. Answer by Wen Yan Dong
The assessment is mainly divided into understanding and application. First, see whether the candidate understands the physical meaning of each module of a deep network, the methods for optimizing a network, and the application scenarios, advantages, and disadvantages of the various existing network structures. For example: the purpose of parameter sharing in convolutional layers, the role of pooling layers, what fine-tuning is, what dropout and BN do, and why a network needs these components. Then see whether the candidate can sensibly apply existing techniques to their own task. For example: choosing a network structure, deciding whether to fine-tune, improving the network based on cross-validation results (adding or removing layers), finding the cause when training does not converge, choosing the supervision (loss) function, and so on. If the answers to all of these are satisfactory, I personally consider that entry level. If the candidate can also identify a problem worth solving and a valuable research point, I think that goes beyond entry level. For example: FCN's predictions are independent of one another, so should constraints be introduced to model them (work from the Oxford group)? When existing supervision functions do not meet the requirements, can other losses reasonably be introduced to assist (CUHK), or can a more suitable one be proposed (Google)?

Author: Wen Yan Dong
Source: Zhihu
Copyright belongs to the author. Commercial reprint please contact the author to obtain authorization, non-commercial reprint please indicate the source.

8. Why can a network that is deep enough (with enough neurons) always avoid poor local optima?
Reference: The Loss Surfaces of Multilayer Networks

9. Loss. What definitions are there (and what are they based on)? What optimization methods are there, how do they optimize, and what are their respective benefits? Explain.

10. Dropout. How is it done, what is it for? Explain.
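
One common way to implement dropout is the "inverted" variant sketched below: units are zeroed at random during training and the survivors are rescaled, so nothing needs to change at inference time. The function name `dropout` and its parameters are assumptions for this sketch, not an API from any particular framework.

```python
import random


def dropout(activations, p_drop, training=True, rng=random):
    """Inverted dropout on a list of activations.

    During training each unit is zeroed with probability p_drop and the
    surviving units are scaled by 1 / (1 - p_drop), which keeps the
    expected activation unchanged. At inference time it is the identity.
    """
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)            # fixed seed only to make the run repeatable
x = [1.0] * 8
train_out = dropout(x, p_drop=0.5, rng=rng)          # each entry is 0.0 or 2.0
eval_out = dropout(x, p_drop=0.5, training=False)    # unchanged
print(train_out, eval_out)
```

Because a different random subset of units is dropped on every step, no unit can rely on specific other units being present, which is the usual explanation for why dropout reduces overfitting.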

11. Activation function. Which one do you choose, what are its benefits, and why does it have those benefits?
Several main activation functions: sigmoid/ReLU/PReLU
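
For reference, the three functions can be written in a few lines. In PReLU the negative-side slope `alpha` is a learned parameter; the fixed default of 0.25 below is an assumption for the sketch.

```python
import math


def sigmoid(x):
    """Squashes any real input into (0, 1); saturates for large |x|."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Passes positive inputs unchanged and zeroes negative ones."""
    return max(0.0, x)


def prelu(x, alpha=0.25):
    """Like ReLU, but negative inputs are scaled by alpha instead of zeroed."""
    return x if x > 0 else alpha * x


for v in (-2.0, 0.0, 3.0):
    print(v, sigmoid(v), relu(v), prelu(v))
```

ReLU's gradient is 1 for positive inputs, so it does not shrink the way the sigmoid gradient does, and PReLU keeps a nonzero gradient on the negative side as well.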
