On GitHub, afshinea has published a set of cheatsheets for the classic Stanford CS229 course. They cover supervised learning, unsupervised learning, and deep learning, along with refreshers on probability and statistics, linear algebra, and calculus for further study.
Project Address: https://github.com/afshinea/stanford-cs-229-machine-learning
According to the project, the repository aims to summarize all the key concepts of the Stanford CS 229 machine learning course, including:
- Refreshers on the prerequisite knowledge for this course, such as probability and statistics, and algebra and calculus.
- A cheatsheet for each machine learning field, plus the tips and tricks you need when training a model.
- All of the above combined into one final compiled cheatsheet.
VIP cheatsheets
In this section, the project provides cheatsheets on supervised learning, unsupervised learning, deep learning, and machine learning tips and tricks, all based on CS 229. The supervised learning sheet mainly covers regression, classification, and generative models; the unsupervised learning sheet mainly covers clustering and dimensionality reduction algorithms; and the deep learning sheet summarizes three kinds of neural networks.
Supervised learning
The supervised learning cheatsheet introduces a large number of basic concepts, including loss functions, gradient descent, and maximum likelihood estimation. For loss functions, it shows the commonly used least squares loss, hinge loss, and cross-entropy loss, and gives the plot, definition, and typical algorithm for each one.
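As a rough illustration of the three losses named above, here is a minimal Python sketch. The function names and the label conventions (labels in {-1, +1} for the hinge loss, labels in {0, 1} with a predicted probability for the cross-entropy loss) are assumptions made for this example and are not taken from the repository.

```python
import math


def least_squares_loss(y, z):
    """Squared error 1/2 * (y - z)^2 for a real target y and prediction z."""
    return 0.5 * (y - z) ** 2


def hinge_loss(y, z):
    """Hinge loss max(0, 1 - y*z) for a label y in {-1, +1} and a raw score z."""
    return max(0.0, 1.0 - y * z)


def cross_entropy_loss(y, z):
    """Binary cross-entropy for a label y in {0, 1} and a probability z in (0, 1)."""
    return -(y * math.log(z) + (1 - y) * math.log(1 - z))
```

The least squares loss is typically paired with linear regression, the hinge loss with SVMs, and the cross-entropy loss with neural network classifiers.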
The supervised learning cheatsheet runs to four pages. Besides ordinary linear and logistic regression, it also focuses on SVMs, naive Bayes, and non-parametric models such as k-nearest neighbors. Most of the content is given directly as definitions, so there is little redundant information, which makes it a useful reference for machine learning developers and researchers.
Beyond the standard definitions, many key concepts are also illustrated with figures, such as the support vector machine in the supervised learning cheatsheet:
The figure clearly conveys the idea behind the SVM: it seeks to maximize the margin between the decision boundary and the support vectors, which makes the classifier more stable. A single picture captures both the basic idea of the SVM and its classification principle, and it also makes the hinge loss function easier to recall.
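For reference, the margin idea described here is usually written as the optimization problem below. This is the standard hard-margin formulation with labels y_i in {-1, +1}; the sign convention w^T x - b is an assumption for this sketch, and some texts write w^T x + b instead.

```latex
% Hard-margin SVM: the two margin boundaries are w^T x - b = +1 and w^T x - b = -1,
% so the distance between them is 2 / ||w||. Maximizing that distance is
% equivalent to minimizing ||w||^2.
\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i - b \right) \ge 1, \qquad i = 1, \dots, m

% The hinge loss penalizes exactly the points that violate this constraint:
L(z, y) = \max(0,\; 1 - y z), \qquad z = w^{\top} x - b
```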
Unsupervised learning
The unsupervised learning cheatsheet mainly covers the EM algorithm, clustering algorithms, and dimensionality reduction algorithms. The clustering part introduces k-means clustering, hierarchical clustering, and the distance metrics used for clustering, while the dimensionality reduction part presents two methods: principal component analysis (PCA) and independent component analysis (ICA).
Beyond the standard definitions, the schematic diagrams of these algorithms are also very valuable. For k-means clustering, four plots show the steps of the algorithm: first the centroids are initialized at random, then each sample is assigned to the cluster of its nearest centroid, and then each centroid is moved to the mean of the samples assigned to it, repeating until the assignments stop changing. Principal component analysis also has a very good visualization: PCA first normalizes the features of the data, then finds the principal components via singular value decomposition, and finally projects all the data onto those principal components to reduce the dimensionality.
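The k-means loop described above can be sketched in a few lines of plain Python. This is an illustrative version under stated assumptions, not the repository's code: the function names, the choice of initializing centroids from randomly sampled data points, and the fixed seed are all made up for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Initialization (assumption): pick k distinct data points as centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, assignments
```

Calling `kmeans(points, 2)` on two well-separated groups of 2D points returns one centroid per group together with a cluster index for every point. Like any k-means implementation, the result can depend on the initialization when the groups overlap.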
Deep learning
Many readers are already fairly familiar with deep learning, especially fully connected networks, convolutional networks, and recurrent networks. This cheatsheet presents the important concepts and definitions of these three kinds of networks, and also describes some basic concepts of reinforcement learning, such as Markov decision processes, the Bellman equation, the value iteration algorithm, and Q-learning.
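To make the value iteration idea concrete, here is a minimal sketch on a toy Markov decision process. The `transitions` layout (state -> action -> list of `(probability, next_state, reward)` tuples), the convention that rewards are attached to transitions, and the toy states themselves are assumptions for this example and do not come from the cheatsheet.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-9):
    """Repeat the Bellman optimality update until the values stop changing."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state: its value stays at 0
                continue
            # Bellman update: best expected reward plus discounted next value.
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


# Hypothetical three-state chain: reaching "goal" from "middle" pays reward 1.
toy_mdp = {
    "start": {"wait": [(1.0, "start", 0.0)], "go": [(1.0, "middle", 0.0)]},
    "middle": {"back": [(1.0, "start", 0.0)], "go": [(1.0, "goal", 1.0)]},
    "goal": {},
}
toy_values = value_iteration(toy_mdp)
```

With a discount factor of 0.9, the state one step from the goal is worth 1.0 and the state two steps away is worth 0.9, which shows how the discount shrinks rewards that lie further in the future.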
We think the formula for the size of the output feature map in the CNN figure is especially important: N = (W - F + 2P)/S + 1. Here W is the height or width of the input feature map, F is the convolution kernel size, P is the amount of zero padding added at each end, and S is the convolution stride, so N is the size of the output feature map. This matters when designing convolutional networks, because we often need the formula to control the size of the feature maps in the middle of the network.
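The formula translates directly into a small helper. The function name is hypothetical, and the use of floor division when (W - F + 2P) is not a multiple of S is an assumption for this sketch; check the rounding behavior of your own framework.

```python
def conv_output_size(w, f, p, s):
    """Output feature map size N = (W - F + 2P) / S + 1 along one dimension.

    w: input size, f: kernel size, p: zero padding per side, s: stride.
    Floor division is an assumption for the non-divisible case.
    """
    return (w - f + 2 * p) // s + 1
```

For example, a 5x5 kernel with padding 2 and stride 1 keeps a 32-pixel input at 32, while a 7x7 kernel with padding 3 and stride 2 halves a 224-pixel input to 112.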
Machine learning tips and tricks
This cheatsheet presents practical ML techniques in four areas: classification, regression, model selection, and model diagnostics. Classification and regression are covered mainly from the perspective of metrics, that is, which measures can be used to judge the quality of a model and what their properties are. Model selection and diagnostics also aim to judge the quality of a model, but the former approaches it through cross-validation and regularization, while the latter approaches it through bias and variance.
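As one example of the classification metrics mentioned above, the sketch below computes accuracy, precision, recall, and the F1 score from binary labels. The function name, the 0/1 label encoding, and the choice to return 0.0 when a denominator is zero are assumptions made for this illustration.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels encoded as 0 or 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Returning 0.0 for an empty denominator is an assumption of this sketch.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Precision answers "how many predicted positives were correct", recall answers "how many actual positives were found", and F1 is their harmonic mean.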
VIP refreshers
In this section, the author provides refresher cheatsheets on the prerequisites, covering probability and statistics as well as algebra and calculus.
Probability and statistics
Starting with permutations and combinations, this cheatsheet introduces the basic definitions of probability and statistics, including conditional probability, Bayes' rule, the probability density function, the cumulative distribution function, and the mean and variance of random variables. The statistics part then presents many more definitions and rules, including the k-th moment of a distribution, common discrete and continuous distributions, and sample statistics such as the sample mean, variance, and covariance.
Finally, the cheatsheet also covers parameter estimation, which is one of the most critical concepts in machine learning, because machine learning essentially means estimating, or "learning", the parameters of a model from a large number of samples. As for why the Gaussian distribution is so important, the central limit theorem at the end of the sheet gives an answer: if n samples are drawn independently from the same distribution with finite mean and variance, then as n approaches infinity the distribution of their sample mean approaches a Gaussian distribution, whatever the underlying distribution is.
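A quick simulation can make the central limit theorem tangible. The sketch below draws sample means from a uniform distribution on [0, 1); the function name, the choice of distribution, and the seed are assumptions for this example. Since that distribution has mean 1/2 and variance 1/12, the theorem predicts that the sample means cluster around 0.5 with standard deviation close to sqrt(1/(12n)).

```python
import random
import statistics


def sample_means(n, trials, seed=0):
    """Return `trials` sample means, each computed from n uniform(0, 1) draws."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]


means = sample_means(n=30, trials=2000)
observed_mean = statistics.fmean(means)
observed_std = statistics.stdev(means)
predicted_std = (1 / 12 / 30) ** 0.5  # sqrt(variance / n) for uniform(0, 1)
```

Plotting a histogram of `means` should show the familiar bell shape even though each individual draw is uniform, and `observed_std` should land near `predicted_std`.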
Linear algebra and calculus
Matrix operations and differentiation are very important when actually building models: in both traditional machine learning and deep learning, the computation is expressed with matrices and even tensors, so we need to understand their rules in order to understand what the model is really doing. In this cheatsheet, the author covers the definitions of vectors and matrices, the common matrix operations, and a large number of matrix concepts, such as the trace, inverse, and rank of a matrix, positive definiteness, and eigenvalues and eigenvectors.
The basic concepts of matrix calculus are also covered, since we are essentially using matrix calculus whenever we update parameters with backpropagation. This also requires understanding the Jacobian matrix and the Hessian matrix.
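For readers who want the definitions at hand, the two matrices mentioned above can be written as follows. These are the standard textbook definitions; the symbols f, x, m, and n are notation chosen for this sketch and may differ from the cheatsheet.

```latex
% Jacobian of a vector-valued function f : R^n -> R^m (an m x n matrix):
J_{i,j} = \frac{\partial f_i(x)}{\partial x_j},
\qquad i = 1, \dots, m, \quad j = 1, \dots, n

% Gradient of a scalar function f : R^n -> R (a vector in R^n):
\left( \nabla_x f(x) \right)_i = \frac{\partial f(x)}{\partial x_i}

% Hessian of a scalar function f : R^n -> R (an n x n matrix,
% symmetric when the second partial derivatives are continuous):
\left( \nabla_x^2 f(x) \right)_{i,j} = \frac{\partial^2 f(x)}{\partial x_i \, \partial x_j}
```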
Resources | A machine learning cheatsheet compiled from Stanford CS229