1. Activation function:
2. Hyperparameter: a parameter whose value is set before the learning process begins, as opposed to the model parameters obtained from training. In general, hyperparameters need to be tuned to find an optimal set that improves the performance and effectiveness of learning.
3. Feature Extraction:
Feature engineering: use one-hot encoding if the feature is a string type.
Characteristics of good features: 1. The feature should appear with non-zero values more than a few times in the dataset. 2. The feature should have a clear and unambiguous meaning. 3. A feature should not use "magic" values. 4. The definition of a feature should not change over time (beware of dependence on other machine learning systems).
4. Feature crosses: 1. Linear learners scale well to large amounts of data. 2. Without feature crosses, the expressiveness of some models is limited. 3. Feature crosses plus large amounts of data are an effective strategy for learning highly complex models.
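A sketch of a feature cross under assumed inputs: two one-hot (binned) features are combined by an outer product, producing one indicator per pair of values, which lets a linear model learn a weight for each combination:

```python
# Feature cross: combine two one-hot vectors into a single flattened
# indicator vector, one entry per (value_a, value_b) pair.
def feature_cross(a, b):
    return [x * y for x in a for y in b]

lat_bucket = [0, 1, 0]  # hypothetical binned latitude, one-hot
lon_bucket = [1, 0]     # hypothetical binned longitude, one-hot
crossed = feature_cross(lat_bucket, lon_bucket)
print(crossed)  # [0, 0, 1, 0, 0, 0]: exactly one pair is active
```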
5. Reducing loss:
6. Tuning the learning rate: if the learning rate is set too large, a single step can overshoot the point where gradient descent reaches the bottom; each step then bounces back and forth across the curve, climbing up it instead of descending.
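The overshooting behavior above can be demonstrated on the simple loss f(w) = w², whose gradient is 2w; the learning-rate values are illustrative:

```python
# Gradient descent on f(w) = w**2. A small learning rate converges toward
# the minimum at w = 0; a learning rate above 1.0 (for this loss) makes
# each step overshoot, so |w| grows and the iterates diverge.
def gradient_descent(lr, steps=20, w=5.0):
    for _ in range(steps):
        grad = 2 * w      # derivative of w**2
        w = w - lr * grad
    return w

print(abs(gradient_descent(0.1)))  # small: converges toward 0
print(abs(gradient_descent(1.1)))  # large: oscillates and diverges
```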
7. L1 regularization:
- Sparsity: avoids overfitting and reduces memory usage
- Penalizes the sum of the absolute values of the weights, driving many weights to exactly 0 (a practical, convex stand-in for an L0-norm penalty on the count of non-zero weights)
8. L2 regularization: avoids overfitting.
- Reduces model complexity: penalizes the sum of the squares of the weights
- Shrinks very large weights
- For linear models: prefers a gentler slope
- Bayesian prior: weights should be centered on 0 and normally distributed
The optimization objective for training a model is a function of two terms: a loss term, which measures how well the model fits the data, and a regularization term, which measures the complexity of the model (model complexity can be expressed either as a function of the weights of all features in the model, or as a function of the total number of features with non-zero weights).
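The two penalties and the combined objective described above can be sketched as follows; the weight vector and regularization strength are illustrative:

```python
# L1 and L2 penalties on an illustrative weight vector.
weights = [0.5, -0.3, 0.0, 2.0]

l1_penalty = sum(abs(w) for w in weights)  # encourages exact zeros (sparsity)
l2_penalty = sum(w ** 2 for w in weights)  # penalizes large weights strongly

# Training objective = loss term + lambda * regularization term,
# where lambda (the regularization strength) is a hyperparameter.
lam = 0.01

def objective(loss, penalty, lam=lam):
    return loss + lam * penalty

print(l1_penalty, l2_penalty)  # 2.8 4.34
```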
9. Logistic regression: binary classification with an extremely efficient mechanism for computing probabilities (many problems need a probability estimate as output). Two ways to use the output: "as is" (the probability itself) or "converted to a binary category." Applications: automatic diagnosis of disease (investigating the risk factors that cause a disease and predicting the probability of its occurrence from those factors), economic forecasting, and other fields.
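Logistic regression produces its probability by passing a linear score through the sigmoid function; a minimal sketch with hypothetical weights:

```python
import math

# Sigmoid squashes any real-valued score into a probability in (0, 1).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Linear score w·x + b, then sigmoid, giving P(label = 1)."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return sigmoid(z)

p = predict_proba([1.0, 2.0], [0.4, -0.1], 0.2)  # hypothetical model
label = 1 if p >= 0.5 else 0  # "converted to a binary category"
```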
Classification evaluation metrics: precision and recall.
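A sketch of how these two metrics are computed from true and predicted binary labels; the sample labels are illustrative:

```python
# Precision = TP / (TP + FP): of the examples predicted positive,
#             how many truly are.
# Recall    = TP / (TP + FN): of the truly positive examples,
#             how many were found.
def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

prec, rec = precision_recall([1, 0, 1, 1], [1, 1, 0, 1])
```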
Deep neural networks: learn features on their own, without us manually adding parameters, using backpropagation (TensorFlow can run gradient descent on non-convex functions).
Backpropagation considerations: 1. Gradients are important. 2. Gradients may vanish. 3. Gradients may explode (the learning rate is important). 4. ReLU layers may die. 5. Dropout regularization.
Embeddings: add a learned dimension to the model, mapping sparse high-dimensional features into a lower-dimensional dense space.
Binary classification: ROC curves and AUC evaluate how good a binary classifier is.
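AUC has a useful probabilistic reading: it is the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. A small sketch with illustrative scores:

```python
# AUC as the fraction of (positive, negative) pairs where the positive
# example receives the higher score (ties count as half).
def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pairs = [(p, n) for p in pos for n in neg]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)

print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0: perfect ranking
```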
Composition of a face recognition system (a classification problem in machine learning and pattern recognition):
The algorithm mainly consists of three modules: 1. face detection, 2. face alignment, 3. facial feature representation.
CTR prediction method: FFM (Field-aware Factorization Machines)
Machine learning Algorithms