Common optimization algorithms for machine learning

1. Gradient Descent method

Gradient descent is the simplest and most commonly used optimization algorithm. When the objective function is convex, the solution found by gradient descent is the global optimum; in general, however, the solution is not guaranteed to be globally optimal, and gradient descent is not necessarily the fastest method. The idea of gradient descent is to use the negative gradient at the current position as the search direction, because that is the direction of steepest descent at that point; for this reason it is also called the "steepest descent method". The closer steepest descent gets to the target value, the smaller the steps become and the slower the progress.
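As a minimal illustration of this iteration, the sketch below applies gradient descent to a small convex quadratic; the matrix, starting point, step size, and stopping tolerance are all illustrative assumptions, not part of the original text.

```python
import numpy as np

# Minimal gradient descent on a convex quadratic f(x) = 0.5 * x^T A x - b^T x.
# A, b, the learning rate, and the tolerance are illustrative choices.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return A @ x - b              # gradient of f at x

x = np.zeros(2)                   # starting point
lr = 0.1                          # fixed step size
for _ in range(1000):
    g = grad(x)
    if np.linalg.norm(g) < 1e-8:  # stop when the gradient is (almost) zero
        break
    x -= lr * g                   # step in the negative gradient direction

print("approximate minimizer:", x)
print("exact solution A^-1 b: ", np.linalg.solve(A, b))
```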

In machine learning, two variants of the basic gradient descent method are commonly used, namely batch gradient descent and stochastic gradient descent.

Batch gradient descent: each update minimizes the loss function over all training samples, so the final solution minimizes the risk function and, for a convex problem, is the global optimum; however, the method is inefficient for large-scale sample sets.

Stochastic gradient descent: each update minimizes the loss on a single sample. Although not every iteration moves toward the global optimum, the overall trend does, and the final result is usually close to the global optimum; this makes the method suitable for large-scale training sets.
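The following sketch contrasts the two variants on a synthetic least-squares problem; the data, learning rates, and epoch counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data (illustrative): y = X w_true + noise.
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

def batch_gradient_descent(X, y, lr=0.1, epochs=100):
    # Each update uses the gradient of the loss over ALL samples.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        g = X.T @ (X @ w - y) / len(y)
        w -= lr * g
    return w

def stochastic_gradient_descent(X, y, lr=0.01, epochs=10):
    # Each update uses the gradient of the loss on a SINGLE sample.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            g = (X[i] @ w - y[i]) * X[i]
            w -= lr * g
    return w

print("batch GD error:", np.linalg.norm(batch_gradient_descent(X, y) - w_true))
print("SGD error:     ", np.linalg.norm(stochastic_gradient_descent(X, y) - w_true))
```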

2. Newton's method and Quasi-Newton method

In essence, Newton's method has second-order convergence while gradient descent has first-order convergence, so Newton's method converges faster. In more intuitive terms, suppose you want to find the shortest path to the bottom of a basin: gradient descent picks, at each step, only the steepest direction from your current position; Newton's method, when choosing a direction, considers not only whether the slope is steep enough but also whether the slope will become steeper after you take the step. In other words, Newton's method looks a little farther ahead than gradient descent and can reach the bottom faster.

Advantage: second-order convergence and a faster convergence rate.

Disadvantage: Newton's method is an iterative algorithm in which every step requires solving for the inverse of the objective function's Hessian matrix, which makes the computation relatively expensive.
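A minimal sketch of the Newton iteration on a smooth convex test function; the function, starting point, and tolerance are assumptions made for demonstration.

```python
import numpy as np

# Newton's method for an illustrative smooth convex function:
#   f(x1, x2) = exp(x1 + x2) + x1^2 + x2^2
def grad(x):
    e = np.exp(x[0] + x[1])
    return np.array([e + 2 * x[0], e + 2 * x[1]])

def hessian(x):
    e = np.exp(x[0] + x[1])
    return np.array([[e + 2.0, e], [e, e + 2.0]])

x = np.array([1.0, 1.0])
for _ in range(20):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:
        break
    # Each Newton step solves H p = g, i.e. it uses second-order
    # (curvature) information, unlike plain gradient descent.
    p = np.linalg.solve(hessian(x), g)
    x -= p

print("minimizer:", x, "gradient norm:", np.linalg.norm(grad(x)))
```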

Quasi-Newton method

The basic idea of the quasi-Newton method is to remove Newton's method's need to solve for the inverse of the complicated Hessian matrix at every iteration: it uses a positive definite matrix to approximate the inverse of the Hessian, thereby simplifying the computation. Like the steepest descent method, the quasi-Newton method only requires the gradient of the objective function at each iteration. By measuring the change in the gradient, it constructs a model of the objective function that is good enough to produce superlinear convergence. This approach is superior to the steepest descent method, especially on difficult problems, and because the quasi-Newton method does not require second-order derivative information, it is sometimes more effective than Newton's method. Today, optimization software includes a large number of quasi-Newton algorithms for solving unconstrained, constrained, and large-scale optimization problems.
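In practice a quasi-Newton method such as BFGS is usually taken from a library. Below is a minimal sketch using SciPy's BFGS implementation on the Rosenbrock test function; the choice of test function and starting point is illustrative.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# Quasi-Newton (BFGS) minimization of the Rosenbrock function.
# Only the gradient is supplied; BFGS builds its own approximation to the
# inverse Hessian from successive gradient differences.
x0 = np.array([-1.2, 1.0, -1.2, 1.0])
result = minimize(rosen, x0, method="BFGS", jac=rosen_der)

print("minimizer:", result.x)      # the Rosenbrock minimum is at all ones
print("iterations:", result.nit)
```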

3. Conjugate gradient method

The conjugate gradient method lies between the steepest descent method and Newton's method. It requires only first-derivative information, yet it overcomes the slow convergence of steepest descent and avoids Newton's method's need to store, compute, and invert the Hessian matrix. The conjugate gradient method is not only one of the most useful methods for solving large systems of linear equations, but also one of the most effective algorithms for large-scale nonlinear optimization. Among the various optimization algorithms, the conjugate gradient method is very important. Its advantages are that it requires little storage, converges in a finite number of steps (for linear systems), is highly stable, and does not require any external parameters.
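A minimal sketch of the conjugate gradient method for a symmetric positive-definite linear system Ax = b; the small test matrix and tolerance are illustrative assumptions.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A.

    Only matrix-vector products and a few vectors are kept, which is why
    CG suits large systems: no Hessian storage or matrix inversion."""
    n = len(b)
    max_iter = max_iter or n
    x = np.zeros(n)
    r = b - A @ x                       # residual (negative gradient of the quadratic)
    p = r.copy()                        # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)       # exact step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p   # next direction, conjugate to the previous ones
        rs_old = rs_new
    return x

# Small illustrative system; in exact arithmetic CG finishes in at most n steps.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))         # compare with np.linalg.solve(A, b)
```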

4. Heuristic Optimization method

Heuristic methods are optimization methods based on empirical rules. Their characteristic is that they draw on past experience when solving a problem, selecting approaches that have already proven effective, rather than following a systematic, predetermined procedure to derive the answer. There are many kinds of heuristic optimization methods, including the classical simulated annealing method, genetic algorithms, ant colony algorithms, particle swarm optimization, and so on.
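As a concrete example of the heuristic idea, here is a minimal simulated annealing sketch on a one-dimensional multimodal function; the objective, neighborhood proposal, and cooling schedule are all illustrative assumptions.

```python
import math
import random

random.seed(0)

# Illustrative multimodal objective with several local minima.
def objective(x):
    return x * x + 10 * math.sin(x)

x = 5.0                                  # current solution
best_x, best_f = x, objective(x)
temperature = 10.0
cooling = 0.99

for _ in range(5000):
    candidate = x + random.uniform(-1.0, 1.0)   # random neighbor of x
    delta = objective(candidate) - objective(x)
    # Always accept improvements; accept worse moves with a probability
    # that shrinks as the temperature decreases (the "annealing" part).
    if delta < 0 or random.random() < math.exp(-delta / temperature):
        x = candidate
    if objective(x) < best_f:
        best_x, best_f = x, objective(x)
    temperature *= cooling

print("best x found:", best_x, "objective:", best_f)
```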

There is also a special class of optimization algorithms called multi-objective optimization algorithms, which target problems that optimize multiple objectives (two or more) simultaneously. Classical algorithms of this kind include the NSGA-II algorithm, the MOEA/D algorithm, and artificial immune algorithms.

5. EM algorithm

The EM algorithm is a generic term for a class of algorithms. It consists of two steps: the E-step and the M-step. The EM algorithm has a wide range of applications; broadly speaking, machine learning problems that require iteratively optimizing model parameters can use the EM algorithm.

The idea and process of the EM algorithm

In the E-step, "E" stands for expectation. The E-step is the process of computing an expectation: based on the existing model, the result of feeding each observed data point into the model is calculated. This process of computing expected values is called the E process.

In the M-step, "M" stands for maximization. The M-step is the process of maximizing the expectation: after a round of expected values has been obtained, the model parameters are recomputed so as to maximize the expected value. This maximization process is the M process.

Maximization means that, when we use this model, we want the function we have defined to produce as large a value as possible; the larger the value, the closer we are to the result we want. The objective of the optimization is therefore this function whose value we seek to maximize.
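A minimal sketch of the E-step and M-step for a two-component, one-dimensional Gaussian mixture; the synthetic data, initial guesses, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D data from two Gaussians (illustrative); EM must recover the
# means, standard deviations, and mixing weight without seeing the labels.
data = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 700)])

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Initial parameter guesses.
mu1, mu2 = -1.0, 1.0
sigma1, sigma2 = 1.0, 1.0
pi1 = 0.5                                  # mixing weight of component 1

for _ in range(100):
    # E-step: expected responsibility of component 1 for each point,
    # computed under the current model parameters.
    p1 = pi1 * gaussian_pdf(data, mu1, sigma1)
    p2 = (1 - pi1) * gaussian_pdf(data, mu2, sigma2)
    gamma = p1 / (p1 + p2)

    # M-step: re-estimate the parameters to maximize the expected
    # complete-data log-likelihood given the responsibilities.
    mu1 = np.sum(gamma * data) / np.sum(gamma)
    mu2 = np.sum((1 - gamma) * data) / np.sum(1 - gamma)
    sigma1 = np.sqrt(np.sum(gamma * (data - mu1) ** 2) / np.sum(gamma))
    sigma2 = np.sqrt(np.sum((1 - gamma) * (data - mu2) ** 2) / np.sum(1 - gamma))
    pi1 = np.mean(gamma)

print(f"component 1: mean {mu1:.2f}, std {sigma1:.2f}, weight {pi1:.2f}")
print(f"component 2: mean {mu2:.2f}, std {sigma2:.2f}, weight {1 - pi1:.2f}")
```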

Common EM-style algorithms include the Baum-Welch algorithm for hidden Markov models, training methods for the maximum entropy model such as the GIS algorithm, and so on.

Results of the EM algorithm

The EM algorithm does not necessarily find the global optimal solution. If the objective function being optimized is convex, the global optimum is guaranteed; otherwise, only a local optimum may be obtained. This is because, if the objective function has more than one peak and the optimization reaches a peak that is not the highest one, it will not continue further, and the result is a local optimum.

Summary

The EM algorithm only needs some training data as input and the definition of a function to maximize; then, after a number of iterations, it yields the model we need.
