Stanford Machine Learning Open Course Notes (14): Large-Scale Machine Learning


Open course address: https://class.coursera.org/ml-003/class/index

Instructor: Andrew Ng

1. Learning with Large Datasets

The importance of data volume was already mentioned in the earlier lecture on machine learning system design. Remember this sentence:

It is not who has the best algorithm that wins. It is who has the most data.

In machine learning applications, adding more data helps when the model is overfitting, but as noted in the lecture of advice for applying machine learning, simply adding data is not always effective. If you recall learning curves, it is not hard to tell the following two situations apart:

The curves on the left clearly show high variance; in this case we can add more training samples to improve the model. If instead the bias is large, as on the right, we should add more features, since more data alone will not help.
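As a concrete illustration, here is a minimal sketch of how such learning curves can be computed; the data, model, and split sizes are illustrative assumptions, not part of the lecture:

```python
import numpy as np

# Illustrative data: a nonlinear target fit with a deliberately
# high-bias (straight-line) model, to show the "more data won't help" case.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 1, size=200)

X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

def fit_linear(X, y):
    """Least-squares fit with a bias column."""
    A = np.c_[np.ones(len(X)), X]
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta

def half_mse(theta, X, y):
    A = np.c_[np.ones(len(X)), X]
    return np.mean((A @ theta - y) ** 2) / 2

# Learning curve: training error vs. validation error as m grows.
for m in (10, 30, 60, 100, 150):
    theta = fit_linear(X_train[:m], y_train[:m])
    print(m, half_mse(theta, X_train[:m], y_train[:m]),
          half_mse(theta, X_val, y_val))
# High bias: both errors plateau at a similarly high value, so adding
# samples stops helping. A high-variance model would instead show a
# persistent gap between low training error and high validation error.
```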

2. Stochastic Gradient Descent

The most common way to train a regression model is gradient descent; take linear regression as an example:

Gradient descent here sums over all samples in every update, and is therefore called batch gradient descent. The computation per step is clearly large. To reduce it when processing big data, we introduce stochastic gradient descent, whose cost function is defined for a single sample, unlike the earlier cost function:
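The formulas referenced above were images in the original; reconstructed from the course definitions for linear regression h_θ(x) = θᵀx, they are:

```latex
% Batch cost over all m training samples:
J_{\mathrm{train}}(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2

% Batch gradient descent update (for every j simultaneously):
\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)}

% Cost of a single sample, the quantity stochastic gradient descent works with:
\mathrm{cost}\bigl(\theta,(x^{(i)},y^{(i)})\bigr) = \tfrac{1}{2}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2
```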

After the cost function is defined, the gradient descent process is as follows:
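The procedure (originally an image) is, reconstructed from the course:

```latex
% Stochastic gradient descent:
% 1. Randomly shuffle the training set.
% 2. Repeat (1 to 10 passes over the data are typical):
%      for i = 1, ..., m, update every j:
\theta_j := \theta_j - \alpha\,\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)}
```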

Stochastic gradient descent does not process all samples in each update; instead the samples are randomly shuffled and the parameters are updated one sample at a time, which greatly reduces the work per update. But it has problems of its own. As the picture on the right shows, individual iterations are not guaranteed to move toward the minimum, and the algorithm is not guaranteed to stop at the minimum; it may keep moving indefinitely within a region around it. Batch gradient descent, by contrast, must iterate over all samples from 1 to m for every single step, so each step requires on the order of m operations. In short, stochastic gradient descent has both advantages and disadvantages.
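Here is a minimal runnable sketch of stochastic gradient descent for linear regression; the data and hyperparameters are illustrative assumptions:

```python
import numpy as np

def sgd_linear_regression(X, y, alpha=0.01, epochs=5, seed=0):
    """Minimal stochastic gradient descent for linear regression.

    One parameter update per sample; X is (m, n), a bias column is added.
    """
    rng = np.random.default_rng(seed)
    A = np.c_[np.ones(len(X)), X]          # prepend bias feature x0 = 1
    theta = np.zeros(A.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(A))    # step 1: randomly shuffle
        for i in order:                    # step 2: one update per sample
            err = A[i] @ theta - y[i]
            theta -= alpha * err * A[i]
    return theta

# Tiny usage example on y = 2x + 1 with noise:
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(1000, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.1, size=1000)
print(sgd_linear_regression(X, y))         # approximately [1.0, 2.0]
```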

3. Mini-Batch Gradient Descent

This is a third gradient descent method, distinct from both batch gradient descent and stochastic gradient descent. The three methods compare as follows:

Per iteration, batch gradient descent uses all m samples, stochastic gradient descent uses a single sample, and mini-batch gradient descent uses b samples. The mini-batch gradient descent update looks like this:
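The update rule (an image in the original), reconstructed from the course with batch size b:

```latex
% Mini-batch update with batch size b, stepping i forward by b each time:
\theta_j := \theta_j - \alpha\,\frac{1}{b}\sum_{k=i}^{i+b-1}\bigl(h_\theta(x^{(k)}) - y^{(k)}\bigr)\,x_j^{(k)}
```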


With b = 10, each update touches only 10 samples, which already reduces the computation to some extent. Because each update is a sum over b samples, we can also vectorize the computation to further increase speed.
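A minimal vectorized sketch, assuming the same linear-regression setup as the earlier example (names and hyperparameters are illustrative):

```python
import numpy as np

def minibatch_gd(X, y, alpha=0.02, b=10, epochs=20, seed=0):
    """Minimal mini-batch gradient descent (b samples per update).

    The inner update is fully vectorized: one matrix-vector product
    per mini-batch instead of a Python loop over b samples.
    """
    rng = np.random.default_rng(seed)
    A = np.c_[np.ones(len(X)), X]
    theta = np.zeros(A.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(A))
        for start in range(0, len(A), b):
            batch = order[start:start + b]
            err = A[batch] @ theta - y[batch]            # shape (b,)
            theta -= alpha * (A[batch].T @ err) / len(batch)
    return theta

# Usage on the same toy data as the SGD sketch:
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(1000, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.1, size=1000)
print(minibatch_gd(X, y))                                # approximately [1.0, 2.0]
```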

4. Stochastic Gradient Descent Convergence

In stochastic gradient descent, convergence is measured with the cost function of a single sample:
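Reconstructed from the course, the monitored quantity and the usual check are:

```latex
% Cost of a single sample, computed just before each update:
\mathrm{cost}\bigl(\theta,(x^{(i)},y^{(i)})\bigr) = \tfrac{1}{2}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2

% Every 1000 iterations (say), plot the average of this cost over the
% last 1000 samples; the curve should trend downward if learning works.
```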

Like batch gradient descent, stochastic gradient descent also has convergence issues, such as the choice of the learning rate α:


α is usually held constant, in which case the parameters keep oscillating near the minimum rather than converging to it. If you want them to converge, you can slowly decrease α over time, for example computing α with the following kind of formula:
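The formula referred to above was an image in the original; as given in the course, it has the form:

```latex
\alpha = \frac{\mathrm{const}_1}{\mathrm{iterationNumber} + \mathrm{const}_2}
```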


5. Online Learning

Suppose you run an online shipping-service website: a user selects an origin and a destination, and you respond with a shipping quote; the user then either buys the service or leaves. We now want to build a model that predicts the probability that a user will use the shipping service at a given price. The distinctive feature of this example is that the samples arrive as a data stream, so we need real-time online modeling rather than offline modeling. The procedure is similar to stochastic gradient descent in that only one sample is processed at a time (note: not necessarily exactly one; it depends on the data), and the model can adapt at any time to changing user preferences.


Once a sample has been used it can be discarded, so there is no need to store all the data; this is another advantage of online learning. Other examples include choosing which product search results to show based on predicted click-through rate (CTR); a sketch of the update is shown below:
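A minimal sketch of the online update for the shipping example, using logistic regression; the features and the event stream here are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OnlineLogisticModel:
    """Online logistic regression: one gradient update per event, after
    which the event is discarded rather than stored."""

    def __init__(self, n_features, alpha=0.1):
        self.theta = np.zeros(n_features)
        self.alpha = alpha

    def predict(self, x):
        # p(y = 1 | x; theta): probability the user buys at this price.
        return sigmoid(self.theta @ x)

    def update(self, x, y):
        # Identical in form to one stochastic-gradient step,
        # applied to the newest sample only.
        self.theta -= self.alpha * (self.predict(x) - y) * x

# Hypothetical event stream: x = [bias, normalized price, express route?].
model = OnlineLogisticModel(n_features=3)
events = [(np.array([1.0, 0.20, 1.0]), 1),
          (np.array([1.0, 0.80, 0.0]), 0)]
for x, y in events:
    model.update(x, y)          # learn from the event, then drop it
print(model.predict(np.array([1.0, 0.50, 1.0])))
```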

6. Map-Reduce and Data Parallelism

In fact, you can use Map-Reduce to divide the computation into several sub-computations and then merge the results. Here we split the training set across machines, as shown below:


With 400 training samples split across 4 machines, each machine is responsible for only 100 samples; a merge operation then combines the partial results into one iteration. A more visual representation:
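Reconstructed from the course's example (400 samples, 4 machines), the split looks like this:

```latex
% Map step: machine k computes a partial gradient sum over its 100 samples:
\mathrm{temp}_j^{(k)} = \sum_{i \in \text{subset } k}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)}

% Reduce step: a master machine merges the four partial sums into one batch update:
\theta_j := \theta_j - \alpha\,\frac{1}{400}\,\bigl(\mathrm{temp}_j^{(1)} + \mathrm{temp}_j^{(2)} + \mathrm{temp}_j^{(3)} + \mathrm{temp}_j^{(4)}\bigr)
```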


However, four separate computers are not always required: a single computer with multiple cores can run the same Map-Reduce style operation across its cores:
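A minimal multi-process sketch of this idea; the shard count, data, and step size are illustrative assumptions:

```python
import numpy as np
from multiprocessing import Pool

def partial_gradient(args):
    """Map step: gradient sum over one shard of the training set."""
    A, y, theta = args
    return A.T @ (A @ theta - y)

def mapreduce_gradient_step(A, y, theta, alpha, n_workers=4):
    """One batch gradient step, with the big sum split across processes."""
    shards = list(zip(np.array_split(A, n_workers),
                      np.array_split(y, n_workers)))
    with Pool(n_workers) as pool:
        parts = pool.map(partial_gradient,
                         [(sa, sy, theta) for sa, sy in shards])
    grad = sum(parts) / len(A)          # reduce step: merge partial sums
    return theta - alpha * grad

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(400, 1))            # 400 samples, as above
    A = np.c_[np.ones(400), X]
    y = A @ np.array([1.0, 2.0]) + rng.normal(0, 0.1, size=400)
    theta = np.zeros(2)
    for _ in range(100):
        theta = mapreduce_gradient_step(A, y, theta, alpha=1.0)
    print(theta)                                    # roughly [1.0, 2.0]
```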


In short, Map-Reduce can accelerate the computation in many cases: whenever the bulk of the work is a sum over the training set, as in linear regression or logistic regression, it can be parallelized with Map-Reduce.

----------------------------------------------------------------------

This lecture mainly introduced how to do machine learning with big data. When the amount of data is too large, each gradient-descent step requires a large amount of computation, so stochastic or mini-batch gradient descent is used to reduce the work, and online learning is adopted for streaming website data. Finally, we saw that Map-Reduce can also improve computing efficiency. For the ideas behind Map-Reduce, see this Google paper:

http://research.google.com/archive/mapreduce.html
