Machine Learning Open Course Notes, Week 5: Improving Machine Learning Algorithms

Source: Internet
Author: User

One, methods to improve the accuracy of machine learning algorithms

When a machine learning algorithm does not predict the test data accurately, we can try to improve its accuracy with the following methods:

1) Get more training examples

2) Reduce the number of features

3) Increase the number of features

4) Add polynomial features

5) Increase or decrease the regularization parameter \(\lambda\)

Two, evaluating a machine learning model

If we use only the training set, we cannot reliably judge how accurate the algorithm is, because the model may be overfitting the training data. Instead, we can split the dataset into two parts:

Take 70% as the training set and 30% as the test set.

1) Learn on the training set to obtain the \(\theta\) that minimizes \(J(\theta)\)

2) Use the test set to evaluate the accuracy of the algorithm
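The 70/30 split described above can be sketched as follows. This is a minimal illustration, not the course's own code: the helper name `train_test_split`, the toy data, and the fixed random seed are all assumptions made here, and the shuffle before splitting is an added assumption so that the two parts are not ordered.

```python
import random

def train_test_split(examples, train_fraction=0.7, seed=0):
    """Shuffle a copy of `examples` and split it into (train, test).

    Hypothetical helper: the name, the default fraction argument and the
    fixed seed are illustrative choices, not part of the notes.
    """
    shuffled = list(examples)              # copy, leave the input untouched
    random.Random(seed).shuffle(shuffled)  # assumption: shuffle before splitting
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Toy dataset of (x, y) pairs, invented for the example.
data = [(x, 2.0 * x + 1.0) for x in range(10)]
train_set, test_set = train_test_split(data)
print(len(train_set), len(test_set))
```

The model is then fitted on `train_set` only, and `test_set` is kept aside until evaluation.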

Methods for evaluating the accuracy of the algorithm:

1) Linear regression: \( J_{test}(\theta) = \dfrac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_\theta(x^{(i)}_{test}) - y^{(i)}_{test} \right)^2 \)

2) Logistic regression: \( err(h_\theta(x), y) = \begin{cases} 1 & \text{if } (h_\theta(x) \geq 0.5 \text{ and } y = 0) \text{ or } (h_\theta(x) < 0.5 \text{ and } y = 1) \\ 0 & \text{otherwise} \end{cases} \)

\( \text{Test Error} = \dfrac{1}{m_{test}} \sum_{i=1}^{m_{test}} err(h_\theta(x^{(i)}_{test}), y^{(i)}_{test}) \)
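The two test-set measures above might be computed as in the sketch below. The function names, the example hypotheses and the tiny test sets are assumptions made for illustration; the hypothesis `h` is passed in as an ordinary function that has already been trained.

```python
import math

def linear_test_error(h, test_set):
    """J_test: sum of squared errors divided by 2 * m_test."""
    m = len(test_set)
    return sum((h(x) - y) ** 2 for x, y in test_set) / (2 * m)

def misclassification_error(h, test_set):
    """Fraction of test examples where thresholding h(x) at 0.5 disagrees with y."""
    def err(p, y):
        predicted = 1 if p >= 0.5 else 0
        return 1 if predicted != y else 0
    return sum(err(h(x), y) for x, y in test_set) / len(test_set)

# Hypothetical trained hypotheses and test data, invented for the example.
h_linear = lambda x: 2.0 * x
regression_test = [(1.0, 2.0), (2.0, 5.0), (3.0, 6.0)]
j_test = linear_test_error(h_linear, regression_test)

h_logistic = lambda x: 1.0 / (1.0 + math.exp(-x))  # sigmoid of a 1-D input
classification_test = [(-2.0, 0), (-1.0, 1), (1.0, 1), (3.0, 0)]
test_error = misclassification_error(h_logistic, classification_test)

print(j_test, test_error)
```

In this toy data one regression example is off by 1 and two of the four classification examples are misclassified, which is what the two numbers reflect.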

Three, model selection

If there are several candidate models to choose from, we can split the dataset into three parts: 60% as the training set, 20% as the cross-validation set, and 20% as the test set.

1) Train each model on the training set to obtain the \(\theta\) that minimizes \(J(\theta)\) for that model

2) Select the model with the smallest error on the cross-validation set

3) Use the test set to estimate the generalization error of the model selected in step 2, and check whether it meets our requirements
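The three steps above can be sketched with a deliberately simple family of candidate models. Everything specific here is an assumption for illustration: the candidates are one-parameter hypotheses \(h_\theta(x) = \theta x^d\) for d = 1, 2, 3 (chosen because the minimizing \(\theta\) has a closed form), the data are noise-free and invented, and the helper names `fit` and `cost` are hypothetical.

```python
import random

def fit(train, d):
    """Closed-form theta minimizing J(theta) for h(x) = theta * x**d."""
    num = sum((x ** d) * y for x, y in train)
    den = sum((x ** d) ** 2 for x, _ in train)
    return num / den

def cost(theta, d, data):
    """Squared-error cost J(theta) = 1/(2m) * sum((h(x) - y)^2) on `data`."""
    m = len(data)
    return sum((theta * x ** d - y) ** 2 for x, y in data) / (2 * m)

# Invented data following y = 3 * x^2, shuffled and split 60% / 20% / 20%.
data = [(0.5 * i, 3.0 * (0.5 * i) ** 2) for i in range(1, 21)]
random.Random(0).shuffle(data)
train, cv, test = data[:12], data[12:16], data[16:]

# Step 1: train every candidate on the training set.
thetas = {d: fit(train, d) for d in (1, 2, 3)}
# Step 2: pick the candidate with the smallest cross-validation error.
best_d = min(thetas, key=lambda d: cost(thetas[d], d, cv))
# Step 3: report the generalization error of the chosen model on the test set.
test_error = cost(thetas[best_d], best_d, test)
print(best_d, test_error)
```

The test set is touched only once, after the choice has been made, so its error is not biased by the selection step.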

Four, bias (underfitting) and variance (overfitting)

How can we improve accuracy when a machine learning model does not meet our requirements? There are many methods, but trying each of them in turn is impractical. Every method addresses either high variance or high bias, so we should first determine whether the model suffers from high bias or high variance.

In linear regression, as we increase the maximum polynomial degree d of the feature x in the hypothesis function, the bias and variance change as follows. The model has high bias when \( J_{train}(\theta) \approx J_{cv}(\theta) \) and both are large. It has high variance when \( J_{cv}(\theta) \) is much larger than \( J_{train}(\theta) \).
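The comparison of training and cross-validation error can be turned into a rough check, sketched below. The notes give no numeric thresholds, so `acceptable_error` and `gap_factor` are hypothetical parameters introduced here, and the error values in the usage lines are invented.

```python
def diagnose(j_train, j_cv, acceptable_error, gap_factor=2.0):
    """Label a model from its training and cross-validation errors.

    Assumptions (not from the notes): `acceptable_error` is the largest
    error we are willing to accept, and "much larger" is taken to mean
    more than `gap_factor` times the training error.
    """
    if j_cv > acceptable_error and j_cv > gap_factor * j_train:
        return "high variance"  # J_cv much larger than J_train
    if j_train > acceptable_error:
        return "high bias"      # both errors large and close together
    return "ok"

# Invented error values for three hypothetical models.
print(diagnose(4.0, 4.3, acceptable_error=1.0))
print(diagnose(0.1, 3.0, acceptable_error=1.0))
print(diagnose(0.4, 0.5, acceptable_error=1.0))
```

Sensible thresholds depend on the problem, so a check like this is a starting point for inspection rather than a decision rule.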
