Machine Learning Public Course Notes (1)

Preliminary introduction

Supervised learning: given a data set where the correct output for each example is already known (there is feedback). Divided into:

    • Regression: map the input to a continuous output value.
    • Classification: map the input to discrete output values.

Unsupervised learning: given a data set where the correct output is not known (there is no feedback). Divided into:

    • Clustering: examples: Google News, organizing computing clusters, market segmentation.
    • Association: example: estimating a patient's condition based on the patient's characteristics.
Univariate linear regression

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$

Parameters: $\theta_0, \theta_1$

Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m}\sum\limits_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$ (least squares)

Objective (Goal): $\min\limits_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
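
To make these definitions concrete, here is a minimal Python sketch of the hypothesis and the cost function; the data arrays and parameter values are made up for the example:

```python
import numpy as np

def h(theta0, theta1, x):
    """Hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    """Least-squares cost J(theta0, theta1) = (1/2m) * sum (h(x_i) - y_i)^2."""
    m = len(x)
    return np.sum((h(theta0, theta1, x) - y) ** 2) / (2 * m)

# Made-up example data, roughly following y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])
print(cost(0.0, 0.0, x, y))  # large cost for a poor guess
print(cost(1.0, 2.0, x, y))  # small cost near the generating parameters
```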

Gradient descent algorithm

Basic idea:

    • Initialize $\theta_0, \theta_1$.
    • Adjust $\theta_0, \theta_1$ until $J(\theta_0, \theta_1)$ reaches its minimum, using the update formula $\theta_j := \theta_j - \alpha\frac{\partial}{\partial \theta_j}J(\theta_0, \theta_1)$ (a sketch follows this list).
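
To make the update rule concrete, below is a minimal sketch of gradient descent on a made-up one-parameter objective $J(\theta) = \theta^2$ (a toy function, not from the course; its gradient is $2\theta$ and its minimum is at $\theta = 0$):

```python
def gradient_descent_1d(grad, theta, alpha=0.1, iters=100):
    """Repeatedly apply the update theta := theta - alpha * dJ/dtheta."""
    for _ in range(iters):
        theta = theta - alpha * grad(theta)
    return theta

# Toy objective J(theta) = theta^2 with gradient 2*theta; minimum at 0
print(gradient_descent_1d(lambda t: 2 * t, theta=5.0))  # ~0.0
```

With two parameters, $\theta_0$ and $\theta_1$ must be updated simultaneously: both partial derivatives are computed from the current values before either parameter is overwritten.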

For the univariate linear regression problem, taking the partial derivatives of $J(\theta_0, \theta_1)$ gives
$$\frac{\partial J}{\partial \theta_0} = \frac{1}{2m}\sum\limits_{i=1}^{m}2\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right) = \frac{1}{m}\sum\limits_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)$$
$$\frac{\partial J}{\partial \theta_1} = \frac{1}{2m}\sum\limits_{i=1}^{m}2\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right)x^{(i)} = \frac{1}{m}\sum\limits_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}$$
Thus the update formulas for the parameters $\theta_0, \theta_1$ are
$$\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum\limits_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)$$
$$\theta_1 := \theta_1 - \alpha\frac{1}{m}\sum\limits_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}$$
Here $\alpha$ is the learning rate: if it is too small, the algorithm converges too slowly; conversely, if it is too large, the algorithm may overshoot the minimum and even fail to converge. Also note that the update formulas for $\theta_0, \theta_1$ above use all of the data in the data set (this is called "batch" gradient descent), which means that every update has to scan the entire data set, so updates can be slow; a sketch follows below.
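
Putting the pieces together, here is a minimal sketch of batch gradient descent for univariate linear regression; the data and hyperparameters are made up for the example:

```python
import numpy as np

def batch_gradient_descent(x, y, alpha=0.1, iters=1000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        error = theta0 + theta1 * x - y   # h_theta(x_i) - y_i over all m examples
        grad0 = np.sum(error) / m         # dJ/dtheta0
        grad1 = np.sum(error * x) / m     # dJ/dtheta1
        theta0 -= alpha * grad0           # simultaneous update: both gradients
        theta1 -= alpha * grad1           # were computed from the old parameters
    return theta0, theta1

# Made-up data, roughly following y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])
print(batch_gradient_descent(x, y))  # near the least-squares fit (about 1.15, 1.94)
```

Note that every pass through the loop touches all $m$ training examples, which is exactly the "batch" behavior described above.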

Review of linear algebra

    • Matrix and Vector definitions
    • Matrix addition and multiplication
    • Matrix-Vector Product
    • Matrix-matrix Product
    • Properties of matrix multiplication: the associative law holds, but the commutative law does not
    • Inverse and transpose of matrices: a matrix that has no inverse is called a "singular" matrix (see the sketch after this list)
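
A short NumPy sketch illustrating these properties; the matrices are made up for the example:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
C = np.array([[2.0, 0.0], [0.0, 3.0]])

# Associative: (AB)C equals A(BC)
print(np.allclose((A @ B) @ C, A @ (B @ C)))  # True

# Not commutative: AB differs from BA in general
print(np.allclose(A @ B, B @ A))              # False

# Inverse and transpose
print(np.linalg.inv(A))   # exists because det(A) = -2 is nonzero
print(A.T)                # transpose

# A singular matrix has determinant 0 and no inverse;
# np.linalg.inv(S) would raise LinAlgError
S = np.array([[1.0, 2.0], [2.0, 4.0]])
print(np.linalg.det(S))   # 0.0 (up to floating-point error)
```
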
Reference documents

[1] Andrew Ng, Machine Learning (Coursera), Week 1.
