Machine-learning Course Learning Summary (1-4)

Source: Internet
Author: User

I. Introduction

1. Concept:

    • "The field of study that gives computers the ability to learn without being explicitly programmed." -- an older, informal definition by Arthur Samuel (for tasks that cannot be programmed directly, let the machine learn them on its own).
    • "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance on tasks in T, as measured by P, improves with experience E." -- Tom Mitchell. (The machine performs task T, and by practicing through experience E improves its performance P.)

2. Machine learning problems fall into two main categories: supervised learning and unsupervised learning.

    • Supervised Learning: we are given a data set and already know what our correct output should look like, with the idea that there is a relationship between the input and the output. (For the input data, we know exactly what the output should be.) Supervised learning is further divided into regression and classification problems. A regression problem maps the input data to a continuous function (the output) and predicts results on that continuous scale. A classification problem maps the input data into discrete categories (the output) and predicts results as one of those discrete values.
    • Unsupervised Learning: allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables. (We don't know in advance what the result should be, but we can try to find some structure in the data; there is no feedback telling us whether a predicted result is right or wrong.)

  Examples of unsupervised learning:

Clustering: take a collection of essays written on the US economy, and find a way to automatically group these essays into a small number of clusters that are somehow similar or related by different variables, such as word frequency, sentence length, page count, and so on.

Non-clustering: the "Cocktail Party algorithm", which can find structure in messy data (such as the identification of individual voices and music from a mesh of sounds at a cocktail party).

II. Linear Regression with One Variable (univariate linear regression)

1. Concept: predict a single output value y from a single input value x. (Predict y from one variable x.)

2. Hypothesis function: use a function to estimate the output value; here hθ(x) is used to estimate y. The function is:

    hθ(x) = θ0 + θ1x

3. Cost function: used to evaluate how accurate the hypothesis function is, similar to a variance (squared error function / mean squared error). The function is:

    J(θ0, θ1) = 1/(2m) * Σ (hθ(x^(i)) − y^(i))^2    (sum over i = 1..m)
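The hypothesis and cost function above can be sketched in NumPy; the data points here are made up for illustration:

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """Univariate hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    """Squared-error cost J = (1/2m) * sum((h(x_i) - y_i)^2)."""
    m = len(x)
    errors = hypothesis(theta0, theta1, x) - y
    return np.sum(errors ** 2) / (2 * m)

# Data generated exactly from y = 1 + 2x, so the true parameters give zero cost.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])
print(cost(1.0, 2.0, x, y))  # 0.0
print(cost(0.0, 0.0, x, y))  # (9 + 25 + 49) / 6
```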

III. Gradient Descent (gradient descent)

1. Concept: a way to automatically improve the parameters of our hypothesis function.

In short, gradient descent finds the most appropriate parameters. Taking θ0 and θ1 as the x and y axes, we plot the cost function J. We want to make the cost function as small as possible, i.e., find the minimum value of J. The θ0, θ1 that minimize J are found by following J's gradient toward an extremum point, thus reaching the minimum.

The gradient descent algorithm is as follows (repeat until convergence, updating θ0 and θ1 simultaneously):

    θ_j := θ_j − α * ∂J(θ0, θ1)/∂θ_j    (for j = 0 and j = 1)

2. Applying it to our linear regression above, the partial derivatives give:

    θ0 := θ0 − α * (1/m) * Σ (hθ(x^(i)) − y^(i))
    θ1 := θ1 − α * (1/m) * Σ (hθ(x^(i)) − y^(i)) * x^(i)
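A minimal sketch of batch gradient descent for the univariate case. The data are made up (generated from θ0 = 1, θ1 = 2), and the learning rate α and iteration count are arbitrary choices for this example:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iters=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x.
    Update rule: theta_j := theta_j - alpha * (1/m) * sum(error * x_j)."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        err = theta0 + theta1 * x - y
        grad0 = err.sum() / m          # partial derivative w.r.t. theta0
        grad1 = (err * x).sum() / m    # partial derivative w.r.t. theta1
        theta0 -= alpha * grad0        # simultaneous update
        theta1 -= alpha * grad1
    return theta0, theta1

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x                      # true parameters: theta0 = 1, theta1 = 2
t0, t1 = gradient_descent(x, y)
print(round(t0, 3), round(t1, 3))      # converges to roughly 1.0 2.0
```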

IV. Linear Regression with Multiple Variables (multivariate linear regression)

1. The same as the univariate case, just with more parameters. Let's define some notation first:

x^(i): the i-th training sample, which contains a series of feature values; it is a column vector.

x_j^(i): the value of feature j in the i-th training sample.

θ: the parameters θ0, θ1 ... θn as a column vector. ((n+1)*1)

m: the number of training samples.

n: the number of features.

So our training matrix X, with one transposed sample per row (and x_0^(i) = 1 for the intercept), is:

X = [ (x^(1))^T ; (x^(2))^T ; ... ; (x^(m))^T ]    (m*(n+1))

2. Hypothesis function:

hθ(x) = θ0 + θ1x1 + θ2x2 + ... + θnxn

Written as a matrix product (with x0 = 1) it is:

hθ(x) = θ^T x

Written with the notation we defined above, over the whole training set, it is:

hθ(X) = Xθ    (an m*1 column vector of predictions)

3. Cost function:

J(θ) = 1/(2m) * Σ (hθ(x^(i)) − y^(i))^2    (sum over i = 1..m)

Written as a matrix expression it is:

J(θ) = 1/(2m) * (Xθ − y⃗)^T (Xθ − y⃗)

(where y⃗ is the column vector of all output values)

4. Gradient descent:

θ_j := θ_j − α * ∂J(θ)/∂θ_j

Substituting the partial derivative into the formula, we get:

θ_j := θ_j − α * (1/m) * Σ (hθ(x^(i)) − y^(i)) * x_j^(i)    (for all j simultaneously)
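The multivariate update rule vectorizes neatly: θ := θ − (α/m) * X^T (Xθ − y). A sketch with made-up data whose true parameters are [1, 2, −1]:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=2000):
    """Vectorized batch gradient descent.
    X is m x (n+1) with a leading column of ones (x_0 = 1)."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= (alpha / m) * (X.T @ (X @ theta - y))
    return theta

# Two features; labels generated from theta = [1, 2, -1] (hypothetical data).
X = np.column_stack([np.ones(5),
                     np.array([0.0, 1.0, 2.0, 3.0, 4.0]),
                     np.array([1.0, 0.0, 1.0, 0.0, 1.0])])
y = X @ np.array([1.0, 2.0, -1.0])
theta = gradient_descent(X, y)
print(np.round(theta, 3))
```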

5. Feature normalization (feature normalization):

Adjusting the input values to a similar range speeds up gradient descent; ideally each input value ends up in roughly [−0.5, 0.5] or [−1, 1].

Method:

Feature scaling (divide by s) combined with mean normalization (subtract μ), as in statistics:

x_i := (x_i − μ_i) / s_i

(μ is the mean of the feature, s is its standard deviation)
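A sketch of mean normalization plus feature scaling; the two raw features below (house size and number of rooms, a made-up example) live on very different scales:

```python
import numpy as np

def normalize_features(X):
    """x := (x - mu) / s, per feature column:
    mu is the column mean, s the column standard deviation."""
    mu = X.mean(axis=0)
    s = X.std(axis=0)
    return (X - mu) / s, mu, s

# Hypothetical raw features on very different scales.
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0],
              [1416.0, 2.0]])
X_norm, mu, s = normalize_features(X)
print(np.round(X_norm.mean(axis=0), 6))  # each column now has mean 0
print(np.round(X_norm.std(axis=0), 6))   # and standard deviation 1
```

Note: the same μ and s learned from the training set must also be applied to any new inputs before prediction.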

6. Polynomial regression (polynomial regression):

A straight line may not fit the data well; in that case squared, cubic, or square-root terms can be added to the hypothesis function to fit the data better.

For example, if our hypothesis function is hθ(x) = θ0 + θ1x1, then we can create additional features based on x1 to get the quadratic function hθ(x) = θ0 + θ1x1 + θ2x1^2 or the cubic function hθ(x) = θ0 + θ1x1 + θ2x1^2 + θ3x1^3.
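A sketch of the feature-construction idea: build powers of x as extra columns, then fit an ordinary linear model on them. The data here are made up (an exact cubic y = 1 + 0.5x^3), and the fit uses NumPy's least-squares solver rather than gradient descent, purely to keep the example short:

```python
import numpy as np

def poly_features(x, degree):
    """Design matrix [1, x, x^2, ..., x^degree] for a 1-D input vector x."""
    return np.column_stack([x ** d for d in range(degree + 1)])

# Hypothetical data from the cubic y = 1 + 0.5 * x^3.
x = np.linspace(-2.0, 2.0, 9)
y = 1.0 + 0.5 * x ** 3
X = poly_features(x, 3)
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(theta, 3))  # recovers [1, 0, 0, 0.5]
```

In practice the higher powers grow quickly, so feature normalization (section 5) matters even more here.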

7. Normal equation:

Instead of iterating toward the best parameters, solve for θ directly (set the partial derivatives to 0 and solve with the inverse matrix):

θ = (X^T X)^(−1) X^T y⃗

X^T X is sometimes non-invertible. The usual causes are: redundant features (e.g., two features proportional to each other, in which case one can be removed), or too many features (m <= n, in which case some can be removed).
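A sketch of the closed-form solution on made-up data with true parameters [1, 2]. The pseudo-inverse (`np.linalg.pinv`) is used instead of a plain inverse so the call still returns a sensible answer when X^T X is singular, per the non-invertibility caveat above:

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form theta = (X^T X)^(-1) X^T y, via the pseudo-inverse."""
    return np.linalg.pinv(X.T @ X) @ (X.T @ y)

# One feature plus intercept column; labels generated from theta = [1, 2].
X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = normal_equation(X, y)
print(np.round(theta, 3))  # [1. 2.]
```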

Comparison of the two methods:

    Gradient descent: needs to choose the learning rate α; needs many iterations; works well even when n is large.
    Normal equation: no need to choose α; no iterations; but must compute (X^T X)^(−1), which is roughly O(n^3), so it becomes slow when n is very large.

