Introduction to Machine Learning (i): Basic Concepts

Source: Internet
Author: User

These are my notes from Professor Zhang Zhihua's three-day public course "Introduction to Machine Learning" at Shanghai Jiao Tong University (course link: http://ocw.sjtu.edu.cn/G2S/OCW/cn/CourseDetails.htm?Id=397). OK, straight to the subject.

(i) Basic Concepts

Data mining and machine learning are essentially the same thing; ML is simply closer to mathematics. (In my view, ML is the lower-level layer: data mining, computer vision, and NLP all build on it.)

Machine learning, as defined by Michael Jordan, is "a field that bridges computation and statistics, with ties to information theory, signal processing, algorithms, control theory and optimization theory." ML can be expressed by the formula:

\[ \text{ML} = \text{matrix} + \text{statistics} + \text{optimization} + \text{algorithm} \]

1. Definitions

The data $X = [x_1, \ldots, x_n]^T \in \mathbb{R}^{n \times p}$ is an $n \times p$ matrix containing $n$ samples. Each sample $x_i = (x_{i1}, \ldots, x_{ip})$ is a $p$-dimensional vector of $p$ features. Each sample can be given a label $y_i$. For example, a person is a sample, height and weight are features, and sex is a label. Usually we want to predict the label of a sample, that is: input sample $\rightarrow$ output label.

Classification problem: the label takes finitely many values. If the label has two values (typically 0/1 or -1/+1), it is a binary classification problem; otherwise it is a multi-class classification problem.

Regression problem: the label takes infinitely many values, for example $y \in \mathbb{R}$.

Supervised learning: we are first given some samples (training samples) together with their labels, and then predict labels for new samples. Classification and regression both belong to supervised learning.

2. Linear model

\[ y = x^T a \]

The linear model predicts the label by a linear combination of the features; in other words, each feature is given a weight, and the weighted sum of the features predicts $y$.
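As a minimal sketch of the weighted-sum idea (the numbers and variable names here are hypothetical, not from the course), a linear model's prediction is just one matrix-vector product:

```python
import numpy as np

# n = 3 samples, p = 2 features (say, height and weight)
X = np.array([[170.0, 60.0],
              [180.0, 80.0],
              [160.0, 50.0]])
a = np.array([0.1, -0.2])  # one weight per feature

# Each prediction is the feature-weighted sum x_i^T a
y_hat = X @ a
print(y_hat)  # [5. 2. 6.]
```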
To determine the weight $a$, the most straightforward approach is the least-squares estimate from statistics, i.e. minimize
\begin{align*}
L &= \frac{1}{2}\sum_{i=1}^n (y_i - x_i^T a)^2 \\
  &= \frac{1}{2}\|y - Xa\|_2^2.
\end{align*}
Taking the derivative,
\[ \frac{\partial L}{\partial a} = -X^T(y - Xa) = 0. \]
If $X^T X$ is invertible, we can solve
\[ a = (X^T X)^{-1} X^T y. \]
When $n > p$, $X^T X$ is generally invertible. But sometimes there are many features and not so many samples; when $X^T X$ is not invertible, there is no unique solution (the problem is underdetermined).

In that case we can add a penalty $\lambda P(a)$ to $L$ (the loss function), where $\lambda > 0$. Often we take $P(a) = a^T a$, and the problem becomes ridge regression:
\[ L(a) + \lambda P(a) = \frac{1}{2}\|y - Xa\|_2^2 + \frac{1}{2}\lambda a^T a. \]
Taking the derivative:
\[ \frac{\partial}{\partial a}\left[L(a) + \lambda P(a)\right] = -X^T(y - Xa) + \lambda a = 0. \]
Because $X^T X + \lambda I_p$ is positive definite and hence always invertible, we have
\[ a = (X^T X + \lambda I_p)^{-1} X^T y. \]

So how should we choose the value of $\lambda$? To do this, we divide the data into three parts: training data, validation data and test data. The training data is used to learn $a$, the validation data is used to tune $\lambda$, and the test data is used to evaluate the final result on unseen data.

In addition, $P(a) = \|a\|_1 = \sum_{i=1}^p |a_i|$ is also common; the problem then becomes the lasso:
\[ \frac{1}{2}\|y - Xa\|_2^2 + \frac{1}{2}\lambda \|a\|_1. \]
Using the 1-norm as a penalty has the property that it drives some entries of $a$ to exactly 0, so it performs automatic feature selection.
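A minimal NumPy sketch of the two closed-form solutions above (synthetic data; the variable names are mine). Note that `np.linalg.solve` is used instead of forming the inverse explicitly, which is the numerically preferred way to apply $(X^T X)^{-1} X^T y$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
a_true = np.array([1.0, -2.0, 0.5])
y = X @ a_true + 0.01 * rng.normal(size=n)  # nearly noiseless labels

# Ordinary least squares: a = (X^T X)^{-1} X^T y
a_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge regression: a = (X^T X + lambda * I_p)^{-1} X^T y
lam = 0.1
a_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print(a_ols)    # close to a_true
print(a_ridge)  # slightly shrunk toward zero
```

Since $\lambda > 0$ penalizes the size of $a$, the ridge solution always has a smaller norm than the OLS solution.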
3. Maximum likelihood estimation (MLE)

Note that in the discussion above, the $y$ produced by the linear model is continuous, so how do we handle classification problems?

Take the binary classification problem $y \in \{0, 1\}$. One of the simplest methods is to pick a threshold $\alpha$ with $0 < \alpha < 1$: if the predicted value is below $\alpha$, output $y = 0$; otherwise output $y = 1$.

For a more rigorous mathematical footing, we can assume that $y$ follows a Bernoulli distribution, $\{y_i\}\ \text{i.i.d.} \sim \mathrm{Ber}(\alpha)$. The likelihood is then
\[ L = \prod_{i=1}^n p(y_i) = \prod_{i=1}^n \alpha^{y_i}(1-\alpha)^{1-y_i}. \]
We need to consider how to link $L$ with the data $X$, and how to set $\alpha$. Taking the negative log-likelihood,
\begin{align*}
f &= -\ln L \\
  &= -\sum_{i=1}^n \left[ y_i \ln\alpha + (1-y_i)\ln(1-\alpha) \right].
\end{align*}
Setting
\[ \alpha = \frac{1}{1 + \exp(-x^T a)}, \]
$f$ becomes a function of $a$, and the problem becomes an optimization problem. As before, a penalty (regularization) term can also be added.

4. Unsupervised and semi-supervised learning

For the case mentioned earlier where $p$ is very large, besides adding a penalty we can also reduce the dimension: by some transformation, map $x \in \mathbb{R}^p$ to a new feature representation $z \in \mathbb{R}^q$ with $q < p$. Dimensionality reduction comes in two flavors: the first uses a linear transformation, $z = Bx$ with $B \in \mathbb{R}^{q \times p}$, such as PCA; the second is nonlinear, $z = f(x)$.

Unsupervised learning: only the samples are considered. Besides dimensionality reduction, another typical unsupervised task is clustering: given only samples without labels, divide the samples into several categories by their features. There is no split into training data and test data.

Semi-supervised learning: a small number of samples have labels, while a large number of samples do not.
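The construction in Section 3 is logistic regression: plugging $\alpha_i = \mathrm{sigmoid}(x_i^T a)$ into $f = -\ln L$ gives the gradient $X^T(\alpha - y)$, which can be minimized by plain gradient descent. A minimal sketch on synthetic data (learning rate and iteration count are my own assumptions, not from the course):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(1)
n, p = 200, 2
X = rng.normal(size=(n, p))
a_true = np.array([2.0, -1.0])
# Draw Bernoulli labels with success probability sigmoid(x_i^T a_true)
y = (rng.uniform(size=n) < sigmoid(X @ a_true)).astype(float)

# Minimize f(a) = -sum_i [ y_i ln(alpha_i) + (1 - y_i) ln(1 - alpha_i) ]
# where alpha_i = sigmoid(x_i^T a); the gradient is X^T (alpha - y)
a = np.zeros(p)
lr = 0.01
for _ in range(2000):
    alpha = sigmoid(X @ a)
    a -= lr * X.T @ (alpha - y)

acc = np.mean((sigmoid(X @ a) > 0.5) == (y == 1))
print(a, acc)  # learned weights should roughly match the signs of a_true
```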

