These are notes from Professor Zhang Zhihua's open course "Introduction to Machine Learning" at Shanghai Jiao Tong University (course link: http://ocw.sjtu.edu.cn/G2S/OCW/cn/CourseDetails.htm?Id=397), taken over three days. OK, straight to the subject.
(i) Basic Concepts

Data mining and machine learning are essentially the same thing; ML is just closer to mathematics. (In my eyes ML is the lower layer: data mining, computer vision, and NLP all build on it.)

Machine learning definition (Michael Jordan): a field that bridges computation and statistics, with ties to information theory, signal processing, algorithms, control theory, and optimization theory. ML can be expressed by the formula: ML = matrix + statistics + optimization + algorithm.

1. Definition

The data $X=[x_1,\dots,x_n]^T_{(n\times p)}$ is an $n\times p$ matrix containing $n$ samples. A sample $x_i=(x_{1i},\dots,x_{pi})$ is a $p$-dimensional vector containing $p$ features. Each sample can be given a label $y_i$. For example, a person is a sample, height and weight are features, and sex is a label. Often we want to predict the label of a sample, that is, input sample -> output label.

Classification problem: the label takes finitely many values. If there are two label values (usually 0/1 or -1/+1), it is a binary classification problem; otherwise it is a multi-class classification problem.

Regression problem: the label takes infinitely many values, for example $y\in\mathbb{R}$.

Supervised learning: some samples (training samples) and their labels are given first, and labels are then predicted for new samples. Classification and regression belong to supervised learning.

2. Linear model

\[y=x^Ta\]

The linear model predicts the label by a linear combination of the features; in other words, each feature is given a weight, and the weighted sum of the features predicts $y$. To determine the weights $a$, the most straightforward approach is the least-squares estimate from statistics, that is, minimize
\begin{align*} L&=\frac{1}{2}\sum_{i=1}^n (y_i-x_i^Ta)^2 \\ &=\frac{1}{2}\|y-Xa\|_2^2. \end{align*}
Taking the derivative and setting it to zero,
\[\frac{\partial L}{\partial a}=-X^T(y-Xa)=0.\]
If $X^TX$ is invertible, we can solve
\[ a=(X^TX)^{-1}X^Ty. \]
When $n>p$, $X^TX$ is generally invertible.
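As a minimal sketch (not from the lecture; the data are synthetic), the closed-form least-squares solution $a=(X^TX)^{-1}X^Ty$ can be checked numerically with NumPy against its built-in solver:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3                                # n samples, p features
X = rng.normal(size=(n, p))                  # data matrix, one sample per row
a_true = np.array([2.0, -1.0, 0.5])          # made-up "true" weights
y = X @ a_true + 0.01 * rng.normal(size=n)   # labels with a little noise

# Normal-equation solution a = (X^T X)^{-1} X^T y
# (solve is preferred over explicitly inverting X^T X)
a_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Agrees with NumPy's built-in least-squares solver
a_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(a_hat)   # close to [2.0, -1.0, 0.5]
```

Here $n>p$, so $X^TX$ is invertible with probability 1 and both routes give the same answer.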
But sometimes there are many features and not so many samples; then $X^TX$ is not invertible and there is no unique solution (the problem is underdetermined). We can then add a penalty $\lambda P(a)$ to the loss function $L$, where $\lambda>0$. Often we take $P(a)=a^Ta$, and the problem becomes ridge regression:
\[L(a)+\lambda P(a)=\frac{1}{2}\|y-Xa\|_2^2+\frac{1}{2}\lambda a^Ta.\]
Taking the derivative:
\[-X^T(y-Xa)+\lambda a=0.\]
Since $X^TX+\lambda I_p$ is positive definite and hence always invertible, we have
\[ a=(X^TX+\lambda I_p)^{-1}X^Ty. \]
So how should the value of $\lambda$ be chosen? For this we divide the data into three parts: training data, validation data, and test data. Training data is used to learn $a$, validation data is used to tune $\lambda$, and test data is used for the final prediction (or to verify the final result).

In addition, $P(a)=\|a\|_1=\sum_{i=1}^p|a_i|$ is also common; the problem then becomes the lasso, that is,
\[ \frac{1}{2}\|y-Xa\|_2^2+\frac{1}{2}\lambda\|a\|_1. \]
Using the 1-norm as a penalty has the property that it drives some components of $a$ to exactly 0, so it performs automatic feature selection.

3. Maximum likelihood estimation (MLE)

Note that in the discussion above, the $y$ obtained from the linear model is continuous, so how do we handle a classification problem? For a binary classification problem $y\in\{0,1\}$, one of the simplest methods is: given an $\alpha$ with $0<\alpha<1$, predict $y=0$ if $y<\alpha$ and $y=1$ otherwise. To put this on a more rigorous mathematical basis, we can assume $y$ obeys a Bernoulli distribution, $\{y_i\}\ i.i.d. \sim Ber(\alpha)$. From the Bernoulli distribution, the likelihood is
\[L=\prod_{i=1}^n p(y_i)=\prod_{i=1}^n \alpha^{y_i}(1-\alpha)^{1-y_i}.\]
We need to consider how to link $L$ with the data $X$, and how to set $\alpha$.
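The ridge solution $a=(X^TX+\lambda I_p)^{-1}X^Ty$ can be illustrated in the underdetermined regime. A small sketch (synthetic data; the value of $\lambda$ is ad hoc and would normally be tuned on validation data):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 50                  # more features than samples: X^T X is singular
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# X^T X has rank at most n < p, so plain least squares has no unique solution
print(np.linalg.matrix_rank(X.T @ X))   # at most 20

# Ridge solution a = (X^T X + lambda * I_p)^{-1} X^T y
lam = 0.1                      # hypothetical lambda, tuned on validation data in practice
a_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(a_ridge.shape)           # (50,) -- a unique solution exists for any lambda > 0
```

Adding $\lambda I_p$ shifts every eigenvalue of $X^TX$ up by $\lambda$, which is why the regularized matrix is always invertible.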
Taking the negative log-likelihood,
\begin{align*} f&=-\ln L \\ &=-\sum_{i=1}^n [y_i\ln\alpha+(1-y_i)\ln(1-\alpha)]. \end{align*}
Setting
$$\alpha=\frac{1}{1+\exp(-x^Ta)},$$
$f$ becomes a function of $a$, and the problem becomes an optimization problem. As before, a penalty (regularization) term can also be added.

4. Unsupervised and semi-supervised learning

For the case mentioned earlier where $p$ is very large, besides adding a penalty we can also reduce the dimension: through some transformation, map $x\in\mathbb{R}^p$ to a new feature representation $z\in\mathbb{R}^q$ with $q<p$. Dimensionality reduction comes in two flavors: the first is a linear transformation, that is, $z=Bx$, $B\in\mathbb{R}^{q\times p}$, such as PCA; the second is a nonlinear one, $z=f(x)$.

Unsupervised learning: only the samples are considered. Besides dimensionality reduction, another typical unsupervised task is clustering: there are only samples without labels, and the samples are divided into several groups by their features. There is no test data, only training data.

Semi-supervised learning: a small number of samples have labels, and a large number of samples do not.
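The MLE setup above (the lecture stops at posing the optimization problem) can be sketched numerically: minimize $f(a)=-\sum_i [y_i\ln\alpha_i+(1-y_i)\ln(1-\alpha_i)]$ with $\alpha_i=1/(1+\exp(-x_i^Ta))$ by plain gradient descent, using the gradient $\nabla f = X^T(\alpha-y)$. The data, step size, and iteration count here are all made up for illustration:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(2)
n, p = 200, 2
X = rng.normal(size=(n, p))
a_true = np.array([3.0, -2.0])
# Bernoulli labels: y_i ~ Ber(sigmoid(x_i^T a_true))
y = (rng.uniform(size=n) < sigmoid(X @ a_true)).astype(float)

# Gradient descent on the negative log-likelihood f(a)
a = np.zeros(p)
lr = 0.01                      # step size, chosen ad hoc for this sketch
for _ in range(2000):
    alpha = sigmoid(X @ a)     # alpha_i = 1 / (1 + exp(-x_i^T a))
    grad = X.T @ (alpha - y)   # gradient of f at the current a
    a -= lr * grad

pred = (sigmoid(X @ a) >= 0.5).astype(float)
print((pred == y).mean())      # training accuracy, well above chance
```

The fitted $a$ recovers the signs of `a_true`; with a penalty term added, only the gradient line would change.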