Machine Learning Overview

Source: Internet
Author: User

I. What is machine learning?

1. Overview

A. Machine learning is a fairly broad concept.

B. How do machine learning, artificial intelligence, and data mining relate?
(1) Machine learning is a sub-field of artificial intelligence.
(2) Machine learning is one way to implement data mining.

C. Three approaches to data mining
(1) Databases --- emphasize efficiency: how to organize information so it can be found quickly.
(2) Machine learning --- emphasizes effectiveness in data analysis.
(3) Statistical methods --- emphasize correctness, largely providing theoretical support for machine learning algorithms.

D. An example


Given some apples and bananas, a person distinguishes them by their characteristics and memorizes those characteristics; the next time a new apple or banana appears, they can judge whether it is a banana or an apple.

(In the original table, the left column is the input and the top row is the output; the data in the table are equivalent to a trained model.)
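Such a table-as-model can be sketched as a simple lookup: the "model" is just the remembered feature-to-fruit mapping. The feature names and values below are made up for illustration.

```python
# A toy "trained model": a lookup table mapping observed features to a fruit label.
# Feature names ("color", "shape") and values are hypothetical.
model = {
    ("red", "round"): "apple",
    ("yellow", "long"): "banana",
}

def classify(color, shape):
    """Return the fruit whose remembered features match the new sample."""
    return model.get((color, shape), "unknown")

print(classify("yellow", "long"))  # banana
print(classify("blue", "cube"))    # unknown
```

A real learner generalizes beyond exact matches, but the input/output shape is the same: features in, a distinguishing rule out.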

2. Mathematical expression

A. An algorithm, i.e., a mathematical problem executed by a program
(1) Input --- a set of feature vectors (multiple features), i.e., a matrix.
(A single feature vector is just a degenerate matrix; for uniformity we call the input a matrix.)
(2) Output --- the rule for distinguishing objects by their features.

B. At present, the core technology of machine learning is matrix-based optimization:
--- Input: a matrix --- the information to learn from
--- Output: a model --- the rules summarized from the data

3. Formal representation

A. Mainstream machine learning can be formalized as:

--- Input: feature matrix X, label vector y

(X is the feature matrix, excluding the "sample name" and "sample label" columns)

(y is the label vector, i.e., the "sample label" column)

--- Output: model vector w

(Goal: X·w should be as close as possible to y)

(Many optimization algorithms can solve for w; they differ in how "as close as possible" is defined)
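This formalization can be sketched with NumPy. One common definition of "as close as possible" is least squares, i.e., minimizing ||X·w − y||²; the numbers below are made-up toy data.

```python
import numpy as np

# Feature matrix X (one row per sample, one column per feature) and label vector y.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
y = np.array([5.0, 4.0, 11.0, 10.0])

# Least squares: find w minimizing ||X @ w - y||^2.
w, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(w)       # the model vector
print(X @ w)   # predictions, close to y
```

Different choices of "close" (hinge loss, log loss, ...) yield different algorithms, which is exactly the point made below.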

B. For example: predicting the click-through rate (CTR) of ads


There are only two features in this example, the query keyword and "whether highlighted in red" --- the feature input.

The goal is to predict whether the target ad will be clicked --- the model output, based on the known features.

II. What are the general machine learning steps?

1. Problem analysis

With the input and output defined, we can choose a suitable algorithm to actually solve the problem.
(1) The preceding definition contains an implicit key point: X·w should be as close as possible to y.
(2) Different ways of defining "close" lead to different algorithms:
--- LR (Logistic Regression)
--- SVM (Support Vector Machine)
(3) A practical example (we reuse the apple example, because it is simple)

A. Visual representation


Logistic regression then trains the coefficients (a matrix of equation coefficients) from a batch of sample data; a sample of unknown class is taken as input and, combined with the trained coefficients, the output is the classification result. This can be pictured as fitting a curve.
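That training process can be sketched as a minimal logistic regression fitted by gradient descent. The two well-separated clusters below are made-up toy data; only NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: class 0 clustered around (0, 0), class 1 around (3, 3).
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Train coefficients w, b by gradient descent on the log loss.
w = np.zeros(2)
b = 0.0
lr = 0.5
for _ in range(500):
    p = sigmoid(X @ w + b)           # predicted probabilities
    grad_w = X.T @ (p - y) / len(y)  # gradient of log loss w.r.t. w
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

# Classify by thresholding the fitted curve at 0.5.
pred = (sigmoid(X @ w + b) > 0.5).astype(int)
accuracy = (pred == y).mean()
print(accuracy)
```

The learned (w, b) play the role of the "trained coefficient matrix" described above; a new sample is classified by plugging it into the same sigmoid.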


B. Objective function representation


Definitions of "close" differ; in academic terms, "the optimization objective differs". Either way, we can at least pick one optimization objective, and once we have it, the remaining question is how to solve it.

The main solvers in use today are L-BFGS, CDN, and SGD.
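As a sketch of the simplest of those solvers, here is plain SGD applied to the least-squares objective on synthetic data; the learning rate and the toy numbers are arbitrary illustrative choices.

```python
import numpy as np

# Synthetic regression data: y = X @ true_w plus small noise.
rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = X @ true_w + rng.normal(scale=0.01, size=200)

# Stochastic gradient descent: update on one sample at a time.
w = np.zeros(2)
lr = 0.05
for epoch in range(30):
    for i in rng.permutation(len(y)):
        err = X[i] @ w - y[i]
        w -= lr * err * X[i]   # gradient of (x·w - y)^2 / 2 for one sample

print(w)  # close to true_w
```

L-BFGS and CDN pursue the same minimum with cheaper-per-progress (second-order or coordinate-wise) updates; SGD trades precision per step for very low cost per step.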

2. Solving the problem with machine learning

Optimization steps:

(1) Algorithm --- optimization objective + optimization algorithm

(2) Data --- match the actual data distribution as closely as possible + use as much data as possible

(3) Features --- contain as much information as possible about the object to be identified

Optimization algorithm:

From the perspective of the optimization objective (academia: discover new algorithms)

From the perspective of the optimization algorithm (industry: extend existing algorithms)
(1) Lower computational cost --- machines are freed to do other things
(2) Faster convergence --- results can be put to use sooner
(3) Better parallelism --- problems can be solved on existing big-data frameworks

Training data:

(1) Same distribution

--- If the sample data differ greatly from the actual data distribution, the algorithm's results are essentially useless; in other words, its generalization ability is poor.

(2) Sufficient data

--- Because the objective is usually to minimize the total error, classes with little data are easily misclassified. The more sufficient the data, the more adequate the training and the lower the error rate.
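The point about total error sacrificing small classes can be made with trivial arithmetic (the counts below are made up):

```python
# With total error as the objective, a rare class is easily sacrificed:
# a classifier that always predicts the majority class already scores 99%,
# while getting every minority sample wrong.
n_majority, n_minority = 990, 10
always_majority_accuracy = n_majority / (n_majority + n_minority)
print(always_majority_accuracy)  # 0.99
```

Collecting more minority-class data (or reweighting it) is what keeps the optimizer from taking this shortcut.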

Features:

Requirements: large quantity, high quality.

Suppose a disease is equally common among tall, heavy people and among short, thin people. Then, taken alone, a feature such as obesity expresses nothing: among the obese the probability of the disease is 50%, and among the non-obese it is also 50%. So the quality of the feature "obese" is 0.
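The "quality of this feature is 0" claim can be made precise as the mutual information (in bits) between the feature and the label; the counts below are made up to match the example.

```python
import math

def mutual_information(joint):
    """joint[f][d] = count of samples with feature value f and disease status d."""
    total = sum(sum(row) for row in joint)
    pf = [sum(row) / total for row in joint]                          # P(feature)
    pd = [sum(joint[f][d] for f in range(2)) / total for d in range(2)]  # P(disease)
    mi = 0.0
    for f in range(2):
        for d in range(2):
            p = joint[f][d] / total
            if p > 0:
                mi += p * math.log2(p / (pf[f] * pd[d]))
    return mi

useless = [[50, 50], [50, 50]]   # disease is 50/50 whether obese or not
useful  = [[90, 10], [10, 90]]   # disease strongly tied to the feature
print(mutual_information(useless))  # 0.0 -- the feature expresses nothing
print(mutual_information(useful))   # positive -- the feature has quality
```

A zero-information feature like "obese" above can be dropped without loss; feature selection is largely the hunt for features with high mutual information with the label.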

In an engineering problem, features are often the most decisive factor.

Optimizing an algorithm may improve the result by only 2-3%, but better features may improve it by 50-60%.

Seventy to eighty percent of a project's effort is often spent on feature selection, yet high-quality features are not easy to find. In Baidu's CTR prediction, for example, the feature data are currently at the billion scale: quantity compensates for quality. So how can feature quality be improved? The answer is deep learning, so let us look at how deep learning extracts high-quality features.

III. What is the difference between machine learning and deep learning?

1. Introduction

Since it is called deep learning, there must be a corresponding shallow learning (an alias for traditional machine learning); the biggest difference between the two is the number of feature layers.


Current machine learning algorithms are mostly single-layer: they infer the result directly from the features.

Deep learning is closer to how humans handle problems: intermediate-layer results are derived from the features, and the final result is derived from those. In other words, the raw features are just an external appearance, and mining features is a matter of how to express them. This is a question of feature expressiveness, and the expressiveness of traditional single-layer methods is inferior to the multi-layer representations learned by deep learning.

2. The granularity of feature representation


In an image, pixel-level features have almost no value on their own: they cannot distinguish positive from negative examples of motorcycles. But if a feature is structural (or meaningful), such as whether the object has handlebars or wheels, positive and negative examples become easy to distinguish and the learning algorithm can do its job.

Digression: why is LDA (a topic model) justified?

Starting from text: what is the appropriate way to express the concept of a document (doc)?

The simplest is to use single characters, i.e., the common bag-of-words model, but that is an oversimplification: characters are pixel-level. The unit should be at least a term; in other words, each doc is composed of terms. But is the expressive power of terms enough for concepts?
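For reference, the bag-of-words model mentioned above can be sketched in a few lines (the documents are made up for illustration):

```python
from collections import Counter

# Bag of words: each doc becomes unordered term counts, discarding all structure.
docs = ["the cat sat on the mat", "the dog sat on the log"]
bags = [Counter(doc.split()) for doc in docs]

print(bags[0]["the"])  # 2
print(bags[1]["dog"])  # 1
```

The representation is cheap but flat, which is exactly why a higher-level unit (the topic) is needed.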

Probably not. We need to go one level further, up to topics. With topics, a doc is expressed as doc -> topic (thousand scale) -> term (hundred-thousand scale) -> word (million scale).

This is the rationality of topic models, seen from a deep-learning point of view.

"Deep" means a multi-layer neural network, where each layer represents one level of concepts. The lower the layer, the more orthogonal the concepts; the higher the layer, the less orthogonal and the more similar they become, because high-level concepts may share the same basic structures.

The reverse direction, separating high-level concepts into basic structures, is also easy to understand: break the clusters apart. For example, LDA can distill docs into topics, and a limited number of topics can describe a doc; the interior of a topic can be broken up further by similar methods into shallower topics. News clustering works the same way, from small clusters to large classes to super-classes, and data warehouses in data mining have an analogous pair of operations: roll up and drill down.

3. Why use multi-layer neural networks to mine features?

Formally, height and body size are two marginally independent variables, which means they are no longer independent once an outcome is observed.


That is, if height and body size are used to detect the disease, they are not independent given the outcome, so a feature representation is needed that expresses this dependence and combines them into a better feature.

Such structured features require a large corpus to train. With independent features, a small corpus often gives good results, but as the corpus grows the structural relationships still cannot be observed, so the extra corpus is wasted. This touches on the sample-selection questions in machine learning design: how to choose samples, and how many samples are enough.

This is the well-known XOR problem in AI: a two-layer neural network can solve it. In other words, multi-layer neural networks can mine better features.
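As a sketch of that claim, here is a two-layer network with hand-picked (not trained) weights that computes XOR: the hidden layer mines the intermediate features OR and AND, which the output layer combines. A single layer cannot represent XOR, because no line separates its positive and negative examples.

```python
import numpy as np

def step(z):
    """Threshold activation: 1 if the input is positive, else 0."""
    return (z > 0).astype(int)

# Hidden layer: unit 1 fires when x1 + x2 > 0.5 (OR),
#               unit 2 fires when x1 + x2 > 1.5 (AND).
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])

# Output layer: OR - 2*AND > 0.5, i.e. "OR but not AND" = XOR.
W2 = np.array([1.0, -2.0])
b2 = -0.5

def xor(x):
    h = step(x @ W1 + b1)       # layer 1: the mined intermediate features
    return int(step(h @ W2 + b2))

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor(np.array(x, dtype=float)))
```

The hidden units are exactly the "intermediate-layer results" described earlier: useful features that do not appear in the raw input.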

4. How to mine features?


Small patches of an image can be composed from basic edges; but how should more structured, more complex, conceptual shapes be expressed? This requires higher levels of feature representation, such as V2 and V3. What V1 sees is pixel-level; to V2, the output of V1 is its "pixel level". The levels build on each other, just as a high-school student finds a middle-school student naive, and a college student finds a high-school student naive.

5. What are the similarities with the human brain? What is the rationale?

Most perceptual units in the brain perform very simple computations, but in combination they achieve a high level of understanding.

The question is how they combine: how perception proceeds from low level to high level, from light, shade, and color all the way to human emotions. Every individual step may be naive, but the end of the whole cognitive chain must be semantic and emotional, rising to the level of concepts.

For example, when reading a novel, pictures naturally emerge in the mind; the different neural sensors are not completely independent but interconnected.

6. How to train?

The corpus is cut into n shards; combined with the hidden layers, the tagged data from the different shards can be well expressed.

Student experiments are usually at the level of millions of connections, because anything larger exceeds the available compute. With parallel methods, on the order of 16,000 CPUs can reach the level of 1 billion connections; with special-purpose GPUs, an even larger number of connections is reachable.

7. How many features do you need?

We know a hierarchy of features should be built, from shallow to deep, but how many features does each layer need?

For any method, adding features can always improve the result. But many features mean high computational complexity and a larger search space, and the data available to train each feature become sparse, bringing all kinds of problems. More features are not necessarily better.

8. Comparing ML and DL

Shallow learning is widely used mainly because of its very good mathematical properties:

--- the solution space is a convex function

--- convex functions have many well-studied solution methods

