Machine learning algorithms: the key to making computers smart

Keywords: deep learning, algorithm, function, artificial intelligence, neural network

A machine learning algorithm "trains" on known inputs and outputs so that it can respond appropriately to new inputs. An algorithm is a systematic description of a strategy for solving a problem, and the development of artificial intelligence is inseparable from the continuous improvement of machine learning algorithms.

Machine learning algorithms can be divided into traditional machine learning algorithms and deep learning. Traditional machine learning algorithms mainly include the following five categories:

Regression: Establish a regression equation to predict a continuously distributed target value

Classification: Given a large amount of labeled data, predict the labels of unlabeled samples

Clustering: Group unlabeled data into clusters by distance, where each cluster shares common characteristics

Association analysis: Find the frequent itemsets hidden in the data

Dimensionality reduction: Map data points from the original high-dimensional space into a low-dimensional space

Below, we walk through several common algorithms one by one.

Linear regression: find a straight line to predict the target value

A simple scenario: historical data on house prices and floor areas is known. What will the selling price be when the area is 2000?

Such problems can be solved with regression algorithms. Regression is a statistical analysis method for determining the quantitative relationship between two or more variables: a regression equation (function) is established to estimate the likely values of the target variable for given feature values. The most common form is linear regression (Y = aX + b), which finds a straight line to predict the target value. Solving the regression means solving for the regression coefficients (a, b) of the regression equation while minimizing the error. In the housing-price scenario, the regression equation captures the relationship between a house's area and its selling price and can then predict the selling price for a given floor area.
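As a minimal sketch of this idea, the following Python snippet fits Y = aX + b by least squares on made-up house data (the article gives no concrete numbers, so the figures and the area unit here are assumptions) and then predicts the price at an area of 2000:

```python
import numpy as np

# Hypothetical training data: house area and selling price (made-up values).
area  = np.array([1000.0, 1500.0, 1800.0, 2400.0, 3000.0])
price = np.array([200.0, 310.0, 330.0, 480.0, 540.0])

# Least-squares fit of Y = a*X + b: choose (a, b) to minimize squared error.
a, b = np.polyfit(area, price, deg=1)

print(f"Y = {a:.4f} * X + {b:.2f}")
print(f"Predicted selling price at area 2000: {a * 2000 + b:.1f}")
```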

Linear regression is applied very widely, for example:

Predicting customer lifetime value: based on the relationship between old customers' historical data and their life cycles, build a linear regression model to predict the lifetime value of new customers and then run targeted campaigns.

Airport passenger flow forecasting: based on massive airport WiFi data and security check-in data, use data-stream algorithms to analyze and predict passenger flow in the airport terminal.

Money market fund inflow and outflow forecasting: from users' basic profile data, purchase and redemption records, yield tables, interbank lending rates and other information, model users' purchase and redemption behavior to accurately predict future daily capital inflows and outflows.

Movie box office forecasting: predict box office revenue from historical box office data, film review data, public opinion data and other public Internet data.

Logistic regression: find a straight line to classify data

Although logistic regression has "regression" in its name, it is a classification algorithm. It maps the output of a linear function through the sigmoid function, sigmoid(z) = 1 / (1 + e^-z), to estimate the probability that an event occurs and classify accordingly. Sigmoid is a normalizing function that squashes continuous values into the range 0 to 1, providing a way to discretize continuous data.

Intuitively, then, logistic regression draws a classification line. Data on one side of the line, with probability > 0.5, belongs to class A; data on the other side, with probability < 0.5, belongs to class B. For example, by computing the probability of having a tumor, results can be divided into two classes lying on either side of the logistic classification line.
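A minimal sketch of this decision rule in Python, with made-up coefficients standing in for ones that would normally be fit by maximum likelihood:

```python
import numpy as np

def sigmoid(z):
    # Squash a real-valued score into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned coefficients for a single feature such as tumor size.
a, b = 1.2, -4.0

def classify(x):
    p = sigmoid(a * x + b)          # estimated probability of class A
    return ("A" if p > 0.5 else "B"), p

for x in (2.0, 3.3, 5.0):
    label, p = classify(x)
    print(f"x = {x}: probability {p:.2f} -> class {label}")
```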

Logistic regression is also applied very broadly, for example:

Medicine: explore the risk factors of a disease and, based on those factors, predict whether the disease will occur and with what probability.

Finance: before lending, predict whether a loan will default, or estimate the probability that the borrower will default in the future.

Consumer industry: predict whether a consumer will buy a certain product or sign up for a membership card, and then target advertisements or vouchers at users with a high probability of purchasing.

K-nearest neighbors: take the class label of the closest points by distance

A simple scenario: given the number of fight scenes and kiss scenes in a movie, determine whether it is a romance or an action movie. When the number of kiss scenes is large, we judge it to be a romance from experience. So how does a computer make that judgment?

You can use the K-nearest neighbor algorithm, which works as follows:

(1) Calculate the distance between each point in the sample data set and the current point.

(2) Sort the samples in order of increasing distance from the current point.

(3) Take the k most similar (nearest) samples; this is the source of the k in "k-nearest neighbors". Usually k is an integer no greater than 20.

(4) Return the class that appears most frequently among those k points as the predicted class of the current point.

In the movie classification scenario, with k = 3, the three nearest points by distance are action (108, 5), action (115, 8), and romance (5, 89). Among these three points, action movies appear with frequency 2/3 and romance movies with frequency 1/3, so the unknown movie is classified as an action movie.
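The following Python sketch reproduces this example; the query point (90, 10) is a hypothetical unknown movie, standing in for the "red dot" the example describes:

```python
import numpy as np
from collections import Counter

# (fight shots, kiss shots) -> label, taken from the example above.
samples = np.array([[108.0, 5.0], [115.0, 8.0], [5.0, 89.0]])
labels = ["action", "action", "romance"]

def knn_classify(query, samples, labels, k=3):
    # (1) distance from the query point to every sample
    dists = np.linalg.norm(samples - query, axis=1)
    # (2)-(3) labels of the k nearest samples
    nearest = [labels[i] for i in np.argsort(dists)[:k]]
    # (4) majority vote among the k nearest neighbors
    return Counter(nearest).most_common(1)[0][0]

print(knn_classify(np.array([90.0, 10.0]), samples, labels, k=3))  # action
```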

A common application of the K-nearest neighbor algorithm is handwritten digit recognition. To the human brain a handwritten digit is an image, while to the computer it is a two-dimensional or three-dimensional array. So how is the digit identified?

The specific steps for identifying using the K-nearest neighbor algorithm are:

(1) Process each picture to the same color depth and size: a width and height of 32 x 32 pixels.

(2) Convert each 32 x 32 binary image matrix into a 1 x 1024 test vector.

(3) Store the training samples in a training matrix: create a matrix of m rows and 1024 columns, where each row holds one image.

(4) Calculate the distance between the target sample and each training sample, and take the most frequent class among the k nearest as the predicted class of the current handwritten digit.
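A sketch of these steps in Python, using randomly generated stand-in images and labels where a real data set would be loaded:

```python
import numpy as np

# Stand-in training data: m labeled 32x32 binary images.
m = 100
rng = np.random.default_rng(0)
train_images = rng.integers(0, 2, size=(m, 32, 32))
train_labels = rng.integers(0, 10, size=m)

# Steps (2)-(3): flatten each 32x32 matrix to a 1x1024 vector,
# stacked into an m x 1024 training matrix.
train_matrix = train_images.reshape(m, 1024)

def classify_digit(image, k=3):
    vec = image.reshape(1024)
    # Step (4): distances to all training samples, majority vote over k nearest.
    dists = np.linalg.norm(train_matrix - vec, axis=1)
    nearest = train_labels[np.argsort(dists)[:k]]
    return np.bincount(nearest).argmax()

print(classify_digit(rng.integers(0, 2, size=(32, 32))))
```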

Naive Bayes: The class with the highest posterior probability is the classification label.

A simple scenario: bowl 1 (C1) holds 30 fruit candies and 10 chocolate candies, and bowl 2 (C2) holds 20 fruit candies and 20 chocolate candies. A bowl is picked at random, a candy is drawn from it, and it turns out to be a fruit candy. Which bowl is the fruit candy (X) most likely to have come from?

This type of problem can be solved with Bayes' formula, P(C|X) = P(X|C) P(C) / P(X), without modeling the target variable. At classification time, compute the probability that the sample belongs to each class and take the class with the larger probability as the classification.

P(X|C): conditional probability, the probability that X occurs given class C

P(C): prior probability, the probability that class C occurs

P(C|X): posterior probability, the probability that the sample belongs to class C given X

Suppose there are two classes, C1 and C2. Since P(X) is the same for both, it need not be considered.

Just consider the following:

If P(X|C1) P(C1) > P(X|C2) P(C2), then P(C1|X) > P(C2|X), and X belongs to C1;

If P(X|C1) P(C1) < P(X|C2) P(C2), then P(C1|X) < P(C2|X), and X belongs to C2.

In the bowl example above:

P(X): the probability of drawing a fruit candy is 5/8

P(X|C1): the probability of fruit candy in bowl 1 is 3/4

P(X|C2): the probability of fruit candy in bowl 2 is 2/4

P(C1) = P(C2): each bowl is equally likely to be picked, 1/2

The probability that the fruit candy came from bowl 1 is: P(C1|X) = P(X|C1)P(C1)/P(X) = (3/4)(1/2)/(5/8) = 3/5

The probability that the fruit candy came from bowl 2 is: P(C2|X) = P(X|C2)P(C2)/P(X) = (2/4)(1/2)/(5/8) = 2/5

Since P(C1|X) > P(C2|X), the candy most likely came from bowl 1.
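The same arithmetic, written out in Python:

```python
# Bayes' formula: P(C|X) = P(X|C) * P(C) / P(X)
p_c1 = p_c2 = 1 / 2                    # each bowl equally likely to be picked
p_x_c1 = 30 / 40                       # P(fruit candy | bowl 1) = 3/4
p_x_c2 = 20 / 40                       # P(fruit candy | bowl 2) = 2/4
p_x = p_x_c1 * p_c1 + p_x_c2 * p_c2    # P(fruit candy) = 5/8

p_c1_x = p_x_c1 * p_c1 / p_x           # posterior for bowl 1 -> 3/5
p_c2_x = p_x_c2 * p_c2 / p_x           # posterior for bowl 2 -> 2/5
print(p_c1_x, p_c2_x)                  # 0.6 0.4: most likely bowl 1
```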

The main applications of Naive Bayes include text categorization, spam filtering, sentiment discrimination, and multi-class real-time prediction.

Decision Tree: Construct a classification tree with the fastest decline in entropy

A simple scenario: on a blind date, you might first check whether the date owns a house. If so, consider further contact. If not, check whether they are self-motivated: if not, say goodbye directly; if so, add them to the shortlist.

This is a simple decision tree model. A decision tree is a tree structure in which each internal node represents a test on an attribute, each branch represents a test outcome, and each leaf node represents a class. A top-down recursive method selects the feature with the greatest information gain as the current splitting feature.
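To make the splitting criterion concrete, here is a small sketch of entropy and information gain on made-up blind-date records (the features and labels are assumptions for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    # Entropy before the split minus the weighted entropy after it.
    base = entropy(labels)
    splits = {}
    for row, label in zip(rows, labels):
        splits.setdefault(row[feature], []).append(label)
    after = sum(len(s) / len(labels) * entropy(s) for s in splits.values())
    return base - after

# Made-up records: (has house, self-motivated) -> outcome of the date.
rows = [
    {"house": "yes", "motivated": "yes"},
    {"house": "yes", "motivated": "no"},
    {"house": "no",  "motivated": "yes"},
    {"house": "no",  "motivated": "no"},
]
labels = ["contact", "contact", "shortlist", "goodbye"]

for feature in ("house", "motivated"):
    print(feature, information_gain(rows, labels, feature))
# "house" has the larger gain, so it becomes the root split.
```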

Decision trees can be applied to: user rating assessment, loan risk assessment, stock selection, bidding decisions, etc.

Support Vector Machine (SVM): Constructing hyperplanes, classifying nonlinear data

A simple scenario: separate balls of different colors with a single line, in a way that still works as well as possible after more balls are added.

Both line A and line B meet the condition at first. As more balls are added, line A still separates them well, while line B does not.

Increase the difficulty further: when the balls have no clear dividing line, they cannot be separated with a straight line. How can this be solved?

This scenario involves two key ideas of support vector machines:

(1) When the data in a classification problem is linearly separable, the line just needs to be placed where the balls' distance from the line is maximized; the process of finding this maximum margin is called optimization.

(2) Real data is generally not linearly separable. A kernel function can map the data from two dimensions into a higher-dimensional space, where a hyperplane can separate it.

Decision surfaces in different directions usually have different classification margins, and the decision surface with the maximum margin is the optimal solution the SVM looks for. The sample points that the dashed lines on either side of the optimal decision surface pass through are the supporting sample points, called support vectors.
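A brief sketch with scikit-learn (assuming it is available): the two classes below form concentric rings with no linear dividing line, and an RBF kernel lets the SVM separate them:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Toy data with no straight dividing line: one class inside, one outside a ring.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# The RBF kernel implicitly maps the 2-D points into a higher-dimensional
# space, where a separating hyperplane exists.
clf = SVC(kernel="rbf", C=1.0).fit(X, y)

print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)
```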

SVM is widely used and can be applied to spam recognition, handwriting recognition, text classification, stock picking, etc.

K-means: Calculate centroids, cluster unlabeled data

In the classification algorithms described above, the data set to be classified is already labeled, for example marked as ○ or ×, and a hypothesis function is learned to divide the two kinds of data. For unlabeled data sets, we would like an algorithm that automatically groups similar elements into closely related subsets, or clusters. This is what clustering algorithms do.

For a concrete example, take the ages of a group of people, knowing roughly that they include a bunch of children, a bunch of young adults, and a bunch of seniors.

Clustering automatically discovers these three groups and gathers similar data into the same group. To cluster into 3 groups, the input is simply the pile of age data. Note that the ages carry no class labels: you only know there are roughly three kinds of people, not who belongs to which group. The output is a class label for each data point; after clustering, you know who belongs with whom.

Classification, by contrast, is told in advance what ages count as children, young adults, and the elderly; given a new person's age as input, it outputs the class that person belongs to. A classifier generally needs to be trained before it can recognize new data.

The K-Means algorithm is a common clustering algorithm. The basic steps are as follows:

(1) Randomly generate k initial points as centroids;

(2) Assign each point in the data set to the cluster of its nearest centroid;

(3) Average the data in each cluster to obtain new centroids, and repeat the previous step until the clusters no longer change.

The farther apart the resulting clusters are, the better the clustering effect.
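A minimal sketch of the age example with scikit-learn, using made-up ages:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled ages (made up): roughly children, young adults, and seniors,
# but the algorithm is given no class labels.
ages = np.array([4, 6, 7, 9, 23, 26, 28, 31, 34, 61, 65, 70, 72], dtype=float)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(ages.reshape(-1, 1))

print("cluster label of each age:", km.labels_)
print("centroids:", km.cluster_centers_.ravel())
```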

One example application of K-means is customer value segmentation for precision marketing. Take airlines as an example: because business competition is fierce, the focus of corporate marketing has shifted from the product to the customer, so building a reasonable customer value assessment model, segmenting customers, and marketing to them precisely are the keys to solving the problem.

Customer value is identified through five indicators: recency of consumption R, consumption frequency F, flight mileage M, average discount factor C, and length of the customer relationship L (the LRFMC model). The K-means algorithm clusters the customer data into five categories (the number of categories is determined together with business understanding and analysis) to profile the customer groups.

Customer value analysis:

Important customers to keep: C, F, M are high and R is low. Resources should go to such customers first, with differentiated management to increase their loyalty and satisfaction.

Important customers to develop: C is high while R, F, M are low. These customers have a short relationship length (L) and low current value but great development potential; encourage them to spend more with the company and its partners.

Important customers to win back: C, F, or M is high, R is high or L is small, and the uncertainty in how their value will change is high. Keep their information up to date and maintain interaction with them.

General and low-value customers: C, F, M, L are low and R is high. Such customers may only spend when there is a discount.

An interesting case for the K-means algorithm is image compression. In a color image each pixel takes 3 bytes (RGB), so the total number of representable colors is 256 x 256 x 256. K-means puts similar colors into k clusters; keeping only each pixel's cluster label and each cluster's color code is then enough to compress the image.
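A sketch of the idea (a random array stands in for a real image, and k = 16 is an arbitrary choice):

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in image: h x w RGB pixels, 3 bytes each.
h, w, k = 64, 64, 16
image = np.random.default_rng(0).integers(0, 256, size=(h, w, 3))

pixels = image.reshape(-1, 3).astype(float)
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)

# Keep only k color codes plus one small label per pixel,
# instead of 3 bytes of color for every pixel.
palette = km.cluster_centers_.astype(np.uint8)   # k cluster color codes
labels = km.labels_.astype(np.uint8)             # one label per pixel

reconstructed = palette[labels].reshape(h, w, 3)
print(palette.shape, labels.shape, reconstructed.shape)
```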
