Linear Regression (ML: regression), y = ax + b
Linear regression uses continuous variables to estimate real-valued outputs. The algorithm finds the best linear relationship between the independent variable(s) and the dependent variable, which corresponds to the best-fitting straight line on a plot.

#Import Library
from sklearn import linear_model
#Assumed you have training and test data sets
x_train = input_variables_values_training_datasets
y_train = target_variables_values_training_datasets
x_test = input_variables_values_test_datasets
# Create linear regression object
linear = linear_model.LinearRegression()
# Train the model using the training sets and check score
linear.fit(x_train, y_train)
linear.score(x_train, y_train)
# Equation coefficient and intercept
print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)
# Predict output
predicted = linear.predict(x_test)
Example applications: house prices, number of calls, total sales.
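A minimal runnable sketch of the linear regression recipe above, using small made-up numbers purely for illustration (the data values are assumptions, not from the original article):

import numpy as np
from sklearn import linear_model

# Toy data: y is roughly 2*x + 1
x_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.1, 4.9, 7.2, 8.8])

linear = linear_model.LinearRegression()
linear.fit(x_train, y_train)
print('Coefficient:', linear.coef_)      # close to 2
print('Intercept:', linear.intercept_)   # close to 1
print('Prediction for x=5:', linear.predict(np.array([[5.0]])))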
Logistic Regression (ML: classification)
Logistic regression predicts the value of a discrete dependent variable (for example a binary value such as 0/1, yes/no, true/false) from known independent variables. Put simply, it predicts the probability of an event occurring by fitting a logistic function (logit function), so its output is a probability value and naturally lies between 0 and 1.

#Import Library
from sklearn.linear_model import LogisticRegression
# Create logistic regression object
model = LogisticRegression()
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
# Equation coefficient and intercept
print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)
# Predict output
predicted = model.predict(x_test)
Example outputs: binary values such as 0/1, yes/no, true/false.
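Because the text stresses that the output is a probability between 0 and 1, here is a small hedged sketch using scikit-learn's predict_proba on made-up toy data (the numbers are illustrative assumptions, not from the original article):

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression()
model.fit(X, y)
# Probabilities of class 0 and class 1 for a new point; both lie in [0, 1]
print(model.predict_proba(np.array([[2.0]])))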
Decision Tree (ML: classification)
Decision trees can be applied to categorical variables as well as continuous variables. The algorithm divides a population into two or more groups, splitting at each step on the feature variable / independent variable that best differentiates the population.

#Import Library
from sklearn import tree
# Create tree object
model = tree.DecisionTreeClassifier(criterion='gini')
# model = tree.DecisionTreeRegressor() for regression
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
# Predict output
predicted = model.predict(x_test)
Divides a population into two or more groups to solve classification problems.
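A minimal runnable sketch of the decision tree recipe above; it uses scikit-learn's bundled iris dataset purely as a stand-in for real data (that dataset choice is an assumption, not from the original article):

from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = tree.DecisionTreeClassifier(criterion='gini')
model.fit(X_train, y_train)
print('Train accuracy:', model.score(X_train, y_train))
print('Test accuracy:', model.score(X_test, y_test))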
Support Vector Machine (SVM) (ML: classification)
Each data point is plotted as a point in an n-dimensional space (n is the number of features), and each feature value gives the corresponding coordinate. For example, with two features such as a person's height and hair length, we can plot the data in a two-dimensional space where every point has two coordinates (these coordinates are known as support vectors).

#Import Library
from sklearn import svm
# Create SVM classification object
model = svm.SVC()
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
# Predict output
predicted = model.predict(x_test)
Intuitively, it is like separating balls of different colors into different regions of space.
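A minimal runnable sketch of the SVM recipe above, again using the iris data as an illustrative stand-in (not a dataset named by the original article):

from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = svm.SVC()  # default RBF kernel
model.fit(X_train, y_train)
print('Test accuracy:', model.score(X_test, y_test))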
Naive Bayes (ML: classification)
Naive Bayes assumes that the independent variables are independent of each other; in short, it assumes that the presence of one feature is unrelated to any other feature. The classifier computes the posterior probability P(c|x) from the prior probabilities P(c) and P(x) and the conditional probability P(x|c), via Bayes' theorem: P(c|x) = P(x|c) * P(c) / P(x).

#Import Library
from sklearn.naive_bayes import GaussianNB
# Create NB classification object
model = GaussianNB()
# Train the model using the training sets and check score
model.fit(X, y)
# Predict output
predicted = model.predict(x_test)
Example: if a fruit is red, round, and about 7 cm in diameter, we may guess that it is an apple. Another classic example uses weather variables to predict the target variable "whether to go out and play".
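A small hedged sketch of the "weather vs. play" idea with made-up numeric features (temperature and humidity); the feature choice and all values are illustrative assumptions, not data from the original article:

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Features: [temperature, humidity]; target: 1 = go out and play, 0 = stay in
X = np.array([[25, 40], [27, 45], [30, 85], [18, 90], [22, 50], [17, 80]])
y = np.array([1, 1, 0, 0, 1, 0])
model = GaussianNB()
model.fit(X, y)
print(model.predict(np.array([[26, 55]])))        # predicted class
print(model.predict_proba(np.array([[26, 55]])))  # posterior probabilities P(c|x)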
KNN (k-Nearest Neighbors Algorithm) (ML: classification)
KNN finds the K nearest points in the known data and predicts the new observation from the most common class among those K points. The distance function can be Euclidean distance, Manhattan distance, Minkowski distance, or Hamming distance; the first three are used for continuous variables, while Hamming distance is used for categorical variables. If K = 1, the problem reduces to classifying according to the single nearest data point. Choosing K is often the key step in KNN modelling.

#Import Library
from sklearn.neighbors import KNeighborsClassifier
#Assumed you have X (predictor) and y (target) for the training data set and x_test (predictor) of the test dataset
# Create KNeighbors classifier object
model = KNeighborsClassifier(n_neighbors=6)  # default value for n_neighbors is 5
# Train the model using the training sets and check score
model.fit(X, y)
# Predict output
predicted = model.predict(x_test)
KNN can be used for classification problems and also for regression problems. Its computational cost is high, and all features should be standardized to the same order of magnitude, otherwise features with larger magnitudes will dominate the computed distances. Data should be pre-processed before KNN, for example by removing outliers and noise.
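Reflecting the advice above about standardizing features, a hedged sketch that scales the data before KNN (the use of StandardScaler, a pipeline, and the iris data are illustrative choices, not from the original article):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Standardize features first so no single feature dominates the distance
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=6))
model.fit(X_train, y_train)
print('Test accuracy:', model.score(X_test, y_test))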
K-means Algorithm (ML: clustering, unsupervised learning)
K-means groups a given data set into a chosen number of clusters (say K clusters). Data points within the same cluster are homogeneous, while data points in different clusters belong to different classes.

#Import Library
from sklearn.cluster import KMeans
#Assumed you have X (attributes) for the training data set and x_test (attributes) of the test dataset
# Create KMeans object
k_means = KMeans(n_clusters=3, random_state=0)
# Train the model using the training set
k_means.fit(X)
# Predict output
predicted = k_means.predict(x_test)
An unsupervised learning algorithm that solves clustering problems.
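A minimal runnable sketch of the K-means recipe above on synthetic data; the use of make_blobs and the parameter values are illustrative assumptions:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with three well-separated groups
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
k_means = KMeans(n_clusters=3, random_state=0, n_init=10)
k_means.fit(X)
print('Cluster centers:\n', k_means.cluster_centers_)
print('Cluster of a new point:', k_means.predict(np.array([[0.0, 0.0]])))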
Random Forest (ML: classification)
Random forest is the name given to an ensemble of decision trees. A random forest contains multiple decision trees (hence the "forest"). To classify a new observation, each decision tree gives a classification based on the observation's features, and the random forest chooses the classification with the most votes as the final result.

#Import Library
from sklearn.ensemble import RandomForestClassifier
#Assumed you have X (predictor) and y (target) for the training data set and x_test (predictor) of the test dataset
# Create random forest object
model = RandomForestClassifier()
# Train the model using the training sets and check score
model.fit(X, y)
# Predict output
predicted = model.predict(x_test)
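A minimal runnable sketch of the random forest recipe above; the iris data and the n_estimators value are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print('Test accuracy:', model.score(X_test, y_test))
print('Feature importances:', model.feature_importances_)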
|
Dimensionality Reduction Algorithms
How can we find the most important variables out of 1000 or 2000? Dimensionality reduction algorithms, together with other methods such as decision trees, random forests, PCA, factor analysis, correlation matrices, and the missing value ratio, can help us solve this problem.

#Import Library
from sklearn import decomposition
#Assumed you have training and test data sets as train and test
# Create PCA object
pca = decomposition.PCA(n_components=k)  # default value of k = min(n_samples, n_features)
# For factor analysis
# fa = decomposition.FactorAnalysis()
# Reduce the dimension of the training dataset using PCA
train_reduced = pca.fit_transform(train)
# Reduce the dimension of the test dataset
test_reduced = pca.transform(test)
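A minimal runnable sketch of PCA from the recipe above; the iris data and n_components=2 are illustrative assumptions:

from sklearn import decomposition
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)
pca = decomposition.PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print('Reduced shape:', X_reduced.shape)  # (150, 2): 4 features reduced to 2
print('Explained variance ratio:', pca.explained_variance_ratio_)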
|
Gradient Boosting and AdaBoost
These are boosting algorithms that improve predictive accuracy when there is a lot of data. Boosting is an ensemble learning approach: it improves prediction accuracy by combining, in sequence, the estimates of several weaker classifiers/estimators. Boosting algorithms perform very well in data science competitions such as Kaggle, AV Hackathon, and CrowdAnalytix.

#Import Library
from sklearn.ensemble import GradientBoostingClassifier
#Assumed you have X (predictor) and y (target) for the training data set and x_test (predictor) of the test dataset
# Create gradient boosting classifier object
model = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0)
# Train the model using the training sets and check score
model.fit(X, y)
# Predict output
predicted = model.predict(x_test)
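The heading also mentions AdaBoost, for which the template code gives no example; a hedged sketch of scikit-learn's AdaBoostClassifier follows (the iris data and parameter values are illustrative assumptions, not from the original article):

from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = AdaBoostClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print('Test accuracy:', model.score(X_test, y_test))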
|
|
|
|