Summary of machine learning algorithms



Linear Regression (ML: regression)

y = ax + b

Uses continuous variables to estimate real-valued outputs.

The linear regression algorithm finds the optimal linear relationship between the independent variable and the dependent variable; graphically, it determines a best-fit line through the data.

#Import Library
from sklearn import linear_model

X_train = input_variables_values_training_datasets
y_train = target_variables_values_training_datasets
X_test = input_variables_values_test_datasets

# Create linear regression object
linear = linear_model.LinearRegression()

# Train the model using the training sets and check score
linear.fit(X_train, y_train)
linear.score(X_train, y_train)

#Equation coefficient and intercept
print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)

#Predict Output
predicted = linear.predict(X_test)

Typical applications: estimating house prices, the number of calls, or total sales.
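For illustration, a self-contained runnable version with made-up toy data (the values below are assumptions, not from the original) might look like this:

import numpy as np
from sklearn import linear_model

# Hypothetical toy data: y is roughly 2x + 1 plus noise
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

linear = linear_model.LinearRegression()
linear.fit(X_train, y_train)
print('Coefficient:', linear.coef_)     # close to 2
print('Intercept:', linear.intercept_)  # close to 1
predicted = linear.predict(np.array([[6.0]]))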

Logistic Regression (ML: classification)

Predicts the value of a discrete dependent variable (such as a binary value: 0/1, yes/no, true/false) from known independent variables. Put simply, it predicts the probability of an event occurring by fitting a logistic function (logit function). Because it predicts a probability, its output always lies between 0 and 1.
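For intuition, the logistic (sigmoid) function that maps a score to a probability can be sketched in a few lines; this is a minimal illustration, not part of the original snippet:

import numpy as np

def sigmoid(z):
    # Maps any real-valued score z into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))  # 0.5: a score of 0 corresponds to probability 0.5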

#Import Library
from sklearn.linear_model import LogisticRegression

# Create logistic regression object
model = LogisticRegression()

# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)

#Equation coefficient and intercept
print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)

#Predict Output
predicted = model.predict(X_test)

Typical outputs: binary values such as 0/1, yes/no, true/false.

Decision Tree (ML: classification)

It can be applied to categorical variables as well as continuous variables. The algorithm divides a population into two or more groups based on the most significant feature variables/independent variables, i.e. the ones that best differentiate the population.

#Import Library
from sklearn import tree

# Create tree object
model = tree.DecisionTreeClassifier(criterion='gini')

# model = tree.DecisionTreeRegressor() for regression

# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)

#Predict Output
predicted = model.predict(X_test)

Divides a population into two or more groups to solve classification problems.
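Since the snippet above uses criterion='gini', here is a minimal sketch (an illustration, not from the original) of the Gini impurity that each split tries to minimize:

def gini_impurity(labels):
    # Gini impurity: 1 minus the sum of squared class proportions
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini_impurity([0, 0, 1, 1]))  # 0.5: maximally mixed node
print(gini_impurity([1, 1, 1, 1]))  # 0.0: pure node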

Support Vector Machine (SVM) (ML: classification)

Each data point is plotted as a point in n-dimensional space (where n is the number of features), with each feature value giving the value of the corresponding coordinate. For example, given two features, a person's height and hair length, we can plot these two variables in a two-dimensional space where each point has two coordinates. The data points lying closest to the separating boundary are called support vectors.

#Import Library
from sklearn import svm

# Create SVM classification object
model = svm.SVC()

# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)

#Predict Output
predicted = model.predict(X_test)

Intuitively, SVM is like dividing balls of different colors into different regions of space.
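Continuing the height/hair-length example with made-up toy values (an illustrative sketch; the data and labels are assumptions):

import numpy as np
from sklearn import svm

# Hypothetical features: [height in cm, hair length in cm]
X = np.array([[180, 5], [175, 8], [160, 30], [165, 35]])
y = np.array([0, 0, 1, 1])  # two made-up classes

model = svm.SVC(kernel='linear')
model.fit(X, y)
print(model.support_vectors_)  # the points that define the boundary
predicted = model.predict(np.array([[170, 20]]))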

Naive Bayes (ML: classification)

It assumes that the independent variables are independent of one another. In short, naive Bayes assumes that the presence of one feature is unrelated to the presence of any other feature.

Naive Bayes computes the posterior probability P(c|x) from the prior probability P(c), the evidence P(x), and the conditional probability P(x|c) via Bayes' theorem: P(c|x) = P(x|c) * P(c) / P(x).

#Import Library
from sklearn.naive_bayes import GaussianNB

# Create NB classification object
model = GaussianNB()

# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)

#Predict Output
predicted = model.predict(X_test)

If a fruit is red, round, and about 7 cm in diameter, we may guess that it is an apple.

A classic example: a weather variable and the target variable "whether to go out and play."
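As a toy sketch of the weather/play example (all values are made up; weather is encoded numerically as 0 = sunny, 1 = overcast, 2 = rainy):

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical encoded observations: weather -> play (1) or stay home (0)
X = np.array([[0], [0], [1], [2], [2], [1]])
y = np.array([1, 1, 1, 0, 0, 1])

model = GaussianNB()
model.fit(X, y)
predicted = model.predict(np.array([[2]]))  # predicted class for rainy weather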

KNN (k-Nearest Neighbors) (ML: classification)

Finds the k nearest neighbors of a new data point among the known data, then predicts according to the most common category among those k neighbors (see the sketch below). The distance function can be the Euclidean distance, Manhattan distance, Minkowski distance, or Hamming distance. The first three are used for continuous variables, while the Hamming distance is used for categorical variables. If k = 1, the problem reduces to classifying by the single nearest data point. Choosing k is often the key step in KNN modeling.
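Before the library usage below, a minimal sketch of the continuous-variable distances mentioned above (illustrative, not original code):

import numpy as np

def minkowski(a, b, p):
    # p = 1 gives Manhattan distance, p = 2 gives Euclidean distance
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

a, b = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(minkowski(a, b, 1))  # Manhattan distance: 7.0
print(minkowski(a, b, 2))  # Euclidean distance: 5.0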

#Import Library
from sklearn.neighbors import KNeighborsClassifier

#Assumed you have X (predictor) and y (target) for the training data set and X_test (predictor) of the test dataset

# Create KNeighbors classifier object
model = KNeighborsClassifier(n_neighbors=6)  # default value for n_neighbors is 5

# Train the model using the training sets and check score
model.fit(X, y)

#Predict Output
predicted = model.predict(X_test)

KNN can be used for classification problems as well as regression problems.

The computational cost of KNN is high.

All features should be standardized to the same order of magnitude; otherwise, larger-scale features will dominate the computed distances.

Pre-process the data before applying KNN, for example by removing outliers and noise.
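A minimal sketch of the standardization step using scikit-learn's StandardScaler (the variables X and X_test are assumed, as in the snippets above):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)        # fit the scaling on training data
X_test_scaled = scaler.transform(X_test)  # reuse the same scaling for test data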

K-Means (ML: clustering, unsupervised learning)

Classifies a given data set into a certain number of clusters (say, k clusters). Data points within the same cluster are homogeneous, while data points in different clusters belong to different classes.

#Import Library
from sklearn.cluster import KMeans

#Assumed you have X (attributes) for the training data set and X_test (attributes) of the test dataset

# Create KMeans object
k_means = KMeans(n_clusters=3, random_state=0)

# Train the model using the training sets and check score
k_means.fit(X)

#Predict Output
predicted = k_means.predict(X_test)

An unsupervised learning algorithm to solve clustering problems.
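The number of clusters k must be chosen by the user; one common heuristic (an addition here, not from the original text) is the elbow method, which tracks the within-cluster sum of squares (inertia) as k grows:

from sklearn.cluster import KMeans

# X is assumed to be the training data from the snippet above;
# look for the "elbow" where the decrease in inertia levels off
for k in range(1, 10):
    inertia = KMeans(n_clusters=k, random_state=0).fit(X).inertia_
    print(k, inertia)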

Random Forest (ML: classification)

Random forest is the name for an ensemble of decision trees. A random forest contains multiple decision trees (hence "forest"). To classify a new observation, each decision tree gives a classification based on the observation's features, and the random forest chooses the classification that receives the most votes as the result (a toy illustration follows).
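To make the voting concrete, here is a toy sketch of a majority vote (the tree predictions below are invented for illustration):

from collections import Counter

# Hypothetical classifications from five individual trees for one observation
tree_votes = ['apple', 'apple', 'pear', 'apple', 'pear']
majority = Counter(tree_votes).most_common(1)[0][0]
print(majority)  # 'apple' wins the vote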

#Import Library
from sklearn.ensemble import RandomForestClassifier

#Assumed you have X (predictor) and y (target) for the training data set and X_test (predictor) of the test dataset

# Create random forest object
model = RandomForestClassifier()

# Train the model using the training sets and check score
model.fit(X, y)

#Predict Output
predicted = model.predict(X_test)

 

Dimensionality Reduction Algorithms

How can we find the most important variables among 1,000 or 2,000 candidates? Dimensionality reduction algorithms, together with methods such as decision trees, random forests, PCA, factor analysis, correlation matrices, and missing-value ratios, can help us solve this problem.

#Import Library
from sklearn import decomposition

#Assumed training and test data sets as train and test

# Create PCA object
pca = decomposition.PCA(n_components=k)  # default value of k = min(n_samples, n_features)

# For factor analysis:
# fa = decomposition.FactorAnalysis()

# Reduce the dimension of the training dataset using PCA
train_reduced = pca.fit_transform(train)

#Reduce the dimension of the test dataset
test_reduced = pca.transform(test)
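One common way to choose the number of components (an addition, not from the original) is to inspect how much variance each principal component explains; train is assumed to be the training set from the snippet above:

from sklearn import decomposition

pca = decomposition.PCA().fit(train)            # fit with all components first
print(pca.explained_variance_ratio_)            # variance share per component
print(pca.explained_variance_ratio_.cumsum())   # cumulative share; keep enough components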

 

Gradient Boosting and AdaBoost

These are boosting algorithms that improve prediction accuracy when plenty of data is available. Boosting is an ensemble learning approach: it improves prediction accuracy by combining, in sequence, the estimates of several weaker classifiers/estimators. Boosting algorithms perform consistently well in data science competitions such as Kaggle, AV Hackathon, and CrowdANALYTIX.

#Import Library
from sklearn.ensemble import GradientBoostingClassifier

#Assumed you have X (predictor) and y (target) for the training data set and X_test (predictor) of the test dataset

# Create gradient boosting classifier object
model = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0)

# Train the model using the training sets and check score
model.fit(X, y)

#Predict Output
predicted = model.predict(X_test)
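The heading also mentions AdaBoost; a minimal scikit-learn sketch (assuming the same X, y, and X_test as above) might be:

from sklearn.ensemble import AdaBoostClassifier

# AdaBoost: each successive weak learner focuses on previously misclassified samples
model = AdaBoostClassifier(n_estimators=100, learning_rate=1.0, random_state=0)
model.fit(X, y)
predicted = model.predict(X_test)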

 
