Linear Regression (ML: regression), y = ax + b
Linear regression uses continuous variables to estimate real-valued outputs. The algorithm finds the best linear relationship between the independent variable(s) and the dependent variable, which corresponds to the best-fitting straight line on a plot.

#Import Library
from sklearn import linear_model
#Assumed you have training and test data sets
x_train = input_variables_values_training_datasets
y_train = target_variables_values_training_datasets
x_test = input_variables_values_test_datasets
# Create linear regression object
linear = linear_model.LinearRegression()
# Train the model using the training sets and check score
linear.fit(x_train, y_train)
linear.score(x_train, y_train)
# Equation coefficient and intercept
print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)
# Predict output
predicted = linear.predict(x_test)
Example applications: house prices, number of calls, total sales.
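A minimal runnable sketch of the linear regression recipe above, using small made-up numbers purely for illustration (the data values are assumptions, not from the original article):

import numpy as np
from sklearn import linear_model

# Toy data: y is roughly 2*x + 1
x_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.1, 4.9, 7.2, 8.8])

linear = linear_model.LinearRegression()
linear.fit(x_train, y_train)
print('Coefficient:', linear.coef_)      # close to 2
print('Intercept:', linear.intercept_)   # close to 1
print('Prediction for x=5:', linear.predict(np.array([[5.0]])))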
Logistic Regression (ML: classification)
Logistic regression predicts the value of a discrete dependent variable (for example a binary value such as 0/1, yes/no, true/false) from known independent variables. Put simply, it predicts the probability of an event occurring by fitting a logistic function (logit function), so its output is a probability value and naturally lies between 0 and 1.

#Import Library
from sklearn.linear_model import LogisticRegression
# Create logistic regression object
model = LogisticRegression()
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
# Equation coefficient and intercept
print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)
# Predict output
predicted = model.predict(x_test)
Example outputs: binary values such as 0/1, yes/no, true/false.
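Because the text stresses that the output is a probability between 0 and 1, here is a small hedged sketch using scikit-learn's predict_proba on made-up toy data (the numbers are illustrative assumptions, not from the original article):

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression()
model.fit(X, y)
# Probabilities of class 0 and class 1 for a new point; both lie in [0, 1]
print(model.predict_proba(np.array([[2.0]])))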
Decision Tree (ML: classification)
Decision trees can be applied to categorical variables as well as continuous variables. The algorithm divides a population into two or more groups, splitting at each step on the feature variable / independent variable that best differentiates the population.

#Import Library
from sklearn import tree
# Create tree object
model = tree.DecisionTreeClassifier(criterion='gini')
# model = tree.DecisionTreeRegressor() for regression
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
# Predict output
predicted = model.predict(x_test)
Divides a population into two or more groups to solve classification problems.
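A minimal runnable sketch of the decision tree recipe above; it uses scikit-learn's bundled iris dataset purely as a stand-in for real data (that dataset choice is an assumption, not from the original article):

from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = tree.DecisionTreeClassifier(criterion='gini')
model.fit(X_train, y_train)
print('Train accuracy:', model.score(X_train, y_train))
print('Test accuracy:', model.score(X_test, y_test))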
Support Vector Machine (SVM) (ML: classification)
Each data point is plotted as a point in an n-dimensional space (n is the number of features), and each feature value gives the corresponding coordinate. For example, with two features such as a person's height and hair length, we can plot the data in a two-dimensional space where every point has two coordinates (these coordinates are known as support vectors).

#Import Library
from sklearn import svm
# Create SVM classification object
model = svm.SVC()
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
# Predict output
predicted = model.predict(x_test)
Intuitively, it is like separating balls of different colors into different regions of space.
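A minimal runnable sketch of the SVM recipe above, again using the iris data as an illustrative stand-in (not a dataset named by the original article):

from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = svm.SVC()  # default RBF kernel
model.fit(X_train, y_train)
print('Test accuracy:', model.score(X_test, y_test))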
Naive Bayes (ML: classification)
Naive Bayes assumes that the independent variables are independent of each other; in short, it assumes that the presence of one feature is unrelated to any other feature. The classifier computes the posterior probability P(c|x) from the prior probabilities P(c) and P(x) and the conditional probability P(x|c), via Bayes' theorem: P(c|x) = P(x|c) * P(c) / P(x).

#Import Library
from sklearn.naive_bayes import GaussianNB
# Create NB classification object
model = GaussianNB()
# Train the model using the training sets and check score
model.fit(X, y)
# Predict output
predicted = model.predict(x_test)
Example: if a fruit is red, round, and about 7 cm in diameter, we may guess that it is an apple. Another classic example uses weather variables to predict the target variable "whether to go out and play".
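A small hedged sketch of the "weather vs. play" idea with made-up numeric features (temperature and humidity); the feature choice and all values are illustrative assumptions, not data from the original article:

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Features: [temperature, humidity]; target: 1 = go out and play, 0 = stay in
X = np.array([[25, 40], [27, 45], [30, 85], [18, 90], [22, 50], [17, 80]])
y = np.array([1, 1, 0, 0, 1, 0])
model = GaussianNB()
model.fit(X, y)
print(model.predict(np.array([[26, 55]])))        # predicted class
print(model.predict_proba(np.array([[26, 55]])))  # posterior probabilities P(c|x)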
KNN (k-Nearest Neighbors Algorithm) (ML: classification)
KNN finds the K nearest points in the known data and predicts the new observation from the most common class among those K points. The distance function can be Euclidean distance, Manhattan distance, Minkowski distance, or Hamming distance; the first three are used for continuous variables, while Hamming distance is used for categorical variables. If K = 1, the problem reduces to classifying according to the single nearest data point. Choosing K is often the key step in KNN modelling.

#Import Library
from sklearn.neighbors import KNeighborsClassifier
#Assumed you have X (predictor) and y (target) for the training data set and x_test (predictor) of the test dataset
# Create KNeighbors classifier object
model = KNeighborsClassifier(n_neighbors=6)  # default value for n_neighbors is 5
# Train the model using the training sets and check score
model.fit(X, y)
# Predict output
predicted = model.predict(x_test)
KNN can be used for classification problems and also for regression problems. Its computational cost is high, and all features should be standardized to the same order of magnitude, otherwise features with larger magnitudes will dominate the computed distances. Data should be pre-processed before KNN, for example by removing outliers and noise.
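Reflecting the advice above about standardizing features, a hedged sketch that scales the data before KNN (the use of StandardScaler, a pipeline, and the iris data are illustrative choices, not from the original article):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Standardize features first so no single feature dominates the distance
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=6))
model.fit(X_train, y_train)
print('Test accuracy:', model.score(X_test, y_test))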
K-means Algorithm (ML: clustering, unsupervised learning)
K-means groups a given data set into a chosen number of clusters (say K clusters). Data points within the same cluster are homogeneous, while data points in different clusters belong to different classes.

#Import Library
from sklearn.cluster import KMeans
#Assumed you have X (attributes) for the training data set and x_test (attributes) of the test dataset
# Create KMeans object
k_means = KMeans(n_clusters=3, random_state=0)
# Train the model using the training set
k_means.fit(X)
# Predict output
predicted = k_means.predict(x_test)
An unsupervised learning algorithm that solves clustering problems.
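A minimal runnable sketch of the K-means recipe above on synthetic data; the use of make_blobs and the parameter values are illustrative assumptions:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with three well-separated groups
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
k_means = KMeans(n_clusters=3, random_state=0, n_init=10)
k_means.fit(X)
print('Cluster centers:\n', k_means.cluster_centers_)
print('Cluster of a new point:', k_means.predict(np.array([[0.0, 0.0]])))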
Random Forest (ML: classification)
Random forest is the name given to an ensemble of decision trees. A random forest contains multiple decision trees (hence the "forest"). To classify a new observation, each decision tree gives a classification based on the observation's features, and the random forest chooses the classification with the most votes as the final result.

#Import Library
from sklearn.ensemble import RandomForestClassifier
#Assumed you have X (predictor) and y (target) for the training data set and x_test (predictor) of the test dataset
# Create random forest object
model = RandomForestClassifier()
# Train the model using the training sets and check score
model.fit(X, y)
# Predict output
predicted = model.predict(x_test)
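A minimal runnable sketch of the random forest recipe above; the iris data and the n_estimators value are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print('Test accuracy:', model.score(X_test, y_test))
print('Feature importances:', model.feature_importances_)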
|
Dimensionality Reduction Algorithms
How can we find the most important variables out of 1000 or 2000? Dimensionality reduction algorithms, together with other methods such as decision trees, random forests, PCA, factor analysis, correlation matrices, and the missing value ratio, can help us solve this problem.

#Import Library
from sklearn import decomposition
#Assumed you have training and test data sets as train and test
# Create PCA object
pca = decomposition.PCA(n_components=k)  # default value of k = min(n_samples, n_features)
# For factor analysis
# fa = decomposition.FactorAnalysis()
# Reduce the dimension of the training dataset using PCA
train_reduced = pca.fit_transform(train)
# Reduce the dimension of the test dataset
test_reduced = pca.transform(test)
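A minimal runnable sketch of PCA from the recipe above; the iris data and n_components=2 are illustrative assumptions:

from sklearn import decomposition
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)
pca = decomposition.PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print('Reduced shape:', X_reduced.shape)  # (150, 2): 4 features reduced to 2
print('Explained variance ratio:', pca.explained_variance_ratio_)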
|
Gradient Boosting and AdaBoost
These are boosting algorithms that improve predictive accuracy when there is a lot of data. Boosting is an ensemble learning approach: it improves prediction accuracy by combining, in sequence, the estimates of several weaker classifiers/estimators. Boosting algorithms perform very well in data science competitions such as Kaggle, AV Hackathon, and CrowdAnalytix.

#Import Library
from sklearn.ensemble import GradientBoostingClassifier
#Assumed you have X (predictor) and y (target) for the training data set and x_test (predictor) of the test dataset
# Create gradient boosting classifier object
model = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0)
# Train the model using the training sets and check score
model.fit(X, y)
# Predict output
predicted = model.predict(x_test)
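The heading also mentions AdaBoost, for which the template code gives no example; a hedged sketch of scikit-learn's AdaBoostClassifier follows (the iris data and parameter values are illustrative assumptions, not from the original article):

from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = AdaBoostClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print('Test accuracy:', model.score(X_test, y_test))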
|
|
|
|