The machine learning process mainly includes: feature extraction, data preprocessing, model training, model testing, and model evaluation and improvement.
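As a rough illustration of these stages, the sketch below runs them end to end on a toy problem. Everything in it is an assumption made for the example: the synthetic data (y = 3x + 2 plus small noise), the 80/20 split, and the helper names `make_data`, `standardize`, `fit_line`, and `mse`. It uses only the Python standard library.

```python
import random
from statistics import mean, pstdev

def make_data(n=100, seed=0):
    """Synthetic data (assumed for illustration): y = 3x + 2 + small noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3 * x + 2 + rng.gauss(0, 0.1) for x in xs]
    return xs, ys

def standardize(train, test):
    """Preprocessing: scale both splits with statistics from the training split."""
    mu, sigma = mean(train), pstdev(train)
    return [(x - mu) / sigma for x in train], [(x - mu) / sigma for x in test]

def fit_line(xs, ys):
    """Training: closed-form least squares for y = w * x + b."""
    mx, my = mean(xs), mean(ys)
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return w, my - w * mx

def mse(xs, ys, w, b):
    """Evaluation: mean squared error of the fitted line."""
    return mean([(w * x + b - y) ** 2 for x, y in zip(xs, ys)])

# Feature extraction is trivial here: each sample has a single numeric feature.
xs, ys = make_data()
x_train, x_test = xs[:80], xs[80:]   # hold out 20% for testing
y_train, y_test = ys[:80], ys[80:]

x_train_s, x_test_s = standardize(x_train, x_test)
w, b = fit_line(x_train_s, y_train)
test_mse = mse(x_test_s, y_test, w, b)
print("test MSE:", test_mse)
```

If the test error were too high, the improvement step would loop back to change the features, the preprocessing, or the model.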
Traditional machine learning algorithms mainly include the following five categories:
Regression: Fit a regression equation to predict a continuous target value
Classification: Given a large amount of labeled data, predict the label of an unlabeled sample
Clustering: Group unlabeled data into clusters based on distance, so that the data in each cluster share common characteristics
Association analysis: Find frequent item sets in the data
Dimensionality reduction: Map data points from the original high-dimensional space to a lower-dimensional space
1 Linear regression: Fit a line to predict the target value
2 Logistic regression: Find a linear decision boundary to classify the data
3 KNN: Assign the category label held by the majority of the nearest neighbors, measured by distance
4 NB (Naive Bayes): Select the class with the highest posterior probability as the category label
5 Decision Tree: Build a classification tree by choosing the splits that reduce entropy fastest
A decision tree is a tree structure in which each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node represents a category. The tree is built top-down and recursively: at each node, the feature with the greatest information gain is selected as the splitting feature.
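The split-selection step can be sketched as follows. This is a minimal illustration, not a full tree builder: it computes entropy and information gain for categorical features and picks the best one for a single node. The tiny data set and the names `entropy`, `information_gain`, and `best_feature` are assumptions made for the example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy before the split minus the weighted entropy after splitting on `feature`."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_feature(rows, labels, features):
    """Pick the feature with the greatest information gain for the current node."""
    return max(features, key=lambda f: information_gain(rows, labels, f))

# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["no", "no", "yes", "yes"]

print(best_feature(rows, labels, ["outlook", "windy"]))
```

A full tree would apply `best_feature` recursively to each resulting subset until the labels in a subset are pure or no features remain.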
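6 SVM: Construct a maximum-margin separating hyperplane; with kernel functions it can also classify nonlinearly separable data
7 K-means: Compute centroids to cluster unlabeled data
8 Association analysis: Find frequent item sets in the data
9 PCA dimensionality reduction: Reduce the number of data dimensions and with it the data complexity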
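Two of the distance-based entries in the list above, KNN and K-means, are short enough to sketch with the standard library alone. The code below is a simplified illustration under stated assumptions: Euclidean distance, a fixed number of K-means iterations, and initial centroids supplied by the caller. The function names and the sample points are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+

def knn_predict(train_points, train_labels, query, k=3):
    """KNN: return the majority label among the k training points nearest to `query`."""
    nearest = sorted(zip(train_points, train_labels),
                     key=lambda pl: dist(pl[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def kmeans(points, centroids, iterations=10):
    """K-means: alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.5, 1.5)))   # supervised: uses the labels
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)                                 # unsupervised: ignores the labels
```

The contrast between the two calls mirrors the list: KNN needs labeled data to classify a new sample, while K-means groups the same points without any labels.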
Machine learning interview: a one-sentence summary of each traditional ML algorithm