Common Machine Learning & Data Mining Knowledge Points (please credit the source when reposting)
Basics:
- MSE (Mean Squared Error)
- RMSE (Root Mean Squared Error)
- RRSE (Root Relative Squared Error)
- MAE (Mean Absolute Error)
- RAE (Relative Absolute Error)
- LMS (Least Mean Squares)
- LSM (Least Squares Method)
- MLE (Maximum Likelihood Estimation)
- QP (Quadratic Programming)
- CP (Conditional Probability)
- JP (Joint Probability)
- MP (Marginal Probability)
- Bayes' formula (Bayes' theorem)
- L1/L2 regularization (and variants, such as the currently popular L2.5 regularization)
- GD (Gradient Descent)
- SGD (Stochastic Gradient Descent)
- Eigenvalue
- Eigenvector
- CC (Correlation Coefficient)
- Quantile
- Covariance (covariance matrix)
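The error measures at the top of this list differ mainly in whether they square the errors and whether they normalize by a baseline. The sketch below uses the usual textbook definitions, assuming the baseline for RAE and RRSE is a predictor that always outputs the mean of the true values; the function name `error_metrics` is made up for illustration.

```python
import math

def error_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, RAE and RRSE for two equal-length sequences."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    sq_err = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    # Baseline: errors of a predictor that always outputs the mean of y_true.
    # Assumes y_true is not constant, otherwise the baselines are zero.
    base_sq = sum((t - mean_true) ** 2 for t in y_true)
    base_abs = sum(abs(t - mean_true) for t in y_true)
    mse = sum(sq_err) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs_err) / n,
        "RAE": sum(abs_err) / base_abs,
        "RRSE": math.sqrt(sum(sq_err) / base_sq),
    }

metrics = error_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(metrics)
```

RAE and RRSE values below 1 indicate that the predictions beat the mean baseline on that measure.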
Common distributions:
Discrete distributions:
- Bernoulli distribution / binomial distribution
- Negative binomial distribution
- Multinomial distribution
- Geometric distribution
- Hypergeometric distribution
- Poisson distribution
Continuous distributions:
- Uniform distribution
- Normal distribution / Gaussian distribution
- Exponential distribution
- Lognormal distribution
- Gamma distribution
- Beta distribution
- Dirichlet distribution
- Rayleigh distribution
- Cauchy distribution
- Weibull distribution
The three major sampling distributions:
- Chi-square distribution
- t-distribution
- F-distribution
Data preprocessing:
- Missing value imputation
- Discretization
- Mapping
- Normalization (normalization/standardization)
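As a rough illustration of the preprocessing steps listed above, the sketch below shows mean imputation, min-max normalization, and z-score standardization. These are only one common choice for each step, and the function names are hypothetical.

```python
import statistics

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Standardize values to zero mean and unit (population) variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

filled = impute_mean([1.0, None, 3.0])
scaled = min_max_scale([2, 4, 6])
standardized = z_score([1, 2, 3])
```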
Sampling:
- Simple random sampling
- Offline sampling (offline equal-probability sampling of K items)
- Online sampling (online equal-probability sampling of K items)
- Ratio-based sampling (equal-proportion random sampling)
- Acceptance-rejection sampling
- Importance sampling
- MCMC (Markov Chain Monte Carlo sampling algorithms: Metropolis-Hastings & Gibbs)
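One well-known way to realize online equal-probability sampling of K items is reservoir sampling, which keeps a fixed-size sample while reading a stream of unknown length. The list above does not name a specific algorithm, so treating reservoir sampling as the intended method is an assumption; the sketch follows the classic "Algorithm R".

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(1000), 5, random.Random(0))
```

If the stream holds fewer than k items, the function simply returns all of them.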
Clustering:
- K-means, K-medoids
- Bisecting K-means
- FK-means
- Canopy
- Spectral K-means (spectral clustering)
- GMM-EM (Gaussian mixture model, solved with the expectation-maximization algorithm)
- K-prototypes
- CLARANS (partition-based)
- BIRCH (hierarchy-based)
- CURE (hierarchy-based)
- STING (grid-based)
- CLIQUE (density-based and grid-based)
- The density clustering algorithm published in Science in 2014, etc.
Clustering effectiveness evaluation:
- Purity
- RI (Rand Index)
- ARI (Adjusted Rand Index)
- NMI (Normalized Mutual Information)
- F-measure
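Two of the measures above, purity and the Rand index, are simple enough to compute directly from a list of true class labels and a list of predicted cluster labels. The following is a minimal sketch using their standard definitions; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations

def purity(labels_true, labels_pred):
    """Fraction of points that belong to the majority class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, []).append(t)
    majority = sum(Counter(members).most_common(1)[0][1]
                   for members in clusters.values())
    return majority / len(labels_true)

def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which the two labelings agree
    (both put the pair together, or both put it apart)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)

truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 1, 1]
print(purity(truth, found), rand_index(truth, found))
```

Neither measure depends on how the clusters are numbered, so a perfect clustering with permuted labels still scores 1.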
Classification & regression:
- LR (Linear Regression)
- LR (Logistic Regression)
- SR (Softmax Regression, multi-class logistic regression)
- GLM (Generalized Linear Model)
- RR (Ridge Regression, L2-regularized least squares regression)
- LASSO (Least Absolute Shrinkage and Selection Operator, L1-regularized least squares regression)
- DT (Decision Tree)
- RF (Random Forest)
- GBDT (Gradient Boosting Decision Tree)
- CART (Classification and Regression Tree)
- KNN (K-Nearest Neighbor)
- SVM (Support Vector Machine, including SVC for classification and SVR for regression)
- CBA (Classification Based on Association rules)
- KF (Kernel Function)
- Polynomial kernel function
- Gaussian kernel function
- Radial basis function (RBF)
- String kernel function
- NB (Naive Bayes)
- BN (Bayesian Network / Bayesian Belief Network / Belief Network)
- LDA (Linear Discriminant Analysis / Fisher's Linear Discriminant)
- EL (Ensemble Learning)
- Boosting
- Bagging
- Stacking
- AdaBoost (Adaptive Boosting)
- MEM (Maximum Entropy Model)
Classification effectiveness evaluation:
- Confusion matrix
- Precision
- Recall
- Accuracy
- F-score
- ROC curve
- AUC (Area Under the ROC Curve)
- Lift curve
- KS curve
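Precision, recall, accuracy, and the F-score are all derived from the four cells of a binary confusion matrix. A minimal sketch, assuming binary labels and the balanced F1 variant of the F-score (the function name `binary_metrics` is hypothetical):

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and the metrics derived from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    # Guard against empty denominators by reporting 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "confusion": {"TP": tp, "FP": fp, "FN": fn, "TN": tn},
        "precision": precision,
        "recall": recall,
        "accuracy": (tp + tn) / len(pairs),
        "f1": f1,
    }

result = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 0])
print(result)
```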
PGM (Probabilistic Graphical Models):
- BN (Bayesian Network / Bayesian Belief Network / Belief Network)
- MC (Markov Chain)
- MEM (Maximum Entropy Model)
- HMM (Hidden Markov Model)
- MEMM (Maximum Entropy Markov Model)
- CRF (Conditional Random Field)
- MRF (Markov Random Field)
- Viterbi (Viterbi algorithm)
NN (Neural Networks):
- ANN (Artificial Neural Network)
- SNN (Static Neural Network)
- BP (Error Backpropagation)
- HN (Hopfield Network)
- DNN (Dynamic Neural Network)
- RNN (Recurrent Neural Network)
- SRN (Simple Recurrent Network)
- ESN (Echo State Network)
- LSTM (Long Short-Term Memory)
- CW-RNN (Clockwork Recurrent Neural Network, ICML 2014), etc.
Deep learning:
- Auto-encoder
- SAE (Stacked Auto-Encoders)
- Sparse auto-encoders
- Denoising auto-encoders
- Contractive auto-encoders
- RBM (Restricted Boltzmann Machine)
- DBN (Deep Belief Network)
- CNN (Convolutional Neural Network)
- Word2vec (word-vector learning model)
Dimensionality reduction:
- LDA (Linear Discriminant Analysis / Fisher's Linear Discriminant)
- PCA (Principal Component Analysis)
- ICA (Independent Component Analysis)
- SVD (Singular Value Decomposition)
- FA (Factor Analysis)
Text mining:
- VSM (Vector Space Model)
- Word2vec (word-vector learning model)
- TF (Term Frequency)
- TF-IDF (Term Frequency-Inverse Document Frequency)
- MI (Mutual Information)
- ECE (Expected Cross Entropy)
- QEMI (quadratic information entropy)
- IG (Information Gain)
- IGR (Information Gain Ratio)
- Gini (Gini coefficient)
- χ² statistic (chi-square statistic)
- TEW (Text Evidence Weight)
- OR (Odds Ratio)
- N-gram model
- LSA (Latent Semantic Analysis)
- pLSA (Probabilistic Latent Semantic Analysis)
- LDA (Latent Dirichlet Allocation)
- SLM (Statistical Language Model)
- NPLM (Neural Probabilistic Language Model)
- CBOW (Continuous Bag-of-Words model)
- Skip-gram (Skip-gram model)
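TF and TF-IDF from the list above can be sketched in a few lines. Many weighting variants exist; this one assumes TF is the term count divided by the document length and IDF is the unsmoothed `log(N / df)`, which is only one of the common choices.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [["data", "mining", "data"], ["machine", "learning"], ["data", "learning"]]
weights = tf_idf(docs)
print(weights[0])
```

With this unsmoothed IDF, a term that occurs in every document gets weight 0, which is why rarer terms such as "mining" can outrank more frequent ones.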
Association mining:
- Apriori algorithm
- FP-growth (Frequent Pattern tree growth algorithm)
- MSApriori (Multiple-Support-based Apriori, an Apriori algorithm based on multiple minimum supports)
- gSpan (graph-based Substructure pattern mining, frequent subgraph mining)
Sequential pattern analysis:
- AprioriAll
- SPADE
- GSP (Generalized Sequential Patterns)
- PrefixSpan
Forecasting:
- LR (Linear Regression)
- SVR (Support Vector Regression)
- ARIMA (Autoregressive Integrated Moving Average model)
- GM (Grey Model)
- BPNN (Backpropagation Neural Network)
- SRN (Simple Recurrent Network)
- LSTM (Long Short-Term Memory)
- CW-RNN (Clockwork Recurrent Neural Network)
- ......
Link analysis:
- HITS (Hyperlink-Induced Topic Search)
- PageRank
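PageRank can be approximated by power iteration over the link graph. The sketch below assumes the commonly quoted damping factor of 0.85, a fixed number of iterations instead of a convergence test, and an even redistribution of rank from pages without outgoing links; the graph used is a made-up toy example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it links to."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling pages (no out-links) is spread over all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            for target in targets:
                # Each page passes its rank evenly along its out-links.
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
print(ranks)
```

In this toy graph C collects rank from both A and B, so it ends up ranked above B, which is linked only from A.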
Recommendation engines:
- SVD
- Slope One
- DBR (Demographic-Based Recommendation)
- CBR (Content-Based Recommendation)
- CF (Collaborative Filtering)
- UCF (User-based Collaborative Filtering recommendation)
- ICF (Item-based Collaborative Filtering recommendation)
Similarity measures & distance measures:
- Euclidean distance
- Chebyshev distance
- Minkowski distance
- Standardized Euclidean distance
- Mahalanobis distance
- Cos (cosine similarity)
- Hamming distance / edit distance
- Jaccard distance
- Correlation coefficient distance
- Information entropy
- KL (Kullback-Leibler divergence, also called relative entropy)
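Several of the measures above have one-line definitions. The sketch below covers a subset of them with plain Python sequences, assuming equal-length inputs, non-zero vectors for the cosine, non-empty sets for Jaccard, and strictly positive `q` wherever `p` is positive for the KL divergence.

```python
import math

def minkowski(x, y, p):
    """Minkowski distance; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def chebyshev(x, y):
    """Largest coordinate-wise difference (the limit of Minkowski as p grows)."""
    return max(abs(a - b) for a, b in zip(x, y))

def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def hamming(x, y):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))

def jaccard_distance(s, t):
    s, t = set(s), set(t)
    return 1.0 - len(s & t) / len(s | t)

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(euclidean([0, 0], [3, 4]), chebyshev([0, 0], [3, 4]))
```

Note that the KL divergence is not symmetric, so it is a divergence rather than a true distance.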
Optimization:
Unconstrained optimization:
- Cyclic variable methods
- Variable simplex methods
- Newton methods
- Quasi-Newton methods
- Conjugate gradient methods
Constrained optimization:
- Approximation programming methods
- Penalty function methods
- Multiplier methods
- Heuristic algorithms
- SA (Simulated Annealing)
- GA (Genetic Algorithm)
- ACO (Ant Colony Optimization)
Feature selection:
- Mutual information
- Document frequency
- Information gain
- Chi-squared test
- Gini (Gini coefficient)
Outlier detection:
- Statistics-based
- Density-based
- Clustering-based
Learning to rank:
- Pointwise
- Pairwise
- Ranking SVM
- RankNet
- FRank
- RankBoost
- Listwise
- AdaRank
- SoftRank
- LambdaMART
Tools:
- MPI
- Hadoop ecosystem
- Spark
- igraph
- BSP
- Weka
- Mahout
- Scikit-learn
- PyBrain
- Theano
...
As well as specific business scenarios and case studies ...
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.