Reposted from http://www.36dsj.com/archives/20135

**Basics:**

MSE (mean squared error), LMS (least mean squares), LSM (least squares method), MLE (maximum likelihood estimation), QP (quadratic programming), CP (conditional probability), JP (joint probability), MP (marginal probability), Bayes' formula, L1/L2 regularization (and others, such as the currently popular L2.5 regularization), GD (gradient descent), SGD (stochastic gradient descent), eigenvalue, eigenvector, QR decomposition, quantile, covariance matrix.
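
As a rough illustration of how two of these terms fit together, the sketch below uses plain gradient descent (GD) to minimize the mean squared error (MSE) of a one-variable linear fit. The data, learning rate, step count, and function names are all assumptions chosen for this example, not part of the original list.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the line y = w * x + b on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Minimize the MSE by repeatedly stepping against its gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = 2.0 / n * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = 2.0 / n * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = gradient_descent(xs, ys)
```

SGD differs only in that each step estimates the gradient from a single randomly chosen point (or a small batch) instead of the full data set.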

**Common distributions:**

Discrete distributions: Bernoulli distribution / binomial distribution, negative binomial distribution, multinomial distribution, geometric distribution, hypergeometric distribution, Poisson distribution.

Continuous distributions: uniform distribution, normal (Gaussian) distribution, exponential distribution, log-normal distribution, gamma distribution, beta distribution, Dirichlet distribution, Rayleigh distribution, Cauchy distribution, Weibull distribution.

The three sampling distributions: chi-square distribution, t-distribution, F-distribution.

**Data preprocessing:**

Missing value imputation, discretization, mapping, normalization (scaling/standardization).

**Sampling:**

Simple random sampling, offline sampling (offline equal-probability sampling of K items), online sampling (online equal-probability sampling of K items), ratio-based sampling (equal-proportion random sampling), acceptance-rejection sampling, importance sampling, MCMC (Markov chain Monte Carlo sampling algorithms: Metropolis-Hastings and Gibbs).
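
One common way to draw K items with equal probability from a stream of unknown length is reservoir sampling. The sketch below assumes that this is the kind of technique meant by online equal-probability sampling of K items; the function name and parameters are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items drawn uniformly at random from an iterable."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Seeded generator so the run is repeatable.
sample = reservoir_sample(range(1000), 5, rng=random.Random(0))
short = reservoir_sample(range(3), 5)  # fewer than k items: all are kept
```

The stream is read once and only K items are held in memory, which is why this approach suits the online setting.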

**Clustering:**

K-means, K-medoids, bisecting K-means, FK-means, Canopy, spectral K-means (spectral clustering), GMM-EM (Gaussian mixture model fitted with the expectation-maximization algorithm), K-prototypes, CLARANS (partition-based), BIRCH (hierarchy-based), CURE (hierarchy-based), DBSCAN (density-based), CLIQUE (density-based and grid-based).

**Classification & regression:**

LR (linear regression), LR (logistic regression), SR (softmax regression, i.e. multiclass logistic regression), GLM (generalized linear model), RR (ridge regression, i.e. L2-regularized least squares regression), LASSO (least absolute shrinkage and selection operator, i.e. L1-regularized least squares regression), RF (random forest), DT (decision tree), GBDT (gradient boosting decision tree), CART (classification and regression tree), KNN (k-nearest neighbors), SVM (support vector machine), KF (kernel functions: polynomial kernel, Gaussian kernel / radial basis function (RBF), string kernel), NB (naive Bayes), BN (Bayesian network / Bayesian belief network / belief network), LDA (linear discriminant analysis / Fisher linear discriminant), EL (ensemble learning: boosting, bagging, stacking), AdaBoost (adaptive boosting), MEM (maximum entropy model).

**Effectiveness evaluation (evaluating classification performance):**

Confusion matrix, precision, recall, accuracy, F-score, ROC curve, AUC (area under the ROC curve), lift curve, KS curve.
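
Precision and accuracy are easy to confuse, so here is a minimal sketch that computes several of these metrics from the four cells of a binary confusion matrix. The counts are made up for illustration, and the F-score shown is the F1 variant (equal weight on precision and recall).

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute basic metrics from binary confusion-matrix counts."""
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: share of all predictions that are correct.
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # F1: harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
m = classification_metrics(tp=40, fp=10, fn=20, tn=30)
```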

**PGM (probabilistic graphical models):**

BN (Bayesian network / Bayesian belief network / belief network), MC (Markov chain), HMM (hidden Markov model), MEMM (maximum entropy Markov model), CRF (conditional random field), MRF (Markov random field).

**NN (neural networks):**

ANN (artificial neural network), BP (error backpropagation).

**Deep learning:**

Auto-encoder, SAE (stacked auto-encoders: sparse auto-encoders, denoising auto-encoders, contractive auto-encoders), RBM (restricted Boltzmann machine), DBN (deep belief network), CNN (convolutional neural network), Word2vec (word vector learning model).

**Dimensionality reduction:**

LDA (linear discriminant analysis / Fisher linear discriminant), PCA (principal component analysis), ICA (independent component analysis), SVD (singular value decomposition), FA (factor analysis).

**Text mining:**

VSM (vector space model), Word2vec (word vector learning model), TF (term frequency), TF-IDF (term frequency-inverse document frequency), MI (mutual information), ECE (expected cross entropy), QEMI (quadratic information entropy), IG (information gain), IGR (information gain ratio), Gini (Gini coefficient), χ² statistic (chi-square statistic), TEW (text evidence weight), OR (odds ratio), N-gram model, LSA (latent semantic analysis), pLSA (probabilistic latent semantic analysis), LDA (latent Dirichlet allocation).

**Association mining:**

Apriori, FP-growth (frequent pattern tree growth algorithm), AprioriAll, SPADE.

**Recommendation engines:**

DBR (demographic-based recommendation), CBR (content-based recommendation), CF (collaborative filtering), UCF (user-based collaborative filtering recommendation), ICF (item-based collaborative filtering recommendation).

**Similarity measures & distance measures:**

Euclidean distance, Manhattan distance, Chebyshev distance, Minkowski distance, standardized Euclidean distance, Mahalanobis distance, cosine similarity (cos), Hamming distance / edit distance, Jaccard distance, correlation coefficient distance, information entropy, KL (Kullback-Leibler divergence, i.e. relative entropy).
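
Several of these measures are closely related: Manhattan and Euclidean distance are the Minkowski distance with p = 1 and p = 2, and Chebyshev distance is its limit as p grows. The sketch below shows this for plain Python sequences; the function names and sample inputs are assumptions for illustration only.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def manhattan(x, y):
    return minkowski(x, y, 1)


def euclidean(x, y):
    return minkowski(x, y, 2)


def chebyshev(x, y):
    """Largest coordinate-wise difference (Minkowski limit as p grows)."""
    return max(abs(a - b) for a, b in zip(x, y))


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def hamming(x, y):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))


x, y = [0.0, 3.0], [4.0, 0.0]
d_euclid = euclidean(x, y)
d_manhattan = manhattan(x, y)
d_chebyshev = chebyshev(x, y)
cos_xy = cosine_similarity(x, y)
d_hamming = hamming("karolin", "kathrin")
```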

**Optimization:**

Unconstrained optimization: cyclic coordinate methods (variable rotation), pattern search methods, variable simplex methods, gradient descent methods, Newton's method, quasi-Newton methods, conjugate gradient methods.

Constrained optimization: approximation programming methods, feasible direction methods, penalty function methods, multiplier methods.

Heuristic algorithms: SA (simulated annealing), GA (genetic algorithm).

**Feature selection:**

Mutual information, document frequency, information gain, chi-squared test, Gini (Gini coefficient).

**Outlier detection:**

Statistic-based, distance-based, density-based, clustering-based.

**Learning to rank:**

Pointwise: McRank;

Pairwise: Ranking SVM, RankNet, Frank, RankBoost;

Listwise: AdaRank, SoftRank, LambdaMART.

**Tools:**

MPI, Hadoop ecosystem, Spark, BSP, Weka, Mahout, scikit-learn, PyBrain ...

"Basics" Common machine learning & data Mining knowledge points