I have recently been reading a book called "Big Talk Data Mining". Here is a brief summary of some of the basic theory of data mining:
1. Data mining (also known in academia as KDD: Knowledge Discovery in Databases) is the process of extracting potentially useful information and knowledge, not known beforehand, from large amounts of incomplete, noisy, fuzzy, and random data. (Most algorithms are based on the law of large numbers from statistics.)
2. What data mining can do: data mining tasks fall into two kinds, descriptive tasks and predictive tasks:
Descriptive tasks include clustering, association analysis, sequence analysis, anomaly detection, etc.
Predictive tasks include regression and classification.
(1) Association rule mining (the Apriori algorithm, 1994): also covers sequence patterns and time series
(2) Cluster analysis:
(Continuous data: K-means, K-medoids)
(Discrete data: K-modes; mixed numeric and categorical data: K-prototypes)
(Non-spherical clusters: density-based clustering algorithms: DBSCAN, OPTICS, DENCLUE)
(Hierarchical clustering algorithms: agglomerative and divisive)
(Visual clustering algorithms)
(3) Prediction: the basic principle is the black-box model: it attends only to the mapping from input to output, not to the specific mechanism or the causal relationship between them
(4) Regression: linear regression, linear fitting
(5) Deviation detection: describes the few extreme exceptions among the objects being analyzed and reveals their underlying causes
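
To make items (2), (4) and (5) above a little more concrete, here is a minimal Python sketch using only the standard library. It is not taken from the book. The function names (k_means, fit_line, deviations), the choice of the first k points as initial centroids, and the z-score threshold of 2.0 are all assumptions made for illustration; real implementations usually randomize the initial centroids and pick the threshold to suit the data.

```python
import math
from statistics import mean, pstdev


def k_means(points, k, iterations=10):
    """Lloyd-style K-means on tuples of floats; returns (centroids, labels)."""
    # Simplifying assumption: seed the centroids with the first k points.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(mean(axis) for axis in zip(*members))
    return centroids, labels


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def deviations(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # all values identical, so nothing deviates
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Two well-separated groups, interleaved so the first two points seed both.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.9, 8.1)]
centroids, labels = k_means(points, k=2)
print(labels)                                       # [0, 1, 0, 1, 0, 1]
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))         # (2.0, 1.0)
print(deviations([10, 11, 9, 10, 12, 10, 11, 50]))  # [50]
```

Note that K-means and the z-score check are descriptive (they summarize the data that is given), while the fitted line can be used predictively by plugging a new x into slope * x + intercept.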
To be supplemented later ...
Data Mining (introductory knowledge)