This research report is by Chen SHUWI, a software data expert, and distills a best practice developed over one year. Today we share it with you: "Data Mining and Operations Analysis". Let's explore it together!

Chen is with High-priority Cloud Software, a full-stack service platform that spans monitoring, application experience, and automated continuous delivery.

**Data mining is the extraction, or "mining", of knowledge from large amounts of data.**

In the broader sense, data mining is the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses, or other information repositories.

Data mining draws on: 1) probability and mathematical statistics, 2) database technology, 3) artificial intelligence, and 4) machine learning.

The knowledge discovery process typically runs through the following steps:

1. Data cleaning: remove noise and inconsistent data

2. Data integration: combine multiple data sources

3. Data selection: retrieve the data relevant to the analysis task from the database

4. Data transformation: transform or consolidate the data into forms suitable for mining

5. Data mining: the essential step, in which intelligent methods are applied to extract data patterns

6. Pattern evaluation: identify the truly interesting patterns that represent knowledge, based on interestingness measures

7. Knowledge presentation: use visualization and knowledge representation techniques to present the mined knowledge to users

**Process diagram of data mining**

**Popular data mining software toolkits**

**Microsoft Office Excel:** the most commonly used tool for data analysis and mining.

**SPSS family of tools:** including SPSS, SAS, and SPSS Clementine.

**MATLAB:** short for Matrix Laboratory; a variety of MATLAB toolboxes are also available.

Introduction to Association Rules

**Market basket analysis:** the classic "beer and diapers" problem. Association rule mining first finds frequent itemsets, that is, sets of items such as {A, B} that meet a minimum support threshold, and then generates strong association rules such as A ⇒ B that satisfy both the minimum support and the minimum confidence thresholds.

The **Apriori algorithm** is an influential association rule mining algorithm. It searches level by level, using repeated join and prune steps, to find all frequent itemsets up to the largest ones. It relies on the Apriori property: all nonempty subsets of a frequent itemset must also be frequent.
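The level-wise join-and-prune search can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation behind the report's Apriori example: the basket data are hypothetical and `min_support` is an absolute transaction count.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every frequent itemset.

    min_support is an absolute count of transactions (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]          # join step
                if len(union) != k:
                    continue
                # Prune step: every (k-1)-subset of a candidate must be frequent
                if all(frozenset(sub) in frequent for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Scan the database once to count the surviving candidates
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical shopping baskets
transactions = [
    {"beer", "diapers", "milk"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "diapers", "bread"},
    {"milk", "bread"},
]
frequent_itemsets = apriori(transactions, min_support=2)
```

With `min_support=2`, the candidate {beer, diapers, milk} is pruned without counting it, because its subset {beer, milk} is not frequent; this is the Apriori property at work.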

**FP-growth (frequent pattern growth) algorithm:** a method that mines frequent itemsets without candidate generation. It compresses the original transaction database into a highly compact data structure, the FP-tree, and then grows frequent pattern fragments from it. By avoiding costly candidate generation, it achieves better efficiency.

**Lift:** a correlation measure used as an interestingness measure, since not all strong association rules are interesting. Lift identifies items that are statistically correlated, from which correlation rules can be mined.
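Support, confidence, and lift for a single rule A ⇒ B can be computed directly from their definitions. The sketch below uses a hypothetical basket dataset; a rule counts as strong when its support and confidence clear the chosen thresholds, and lift then shows whether A and B are positively correlated (lift > 1), independent (lift = 1), or negatively correlated (lift < 1).

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, a, b):
    """conf(A => B) = support(A and B together) / support(A)."""
    return support(transactions, set(a) | set(b)) / support(transactions, a)


def lift(transactions, a, b):
    """lift(A => B) = conf(A => B) / support(B)."""
    return confidence(transactions, a, b) / support(transactions, b)


# Hypothetical shopping baskets
transactions = [
    {"beer", "diapers", "milk"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "diapers", "bread"},
    {"milk", "bread"},
]

rule_lift = lift(transactions, {"beer"}, {"diapers"})  # 1.0 / 0.8 = 1.25
weak_lift = lift(transactions, {"milk"}, {"beer"})     # (0.2 / 0.6) / 0.6, about 0.56
```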

**Apriori Algorithm Example**

**Finding the corresponding strong association rules**

**Application of association rules in operation and maintenance**

**Alarm correlation mining**

Mine frequent alarm itemsets, such as {alarm A, alarm B}, to analyze how alarms are linked, enabling predictive management, handling, and optimization of alarms.

**User behavior correlation analysis**

Collecting and analyzing user behavior from log information helps adjust and optimize where features are placed and improves the user experience.

**Server request correlation analysis**

Analyzing how user behavior and the server requests it generates are linked likewise helps adjust and optimize where features are placed and improves the user experience.

**Crash and error correlation analysis**

Mine the causes of crashes and errors, that is, the situations that frequently lead to them, so that crashes and errors are easier to handle and improvements can be proposed.

Application of classification in operation and maintenance

**Classification-Supervised learning**

**Decision tree:** CLS (the most basic), ID3 (information gain), C4.5 (information gain ratio), and CART (binary decision trees) are greedy algorithms for decision tree induction. Each uses an attribute selection measure (information gain, gain ratio, or, in CART, the Gini index) to choose the test attribute for each non-leaf node in the tree. Pruning algorithms try to improve accuracy by cutting off branches that reflect noise in the data.

**Random forest (classification and regression):** a classifier made up of many decision trees, whose output class is the class most often output by the individual trees. Random forest regression is covered in more detail in the regression analysis section.
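Only the aggregation step of a random forest is shown here: each tree votes and the most common class wins (for regression, the tree outputs would be averaged instead). The three "trees" are hypothetical one-split stumps on made-up O&M metrics; a real forest grows each tree on a bootstrap sample using random feature subsets.

```python
from collections import Counter


def forest_predict(trees, x):
    """Majority vote over the class predicted by each tree."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical stumps standing in for trained decision trees
trees = [
    lambda x: "alert" if x["cpu"] > 90 else "ok",
    lambda x: "alert" if x["mem"] > 85 else "ok",
    lambda x: "alert" if x["errors"] > 10 else "ok",
]

busy_host = {"cpu": 95, "mem": 70, "errors": 20}   # votes: alert, ok, alert
quiet_host = {"cpu": 50, "mem": 90, "errors": 1}   # votes: ok, alert, ok
```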

**Neural network:** a set of connected input/output units in which each connection has an associated weight. A multilayer feed-forward neural network consists of an input layer, one or more hidden layers, and an output layer.

**Support vector machine (SVM):** a classification algorithm for both linear and nonlinear data. It uses a nonlinear mapping to transform the original training data into a higher dimension, where it searches for the separating hyperplane using essential training tuples called support vectors.

**Associative classification:** association mining techniques search for frequently occurring patterns in large databases; these patterns can generate rules that are then used for classification.

**Bayesian classification:** based on Bayes' theorem. Naive Bayesian classification assumes class-conditional independence, that is, attribute values are independent of one another given the class. Both naive Bayesian classifiers and Bayesian belief networks are based on the posterior probability given by Bayes' theorem. Bayesian belief networks allow class-conditional independencies to be defined between subsets of variables.
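A minimal naive Bayesian classifier for categorical attributes, assuming class-conditional independence. Laplace smoothing is an addition of this sketch, so that an attribute value never seen with a class does not force that class's probability to zero. The alarm attributes and labels are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute, class) -> Counter of values
    domains = defaultdict(set)           # attribute -> distinct values seen
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            value_counts[(attr, label)][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains


def predict_naive_bayes(model, row, alpha=1.0):
    """Return the class maximising P(C) * prod_k P(x_k | C)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior P(C)
        for attr, value in row.items():
            seen = value_counts[(attr, cls)][value]
            # Laplace-smoothed estimate of P(value | C)
            score *= (seen + alpha) / (n_cls + alpha * len(domains[attr]))
        if score > best_score:
            best_class, best_score = cls, score
    return best_class


# Hypothetical training data: was the alarm escalated?
rows = [
    {"severity": "high", "source": "db"},
    {"severity": "high", "source": "web"},
    {"severity": "low", "source": "web"},
    {"severity": "low", "source": "db"},
    {"severity": "high", "source": "db"},
]
labels = ["yes", "yes", "no", "no", "yes"]
model = train_naive_bayes(rows, labels)
```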

**k-nearest-neighbor classification:** a distance-based classification algorithm and a lazy learning method.

**Decision tree example**

1. A decision tree (after pruning) for whether an operator handles alarms in a timely manner.

2. The influence of each dimension on the final decision (its information gain ratio) is computed, and the tree branches on dimensions from the highest gain ratio to the lowest.

3. C4.5 ranks first among the top ten data mining algorithms (it is implemented as J48 in Weka).
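The attribute ranking in step 2 can be sketched with the ID3 and C4.5 measures. The attributes `shift` and `severity` and the labels are hypothetical stand-ins, not the report's dataset; the code computes only the split criteria, not a full tree with pruning.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Info(D) = -sum p_i * log2(p_i) over the distribution of `labels`."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def partition(rows, labels, attr):
    """Group class labels by the value each row takes for `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return groups


def info_gain(rows, labels, attr):
    """Gain(A) = Info(D) - Info_A(D), the ID3 criterion."""
    n = len(labels)
    expected = sum(len(g) / n * entropy(g) for g in partition(rows, labels, attr).values())
    return entropy(labels) - expected


def gain_ratio(rows, labels, attr):
    """GainRatio(A) = Gain(A) / SplitInfo_A(D), the C4.5 criterion."""
    split_info = entropy([row[attr] for row in rows])  # entropy of the attribute's own values
    return info_gain(rows, labels, attr) / split_info if split_info > 0 else 0.0


# Hypothetical alarm-handling records
rows = [
    {"shift": "day", "severity": "high"},
    {"shift": "day", "severity": "low"},
    {"shift": "night", "severity": "high"},
    {"shift": "night", "severity": "low"},
]
labels = ["timely", "late", "timely", "late"]

# Rank dimensions from highest to lowest gain ratio, as in step 2
ranked = sorted(["shift", "severity"], key=lambda a: gain_ratio(rows, labels, a), reverse=True)
```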


**BP neural network example**

BP network: backpropagation is a neural network learning algorithm for classification that uses gradient descent.
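A minimal backpropagation sketch in plain Python: one hidden layer, sigmoid units, squared-error loss, and per-sample gradient descent. The layer sizes, learning rate, and toy logical-OR data are assumptions chosen so the example trains quickly; it is not the network from the report's example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyBPNet:
    """One hidden layer, sigmoid activations, trained by backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        # Output error term for loss 0.5 * (target - y)^2 with a sigmoid output unit
        delta_out = (y - target) * y * (1 - y)
        # Hidden error terms, propagated back through the (not yet updated) output weights
        delta_h = [delta_out * w * hi * (1 - hi) for w, hi in zip(self.w2, h)]
        # Gradient descent updates
        self.w2 = [w - lr * delta_out * hi for w, hi in zip(self.w2, h)]
        self.b2 -= lr * delta_out
        for j, dj in enumerate(delta_h):
            self.w1[j] = [w - lr * dj * xi for w, xi in zip(self.w1[j], x)]
            self.b1[j] -= lr * dj


def mean_loss(net, data):
    return sum(0.5 * (t - net.forward(x)[1]) ** 2 for x, t in data) / len(data)


data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]  # logical OR
net = TinyBPNet(n_in=2, n_hidden=3, seed=42)
loss_before = mean_loss(net, data)
for epoch in range(2000):
    for x, t in data:
        net.train_step(x, t, lr=0.5)
loss_after = mean_loss(net, data)
```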


**Support vector machine example**

The SVM solves the problem by searching for the **maximum marginal hyperplane** (MMH).
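A linear soft-margin SVM can be approximated by subgradient descent on the hinge loss; on well-separated data the result approaches the maximum marginal hyperplane. This is a sketch under assumptions (linear kernel, fixed learning rate, toy 2-D data), not a production solver; SVM libraries typically solve the dual problem instead.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=1000):
    """Minimise 0.5*||w||^2 + C * sum(max(0, 1 - y_i * (w.x_i + b))).

    Full-batch subgradient descent; labels must be +1 or -1.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # inside the margin or misclassified
                grad_w = [g - C * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical, linearly separable 2-D data
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -1), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
margin_width = 2 / dot(w, w) ** 0.5  # width of the margin, 2 / ||w||
```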


**k-nearest-neighbor classification example**

5-nearest-neighbor classification

All training tuples are stored in the pattern space until a tuple that needs to be classified arrives.
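A 5-nearest-neighbor sketch: the training tuples are simply stored, and all the work happens when a query arrives (lazy learning). The (CPU %, latency ms) samples and labels are hypothetical; in practice the features should be scaled first, because here latency dominates the Euclidean distance.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_classify(train, query, k=5):
    """train: list of (feature_vector, label). Majority label of the k nearest tuples."""
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


# Hypothetical stored training tuples: (cpu_percent, latency_ms) -> state
train = [
    ((20, 100), "normal"), ((25, 120), "normal"), ((30, 110), "normal"), ((22, 90), "normal"),
    ((80, 400), "degraded"), ((85, 450), "degraded"), ((90, 380), "degraded"), ((75, 420), "degraded"),
]
```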

**Application of classification in operation and maintenance**

**Because classification is supervised learning, some labeled decision data must already exist to train the classification model.**

Application of cluster analysis in operation and maintenance

**Clustering-unsupervised learning**

Samples are unlabeled, and they are grouped into K classes according to distance.

**Cluster Analysis** is an active field of research.

Data types: the data matrix (n objects described by p variables).

Dissimilarity matrix: a dissimilarity (distance) d must satisfy

1) $d(i,j) \ge 0$;

2) $d(i,i) = 0$;

3) $d(i,j) = d(j,i)$;

4) $d(i,j) \le d(i,k) + d(k,j)$.

The most commonly used distances are the Euclidean distance and the Manhattan distance.

Distances can also be obtained by converting a similarity matrix.
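The Euclidean and Manhattan distances, plus a check of the four conditions above on a small dissimilarity matrix. The points are arbitrary, and `similarity_to_distance` (d = 1 - s for similarities in [0, 1]) is just one common conversion, assumed here for illustration.

```python
from math import sqrt


def euclidean(p, q):
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def dissimilarity_matrix(objects, metric):
    """n x n matrix holding d(i, j) for every pair of objects."""
    return [[metric(p, q) for q in objects] for p in objects]


def similarity_to_distance(s):
    """One common conversion for a similarity s in [0, 1] (an assumption here)."""
    return 1.0 - s


points = [(0, 0), (3, 4), (6, 8), (1, 7)]
D = dissimilarity_matrix(points, euclidean)
n = len(points)
```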

**Methods of Clustering**

Many clustering algorithms have been developed for cluster analysis. They can be divided into **partitioning methods**, **hierarchical methods**, density-based methods, and grid-based methods.

A **partitioning method** first creates an initial set of k partitions, where the parameter k is the number of partitions to construct. It then uses an iterative relocation technique that tries to improve the quality of the partitioning by moving objects from one cluster to another. Representative partitioning methods include k-means clustering and the EM (expectation-maximization) algorithm.

A **hierarchical method** creates a hierarchical decomposition of the given set of data objects. Depending on how the decomposition is formed, such methods are either bottom-up or top-down. Representative hierarchical methods include the system clustering method and fuzzy clustering.

**K-means Clustering**

The k-means algorithm is a clustering method that minimizes the sum of squared distances from each point to its class center.

[1] Suppose the data are to be clustered into K classes. Choose K initial class centers manually.

[2] In the first iteration, compute the distance from each sample point to each of the K class centers and assign the point to the nearest class.

[3] Recompute each class center as the centroid of its members, then recompute the distance from each sample point to the K class centers and reassign the points.

[4] Repeat until the class centers change very little or the maximum number of iterations is reached.
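Steps [1] to [4] map onto the following sketch. The initial centers are passed in explicitly, mirroring the manual choice in step [1]; the sample points are hypothetical.

```python
from math import dist


def kmeans(points, centers, max_iter=100, tol=1e-6):
    """Lloyd-style k-means starting from the given centers; returns (centers, labels)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Step [2]: assign each point to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[idx].append(p)
        # Step [3]: recompute centroids (keep the old center if a cluster is empty)
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                new_centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centers.append(c)
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        # Step [4]: stop when the centers barely move
        if shift < tol:
            break
    labels = [min(range(len(centers)), key=lambda i: dist(p, centers[i])) for p in points]
    return centers, labels


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centers, labels = kmeans(points, centers=[(1, 1), (8, 8)])  # step [1]: K = 2 chosen centers
```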


**System clustering method**

The system clustering method is also called the **pedigree clustering** method or the **hierarchical clustering method**.
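A bottom-up (agglomerative) sketch that records each merge and the distance at which it happens, which is exactly what a pedigree chart (dendrogram) draws. Single linkage (distance between the nearest members) is the default and any other value switches to complete linkage; both choices, and the 1-D points, are assumptions for illustration.

```python
from math import dist


def agglomerative(points, linkage="single"):
    """Merge the two closest clusters until one remains; return [(cluster_a, cluster_b, distance)]."""
    clusters = [[i] for i in range(len(points))]
    merges = []

    def cluster_distance(a, b):
        ds = [dist(points[i], points[j]) for i in a for j in b]
        return min(ds) if linkage == "single" else max(ds)  # single vs complete linkage

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges


points = [(0,), (1,), (5,), (6,), (20,)]
merges = agglomerative(points)
merge_heights = [d for _, _, d in merges]  # the heights of the dendrogram's joins
```

Cutting the dendrogram at a chosen height (for example, between 4 and 14 here) yields the clusters.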

**Pedigree chart (dendrogram)**

**Application of clustering in operation and maintenance**

Application of outlier detection in operation and maintenance

**Outlier detection**

Outlier analysis is also called anomaly detection.

An outlier is a data object that differs significantly from the other data objects.
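One simple statistical way to flag such objects is Tukey's fences on the interquartile range. This is a swapped-in example method, not necessarily one the report uses, and the response-time series is hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences); returns (index, value) pairs."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]


response_ms = [120, 115, 130, 125, 118, 122, 900, 119, 121]  # hypothetical series
flagged = iqr_outliers(response_ms)
```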

**Outlier types**

**Outlier detection methods**

**Application of outlier detection in operation and maintenance**

**Application of statistical analysis in operation and maintenance**

**Statistical analysis methods**

▲ Principal component analysis: a statistical method that converts many indicators into a few composite indicators (dimensionality reduction).

Applications: (1) Explanation, for example in psychology and sociology. (2) Comprehensive evaluation: enterprises are measured by many indicators, and these indicators differ greatly from one enterprise to another; principal component analysis lets them be evaluated with very few composite indicators. (3) Classification: with two principal components, objects can be plotted and classified on a chart; objects that lie close together belong to the same class, while objects far apart differ greatly.

Steps: 1) standardize the data matrix; 2) compute the correlation coefficient matrix R; 3) compute the eigenvalues (characteristic roots) of R and sort them; 4) determine the principal components to keep; 5) compute the unit eigenvectors; 6) write out the principal components.
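The six steps can be sketched with NumPy. The metric matrix (two strongly correlated columns plus one weakly related column) is hypothetical; the columns of `W` are the unit eigenvectors, and `scores` holds the principal component values of each observation.

```python
import numpy as np


def pca(X, n_components):
    X = np.asarray(X, dtype=float)
    # 1) standardise each column (z-scores, sample standard deviation)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2) correlation coefficient matrix R
    R = np.corrcoef(X, rowvar=False)
    # 3) eigenvalues (characteristic roots); eigh returns them in ascending order, so re-sort
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4) contribution rate of each component, used to decide how many to keep
    explained = eigvals / eigvals.sum()
    # 5) unit eigenvectors of the retained components
    W = eigvecs[:, :n_components]
    # 6) principal components: F = Z W
    scores = Z @ W
    return eigvals, explained, W, scores


# Hypothetical metrics: CPU %, load average, disk queue length
X = [[10, 1.0, 5], [20, 2.1, 3], [30, 2.9, 6], [40, 4.2, 4], [50, 5.0, 5], [60, 6.1, 2]]
eigvals, explained, W, scores = pca(X, n_components=2)
```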

▲ Factor analysis: a generalization of principal component analysis.

Objective: in theory, factor analysis studies the internal relationships among the original variables, simplifies their covariance structure, and analyzes the complex relationships among them. In application, it searches for the common factors behind many variables, that is, it explores how several directly measured, correlated indicators are governed by a few relatively independent factors that cannot be measured directly.

Basic idea: variables are grouped by the strength of their correlations, so that variables in the same group are highly correlated while variables in different groups are weakly correlated.

Steps: 1) standardize the data matrix; 2) compute the correlation coefficient matrix R (or the covariance matrix); 3) find the eigenvalues and eigenvectors of R; 4) compute the factor loadings A; 5) rotate the factors (varimax rotation); 6) compute the factor scores.

Difference between the two: principal component analysis focuses on transforming the original variables into a few new composite indicators, whereas factor analysis focuses on explaining the common variation among the variables.

**▲ Canonical correlation analysis:** the study of the correlation between two groups of random variables.

**Applications:** (1) Explaining relationships: let y be weight and X be height and age; do height and age affect weight? (2) Prediction and control: building on (1), the X variables are used to predict or control the Y variables. For example, in the West, when the stock market performs poorly, banks cut interest rates to boost it. In general, control should be based on good predictions. (3) Finding structural linkages: explaining the internal structural mechanism through linear functions.

▲ **Discriminant analysis:** given the observed values of the indicators x for a new sample (or individual), discriminant analysis determines which population the sample belongs to.

**1. Distance discrimination:** define a distance from the sample's p-dimensional indicator vector x to each population, and assign the sample to the population it is closest to.

**2. Bayes discrimination:** for a given sample x, compute the value of each population's probability density function at x, and assign x to the population with the largest prior-weighted density, that is, the largest posterior probability.

**3. Fisher discrimination:** the basic idea is projection. Data from k classes in m-dimensional space are projected (transformed) onto some direction so that, after the transformation, points of the same class lie as close together as possible and points of different classes lie as far apart as possible, which achieves classification.
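A two-class Fisher discriminant sketch with NumPy: the projection direction is w = S_w^{-1}(m_1 - m_2), where S_w is the within-class scatter matrix, and a new sample is assigned by comparing its projection with the midpoint of the two projected class means. The midpoint threshold and the 2-D data are assumptions for illustration.

```python
import numpy as np


def fisher_lda(X1, X2):
    """Return the projection direction w and a midpoint threshold for two classes."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter matrix S_w
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(Sw, m1 - m2)   # w = S_w^{-1} (m1 - m2)
    threshold = w @ (m1 + m2) / 2      # midpoint of the projected class means
    return w, threshold


def classify(w, threshold, x):
    """Class 1 projects above the threshold, class 2 below it."""
    return 1 if w @ np.asarray(x, float) > threshold else 2


X1 = [[4, 2], [2, 4], [2, 3], [3, 6], [4, 4]]
X2 = [[9, 10], [6, 8], [9, 5], [8, 7], [10, 8]]
w, threshold = fisher_lda(X1, X2)
```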

**Regression analysis**

**Regression analysis** (prediction): the most basic and widely used method for studying correlation relationships. Based on a large number of observations, it uses mathematical statistics to establish a regression function between the dependent variable and the independent variables.

1) Linear regression analysis. Method: least squares.

2) Nonlinear regression analysis: parabolic, hyperbolic, power function, exponential, logarithmic, logistic curve, and polynomial models, among others. Method: linearize the nonlinear model, then apply least squares.

3) Binary logistic regression analysis: the dependent variable takes the values y = 0 or 1, so it can be used for classification decisions.

4) Random forest regression: the independent variables may include both categorical variables (factors) and continuous numeric variables; the model measures the importance of each independent variable for the dependent variable and predicts future values.
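Items 1) and 2) can be sketched with NumPy's least-squares solver: an ordinary linear fit, and an exponential model y = a·e^(kx) linearized by taking logarithms (ln y = ln a + kx). The toy data are generated from known coefficients so that the fits can be checked.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y = b0 + b1*x1 + ... + bp*xp."""
    X = np.asarray(X, float)
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, float), rcond=None)
    return coef


def fit_exponential(x, y):
    """y = a * exp(k * x): linearise as ln y = ln a + k * x, then use least squares."""
    b0, k = fit_linear(np.asarray(x, float).reshape(-1, 1), np.log(y))
    return np.exp(b0), k


x = np.arange(4.0)
linear_coef = fit_linear(x, 1.0 + 2.0 * x)              # expect about [1, 2]
a, k = fit_exponential(x, 3.0 * np.exp(0.5 * x))        # expect about a = 3, k = 0.5
```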

**Example of random forest regression in operation and maintenance**

Factors that influence the number of new devices, and forecasting it: time, day of the week, new releases, advertising, the previous day's number of active devices, the current day's number of active devices, and so on.

The importance of each independent variable for the dependent variable can be judged by two indicators:

1) %IncMSE: importance measured by the increase in mean squared error when the variable's values are randomly permuted. The larger this value, the greater the variable's influence on the dependent variable; 0 indicates that the variable is unrelated to the dependent variable; a negative value indicates that the variable may have a misleading effect on the dependent variable.

2) IncNodePurity: importance measured by the increase in node purity, computed as the total decrease in the residual sum of squares from splits on the variable, so it is non-negative. The larger this value, the greater the variable's effect on the dependent variable; 0 indicates that the variable is unrelated to the dependent variable.
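The idea behind %IncMSE can be sketched as permutation importance: shuffle one variable and measure how much the mean squared error grows. To keep the sketch self-contained, an ordinary least-squares model stands in for the random forest and the data are synthetic; R's `randomForest` computes its measure on out-of-bag samples and normalizes it differently, so the numbers are not directly comparable.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each column of X is randomly permuted."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, float), np.asarray(y, float)
    base_mse = np.mean((predict(X) - y) ** 2)
    importances = []
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link between column j and y
            increases.append(np.mean((predict(X_perm) - y) ** 2) - base_mse)
        importances.append(float(np.mean(increases)))
    return importances


# Synthetic data: y depends strongly on x0, weakly on x1, and not at all on x2
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Stand-in model: ordinary least squares instead of a random forest
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)


def predict(Z):
    return np.column_stack([np.ones(len(Z)), Z]) @ coef


importances = permutation_importance(predict, X, y)
```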

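Example dataset: the number of new devices and the factors that may affect it

| New_date | Week | New_ver | advertising | yestoday_act | today_act | new_device |
|---|---|---|---|---|---|---|
| 636,147 | Sunday | 0 | 0 | 884 | 459 | 79 |
| 636,148 | Monday | 0 | 0 | 459 | 701 | 45 |
| 636,149 | Tuesday | 0 | 2.8 | 701 | 185 | 13 |
| 636,150 | Wednesday | 0 | 3.6 | 185 | 112 | 12 |
| 636,151 | Thursday | 6 | 3.2 | 112 | 827 | 87 |
| 636,152 | Friday | 8 | 2.8 | 827 | 892 | 32 |
| 636,153 | Saturday | 9 | 2 | 892 | 716 | 89 |
| 636,154 | Sunday | 8 | 0.8 | 716 | 204 | 98 |
| 636,155 | Monday | 7 | 0 | 204 | 157 | 39 |
| 636,156 | Tuesday | 5 | 0 | 157 | 484 | 72 |
| 636,157 | Wednesday | 3 | 0 | 484 | 595 | 42 |
| 636,158 | Thursday | 1 | 0 | 595 | 592 | 70 |
| 636,159 | Friday | 0 | 0 | 592 | 93 | 42 |
| 636,160 | Saturday | 2 | 0 | 93 | 451 | 89 |
| 636,161 | Sunday | 5 | 0 | 451 | 582 | 54 |
| 636,162 | Monday | 4 | 0 | 582 | 140 | 97 |
| 636,163 | Tuesday | 3 | 0 | 140 | 741 | 61 |
| 636,164 | Wednesday | 1 | 0 | 741 | 809 | 30 |
| 636,165 | Thursday | 0 | 0 | 809 | 440 | 91 |
| 636,166 | Friday | 0 | 0 | 440 | 108 | 65 |
| 636,167 | Saturday | 0 | 0 | 108 | 304 | 75 |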

**Application of statistical analysis in operation and maintenance**

Application of other methods of data mining in operation and maintenance

**Other methods of data mining**

**▲ Genetic algorithm:** a computational model of the biological evolutionary process that simulates the natural selection and genetic mechanisms of Darwinian evolution; it searches for an optimal solution by simulating natural evolution.

**I. Encoding:** a genetic algorithm cannot work directly on the parameters of the problem space. They must first be converted into chromosomes, or individuals, made up of genes arranged in a certain structure in the genetic space. This conversion is called encoding, or the (problem) representation. **(C)**

**II. Fitness function:** in evolution, fitness indicates an individual's ability to adapt to its environment and to reproduce offspring. The fitness function of a genetic algorithm, also called the evaluation function, judges how good or bad each individual in the population is, and it is derived from the objective function of the problem. **(E)**

**Conditions:** 1) single-valued, continuous, non-negative, and maximized; 2) reasonable and consistent; 3) low computational cost; 4) highly general.

**III. Basic operating process**, as follows. A simple genetic algorithm is defined as **SGA = (C, E, P(0), N, F, G, Y, T)**:

1. **Initialization:** set the generation counter t = 0, set the maximum number of generations T, and randomly generate N individuals as the initial population P(0). **(P(0), N)**

2. **Individual evaluation:** compute the fitness of each individual in population P(t).

3. **Selection (F):** apply the selection operator to the population. Its purpose is to pass well-adapted individuals directly to the next generation, or to pair them to produce new individuals that are then passed on. Selection is based on the fitness of the individuals in the population.

4. **Crossover (G):** apply the crossover operator to the population. Crossover produces new individuals by exchanging and recombining parts of the structure of two parents. The crossover operator plays the central role in a genetic algorithm.

5. **Mutation (Y):** apply the mutation operator to the population, changing the gene values at some loci of individuals in the population.

After selection, crossover, and mutation, population P(t) yields the next generation, P(t+1).

6. **Termination check (T):** if t = T, output the individual with the highest fitness found during evolution as the optimal solution and stop.
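The loop above can be sketched in plain Python. The objective f(x) = x² on 5-bit integers, roulette-wheel selection, single-point crossover, and all rates are assumptions of this sketch. The elitism step is an addition that is not part of the simple GA; it is included so that the best fitness never decreases from one generation to the next.

```python
import random


def decode(bits):
    """Encoding C: a bit string represents an integer (0..31 for 5 bits)."""
    return int("".join(str(b) for b in bits), 2)


def fitness(bits):
    """Evaluation E: hypothetical objective f(x) = x^2, non-negative, to be maximised."""
    x = decode(bits)
    return x * x


def genetic_algorithm(n_bits=5, pop_size=20, max_gen=50, p_cross=0.8, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    # 1. Initialisation: random population P(0) of N = pop_size individuals
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for t in range(max_gen):  # 6. Termination T: stop after max_gen generations
        # 2. Evaluation: fitness of every individual in P(t)
        scores = [fitness(ind) for ind in pop]
        best = pop[scores.index(max(scores))]
        history.append(max(scores))
        # 3. Selection F: roulette wheel, probability proportional to fitness
        weights = scores if sum(scores) > 0 else None
        parents = rng.choices(pop, weights=weights, k=pop_size)
        # 4. Crossover G: single-point crossover on consecutive pairs
        children = []
        for i in range(0, pop_size, 2):
            a, b = parents[i][:], parents[(i + 1) % pop_size][:]
            if rng.random() < p_cross:
                point = rng.randint(1, n_bits - 1)
                a, b = a[:point] + b[point:], b[:point] + a[point:]
            children.extend([a, b])
        # 5. Mutation Y: flip each gene with probability p_mut
        for child in children:
            for k in range(n_bits):
                if rng.random() < p_mut:
                    child[k] = 1 - child[k]
        # Elitism (not in the simple GA): carry the best individual over unchanged
        children[0] = best[:]
        pop = children[:pop_size]
    scores = [fitness(ind) for ind in pop]
    history.append(max(scores))
    best = pop[scores.index(max(scores))]
    return decode(best), history


best_x, history = genetic_algorithm()
```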

**▲ Rough set theory:** has become a relatively new research hotspot in artificial intelligence and has been widely applied in fields such as machine learning, knowledge acquisition, decision analysis, and process control.

**▲ Fuzzy set method:** a fuzzy set is the totality of objects that have a property described by a fuzzy concept. Because the concept itself has no sharp boundary, whether an object belongs to the set is not a clear yes or no. Formally, a fuzzy set A on X is a mapping $A: X \to [0,1]$, $x \mapsto A(x)$, where A(x) is called the membership function of A, and A(x) is the degree of membership of x in A.
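Membership functions are often simple piecewise-linear shapes. The sketch below defines hypothetical fuzzy sets "high CPU usage" and "high latency" with ramp-shaped membership functions (the breakpoints are assumptions) and combines them with the standard min operator for fuzzy intersection (AND); max would give the union (OR).

```python
def ramp_membership(x, low, high):
    """A(x) in [0, 1]: 0 at or below `low`, 1 at or above `high`, linear in between."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def high_cpu(cpu_percent):
    # Hypothetical fuzzy set "high CPU usage" on X = [0, 100] percent
    return ramp_membership(cpu_percent, low=60.0, high=90.0)


def high_latency(latency_ms):
    # Hypothetical fuzzy set "high latency" in milliseconds
    return ramp_membership(latency_ms, low=200.0, high=800.0)


# Degree to which a host is "high CPU AND high latency"
overloaded = min(high_cpu(75), high_latency(500))
```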

**Stream data mining:** a data stream is an ordered, massive, fast, continuously arriving sequence of data; in general, it can be regarded as a dynamic data collection that grows without bound over time.

**Graph mining:** mining frequent graph patterns from large graph datasets for characterization, discrimination, classification, and cluster analysis. Applications include chemical informatics, bioinformatics, computer vision, video indexing, text retrieval, and web analytics.

**Mining complex data types**, including object data, spatial data, multimedia data, time-series data, text data, and web data. Spatial data mining discovers meaningful patterns in the large volumes of data held in geospatial databases. Multimedia data mining discovers meaningful patterns in multimedia databases. Text data mining is a computer processing technique that extracts valuable information and knowledge from text. Web mining discovers hidden, previously unknown, potentially useful, non-trivial patterns from large collections of web data, including static web pages, web databases, web structure, and usage records.

**Application in operation and maintenance**


Best Practices for cloud software data experts: Data Mining and operations analysis