Data Mining: Concepts and Techniques

Basic Information
Original Title: Data Mining: Concepts and Techniques, Third Edition
Authors: Jiawei Han (University of Illinois at Urbana-Champaign, USA), Micheline Kamber (Simon Fraser University, Canada), Jian Pei (Simon Fraser University, Canada)
Translators: Fan Ming, Meng Xiaofeng
Series name: Computer Science Series
Publisher: China Machine Press
ISBN: 9787111391401
Listing date:
Published: August 2012
Format: 16mo
Page number: 1
Edition and printing: 1-1
Category: Computer > Database storage and management

 
 

Introduction
Data Mining: Concepts and Techniques (3rd edition) comprehensively describes the concepts, methods, techniques, and latest research progress of data mining. This edition thoroughly revises the first two, strengthening and reorganizing the technical content. It focuses on data preprocessing, frequent pattern mining, classification, and clustering. It also covers OLAP and outlier detection in depth, and discusses mining networks, complex data types, and important application areas.
Data Mining: Concepts and Techniques (3rd edition) is essential reading for teachers, researchers, developers, and users in the field of data mining and knowledge discovery, and an excellent textbook for courses on data analysis, data mining, and knowledge discovery. It can serve as an introductory data mining textbook for senior undergraduates or first-year graduate students.

Contents
Data Mining: Concepts and Techniques (original 3rd edition)
Publisher's Note
Preface to the Chinese Edition
Translators' Preface
About the Translators
Foreword to the Third Edition
Foreword to the Second Edition
Preface
Acknowledgments
About the Authors
Chapter 1 Introduction
1.1 Why Data Mining?
1.1.1 Moving toward the Information Age
1.1.2 Data Mining as the Evolution of Information Technology
1.2 What Is Data Mining?
1.3 What Kinds of Data Can Be Mined?
1.3.1 Database Data
1.3.2 Data Warehouses
1.3.3 Transactional Data
1.3.4 Other Kinds of Data
1.4 What Kinds of Patterns Can Be Mined?
1.4.1 Class/Concept Description: Characterization and Discrimination
1.4.2 Mining Frequent Patterns, Associations, and Correlations
1.4.3 Classification and Regression for Predictive Analysis
1.4.4 Cluster Analysis
1.4.5 Outlier Analysis
1.4.6 Are All Patterns Interesting?
1.5 Which Technologies Are Used?
1.5.1 Statistics
1.5.2 Machine Learning
1.5.3 Database Systems and Data Warehouses
1.5.4 Information Retrieval
1.6 Which Kinds of Applications Are Targeted?
1.6.1 Business Intelligence
1.6.2 Web Search Engines
1.7 Major Issues in Data Mining
1.7.1 Mining Methodology
1.7.2 User Interaction
1.7.3 Efficiency and Scalability
1.7.4 Diversity of Database Types
1.7.5 Data Mining and Society
1.8 Summary
1.9 Exercises
1.10 Bibliographic Notes
Chapter 2 Getting to Know Your Data
2.1 Data Objects and Attribute Types
2.1.1 What Is an Attribute?
2.1.2 Nominal Attributes
2.1.3 Binary Attributes
2.1.4 Ordinal Attributes
2.1.5 Numeric Attributes
2.1.6 Discrete versus Continuous Attributes
2.2 Basic Statistical Descriptions of Data
2.2.1 Measuring the Central Tendency: Mean, Median, and Mode
2.2.2 Measuring the Dispersion of Data: Range, Quartiles, Variance, Standard Deviation, and Interquartile Range
2.2.3 Graphic Displays of Basic Statistical Descriptions of Data
2.3 Data Visualization
2.3.1 Pixel-Oriented Visualization Techniques
2.3.2 Geometric Projection Visualization Techniques
2.3.3 Icon-Based Visualization Techniques
2.3.4 Hierarchical Visualization Techniques
2.3.5 Visualizing Complex Data and Relations
2.4 Measuring Data Similarity and Dissimilarity
2.4.1 Data Matrix versus Dissimilarity Matrix
2.4.2 Proximity Measures for Nominal Attributes
2.4.3 Proximity Measures for Binary Attributes
2.4.4 Dissimilarity of Numeric Data: Minkowski Distance
2.4.5 Proximity Measures for Ordinal Attributes
2.4.6 Dissimilarity for Attributes of Mixed Types
2.4.7 Cosine Similarity
2.5 Summary
2.6 Exercises
2.7 Bibliographic Notes
Chapter 3 Data Preprocessing
3.1 Data Preprocessing: An Overview
3.1.1 Data Quality: Why Preprocess the Data?
3.1.2 Major Tasks in Data Preprocessing
3.2 Data Cleaning
3.2.1 Missing Values
3.2.2 Noisy Data
3.2.3 Data Cleaning as a Process
3.3 Data Integration
3.3.1 Entity Identification Problem
3.3.2 Redundancy and Correlation Analysis
3.3.3 Tuple Duplication
3.3.4 Data Value Conflict Detection and Resolution
3.4 Data Reduction
3.4.1 Overview of Data Reduction Strategies
3.4.2 Wavelet Transforms
3.4.3 Principal Components Analysis
3.4.4 Attribute Subset Selection
3.4.5 Regression and Log-Linear Models: Parametric Data Reduction
3.4.6 Histograms
3.4.7 Clustering
3.4.8 Sampling
3.4.9 Data Cube Aggregation
3.5 Data Transformation and Data Discretization
3.5.1 Overview of Data Transformation Strategies
3.5.2 Data Transformation by Normalization
3.5.3 Discretization by Binning
3.5.4 Discretization by Histogram Analysis
3.5.5 Discretization by Cluster, Decision Tree, and Correlation Analyses
3.5.6 Concept Hierarchy Generation for Nominal Data
3.6 Summary
3.7 Exercises
3.8 Bibliographic Notes
Chapter 4 Data Warehousing and Online Analytical Processing
4.1 Data Warehouse: Basic Concepts
4.1.1 What Is a Data Warehouse?
4.1.2 Differences between Operational Database Systems and Data Warehouses
4.1.3 Why Have a Separate Data Warehouse?
4.1.4 Data Warehousing: A Multitiered Architecture
4.1.5 Data Warehouse Models: Enterprise Warehouse, Data Mart, and Virtual Warehouse
4.1.6 Extraction, Transformation, and Loading
4.1.7 Metadata Repository
4.2 Data Warehouse Modeling: Data Cube and OLAP
4.2.1 Data Cube: A Multidimensional Data Model
4.2.2 Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Data Models
4.2.3 Dimensions: The Role of Concept Hierarchies
4.2.4 Measures: Their Categorization and Computation
4.2.5 Typical OLAP Operations
4.2.6 A Starnet Query Model for Querying Multidimensional Databases
4.3 Data Warehouse Design and Usage
4.3.1 A Business Analysis Framework for Data Warehouse Design
4.3.2 Data Warehouse Design Process
4.3.3 Data Warehouse Usage for Information Processing
4.3.4 From OLAP to Multidimensional Data Mining
4.4 Data Warehouse Implementation
4.4.1 Efficient Data Cube Computation: An Overview
4.4.2 Indexing OLAP Data: Bitmap Index and Join Index
4.4.3 Efficient Processing of OLAP Queries
4.4.4 OLAP Server Architectures: ROLAP versus MOLAP versus HOLAP
4.5 Data Generalization by Attribute-Oriented Induction
4.5.1 Attribute-Oriented Induction for Data Characterization
4.5.2 Efficient Implementation of Attribute-Oriented Induction
4.5.3 Attribute-Oriented Induction for Class Comparisons
4.6 Summary
4.7 Exercises
4.8 Bibliographic Notes
Chapter 5 Data Cube Technology
5.1 Data Cube Computation: Basic Concepts
5.1.1 Cube Materialization: Full Cube, Iceberg Cube, Closed Cube, and Cube Shell
5.1.2 General Strategies for Data Cube Computation
5.2 Data Cube Computation Methods
5.2.1 Multiway Array Aggregation for Full Cube Computation
5.2.2 BUC: Computing Iceberg Cubes from the Apex Cuboid Downward
5.2.3 Star-Cubing: Computing Iceberg Cubes Using a Dynamic Star-Tree Structure
5.2.4 Precomputing Shell Fragments for Fast High-Dimensional OLAP
5.3 Processing Advanced Queries by Exploring Cube Technology
5.3.1 Sampling Cubes: OLAP-Based Mining on Sampling Data
5.3.2 Ranking Cubes: Efficient Computation of Top-k Queries
5.4 Multidimensional Data Analysis in Cube Space
5.4.1 Prediction Cubes: Prediction Mining in Cube Space
5.4.2 Multifeature Cubes: Complex Aggregation at Multiple Granularities
5.4.3 Exception-Based, Discovery-Driven Cube Space Exploration
5.5 Summary
5.6 Exercises
5.7 Bibliographic Notes
Chapter 6 Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods
6.1 Basic Concepts
6.1.1 Market Basket Analysis: A Motivating Example
6.1.2 Frequent Itemsets, Closed Itemsets, and Association Rules
6.2 Frequent Itemset Mining Methods
6.2.1 Apriori Algorithm: Finding Frequent Itemsets by Confined Candidate Generation
6.2.2 Generating Association Rules from Frequent Itemsets
6.2.3 Improving the Efficiency of Apriori
6.2.4 A Pattern-Growth Approach for Mining Frequent Itemsets
6.2.5 Mining Frequent Itemsets Using the Vertical Data Format
6.2.6 Mining Closed and Max Patterns
6.3 Which Patterns Are Interesting: Pattern Evaluation Methods
6.3.1 Strong Rules Are Not Necessarily Interesting
6.3.2 From Association Analysis to Correlation Analysis
6.3.3 A Comparison of Pattern Evaluation Measures
6.4 Summary
6.5 Exercises
6.6 Bibliographic Notes
Chapter 7 Advanced Pattern Mining
7.1 Pattern Mining: A Road Map
7.2 Pattern Mining in Multilevel, Multidimensional Space
7.2.1 Mining Multilevel Association Rules
7.2.2 Mining Multidimensional Association Rules
7.2.3 Mining Quantitative Association Rules
7.2.4 Mining Rare Patterns and Negative Patterns
7.3 Constraint-Based Frequent Pattern Mining
7.3.1 Metarule-Guided Mining of Association Rules
7.3.2 Constraint-Based Pattern Generation: Pruning Pattern Space and Pruning Data Space
7.4 Mining High-Dimensional Data and Colossal Patterns
7.5 Mining Compressed or Approximate Patterns
7.5.1 Mining Compressed Patterns by Pattern Clustering
7.5.2 Extracting Redundancy-Aware Top-k Patterns
7.6 Pattern Exploration and Application
7.6.1 Semantic Annotation of Frequent Patterns
7.6.2 Applications of Pattern Mining
7.7 Summary
7.8 Exercises
7.9 Bibliographic Notes
Chapter 8 Classification: Basic Concepts
8.1 Basic Concepts
8.1.1 What Is Classification?
8.1.2 General Approach to Classification
8.2 Decision Tree Induction
8.2.1 Decision Tree Induction
8.2.2 Attribute Selection Measures
8.2.3 Tree Pruning
8.2.4 Scalability and Decision Tree Induction
8.2.5 Visual Mining for Decision Tree Induction
8.3 Bayes Classification Methods
8.3.1 Bayes' Theorem
8.3.2 Naive Bayesian Classification
8.4 Rule-Based Classification
8.4.1 Using IF-THEN Rules for Classification
8.4.2 Rule Extraction from a Decision Tree
8.4.3 Rule Induction Using a Sequential Covering Algorithm
8.5 Model Evaluation and Selection
8.5.1 Metrics for Evaluating Classifier Performance
8.5.2 Holdout Method and Random Subsampling
8.5.3 Cross-Validation
8.5.4 Bootstrap
8.5.5 Model Selection Using Statistical Tests of Significance
8.5.6 Comparing Classifiers Based on Cost-Benefit and ROC Curves
8.6 Techniques to Improve Classification Accuracy
8.6.1 Introducing Ensemble Methods
8.6.2 Bagging
8.6.3 Boosting and AdaBoost
8.6.4 Random Forests
8.6.5 Improving Classification Accuracy of Class-Imbalanced Data
8.7 Summary
8.8 Exercises
8.9 Bibliographic Notes
Chapter 9 Classification: Advanced Methods
9.1 Bayesian Belief Networks
9.1.1 Concepts and Mechanisms
9.1.2 Training Bayesian Belief Networks
9.2 Classification by Backpropagation
9.2.1 A Multilayer Feed-Forward Neural Network
9.2.2 Defining a Network Topology
9.2.3 Backpropagation
9.2.4 Inside the Black Box: Backpropagation and Interpretability
9.3 Support Vector Machines
9.3.1 The Case When the Data Are Linearly Separable
9.3.2 The Case When the Data Are Linearly Inseparable
9.4 Classification Using Frequent Patterns
9.4.1 Associative Classification
9.4.2 Discriminative Frequent Pattern-Based Classification
9.5 Lazy Learners (or Learning from Your Neighbors)
9.5.1 k-Nearest-Neighbor Classifiers
9.5.2 Case-Based Reasoning
9.6 Other Classification Methods
9.6.1 Genetic Algorithms
9.6.2 Rough Set Approach
9.6.3 Fuzzy Set Approaches
9.7 Additional Topics Regarding Classification
9.7.1 Multiclass Classification
9.7.2 Semi-Supervised Classification
9.7.3 Active Learning
9.7.4 Transfer Learning
9.8 Summary
9.9 Exercises
9.10 Bibliographic Notes
Chapter 10 Cluster Analysis: Basic Concepts and Methods
10.1 Cluster Analysis
10.1.1 What Is Cluster Analysis?
10.1.2 Requirements for Cluster Analysis
10.1.3 Overview of Basic Clustering Methods
10.2 Partitioning Methods
10.2.1 k-Means: A Centroid-Based Technique
10.2.2 k-Medoids: A Representative Object-Based Technique
10.3 Hierarchical Methods
10.3.1 Agglomerative versus Divisive Hierarchical Clustering
10.3.2 Distance Measures in Algorithmic Methods
10.3.3 BIRCH: Multiphase Hierarchical Clustering Using Clustering Feature Trees
10.3.4 Chameleon: Multiphase Hierarchical Clustering Using Dynamic Modeling
10.3.5 Probabilistic Hierarchical Clustering
10.4 Density-Based Methods
10.4.1 DBSCAN: Density-Based Clustering Based on Connected Regions with High Density
10.4.2 OPTICS: Ordering Points to Identify the Clustering Structure
10.4.3 DENCLUE: Clustering Based on Density Distribution Functions
10.5 Grid-Based Methods
10.5.1 STING: Statistical Information Grid
10.5.2 CLIQUE: An Apriori-like Subspace Clustering Method
10.6 Evaluation of Clustering
10.6.1 Assessing Clustering Tendency
10.6.2 Determining the Number of Clusters
10.6.3 Measuring Clustering Quality
10.7 Summary
10.8 Exercises
10.9 Bibliographic Notes
Chapter 11 Advanced Cluster Analysis
11.1 Probabilistic Model-Based Clustering
11.1.1 Fuzzy Clusters
11.1.2 Probabilistic Model-Based Clusters
11.1.3 Expectation-Maximization Algorithm
11.2 Clustering High-Dimensional Data
11.2.1 Clustering High-Dimensional Data: Problems, Challenges, and Major Methodologies
11.2.2 Subspace Clustering Methods
11.2.3 Biclustering
11.2.4 Dimensionality Reduction Methods and Spectral Clustering
11.3 Clustering Graph and Network Data
11.3.1 Applications and Challenges
11.3.2 Similarity Measures
11.3.3 Graph Clustering Methods
11.4 Clustering with Constraints
11.4.1 Categorization of Constraints
11.4.2 Methods for Clustering with Constraints
11.5 Summary
11.6 Exercises
11.7 Bibliographic Notes
Chapter 12 Outlier Detection
12.1 Outliers and Outlier Analysis
12.1.1 What Are Outliers?
12.1.2 Types of Outliers
12.1.3 Challenges of Outlier Detection
12.2 Outlier Detection Methods
12.2.1 Supervised, Semi-Supervised, and Unsupervised Methods
12.2.2 Statistical Methods, Proximity-Based Methods, and Clustering-Based Methods
12.3 Statistical Approaches
12.3.1 Parametric Methods
12.3.2 Nonparametric Methods
12.4 Proximity-Based Approaches
12.4.1 Distance-Based Outlier Detection and a Nested Loop Method
12.4.2 A Grid-Based Method
12.4.3 Density-Based Outlier Detection
12.5 Clustering-Based Approaches
12.6 Classification-Based Approaches
12.7 Mining Contextual and Collective Outliers
12.7.1 Transforming Contextual Outlier Detection to Conventional Outlier Detection
12.7.2 Modeling Normal Behavior with Respect to Contexts
12.7.3 Mining Collective Outliers
12.8 Outlier Detection in High-Dimensional Data
12.8.1 Extending Conventional Outlier Detection
12.8.2 Finding Outliers in Subspaces
12.8.3 Modeling High-Dimensional Outliers
12.9 Summary
12.10 Exercises
12.11 Bibliographic Notes
Chapter 13 Data Mining Trends and Research Frontiers
13.1 Mining Complex Data Types
13.1.1 Mining Sequence Data: Time-Series, Symbolic Sequences, and Biological Sequences
13.1.2 Mining Graphs and Networks
13.1.3 Mining Other Kinds of Data
13.2 Other Methodologies of Data Mining
13.2.1 Statistical Data Mining
13.2.2 Views on Data Mining Foundations
13.2.3 Visual and Audio Data Mining
13.3 Data Mining Applications
13.3.1 Data Mining for Financial Data Analysis
13.3.2 Data Mining for Retail and Telecommunication Industries
13.3.3 Data Mining in Science and Engineering
13.3.4 Data Mining for Intrusion Detection and Prevention
13.3.5 Data Mining and Recommender Systems
13.4 Data Mining and Society
13.4.1 Ubiquitous and Invisible Data Mining
13.4.2 Privacy, Security, and Social Impacts of Data Mining
13.5 Data Mining Trends
13.6 Summary
13.7 Exercises
13.8 Bibliographic Notes
References
Index

Source of this book: China Interactive Publishing Network
