You have probably heard or seen the term "data mining" countless times, but do you know what it actually is? Scholars and experts define data mining in different ways; here are a few common definitions:
"Simply put, data mining is extracting or 'mining' knowledge from large amounts of data. The term is actually a bit of a misnomer: it would be more accurately named 'knowledge mining from data', which is unfortunately somewhat long. Many people treat data mining as a synonym for another popular term, 'knowledge discovery in databases', or KDD. Others view data mining simply as an essential step in the process of knowledge discovery in databases." - Data Mining: Concepts and Techniques (J. Han and M. Kamber)
"Data mining is the analysis of (often very large) observational data sets to discover unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner." - Principles of Data Mining (David Hand et al.)
"The entire process of applying a computer-based methodology, including new techniques, to discover useful knowledge in data is called data mining." - Data Mining: Concepts, Models, Methods, and Algorithms (Mehmed Kantardzic)
"Data mining, simply put, is the automatic discovery of relevant patterns in a database." - Building Data Mining Applications for CRM (Alex Berson et al.)
"Data mining (DM) is the process of extracting hidden predictive information from large databases." - Data Mining: Opportunities and Challenges (John Wang)
As the foremost Chinese scholar in the field of data mining, Professor Jiawei Han gives a clearer definition in Data Mining: Concepts and Techniques: "Data mining is the process of extracting interesting (non-trivial, implicit, previously unknown, and potentially useful) information or patterns from large databases."
From these definitions, we can see that data mining has the following characteristics:
- Based on large amounts of data: This does not mean that small data sets cannot be mined; in fact, most data mining algorithms can run on small data sets and produce results. However, on the one hand, a small amount of data can be summarized by manual analysis; on the other hand, a small data set often fails to reflect the general characteristics of the real world.
- Non-trivial: The knowledge mined should not be simplistic. It should not resemble the "discovery" of a famous sports commentator who announced: "By my calculations, I have found an interesting phenomenon: as of the end of this match, the number of goals scored and the number of goals conceded in this World Cup are exactly the same. What a coincidence!" That is trivial knowledge: every goal one team scores is a goal another team concedes, so the two totals are always equal. This may seem too obvious to need explaining, but many data mining novices who lack business knowledge make exactly this kind of mistake.
- Implicit: Data mining aims to discover knowledge hidden deep within the data, not information that is directly visible on its surface. Common BI tools, such as reports and OLAP, are fully adequate for finding that surface-level information.
- Novel: The knowledge mined should be previously unknown; otherwise it merely confirms what business experts already know from experience. Only new knowledge can give an enterprise further insight.
- Valuable: The mining results must bring direct or indirect benefits to the enterprise. Some people say data mining is all show and no substance: it looks impressive but produces nothing useful. This is a misconception. Admittedly, some data mining projects produce poor results, or no results at all, because of unclear business goals, inadequate data quality, resistance to changes in business processes, or the inexperience of the people involved. But many success stories show that data mining can be a powerful tool for improving efficiency.
It is hard to pin down exactly when the term "data mining" became universally accepted, but it began to gain popularity in the 1990s. There is an amusing anecdote about this. In the research community, the term "knowledge discovery in databases" (KDD) was used first. At the first international KDD conference, the committee debated whether to keep the name KDD or rename it Data Mining, and finally decided to put it to a vote and follow the majority. The result was dramatic: of the 14 members, 7 voted for KDD and 7 for Data Mining. In the end, a senior member argued that "the term data mining is too vague for scientific research," so the research community continued to use KDD. In the commercial world, however, "knowledge discovery in databases" seemed too lengthy, so the simpler and more popular term "data mining" came into wide use.
Strictly speaking, data mining is not a new field; it is rather a case of "old wine in a new bottle." Its three pillars are research in statistics, machine learning, and databases, with contributions from other fields such as visualization and information science. Data mining draws on statistical techniques such as regression analysis, discriminant analysis, cluster analysis, and confidence intervals; machine learning techniques such as decision trees and neural networks; and database techniques such as association analysis and sequence analysis.
Original article: How Can Programmers Not Know What Data Mining Is?