How can programmers not know what data mining is, and how do programmers mine data?

Source: Internet
Author: User


You have probably heard or read about data mining countless times, but do you know what it actually is? Scholars and experts define it in different ways. The following are some common definitions:
"Simply put, data mining refers to extracting, or 'mining', knowledge from large amounts of data. In fact, the term is a bit of a misnomer: 'knowledge mining from data' would be more accurate, but unfortunately it is rather long. Many people treat data mining as a synonym for another popular term, 'knowledge discovery in databases' (KDD), while others view data mining as merely an essential step in the process of knowledge discovery in databases." -- Data Mining: Concepts and Techniques (J. Han and M. Kamber)
"Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner." -- Principles of Data Mining (David Hand et al.)
"The entire process of applying computer-based methods, including new techniques, to discover useful knowledge in data is called data mining." -- Data Mining: Concepts, Models, Methods, and Algorithms (Mehmed Kantardzic)
"Data mining, simply put, is the automatic discovery of relevant patterns in a database." -- Building Data Mining Applications for CRM (Alex Berson et al.)
"Data mining (DM) is the process of extracting hidden predictive information from large databases." -- Data Mining: Opportunities and Challenges (John Wang)
Professor Han Jiawei, one of the most prominent Chinese scholars in the field of data mining, gives a clearer definition in his teaching slides for "Data Mining: Concepts and Techniques": "Data mining is the process of extracting interesting (non-trivial, implicit, previously unknown, and potentially valuable) information or patterns from large databases."
From these definitions, we can see that data mining has the following characteristics:
- Large volumes of data: this does not mean that small data sets cannot be mined. In fact, most data mining algorithms can run on a small data set and produce results. However, a small amount of data can usually be summarized by manual analysis, and it often fails to reflect the general characteristics of the real world.
- Non-trivial: the knowledge obtained must not be simple. It should not resemble the "discovery" of a well-known sports commentator: "According to my calculations, I have found an interesting phenomenon: as of the end of this match, the number of goals scored in this World Cup is exactly equal to the number of goals conceded. What a coincidence!" This point may seem to go without saying, yet many data mining newcomers who lack business knowledge make exactly this mistake.
- Implicit: data mining discovers knowledge hidden within the data, rather than information that is directly presented in data tables. Common BI tools, such as reports and OLAP, are already sufficient for users to find that kind of directly presented information.
- Novel: the knowledge obtained should be previously unknown; otherwise, it merely confirms what business experts already know from experience. Only new knowledge can help an enterprise gain new insight.
- Valuable: the mining results must bring direct or indirect benefits to the enterprise. Some people say that data mining is just a "dragon-slaying skill": impressive to look at, but of no practical use. This is a misunderstanding. Admittedly, some data mining projects produce poor results or fail completely because of unclear business objectives, poor data quality, resistance to changing business processes, or inexperienced analysts. However, a large number of successful cases show that data mining can indeed become a powerful tool for improving efficiency.
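The commentator's "interesting phenomenon" (goals scored in a World Cup equal goals conceded) is trivial because it holds for every tournament by construction: each goal is scored by one team and conceded by the other, so the tournament-wide totals must match. The minimal Python sketch below checks this on randomly generated, hypothetical match data; the helper names `simulate_matches` and `scored_and_conceded` are illustrative assumptions and do not come from the article.

```python
import random
from collections import defaultdict


def simulate_matches(n_matches, teams, seed=0):
    """Generate hypothetical results as (home, away, home_goals, away_goals)."""
    rng = random.Random(seed)
    matches = []
    for _ in range(n_matches):
        home, away = rng.sample(teams, 2)  # pick two distinct teams
        matches.append((home, away, rng.randint(0, 5), rng.randint(0, 5)))
    return matches


def scored_and_conceded(matches):
    """Return per-team totals of goals scored and goals conceded."""
    scored = defaultdict(int)
    conceded = defaultdict(int)
    for home, away, home_goals, away_goals in matches:
        scored[home] += home_goals
        scored[away] += away_goals
        conceded[home] += away_goals
        conceded[away] += home_goals
    return dict(scored), dict(conceded)


teams = ["A", "B", "C", "D"]
for seed in range(5):
    scored, conceded = scored_and_conceded(simulate_matches(20, teams, seed))
    # Every goal counts once as "scored" and once as "conceded",
    # so these two totals are always equal: a tautology, not a discovery.
    print(seed, sum(scored.values()), sum(conceded.values()))
```

Because the equality holds for any input, it says nothing about the data itself; a pattern worth reporting is one that could have turned out otherwise.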
It is hard to determine exactly when the term "data mining" became widely accepted; it began to emerge in the 1990s. There is an interesting story behind it. In the research community, the term "knowledge discovery in databases" (KDD) was adopted first. At the first KDD international conference, the committee discussed whether to keep the name KDD or change it to data mining, and decided to put it to a vote and adopt whichever option received more votes. The result was quite dramatic: of the 14 committee members, 7 voted for KDD and the other 7 for data mining. Finally, a senior member argued that "the term data mining is too vague, and research should be about knowledge," so the research community kept using the term KDD. In the commercial world, however, "knowledge discovery in databases" was considered too long, so the shorter and more accessible term "data mining" came into wide use.
Strictly speaking, data mining is not a brand-new field; it is a bit like "old wine in a new bottle." Its three pillars are the research achievements of statistics, machine learning, and databases, complemented by visualization, information science, and other fields. Data mining incorporates regression analysis, discriminant analysis, cluster analysis, and confidence intervals from statistics; decision trees and neural networks from machine learning; and association analysis and sequence analysis from databases.
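Association analysis, one of the techniques listed above, rests on two standard measures: the support of a rule X -> Y (the fraction of transactions containing both X and Y) and its confidence (the support of X and Y together divided by the support of X). The minimal Python sketch below assumes a small made-up basket data set and hypothetical thresholds of 0.4 support and 0.6 confidence, and enumerates rules between single items by brute-force counting. It is not Apriori or FP-growth, which prune the search so that mining scales to many items and larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Toy shopping-basket data (hypothetical; not from the article).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Brute-force association rules X -> Y between single items.

    support(X -> Y)    = fraction of transactions containing both X and Y
    confidence(X -> Y) = count(X and Y) / count(X)
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    # Count each unordered pair once per transaction (sorted for a stable key).
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        # A frequent pair can yield a rule in either direction.
        for x, y in ((a, b), (b, a)):
            confidence = both / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    # Highest confidence first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


for x, y, s, c in pair_rules(transactions):
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

On this toy data, `beer -> diapers` comes out with confidence 1.0 (every basket containing beer also contains diapers), while `bread -> beer` is dropped because only half of the baskets containing bread also contain beer.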

Original article: How can programmers not know what data mining is?
