Abstract: Data mining, an information technology that extracts knowledge from massive data, has attracted wide attention from academia and industry at home and abroad. Its successful application in business has led software developers to create new data mining tools and to improve existing ones, producing a dazzling array of tools. This raises the question of how to choose a mining tool sensibly. This article therefore proposes and discusses five points to consider when choosing a data mining tool.
Keywords: data mining; data mining tools; data warehouse
With the widespread use of databases and computer networks, together with advanced tools for automatically generating and collecting data, the amount of data people hold has increased sharply. However, data analysis methods have not improved at the same pace as the data has grown. On the one hand, people want to base scientific research, business decisions, and enterprise management on this large volume of data; on the other hand, traditional data analysis tools struggle to analyze the data in depth, which creates a contradiction between the two. It is in this situation that data mining arose. Data mining, as an information technology that extracts knowledge from massive data, is a "discovery-driven" process that has attracted great attention from academia and industry. In particular, since the concept of knowledge discovery in databases (KDD) first appeared at the 11th International Joint Conference on Artificial Intelligence, held in Detroit, USA, in August 1989, data mining has received unprecedented attention at home and abroad and has been widely applied in fields such as geography, geology, and biomedicine. In short, the emergence of data mining has brought database technology to a more advanced stage: it can not only query and traverse past data, but also identify potential links within that data and so promote the dissemination of information.
Data Mining Technology Overview
1. Definition of data mining
Data mining is the process of extracting patterns from data. It is an interdisciplinary field influenced by many disciplines, including database systems, statistics, machine learning, visualization, and information science. Data mining, which repeatedly applies multiple mining algorithms to determine patterns or reasonable models from observed data, is a decision support process. By predicting customer behavior, it helps decision-makers adjust their marketing strategies, reduce risk, and make the right decisions. Traditional tools, such as query and reporting tools, cannot answer questions that were not defined in advance or that span departments or organizations, so their users must know clearly what they are looking for. Data mining can address such previously undefined, comprehensive, or cross-departmental questions, uncover latent patterns, and predict future trends. Users do not have to ask exact questions, and loosely defined questions are in fact more conducive to discovering unknown facts.
2. Main methods of data mining
Data mining methods can be classified in many ways, for example by the type of knowledge discovered, the type of database mined, the mining method, or the technique used. Only four of the more widely used methods are described here:
• Association rules
In the field of data mining, association rules are among the most widely applied and most important research directions. An association rule expresses an association between sets of objects in a database. In general, the properties of an association rule can be described by several parameters; commonly used ones are confidence (credibility), support, interest, expected confidence, and lift.
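As a minimal sketch of how such parameters are usually computed, the following assumes the standard textbook definitions for a rule A → B: support is the fraction of transactions containing both A and B, confidence (the "credibility" above) is support(A ∪ B) / support(A), expected confidence is support(B), and lift is confidence divided by expected confidence. The transactions and the names `support` and `rule_metrics` are hypothetical and chosen only for illustration; interest is left out because several competing definitions exist.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical toy transactions; the item names are illustrative only.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def rule_metrics(
    antecedent: Iterable[str],
    consequent: Iterable[str],
    transactions: List[FrozenSet[str]],
) -> Dict[str, float]:
    """Support, confidence, expected confidence, and lift of antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    rule_support = support(a | c, transactions)
    antecedent_support = support(a, transactions)
    expected_confidence = support(c, transactions)
    # Guard against rules whose antecedent or consequent never occurs.
    confidence = rule_support / antecedent_support if antecedent_support else 0.0
    lift = confidence / expected_confidence if expected_confidence else 0.0
    return {
        "support": rule_support,
        "confidence": confidence,
        "expected_confidence": expected_confidence,
        "lift": lift,
    }


if __name__ == "__main__":
    print(rule_metrics({"bread"}, {"milk"}, TRANSACTIONS))
```

In this toy data the rule bread → milk has a lift below 1, meaning that buyers of bread are slightly less likely than the average customer to buy milk, so the rule would not be considered interesting despite its fairly high support.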
• Outliers
Outliers are data that deviate significantly from the rest of the data, do not conform to its general pattern or behavior, and are inconsistent with the other data present. Most data mining research ignores the existence and significance of outliers; existing methods often study how to reduce the effect of outliers on normal data, or simply treat them as noise. Outliers may come from data entry errors or human error, but they may also be a true reflection of the data.
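One simple way to flag such values for inspection, rather than silently discarding them, is sketched below. It uses the modified z-score based on the median absolute deviation (MAD), a technique chosen here for illustration and not one named in this article; the constant 0.6745 and the cutoff 3.5 are the conventional choices for that score, and the function name `mad_outliers` is hypothetical.

```python
from statistics import median
from typing import List, Sequence


def mad_outliers(values: Sequence[float], threshold: float = 3.5) -> List[float]:
    """Return the values whose modified z-score exceeds `threshold` in magnitude.

    The score is 0.6745 * |x - median| / MAD, where MAD is the median of the
    absolute deviations from the median. `values` must be non-empty.
    """
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # More than half of the values are identical, so the score is undefined;
        # this sketch reports nothing rather than guessing.
        return []
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]


if __name__ == "__main__":
    readings = [10, 12, 11, 13, 12, 11, 95]
    print(mad_outliers(readings))
```

The function only reports candidates. Whether a flagged value is an input error or a genuine observation still has to be decided from domain knowledge.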
• Case-based reasoning (CBR)
Case-based reasoning originates in human cognitive psychology and is a form of analogical reasoning. Its basic idea is that, when solving a problem, people habitually draw on the experience and knowledge gained from handling similar problems and adjust for the differences between the old and new situations, thereby obtaining a solution to the new problem and forming a new case. CBR has received growing attention in many fields, such as meteorology, environmental protection, seismology, agriculture, medicine, commerce, and CAD. It can also be used in the production of computer hardware and software, for example in fault detection. CBR has been applied ever more widely and deeply, especially in fields that depend on expert knowledge.
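The cycle described above (find a similar old case, adjust its solution for the differences, store the result as a new case) can be sketched roughly as follows. Everything here is an illustrative assumption: the `Case` structure, Euclidean distance as the similarity measure, and in particular the toy adaptation rule that scales the old solution in proportion to the first feature. Real CBR systems use domain-specific retrieval and adaptation knowledge.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Tuple


@dataclass
class Case:
    problem: Tuple[float, ...]  # numeric description of the problem
    solution: float             # solution that worked for that problem


def retrieve(case_base: List[Case], problem: Tuple[float, ...]) -> Case:
    """Return the stored case whose problem is closest to the new one."""
    return min(case_base, key=lambda c: dist(c.problem, problem))


def adapt(old: Case, problem: Tuple[float, ...]) -> float:
    """Toy adaptation rule (an assumption): scale by the ratio of the first feature."""
    if old.problem[0] == 0:
        return old.solution  # cannot scale; reuse the old solution unchanged
    return old.solution * problem[0] / old.problem[0]


def solve(case_base: List[Case], problem: Tuple[float, ...]) -> float:
    """Retrieve a similar case, adapt its solution, and retain the new case."""
    nearest = retrieve(case_base, problem)
    solution = adapt(nearest, problem)
    case_base.append(Case(problem, solution))  # the solved problem becomes a new case
    return solution


if __name__ == "__main__":
    # Hypothetical cases: room area in square metres -> heater power in watts.
    base = [Case((20.0,), 2000.0), Case((50.0,), 4500.0)]
    print(solve(base, (25.0,)), len(base))
```

In a real application the retained case would normally be stored only after the adapted solution has been confirmed to work; the sketch skips that verification step.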
• Support vector machines (SVM)
The support vector machine (SVM) is a general knowledge discovery method developed in recent years that performs well in classification. SVM is based on the structural risk minimization principle of statistical learning theory. For a two-class classification problem, its main idea is to find a hyperplane in a high-dimensional space that separates the two classes while keeping the classification error rate as low as possible.
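To make the idea of a separating hyperplane concrete, here is a minimal sketch of a linear SVM for two classes labeled +1 and -1. It assumes a particular training method that this article does not specify: plain subgradient descent on the L2-regularized hinge loss. The function names, the hyperparameter values, and the sample points are all illustrative, and a production system would normally rely on an established library instead.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    points: Sequence[Sequence[float]],
    labels: Sequence[int],
    lam: float = 0.01,   # regularization strength (assumed value)
    lr: float = 0.1,     # learning rate (assumed value)
    epochs: int = 100,
) -> Tuple[List[float], float]:
    """Fit weights w and bias b of the hyperplane w.x + b = 0."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is misclassified or inside the margin: apply the
                # hinge-loss subgradient together with the regularization term.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: only the regularization term acts.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 depending on the side of the hyperplane x falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


if __name__ == "__main__":
    pts = [(2.0, 2.0), (3.0, 3.0), (3.0, 2.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
    lbl = [1, 1, 1, -1, -1, -1]
    w, b = train_linear_svm(pts, lbl)
    print([predict(w, b, p) for p in pts])
```

The regularization term keeps the weights small, which corresponds to preferring a wide margin, while the hinge term penalizes points on the wrong side of it. Data that cannot be separated linearly would additionally need a kernel, which this sketch does not cover.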