The research status of data mining

Source: Internet
Author: User

I. Challenges of the Times

Over the past decade or more, people's ability to produce and collect data with information technology has increased dramatically. Countless databases are now used in business management, government administration, scientific research, and engineering development, and this momentum will continue. A new challenge has followed: in what is called the age of the information explosion, information overload is a problem almost everyone has to face. How can we avoid being overwhelmed by the vast ocean of information, find useful knowledge in time, and make better use of what we have? Data becomes a genuine company resource only when it is fully used to serve the company's own business decisions and strategic development; otherwise large volumes of data may become a burden, or even garbage. Facing the challenge that "people are drowning in data but starving for knowledge," data mining and knowledge discovery (DMKD) technology emerged, has flourished, and shows ever stronger vitality.

Data mining is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random data. Several related terms are in use, such as knowledge discovery in databases (KDD), data analysis, data fusion, and decision support. Raw data is regarded as a source of knowledge, much as ore is a source of metal. The raw data can be structured, such as data in relational databases; semi-structured, such as text, graphics, and image data; or even heterogeneous data distributed across a network. The methods used to discover knowledge can be mathematical or non-mathematical, deductive or inductive. The discovered knowledge can be used for information management, query optimization, decision support, process control, and so on, as well as for maintaining the data itself. Data mining is therefore a broad interdisciplinary field that brings together researchers and engineers from different areas, especially databases, artificial intelligence, mathematical statistics, visualization, and parallel computing.

In particular, data mining has been an application-oriented technique from the outset. It goes beyond simple retrieval and query calls against a specific database: it performs statistics, analysis, synthesis, and inference on the data at the micro, meso, and even macro level in order to guide the solution of practical problems, tries to find correlations between events, and even uses existing data to predict future activity. For example, the Canadian company BC Telephone asked the KDD research group at Simon Fraser University in Canada to summarize and analyze more than ten years of its customer data and to propose new telephone rates and management methods that would benefit both the company and its customers. Coaches in the NBA, the well-known American basketball league, have used data mining technology supplied by a company to decide on player substitutions, a story that became legendary in the database field.
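The contrast between a simple query and the search for correlations between events can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the customer records are invented, the function names `query` and `pair_rules` are hypothetical, and the support/confidence measure is one common way to quantify co-occurrence, not a method described in this article.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: each set lists the services one customer used.
records = [
    {"long_distance", "voicemail"},
    {"long_distance", "voicemail", "call_waiting"},
    {"local"},
    {"long_distance", "voicemail"},
    {"local", "call_waiting"},
]


def query(records, item):
    """A plain query: return the records that contain a given item."""
    return [r for r in records if item in r]


def pair_rules(records, min_support=0.4):
    """Discovery: find item pairs that co-occur in many records.

    Returns tuples (antecedent, consequent, support, confidence), where
    support is the share of records containing both items and confidence
    is the share of records with the antecedent that also hold the consequent.
    """
    n = len(records)
    item_counts = Counter(item for r in records for item in r)
    pair_counts = Counter(
        pair for r in records for pair in combinations(sorted(r), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules


if __name__ == "__main__":
    print(len(query(records, "voicemail")), "records mention voicemail")
    for a, b, support, confidence in sorted(pair_rules(records)):
        print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

The query only answers a question the user already knew to ask, while `pair_rules` surfaces a relationship (here, long-distance customers who also use voicemail) that nobody asked about in advance.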

In this way, the use of data rises from low-level query operations to providing decision support for business decision-makers at all levels. This demand-driven force is more powerful than the one behind database querying. At the same time, it should be pointed out that knowledge discovery here does not mean discovering universal truths, new theorems of natural science, or pure mathematical formulas, much less machine theorem proving. All discovered knowledge is relative: it holds under specific premises and constraints, is oriented to a specific domain, and must be easy for users to understand, so findings are best expressed in natural language. DMKD research results are therefore highly practical. The competition and awards for real data mining tools held at the 3rd International Conference on KDD in 1997 are vivid proof. More recently, a number of DMKD products have been used to filter news on the Internet and to shield users from tiresome junk e-mail and commercial solicitation, and they have been warmly welcomed.

II. Current status of research

The term KDD first appeared at the 11th International Joint Conference on Artificial Intelligence, held in August 1989. To date, the international KDD meeting sponsored by the American Association for Artificial Intelligence has been held seven times. It has grown from a workshop into an international academic conference and from twenty or thirty participants to seven or eight hundred, and the ratio of submitted to accepted papers has gone from 2:1 to 6:1. The research focus has gradually shifted from discovery methods to system applications, with attention to integrating multiple discovery strategies and technologies and to cross-fertilization among disciplines. Data mining and knowledge discovery has also been taken up as a topic elsewhere and has become one of the hot topics in the computer science community.

In 1997 the Asia-Pacific region held its first large-scale PAKDD academic symposium in Singapore. PAKDD '98, to be held in Melbourne, Australia, has received more than 150 papers this year, a sign of unprecedented enthusiasm.

In addition, international academic journals in databases, artificial intelligence, information processing, knowledge engineering, and other fields have launched KDD special topics or special issues. IEEE Transactions on Knowledge and Data Engineering took the lead in 1993 with a special issue on KDD technology. Its five papers represented the latest results and trends in KDD research at the time. They discussed fairly comprehensively the methodology of KDD systems, the evaluation of discovered results, and the logic of KDD system design; they focused on the relationships and differences between KDD systems and traditional machine learning, expert systems, artificial neural networks, and statistical analysis systems; and they covered basic countermeasures for the dynamic nature, redundancy, high noise, uncertainty, and null values found in databases. Six paper abstracts showed concrete applications of KDD, ranging from building molecular models to design and manufacturing.

There are also many KDD electronic publications on the Internet, of which the fortnightly Knowledge Discovery Nuggets is the most authoritative. To subscribe for free, simply send an e-mail via http://www.kdnuggets.com/Subscribe.html; the site also lets you download a wide variety of data mining tools and typical sample data warehouses for testing and evaluation. Another online weekly is DS* (DS stands for decision support), which began publication on October 7, 1997; a free subscription can be requested from dstrial@tgc.com. There is also a free online forum, the DM Email Club, where people discuss hot DMKD issues by e-mail.

As for DMKD books, you can find more than ten titles in any computer bookstore, but most have a commercial slant. The author suggests that interested readers consult "Advances in Knowledge Discovery and Data Mining," published by AAAI/MIT Press in the United States in 1996. At present, the most influential representative data mining systems in the world include CoverStory, Explora, Knowledge Discovery Workbench, DBMiner, and Quest.

III. Content and nature

As DMKD research has deepened, it has become increasingly clear that it rests on three technical pillars: databases, artificial intelligence, and mathematical statistics. After its brilliant decade in the 1980s, database technology became something of a culture or fashion in every industry. Besides concentrating on distributed databases, object-oriented databases, multimedia databases, query optimization, and parallel computing, the database community has begun to reflect: is querying really the essence of database applications? The most fundamental technical advance of the relational database is the separation of data storage from data use. Yet querying is the slave of the database, while discovery is its master; data serves only the staff, not the boss! This is the lament of many organization leaders after an enthusiastic round of database building.
