Overview of data mining for databases (I)

Source: Internet
Author: User
With the development of database technology and the widespread use of database management systems, the amount of data stored in databases has grown dramatically. A great deal of important information lies hidden in this data, and extracting it can create substantial potential profit for a company. The technology of extracting such information from massive databases is called data mining.

Data mining tools can predict future trends and behavior, which makes them a good way to support decision-making. For example, by analyzing a company's entire database system, a data mining tool can answer questions such as "Which customers are most likely to respond to our company's next mailing?" Some data mining tools can also solve problems that were traditionally very time-consuming, because they can scan the entire database quickly and find highly useful information that experts might miss.

A brief introduction to the basic techniques of data mining is given below.

The foundation of data mining

Data mining is the result of long-term research and development in database technology. At first, commercial data was simply stored in computer databases; later, databases could be queried and accessed; later still, they could be navigated in real time. Data mining takes database technology to a more advanced stage: beyond querying and navigating past data, it can identify potential relationships within that data to facilitate the delivery of information. Data mining is now ready for commercial application because the three basic technologies that support it have matured:

Massive data collection
Powerful multiprocessor computers
Data mining algorithms

Business databases are now growing at an unprecedented rate, and data warehouses are widely used across industries. The demand for higher hardware performance can be met by parallel multiprocessor technology, which is now mature. In addition, after more than 10 years of development, data mining algorithms have become mature, stable, and easy to understand and operate.

In the evolution from business data to business information, each step builds on the previous one; see the table below. As the table shows, the fourth step is revolutionary because, from the user's point of view, database technology at this stage can quickly answer many business questions.

| Evolution stage | Business question | Enabling technologies | Vendors | Characteristics |
| --- | --- | --- | --- | --- |
| Data collection (1960s) | "What was my total revenue over the past five years?" | Computers, tapes, disks | IBM, CDC | Historical, static data delivery |
| Data access (1980s) | "What were sales in the New England division last March?" | Relational databases (RDBMS), Structured Query Language (SQL), ODBC | Oracle, Sybase, Informix, IBM, Microsoft | Historical, dynamic data delivery at the record level |
| Data warehousing and decision support (1990s) | "What were sales in the New England division last March? Drill down to Boston." | Online analytical processing (OLAP), multidimensional databases, data warehouses | Pilot, Comshare, Arbor, Cognos, MicroStrategy | Historical, dynamic data delivery at multiple levels |
| Data mining | "What are sales in Boston likely to be next month? Why?" | Advanced algorithms, multiprocessor computers, massive databases | Pilot, Lockheed, IBM, SGI, other startups | Predictive information delivery |

Table I. The evolutionary process of data mining.
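
To make the data access and data warehousing stages of the table concrete, the sketch below answers the New England sales question with SQL, run through Python's built-in sqlite3 module. The `sales` table, its columns, the year 1997, and every row are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, city TEXT, sale_month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("New England", "Boston", "1997-03", 1200.0),
        ("New England", "Hartford", "1997-03", 800.0),
        ("New England", "Boston", "1997-04", 950.0),
        ("Midwest", "Chicago", "1997-03", 1500.0),
    ],
)

def total_sales(region, month):
    """Data access stage: total sales for one region in one month."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales "
        "WHERE region = ? AND sale_month = ?",
        (region, month),
    ).fetchone()
    return row[0]

def sales_by_city(region, month):
    """Data warehousing stage: the same figure broken down by city."""
    return conn.execute(
        "SELECT city, SUM(amount) FROM sales "
        "WHERE region = ? AND sale_month = ? GROUP BY city ORDER BY city",
        (region, month),
    ).fetchall()

print(total_sales("New England", "1997-03"))
print(sales_by_city("New England", "1997-03"))
```

Both queries only summarize what has already happened. The question in the data mining row asks what is likely to happen next, which a query alone cannot answer.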

The core component technologies of data mining have been under development for decades, in fields including mathematical statistics, artificial intelligence, and machine learning. Today, the maturity of these techniques, coupled with high-performance relational database engines and broad data integration efforts, has brought data mining into practical use in current data warehousing environments.

Scope of data mining

The name "data mining" comes from the similarity to digging for valuable mineral deposits in a mountain; in business applications, it means searching for valuable business information in a large database. Both processes require sifting through huge amounts of material and intelligently and accurately locating where the value lies. Given a sufficiently large database, data mining technology can generate new business opportunities by providing the following capabilities:

Automated trend prediction. Data mining automates the search for predictive information in large databases. Questions that traditionally required extensive analysis by experts can now be answered quickly and directly from the data. A typical example is targeted marketing: data mining tools use the data from past mailings to identify the customers most likely to respond to future ones.

Automated discovery of previously unknown patterns. Data mining tools scan the entire database and identify hidden patterns. One example is analyzing retail sales data to identify products that seem unrelated but are in fact often purchased together.
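
As a minimal sketch of this kind of pattern discovery, the code below counts how often each pair of products appears in the same transaction. The baskets are hypothetical data invented for illustration, and simple pair counting stands in for the more elaborate algorithms that commercial tools use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"bread", "milk", "diapers"},
    {"beer", "diapers", "chips"},
    {"beer", "diapers", "milk"},
    {"bread", "milk"},
    {"beer", "diapers"},
]

def pair_support(transactions):
    """Return the fraction of transactions that contain each product pair."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair has one canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: n / total for pair, n in counts.items()}

support = pair_support(baskets)
top_pair, top_share = max(support.items(), key=lambda item: item[1])
print(top_pair, top_share)
```

In this toy data the most frequent pair is beer and diapers, which share three of the five baskets even though the two products look unrelated.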

Data mining techniques can automate analysis on existing software and hardware platforms, and they can be implemented on upgraded or newly developed platforms. Running on a high-performance parallel processing system, a data mining tool can analyze a very large database in minutes. Faster processing gives users more opportunities to analyze the data, which makes the results more accurate, more reliable, and easier to understand.

Databases can be larger in both depth and breadth

Depth: more columns. In the past, time constraints forced analysts doing complex analysis to limit the number of variables they examined, yet the discarded variables might carry useful information that nobody knew about. High-performance data mining tools now let users explore the full depth of a database and consider every candidate variable, without first selecting a subset of variables.

Breadth: more rows. Larger samples yield lower estimation error and variance, so users can draw more precise conclusions about small but important effects.
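
The effect of more rows can be illustrated with the standard error of an estimated proportion, sqrt(p(1 - p) / n). The sketch below assumes simple random sampling; the 2% response rate and the sample sizes are hypothetical values chosen only for illustration.

```python
import math

def standard_error(rate, sample_size):
    """Standard error of a proportion estimated from a simple random sample."""
    return math.sqrt(rate * (1.0 - rate) / sample_size)

# Hypothetical 2% mail response rate, chosen only for illustration.
rate = 0.02
for n in (1_000, 100_000):
    print(n, round(standard_error(rate, n), 5))
```

Because the standard error falls with the square root of the sample size, 100 times as many rows shrink it by a factor of 10.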

Recently, an advanced technology survey by Gartner Group ranked data mining and artificial intelligence first among the five key technologies that "will have a profound impact on industry over the next 3-5 years", and also listed parallel processing systems and data mining as two of the top ten emerging technologies for investment over the next five years. According to a recent Gartner HPC study, "with the rapid development of data capture, transmission, and storage technology, large-system users will need to adopt new technologies to tap the after-market value of their data, using broader parallel processing systems to create new business growth points."

The techniques most commonly used in data mining are:

Artificial neural networks: Non-linear predictive models that learn through training and imitate the structure of biological neural networks.

Decision trees: Tree-shaped structures that represent sets of decisions.

Genetic algorithms: Optimization techniques based on evolutionary theory that use processes such as genetic combination, mutation, and natural selection.

Nearest neighbor method: A technique that classifies each record in a dataset according to the classes of the records most similar to it.

Rule induction: The discovery and derivation of "if-then" rules from data on a statistical basis.
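
Of the techniques listed above, the nearest neighbor method is the simplest to sketch in a few lines. The example below is a minimal illustration under stated assumptions, not a production implementation: the customer records, the two features (age and number of past orders), and the choice of k = 3 are all hypothetical, and the features are left unscaled for brevity.

```python
import math
from collections import Counter

# Hypothetical historical records: (age, past_orders) -> responded to a mailing?
history = [
    ((25, 1), "no"),
    ((30, 0), "no"),
    ((35, 2), "no"),
    ((45, 6), "yes"),
    ((50, 8), "yes"),
    ((55, 7), "yes"),
]

def classify(record, labeled_records, k=3):
    """Label a record by majority vote among its k nearest labeled records."""
    by_distance = sorted(
        labeled_records,
        key=lambda item: math.dist(record, item[0]),  # Euclidean distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(classify((48, 5), history))
print(classify((28, 1), history))
```

In practice the features would usually be rescaled first, since otherwise the one with the largest numeric range dominates the distance.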

Some of the specialized analysis tools built on these techniques have been in development for about 10 years, but they have usually worked with relatively small amounts of data. These techniques are now being integrated directly into many large, industry-standard data warehouse and online analytical processing (OLAP) systems.



