Data Mining Overview

Source: Internet
Author: User
With the development of database technology and the widespread use of database management systems, the amount of data stored in databases has grown dramatically, and a great deal of important information lies hidden behind that data. Extracting this information from the database can create considerable potential profit for a company. The technology for mining such information from massive databases is called data mining.

Data mining tools can predict future trends and behavior, and so support decision-making well. For example, by analyzing a company's entire database system, data mining tools can answer questions such as "Which customers are most likely to respond to our company's e-mail promotion?" Some data mining tools can also solve traditional problems that are very time-consuming, because they can quickly scan the entire database and find useful information that even experts would not easily notice.

A brief introduction to the basic techniques of data mining is given below.

The foundation of data mining

Data mining is the result of long-term research and development in database technology. At first, commercial data of all kinds was simply stored in computer databases. Later, databases could be queried and accessed, and after that they could be traversed in real time. Data mining takes database technology to a more advanced stage: it can not only query and traverse past data, but also identify potential relationships within that data, which facilitates the delivery of information. Data mining technology is now ready for commercial application, because three of the basic technologies that support it have matured:

Massive data collection
Powerful multiprocessor computers
Data mining algorithms

Business databases are now growing at an unprecedented rate, and data warehouses are widely used across industries. The demand for higher hardware performance can be met by parallel multiprocessor technology, which is now mature. In addition, data mining algorithms have been under development for more than 10 years and have become a mature, stable technology that is easy to understand and operate.

In the evolution from business data to business information, each step builds on the previous one. See the table below. As the table shows, the fourth step is revolutionary because, from a user's point of view, database technology at this stage can quickly answer a great many business questions.

Evolution stage | Business question | Supporting technologies | Product vendors | Product characteristics
Data collection (1960s) | "What was my total income over the past five years?" | Computers, tapes, disks | IBM, CDC | Provides historical, static data
Data access (1980s) | "What were sales in the New England division last March?" | Relational databases (RDBMS), Structured Query Language (SQL), ODBC | Oracle, Sybase, Informix, IBM, Microsoft | Provides historical, dynamic data at the record level
Data warehousing; decision support (1990s) | "What were sales in the New England division last March? What conclusions can Boston draw from this?" | Online analytical processing (OLAP), multidimensional databases, data warehouses | Pilot, Comshare, Arbor, Cognos, MicroStrategy | Provides retrospective, dynamic data at multiple levels
Data mining | "What will sales in Boston be like next month? Why?" | Advanced algorithms, multiprocessor computers, massive databases | Pilot, Lockheed, IBM, SGI, other startups | Provides predictive information

Table I. The evolutionary process of data mining.

The core techniques of data mining, which include mathematical statistics, artificial intelligence, and machine learning, have been under development for several decades. Today these mature techniques, combined with high-performance relational database engines and broad data integration, have brought data mining into a practical phase in the current data warehouse environment.

Scope of data mining

The name "data mining" comes from the similarity to digging valuable mineral deposits out of a mountain. In commercial applications, it means searching large databases for valuable business information. Both processes require carefully sifting through huge amounts of material and locating, intelligently and accurately, where the potential value lies. For a database of a given size, data mining technology can create significant business opportunities through the following capabilities:

Automatic trend forecasting. Data mining can automatically search large databases for potential predictive information. Questions that traditionally required many experts to analyze can now be answered quickly and directly from the data. A typical example of using data mining for forecasting is targeted marketing: data mining tools use large amounts of data from past promotional mailings to find the customers most likely to respond to future mailings.

Automatic detection of previously undiscovered patterns. Data mining tools scan the entire database and identify hidden patterns, for example by analyzing retail data to identify products that appear unrelated but are in fact often sold together.
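As a rough illustration of the retail example above, the sketch below counts how often pairs of products appear in the same transaction and reports their support and lift. It is a minimal sketch rather than the method of any particular data mining tool: the function name pair_stats and the sample baskets are assumptions invented for this illustration.

```python
from collections import Counter
from itertools import combinations


def pair_stats(baskets):
    """Return {(a, b): (support, lift)} for every product pair that co-occurs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    stats = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both products
        # Lift compares the observed share with what independence would predict.
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        stats[(a, b)] = (support, lift)
    return stats


# Hypothetical transactions, invented for illustration only.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "butter", "cereal"],
    ["milk"],
]

ranked = sorted(pair_stats(baskets).items(), key=lambda kv: -kv[1][1])
for pair, (support, lift) in ranked:
    print(pair, f"support={support:.2f}", f"lift={lift:.2f}")
```

A lift above 1 suggests that two products are sold together more often than chance alone would explain. On real retail data the pair counts would need a minimum-support threshold to stay manageable.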

Data mining technology can make existing software and hardware more automated, and can be implemented on upgraded or newly developed platforms. When data mining tools run on a high-performance parallel processing system, they can analyze a very large database in a few minutes. This faster processing gives users more opportunities to analyze the data, which makes the results more accurate, more reliable, and easier to understand.

Databases can grow in depth and breadth

In depth, more columns are allowed. In the past, when performing complex data analysis, experts were constrained by time and had to limit the number of variables, yet the variables discarded from the analysis might contain useful information that nobody knew about. High-performance data mining tools now let users explore the full depth of a database and consider every candidate variable, without having to select a subset of variables for the analysis.

In breadth, more rows are allowed. Larger samples reduce the probability of error and variation, so that users can more precisely draw conclusions that are small but quite important.
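The effect of adding rows can be made concrete with the standard error of an observed proportion, sqrt(p(1 - p) / n), which shrinks as the number of rows n grows. The sketch below is only an illustration under stated assumptions: the 2% response rate and the row counts are hypothetical, and the formula assumes the records are independent.

```python
import math


def response_rate_standard_error(rate, n_rows):
    """Standard error of an observed proportion from n_rows independent records."""
    return math.sqrt(rate * (1.0 - rate) / n_rows)


assumed_rate = 0.02  # hypothetical 2% response rate, for illustration only
for n_rows in (1_000, 100_000, 10_000_000):
    se = response_rate_standard_error(assumed_rate, n_rows)
    print(f"rows={n_rows:>10,}  standard error={se:.5f}")
```

Because the error falls with the square root of the row count, one hundred times more rows gives an estimate that is about ten times tighter. This is why small effects only become distinguishable in large samples.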

Recently, an advanced technology survey by Gartner Group ranked data mining and artificial intelligence among "the five key technologies that will have a profound impact on industry over the next 3 to 5 years", and also placed parallel processing systems and data mining in the top two of the ten emerging technologies on which investment will focus over the next five years. According to recent Gartner HPC research, "with the rapid development of data capture, transmission, and storage technology, users of large systems will increasingly need to adopt new technologies to mine value beyond the market, and to use more extensive parallel processing systems to create new points of business growth."

The techniques most commonly used in data mining are:

Artificial neural networks: Nonlinear predictive models that imitate biological neural networks and recognize patterns through learning.

Decision trees: Tree-shaped structures that represent sets of decisions.

Genetic algorithms: Optimization techniques based on the theory of evolution, which use design methods such as genetic combination, genetic mutation, and natural selection.

Nearest neighbor algorithm: A method of classifying each record in a data set according to the records most similar to it.

Rule induction: Finding and deriving "if-then" rules in the data from a statistical point of view.
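Of the techniques listed above, the nearest neighbor algorithm is the simplest to sketch in a few lines. The example below labels a new record by majority vote among the k most similar records in a labeled history. The function name knn_classify, the two customer features, and the sample data are all assumptions made up for this illustration.

```python
import math
from collections import Counter


def knn_classify(record, labeled_records, k=3):
    """Label `record` by majority vote among its k nearest labeled records."""
    nearest = sorted(
        labeled_records,
        key=lambda item: math.dist(record, item[0]),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical customers: (purchases per year, average order value) and
# whether they responded to a past promotion. Invented for illustration only.
history = [
    ((12, 80.0), "responded"),
    ((10, 95.0), "responded"),
    ((11, 70.0), "responded"),
    ((1, 20.0), "ignored"),
    ((2, 15.0), "ignored"),
    ((3, 30.0), "ignored"),
]

print(knn_classify((9, 85.0), history))
print(knn_classify((2, 25.0), history))
```

In this sketch the features are compared on their raw scales. In practice they would usually be normalized first, so that a feature with large values does not dominate the distance.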

Some of the specialized analytical tools that use the above techniques have been under development for about 10 years, but the volumes of data they handled were usually small. These techniques have now been integrated directly into many large, industry-standard data warehouse and online analytical processing systems.


