Introduction to "SQL Server 2008 Business Intelligence BI" data mining

Source: Internet
Author: User


What exactly is data mining?

Obviously, data mining is not magic. Data mining uses complex mathematical algorithms, together with the computer's computing power, to sift through large volumes of detail data and find meaningful information: patterns, correlations, and clusters in the data. It also frees us from the hard work of doing this kind of number crunching by hand.



So why do we need to know about data mining?

Relational database systems are adept at recording daily business transactions and accumulating large amounts of data.
Multidimensional data systems summarize the data through aggregation, but the data also multiplies, because it is aggregated along countless dimensions and hierarchies.
When there is too much data to handle, it is dumped onto backup tapes or archived in files, and it is gradually forgotten.

But the problem is that this data is a record of the life of the enterprise. It contains a complete record of how the business has performed in the past and, more importantly, it provides clues about how the business is likely to do in the future, which may be helpful in managing it.

How do you derive meaningful information from all of this data?

We use data mining to find associations in the data. When people first look at the data, they may not see these associations right away, but once the associations are found they make the data easier to understand. With a fresh understanding of the data, people can analyze it more effectively and set a direction.
With further observation, we can predict what is hidden beneath the data. Any prediction about an unknown situation carries some risk of being wrong. But when we base our predictions on the patterns that the associations reveal, the chance of being right is high. The more data we have, the more likely we are to make the right predictions.



Does this mean that if we simply apply any one of these complex mathematical algorithms to our data, we will get a wealth of business intelligence (BI)?
Not exactly.

We need to know what data mining algorithms can do for us.

We also need to know what we have to do to get business intelligence from data mining. That is, we need to follow certain steps to prepare the data and the algorithms for the mining process, and we need to evaluate the results so that we can find the gold in the mined gravel.




What can data mining do for us?

1. Classification

Classification is used to predict the value of a discrete attribute, that is, an attribute whose value is one of a set of distinct values.

A simple example: we may divide customers into two categories, low credit risk and high credit risk. If we know how to categorize individuals, businesses, or things, we can make smarter decisions when we deal with them.

First, we choose which classification to make, that is, we select the attribute whose value we want to predict for future transactions (the predictable attribute), such as credit risk.

Then, we look at historical data in which the value of the predictable attribute is already known, along with other attributes such as net assets, annual operating income, and invoice payment history.

Next, we determine from this historical data which attributes best distinguish (the distinguishing attributes) customers with one value of the predictable attribute from customers with another value.

Finally, we use these distinguishing attributes to predict the value of the predictable attribute in future transactions.
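
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm SQL Server uses: the customer records, the attribute names (net_assets, pays_on_time) and the helper functions are all hypothetical, and information gain is just one common way to score how well an attribute distinguishes the classes.

```python
from collections import Counter, defaultdict
from math import log2

# Hypothetical historical data: credit_risk is the predictable attribute
# and its value is already known for these past customers.
HISTORY = [
    {"net_assets": "high", "pays_on_time": "yes", "credit_risk": "low"},
    {"net_assets": "high", "pays_on_time": "no", "credit_risk": "high"},
    {"net_assets": "low", "pays_on_time": "yes", "credit_risk": "low"},
    {"net_assets": "low", "pays_on_time": "no", "credit_risk": "high"},
    {"net_assets": "high", "pays_on_time": "yes", "credit_risk": "low"},
    {"net_assets": "low", "pays_on_time": "no", "credit_risk": "high"},
]
TARGET = "credit_risk"


def entropy(rows):
    """Impurity of the predictable attribute within a set of rows."""
    counts = Counter(row[TARGET] for row in rows)
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())


def information_gain(rows, attribute):
    """How much splitting on `attribute` reduces the impurity."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder


def train(rows):
    """Pick the most distinguishing attribute and learn one rule per value."""
    attributes = [name for name in rows[0] if name != TARGET]
    best = max(attributes, key=lambda name: information_gain(rows, name))
    votes = defaultdict(Counter)
    for row in rows:
        votes[row[best]][row[TARGET]] += 1
    rule = {value: c.most_common(1)[0][0] for value, c in votes.items()}
    return best, rule


def predict(model, customer):
    """Predict the predictable attribute for a new customer."""
    best, rule = model
    return rule.get(customer[best])


model = train(HISTORY)
new_customer = {"net_assets": "low", "pays_on_time": "yes"}
print(model[0], predict(model, new_customer))
```

On this toy data, payment history separates the two risk classes perfectly while net assets barely helps, so the sketch picks pays_on_time as the distinguishing attribute.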


2. Regression

Regression is used to predict a continuous value. Measures in general, such as sales, are continuous-valued variables.

When predicting continuous values, regression looks for trends that are likely to persist and repeat over time. For example, sales may be seasonal, with peaks in certain months. Looking at sales figures in the historical data, the regression algorithm finds these peaks and follows the same trend when predicting sales for future years.

Like classification, regression also looks for relationships between the value to predict and other continuous values in the historical data. Oil prices, for example, could have a big impact on sales (as is the case for SUV sellers), and when predicting monthly sales, the regression algorithm could take that month's oil price into account.
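
As a rough illustration of the oil price example, the sketch below fits a straight line by ordinary least squares. The monthly figures are made up, the function names are hypothetical, and a single-predictor linear fit is an assumption chosen for brevity; a real regression model would usually combine several inputs, including seasonality.

```python
# Hypothetical monthly history: oil price and SUV units sold.
OIL_PRICE = [2.0, 2.5, 3.0, 3.5, 4.0]
SUV_SALES = [100.0, 90.0, 80.0, 70.0, 60.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_sales(line, oil_price):
    """Forecast sales for a month with the given oil price."""
    intercept, slope = line
    return intercept + slope * oil_price


line = fit_line(OIL_PRICE, SUV_SALES)
print(line, predict_sales(line, 3.2))
```

The negative slope captures the relationship described above: in this invented data, every increase in the oil price goes with lower SUV sales.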


3. Segmentation

Segmentation is a divide-and-conquer approach to data analysis.

Segmentation divides the data into groups with similar characteristics, so that you can analyze the characteristics of each group in more depth. Looking at companies as groups can reveal things that are not obvious when you look at each company separately.

For example, customer companies could be grouped by number of employees: one group for companies with over 1000 employees, another for companies with 500-999, and so on. You can then look at each group to see how much revenue its customers generate, what types of needs they have, how much management time they take up, and so on.
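
The employee-count example above can be sketched as follows. The company names, figures, and the "under 500" band are hypothetical, and the bands are fixed by hand here; a clustering algorithm would instead discover the groups from the data.

```python
from collections import defaultdict

# Hypothetical customers: (name, number of employees, annual revenue generated).
COMPANIES = [
    ("Acme", 1500, 900.0),
    ("Globex", 2200, 1100.0),
    ("Initech", 750, 300.0),
    ("Umbrella", 520, 500.0),
    ("Hooli", 120, 50.0),
]


def band(employees):
    """Assign a company to an employee-count group."""
    if employees >= 1000:
        return "1000+"
    if employees >= 500:
        return "500-999"
    return "under 500"


def average_revenue_by_band(companies):
    """Summarize each group so the groups can be compared."""
    revenues = defaultdict(list)
    for _name, employees, revenue in companies:
        revenues[band(employees)].append(revenue)
    return {group: sum(values) / len(values) for group, values in revenues.items()}


summary = average_revenue_by_band(COMPANIES)
print(summary)
```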


4. Association

Association first requires some kind of grouping in the data.

The association algorithm examines the groupings found in historical data to find patterns among the members of each group. For example, if items A, B, and C appear together in a very large number of groups, then based on that pattern you can predict the composition of future groups: if A and B are in a group, C is very likely to be added. The example we are most familiar with is purchase recommendations generated by association ("people who bought book YY also bought book XX...").
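
The A, B, C example above boils down to a simple ratio, often called the confidence of a rule: of all the groups that contain A and B, what fraction also contains C? A minimal sketch, using made-up baskets and a hypothetical confidence function:

```python
# Hypothetical historical groups (for example, items bought together).
BASKETS = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "D"},
    {"A", "B", "C"},
]


def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    antecedent = set(antecedent)
    matching = [basket for basket in baskets if antecedent <= basket]
    if not matching:
        return 0.0  # the pattern never occurred, so there is nothing to predict from
    with_consequent = [basket for basket in matching if consequent in basket]
    return len(with_consequent) / len(matching)


rule_strength = confidence(BASKETS, {"A", "B"}, "C")
print(rule_strength)
```

A high value suggests that when A and B show up in a future group, recommending C is a reasonable bet. Real association algorithms also check how often the pattern occurs at all, so that a rule seen only once or twice is not trusted.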


5. Sequence analysis

Sequence analysis is used to examine the order of stops along a route.

First, the algorithm processes data from past routes.

Then, the algorithm can predict future routes. Given the current stop, the algorithm can determine the likelihood of continuing along a given route.

Sequence analysis is often used for website navigation: a user who is on page A is likely to browse page B next, and less likely to browse page C.

Sequence analysis can also be applied to other types of events that occur in sequence. Customers may purchase products or use services in a specific order, and we can analyze the data to determine which products a customer may buy next, or which services they are interested in.
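
The page navigation example can be sketched by counting, for each page, which page users visited next. The sessions below are invented and the function name is hypothetical; looking only at the current stop (a first-order model) is a simplifying assumption, not necessarily what a production algorithm does.

```python
from collections import Counter, defaultdict

# Hypothetical past routes: the ordered pages visited in each session.
SESSIONS = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C"],
    ["B", "C"],
    ["A", "B", "D"],
]


def transition_probabilities(sessions):
    """Estimate P(next page | current page) from past sessions."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    return {
        current: {page: n / sum(nexts.values()) for page, n in nexts.items()}
        for current, nexts in counts.items()
    }


probabilities = transition_probabilities(SESSIONS)
print(probabilities["A"])
```

In this toy data a user on page A goes to page B three times out of four and to page C once, matching the "likely B, less likely C" situation described above.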



What are the steps of data mining?

1. Problem definition

2. Data preparation

3. Training

4. Verification
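
One way to picture how the four steps fit together is the sketch below. Everything in it is an assumption made for illustration: the records, the function names, the 25% hold-out split, and the very simple "most common outcome per value" model.

```python
import random
from collections import Counter, defaultdict

# 1. Problem definition (hypothetical): predict credit risk from payment history.
# Each record is (pays_on_time, credit_risk); None marks a missing value.
RECORDS = [("yes", "low")] * 4 + [("no", "high")] * 4 + [("yes", None)]


def prepare(records, holdout_fraction=0.25, seed=0):
    """2. Data preparation: drop incomplete records, then split the rest."""
    clean = [record for record in records if None not in record]
    random.Random(seed).shuffle(clean)
    cut = int(len(clean) * (1 - holdout_fraction))
    return clean[:cut], clean[cut:]


def train(records):
    """3. Training: learn the most common outcome for each attribute value."""
    votes = defaultdict(Counter)
    for value, outcome in records:
        votes[value][outcome] += 1
    return {value: c.most_common(1)[0][0] for value, c in votes.items()}


def verify(model, records):
    """4. Verification: share of held-out records that are predicted correctly."""
    hits = sum(model.get(value) == outcome for value, outcome in records)
    return hits / len(records)


training_set, test_set = prepare(RECORDS)
model = train(training_set)
accuracy = verify(model, test_set)
print(len(training_set), len(test_set), accuracy)
```

The key design point is that verification uses records the model never saw during training, which is what lets us judge whether the patterns it found are worth trusting.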



Finally, as I said before, data mining is not magic. It does not predict events because it can see the future. Instead, it is just a mathematical way to analyze what has happened in historical data and to determine what is most likely to happen if the current trend continues.

There is always the chance that some external factor will cause the current trend to change. And even if everyone else follows the current trend, the users, buyers, or prospects we are trying to analyze may be the ones who do not.

Still, with data mining we can at least determine, with a certain degree of confidence, what the current trends are. Then we can make informed decisions based on these trends. Without data mining, we would not know about these trends and associations, and we could only operate on intuition.

