Data mining is one of the most exciting new features of SQL Server . I view data mining as a process that automates the analysis of data to obtain relevant information, and data mining can be integrated with either relational or OLAP data sources, but the benefits of integration with OLAP are extremely significant. Because the structured data source makes the user no longer need to provide the massive information to the data mining algorithm. Although not an expert, I learned from my colleague Greg Bernhardt that it was enough for me to explain the data mining, and I also wanted to make the data mining feature of the analytics service no longer mysterious and show you how to use data mining in an analytical application.
Data mining capabilities
Data mining makes up for the important deficiencies in the Analysis service function. Microsoft has introduced analysis and exploratory analysis capabilities for specific issues in SQL Server 7.0. In the analysis of a particular problem, the parser has to be clear about what questions the user needs to answer and simply use the Oalp engine to get the relevant information. For example, a fast-food manager might want to know: "What about the sales and profits of hamburgers in the last four quarters?" ”
In exploratory analysis, the parser may have an understanding of the user's interests, but no need to answer specific questions. For example, a company may know that some of its retail outlets are unprofitable, but don't know why. In an OLAP cube, the parser obtains the most interesting data by obtaining more details, multidimensional search, and we call this process data surfing.
Data mining is not the same as "specific problem" analysis and exploratory analysis. Through data mining, Analysis Services can browse information, find relevant data and submit data. Data mining can be said to be the ideal partner for exploratory analysis.
SQL Server 2000 implements data mining functionality through the new Api━━ole db for Data Mining (OLE db for DM), a programming interface designed to facilitate the use of data mining for a variety of applications. With Ole DB for DM, Microsoft provides two data mining algorithms (other software vendors can also insert new algorithms). Theoretically, the analytical programs developed using OLE DB for DM are able to use the newly invented algorithm.
The two algorithms included in SQL Sever 2000 are decision trees and groups. The decision tree classifies information into a tree structure to help us predict some of the characteristics of the data. For example, user information, such as income, marital status, and educational attainment, can be referred to a decision tree algorithm to help predict whether the user has a credit risk. You can use clustering algorithms to find natural groupings in your data. For example, all user information can be fed into the clustering algorithm, requiring that all customers be grouped into three groups. The algorithm may find a married, highly educated and poorly paid group, a single, high-income group and a less educated and higher-income group. Through further analysis, it may be found that each group has a specific way of shopping. Using this information, we can carry out high efficiency, targeted advertising activities. Note that the second to third group does not fully use all three entry information, because not all input information is important for grouping. Therefore, the description of the second group does not include education because the grouping is not divided according to educational level.
Using Analysis Services
If the object for data mining is an OLAP cube rather than a relational database, you cannot use OLE DB for DM directly because the profiling service can implement data mining functionality through its own programming interface. On the server side, data mining is available through the Decision support object (Decision Support objects,dso), while on the client side you can use data mining functionality through OLE DB for OLAP or ActiveX MD.
Compared with the clustering algorithm, I prefer the decision tree algorithm, because it can be based on decision tree algorithm for data mining results to create a new "dimension", we can incorporate these "dimensions" into a new virtual cube, you can use the results of data mining to browse existing dimensions.
To create a mining model
To create a mining model, you need to open the Profiling Manager (Analysis Manager), expand the left tree Navigation window, and open the FoodMart 2000 database. You will see the Mining Model folder (the mining model defines the specific data that is being mined and the type of prediction that is made based on that information). , right-click the folder, and select the new mining Model (Mining) to run the Mining Model Wizard. The wizard first requires you to choose whether to do data mining in relational or multidimensional datasets. When you select multidimensional, click the Next button. Then select the dataset to be mined (in this case, select the sale DataSet). The third step of the wizard is to select the Data mining algorithm, select Microsoft Decision trees (Microsoft's decision tree), and then click the Next button.
The next step is to select the Data mining container (the container is the data entity for the new mining multidimensional model mining). Select the customers peacekeeping name layer. Next, select the type of forecast you want to make. Assuming that the data mining algorithm has input and output, this example takes the user's relevant information as input, the yearly income as the predicted entity or output. Yearly income is a member property of the Customer dimension name layer. In turn, select a member of the "Case level" (the members property of the container layer) ―> yearly income, as shown in Figure 1.
The next step is to select that portion of the data in the OLAP cube that you want to use to forecast yearly income. Select all layers in the Customers dimension, customers dimension, and the member attribute in the name layer, as shown in Figure 2 (these are the default settings). Note that yearly income is both input and output, because we are training the mining model. For training the mining model, the algorithm needs the correct answer. (for example, actual yearly income for existing users)
(Figure 2)
Click the Next button to create an OLAP peacekeeping virtual cube. The dialog box settings are shown in Figure 3. Mining wizard mode will create a virtual cube based on the Sales cube, add a new data mining dimension named Predictincome, and click Next to name the mining model and decide whether to execute it immediately. We'll name the mining model established as Incommodel, click the Save button and execute it immediately. When the Finish button is clicked, the profiling service processes the data and displays the results of the mining in the mining Model Editor.