Data Mining Algorithms (Analysis Services)
A data mining algorithm is a set of heuristics and calculations that creates a data mining model from data. To create a model, the algorithm first analyzes the data you provide, looking for specific types of patterns and trends. The algorithm uses the results of this analysis to define the best parameters for creating the mining model. These parameters are then applied across the entire data set to extract actionable patterns and detailed statistics.
The mining models that the algorithm creates based on your data can take many forms, including:
- A set of clusters that describe how the cases in the dataset are related.
- A decision tree that predicts an outcome and describes how different criteria affect that outcome.
- A mathematical model that forecasts sales.
- A set of rules that describe how products are grouped together in a transaction, and the probabilities that products are purchased together.
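As a toy illustration of the analyze, parameterize, and apply sequence described above, the following sketch fits a straight line to sales figures by ordinary least squares and then uses the resulting parameters to forecast. The data, the function names `fit_line` and `predict`, and the choice of least squares are assumptions made for illustration; this is not how Analysis Services implements any of its algorithms.

```python
from statistics import mean


def fit_line(xs, ys):
    """Analyze the data and return (slope, intercept) minimizing squared error."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(params, x):
    """Apply the learned parameters to a single input."""
    slope, intercept = params
    return slope * x + intercept


# Hypothetical data: month number and units sold.
months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

params = fit_line(months, sales)                # analysis yields the parameters
fitted = [predict(params, m) for m in months]   # parameters applied to all data
forecast = predict(params, 6)                   # forecast for the next month

print(params, forecast)
```

Here the "model" is just the pair of parameters; the richer model types listed above follow the same pattern of learning a compact description from the data and then applying it.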
Microsoft SQL Server Analysis Services provides a variety of algorithms for use in data mining solutions. These algorithms are implementations of some of the most popular methods used in data mining. All of the Microsoft data mining algorithms can be customized and are fully programmable, either through the provided APIs or through the data mining components in SQL Server Integration Services.
You can also use third-party algorithms that comply with the OLE DB for Data Mining specification, or develop custom algorithms that can be registered as services and then used within the SQL Server Data Mining framework.
Choosing the best algorithm for a particular analysis task can be challenging. You can use different algorithms to perform the same business task, but each algorithm produces a different result, and some algorithms can produce more than one type of result. For example, you can use the Microsoft Decision Trees algorithm not only for prediction, but also as a way to reduce the number of columns in a dataset, because the decision tree can identify columns that do not affect the final mining model.
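To make the column-reduction idea concrete, the sketch below scores each input column by information gain, a common textbook splitting criterion for decision trees, and flags columns whose gain is zero. The rows, column names, and helper functions are hypothetical, and this is not claimed to be the scoring that the Microsoft Decision Trees algorithm uses internally.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column, target):
    """Reduction in entropy of `target` after splitting the rows on `column`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[column], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical cases: does a customer buy a bike?
rows = [
    {"owns_home": "yes", "region": "north", "buys_bike": "yes"},
    {"owns_home": "yes", "region": "south", "buys_bike": "yes"},
    {"owns_home": "no", "region": "north", "buys_bike": "no"},
    {"owns_home": "no", "region": "south", "buys_bike": "no"},
]

gains = {c: information_gain(rows, c, "buys_bike") for c in ("owns_home", "region")}
# Columns that tell us nothing about the target are candidates for removal.
uninformative = [c for c, g in gains.items() if g < 1e-9]

print(gains, uninformative)
```

In this toy data `owns_home` fully determines the outcome while `region` carries no information, so `region` is the column a tree would never split on.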
Selecting an algorithm by type
Analysis Services includes the following algorithm types:
- Classification algorithms predict one or more discrete variables, based on the other attributes in the dataset.
- Regression algorithms predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset.
- Segmentation algorithms divide data into groups, or clusters, of items that have similar properties.
- Association algorithms find correlations between different attributes in a dataset. The most common application of this kind of algorithm is creating association rules, which can be used in a market basket analysis.
- Sequence analysis algorithms summarize frequent sequences or episodes in data, such as a Web path flow.
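For the association type in the list above, the two standard measures behind an association rule are support (how often an itemset appears) and confidence (how often the consequent appears when the antecedent does). The following is a minimal sketch of both on made-up baskets; the function names and data are assumptions, and it does not reproduce the Microsoft Association algorithm.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"milk", "jam"},
]

pair_support = support(baskets, {"bread", "butter"})
rule_confidence = confidence(baskets, {"bread"}, {"butter"})

print(pair_support, rule_confidence)
```

A rule such as "bread implies butter" is usually kept only if both measures exceed thresholds chosen by the analyst.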
However, there is no need to limit yourself to a single algorithm in your solution. Experienced analysts sometimes use one algorithm to determine the most effective inputs (that is, variables), and then apply a different algorithm to predict a specific outcome based on that data. SQL Server Data Mining lets you build multiple models on a single mining structure, so within a single data mining solution you can use a clustering algorithm, a decision trees model, and a Naive Bayes model to get different views of your data. You can also use multiple algorithms within a single solution to perform separate tasks: for example, you can use regression to obtain financial forecasts, and use a neural network algorithm to analyze the factors that influence sales.
Selecting an algorithm by task
To help you select an algorithm for a specific task, the following table suggests the types of tasks for which each algorithm is traditionally used.
| Task | Examples | Microsoft algorithms that you can use |
| --- | --- | --- |
| Predicting a discrete attribute | Flag the customers in a prospective buyers list as good or poor prospects. Calculate the probability that a server will fail within the next 6 months. Categorize patient outcomes and explore related factors. | Decision Trees algorithm; Naive Bayes algorithm; Clustering algorithm; Neural Network algorithm |
| Predicting a continuous attribute | Forecast next year's sales. Predict site visitors based on past historical and seasonal trends. Generate a risk score based on demographic information. | Decision Trees algorithm; Time Series algorithm; Linear Regression algorithm |
| Predicting a sequence | Perform a clickstream analysis of a company's Web site. Analyze the factors that lead to server failure. Capture and analyze sequences of activities during outpatient visits, in order to formulate best practices around common activities. | Sequence Clustering algorithm |
| Finding groups of common items in transactions | Use market basket analysis to determine product placement. Suggest additional products for a customer to purchase. Analyze survey data from visitors to an event to find which activities or booths were correlated, in order to plan future activities. | Association algorithm; Decision Trees algorithm |
| Finding groups of similar items | Create patient risk profile groups based on attributes such as demographics and behaviors. Analyze users by browsing and buying patterns. Identify servers that have similar usage characteristics. | Clustering algorithm; Sequence Clustering algorithm |
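The recommendations in the table above can be captured as a small lookup, which is sometimes handy when scripting model creation. This is only a convenience sketch: the dictionary keys and the `suggest_algorithms` helper are hypothetical names, and the values are the Microsoft algorithm names as display strings, not identifiers accepted by any API.

```python
# Task category -> Microsoft algorithms traditionally used for it.
ALGORITHMS_BY_TASK = {
    "predict a discrete attribute": [
        "Decision Trees", "Naive Bayes", "Clustering", "Neural Network",
    ],
    "predict a continuous attribute": [
        "Decision Trees", "Time Series", "Linear Regression",
    ],
    "predict a sequence": ["Sequence Clustering"],
    "find groups of common items in transactions": [
        "Association", "Decision Trees",
    ],
    "find groups of similar items": ["Clustering", "Sequence Clustering"],
}


def suggest_algorithms(task):
    """Return candidate algorithms for a task, or an empty list if unknown."""
    return ALGORITHMS_BY_TASK.get(task.strip().lower(), [])


print(suggest_algorithms("Predict a sequence"))
```

An unknown task returns an empty list rather than raising, so callers can fall back to trying several algorithms and comparing the resulting models.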
For details about each algorithm, refer to the following article or to the original source in the MSDN technical documentation. Excerpted from MSDN: http://www.datafew.com/archive/160.html
Data Mining Algorithms (Analysis Services): SQL Server-based data mining