Understanding IBM SPSS Modeler database integration and optimization
In the first part of this three-part series on IBM SPSS Modeler database integration, modeling, and optimization, we discussed using database nodes to read data from and write data to a database. In this part, we introduce using SPSS Modeler to build, score, and store models in a database. This integration combines the analytical capabilities and ease of use of SPSS Modeler with the performance of the database, and it lets you model with the native algorithms supplied by the database vendor. From SPSS Modeler you can create a model in the database, browse it in various ways through the SPSS Modeler graphical interface, and use it to score data, so the model can be applied quickly to real-time business decisions. Modeling in a database provides the following benefits:
In-database algorithms are tightly integrated with the database server, which improves performance.
Models built and stored in a database are easier to share and deploy among applications that can access the database.
SPSS Modeler performs data mining directly against the enterprise's existing database, which avoids duplicate investment.
This article assumes that readers know how to set up a database connection and perform other basic operations in SPSS Modeler, such as creating streams and editing nodes.
Enabling modeling in a database
After SPSS Modeler is installed, the database modeling nodes are hidden by default, so we need to enable database modeling in the Helper Applications dialog box (main menu: Tools > Helper Applications). For Microsoft Analysis Services, for example, select Enable Microsoft Analysis Services Integration. At this point we can set default values for the server. Note that settings made in the Helper Applications dialog box can be overridden later in each Analysis Services node. Alternatively, you can simply click OK to close the dialog box without entering server details; in that case, you need to set the server information in each Analysis Services node that you add.
To access Microsoft Analysis Services from SPSS Modeler, some settings are also required in Microsoft SQL Server and Analysis Services, so for now let's click OK to close the Helper Applications dialog box. An additional set of database modeling nodes now appears in the node palette at the bottom of the window: Decision Tree, Clustering, Association Rules, Naive Bayes, Linear Regression, Neural Network, Logistic Regression, Time Series, and Sequence Clustering.
Figure 1. Helper Applications
Integrated modeling with Microsoft Analysis Services, with examples
The following figure illustrates the data flow from the client to the server when in-database mining is managed by SPSS Modeler Server. For model building, data is passed from SPSS Modeler Server to SQL Server, and then from SQL Server to Analysis Services, where the model is built. The resulting model is stored in the Analysis Services database, and only a reference to it is saved in the SPSS Modeler stream. You can then use the model for scoring in Microsoft SQL Server or in SPSS Modeler.
Figure 2. Integration sample
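The flow shown in the figure can also be sketched as a small, self-contained Python toy. This is only an illustration of the idea that the stream keeps a reference while the model itself stays in the database. It does not use any SPSS Modeler or Analysis Services API, and every name in it (ModelRef, AnalysisServicesDb, build_in_database, score, the server and model names) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelRef:
    """What the stream keeps: where the model lives, not the model itself."""
    server: str
    database: str
    model_name: str


@dataclass
class AnalysisServicesDb:
    """Hypothetical stand-in for an Analysis Services database."""
    models: dict = field(default_factory=dict)

    def build(self, name, rows):
        # Toy "model": the mean of the target column, standing in
        # for a real in-database algorithm.
        self.models[name] = sum(r["target"] for r in rows) / len(rows)


def build_in_database(rows, sql_tables, as_db, ref):
    # Step 1: the data is pushed to SQL Server (here, a dict of tables).
    sql_tables["training_data"] = list(rows)
    # Step 2: Analysis Services reads the data from SQL Server and builds the model.
    as_db.build(ref.model_name, sql_tables["training_data"])
    # Step 3: only the reference is handed back to the stream.
    return ref


def score(ref, as_db, rows):
    # Scoring resolves the reference against the database at run time.
    model = as_db.models[ref.model_name]
    return [{**r, "prediction": model} for r in rows]


rows = [{"target": 1.0}, {"target": 3.0}]
sql_tables, as_db = {}, AnalysisServicesDb()
stream_nugget = build_in_database(
    rows, sql_tables, as_db, ModelRef("localhost", "demo_db", "demo_model")
)
scored = score(stream_nugget, as_db, [{"id": 1}])
```

In this sketch, stream_nugget holds only the server, database, and model name, so sharing the stream does not copy the model. Any application that can reach the database could resolve the same reference.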
Prerequisites and settings
Some prerequisites and settings must be in place before the Analysis Services algorithms can be used for database modeling in SPSS Modeler.