IBM SPSS Modeler and Database Integration modeling and optimization (III.)

Last Update:2017-02-27 Source: Internet

Author: User

Tags execution

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

IBM SPSS Modeler and Database Integration modeling and optimization

In the first two parts of this three-part series on IBM SPSS Modeler and Database Integration modeling and optimization, we talked about basic operations and integration modeling related to databases, and this section focuses on performance optimization in the process of using the database for SPSS Modeler integration, Includes the introduction of a variety of functions and the use of skills:

Data manipulation loopback

Model Loopback

Database Plug-ins-Rating service adapters

Calling database functions

Rearrange execution order

Summary of performance promotion skills

This paper assumes that readers are familiar with how to establish database connection and some other basic operations in SPSS Modeler, such as creating data streams, editing nodes and so on.

SQL Loopback Functionality

SQL Loopback is one of the most powerful features of IBM SPSS Modeler, which performs many data preparation and data mining operations directly in the database. As the SQL loopback name literally means, its rationale is to execute it by generating an SQL statement that can be executed inside the database and then loopback to the database. By executing in a database without having to perform on the same computer as IBM SPSS Modeler or IBM SPSS Modeler Server, you can reduce the amount of data that needs to be transferred (such as sampling, summarizing, Connection and so on) and using the high performance of database parallel processing and indexing function, the whole performance optimization of data mining process is obtained. It is for this reason that the design of SPSS Modeler makes the most possible use of SQL loopback functionality. However, due to different database differences, and standard SQL can express the content constraints, in the SPSS Modeler or a part of the operation can not support SQL loopback. If these features are used in streaming files, they will be executed on the SPSS Modeler server side rather than in the database.

If you need to use the SQL loopback feature, you first need to purchase the appropriate license code for this feature, and then enable Generate SQL, see the Help documentation. Of course, all SQL loopback needs to have one or more database source nodes as the data source.

Data manipulation loopback

Data operations that gain performance gains by loopback to a database can be broadly grouped into two categories that reduce the size of data operations and the operations that can take advantage of database performance processing. The former such as selection (select), Summary (aggregate), sampling (sample), filtering (filter) and other operations, such as sorting (sort), field derivation (derive) and so on. We are here to make a simple introduction to the selection exercise. As shown in Figure 1, write out the Modeler sample file drug1n to the database (use DB2 9.7 here) and then read it from the database.

Figure 1. Data Loopback Sample Flow

Assuming that we do not need all the data, such as only data for people older than 45 years, then the expression in the Select node can be written as "Age > 45", as shown in Figure 2.

Data Loopback Sample Select node

As shown in Figure 1, the top toolbar has a purple "SQL" icon, and when you select an executable node (such as a Table node in the figure) in the operation area of the Modeler, the button becomes clickable and clicks to preview the SQL loopback of the stream branch where the currently selected executable node is located. It does not really perform data operations, it simply generates the SQL statements for the current stream branch for further use by the user as a reference, and we will use this preview function repeatedly to generate SQL statements later in this article. After clicking on this icon, the stream will briefly enter the Non-editable state, generate the stream SQL statements, and show whether the convection branch can be loopback. A node highlighted in purple can be echoed to the database for execution, and other unchanged nodes will be executed locally in SPSS Modeler. As shown in Figure 3, the first two nodes in the lower branch of this example can be loopback to the database execution, and the last node is executed locally.

Data Loopback Sample Preview

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More