Introduction
The data mining software IBM SPSS Modeler is known for its user-friendly, visual interface, but its scripting features are rarely documented. In the author's view, scripting exists to automate data processing and analytical modeling. When data processing has to change dynamically, when streams have to run automatically, or when batch tasks have to run unattended, some functions can only be achieved by adding scripts. Scripting is therefore a necessary complement to the user interface, not merely a coded equivalent of mouse operations.
The scripting user guide that ships with SPSS Modeler is not organized around the common scenarios in which scripts are used, which makes it inconvenient as a reference for script writers. It also lacks complete, practical examples: most of the examples it gives simply reproduce common user interface operations. In practice, scripts are usually written to provide features that are rarely or never available in the user interface, and the author has often been frustrated at not finding an example to refer to.
This article first introduces five common scenarios that are impossible or inconvenient to implement through the user interface and therefore call for scripting. Each scenario comes with a complete application example that focuses on scripting methods and techniques. The second section summarizes common scripting techniques based on the author's experience. The examples attached to this article come from real projects and were tested in the SPSS Modeler 15.0 environment.
Scenarios for scripting features
When is scripting required? In the author's experience, you should consider scripting when you need to repeat some data processing, when the processing has to change dynamically, when a stream will ultimately be deployed to a third-party environment and must run automatically rather than through mouse actions, when you need to modify existing streams in bulk, or when you need to automate batch tasks.
Data processing for repeated executions
By default, a Modeler stream executes sequentially: a stream made of nodes connected one after another fixes the processing order in advance. In real modeling work, however, some streams need to be executed repeatedly, possibly with different parameters, and doing this by hand is inconvenient. A stream may also need to be repeated according to the value of a variable (a dynamic loop), which can only be implemented with scripting.
The stream shown in Figure 1 comes from a time series model that predicts product sales. Total sales for the next quarter need to be predicted separately for each sales branch (IMT). Because there are many sales branches (21) and the set changes dynamically, the script has to read the IMT values row by row from the Table node's imt_list output and then set the Select1 and IMT nodes for each value, which gives dynamic, repeated data processing. The main trick here is taking the loop values from the Table node.
Figure 1. Looping over values from a Table node
The three nodes inside the box in Figure 1 are the main objects the script works with. The corresponding script reads as follows:
Listing 1. Script content: looping from a Table node
Key points of the script: Execute the Table node to read all the loop variable values. Use the output property of the result object and the value command to read the loop variable values one by one. Use the SET command to dynamically assign values to multiple nodes.
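The three key points above can be illustrated with a minimal sketch in plain Python. It is not Modeler script and does not call any Modeler API; it only models the control flow. The Node class, the helper functions, the property names "condition" and "value", and the sample branch names are all hypothetical stand-ins introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a stream node; not a Modeler class."""
    name: str
    properties: dict = field(default_factory=dict)


def execute_table_node(source_rows):
    """Stand-in for executing the Table node: return its output rows."""
    return [tuple(row) for row in source_rows]


def run_for_each_value(table_output, select_node, imt_node, run_branch, column=0):
    """Read the loop value row by row, set both nodes, then run the branch."""
    results = []
    for row in table_output:
        imt_value = row[column]
        # Assign the current value to several nodes before each run,
        # in the spirit of the SET command described above.
        select_node.properties["condition"] = f"IMT = '{imt_value}'"
        imt_node.properties["value"] = imt_value
        results.append(run_branch(select_node, imt_node))
    return results


def fake_branch(select_node, imt_node):
    """Placeholder for executing the downstream branch; echoes the settings."""
    return (select_node.properties["condition"], imt_node.properties["value"])


# Made-up sample data standing in for the imt_list output.
select1 = Node("Select1")
imt_node = Node("IMT")
rows = execute_table_node([("North",), ("South",), ("East",)])
runs = run_for_each_value(rows, select1, imt_node, fake_branch)
print(runs)
```

Because the list of values is read at run time, adding or removing a sales branch changes the number of iterations without any change to the loop itself, which is the point of taking the loop values from the Table node.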