A data processing solution based on IBM SPSS Data Collection automation scripts


Background: Why data processing is needed after data collection

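The main function of Data Collection is collecting data. Built around a questionnaire, it supports several collection modes, including web, CAPI (computer-assisted personal interviewing), and CATI (computer-assisted telephone interviewing), and it can store data in a variety of formats to meet a wide range of user needs. The product is made up of several tools, including those that appear in the workflow below.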

A typical Data Collection business process looks like this:

Respondents' questionnaires are collected into a database through Interviewer Server, Professional, Interviewer, Paper, or Scan.

In Professional, the data is processed further with scripts.

Finally, reports are produced in Reporter to extract the valuable information in the data and support decision analysis.

Because errors and outliers do occur, data processing is an important step between data collection and reporting: only high-quality data can produce meaningful reports.

What causes erroneous or abnormal data to be stored in the database?

The main causes are as follows:

In paper questionnaire processing based on Paper/Scan, there is no way to prevent invalid input at the moment it is made, so a respondent may, for example, mark several options for a question that allows only one, producing erroneous data.

In telephone interviews based on Interviewer, staff enter answers into the system while asking the questions over the phone, and this process inevitably introduces some manual input errors.

Flawed questionnaire design or incorrect routing logic can also introduce abnormal data.

To keep reports accurate, these abnormal values must first be transformed in some way during data processing.

For example, consider a single-choice question defined with the following five options:

Excellent

Good

So so

Poor

Don't understand

A respondent should choose exactly one option for this question, and later report analysis calculates the proportion of respondents who chose each option. If a respondent selects both Excellent and So so, the answer is clearly abnormal, and including it in later statistical reports would lower their quality.
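The rule "exactly one option per respondent" translates directly into a validity check. The following Python sketch is only an illustration, not Data Collection script code: it assumes each record's answer is held as a set of option labels, and the names OPTIONS and find_invalid_records are hypothetical.

```python
# Hypothetical coding of the example question's five options.
OPTIONS = ["Excellent", "Good", "So so", "Poor", "Don't understand"]


def find_invalid_records(records):
    """Return the ids of records whose single-choice answer has more than
    one option selected or contains a label that is not a defined option.
    records maps a record id to a set of option labels."""
    invalid = []
    for record_id, answer in records.items():
        if len(answer) > 1 or not answer <= set(OPTIONS):
            invalid.append(record_id)
    return invalid


records = {
    1: {"Good"},
    2: {"Excellent", "So so"},  # two options ticked: abnormal
    3: {"Poor"},
}
print(find_invalid_records(records))  # [2]
```

An empty answer is not flagged here, on the assumption that skipped questions are handled separately from multi-select errors.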

This type of abnormal data can be handled mainly in the following ways:

Replace every abnormal value with the "Don't understand" option;

Replace the two values with the value between them, Good;

Choose one of the two values at random;

Use some algorithm to alternate between the two values across records;

Delete the entire record that contains the abnormal data.

All of these methods can be implemented with scripts in Data Collection Professional, following the Data Collection data management process.
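To make the five approaches concrete, here is a Python sketch (not mrScriptBasic or DMS code) that applies any one of them to a set of records. The record layout ({id: set of labels}), the strategy names, the assumption that the first four options form an ordered scale, and the "alternate" rule (cycling through the selected options across successive abnormal records) are all assumptions made for this example.

```python
import random

# Hypothetical coding of the example question; the first four options
# are treated as an ordered scale, "Don't understand" is not.
OPTIONS = ["Excellent", "Good", "So so", "Poor", "Don't understand"]
SCALE = OPTIONS[:4]


def clean(records, strategy, seed=0):
    """Return a cleaned copy of records ({id: set of option labels}) using one
    strategy: 'dont_understand', 'middle', 'random', 'alternate', or 'drop'.
    All labels are assumed to come from OPTIONS."""
    rng = random.Random(seed)  # seeded so the 'random' strategy is reproducible
    abnormal_seen = 0          # counts abnormal records for 'alternate'
    cleaned = {}
    for record_id, answer in records.items():
        if len(answer) <= 1:
            cleaned[record_id] = set(answer)  # valid answer: keep as-is
            continue
        choices = sorted(answer, key=OPTIONS.index)  # deterministic order
        if strategy == "dont_understand":
            cleaned[record_id] = {"Don't understand"}
        elif strategy == "middle":
            if all(c in SCALE for c in choices):
                mid = (SCALE.index(choices[0]) + SCALE.index(choices[-1])) // 2
                cleaned[record_id] = {SCALE[mid]}
            else:
                # no meaningful midpoint off the ordered scale
                cleaned[record_id] = {"Don't understand"}
        elif strategy == "random":
            cleaned[record_id] = {rng.choice(choices)}
        elif strategy == "alternate":
            cleaned[record_id] = {choices[abnormal_seen % len(choices)]}
            abnormal_seen += 1
        elif strategy == "drop":
            pass  # omit the whole record
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return cleaned


records = {1: {"Good"}, 2: {"Excellent", "So so"}, 3: {"Excellent", "So so"}}
print(clean(records, "alternate"))  # {1: {'Good'}, 2: {'Excellent'}, 3: {'So so'}}
```

With the "middle" strategy, Excellent and So so sit at positions 0 and 2 of the scale, so both abnormal records become Good, matching the second approach in the list.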

Introduction to IBM SPSS Data Collection data management

Data management framework in IBM SPSS Data Collection

In IBM SPSS Data Collection, data management, which includes converting data between formats and correcting problem values in individual records, is implemented with DMS scripts. A DMS file is a text file with the .dms extension that can contain all of the tasks for a data management job. It does not introduce a new language; instead it combines languages based on several industry standards:

SQL syntax: used to query data sources through an OLE DB provider.

mrScriptMetadata: used to define new questions that a DMS script adds.

mrScriptBasic: a programming language similar to VBScript, used to write the logic that processes question values during data management.
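The division of labor among these three languages can be mimicked in Python with the standard sqlite3 module. This is an analogy only, not DMS syntax: the in-memory table, its columns, the ";"-separated encoding of multiple answers, and the derived rating_flag variable are all hypothetical.

```python
import sqlite3


def run_pipeline(rows):
    """Analogy of a DMS job: query input cases, add a new variable,
    run per-case cleaning logic, and collect the output cases."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical input source; multiple ticks are stored as "A;B".
    conn.execute("CREATE TABLE responses (respondent_id INTEGER, rating TEXT)")
    conn.executemany("INSERT INTO responses VALUES (?, ?)", rows)

    # The role SQL plays: select the cases to process.
    cases = conn.execute(
        "SELECT respondent_id, rating FROM responses ORDER BY respondent_id"
    ).fetchall()
    conn.close()

    output = []
    for respondent_id, rating in cases:
        answers = rating.split(";")
        single = len(answers) == 1
        # The role mrScriptMetadata plays: every output case gains a new
        # variable, rating_flag. The role mrScriptBasic plays: per-case logic
        # that replaces abnormal values and sets the flag.
        output.append({
            "respondent_id": respondent_id,
            "rating": answers[0] if single else "Don't understand",
            "rating_flag": "ok" if single else "multiple",
        })
    return output


result = run_pipeline([(1, "Good"), (2, "Excellent;So so"), (3, "Poor")])
for case in result:
    print(case)
```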

Data management process in IBM SPSS Data Collection

Figure 1. Data management process in Collection
