Background: Why data processing is needed after data collection
The main function of IBM SPSS Data Collection is, as its name suggests, to collect data. Based on a questionnaire, it supports a variety of collection channels, including Web, CAPI (computer-assisted personal interviewing), and CATI (computer-assisted telephone interviewing), and it can store the collected data in a variety of formats to meet a wide range of user needs.
Typically, the Data Collection business process is as follows:
Respondents' questionnaires are collected into the database through Interviewer Server, Professional, Interviewer, Paper, or Scan;
In Professional, the data is further processed through scripts;
Finally, reports are produced in Reporter to extract the valuable information contained in the data and to support decision analysis.
Because the collected data may contain errors or outliers, data processing is an important step between data collection and reporting: only high-quality data can ensure meaningful reports.
What causes erroneous or abnormal data to be stored in the database? The main sources are as follows:
In a paper questionnaire processing system based on Scan/Paper, there is no way to prevent invalid input at capture time, so misread answer marks can produce erroneous option data;
In telephone interviews based on Interviewer, staff key in the respondents' answers while conducting the interview by phone, and this manual entry inevitably introduces some input errors;
Flawed questionnaire design or incorrect routing logic introduces abnormal data.
To ensure the accuracy of the reports, the first task of data processing is to transform this abnormal data in an appropriate way.
For example, consider a single-choice question whose definition includes the following five options:
Excellent
Good
So so
Poor
Don't understand
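In Data Collection, such a question would normally be defined in metadata. The following mrScriptMetadata sketch shows one plausible definition; the question name Rating and its label are illustrative assumptions, not taken from the article:

```
' Hypothetical single-choice question; [1..1] allows exactly one answer.
Metadata (enu, Question, Label)
    Rating "How would you rate it?" categorical [1..1]
    {
        Excellent,
        Good,
        SoSo "So so",
        Poor,
        DontUnderstand "Don't understand"
    };
End Metadata
```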
A respondent should choose exactly one of these options, and subsequent report analysis computes the proportion of respondents who chose each one. If an interviewee selects both Excellent and So so, the answer is clearly abnormal; if it is included in the later statistical reports, it lowers the quality of the reports.
This type of abnormal data can be handled mainly in the following ways:
Replace all abnormal values with the option Don't understand;
Replace the two selected values with the option between them (here, Good);
Randomly keep one of the two selected values;
Use some algorithm to alternately keep one of the two values;
Delete the entire record that contains the abnormal data.
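As a sketch of the first strategy, a DMS OnNextCase event could replace any multiple-selection answer on the single-choice question with Don't understand. The question name Rating and the use of the AnswerCount function are assumptions for illustration, not code from the article:

```
Event (OnNextCase, "Clean abnormal single-choice answers")
    ' If more than one category was selected on this single-choice
    ' question, fall back to Don't understand (strategy 1 above).
    If Rating.AnswerCount() > 1 Then
        Rating = {DontUnderstand}
    End If
End Event
```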
All of these methods can be implemented through scripts in the Data Collection Professional product, following the Data Collection data management process.
Introduction to IBM SPSS Data Collection data management
The data management framework in IBM SPSS Data Collection
Data management in IBM SPSS Data Collection, which includes both converting data between different formats and transforming problematic question values in individual records, is implemented through DMS scripts. A DMS file is a text file with the suffix .dms that can contain all of the tasks of a data management job. It does not use a new language; instead, it builds on several industry-standard languages:
SQL syntax: used to query data sources through an OLE DB provider;
mrScriptMetadata: used to define new questions (variables) added in a DMS script;
mrScriptBasic: a programming language similar to VBScript, in which the question-value processing logic of data management is written.
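Putting these pieces together, a minimal DMS file typically contains an input data source section, an output data source section, and optional event sections. The sketch below assumes the Data Collection Data File (.ddf/.mdd) format; the section names myInput/myOutput and all file names are hypothetical:

```
' Hypothetical input: a Data Collection data file plus its metadata.
InputDatasource (myInput)
    ConnectionString = "Provider=mrOleDB.Provider.2; _
        Data Source=mrDataFileDsc; _
        Location=input.ddf; _
        Initial Catalog=input.mdd"
End InputDatasource

' Hypothetical output in the same format.
OutputDatasource (myOutput)
    ConnectionString = "Provider=mrOleDB.Provider.2; _
        Data Source=mrDataFileDsc; _
        Location=output.ddf"
    MetaDataOutputName = "output.mdd"
End OutputDatasource

' Per-case processing written in mrScriptBasic.
Event (OnNextCase, "Case cleaning")
    ' Cleaning logic for each record goes here.
End Event
```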
Data management process in IBM SPSS Data Collection
Figure 1. Data management process in IBM SPSS Data Collection