A summary of the important SPSS functions I have used

Source: Internet
Author: User

Part One: SPSS

(1) Using SPSS to remove outliers

Outlier: a measured value in a set of observations whose deviation from the mean exceeds two standard deviations.

First, choose Analyze >> Descriptive Statistics >> Descriptives >> move the variable (column) into the right-hand box >> check "Save standardized values as variables" >> click OK.

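Second, choose Data >> Select Cases, select "If condition is satisfied", click the If button, and enter -2 <= Zvariable & Zvariable <= 2, where Zvariable is the standardized variable saved in the first step (SPSS names it by adding a Z prefix to the original variable name). Click Continue, choose whether the unselected cases are filtered out or deleted, and then click OK.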
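The same two steps (standardize, then keep only cases with a z-score between -2 and 2) can be sketched outside SPSS. This is only an illustration in plain Python: the function name `remove_outliers` and the `heights` data are made up for the example, and it assumes the sample standard deviation (n - 1 denominator) is used for the z-scores.

```python
from statistics import mean, stdev

def remove_outliers(values, threshold=2.0):
    """Keep only values whose z-score lies within [-threshold, threshold]."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator), an assumption
    return [v for v in values if abs((v - m) / s) <= threshold]

# Hypothetical data: nine ordinary heights and one obviously wrong entry.
heights = [160, 162, 158, 161, 159, 163, 157, 160, 161, 210]
cleaned = remove_outliers(heights)
print(cleaned)
```

Note that one extreme value inflates the standard deviation itself, so it can be worth repeating the check after the first pass.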

(2) Correlation analysis

Indicators: the correlation coefficient and the p-value. Sig. is the p-value, which represents the significance of the hypothesis test. Usually, if Sig. < 0.05, we reject the null hypothesis and accept the alternative hypothesis; otherwise there is no sufficient reason to reject the null hypothesis.
For correlation analysis, Sig. < 0.05 is usually the result the researcher wants to see, because it means that the correlation coefficient is statistically significant and that there is a correlation between the variables.


A. Pearson correlation: calculates the correlation coefficient and performs a significance test. It applies when both variables are normally distributed continuous variables or equal-interval measurements.

B. Kendall's tau-b rank correlation: calculates the correlation coefficient and performs a significance test. It places no strict requirement on the data distribution and is suited to testing the degree of correlation between ordinal (rank) variables.

C. Spearman rank correlation: calculates the correlation coefficient and performs a significance test. It places no strict requirement on the data distribution and applies to ordinal variables or to interval variables that do not meet the normality assumption.

For continuous variables measured on a non-equal-interval scale, the distribution is unknown, so rank correlation analysis can be used; Pearson correlation analysis can also be used.

For purely ordinal discrete variables, rank correlation must be used.

When the data do not follow a bivariate normal distribution, or the population distribution is unknown, or the original data are expressed as ranks, it is advisable to use Spearman or Kendall correlation.

In general, we use Pearson: when the data follow a normal distribution, adopt the Pearson correlation coefficient.
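To make the difference between the Pearson and Spearman coefficients concrete, here is a small sketch in plain Python. The function names and the data are invented for illustration; Spearman is computed here as the Pearson correlation of the ranks, with tied values given their average rank. No significance test (the Sig. value) is computed.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(round(r_pearson, 4), round(r_spearman, 4))
```

Because the relationship is monotonic but curved, the rank-based coefficient is exactly 1 while the Pearson coefficient is slightly lower.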

Partial correlation: partial correlation analysis considers whether, apart from the two variables being analyzed, there are other variables that affect both of them. (For example, when analyzing the correlation between height and sprint performance, lung capacity is related to both height and sprint results, so the effect of this variable needs to be removed.)
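For a single control variable, the first-order partial correlation can be computed directly from the three pairwise correlation coefficients. The sketch below uses that standard formula; the function name and the three input correlations are illustrative numbers, not results from any real data set.

```python
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Illustrative values: x = height, y = sprint result, z = lung capacity.
r = partial_corr(r_xy=0.6, r_xz=0.5, r_yz=0.7)
print(round(r, 4))
```

When the control variable is uncorrelated with both variables, the partial correlation equals the ordinary correlation.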

Distance analysis: calculates distance-based similarity and dissimilarity measures between cases.

(3) Regression analysis

Linear regression, nonlinear regression, categorical regression. Definition of linear regression: a statistical method for studying whether there is a linear relationship between one or more independent variables and a dependent variable. Based on the principle of least squares, it gives the best linear unbiased estimate under the classical statistical assumptions.

In the Statistics dialog, Estimates, Model fit, Collinearity diagnostics, and the Durbin-Watson (DW) statistic are generally checked.

In general, tolerance and the variance inflation factor (VIF, the reciprocal of tolerance) serve as the collinearity diagnostic indicators. Tolerance lies between 0 and 1; if the value is too small, there is a collinearity problem between that independent variable and the other independent variables. The larger the VIF, the more pronounced the collinearity problem; a VIF below 10 is the usual criterion (Neter et al., 1985). The DW value is used to test for autocorrelation in the residuals of the regression analysis. DW lies between 0 and 4: with first-order positive autocorrelation of the residuals, DW ≈ 0; with first-order negative autocorrelation, DW ≈ 4; when the residuals are independent, DW ≈ 2. The analysis results (Table 5.3 and Table 5.4) show that each variable's VIF is far less than 10 and that the DW values also meet the requirement, indicating that there is no collinearity problem among the independent variables.
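Both diagnostics are simple to compute by hand once the regression has been run. The sketch below is illustrative only: the function names and the residual sequences are invented, and `vif` assumes you already have the R-squared obtained by regressing one independent variable on the others.

```python
def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def vif(r_squared):
    """VIF = 1 / tolerance, where tolerance = 1 - R^2 of one predictor
    regressed on the remaining predictors."""
    return 1.0 / (1.0 - r_squared)

# Made-up residuals: sign flips every step (negative autocorrelation)
# versus a slow drift (positive autocorrelation).
dw_alternating = durbin_watson([1, -1, 1, -1, 1, -1])
dw_trending = durbin_watson([3, 2, 1, -1, -2, -3])
print(round(dw_alternating, 3), round(dw_trending, 3))
print(round(vif(0.9), 3))  # R^2 = 0.9 sits right at the VIF = 10 rule of thumb
```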

Interpreting the results: first look at R Square in the Model Summary table. This value lies between 0 and 1 and indicates how much of the variation in the dependent variable your equation explains; the closer to 1, the better. Then look at the ANOVA table: the p-value (Sig.) at the end of the Regression row tells you whether the equation as a whole is credible (it should be less than 0.05). Then look at the Coefficients table: the p-values in this table tell you whether each independent variable is credible in the equation, and the table lists the coefficient of each independent variable, both the unstandardized coefficients (mainly look at these) and the standardized coefficients (the coefficients computed after standardizing your data). On the P-P plot, the hollow circles should lie along the line as far as possible; the closer their centers are to the line, the better.

Least squares:
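As a minimal illustration of the least-squares principle, the sketch below fits a straight line with one independent variable and reports R Square. It is not how SPSS is implemented; the function name `fit_line` and the data are made up for the example.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx                 # minimizes the sum of squared residuals
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical data lying close to y = 2x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r_squared = fit_line(x, y)
print(round(slope, 3), round(intercept, 3), round(r_squared, 4))
```

The slope and intercept correspond to the unstandardized coefficients in the Coefficients table, and `r_squared` to R Square in the Model Summary table.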


(4) Descriptive statistics, frequency analysis

Frequencies: the frequency distribution and descriptive statistics of the values of each variable.

Descriptives: mean, standard deviation, variance, range, kurtosis (an indicator of how concentrated a distribution is, that is, how sharp the peak of its curve is), and skewness (an indicator of the degree of asymmetry of a distribution).
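Skewness and kurtosis can be illustrated with the plain moment-based definitions. This is a sketch with invented function names and data; SPSS applies a small-sample correction to both statistics, so its output will differ somewhat from these values on small samples.

```python
def skewness(values):
    """Moment-based skewness: third central moment / variance^1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]
print(skewness(symmetric), round(excess_kurtosis(symmetric), 2))
print(round(skewness(right_skewed), 3))
```

A positive skewness means a longer right tail; a negative excess kurtosis means a flatter peak than the normal curve.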

Explore: the Dependent List holds the variables used as target variables in the exploratory analysis, usually continuous or ratio variables. The Factor List holds the grouping variables: the target variable to be analyzed is split into groups by them, and they are usually of string or numeric type.

P-P plot: tests whether the data follow a specified distribution.

Q-Q plot: tests whether the data follow a specified distribution.

Crosstabs: cross-table analysis is primarily used to test whether two variables are related or independent; the null hypothesis is that there is no relationship between the two variables.
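The usual test behind a cross table is the Pearson chi-square test of independence. The sketch below computes only the statistic and its degrees of freedom; the function name and the two tables are hypothetical, and no continuity correction is applied.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical 2 x 2 tables: one with a clear association, one with none.
related, dof_related = chi_square([[30, 10], [10, 30]])
unrelated, dof_unrelated = chi_square([[10, 20], [20, 40]])
print(related, dof_related)
print(unrelated, dof_unrelated)
```

With 1 degree of freedom the 0.05 critical value is about 3.84, so the first table leads to rejecting independence and the second does not.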

Ratio: calculates statistics that describe the ratio of one variable to another (division; direct comparison).

A P-P plot is drawn from the relationship between the cumulative proportions of a variable and the cumulative proportions of a specified distribution. It can be used to check whether the data conform to the specified distribution: when they do, the points in the P-P plot fall approximately on a straight line.
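The points of a P-P plot against the normal distribution can be computed as below. This is a rough sketch: the function name is invented, the normal distribution is fitted with the sample mean and standard deviation, and `(i - 0.5) / n` is just one common way to estimate the observed cumulative proportion (SPSS offers several proportion-estimation formulas, so its points may differ slightly).

```python
from statistics import NormalDist, mean, stdev

def pp_points(values):
    """Return (observed, expected) cumulative proportions for a normal P-P plot."""
    fitted = NormalDist(mean(values), stdev(values))
    n = len(values)
    points = []
    for i, v in enumerate(sorted(values), start=1):
        observed = (i - 0.5) / n   # assumed plotting position
        expected = fitted.cdf(v)   # cumulative proportion under the fitted normal
        points.append((observed, expected))
    return points

# Synthetic sample built from normal quantiles, so it should hug the diagonal.
sample = [NormalDist(50, 10).inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
points = pp_points(sample)
max_gap = max(abs(obs - exp) for obs, exp in points)
print(round(max_gap, 4))
```

Plotting `observed` against `expected` gives the P-P plot; `max_gap` is the largest vertical distance from the diagonal.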

(5) Parametric and non-parametric tests

Parametric tests require that the sample follow a normal distribution; non-parametric tests are used when the population does not follow a normal distribution or when it is uncertain whether it does.

Parametric tests are statistical tests on parameters such as the mean and variance. They apply when the form of the population distribution is known (for example, the population is normally distributed) and the population parameters are to be inferred from sample data. In this situation the form of the population distribution is given or assumed, but the values or ranges of some of its parameters are unknown, and the main purpose of the analysis is to estimate those parameters or to test hypotheses about them. Problems of this kind are usually handled by statistical inference with parametric tests, which can be used not only to infer the characteristic parameters of one population but also to compare the parameters of two or more populations.

Non-parametric tests:

The most common non-parametric tests include the runs test and the one-sample K-S test.

Runs test:

It is commonly used to detect whether the order in which two different kinds of observation occur is random. Choose Analyze - Nonparametric Tests - Legacy Dialogs - Runs, and move the 0/1 variable column into the Test Variable List in the main dialog. Under Options, check Descriptive and leave the rest at their defaults. All of the cut points can be checked. Read the p-value from the output.
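The idea behind the runs test can be sketched with the large-sample normal approximation. The function name and the 0/1 sequence are made up, no continuity correction is applied, and the code assumes both 0 and 1 occur in the sequence, so the p-value may differ from the SPSS output on small samples.

```python
from math import sqrt
from statistics import NormalDist

def runs_test(seq):
    """Runs test on a 0/1 sequence; returns (number of runs, z, two-sided p)."""
    n1 = sum(1 for v in seq if v == 1)
    n2 = len(seq) - n1
    runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    n = n1 + n2
    mu = 2 * n1 * n2 / n + 1                                  # expected number of runs
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mu) / sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return runs, z, p

mixed = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
runs_mixed, z_mixed, p_mixed = runs_test(mixed)
runs_alt, z_alt, p_alt = runs_test([0, 1] * 10)   # strictly alternating: too many runs
print(runs_mixed, round(p_mixed, 3))
print(runs_alt, round(p_alt, 5))
```

A p-value below 0.05 suggests the order is not random, either because values cluster (too few runs) or alternate (too many runs).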

One-sample K-S test:

This one is more important. The purpose of this test is to examine the distribution of the sample. Whenever we want to run a correlation or regression analysis, it is best to use the K-S test to check the distribution of the sample first. After all, an important condition for the Pearson correlation coefficient to be valid is that the sample follows a normal distribution.

Choose Analyze - Nonparametric Tests - Legacy Dialogs - 1-Sample K-S. In the Test Variable List of the main dialog, select the variables whose distribution we want to test (such as the blood cell counts of a group of people). Under Options, check Descriptive and Quartiles and leave the rest at their defaults. At the bottom, under Test Distribution, there are four check boxes. Pay attention here: Normal means the normal distribution and Uniform means the uniform distribution. Tick the distribution you want to test (usually Normal). Click OK to see the results.
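What the one-sample K-S test measures is the largest gap between the empirical cumulative distribution of the sample and the cumulative distribution being tested. The sketch below computes that statistic D against a normal distribution fitted with the sample mean and standard deviation, and compares it with the large-sample 0.05 critical value of roughly 1.36 / sqrt(n). The function names and data are invented, and because the parameters are estimated from the same sample this cutoff is only a rough, conservative guide; read the actual p-value from the SPSS output.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def ks_statistic(values):
    """Largest distance between the empirical CDF and the fitted normal CDF."""
    fitted = NormalDist(mean(values), stdev(values))
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)  # gap just after and just before each step
    return d

def looks_normal(values):
    """Rough check against the approximate 0.05 critical value 1.36 / sqrt(n)."""
    return ks_statistic(values) < 1.36 / sqrt(len(values))

near_normal = [NormalDist(50, 10).inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
skewed = [1] * 19 + [100]
print(round(ks_statistic(near_normal), 3), looks_normal(near_normal))
print(round(ks_statistic(skewed), 3), looks_normal(skewed))
```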

Non-parametric tests for multiple independent samples:

K-W test: used to determine whether the samples come from the same population.
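The Kruskal-Wallis (K-W) statistic H is built from the rank sums of the groups after all observations are ranked together. Here is a minimal sketch with invented function names and data; it omits the correction for ties, so it matches the textbook formula only when there are no tied values.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def kruskal_wallis(groups):
    """H statistic without tie correction; compare with chi-square, df = groups - 1."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

h_separated = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
h_interleaved = kruskal_wallis([[1, 4, 7], [2, 5, 8], [3, 6, 9]])
print(round(h_separated, 2), round(h_interleaved, 2))
```

With three groups (2 degrees of freedom) the 0.05 chi-square critical value is about 5.99, so only the clearly separated groups are flagged as different.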

Non-parametric tests for two related samples:

Wilcoxon test: used to check whether the distributions of two related variables differ.

Non-parametric tests for multiple related samples:

Friedman test: used to check whether multiple related samples come from the same population; it is an extension of the Wilcoxon test.

Kendall's W test: tests the degree of agreement (concordance) among the samples.

(6) Using SPSS to make predictions

When we create a model with a forecasting method, remember that we must first define the dates (the time series labels) of the data! We need to know the starting point and the time interval of the data.

PASW Statistics offers three main classes of forecasting methods: 1. Expert Modeler, 2. exponential smoothing, 3. ARIMA.

Exponential Smoothing method

Exponential smoothing helps forecast series that exhibit trend and/or seasonality, including data that show both characteristics. Creating the most appropriate exponential smoothing model involves determining the model type (whether the model needs to include trend and/or seasonality) and then obtaining the parameters that best fit the selected model.

To help us find the right model, it is best to plot the time series first. Visual inspection of a time series is often a good guide when choosing a model. In particular, we need to answer the following questions:

• Is there an overall trend in this series? If so, does the trend appear to persist, or does it appear to die out over time?

• Does this series show seasonality? If so, do the seasonal fluctuations grow over time, or do they stay constant?
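For a series with neither trend nor seasonality, the simplest member of the exponential smoothing family updates a single level value. The sketch below shows only that simple model: the function name, the `sales` data, and the hand-picked smoothing parameter `alpha` are assumptions for illustration, whereas SPSS estimates the parameter that best fits the data.

```python
def simple_exponential_smoothing(series, alpha):
    """Level update s_t = alpha * x_t + (1 - alpha) * s_(t-1), starting from s_0 = x_0."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical series with no visible trend or season.
sales = [10, 12, 11, 13, 12]
fitted = simple_exponential_smoothing(sales, alpha=0.5)
forecast = fitted[-1]  # the one-step-ahead forecast is the last smoothed level
print(fitted, forecast)
```

A larger `alpha` makes the forecast react faster to recent observations; a smaller one gives a smoother, slower-moving level. Series with a trend or season need the extended models that add a trend or seasonal component.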

(interpretation of parametric tests in regression analysis and correlation analysis)

(7) Using SPSS for classification

Two-step clustering, K-means, hierarchical clustering, decision trees, k-nearest neighbors

