The System Usability Scale (SUS) is used to measure the ease of use of software, hardware, phones, and websites. It consists of 10 items. Here are 10 things you should know when using SUS:
1. The average SUS score is 68: We collected SUS scores for 500 products and found that the average was 68. Remember: a SUS score is not a percentage. A score of 68 is 68% of the maximum possible score, but it falls at the 50th percentile. It is best to treat the raw SUS score simply as a score; if you want to express it as a percentage, convert the raw score to a percentile rank against the database.
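The conversion from a raw score to a percentile rank can be sketched as below. This is only an illustration: the reference list is made-up data, not the 500-product database, and `percentile_rank` is a hypothetical helper name.

```python
from bisect import bisect_left, bisect_right

def percentile_rank(score, reference_scores):
    """Percent of reference scores below `score`, counting ties as half."""
    ordered = sorted(reference_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)

# Hypothetical reference data (NOT the real benchmark database).
reference = [45, 52, 58, 61, 65, 68, 70, 74, 79, 85, 91]

print(percentile_rank(68, reference))  # 50.0 for this made-up list
```

With a real benchmark database in place of `reference`, the same call would give the percentile that makes a raw score interpretable.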
2. SUS measures usability and learnability: Although SUS was designed to measure usability as a single dimension, we found that two of its items measure learnability: item 4 (I think that I would need the support of a technical person to be able to use this system) and item 10 (I needed to learn a lot of things before I could get going with this system). The following figure shows the relationship between the learnability score and the usability score (for all 10 items, and for the 8 items that remain after the learnability items are removed).
Depending on the type of system being tested and its maturity, measuring learnability may be as important as measuring usability.
3. The costs of reversed items outweigh the benefits: As in many questionnaires, the items in SUS alternate in tone. Odd-numbered items are positively worded and even-numbered items are negatively worded. This approach aims to reduce acquiescence bias and extreme response bias. If you have ever watched someone rush through a questionnaire without reading the items carefully, you will see why this seems like a good idea.
In a paper we published a few years ago, we found no difference in response bias between an all-positive version of the scale and the original scale.
Unfortunately, we did find negative effects of the alternating tone. 11% of researchers miscalculated SUS scores because they forgot to reverse-score the even-numbered items. In addition, 17% of the studies we examined had problems where participants forgot to reverse their answers on the even-numbered items. Despite this disadvantage, the original SUS can still be used; just double-check your item coding and, where possible, follow up with participants when a score appears to be wrong.
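To make the reverse-scoring step concrete, here is a minimal sketch of the standard SUS scoring rule: odd items contribute the rating minus 1, even items contribute 5 minus the rating, and the sum is multiplied by 2.5. The function name `sus_score` is hypothetical, and the sketch assumes ratings are coded 1 to 5 in questionnaire order.

```python
def sus_score(responses):
    """Score one completed SUS questionnaire.

    `responses` holds the 10 item ratings in questionnaire order,
    each an integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += rating - 1   # odd items are positively worded
        else:
            total += 5 - rating   # even items are reverse-scored
    return total * 2.5            # rescale the 0-40 sum to 0-100

print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```

Keeping the reversal inside one function, rather than recoding columns by hand, is one way to avoid the coding mistake described above.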
4. Familiarity leads to satisfaction: We studied SUS scores from software and websites and found that users' prior experience with an application affected the perceived usability measured by SUS. In general, users with more experience tend to rate an application as more usable. The difference is most pronounced between users with the most experience and those with the least (or none at all).
For websites, we found that repeat visitors rated a site 11% higher than first-time visitors. Software shows the same pattern.
5. Usability can predict customer loyalty: Overall, we found that SUS scores explain about 40% of the variance in how likely customers are to recommend a software product or website, as measured by the Net Promoter Score. Detractors had an average SUS score of 67, slightly below the overall average. Promoters had an average SUS score of 82, well above the overall average. Based on a large amount of independent data, we found that you can simply divide the SUS score by 10 to estimate the likelihood-to-recommend rating on the Net Promoter scale (a 10-point scale). For example, with a SUS score of 72, dividing by 10 gives an estimated likelihood to recommend of 7.2.
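The divide-by-10 rule is simple enough to write down directly. The sketch below only restates that rule; the function name is hypothetical, and the result is a rough estimate rather than a measured rating.

```python
def estimated_likelihood_to_recommend(sus):
    """Rough likelihood-to-recommend estimate from the divide-by-10 rule."""
    if not 0 <= sus <= 100:
        raise ValueError("SUS scores range from 0 to 100")
    return sus / 10

print(estimated_likelihood_to_recommend(72))  # 7.2
```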
6. Raw SUS scores are not normally distributed, but the sample mean is: The distribution of raw SUS scores is highly asymmetric (pictured below). This worries some people who are familiar with parametric statistics and normal-distribution theory when they need to use confidence intervals and t-tests to make statistical inferences.
The graph above is a histogram (similar to a bar chart) showing the distribution of 311 SUS scores from one study.
Although most of the statistical procedures we recommend assume a normal distribution, it is only the distribution of the sample mean that needs to be normal. The following figure shows the shape of the distribution of sample means for sample sizes from 8 to 30. In all cases, the distribution of the sample mean is bell-shaped and symmetric, which allows us to obtain confidence intervals and p-values even with small sample sizes.
The illustration above shows 1000 sample means drawn from one dataset at sample sizes of 8, 20, and 30. The sample means form a symmetric bell shape even when the sample size is very small, which makes parametric statistics more feasible and accurate.
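The same effect can be reproduced with a small resampling sketch. Everything about the data here is an assumption: the raw scores are synthetic, drawn from a left-skewed beta distribution to stand in for a real SUS dataset, and skewness is used as a simple numeric stand-in for "how asymmetric the histogram looks".

```python
import random
from statistics import fmean

def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m3 = fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic, left-skewed "raw SUS" scores snapped to the 2.5-point grid.
raw_scores = [round(100 * rng.betavariate(6, 1.5) / 2.5) * 2.5
              for _ in range(2000)]

def sample_means(scores, n, repeats=1000):
    """Means of `repeats` random samples of size n, drawn with replacement."""
    return [fmean(rng.choices(scores, k=n)) for _ in range(repeats)]

means_by_n = {n: sample_means(raw_scores, n) for n in (8, 20, 30)}

print(f"raw scores: skewness = {skewness(raw_scores):.2f}")
for n, means in means_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {skewness(means):.2f}")
```

The skewness of the sample means should sit much closer to zero than that of the raw scores, which is the numeric counterpart of the bell shapes described above.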
7. You can use SUS with a small sample size: In theory, you need at least two users to measure variability (that is, the standard deviation) and to calculate a confidence interval. But we never use SUS with only two users. We do report SUS scores from as few as 5 users.
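A t-based confidence interval for a handful of users can be sketched as follows. The five scores are made up, the function name is hypothetical, and the critical values are the usual two-sided 95% entries from a Student's t table, hard-coded so the sketch needs only the standard library.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}

def sus_confidence_interval(scores):
    """Return (mean, low, high) for a 95% t-interval around the mean."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate variability")
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError("this sketch only tabulates 2 to 10 users")
    m = mean(scores)
    margin = T_CRIT_95[df] * stdev(scores) / sqrt(n)
    return m, m - margin, m + margin

# Hypothetical SUS scores from 5 users.
m, low, high = sus_confidence_interval([72.5, 80.0, 65.0, 77.5, 70.0])
print(f"mean = {m:.1f}, 95% CI = {low:.1f} to {high:.1f}")
```

With only two users the critical value is 12.706, so the interval becomes extremely wide, which is one way to see why two users are rarely enough in practice.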
For early-stage usability research, 5 is a magic number. Confidence intervals can be quite wide, but the average SUS score is remarkably stable. We ran a computer simulation and found that with a sample size of 5, the sample mean stayed within 6 points of the actual mean 50% of the time.
The illustration above shows the difference between the overall SUS mean and the SUS means of samples of size 5. Across 1000 repeated samples, in 50% of them the mean SUS score from 5 users was within 6 points of the actual SUS score. Not bad for small samples.
In other words, if the actual SUS score were 74, the average SUS score from 5 users would fall between 68 and 80 half of the time. The average would be within 10 points 75% of the time and within 17 points 95% of the time. So even with a very small sample, more than half the time your SUS score will land close to the actual score.
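A simulation along these lines can be sketched in a few lines. The population below is synthetic (a left-skewed beta distribution standing in for a real SUS dataset), so the cutoffs it prints will not match the 6, 10, and 17 points reported above; the sketch only shows how such figures are obtained.

```python
import random
from statistics import fmean

rng = random.Random(1)  # fixed seed so the run is repeatable

# Synthetic stand-in for a large set of real SUS scores.
population = [round(100 * rng.betavariate(6, 1.5) / 2.5) * 2.5
              for _ in range(2000)]
true_mean = fmean(population)

# Absolute error of the mean for 1000 samples of 5 users each.
errors = sorted(abs(fmean(rng.choices(population, k=5)) - true_mean)
                for _ in range(1000))

# Error bound that holds for 50%, 75%, and 95% of the samples.
cutoffs = {pct: errors[len(errors) * pct // 100 - 1] for pct in (50, 75, 95)}

for pct, cutoff in cutoffs.items():
    print(f"{pct}% of samples were within {cutoff:.1f} points of the true mean")
```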
8. SUS is not diagnostic: First-time users of SUS are sometimes surprised that it provides no diagnostic information. At best, SUS provides usability and learnability measures that can be compared against industry benchmarks. No SUS item tells you what to change in the interface. This is because SUS, like most questionnaires, is not meant for diagnosis. Diagnosis would take too many questions, and even then it would probably remain unclear whether the search results page or the labels in the product descriptions need improvement. Fortunately, having users attempt some realistic tasks and recording the problems they run into can quickly reveal the areas that affect SUS scores.
9. SUS is technology agnostic: The wording of the SUS items allows it to be applied to any system that users interact with. This means that a company that develops hardware, software, or voice response systems can use SUS as a flexible benchmark across all of them. This flexibility has its price. SUS falls short when you need more specific measures for a technology, such as trust or visual appeal.
10. SUS is not always the best questionnaire: Because SUS is technology agnostic and relatively short, we use other instruments depending on the job.
To measure the usability of a website, we use the 13-item SUPR-Q. Four of its items yield a reliable score equivalent to SUS. The other items measure trust/credibility, appearance, and loyalty.
To measure usability at the task level, we use the Single Ease Question (SEQ).
To measure the perceived usefulness of mobile applications, we use the item "The application's functions meet my requirements", rated on a 5-point scale.
What these scales have in common is that we can compare raw scores against a larger dataset to obtain relative ranks and percentiles, which makes the data more meaningful.