The third chapter continues to review the most basic statistical concepts involved in data analysis, namely standard deviation and standard error.
10 Standard deviation
In probability and mathematical statistics, the standard deviation (symbol \(\sigma\)) is the square root of the variance. It is defined as the square root of the arithmetic mean of the squared deviations of the individual values from their mean. It reflects the degree of dispersion among the individuals within a group: two data sets with the same mean may have different standard deviations. For a discrete random variable \(X\) that takes the values \(x_i\) \((i = 1, 2, \ldots, n)\) with equal probability, and with \(\mu\) its mathematical expectation (mean), the standard deviation can be expressed as: \(\sigma(X) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2}\)
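The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `population_std` and the height values are assumptions made for the example, not part of the original text. The standard-library function `statistics.pstdev` computes the same quantity and is printed alongside for comparison.

```python
import math
import statistics


def population_std(values):
    """Square root of the mean squared deviation from the mean (divides by n)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


# Hypothetical heights in cm, chosen only for illustration.
heights = [160.0, 165.0, 170.0, 175.0, 180.0]

sigma = population_std(heights)
print(sigma)                       # hand-written formula
print(statistics.pstdev(heights))  # standard-library equivalent
```

For these five values the mean is 170 and the mean squared deviation is 50, so both lines print the square root of 50.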
11 Standard error
The standard error refers to sampling error. Countless samples can be drawn from a population, and the data of each sample give an estimate of the population. The standard error describes how well the current sample estimates the population: it measures the error of the sample mean relative to the population mean. It is calculated by dividing the standard deviation of the sample by the square root of the sample size, so it depends strongly on the sample size. The larger the sample, the smaller the standard error and the sampling error, and the better the sample represents the population. If the population follows a normal distribution with mean \(\mu\) and standard deviation \(\sigma\), i.e. \(X \sim N(\mu, \sigma^2)\), then the mean of a sample of size \(n\) follows a normal distribution with mean \(\mu\) and standard deviation \(\frac{\sigma}{\sqrt{n}}\), i.e. \(\bar{X} \sim N(\mu, \frac{\sigma^2}{n})\). Here \(\sigma\) is the standard deviation and \(\frac{\sigma}{\sqrt{n}}\) is the standard error.
By definition, the standard error is a fraction, so its size depends on both the numerator and the denominator. If the numerator (the standard deviation) is small, the standard error is small, and vice versa; if the denominator (the square root of the sample size) is large, the standard error is small, and vice versa. Because the standard deviation of the population (or sample) is determined by the actual distribution of the data and cannot be adjusted at will, increasing the sample size is the effective way to reduce the standard error.
The smaller the standard error, the closer the sample statistic is to the population parameter, the more representative the sample is, and the more reliable the inference about the population parameter drawn from the sample statistic. The standard error is therefore an index of the reliability of statistical inference.
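The dependence of the standard error on the sample size can be illustrated with a short Python sketch. The function names and the value of `sigma` are assumptions made for this example. `standard_error` uses `statistics.stdev`, which divides by n - 1; that is one common convention for the sample standard deviation, and the text does not say which one it intends.

```python
import math
import statistics


def standard_error(sample):
    """Sample standard deviation divided by the square root of the sample size."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def standard_error_from_sigma(sigma, n):
    """Standard error when the standard deviation sigma is already known."""
    return sigma / math.sqrt(n)


sigma = 10.0  # hypothetical standard deviation
for n in (25, 100, 400):
    # Each fourfold increase in n halves the standard error.
    print(n, standard_error_from_sigma(sigma, n))
```

Because the sample size enters through its square root, the sample must be four times as large to cut the standard error in half.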
The relationship and difference between the two
Standard deviation and standard error are both concepts from mathematical statistics. They are similar in name, and both express a degree of dispersion, that is, of variation, around some standard or central value, yet there is a substantial difference between them.
First, consider statistical sampling. In real life or in research we often cannot survey every member of the target group, so we draw some of the members (a sample), investigate them, and analyze the resulting data with statistical principles and methods. The results of that analysis are results for the sample, which are then used to infer the situation of the population. Many samples can be drawn from one population, and the larger the sample, the closer its mean tends to be to the population mean.
The standard deviation represents the degree of dispersion of the sample data. It is the square root of the sample variance and is usually reported together with the sample mean, in the form mean ± standard deviation (\(\mu \pm \sigma\)), which indicates how far, on average, the data lie from the mean. Because it is built from squared deviations, the standard deviation is affected by extreme values. The smaller the standard deviation, the more concentrated the data; the larger it is, the more dispersed the data. The standard deviation is closely tied to the normal distribution: the interval within 1 standard deviation of the mean covers about 68.27% of the area under the normal curve, and the interval within 1.96 standard deviations covers 95%. This plays an important role in equating test scores.
The standard deviation indicates the degree of dispersion of the data, and the standard error indicates the size of the sampling error.
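The areas under the normal curve mentioned above can be checked numerically. The sketch below uses the identity that a normal variable lies within k standard deviations of its mean with probability erf(k / sqrt(2)); the function name `normal_coverage` is an assumption made for this example.

```python
import math


def normal_coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


print(round(normal_coverage(1.0), 4))   # 0.6827
print(round(normal_coverage(1.96), 4))  # 0.95
```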
The standard deviation is the square root of the variance of the sample data and measures the dispersion of the sample data; the standard error is the standard deviation of the sample mean and measures the dispersion of the sample mean. In actual sampling, the sample mean is used to infer the population mean, so the greater the dispersion of the sample mean (the standard error), the larger the sampling error. The size of the sampling error is therefore measured by the standard error.
In practice the standard deviation has two main uses. The first is standardizing a sample: subtracting the sample mean from each observation and dividing by the standard deviation gives values with mean 0 and standard deviation 1, which follow the standard normal distribution when the data are normally distributed. The second is identifying outliers; a common rule flags observations outside the sample mean plus or minus n times the standard deviation. The standard error is mainly used for interval estimation, and a common interval estimate is the sample mean plus or minus n times the standard error.
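The three uses just described (standardization, outlier detection and interval estimation) might be sketched as follows. The function names, the data and the multipliers `k=2.0` and `k=1.96` are all assumptions chosen for illustration; the text only speaks of "n times" the standard deviation or standard error without fixing a value.

```python
import math
import statistics


def z_scores(sample):
    """Standardize: subtract the sample mean, divide by the sample standard deviation."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)
    return [(x - mean) / sd for x in sample]


def outliers(sample, k=2.0):
    """Observations farther than k standard deviations from the sample mean."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)
    return [x for x in sample if abs(x - mean) > k * sd]


def mean_interval(sample, k=1.96):
    """Interval estimate: sample mean plus or minus k standard errors."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - k * se, mean + k * se


# Hypothetical heights in cm; the last value is deliberately extreme.
sample = [168.0, 170.0, 171.0, 169.0, 172.0, 170.0, 169.0, 171.0, 170.0, 195.0]

print(z_scores(sample))
print(outliers(sample))       # [195.0]
print(mean_interval(sample))
```

Note that the extreme value also inflates the mean and the standard deviation used to detect it, which is one reason the choice of multiplier is a judgment call.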
The standard deviation indicates the degree of dispersion of the data, that is, the size of the fluctuations in the data. The standard error indicates the size of the sampling error. An example makes the meaning concrete.
Suppose a school has 1000 students; these 1000 students form the population. To find out the height of all the students, I draw a random sample of 50 people. These 50 people are one sample. Note that a sample is not a single person but the whole group drawn in one round of sampling. A sample can consist of 1 person or of 100 people; 1 and 100 are then the sample sizes.
In theory, the sampling error means the following: suppose I sample not once but 10 times, drawing 50 people each time. I then have 10 means and 10 standard deviations. Picture a large circle representing the population of 1000 people and small circles inside it, each representing one sample of 50 people. Each sample yields one mean and one standard deviation.
Taking the 10 means as the raw data, a mean and a standard deviation can again be calculated, and the standard deviation computed from these 10 means is called the standard error. That is the theoretical meaning. Its practical meaning is the size of the sampling error, that is, how good the drawn sample is: the smaller the standard error, the more representative the sample; the larger, the less representative.
If I measured the height of all 1000 people in the school, there would in theory be no standard error, that is, no sampling error, because I would have measured the entire population. The standard deviation would still exist, because the heights of these 1000 people certainly differ and so fluctuate. This illustrates the difference between standard deviation and standard error well.
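The school example can be simulated as a rough sketch. Everything numeric here is an assumption: the heights are generated from a normal distribution with mean 170 cm and standard deviation 8 cm, and 2000 repeated samples are drawn instead of 10 so that the estimate is stable. Because the students are drawn without replacement from a finite population of 1000, the spread of the sample means is expected to come out slightly below \(\frac{\sigma}{\sqrt{n}}\).

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (cm) of the 1000 students.
population = [random.gauss(170.0, 8.0) for _ in range(1000)]
sigma = statistics.pstdev(population)


def sd_of_sample_means(sample_size, repeats):
    """Draw `repeats` samples without replacement; return the SD of their means."""
    means = [statistics.mean(random.sample(population, sample_size))
             for _ in range(repeats)]
    return statistics.pstdev(means)


empirical_se = sd_of_sample_means(sample_size=50, repeats=2000)
theoretical_se = sigma / math.sqrt(50)
# Measuring all 1000 students every time: every "sample" mean is identical.
census_se = sd_of_sample_means(sample_size=1000, repeats=10)

print(f"population standard deviation: {sigma:.3f}")
print(f"SD of the sample means:        {empirical_se:.3f}")
print(f"sigma / sqrt(50):              {theoretical_se:.3f}")
print(f"SD of the means, full census:  {census_se:.3f}")
```

The census case shows the point made above: with the whole population measured there is no sampling error, while the standard deviation of the heights themselves remains.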