Variable
By the values they can take, variables can be divided into continuous variables and discrete variables.
Continuous variables
A variable that can take any value within a given interval is called a continuous variable. Its values are continuous: the interval between any two values can be subdivided without limit, so the variable can take infinitely many values.
Discrete variables
A discrete variable is a variable whose values can only be counted in natural numbers or integers. For example, the number of enterprises, employees, or pieces of equipment can only be counted in whole units, and the values of such variables are generally obtained by counting.
Data distribution
Characteristics of the data distribution
Central tendency (location)
Dispersion (spread)
Skewness and kurtosis (shape)
First, measures of central tendency
Categorical data: mode
Ordinal data: mode, median, and quartiles
Numerical data: mode, median, quartiles, and mean
Concepts:
Mode (Mo): the value that occurs most frequently in a set of data, i.e., the value with the highest number of repetitions. Questions about the "best" or "most popular" choice are typically answered with the mode.
Median (Me): the value in the middle position after the data are sorted. With 5 values the 3rd value is the median; with 6 values the median is the average of the two middle values.
Quartiles: the values at the 25% and 75% positions after sorting, called the lower quartile (Q_L) and the upper quartile (Q_U).
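As a minimal sketch, Python's standard `statistics` module can compute these three measures. The data list is hypothetical, and note that `statistics.quantiles` uses its default "exclusive" method, which is only one of several quartile conventions, so other tools may give slightly different quartiles.

```python
import statistics

# Hypothetical sample data, chosen only for illustration
data = [2, 3, 3, 5, 7, 8, 9, 10]

mode = statistics.mode(data)        # most frequent value
median = statistics.median(data)    # n is even, so average of the two middle values
# quantiles(n=4) returns the three cut points Q1, Q2 (the median), Q3
q1, q2, q3 = statistics.quantiles(data, n=4)

print(mode, median, q1, q3)  # 3 6.0 3.0 8.75
```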
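Mean: the mean of a random variable is also called its expectation (expected value).
Simple arithmetic mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Weighted mean: $\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$, where $f_i$ is the frequency (weight) of $x_i$
Geometric mean: $G = \sqrt[n]{x_1 x_2 \cdots x_n}$
The geometric mean is mainly used to calculate average growth rates.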
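The three means can be sketched in plain Python as follows. The growth factors and the weighted scores are hypothetical. The example shows why the geometric mean suits growth rates: the arithmetic mean of the yearly growth factors overstates the average growth.

```python
import math

def arithmetic_mean(xs):
    """Simple arithmetic mean: sum of the values divided by their count."""
    return sum(xs) / len(xs)

def weighted_mean(xs, weights):
    """Weighted mean: sum(x_i * f_i) / sum(f_i)."""
    return sum(x * f for x, f in zip(xs, weights)) / sum(weights)

def geometric_mean(xs):
    """n-th root of the product of n positive values, computed via logarithms."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical yearly growth of +10%, +20%, -5%, written as growth factors
factors = [1.10, 1.20, 0.95]
avg_growth = geometric_mean(factors) - 1      # about 0.0784, i.e. 7.84% per year
naive_growth = arithmetic_mean(factors) - 1   # about 0.0833, overstates the growth

# Hypothetical weighted example: two scores with weights 1 and 3
print(weighted_mean([80, 90], [1, 3]))        # 87.5
```

Compounding the geometric average three times reproduces the actual total growth (1.10 × 1.20 × 0.95). The arithmetic average does not.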
Characteristics:
1. Mode
Not affected by extreme values
Not necessarily unique (a data set may have several modes or none)
Suitable when the distribution is strongly skewed
2. Median
Not affected by extreme values
Suitable when the distribution is strongly skewed
3. Mean
Easily affected by extreme values
Has good mathematical properties
Suitable when the data are symmetrically or nearly symmetrically distributed
Relationship:
In a left-skewed distribution the mean lies to the left of the median; in a right-skewed distribution the mean lies to the right of the median.
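A small hypothetical data set illustrates both the relationship and the sensitivity to extreme values listed above. One large value makes the data right-skewed, so the mean ends up to the right of the median. Replacing that value moves the mean but leaves the median unchanged.

```python
import statistics

# Hypothetical right-skewed data: most values are small, one sits far in the right tail
incomes = [1, 2, 2, 3, 3, 3, 4, 20]

mode = statistics.mode(incomes)       # 3
median = statistics.median(incomes)   # 3.0
mean = statistics.mean(incomes)       # 4.75, pulled toward the extreme value 20

# Replace the extreme value: the median does not move, the mean does
adjusted = incomes[:-1] + [5]
print(statistics.median(adjusted), statistics.mean(adjusted))  # 3.0 2.875
```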
Second, measures of dispersion
Dispersion reflects how far the individual values of a variable lie from their central value.
Categorical data: variation ratio
Ordinal data: quartile deviation
Numerical data: range, mean deviation, variance, and standard deviation
Relative position: standard score
Relative dispersion: coefficient of variation
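Concepts:
Variation ratio: the proportion of observations that do not belong to the mode, i.e., the total frequency of all non-modal categories divided by the total frequency: $V_r = \frac{\sum f_i - f_m}{\sum f_i} = 1 - \frac{f_m}{\sum f_i}$, where $f_m$ is the frequency of the mode.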
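The variation ratio can be sketched in Python with `collections.Counter`. The survey answers are hypothetical categorical data.

```python
from collections import Counter

def variation_ratio(values):
    """Share of observations NOT in the modal category: 1 - f_m / sum(f)."""
    counts = Counter(values)
    f_mode = max(counts.values())  # frequency of the mode
    return 1 - f_mode / len(values)

# Hypothetical survey answers (categorical data)
answers = ["tea", "coffee", "tea", "water", "tea", "coffee"]
print(variation_ratio(answers))  # 0.5: tea is the mode (3 of 6), the other half differ
```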
Quartile deviation (interquartile range): the difference between the upper and lower quartiles, $Q_d = Q_U - Q_L$. It reflects the dispersion of the middle 50% of the data.
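The quartile deviation can be sketched with `statistics.quantiles`, assuming its default "exclusive" quartile convention. Other conventions, such as those used in spreadsheets or NumPy, can give slightly different values. The data are hypothetical.

```python
import statistics

def quartile_deviation(values):
    """Q_U - Q_L: spread of the middle 50% of the sorted data."""
    q_l, _, q_u = statistics.quantiles(values, n=4)  # default method="exclusive"
    return q_u - q_l

data = [2, 3, 3, 5, 7, 8, 9, 10]  # hypothetical sample
print(quartile_deviation(data))   # 8.75 - 3.0 = 5.75
```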
Range: the difference between the maximum and minimum values in the data, $R = \max(x_i) - \min(x_i)$.
Variance: the average of the squared differences between each value and the mean. It reflects how far the values deviate from the mean on average.
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$, or for a random variable $\mathrm{Var}(X) = E[(X - E(X))^2]$, where $E(X)$ denotes the mean.
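Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
In statistics, the sum of squared deviations of a sample is divided by the degrees of freedom, n − 1, rather than by n. The degrees of freedom are the number of values that can be chosen freely. Once the sample mean is fixed, the first n − 1 values can be anything, but the last one is then determined, so only n − 1 values are free.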
Standard deviation: the square root of the variance, i.e., the square root of the average of the squared differences between each value and the mean. It is in the same units as the data and reflects the degree of dispersion of the data set.
Population: $\sigma = \sqrt{\sigma^2}$; sample: $s = \sqrt{s^2}$.
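The sketch below computes the range, the population and sample variances, and the standard deviations from their definitions. It then cross-checks the variances against `statistics.pvariance` (divides by N) and `statistics.variance` (divides by n − 1). The data are hypothetical.

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7]  # hypothetical sample

data_range = max(data) - min(data)           # R = max - min

mean = sum(data) / len(data)
ss = sum((x - mean) ** 2 for x in data)      # sum of squared deviations

pop_var = ss / len(data)                     # population variance: divide by N
sample_var = ss / (len(data) - 1)            # sample variance: divide by n - 1
pop_sd = math.sqrt(pop_var)                  # standard deviation = sqrt(variance)
sample_sd = math.sqrt(sample_var)

# Cross-check against the standard library
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(sample_var, statistics.variance(data))
print(data_range, pop_var, sample_var)
```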
Standard score: also called the z-score. It is obtained by dividing the difference between a value and the mean by the standard deviation: z = (x − μ)/σ, where x is a specific value (score). It states how many standard deviations x lies above or below the mean.
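One common use of z-scores is comparing results measured on different scales. In this sketch the two exams, their means, and their standard deviations are all hypothetical.

```python
def z_score(x, mu, sigma):
    """Standard score: how many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical: which result is relatively better?
math_z = z_score(85, mu=75, sigma=5)      # 2.0
english_z = z_score(90, mu=85, sigma=10)  # 0.5
print(math_z, english_z)
```

The English score is higher in raw points, but the math score lies further above its class mean in standard-deviation units.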
Coefficient of variation (also called the dispersion coefficient): the most commonly used form is the standard deviation coefficient, denoted CV (coefficient of variation). It is the ratio of the standard deviation to the mean: CV = σ/μ.
The coefficient of variation measures dispersion relative to the mean and is commonly used to compare the dispersion of two populations. If the two populations have equal means, comparing their coefficients of variation is equivalent to comparing their standard deviations. If the means (or units) differ, the standard deviations cannot be compared directly, and the population with the larger coefficient of variation has the greater relative dispersion.
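The sketch below compares two hypothetical data sets measured in different units. It assumes the population standard deviation (`statistics.pstdev`) for σ. The heights have the larger standard deviation, but the weights have the larger coefficient of variation, i.e. the greater relative dispersion.

```python
import statistics

def coefficient_of_variation(values):
    """CV = sigma / mu, using the population standard deviation."""
    return statistics.pstdev(values) / statistics.fmean(values)

# Hypothetical data in different units
heights_cm = [160, 170, 180]
weights_kg = [3.0, 3.5, 4.0]

cv_h = coefficient_of_variation(heights_cm)  # about 0.048
cv_w = coefficient_of_variation(weights_kg)  # about 0.117
print(cv_h, cv_w)
```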
Distribution
Distributions of discrete variables
1. Two-point (Bernoulli) distribution
2. Binomial distribution
3. Poisson distribution
Distributions of continuous variables
1. Uniform distribution
2. Exponential distribution
3. Normal distribution
4. Standard normal distribution
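For reference, the sketch below writes the standard probability mass functions of the discrete distributions and the density functions of the continuous ones. It uses only the Python standard library, with `statistics.NormalDist` for the standard normal. The function and parameter names are hypothetical, chosen here for illustration.

```python
import math
from statistics import NormalDist

# Discrete distributions: probability mass functions P(X = k)
def bernoulli_pmf(k, p):
    """Two-point (0-1) distribution: P(X=1) = p, P(X=0) = 1 - p."""
    return p if k == 1 else (1 - p if k == 0 else 0.0)

def binomial_pmf(k, n, p):
    """Number of successes in n independent trials, each with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Number of events in a fixed interval when events occur at average rate lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Continuous distributions: probability density functions f(x)
def uniform_pdf(x, a, b):
    """Constant density 1 / (b - a) on the interval [a, b]."""
    return 1 / (b - a) if a <= x <= b else 0.0

def exponential_pdf(x, lam):
    """Density lam * exp(-lam * x) for x >= 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

normal = NormalDist(mu=0, sigma=1)  # standard normal N(0, 1)
print(binomial_pmf(2, 4, 0.5), normal.cdf(1.96))  # 0.375, about 0.975
```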