A dummy variable, also known as a virtual variable or a nominal variable, is an artificially constructed variable used to represent a qualitative attribute. It is a quantified independent variable that usually takes the value 0 or 1. Introducing dummy variables makes a linear regression model somewhat more complex, but it describes the problem more concisely: a single equation can do the work of two, which brings the model closer to reality.
1. Addition Model
For example, suppose we want to regress a person's weight on their height and gender. We define the following variables:
Weight: Weight
Height: Height
Gender: Sex
Common sense tells us that weight and height are continuous values, whereas gender can only be male or female. A categorical attribute like gender cannot be used directly as a numeric input to a regression, so we represent it with dummy variables.
We define two dummy variables for gender (Sex):
isMan: whether the person is male; 1 if male, 0 otherwise.
isWoman: whether the person is female; 1 if female, 0 otherwise.
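As a minimal sketch of how the two indicator definitions above could be computed, the Python below encodes a small, made-up list of Sex labels. The label strings "male"/"female" and the function names is_man/is_woman are illustrative assumptions, not part of the original article.

```python
# Hypothetical sample of the Sex attribute; the labels "male"/"female" are assumed.
sex = ["male", "female", "female", "male"]


def is_man(s: str) -> int:
    """isMan: 1 if the person is male, otherwise 0."""
    return 1 if s == "male" else 0


def is_woman(s: str) -> int:
    """isWoman: 1 if the person is female, otherwise 0."""
    return 1 if s == "female" else 0


is_man_col = [is_man(s) for s in sex]
is_woman_col = [is_woman(s) for s in sex]
print(is_man_col)    # [1, 0, 0, 1]
print(is_woman_col)  # [0, 1, 1, 0]
```

For every row exactly one of the two indicators is 1, so the two columns always sum to 1.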
The regression equation can then be written as:
weight = a + b*height + c*isMan
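Here we use only one of the two gender dummy variables, isMan. In general, if a categorical variable takes n values (here n = 2: male and female), only n-1 dummy variables (here isMan) should appear in the regression equation. If all n were included alongside the intercept a, they would be perfectly collinear, because the n dummies always sum to 1 (the so-called dummy variable trap).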
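The n-1 rule can be sketched in plain Python as follows. dummy_encode is a hypothetical helper written for this example (not from any library). It treats the first listed category as the baseline and emits one 0/1 column for each remaining category. The second half checks, on made-up data, that keeping all n dummies reproduces the intercept column.

```python
def dummy_encode(values, categories):
    """Return n-1 dummy columns for a categorical column with n categories.

    categories[0] is the reference (baseline) category and gets no column;
    a row in that category is all zeros across the returned columns.
    """
    return {c: [1 if v == c else 0 for v in values] for c in categories[1:]}


sex = ["male", "female", "female", "male"]

# With "female" as the reference, only an isMan-style column remains.
dummies = dummy_encode(sex, ["female", "male"])
print(dummies)  # {'male': [1, 0, 0, 1]}

# The trap: with all n dummies, their row-wise sum reproduces the
# intercept column of 1s, so the design matrix has linearly dependent columns.
intercept = [1] * len(sex)
all_dummies = [[1 if v == c else 0 for v in sex] for c in ["female", "male"]]
row_sums = [sum(col[i] for col in all_dummies) for i in range(len(sex))]
print(row_sums == intercept)  # True
```

Libraries implement the same idea. For example, pandas get_dummies has a drop_first parameter that drops the first category's column.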
Fitting this equation gives estimates of a, b, and c. Because isMan is either 0 or 1, the term c*isMan equals either c or 0. For men (isMan = 1) the equation therefore becomes:
weight = (a+c) + b*height
and for women (isMan = 0):
weight = (a+0) + b*height
In this model, gender affects only the intercept of the regression equation. The two fitted lines share the slope b, so they are parallel and separated vertically by c. We call this the addition model of dummy variables.
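To see the addition model in action, here is a minimal sketch that fits weight = a + b*height + c*isMan by ordinary least squares in plain Python. The data are made up and noise-free, generated from the hypothetical coefficients a = -100, b = 0.9, c = 6, so the fit should recover them. ols is a small helper written for this example that solves the normal equations; it is not a library function.

```python
def ols(X, y):
    """Least-squares fit: solve (X^T X) beta = X^T y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    M = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry into the pivot row.
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * p for x, p in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


# Made-up, noise-free data from hypothetical a = -100, b = 0.9, c = 6.
heights = [160, 165, 170, 175, 180, 185]
is_man = [0, 1, 0, 1, 0, 1]
weights = [-100 + 0.9 * h + 6 * m for h, m in zip(heights, is_man)]

# Design matrix columns: intercept, height, isMan (only n-1 = 1 dummy).
X = [[1.0, h, m] for h, m in zip(heights, is_man)]
a, b, c = ols(X, weights)
print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}")  # a=-100.000, b=0.900, c=6.000


def predict(h, man):
    """Fitted weight for height h; man is the isMan value (0 or 1)."""
    return a + b * h + c * man


# The male/female gap is c at every height: the two lines are parallel.
for h in (150, 190):
    print(h, round(predict(h, 1) - predict(h, 0), 6))
```

The gap between the male and female predictions equals c at every height, which is the parallel-lines picture described above. A real analysis would normally use a library solver such as NumPy's numpy.linalg.lstsq rather than forming the normal equations by hand.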
2. Multiplication Model
In the model above, the dummy variable shifts the intercept of the regression equation. We can also build a model in which the dummy variables change the slope instead.
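Let h denote height. By the same n-1 rule, we include only one dummy interaction term. The two terms isMan*h and isWoman*h always sum to h, so b and coefficients on both interactions could not be estimated separately. The regression equation is then:
weight = a + b*h + c*isMan*h
For men (isMan = 1):
weight = a + (b+c)*h
For women (isMan = 0):
weight = a + (b+0)*h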
The two regression lines therefore share the intercept a but have different slopes, so they intersect at a single point (h = 0, weight = a).
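Here is a similar sketch for the multiplication model. It assumes the identifiable single-interaction form weight = a + b*h + c*isMan*h. Because isMan*h and isWoman*h always sum to h, separate coefficients on b, isMan*h, and isWoman*h cannot all be estimated at once. The data are made up and noise-free, generated from the hypothetical coefficients a = -100, b = 0.9, c = 0.05. ols is a small hand-written least-squares helper, not a library function.

```python
def ols(X, y):
    """Least-squares fit: solve (X^T X) beta = X^T y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    M = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting, then eliminate this column from every other row.
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * p for x, p in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


# Made-up, noise-free data from hypothetical a = -100, b = 0.9, c = 0.05.
heights = [160, 165, 170, 175, 180, 185]
is_man = [0, 1, 0, 1, 0, 1]
weights = [-100 + 0.9 * h + 0.05 * m * h for h, m in zip(heights, is_man)]

# Design matrix columns: intercept, h, and the interaction isMan*h.
X = [[1.0, h, m * h] for h, m in zip(heights, is_man)]
a, b, c = ols(X, weights)
print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}")  # a=-100.000, b=0.900, c=0.050


def predict(h, man):
    """Fitted weight for height h; man is the isMan value (0 or 1)."""
    return a + b * h + c * man * h


print("female slope:", b, "male slope:", b + c)
# Both lines pass through (0, a): the single intersection point.
print(predict(0, 1) == predict(0, 0))  # True
```

The female slope is b and the male slope is b + c. Both lines pass through (0, a), which is the single intersection point described above.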
We can also combine the addition and multiplication models, which is called a hybrid model. We won't go into detail here, and will come back to it if we need it later.
[Machine Learning Practice] Regression Techniques: Virtual Variables