[Machine learning practice] Regression techniques: dummy variables

Source: Internet
Author: User

A dummy variable (sometimes translated as "virtual variable", and also called a nominal variable or indicator variable) is an artificial variable used to represent a qualitative attribute. It is a quantified independent variable that usually takes the value 0 or 1. Introducing dummy variables makes a linear regression model somewhat more complex, but it makes the problem description more concise: a single equation can do the work of two equations and stays closer to reality.


1. Addition Model


For example, suppose we want to predict a person's weight from their height and gender using regression. Here we define the variables:

Weight: weight

Height: height

Gender: sex

From common sense, we know that weight and height are continuous numeric values, while gender can only be male or female. A qualitative attribute like this cannot be entered into a regression directly, so we use dummy variables to represent gender.


We define two dummy variables for sex:

isMan: whether the person is male. If male, it is 1; otherwise, it is 0.

isWoman: whether the person is female. If female, it is 1; otherwise, it is 0.
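The encoding described above can be sketched in a few lines of Python. This is only an illustration: the helper name encode_sex and the labels "male" and "female" are assumptions, not something defined in the article.

```python
def encode_sex(sex_values):
    """Turn a list of 'male'/'female' labels into two lists of 0/1 flags."""
    is_man = [1 if s == "male" else 0 for s in sex_values]
    is_woman = [1 if s == "female" else 0 for s in sex_values]
    return is_man, is_woman


# Hypothetical sample of four people.
sex = ["male", "female", "female", "male"]
is_man, is_woman = encode_sex(sex)
print(is_man)    # [1, 0, 0, 1]
print(is_woman)  # [0, 1, 1, 0]
```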


Therefore, we can write the current regression equation as follows:

weight = a + b*height + c*isMan
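Here we use only one of the two dummy variables for sex, isMan. If a qualitative attribute has n possible values (here male and female, so n = 2), only n-1 dummy variables (here isMan) can be written in a regression equation that has an intercept. Otherwise the dummy variables add up to the constant term, and the coefficients cannot be determined uniquely.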
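One way to see why only n-1 dummy variables are used: with two categories, the two 0/1 columns always add up to the constant column that multiplies a, so the second dummy carries no extra information. A minimal check, using made-up sample values:

```python
# Made-up sample: 1 means male, 0 means female.
is_man = [1, 0, 0, 1, 1]
is_woman = [1 - m for m in is_man]  # every person is exactly one of the two
intercept = [1] * len(is_man)       # the constant column that multiplies a

# Row by row, isMan + isWoman equals the intercept column, so including
# both dummies next to an intercept makes the columns linearly dependent.
redundant = all(m + w == one for m, w, one in zip(is_man, is_woman, intercept))
print(redundant)  # True
```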


For the regression equation above, we can estimate the values of a, b, and c. Because isMan is either 0 or 1, the term c*isMan is either c or 0. That is, for a man (isMan = 1) the regression equation can be written as:

weight = (a+c) + b*height
Or, for a woman (isMan = 0):

weight = (a+0) + b*height
Therefore, in the model above, the value of sex affects only the intercept of the regression equation. The two fitted regression lines have the same slope b, so they are parallel. We call this model an addition model of dummy variables.
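As a rough illustration of the addition model, the sketch below fits weight = a + b*height + c*isMan by ordinary least squares on made-up, noise-free data. The data, the true coefficients (-100, 1.0, 8) and the helper fit_least_squares are all assumptions for the example; in practice a numerical library would normally be used for the fit.

```python
def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y,
    solved by Gauss-Jordan elimination. Adequate for a tiny example only."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry to the pivot row.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


# Made-up data generated from weight = -100 + 1.0*height + 8*isMan.
heights = [160, 165, 170, 175, 180, 185]
is_man = [0, 1, 0, 1, 0, 1]
weights = [-100 + 1.0 * h + 8 * m for h, m in zip(heights, is_man)]

# Design matrix columns: intercept, height, isMan (only n-1 = 1 dummy).
X = [[1.0, h, m] for h, m in zip(heights, is_man)]
a, b, c = fit_least_squares(X, weights)

# Up to floating-point error, this recovers a = -100, b = 1.0, c = 8.
print(round(a, 6), round(b, 6), round(c, 6))
```

The fitted line for men has intercept a + c and the line for women has intercept a, while both share the slope b, which is why the two lines are parallel.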



2. Multiplication Model


In the model above, the dummy variable affects the intercept of the regression equation. We can also build a model in which the dummy variables affect the slope of the regression equation instead.

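We can write the regression equation as follows, where h is the height:

weight = a + b*h + c*isMan*h + d*isWoman*h
For a man (isMan = 1, isWoman = 0):

weight = a + (b+c)*h
For a woman (isMan = 0, isWoman = 1):

weight = a + (b+d)*h
Note that because isMan + isWoman = 1, only the two slopes b+c and b+d can be estimated, not b, c, and d separately. Following the n-1 rule for dummy variables, in practice one of the two product terms is dropped (for example, by setting d = 0), so that b is the slope for women and c is the difference in slope for men.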
In this way, the two regression lines have the same intercept a but different slopes, so they intersect at a single point (h = 0).
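The behaviour of the multiplication model can be checked with a small sketch. The coefficient values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not estimates from real data.

```python
def predict_weight(height, is_man, a=10.0, b=0.25, c=0.125, d=0.0625):
    """Multiplication model: weight = a + b*h + c*isMan*h + d*isWoman*h.
    All coefficient defaults are hypothetical illustration values."""
    is_woman = 1 - is_man
    return a + b * height + c * is_man * height + d * is_woman * height


# The gap between the male and female lines is (c - d) * h:
# zero at h = 0, where the lines meet, and growing with height.
for h in (0, 160, 180):
    gap = predict_weight(h, 1) - predict_weight(h, 0)
    print(h, gap)
# 0 0.0
# 160 10.0
# 180 11.25
```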



Of course, the addition and multiplication models can also be used together, so that the dummy variable affects both the intercept and the slope. This is called a hybrid model. We will not describe it in detail here; if we need it later, we will explain it then.
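For orientation only, one common way to write such a hybrid model is weight = a + b*h + c*isMan + d*isMan*h. The article does not give this form, so both the equation and the coefficient values in the sketch below are assumptions.

```python
def predict_weight_hybrid(height, is_man, a=10.0, b=0.25, c=4.0, d=0.125):
    """Assumed hybrid form: weight = a + b*h + c*isMan + d*isMan*h.
    c shifts the intercept for men, d shifts the slope for men."""
    return a + b * height + c * is_man + d * is_man * height


# The gap between the two lines is c + d*h, so it is c at h = 0
# and changes with height: the lines are neither parallel nor
# forced to meet at h = 0.
gap_at_0 = predict_weight_hybrid(0, 1) - predict_weight_hybrid(0, 0)
gap_at_160 = predict_weight_hybrid(160, 1) - predict_weight_hybrid(160, 0)
print(gap_at_0, gap_at_160)  # 4.0 24.0
```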
