One-hot encoding in data processing

Source: Internet
Author: User
Classification methods usually require the attributes of each data item to be converted into a vector representation, so that each item becomes a feature vector and each dimension of the vector represents one attribute.

Suppose the data to be converted has four attributes: gender, height, weight, and age. A is a woman, 168 cm, 70 kg, 30 years old; B is a man, 180 cm, 90 kg, 20 years old. Using the raw numbers directly, with gender coded as 0 for female and 1 for male, gives the vectors 0,168,70,30 and 1,180,90,20. The problem is that 168, 70, and 30 measure different attributes in different units, and, more obviously, the 0/1 gender code is on a very different scale from the other dimensions, so the raw values are not comparable across dimensions.
One remedy is to normalize each dimension so that it becomes dimensionless, meaning that the values of each dimension are rescaled to the range 0 to 1 or -0.5 to +0.5.
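The normalization step can be sketched with simple min-max scaling. This is only an illustration under assumptions: the article does not say which normalization formula it has in mind, and the function name `min_max_scale` and its `low`/`high` parameters are made up for this example.

```python
# Numeric columns from the example: height (cm), weight (kg), age (years).
rows = [
    [168.0, 70.0, 30.0],  # A
    [180.0, 90.0, 20.0],  # B
]


def min_max_scale(rows, low=0.0, high=1.0):
    """Linearly rescale every column of `rows` into the range [low, high].

    Hypothetical helper for illustration; min-max scaling is an assumed
    choice, not something the article specifies.
    """
    scaled_columns = []
    for column in zip(*rows):  # iterate column by column
        col_min, col_max = min(column), max(column)
        span = col_max - col_min
        if span == 0:
            # A constant column carries no information; map it to the midpoint.
            scaled_columns.append([(low + high) / 2.0] * len(column))
        else:
            scaled_columns.append(
                [low + (v - col_min) / span * (high - low) for v in column]
            )
    # Turn the scaled columns back into rows.
    return [list(row) for row in zip(*scaled_columns)]


scaled_01 = min_max_scale(rows)                   # each column in [0, 1]
scaled_centered = min_max_scale(rows, -0.5, 0.5)  # each column in [-0.5, 0.5]
```

With only two people in the data, every column collapses to its two extremes, which makes it easy to see that all dimensions now share the same scale.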
However, this is still not enough: a gender code of 0 or 1 is a category label rather than a quantity, so it still cannot be compared meaningfully with the other dimensions. This is where one-hot encoding comes in: each attribute gets its own code in which exactly one position is active (non-zero) at a time. The gender of A becomes "1,0" and the gender of B becomes "0,1". Age, weight, height, and so on can be encoded the same way by treating their values as enumerated categories. The code does not need a position for every possible value, only for the categories that actually appear in the data; for example, if only three heights occur, a height needs only three positions, such as [0,0,1].
Then the codes of the individual attributes are concatenated into one very sparse feature vector; for example, concatenating the gender code "0,1" with the height code "0,0,1" gives "0,1,0,0,1". This keeps the values of the different attributes separated from one another.
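The encoding and concatenation can be sketched as follows. This is a minimal illustration, not a reference implementation: the helper names `one_hot` and `encode_records` are made up here, the third record C (height 175) is a hypothetical addition so that three heights appear in the data, and sorting the categories is just one possible way to fix the position order.

```python
def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unseen category: {value!r}")
    return [1 if c == value else 0 for c in categories]


def encode_records(records, attributes):
    """One-hot encode each attribute and concatenate the codes per record.

    Categories are taken from the values that actually occur in `records`,
    in sorted order (an assumed convention for this sketch).
    """
    categories = {a: sorted({r[a] for r in records}) for a in attributes}
    encoded = []
    for record in records:
        vector = []
        for attribute in attributes:
            vector.extend(one_hot(record[attribute], categories[attribute]))
        encoded.append(vector)
    return categories, encoded


records = [
    {"sex": "female", "height": 168},  # A
    {"sex": "male", "height": 180},    # B
    {"sex": "female", "height": 175},  # C: hypothetical, not in the article
]

categories, encoded = encode_records(records, ["sex", "height"])
# encoded[1] is B: sex "0,1" followed by height "0,0,1"
```

Under these assumptions B is encoded as [0, 1, 0, 0, 1], matching the concatenated example in the text, and every vector has exactly one active position per attribute.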

Related references:
http://blog.csdn.net/google19890102/article/details/44039761
