One-hot encoding

Source: Internet
Author: User

Transferred from: http://blog.sina.com.cn/s/blog_5252f6ca0102uy47.html

Origin of the problem

In many machine learning tasks, features are not always continuous values; they may also be categorical values.

For example, consider the following three features:

["male", "female"]

["from Europe", "from US", "from Asia"]

["uses Firefox", "uses Chrome", "uses Safari", "uses Internet Explorer"]

Representing these features as numbers makes them much more efficient to process. For example, if each value is replaced by its position in its list:

["male", "from US", "uses Internet Explorer"] is expressed as [0, 1, 3]

["female", "from Asia", "uses Chrome"] is expressed as [1, 2, 1]
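The mapping can be sketched in plain Python. The category lists come from the text; the names `FEATURES` and `encode_sample` are made up for this illustration, and each category is simply assigned its position in its list.

```python
# Category lists from the text; the index of each value is its integer code.
FEATURES = [
    ["male", "female"],
    ["from Europe", "from US", "from Asia"],
    ["uses Firefox", "uses Chrome", "uses Safari", "uses Internet Explorer"],
]

def encode_sample(sample):
    """Replace each categorical value with its index in the matching list."""
    return [values.index(value) for values, value in zip(FEATURES, sample)]

print(encode_sample(["male", "from US", "uses Internet Explorer"]))  # [0, 1, 3]
print(encode_sample(["female", "from Asia", "uses Chrome"]))         # [1, 2, 1]
```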

However, even after conversion to numbers, the data above cannot be fed directly to a classifier. Classifiers usually assume by default that the data is continuous and ordered. As described above, however, these numbers carry no order; they are assigned arbitrarily.

One-hot encoding

One possible solution to the problem above is one-hot encoding.

One-hot encoding, also known as one-bit-effective encoding, uses an N-bit status register to encode N states: each state has its own independent register bit, and at any time only one of those bits is valid.

For example:

Natural binary code: 000, 001, 010, 011, 100, 101

One-hot code: 000001, 000010, 000100, 001000, 010000, 100000
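Both code tables for six states can be reproduced with a few lines of Python. This is only a sketch; the function names `binary_code` and `one_hot_code` are hypothetical, and the one-hot bit is counted from the right to match the listing above.

```python
def binary_code(state, n_states):
    """Natural binary code: the state index written in the fewest bits needed."""
    width = max(1, (n_states - 1).bit_length())
    return format(state, f"0{width}b")

def one_hot_code(state, n_states):
    """One-hot code: n_states bits with a single 1, at position `state` from the right."""
    return format(1 << state, f"0{n_states}b")

n = 6
print([binary_code(s, n) for s in range(n)])
# ['000', '001', '010', '011', '100', '101']
print([one_hot_code(s, n) for s in range(n)])
# ['000001', '000010', '000100', '001000', '010000', '100000']
```

Six states need only 3 bits in natural binary but 6 bits in one-hot form; the extra bits buy the property that exactly one bit is set per state.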

In other words, a feature with m possible values becomes m binary features after one-hot encoding. These features are mutually exclusive, with exactly one active at a time, so the resulting data is sparse.

The main benefits are:

    1. It solves the problem that classifiers do not handle categorical data well.

    2. To some extent, it also expands the feature set.
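The first benefit can be illustrated by comparing distances between categories. The sketch below assumes the browser codes from the earlier example (Firefox 0, Chrome 1, Safari 2, Internet Explorer 3); the helpers `one_hot` and `squared_distance` are hypothetical names introduced only for this illustration.

```python
# Arbitrary integer codes for the browser feature, as in the earlier example.
codes = {"Firefox": 0, "Chrome": 1, "Safari": 2, "Internet Explorer": 3}

def one_hot(name):
    """One-hot vector for a browser: 1 at its code position, 0 elsewhere."""
    return [1 if i == codes[name] else 0 for i in range(len(codes))]

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

# With integer codes, Firefox looks three times farther from Internet Explorer
# than from Chrome, although the numbering carries no such meaning.
print(abs(codes["Firefox"] - codes["Chrome"]))             # 1
print(abs(codes["Firefox"] - codes["Internet Explorer"]))  # 3

# With one-hot vectors, every pair of distinct categories is equally far apart.
print(squared_distance(one_hot("Firefox"), one_hot("Chrome")))             # 2
print(squared_distance(one_hot("Firefox"), one_hot("Internet Explorer")))  # 2
```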

Example

Here is a simple example based on Python and scikit-learn:

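from sklearn import preprocessing

# Each row is one sample; each column is one integer-coded categorical feature.
enc = preprocessing.OneHotEncoder()
enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])

# transform() returns a sparse matrix; toarray() converts it to a dense array.
enc.transform([[0, 1, 3]]).toarray()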

Output:

array([[1., 0., 0., 1., 0., 0., 0., 0., 1.]])
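To make the column layout of this output explicit, the encoding can be mimicked by hand in plain Python. This is a sketch, not how scikit-learn is implemented: `fit_categories` and `transform_row` are hypothetical helpers, and the sketch assumes that the categories of each column are the sorted distinct values seen during fitting.

```python
def fit_categories(rows):
    """Collect the sorted distinct values of each column."""
    return [sorted(set(column)) for column in zip(*rows)]

def transform_row(row, categories):
    """Concatenate one one-hot block per column."""
    encoded = []
    for value, values in zip(row, categories):
        encoded.extend(1.0 if value == v else 0.0 for v in values)
    return encoded

training = [[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]
categories = fit_categories(training)
print(categories)  # [[0, 1], [0, 1, 2], [0, 1, 2, 3]]
print(transform_row([0, 1, 3], categories))
# [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
```

The three columns have 2, 3, and 4 distinct values, which gives 2 + 3 + 4 = 9 output columns. The sample [0, 1, 3] activates the first position of the first block, the second position of the second block, and the fourth position of the third block.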
