1. One-hot encoder
sklearn.preprocessing.OneHotEncoder
A one-hot encoder can encode not only labels but also categorical features:
>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder()
>>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])
>>> enc.n_values_
array([2, 3, 4])
>>> enc.feature_indices_
array([0, 2, 5, 9])
>>> enc.transform([[0, 1, 1]]).toarray()
array([[1., 0., 0., 1., 0., 0., 1., 0., 0.]])
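The attributes n_values_ and feature_indices_ used in the session above were deprecated in scikit-learn 0.20 and removed in 0.22. The following is a minimal sketch, assuming a recent scikit-learn release, of how the same encoder can be inspected through its categories_ attribute instead. The variable name categories is chosen here for illustration only.

```python
from sklearn.preprocessing import OneHotEncoder

X = [[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]
enc = OneHotEncoder()  # by default, categories are inferred from the data
enc.fit(X)

# categories_ holds one sorted array of observed values per column.
categories = [cats.tolist() for cats in enc.categories_]
print(categories)  # [[0, 1], [0, 1, 2], [0, 1, 2, 3]]

# transform returns a sparse matrix; toarray() makes it dense.
encoded = enc.transform([[0, 1, 1]]).toarray()
print(encoded.tolist())  # [[1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]]
```

The lengths of the three category lists are 2, 3 and 4. In this data set every column uses all integers from 0 up to its maximum, so these lengths match the n_values_ output shown above.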
The data set passed to the fit method of the OneHotEncoder object:
[[0, 0, 3],
[1, 1, 0],
[0, 2, 1],
[1, 0, 2]]
Each column represents a feature. After the fit operation, the n_values_ attribute of the object enc records the number of values each feature takes. In this example the first feature takes 0, 1, 0, 1 ⇒ 2 values; the second takes 0, 1, 2, 0 ⇒ 3 values; the third takes 3, 0, 1, 2 ⇒ 4 values.
This is also the number of bits that each feature occupies under one-hot encoding. The feature_indices_ attribute of enc records the index at which each feature starts in the new one-hot code.
feature_indices_ is the cumulative sum of n_values_ with a leading 0: [0, 2, 2 + 3, 2 + 3 + 4] = [0, 2, 5, 9].
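The relationship between the two attributes can be checked by hand. The sketch below uses only the Python standard library and does not call scikit-learn. The names columns, n_values and feature_indices are local variables introduced for illustration, and counting distinct values per column is an assumption that matches this data set, where every column covers all integers from 0 to its maximum.

```python
from itertools import accumulate

X = [[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]

# Transpose rows into columns so each feature can be inspected on its own.
columns = list(zip(*X))

# Count the distinct values seen in each feature.
n_values = [len(set(col)) for col in columns]

# A running total with a leading 0 gives each feature's starting offset.
feature_indices = [0] + list(accumulate(n_values))

print(n_values)         # [2, 3, 4]
print(feature_indices)  # [0, 2, 5, 9]
```

The last entry, 9, is the total length of the encoded vector.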
Once fitted, the one-hot encoder can encode a new feature vector:
>>> enc.transform([[0, 1, 1]]).toarray()
array([[1., 0., 0., 1., 0., 0., 1., 0., 0.]])
The first 2 bits, 1, 0, encode the value 0; the middle 3 bits, 0, 1, 0, encode the value 1; the last 4 bits, 0, 1, 0, 0, encode the value 1.
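To make the bit layout concrete, here is a small sketch that builds the same vector without scikit-learn. The function one_hot_encode is a hypothetical helper written for this illustration, not part of any library, and it assumes each feature's values are the integers 0 to n - 1.

```python
def one_hot_encode(sample, n_values):
    """Concatenate one one-hot block per feature (values assumed to be 0..n-1)."""
    encoded = []
    for value, width in zip(sample, n_values):
        if not 0 <= value < width:
            raise ValueError(f"value {value} outside range 0..{width - 1}")
        block = [0.0] * width  # one slot per possible value of this feature
        block[value] = 1.0     # switch on the slot for the observed value
        encoded.extend(block)
    return encoded

n_values = [2, 3, 4]
result = one_hot_encode([0, 1, 1], n_values)
print(result)  # [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

Each block has exactly one 1, so the encoded vector always contains as many 1s as there are features.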