First, one-hot encoding
One-hot encoding, also known as one-bit effective encoding, uses an N-bit status register to encode N states: each state has its own register bit, and at any time exactly one bit is set. In practical machine learning applications, features are not always continuous values; some are categorical, such as "male" and "female". Such features usually have to be converted to numbers before they can be used. Consider three categorical features:
gender: ["Male", "Female"]
region: ["Europe", "US", "Asia"]
browser: ["Firefox", "Chrome", "Safari", "Internet Explorer"]
For a sample such as ["Male", "US", "Internet Explorer"], the most direct way to digitize it is to replace each value with its index in the corresponding list, which gives [0,1,3]. However, features encoded this way should not be fed directly into most machine learning algorithms, because the integers would be treated as ordered numeric values even though the categories have no inherent order.
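The index-based ("serialized") encoding described above can be sketched in a few lines of plain Python. The names `CATEGORIES` and `label_encode` are illustrative and do not come from the original article; the lookup is case-sensitive, so the sample is written with the same spelling as the category lists.

```python
# Category lists from the example above (names are illustrative).
CATEGORIES = {
    "gender": ["Male", "Female"],
    "region": ["Europe", "US", "Asia"],
    "browser": ["Firefox", "Chrome", "Safari", "Internet Explorer"],
}

def label_encode(sample):
    """Replace each categorical value with its index in that feature's list."""
    return [
        values.index(value)
        for values, value in zip(CATEGORIES.values(), sample)
    ]

encoded = label_encode(["Male", "US", "Internet Explorer"])
print(encoded)  # [0, 1, 3]
```

Note that the resulting 3 for "Internet Explorer" is only a position in a list; it does not mean the browser is "three times" Firefox, which is why this encoding alone is a poor input for most models.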
Second, the one-hot encoding approach
In the problem above, the gender attribute is two-dimensional, the region is three-dimensional, and the browser is four-dimensional. We can therefore one-hot encode the sample ["Male", "US", "Internet Explorer"]: "Male" corresponds to [1,0], "US" corresponds to [0,1,0], and "Internet Explorer" corresponds to [0,0,0,1]. The complete digitized feature vector is [1,0,0,1,0,0,0,0,1]. One consequence is that the data becomes very sparse.
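As a minimal sketch of the concatenation described above, the following plain-Python snippet builds the same vector by hand. The helper `one_hot` and the variables `sizes` and `codes` are hypothetical names introduced only for illustration.

```python
def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    vector = [0] * size
    vector[index] = 1
    return vector

# Number of categories per feature: gender (2), region (3), browser (4).
sizes = [2, 3, 4]
# Integer codes for the sample ["Male", "US", "Internet Explorer"].
codes = [0, 1, 3]

# Encode each feature separately, then concatenate the pieces.
encoded = []
for index, size in zip(codes, sizes):
    encoded.extend(one_hot(index, size))

print(encoded)  # [1, 0, 0, 1, 0, 0, 0, 0, 1]
```

The output has 2 + 3 + 4 = 9 positions but only three of them are non-zero, which illustrates why one-hot encoded data is sparse.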
Third, the actual Python code
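from sklearn import preprocessing

# Each row is one sample; each column is one integer-coded categorical feature.
enc = preprocessing.OneHotEncoder()
enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])
# transform() returns a sparse matrix; toarray() converts it to a dense array.
array = enc.transform([[0, 1, 3]]).toarray()
print(array)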
Results: [[1. 0. 0. 1. 0. 0. 0. 0. 1.]]
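To see why the four training rows matter, here is a simplified pure-Python sketch of what fitting and transforming amount to under the encoder's default settings: the distinct values of each column are collected in sorted order, and each value of a new row is then marked within its column's list. This is not scikit-learn's implementation; `fit_categories` and `transform_row` are hypothetical helpers written only for illustration.

```python
def fit_categories(rows):
    """Collect the sorted distinct values of each column."""
    return [sorted(set(column)) for column in zip(*rows)]

def transform_row(row, categories):
    """One-hot encode one row against the learned per-column categories."""
    out = []
    for value, cats in zip(row, categories):
        out.extend(1.0 if value == c else 0.0 for c in cats)
    return out

training = [[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]
learned = fit_categories(training)
print(learned)  # [[0, 1], [0, 1, 2], [0, 1, 2, 3]]

print(transform_row([0, 1, 3], learned))
# [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
```

The training data covers two values in the first column, three in the second, and four in the third, which is why the encoded row has nine positions.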