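import numpy as np
import pandas as pd
from pandas import Series, DataFrame

# Note: the printed outputs below come from an older pandas release. That release
# sorted dict keys into alphabetical column order and returned float (0.0/1.0) dummy
# columns. Current pandas keeps dict insertion order, and pandas 2.x returns bool
# dummy columns by default (pass dtype=int or dtype=float for numeric columns).

df = DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'b'], 'data1': range(6)})
pd.get_dummies(df['key'])  # returns the dummy columns; the result is discarded here
print(df)
'''
   data1  key
0      0    b
1      1    b
2      2    a
3      3    c
4      4    a
5      5    b
'''
dummies = pd.get_dummies(df['key'], prefix='key')
df_with_dummy = df[['data1']].join(dummies)
print(df_with_dummy)
'''
   data1  key_a  key_b  key_c
0      0    0.0    1.0    0.0
1      1    0.0    1.0    0.0
2      2    1.0    0.0    0.0
3      3    0.0    0.0    1.0
4      4    1.0    0.0    0.0
5      5    0.0    1.0    0.0
'''

'''
pandas.get_dummies
Encoding discrete features falls into two cases:
1. The feature's values have no notion of magnitude or order, e.g. color: [red, blue].
   Use one-hot encoding.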
2. The feature's values do have a notion of magnitude, e.g. size: [X, XL, XXL].
   Map the values to numbers, e.g. {X: 1, XL: 2, XXL: 3}.
pandas makes one-hot encoding of discrete features very convenient.

Parameters:
data : array-like, Series, or DataFrame
prefix : string, list of strings, or dict of strings, default None
    Prefix to prepend to the dummy column names.
prefix_sep : string, default '_'
    If a prefix is added, the separator/delimiter to use.
    Alternatively, pass a list or dict, as with prefix.
dummy_na : bool, default False
    Whether to add a column indicating NaNs. If False (the default), NaNs are ignored.
columns : list-like, default None
    Names of the DataFrame columns to encode.
    If None, all columns with object or category dtype are converted.
sparse : bool, default False
    Whether the dummy-encoded columns should be sparse.
drop_first : bool, default False
    Whether to drop the first level, leaving k-1 dummy columns for k categories.
'''
df = pd.DataFrame([['green', 'M', 10.1, 'class1'],
                   ['red', 'L', 13.5, 'class2'],
                   ['blue', 'XL', 15.3, 'class1']])
df.columns = ['color', 'size', 'price', 'class label']

# size is ordinal, so map it to integers that preserve the order M < L < XL
size_mapping = {'XL': 3, 'L': 2, 'M': 1}
df['size'] = df['size'].map(size_mapping)

# Class labels only need distinct integers; sorting the set makes the assignment
# reproducible (class1 -> 0, class2 -> 1), since plain set order can vary between runs
class_mapping = {label: idx for idx, label in enumerate(sorted(set(df['class label'])))}
df['class label'] = df['class label'].map(class_mapping)
print(df)
'''
   color  size  price  class label
0  green     1   10.1            0
1    red     2   13.5            1
2   blue     3   15.3            0
'''
# In English, "dummy" means a stand-in: each dummy column stands in for one category
df = pd.get_dummies(df)
# get_dummies one-hot encodes color (the only object column left); note how the
# color column is replaced by color_blue, color_green and color_red
print(df)
'''
   size  price  class label  color_blue  color_green  color_red
0     1   10.1            0         0.0          1.0        0.0
1     2   13.5            1         0.0          0.0        1.0
2     3   15.3            0         1.0          0.0        0.0
'''

# Dummy (indicator) matrix for a Series: mostly zeros, one 1 per row
s = pd.Series(list('abca'))
print(pd.get_dummies(s))
'''
     a    b    c
0  1.0  0.0  0.0
1  0.0  1.0  0.0
2  0.0  0.0  1.0
3  1.0  0.0  0.0
'''
s1 = ['a', 'b', np.nan]
print(pd.get_dummies(s1))
'''
     a    b
0  1.0  0.0
1  0.0  1.0
2  0.0  0.0
'''
### Show a NaN column
print(pd.get_dummies(s1, dummy_na=True))
'''
     a    b  NaN
0  1.0  0.0  0.0
1  0.0  1.0  0.0
2  0.0  0.0  1.0
'''
### drop_first=True removes the first dummy column
print(pd.get_dummies(s1, dummy_na=True, drop_first=True))
'''
     b  NaN
0  0.0  0.0
1  1.0  0.0
2  0.0  1.0
'''
### Prefixes map one-to-one onto the encoded columns, in order
demo_1 = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})
print_demo_1 = pd.get_dummies(demo_1, prefix=['col1', 'col2'])
print(print_demo_1)
'''
   C  col1_a  col1_b  col2_a  col2_b  col2_c
0  1     1.0     0.0     0.0     1.0     0.0
1  2     0.0     1.0     1.0     0.0     0.0
2  3     1.0     0.0     0.0     0.0     1.0
'''
# Details: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html
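
# Additional sketch (not part of the original examples): the parameter list above
# describes `columns`, `prefix_sep` and `sparse`, which the examples never use.
# `demo_2` is hypothetical data made up for illustration. dtype=int (a real
# get_dummies parameter since pandas 0.23) is an added assumption so the dummies
# are 0/1 integers regardless of pandas version.
import pandas as pd

demo_2 = pd.DataFrame({'color': ['red', 'blue', 'red'],
                       'shape': ['circle', 'square', 'square'],
                       'code': ['x', 'y', 'x']})

# Only 'color' and 'shape' are encoded. 'code' is also an object column, but it is
# kept as-is because it is not listed in `columns`. The dummy columns come after
# the untouched ones and are named <column><prefix_sep><value>.
encoded = pd.get_dummies(demo_2, columns=['color', 'shape'], prefix_sep='=', dtype=int)
print(encoded)
# columns: code, color=blue, color=red, shape=circle, shape=square

# sparse=True stores each dummy column as a SparseArray. This can save memory when
# there are many categories and most entries are 0. The exact dtype printed
# depends on the pandas version.
sparse_dummies = pd.get_dummies(pd.Series(list('abca')), sparse=True)
print(sparse_dummies.dtypes)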