ValueError                                Traceback (most recent call last)
<ipython-input-12-1dc462ae8893> in <module>()
     15     print('cv prepared!')
     16     return df_x.astype(np.float64)
---> 17 df_test = get_feature(test_data,all_table,ready_cols,vec_col)
     18 df_train = get_feature(train_data,all_table,ready_cols,vec_col)

<ipython-input-12-1dc462ae8893> in get_feature(df, all_data, cols, vec_col)
      9     cv=CountVectorizer()
     10     for feature in vec_col:
---> 11         cv.fit(all_data[feature])
     12         df_a = cv.transform(df[feature])
     13         df_x = sparse.hstack((df_x, df_a))
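import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder

def get_feature(df, all_data, cols, vec_col):
    enc = OneHotEncoder()  # created in the original code but not used below
    df_x = np.int64(df[cols])
    cv = CountVectorizer()
    for feature in vec_col:
        # Fit the vocabulary on the full table, then encode this split
        cv.fit(all_data[feature])
        df_a = cv.transform(df[feature])
        df_x = sparse.hstack((df_x, df_a))
        print('done feature ' + str(feature))
    print('cv prepared!')
    return df_x.astype(np.float64)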
Cause analysis: all_data contains NaN values. When reading the data I called all_table.fillna(-1), assuming it would fill the missing values in place. However, fillna returns a new DataFrame by default, so the NaN values in all_table were left unchanged. Assigning the result back, all_table = all_table.fillna(-1), lets the code run.
Error message: ValueError: np.nan is an invalid document, expected byte or unicode string.
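The cause described above can be reproduced in a small sketch. The table contents and the column name "interest" are hypothetical stand-ins for all_table, and the fill value is written as the string "-1" rather than the integer -1. That is an assumption made here because CountVectorizer expects string documents. The sketch only shows that calling fillna without assigning its result leaves the NaN values in place.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for all_table: one text column with a missing value.
all_table = pd.DataFrame({"interest": ["sports music", np.nan, "music travel"]})

# fillna returns a new DataFrame; without assignment the original is unchanged.
all_table.fillna("-1")
nan_left_before = int(all_table["interest"].isna().sum())

# Assigning the result back is what actually removes the NaN values.
all_table = all_table.fillna("-1")
nan_left_after = int(all_table["interest"].isna().sum())

# With no NaN left, CountVectorizer can fit and transform the column.
cv = CountVectorizer()
cv.fit(all_table["interest"])
counts = cv.transform(all_table["interest"])

print(nan_left_before, nan_left_after, counts.shape)
```

Checking isna().sum() on the columns in vec_col before calling get_feature is a cheap way to catch this before the vectorizer raises.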