scipy.sparse.hstack(blocks, format=None, dtype=None)
Stack sparse matrices horizontally (column wise).
Parameters:
    blocks
        sequence of sparse matrices with compatible shapes
    format : str
        sparse format of the result (e.g. "csr"). By default an appropriate sparse matrix format is returned. This choice is subject to change.
    dtype : dtype, optional
        The data-type of the output matrix. If not given, the dtype is determined from that of blocks.
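For reference, a minimal sketch of how format and dtype behave (assuming only numpy and scipy; the matrices here are illustrative). The key point is that an explicit dtype is applied to the whole result, casting every block:

import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.array([[1.5, 2.0], [0.0, 3.25]]))  # float64 block
B = sparse.csr_matrix(np.array([[1], [0]], dtype=bool))     # bool block

# Default: the result dtype is inferred from the blocks, so float64 wins.
C = sparse.hstack((A, B))
print(C.dtype)       # float64

# An explicit dtype casts every block; the floats are truncated.
D = sparse.hstack((A, B), format='csr', dtype=np.int32)
print(D.toarray())   # [[1 2 1], [0 3 0]]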
Below is the code that went wrong.
///////////////////////////////////////////////////////////////////////////////////////////////////
In the competition, I converted the features into a sparse matrix, adapting the code from an open-source solution:
import numpy as np
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_extraction.text import CountVectorizer

# Numerical features: converted directly to float64.
base_train_csr = np.float64(train_x[num_feature])
base_predict_csr = np.float64(predict_x[num_feature])

# Low-cardinality categorical features: one-hot encoded.
enc = OneHotEncoder()
for feature in short_cate_feature:
    enc.fit(data[feature].values.reshape(-1, 1))
    base_train_csr = sparse.hstack(
        (base_train_csr, enc.transform(train_x[feature].values.reshape(-1, 1))),
        'csr', 'bool')  # <-- the culprit: dtype='bool' (explained below)
    base_predict_csr = sparse.hstack(
        (base_predict_csr, enc.transform(predict_x[feature].values.reshape(-1, 1))),
        'csr', 'bool')
print('one-hot prepared!')

# High-cardinality categorical features: count-vectorized.
cv = CountVectorizer(min_df=20)
for feature in long_cate_feature:
    cv.fit(data[feature])
    base_train_csr = sparse.hstack(
        (base_train_csr, cv.transform(train_x[feature])), 'csr', 'int')
    base_predict_csr = sparse.hstack(
        (base_predict_csr, cv.transform(predict_x[feature])), 'csr', 'int')
print('cv prepared!')
After feeding these features into LightGBM (lgb), the loss dropped so fast it was alarming. I couldn't find the reason all night;
today I ran a simple experiment from scratch and found out why.
In the code above, I first converted the numerical features directly with np, then one-hot encoded the categorical features with few categories. The problem lies right here: sparse.hstack(..., 'csr', 'bool')
I stacked the float64 matrix together with the bool one-hot matrix and cast the whole result to bool. Boneheaded: all the numerical features at the front were rendered useless ...
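A minimal version of that experiment (a sketch, assuming only numpy and scipy, not the original code):

import numpy as np
from scipy import sparse

num = sparse.csr_matrix(np.array([[0.3, 7.2], [12.5, 0.0]]))        # float64 features
onehot = sparse.csr_matrix(np.array([[1, 0], [0, 1]], dtype=bool))  # one-hot block

bad = sparse.hstack((num, onehot), 'csr', 'bool')
print(bad.toarray().astype(float))
# [[1. 1. 1. 0.]
#  [1. 0. 0. 1.]]  -> every nonzero float has collapsed to 1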
Summary: when using hstack in the future, go from coarse-grained to fine-grained dtypes, e.g. bool -> int32 -> float32 -> float64 (the dtype passed to hstack must be at least as fine as the finest block); otherwise the fine-grained features get compressed and a lot of information is lost.
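Applied to the pipeline above, the fix is simply not to force dtype='bool' onto a float64 base matrix; either omit dtype and let hstack infer it, or pass the finest dtype among the blocks (a sketch reusing the variable names from the code above):

# Either let hstack infer the dtype (float64 wins automatically) ...
base_train_csr = sparse.hstack(
    (base_train_csr, enc.transform(train_x[feature].values.reshape(-1, 1))),
    format='csr')

# ... or name the finest dtype among the blocks explicitly.
base_train_csr = sparse.hstack(
    (base_train_csr, enc.transform(train_x[feature].values.reshape(-1, 1))),
    format='csr', dtype='float64')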
Data mining competition: a boneheaded mistake when constructing the matrix.