Data mining competition: a boneheaded mistake when building a feature matrix

Source: Internet
Author: User

scipy.sparse.hstack(blocks, format=None, dtype=None)

Stack sparse matrices horizontally (column wise).

Parameters:

blocks
    Sequence of sparse matrices with compatible shapes.

format : str, optional
    Sparse format of the result (e.g. "csr"). By default an appropriate sparse matrix format is returned. This choice is subject to change.

dtype : dtype, optional
    The data-type of the output matrix. If not given, the dtype is determined from that of blocks.
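As a minimal sketch of the signature above (the matrices `a` and `b` are made-up illustration data, not from the competition), stacking a float64 block and an int32 block without passing `dtype` should give a result whose dtype is wide enough for both:

```python
import numpy as np
from scipy import sparse

# Two small blocks with the same number of rows but different dtypes.
a = sparse.csr_matrix(np.array([[1.5, 0.0], [0.0, 2.5]]))                 # float64
b = sparse.csr_matrix(np.array([[1, 0, 0], [0, 0, 1]], dtype=np.int32))   # int32

# No dtype argument: the output dtype is determined from the blocks.
stacked = sparse.hstack((a, b), format="csr")

print(stacked.shape)   # (2, 5): columns are concatenated, rows unchanged
print(stacked.dtype)   # float64
```

The row counts must match; only the column counts add up.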

This is the function I misused.

///////////////////////////////////////////////////////////////////////////////////////////////////

In the competition, I converted the features into a sparse matrix, using code adapted from an open-source solution:

    import numpy as np
    from scipy import sparse
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.preprocessing import OneHotEncoder

    # Numerical features: converted straight to float64.
    base_train_csr = np.float64(train_x[num_feature])
    base_predict_csr = np.float64(predict_x[num_feature])

    # Categorical features with few categories: one-hot encoded.
    enc = OneHotEncoder()
    for feature in short_cate_feature:
        enc.fit(data[feature].values.reshape(-1, 1))
        # Passing 'bool' as the dtype here is the mistake this post is about.
        base_train_csr = sparse.hstack((base_train_csr, enc.transform(train_x[feature].values.reshape(-1, 1))), 'csr', 'bool')
        base_predict_csr = sparse.hstack((base_predict_csr, enc.transform(predict_x[feature].values.reshape(-1, 1))), 'csr', 'bool')
    print('one-hot prepared!')

    # Categorical features with many categories: count-vectorized.
    cv = CountVectorizer(min_df=20)
    for feature in long_cate_feature:
        cv.fit(data[feature])
        base_train_csr = sparse.hstack((base_train_csr, cv.transform(train_x[feature])), 'csr', 'int')
        base_predict_csr = sparse.hstack((base_predict_csr, cv.transform(predict_x[feature])), 'csr', 'int')
    print('CV prepared!')

When I fed these features to LGB, I was startled by how fast the loss dropped. I spent all night looking for the reason without finding it.

Today I ran a simple experiment from scratch and found out why.

Above, I first converted the numerical features directly with NumPy, then one-hot encoded the categorical features that have few categories. The problem is in this call: sparse.hstack(..., 'csr', 'bool')

I stacked the float64 matrix directly with the boolean one-hot matrix and cast the whole result to bool. Boneheaded: every numerical feature in front was reduced to True/False and became useless ...
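The effect can be reproduced on a toy example. This is only a sketch with made-up values (`numeric` and `onehot` are hypothetical stand-ins for the competition features), comparing the stacked result with and without the forced bool dtype:

```python
import numpy as np
from scipy import sparse

# Stand-in for the float64 numerical features.
numeric = sparse.csr_matrix(np.array([[0.25, 3.0], [7.5, 0.0]]))
# Stand-in for a one-hot encoded categorical feature.
onehot = sparse.csr_matrix(np.array([[1, 0], [0, 1]], dtype=bool))

# The mistake: forcing the stacked result to bool.
bad = sparse.hstack((numeric, onehot), format="csr", dtype="bool")
# Leaving dtype unset lets the result keep the float64 values.
good = sparse.hstack((numeric, onehot), format="csr")

print(bad.dtype, good.dtype)
print(bad.toarray())    # numerical columns only say "nonzero or not"
print(good.toarray())   # numerical columns keep 0.25, 3.0, 7.5
```

In `bad`, the values 0.25, 3.0 and 7.5 all become True, so the model can no longer tell them apart.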

Summary: in the future, when using hstack, the output dtype should only ever move from the narrower type to the wider one, i.e. bool -> int32 -> float32 -> float64. Otherwise the higher-precision features are squashed into the narrower type and a lot of information is lost.
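One way to enforce this rule is a small wrapper that refuses a target dtype narrower than the data. The helper name `checked_hstack` is hypothetical (it is not part of SciPy); the sketch relies on `np.result_type` to find the widest dtype among the blocks and on `np.can_cast` to check the requested one:

```python
import numpy as np
from scipy import sparse

def checked_hstack(blocks, format="csr", dtype=None):
    """Hypothetical helper: hstack that rejects a lossy target dtype."""
    blocks = [sparse.csr_matrix(b) for b in blocks]
    # Widest dtype needed to hold every block without loss.
    widest = np.result_type(*[b.dtype for b in blocks])
    if dtype is None:
        dtype = widest
    elif not np.can_cast(widest, np.dtype(dtype), casting="safe"):
        raise TypeError(f"dtype {np.dtype(dtype)} cannot safely hold {widest} data")
    return sparse.hstack(blocks, format=format, dtype=dtype)

numeric = np.array([[0.25, 3.0], [7.5, 0.0]])           # float64
onehot = sparse.csr_matrix(np.eye(2, dtype=bool))       # bool

ok = checked_hstack((numeric, onehot))                  # promoted to float64
try:
    checked_hstack((numeric, onehot), dtype="bool")     # the original mistake
    raised = False
except TypeError:
    raised = True
print(ok.dtype, raised)
```

Simply omitting `dtype` gives the same promotion; the wrapper only adds an explicit failure when a narrower type is requested by accident.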
