Stitching raw data:
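import numpy as np
import pandas as pd

train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')
# Drop the first and last columns of each table, then stack the rows;
# .iloc replaces .ix, which was removed in pandas 1.0
all_data = np.vstack((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:-1]))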
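The same stitching step can be tried on small in-memory tables, which makes the shapes easy to check. This is a minimal sketch: the column names (`label`, `id`, `f1`, `f2`, `extra`) and all values are hypothetical. As in the snippet above, the first and last columns of each table are dropped, with `.iloc` used for positional slicing because `.ix` no longer exists in current pandas. Keeping `num_train` lets the stacked array be split back into training and test rows after feature engineering.

```python
import numpy as np
import pandas as pd

# Hypothetical tables: the first column is a label (train) or an ID (test),
# and the last column is one we choose to drop, matching the slice [:, 1:-1]
train_data = pd.DataFrame({"label": [1, 0, 1],
                           "f1": [10, 20, 10],
                           "f2": [5, 5, 7],
                           "extra": [0, 0, 0]})
test_data = pd.DataFrame({"id": [100, 101],
                          "f1": [20, 30],
                          "f2": [7, 5],
                          "extra": [0, 0]})

# Positional slicing: every row, columns 1 up to (but not including) the last
all_data = np.vstack((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:-1]))

# Remember where the training rows end so the stack can be split later
num_train = len(train_data)
X_train, X_test = all_data[:num_train], all_data[num_train:]

print(all_data.shape)  # (5, 2)
```

Because features are built on `all_data`, any transformation is applied identically to the training and test rows before they are separated again.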
Stacking arrays with NumPy's vstack and hstack functions:
>>> import numpy as np
>>> a = np.ones((2, 2))
>>> b = np.eye(2)
>>> print(np.vstack((a, b)))
[[1. 1.]
 [1. 1.]
 [1. 0.]
 [0. 1.]]
>>> print(np.hstack((a, b)))
[[1. 1. 1. 0.]
 [1. 1. 0. 1.]]
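For 2-D arrays, `np.vstack` and `np.hstack` are shorthand for `np.concatenate` along axis 0 (rows) and axis 1 (columns). The sketch below checks that equivalence on the same `a` and `b`. It also shows the shape rule: vstack needs matching column counts, while hstack needs matching row counts. The extra array `c` is purely illustrative.

```python
import numpy as np

a = np.ones((2, 2))
b = np.eye(2)

# vstack appends rows: (2, 2) + (2, 2) -> (4, 2)
v = np.vstack((a, b))
# hstack appends columns: (2, 2) + (2, 2) -> (2, 4)
h = np.hstack((a, b))

# For 2-D inputs these match concatenate along axis 0 and axis 1
assert np.array_equal(v, np.concatenate((a, b), axis=0))
assert np.array_equal(h, np.concatenate((a, b), axis=1))

# vstack only needs equal column counts; hstack needs equal row counts
c = np.zeros((3, 2))
print(np.vstack((a, c)).shape)  # (5, 2)
try:
    np.hstack((a, c))
except ValueError as err:
    print("hstack failed:", err)
```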
Generating second-order (degree-2) features:
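from itertools import combinations

import numpy as np

def group_data(data, degree=2, hash=hash):
    # One new column per combination of `degree` original columns:
    # each row's values in those columns are hashed into a single code
    new_data = []
    n = data.shape[1]
    for indices in combinations(range(n), degree):
        new_data.append([hash(tuple(v)) for v in data[:, indices]])
    return np.array(new_data).T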
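Here is a small usage sketch of `group_data`, repeated so the block runs on its own, applied to a hypothetical integer array with 4 rows and 3 columns. With n = 3 columns and `degree=2` there are C(3, 2) = 3 column pairs, so the result has one column per pair, in the order produced by `itertools.combinations`: (0, 1), (0, 2), (1, 2). Two rows get the same value in a new column when they agree on both columns of that pair. Rows that differ get different values, barring hash collisions. Tuples of integers hash the same way in every Python process. String values, by contrast, are subject to hash randomization unless `PYTHONHASHSEED` is fixed.

```python
from itertools import combinations
from math import comb

import numpy as np

def group_data(data, degree=2, hash=hash):
    new_data = []
    n = data.shape[1]
    for indices in combinations(range(n), degree):
        new_data.append([hash(tuple(v)) for v in data[:, indices]])
    return np.array(new_data).T

# Hypothetical, already integer-encoded data
data = np.array([[1, 2, 3],
                 [1, 2, 4],
                 [1, 5, 3],
                 [7, 2, 3]])

pairs = group_data(data, degree=2)
print(pairs.shape)  # (4, 3): 4 rows, C(3, 2) = 3 pair features
assert pairs.shape == (data.shape[0], comb(3, 2))

# Column 0 encodes the pair (col 0, col 1): rows 0 and 1 share (1, 2)
assert pairs[0, 0] == pairs[1, 0]
# Row 2 has (1, 5) in those columns, so its code differs
assert pairs[0, 0] != pairs[2, 0]
```

Passing `degree=3` to the same function would give the third-order combinations in the same way.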
Before generating the higher-order features, apply a label-encoding ("LabelEncoder") step...
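Below is a minimal sketch of the label-encoding step, assuming a hypothetical raw array of categorical strings. The helper name `label_encode_columns` is also hypothetical. Each column is mapped independently to the integers 0..k-1 with `np.unique(..., return_inverse=True)`. This assigns codes in the sorted order of the distinct values, which is the same mapping scikit-learn's `LabelEncoder` produces. Calling `LabelEncoder().fit_transform(col)` on each column should give the same result.

```python
import numpy as np

def label_encode_columns(raw):
    """Hypothetical helper: map each column's distinct values to 0..k-1 in sorted order."""
    encoded = np.empty(raw.shape, dtype=np.int64)
    for j in range(raw.shape[1]):
        # return_inverse gives, for each entry, the index of its value among the sorted uniques
        _, codes = np.unique(raw[:, j], return_inverse=True)
        encoded[:, j] = codes
    return encoded

# Hypothetical raw categorical data (e.g. department and role names)
raw = np.array([["sales", "manager"],
                ["it", "engineer"],
                ["sales", "engineer"],
                ["hr", "manager"]])

encoded = label_encode_columns(raw)
print(encoded)
# [[2 1]
#  [1 0]
#  [2 0]
#  [0 1]]
```

Encoding the stacked `all_data`, rather than the training and test tables separately, keeps the codes consistent between the two. The encoded array can then be passed to `group_data`. A side effect is that the hashed tuples then contain only integers, so their hash values do not depend on Python's string-hash randomization.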
From Kaggle
Machine learning coding (Python): stitching raw data; generating higher-order features