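The functions in scikit-learn (sklearn) make it easy to split data into a training set and a test set.
The function is sklearn.cross_validation.train_test_split (in scikit-learn 0.18 it moved to sklearn.model_selection, and the old cross_validation module was removed in 0.20). It is used as follows: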
>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> X, y = np.arange(10).reshape((5, 2)), range(5)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> list(y)
[0, 1, 2, 3, 4]
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.33, random_state=42)
>>> X_train
array([[4, 5],
       [0, 1],
       [6, 7]])
>>> y_train
[2, 0, 3]
>>> X_test
array([[2, 3],
       [8, 9]])
>>> y_test
[1, 4]
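The 3/2 split in this output follows from how scikit-learn handles a fractional test_size: the number of test samples is the fraction times the sample count, rounded up, and (with train_size left at its default) the training set receives the remaining samples. For the five samples above:

```latex
n_{\text{test}} = \lceil \texttt{test\_size} \times n_{\text{samples}} \rceil
               = \lceil 0.33 \times 5 \rceil = \lceil 1.65 \rceil = 2,
\qquad
n_{\text{train}} = n_{\text{samples}} - n_{\text{test}} = 5 - 2 = 3
```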
Here test_size sets the size of the test set: a float between 0.0 and 1.0 is the fraction of samples assigned to the test set, while an integer is the absolute number of test samples. random_state is the seed of the random number generator: different seeds generally produce different random splits, and the same seed reproduces the same split.
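To make the roles of test_size and random_state concrete, here is a minimal pure-Python sketch of a seeded shuffle-and-split. The helpers split_sizes and shuffle_split are hypothetical and not part of scikit-learn; the sketch assumes a float test_size is rounded up to a whole number of samples, and it shuffles with Python's random module instead of NumPy, so the same seed will not reproduce scikit-learn's exact split.

```python
import math
import random


def split_sizes(n_samples, test_size):
    """Hypothetical helper: return (n_train, n_test).

    A float test_size is treated as a fraction of n_samples (rounded up);
    an int test_size is treated as an absolute number of test samples.
    """
    if isinstance(test_size, float):
        if not 0.0 < test_size < 1.0:
            raise ValueError("a float test_size must lie strictly between 0 and 1")
        n_test = math.ceil(test_size * n_samples)
    else:
        n_test = int(test_size)
    if not 0 < n_test < n_samples:
        raise ValueError("both the train and test sets must be non-empty")
    return n_samples - n_test, n_test


def shuffle_split(X, y, test_size=0.25, seed=None):
    """Hypothetical helper: shuffle indices with a seeded RNG, then cut once."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of samples")
    _, n_test = split_sizes(len(X), test_size)
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # same seed -> same permutation
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Index X and y with the same positions so features and labels stay paired.
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


X = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
y = [0, 1, 2, 3, 4]
X_train, X_test, y_train, y_test = shuffle_split(X, y, test_size=0.33, seed=42)
print(len(X_train), len(X_test))  # 3 2
```

Because the permutation comes from a seeded generator, calling shuffle_split twice with the same seed returns identical lists, which is the reproducibility that random_state provides in train_test_split.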
References:
http://blog.sina.com.cn/s/blog_6a90ae320101a5rc.html
http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html
Python data preprocessing: splitting data into training and test sets