Cross-validation and Python code implementations

Source: Internet
Author: User
This article introduces cross-validation and shows how to implement it in Python. It may serve as a useful reference for readers who need it.

There are two common methods of model selection: regularization (the typical method) and cross-validation.

Cross-validation and its Python code implementations are described here.


If the sample data is sufficient, a simple way to select a model is to randomly divide the dataset into three parts: a training set, a validation set, and a test set.

Training set: used to train the models

Validation set: used to select a model

Test set: used for the final evaluation of the selected model

When comparing models of different complexity, select the one with the smallest prediction error on the validation set. Because the validation set contains enough data, using it for model selection is valid. In many practical applications, however, data is insufficient, and cross-validation can be used instead.
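A minimal sketch of this train/validation/test workflow follows. It assumes polynomial regression models whose complexity is the polynomial degree, a 60/20/20 split, and synthetic data; the function name select_degree and all of these choices are illustrative, not taken from the original text.

```python
import numpy as np

def select_degree(x, y, degrees, seed=0):
    """Hypothetical helper: pick a polynomial degree on a validation set,
    then score only the chosen model on a test set."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(x))
    # Assumed 60/20/20 split into training, validation, and test sets
    n_train = int(0.6 * len(x))
    n_valid = int(0.2 * len(x))
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]

    def mse(coefs, part):
        pred = np.polyval(coefs, x[part])
        return float(np.mean((pred - y[part]) ** 2))

    # Training set: fit one model per candidate complexity
    fits = {d: np.polyfit(x[train], y[train], d) for d in degrees}
    # Validation set: keep the model with the smallest prediction error
    valid_err = {d: mse(c, valid) for d, c in fits.items()}
    best = min(valid_err, key=valid_err.get)
    # Test set: final evaluation of the chosen model only
    test_err = mse(fits[best], test)
    return best, valid_err, test_err

# Hypothetical data: a noisy quadratic
rng = np.random.RandomState(42)
x = np.linspace(-3, 3, 100)
y = 0.5 * x ** 2 - x + rng.normal(scale=0.5, size=x.size)
best, valid_err, test_err = select_degree(x, y, degrees=range(0, 6))
print("chosen degree:", best, "test MSE:", round(test_err, 3))
```

The test set is touched only once, after the choice is made, so its error stays an honest estimate for the selected model.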

Basic idea: reuse the data by repeatedly splitting the given dataset into training and test sets, and carry out training, testing, and model selection on the basis of these repeated splits.
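One common way to put this idea into practice is K-fold cross-validation: shuffle the data, cut it into k folds, and let each fold serve once as the test set while the others form the training set. The pure-Python sketch below only generates the index splits; the function name k_fold_indices is hypothetical (scikit-learn ships a ready-made splitter, sklearn.model_selection.KFold).

```python
import random

def k_fold_indices(n_samples, k, seed=None):
    """Hypothetical helper: yield (train_idx, test_idx) pairs for K-fold CV.

    Every sample lands in the test fold exactly once across the k rounds.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random split; the seed makes it repeatable
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# Usage: 10 samples split into 3 folds of sizes 4, 3 and 3
for train_idx, test_idx in k_fold_indices(10, 3, seed=0):
    print(len(train_idx), sorted(test_idx))
```

A model would be trained on each train_idx, evaluated on the matching test_idx, and the k test errors averaged to compare candidate models.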

Simple cross-validation:

Randomly divide the data into two parts: a training set and a test set. Typically, 70% of the data is used as the training set and 30% as the test set.
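As an illustration of what a single 70/30 split does under the hood, here is a small pure-Python sketch. The function name holdout_split is hypothetical, and rounding the test-set size to the nearest integer is an assumption of this sketch (library implementations may round differently).

```python
import random

def holdout_split(samples, test_fraction=0.3, seed=None):
    """Hypothetical helper for simple (hold-out) cross-validation:
    one random split into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)  # assumed rounding rule
    # The first n_test shuffled samples form the test set, the rest the training set
    return shuffled[n_test:], shuffled[:n_test]

train, test = holdout_split(range(100), test_fraction=0.3, seed=1)
print(len(train), len(test))  # 70 30
```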

Code (splitting into a training set and a test set):

    # sklearn.cross_validation was removed in scikit-learn 0.20; use sklearn.model_selection
    from sklearn.model_selection import train_test_split

    # data: all samples (features); labels: all target values
    # X_train, y_train: training-set features and targets; X_test, y_test: test-set features and targets
    X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.25, random_state=0)  # training set 75% : test set 25%

About random_state:

From the scikit-learn docstring: random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used by `np.random`.
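The three cases in the docstring can be observed directly. This sketch assumes scikit-learn and NumPy are installed; it uses sklearn.utils.check_random_state, the helper scikit-learn's splitters generally use to turn a random_state argument into a generator.

```python
import numpy as np
from sklearn.utils import check_random_state

# int -> a new RandomState seeded with that value
rs_int = check_random_state(10)
# RandomState instance -> returned unchanged
my_rs = np.random.RandomState(0)
rs_inst = check_random_state(my_rs)
# None -> the global RandomState that backs np.random
rs_none = check_random_state(None)

print(type(rs_int).__name__, rs_inst is my_rs, type(rs_none).__name__)  # RandomState True RandomState
```

Because an int creates a freshly seeded generator each time, two calls with the same int produce the same sequence of random numbers.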

If you set a specific integer, such as random_state=10, the split is the same every time, no matter how many times you run the code. If you set it to None (random_state=None), the split is different on each run.
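A quick way to check this behavior, assuming scikit-learn is installed and using a hypothetical toy dataset of 20 integers:

```python
from sklearn.model_selection import train_test_split

data = list(range(20))  # hypothetical toy data

# Fixed integer seed: two calls produce exactly the same split
a_train, a_test = train_test_split(data, test_size=0.25, random_state=10)
b_train, b_test = train_test_split(data, test_size=0.25, random_state=10)
print(a_test == b_test)  # True

# random_state=None: each call draws from the global NumPy generator,
# so repeated calls will usually give different splits (the sizes stay the same)
c_train, c_test = train_test_split(data, test_size=0.25, random_state=None)
d_train, d_test = train_test_split(data, test_size=0.25, random_state=None)
print(len(c_test), len(d_test))  # 5 5
```

Fixing the seed is mainly useful for reproducible experiments; the specific value (10, 0, 42, ...) has no special meaning.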

Code (splitting into training, validation, and test sets):

    from sklearn.model_selection import train_test_split

    # First split into two parts: training + validation (70%) and test set (30%)
    train_and_valid, test = train_test_split(data, test_size=0.3, random_state=0)
    # Then split training + validation (not the full data) into a training set and a validation set
    train, valid = train_test_split(train_and_valid, test_size=0.5, random_state=0)
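Note that test_size in the second call is a fraction of the remaining data, not of the whole dataset. The sketch below, assuming scikit-learn is installed and a hypothetical dataset of 100 samples, shows the resulting proportions of a 30% test split followed by a 50/50 split of the remainder.

```python
from sklearn.model_selection import train_test_split

data = list(range(100))  # hypothetical dataset of 100 samples

# Step 1: 70 samples for training + validation, 30 for the test set
train_and_valid, test = train_test_split(data, test_size=0.3, random_state=0)
# Step 2: split the remaining 70 samples in half
train, valid = train_test_split(train_and_valid, test_size=0.5, random_state=0)

print(len(train), len(valid), len(test))  # 35 35 30
```

To obtain a different ratio, such as 60/20/20, the second test_size would need to be 0.2 / 0.8 = 0.25 of the remaining 80% after a first 20% test split.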


The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.
