Reference: http://scikit-learn.org/stable/modules/model_persistence.html
After a model has been trained, we want to be able to save it and apply the saved model directly to new samples without retraining. This section shows how to persist a scikit-learn model with pickle, and reviews a few security and maintainability issues that arise when working with pickle serialization.
1. Persistence example
It is possible to save a scikit-learn model by using Python's built-in persistence module, namely pickle:
>>> from sklearn import svm
>>> from sklearn import datasets
>>> clf = svm.SVC()
>>> iris = datasets.load_iris()
>>> X, y = iris.data, iris.target
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
>>> import pickle
>>> s = pickle.dumps(clf)   # serialize the fitted model to a bytes string
>>> clf2 = pickle.loads(s)  # rebuild an equivalent model from the bytes
>>> clf2.predict(X[0:1])    # predict expects a 2D array, so slice one row
array([0])
>>> y[0]
0
In some cases it is better to use joblib's replacement for pickle (joblib.dump & joblib.load), which is more efficient on objects that carry large numpy arrays internally. The saved model can then be loaded back, even in another Python program (pickle can also ...):
>>> from sklearn.externals import joblib  # scikit-learn 0.23 and later: import joblib
>>> joblib.dump(clf, 'filename.pkl')
>>> clf = joblib.load('filename.pkl')
Note
joblib.dump returns a list of filenames. Each individual numpy array contained in the clf object is serialized as a separate file on the filesystem. All files are required in the same folder when reloading the model with joblib.load.
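The same save-then-reload workflow can be sketched with the standard pickle module writing to a file. This is a minimal sketch, not the article's own code: the `model` dictionary is a hypothetical stand-in for a fitted estimator such as clf, and the temporary directory is only there so the example leaves nothing behind.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted estimator such as clf; any picklable
# object is written and read back in the same way.
model = {"kernel": "rbf", "C": 1.0}

# Temporary location so the example does not touch the working directory.
path = os.path.join(tempfile.mkdtemp(), "filename.pkl")

# Saving side: pickle files must be opened in binary mode.
with open(path, "wb") as f:
    pickle.dump(model, f)

# Loading side (this could be a separate Python program): read the model back.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)
```

The restored object is a new copy that compares equal to the original. With a real estimator, the loading program also needs scikit-learn installed so that pickle can find the estimator's class.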
2. Security & maintainability limitations
pickle (and joblib by extension) has some issues regarding maintainability and security. Because of this:
- Never unpickle untrusted data: loading a pickle can execute arbitrary code.
- Models saved in one version of scikit-learn might not load in another version.
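To see why untrusted pickles are dangerous, the following minimal sketch builds a pickle that makes the loader call a function chosen by whoever created the data. The `Payload` class is hypothetical and the expression passed to eval is deliberately harmless; a real attacker could substitute any callable.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle asks the loader to call a function."""

    def __reduce__(self):
        # Instead of the object's state, pickle records "call eval('6 * 7')".
        # The harmless expression stands in for an attacker's code.
        return (eval, ("6 * 7",))


untrusted_bytes = pickle.dumps(Payload())

# Loading does not rebuild a Payload instance: it runs the recorded call
# and returns whatever that call produced.
result = pickle.loads(untrusted_bytes)
print(result)
```

Nothing in `pickle.loads` asks for confirmation before running the recorded call, which is why data from an untrusted source should never be unpickled.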
In order to rebuild a saved model with a future release of scikit-learn, some additional metadata should be saved along with the pickled model:
- The training data, e.g. a reference to an immutable snapshot
- The Python source code used to generate the model
- The versions of scikit-learn and its dependencies
- The cross-validation score obtained on the training data
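One way to keep this metadata next to the model is a small JSON file written at save time. The sketch below is an assumption about how this could be organized, not a scikit-learn API: `save_with_metadata`, `package_version`, the file names, and all the values passed in are hypothetical, and the `model` dictionary stands in for a fitted estimator.

```python
import json
import pickle
import platform
import tempfile
from importlib import metadata
from pathlib import Path


def package_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def save_with_metadata(model, directory, training_data_ref, source_ref, cv_score):
    """Pickle `model` and write a JSON file describing how it was built."""
    directory = Path(directory)
    model_path = directory / "model.pkl"
    meta_path = directory / "model.meta.json"
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    meta = {
        "training_data": training_data_ref,
        "source_code": source_ref,
        "versions": {
            "python": platform.python_version(),
            "scikit-learn": package_version("scikit-learn"),
            "numpy": package_version("numpy"),
        },
        "cv_score": cv_score,
    }
    meta_path.write_text(json.dumps(meta, indent=2))
    return model_path, meta_path


# Hypothetical stand-in for a fitted estimator.
model = {"kernel": "rbf", "C": 1.0}

with tempfile.TemporaryDirectory() as tmp:
    model_path, meta_path = save_with_metadata(
        model,
        tmp,
        training_data_ref="iris-snapshot-v1",  # hypothetical snapshot name
        source_ref="train.py",                 # hypothetical script name
        cv_score=0.95,                         # placeholder, not a measured result
    )
    saved_meta = json.loads(meta_path.read_text())

print(sorted(saved_meta))
```

Storing the metadata as JSON rather than inside the pickle keeps it readable even when the pickle itself can no longer be loaded by a newer scikit-learn release.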
For further discussion, refer to this talk by Alex Gaynor.
scikit-learn: 3.4. Model Persistence