Getting Started with Scikit-learn, an Open Source Machine Learning Tool

Source: Internet
Author: User

Scikit-learn is a Python-based machine learning module released under the BSD open source license. The project was started by David Cournapeau in 2007 and is currently maintained by community volunteers.

Scikit-learn's official website is http://scikit-learn.org/stable/, where you can find scikit-learn resources, module downloads, documentation, examples, and more.

Installing scikit-learn requires modules such as NumPy, SciPy, and matplotlib. Windows users can go to http://www.lfd.uci.edu/~gohlke/pythonlibs to download the precompiled installation package and its dependencies directly, or download it from http://sourceforge.jp/projects/sfnet_scikit-learn/.
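After installation, it can help to confirm that the required modules are importable. The following is a minimal sketch that only checks whether each module can be found; the list of names is taken from the paragraph above, and nothing is imported or installed by the check itself.

```python
import importlib.util

# Modules named in the installation notes; "sklearn" is the import name of scikit-learn.
required = ["numpy", "scipy", "matplotlib", "sklearn"]

# find_spec returns None when a top-level module cannot be located.
status = {name: importlib.util.find_spec(name) is not None for name in required}

for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```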

The basic functionality of scikit-learn falls into six areas: classification, regression, clustering, dimensionality reduction, model selection, and data preprocessing. See the documentation on the official website for details.

A specific machine learning problem can usually be broken into three steps: data preparation and preprocessing, model selection and training, and model validation and parameter tuning. The logistic regression model is used as the example here.

Scikit-learn supports data in multiple formats, including the classic iris dataset, LIBSVM-format data, and more. For convenience, LIBSVM-format data is recommended; see LIBSVM's official website for details.

To load LIBSVM-format data, import load_svmlight_file from sklearn.datasets:

from sklearn.datasets import load_svmlight_file
t_x, t_y = load_svmlight_file("filename")
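To see what the loader returns, here is a small round-trip sketch. The toy data and the file name toy.libsvm are made up for illustration; the file is written with dump_svmlight_file so that the example does not depend on any external dataset.

```python
import os
import tempfile

import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Tiny made-up dataset: 4 samples, 3 features, binary labels.
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0],
              [4.0, 0.0, 0.0],
              [0.0, 0.0, 5.0]])
y = np.array([1, 0, 1, 0])

# Write the data in LIBSVM (svmlight) text format to a temporary file.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "toy.libsvm")  # hypothetical file name
dump_svmlight_file(X, y, path, zero_based=True)

# load_svmlight_file returns a SciPy sparse matrix and a NumPy label array.
t_x, t_y = load_svmlight_file(path, n_features=3, zero_based=True)

print(t_x.shape)      # number of samples and features
print(t_x.toarray())  # dense view of the sparse feature matrix
print(t_y)            # labels are loaded as floats
```

Because the features come back as a sparse matrix, call toarray() only on small datasets; most scikit-learn estimators, including logistic regression, accept the sparse form directly.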

Each machine learning model is imported from its own module. The logistic regression model lives in sklearn.linear_model:

from sklearn.linear_model import LogisticRegression
# train_x, train_y, test_x, test_y are assumed to be loaded already
regressionfunc = LogisticRegression(C=10, penalty='l2', tol=0.0001)
# fit() returns the fitted model, so the training score can be chained
train_sco = regressionfunc.fit(train_x, train_y).score(train_x, train_y)
test_sco = regressionfunc.score(test_x, test_y)

This completes the training and testing of the model.
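The snippet above assumes that the training and test arrays already exist. A self-contained sketch of the same steps might look like the following; the synthetic data from make_classification, the 50/50 split, and max_iter=1000 are assumptions made for illustration and are not part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data stands in for a real dataset (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.5, random_state=0)

# L2 regularization is the default penalty, so only C and tol are set here.
# max_iter is raised (assumption) so the solver has room to converge.
regressionfunc = LogisticRegression(C=10, tol=0.0001, max_iter=1000)

# score() reports mean accuracy for classifiers.
train_sco = regressionfunc.fit(train_x, train_y).score(train_x, train_y)
test_sco = regressionfunc.score(test_x, test_y)

print("train accuracy:", train_sco)
print("test accuracy:", test_sco)
```

Comparing the two scores is a quick overfitting check: a training score far above the test score suggests the model is fitting noise.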

To choose a better model, you can cross-validate on held-out data or use a grid search for parameter tuning.

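You can import the following modules.

CV:

# In releases before 0.18 this function lived in sklearn.cross_validation
from sklearn.model_selection import train_test_split
# seed_i is an integer seed and regressionfunc_2 a model instance, both assumed defined elsewhere
x_train_m, x_test_m, y_train_m, y_test_m = train_test_split(t_x, t_y, test_size=0.5, random_state=seed_i)
regressionfunc_2.fit(x_train_m, y_train_m)
sco = regressionfunc_2.score(x_test_m, y_test_m, sample_weight=None)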
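The name seed_i suggests that the split is repeated with several seeds. One way to do that, sketched here under the assumption that five seeds and synthetic data are acceptable stand-ins, is to loop over the seeds and average the held-out scores.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the loaded t_x, t_y (assumption).
t_x, t_y = make_classification(n_samples=300, n_features=8, random_state=1)

scores = []
for seed_i in range(5):  # five random splits; the count is arbitrary
    x_train_m, x_test_m, y_train_m, y_test_m = train_test_split(
        t_x, t_y, test_size=0.5, random_state=seed_i)
    # A fresh model per split keeps the runs independent.
    regressionfunc_2 = LogisticRegression(C=10, max_iter=1000)
    regressionfunc_2.fit(x_train_m, y_train_m)
    scores.append(regressionfunc_2.score(x_test_m, y_test_m))

mean_sco = float(np.mean(scores))
print("per-split accuracy:", scores)
print("mean accuracy:", mean_sco)
```

Averaging over several splits gives a steadier estimate than a single split, at the cost of training the model once per seed.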

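Grid search:

# In releases before 0.18 this class lived in sklearn.grid_search
from sklearn.model_selection import GridSearchCV
tuned_parameters = [{'penalty': ['l1'], 'tol': [1e-3, 1e-4], 'C': [1, 10, 100, 1000]},
                    {'penalty': ['l2'], 'tol': [1e-3, 1e-4], 'C': [1, 10, 100, 1000]}]
# The liblinear solver supports both L1 and L2 penalties.
# With several scoring metrics, refit must name the one used to pick the best model.
clf = GridSearchCV(LogisticRegression(solver='liblinear'), tuned_parameters, cv=5, scoring=['precision', 'recall'], refit='precision')
# The search must be fitted before best_estimator_ is available
clf.fit(x_train_m, y_train_m)
print(clf.best_estimator_)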
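A runnable sketch of the grid search follows. To keep it short it tunes only C and tol, uses synthetic binary data, and picks precision as the refit metric; all three choices are assumptions for illustration rather than part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic binary data (assumption); precision and recall default to binary labels.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# 4 values of C times 2 values of tol gives 8 candidate settings.
tuned_parameters = {"C": [1, 10, 100, 1000], "tol": [1e-3, 1e-4]}

clf = GridSearchCV(
    LogisticRegression(max_iter=1000),
    tuned_parameters,
    cv=5,
    scoring=["precision", "recall"],
    refit="precision",  # metric used to select and refit the best model
)
clf.fit(X, y)

print(clf.best_params_)
print(clf.best_estimator_)
# Per-candidate mean recall across the 5 folds.
print(clf.cv_results_["mean_test_recall"])
```

The cv_results_ dictionary holds one entry per candidate for each metric, so the secondary metric can still be inspected even though only the refit metric drives the selection.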

You can also draw learning curves with matplotlib. Import the corresponding functions as follows:

# In releases before 0.18 these functions lived in sklearn.learning_curve
from sklearn.model_selection import learning_curve, validation_curve

The core code is as follows; see the official scikit-learn documentation for details:

# Scores as a function of training set size
train_sizes, train_scores, test_scores = learning_curve(
    estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
# Scores as a function of one hyperparameter
train_scores, test_scores = validation_curve(
    estimator, X, y, param_name=param_name, param_range=param_range,
    cv=cv, scoring=scoring, n_jobs=n_jobs)
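The following sketch computes both curves on synthetic data. The estimator, the training sizes, and the range of C values are assumptions chosen for illustration; plotting is left out so that the example only needs NumPy and scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve, validation_curve

# Synthetic data and a logistic regression estimator (assumptions).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
estimator = LogisticRegression(max_iter=1000)

# Learning curve: accuracy versus number of training samples.
# With cv=5, each training fold has 160 samples, so 160 is the largest size.
train_sizes, train_scores, test_scores = learning_curve(
    estimator, X, y, cv=5, train_sizes=[32, 64, 96, 128, 160])

# Validation curve: accuracy versus the regularization parameter C.
param_range = [0.01, 0.1, 1, 10, 100]
val_train_scores, val_test_scores = validation_curve(
    estimator, X, y, param_name="C", param_range=param_range, cv=5)

# Each row is one training size (or C value); each column is one CV fold.
print(train_sizes)
print(test_scores.mean(axis=1))
print(val_test_scores.mean(axis=1))
```

To draw the curves, pass train_sizes (or param_range) and the row means to matplotlib's plot function, one line for the training scores and one for the validation scores.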

Scikit-learn offers a rich set of machine learning models, including SVM, decision trees, GBDT, KNN, and so on. You can choose the appropriate model according to the type of problem; for more information, refer to the official documentation.
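Because these models share the same fit and score interface, swapping one for another takes little code. The sketch below compares the four model families named above with their default settings on synthetic data; the data, the split, and the choice of classifier classes are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.5, random_state=0)

# One classifier per model family, all with default hyperparameters.
models = {
    "SVM": SVC(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "GBDT": GradientBoostingClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
}

# Every estimator exposes fit() and score(), so one loop handles all of them.
scores = {name: model.fit(train_x, train_y).score(test_x, test_y)
          for name, model in models.items()}

for name, sco in scores.items():
    print(f"{name}: {sco:.3f}")
```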
