Which Classifier Should I Choose?
This is one of the most important questions to ask when approaching a machine learning problem. I find it easier to just test them all at once. Here are your favorite scikit-learn algorithms applied to the leaf data.

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Silence sklearn deprecation warnings by replacing warnings.warn with a no-op
def warn(*args, **kwargs): pass
import warnings
warnings.warn = warn

from sklearn.preprocessing import LabelEncoder
from sklearn.cross_validation import StratifiedShuffleSplit  # moved to sklearn.model_selection in newer versions

train = pd.read_csv('../input/train.csv')
test = pd.read_csv('../input/test.csv')
Data Preparation

In [2]:
# Swiss Army knife function to organize the data
def encode(train, test):
    le = LabelEncoder().fit(train.species)
    labels = le.transform(train.species)  # encode species strings
    classes = list(le.classes_)           # save column names for submission
    test_ids = test.id                    # save test ids for submission

    train = train.drop(['species', 'id'], axis=1)
    test = test.drop(['id'], axis=1)

    return train, labels, test, test_ids, classes

train, labels, test, test_ids, classes = encode(train, test)
train.head(1)
Out[2]:

| | margin1 | margin2 | margin3 | margin4 | margin5 | margin6 | margin7 | margin8 | margin9 | margin10 | ... | texture55 | texture56 | texture57 | texture58 | texture59 | texture60 | texture61 | texture62 | texture63 | texture64 |
| 0 | 0.007812 | 0.023438 | 0.023438 | 0.003906 | 0.011719 | 0.009766 | 0.027344 | 0.0 | 0.001953 | 0.033203 | ... | 0.007812 | 0.0 | 0.00293 | 0.00293 | 0.035156 | 0.0 | 0.0 | 0.004883 | 0.0 | 0.025391 |

1 rows × 192 columns

Stratified Train/Test Split
Stratification is necessary for this dataset because there is a relatively large number of classes (99 classes for 990 samples). It ensures that all classes are represented in both the train and test indices.

In [3]:
sss = StratifiedShuffleSplit(labels, 10, test_size=0.2, random_state=23)

for train_index, test_index in sss:
    X_train, X_test = train.values[train_index], train.values[test_index]
    y_train, y_test = labels[train_index], labels[test_index]
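To see concretely what stratification buys, here is a minimal pure-Python sketch of the idea (a toy stand-in for `StratifiedShuffleSplit`, using made-up labels rather than the actual leaf data): each class contributes the same fraction of its samples to the test set, so no class can vanish from either partition.

```python
from collections import Counter
import random

def stratified_split(labels, test_size=0.2, seed=23):
    """Group indices by class, then sample the same fraction from each class
    for the test set, so every class appears in both partitions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)

    test_idx = []
    for lab, idxs in by_class.items():
        idxs = idxs[:]
        rng.shuffle(idxs)
        n_test = max(1, int(round(test_size * len(idxs))))
        test_idx.extend(idxs[:n_test])

    test_set = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in test_set]
    return train_idx, test_idx

# Toy labels: 3 classes with 10 samples each (mimicking 99 classes x 10 samples)
labels = [c for c in "abc" for _ in range(10)]
train_idx, test_idx = stratified_split(labels)
print(Counter(labels[i] for i in train_idx))  # every class: 8 samples
print(Counter(labels[i] for i in test_idx))   # every class: 2 samples
```

A plain random 20% split on such small per-class counts could easily leave a class with zero test samples, which would make its log loss term undefined in practice.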
Sklearn Classifier Showdown
Simply looping through out-of-the-box classifiers and printing the results. Obviously, these would perform much better after tuning their hyperparameters, but this gives you a decent ballpark idea.

In [4]:
from sklearn.metrics import accuracy_score, log_loss
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC, NuSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

classifiers = [
    KNeighborsClassifier(3),
    SVC(kernel="rbf", C=0.025, probability=True),
    NuSVC(probability=True),
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    AdaBoostClassifier(),
    GradientBoostingClassifier(),
    GaussianNB(),
    LinearDiscriminantAnalysis(),
    QuadraticDiscriminantAnalysis()]

# Logging for visual comparison
log_cols = ["Classifier", "Accuracy", "Log Loss"]
log = pd.DataFrame(columns=log_cols)

for clf in classifiers:
    clf.fit(X_train, y_train)
    name = clf.__class__.__name__

    print("="*30)
    print(name)

    print('****Results****')
    train_predictions = clf.predict(X_test)
    acc = accuracy_score(y_test, train_predictions)
    print("Accuracy: {:.4%}".format(acc))

    train_predictions = clf.predict_proba(X_test)
    ll = log_loss(y_test, train_predictions)
    print("Log Loss: {}".format(ll))

    log_entry = pd.DataFrame([[name, acc*100, ll]], columns=log_cols)
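Log loss deserves a quick illustration, since it drives the comparison as much as accuracy does: it heavily penalizes confident but wrong probability estimates, which is why a classifier with decent accuracy can still post a terrible log loss. Below is a hand-rolled sketch of the metric (a simplified stand-in for sklearn's `log_loss`, evaluated on made-up predictions, not the leaf data):

```python
import math

def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log of the probability assigned to the true class.
    y_true: list of integer class indices; probs: list of probability rows."""
    total = 0.0
    for yi, row in zip(y_true, probs):
        p = min(max(row[yi], eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)

# Two hypothetical 3-class predictions: one confident and right,
# one confident and wrong -- the wrong one dominates the average.
y_true = [0, 2]
probs = [[0.9, 0.05, 0.05],
         [0.8, 0.1, 0.1]]
print(multiclass_log_loss(y_true, probs))  # ~1.204, mostly from -log(0.1)
```

This is why the loop above calls `predict_proba` rather than reusing the hard `predict` labels: log loss is a function of the full probability distribution, not just the argmax.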