Kaggle Code: Leaf Classification Sklearn Classifier Application

Source: Internet
Author: User
Which classifier should I choose?

This is one of the most important questions to ask when approaching a machine learning problem. I find it easier to just test them all at once. Here are your favorite scikit-learn algorithms applied to the leaf data. In [1]:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Silence warnings (e.g. deprecation notices) for cleaner notebook output
def warn(*args, **kwargs): pass
import warnings
warnings.warn = warn

from sklearn.preprocessing import LabelEncoder
from sklearn.cross_validation import StratifiedShuffleSplit

train = pd.read_csv('../input/train.csv')
test = pd.read_csv('../input/test.csv')
Data Preparation

In [2]:
# Swiss Army knife function to organize the data
def encode(train, test):
    le = LabelEncoder().fit(train.species)
    labels = le.transform(train.species)           # encode species strings
    classes = list(le.classes_)                    # save column names for submission
    test_ids = test.id                             # save test ids for submission

    train = train.drop(['species', 'id'], axis=1)
    test = test.drop(['id'], axis=1)

    return train, labels, test, test_ids, classes

train, labels, test, test_ids, classes = encode(train, test)
train.head(1)
Out[2]:

   margin1   margin2   margin3   margin4   margin5   margin6   margin7   margin8  margin9  margin10  ...  texture55  texture56  texture57  texture58  texture59  texture60  texture61  texture62  texture63  texture64
0  0.007812  0.023438  0.023438  0.003906  0.011719  0.009766  0.027344  0.0      0.001953 0.033203  ...  0.007812   0.0        0.00293    0.00293    0.035156   0.0        0.0        0.004883   0.0        0.025391

1 rows × 192 columns

Stratified Train/Test Split

Stratification is necessary for this dataset because there is a relatively large number of classes (99 classes for 990 samples). This ensures we have all classes represented in both the train and test indices. In [3]:

sss = StratifiedShuffleSplit(labels, 10, test_size=0.2, random_state=23)

for train_index, test_index in sss:
    X_train, X_test = train.values[train_index], train.values[test_index]
    y_train, y_test = labels[train_index], labels[test_index]
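Note that the sklearn.cross_validation module used above has since been removed from scikit-learn. A minimal sketch of the same stratified split with the current sklearn.model_selection API (assuming scikit-learn 0.18 or newer) could look like this:

from sklearn.model_selection import StratifiedShuffleSplit

# Ten stratified 80/20 splits, matching the original call;
# n_splits replaces the old positional count argument,
# and the class labels are now passed to split() instead of the constructor.
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=23)

for train_index, test_index in sss.split(train.values, labels):
    X_train, X_test = train.values[train_index], train.values[test_index]
    y_train, y_test = labels[train_index], labels[test_index]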
Sklearn Classifier Showdown

Simply looping through out-of-the-box classifiers and printing the results. Obviously, these would perform much better after tuning their hyperparameters, but this gives you a decent ballpark idea. In [4]:

from sklearn.metrics import accuracy_score, log_loss
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC, NuSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

classifiers = [
    KNeighborsClassifier(3),
    SVC(kernel="rbf", C=0.025, probability=True),
    NuSVC(probability=True),
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    AdaBoostClassifier(),
    GradientBoostingClassifier(),
    GaussianNB(),
    LinearDiscriminantAnalysis(),
    QuadraticDiscriminantAnalysis()]

# Logging for visual comparison
log_cols = ["Classifier", "Accuracy", "Log Loss"]
log = pd.DataFrame(columns=log_cols)

for clf in classifiers:
    clf.fit(X_train, y_train)
    name = clf.__class__.__name__

    print("="*30)
    print(name)

    print('****Results****')
    train_predictions = clf.predict(X_test)
    acc = accuracy_score(y_test, train_predictions)
    print("Accuracy: {:.4%}".format(acc))

    train_predictions = clf.predict_proba(X_test)
    ll = log_loss(y_test, train_predictions)
    print("Log Loss: {}".format(ll))

    # Accumulate results (the source text was cut off here; this follows from log_cols above)
    log_entry = pd.DataFrame([[name, acc*100, ll]], columns=log_cols)
    log = log.append(log_entry)
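Since seaborn and matplotlib were imported at the top, one convenient way to compare the logged results is a horizontal bar plot of the log DataFrame. This is a sketch, not part of the original cell:

# Bar plot of accuracy per classifier, using the "Accuracy" and "Classifier"
# columns accumulated in the log DataFrame above
sns.set_color_codes("muted")
sns.barplot(x="Accuracy", y="Classifier", data=log, color="b")
plt.xlabel("Accuracy %")
plt.title("Classifier Accuracy")
plt.show()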
