XGBoost learning examples: multiclass_classification

Source: Internet
Author: User
Tags: python script, xgboost in python

The previous article used the XGBoost CLI to do binary classification; now let's do multiclass classification.


The data set used is the UCI Dermatology data set.


There are 34 attributes and 6 class labels. Apart from family history, which is a nominal value (0 or 1), and age, which is a linear value, all attributes take the values 0, 1, 2, 3.

Attribute information (from the UCI documentation):

Clinical attributes (take values 0, 1, 2, 3, unless otherwise indicated):
1: erythema
2: scaling
3: definite borders
4: itching
5: koebner phenomenon
6: polygonal papules
7: follicular papules
8: oral mucosal involvement
9: knee and elbow involvement
10: scalp involvement
11: family history (0 or 1)
34: age (linear)

Histopathological attributes (take values 0, 1, 2, 3):
12: melanin incontinence
13: eosinophils in the infiltrate
14: PNL infiltrate
15: fibrosis of the papillary dermis
16: exocytosis
17: acanthosis
18: hyperkeratosis
19: parakeratosis
20: clubbing of the rete ridges
21: elongation of the rete ridges
22: thinning of the suprapapillary epidermis
23: spongiform pustule
24: munro microabcess
25: focal hypergranulosis
26: disappearance of the granular layer
27: vacuolisation and damage of basal layer
28: spongiosis
29: saw-tooth appearance of retes
30: follicular horn plug
31: perifollicular parakeratosis
32: inflammatory mononuclear infiltrate
33: band-like infiltrate

The actual data looks like this:

1,1,2,3,2,2,0,3,0,0,0,2,0,0,0,2,2,1,2,0,0,0,0,0,3,0,3,0,3,1,0,2,3,50,3
3,2,1,2,0,0,0,0,1,2,0,0,0,1,0,0,2,0,3,2,2,2,1,2,0,2,0,0,0,0,0,1,0,50,1
3,2,0,2,0,0,0,0,0,0,0,0,1,2,0,2,1,1,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,10,2
2,3,3,3,3,0,0,0,3,3,0,0,0,0,0,0,3,2,2,3,3,3,1,3,0,0,0,0,0,0,0,1,0,34,1
2,2,1,0,0,0,0,0,1,0,1,0,0,2,0,0,2,1,2,2,1,2,0,1,0,0,0,0,0,0,0,0,0,?,1
2,1,0,0,2,0,0,0,0,0,0,0,0,0,0,2,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,?,4
2,2,1,2,0,0,0,0,0,0,0,0,0,2,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,?,2
2,1,2,3,2,3,0,2,0,0,1,1,0,0,0,2,1,1,2,0,0,0,0,0,1,0,2,0,2,0,0,0,3,?,3

The last column is the class label, and a question mark means the age is unknown.
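Note that the documentation above numbers the attributes from 1, while the columns of the loaded numpy array are 0-based: the features end up in columns 0 to 32, age in column 33, and the class label in column 34. XGBoost also expects class labels in the range 0 to num_class - 1, whereas the raw file codes the six diseases as 1 to 6, so it is worth checking the label column before training. A minimal sketch, assuming the file has been saved locally as ./dermatology.data:

import numpy as np

# Load only the label column (0-based index 34); the '?' values live in the age
# column, which is not selected here, so no converter is needed.
labels = np.loadtxt('./dermatology.data', delimiter=',', usecols=[34])
print(np.unique(labels, return_counts=True))   # classes 1..6 with their counts
# train.py below therefore subtracts 1 so the labels become 0..5.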

This time we train with a Python script instead of the XGBoost CLI, which is more convenient:

train.py:

#!/usr/bin/python
from __future__ import division

import numpy as np
import xgboost as xgb

# labels need to be 0 to num_class - 1
data = np.loadtxt('./dermatology.data', delimiter=',',
                  converters={33: lambda x: int(x == '?'),   # make the '?' age entries parseable
                              34: lambda x: int(x) - 1})     # shift labels from 1..6 to 0..5
sz = data.shape

# 70/30 train/test split
train = data[:int(sz[0] * 0.7), :]
test = data[int(sz[0] * 0.7):, :]

train_X = train[:, :33]
train_Y = train[:, 34]

test_X = test[:, :33]
test_Y = test[:, 34]

xg_train = xgb.DMatrix(train_X, label=train_Y)
xg_test = xgb.DMatrix(test_X, label=test_Y)

# setup parameters for xgboost
param = {}
# use softmax multi-class classification
param['objective'] = 'multi:softmax'
param['eta'] = 0.1
param['max_depth'] = 6
param['silent'] = 1
param['nthread'] = 4
param['num_class'] = 6

watchlist = [(xg_train, 'train'), (xg_test, 'test')]
num_round = 5
bst = xgb.train(param, xg_train, num_round, watchlist)
# get prediction: multi:softmax returns the predicted class label directly
pred = bst.predict(xg_test)
error_rate = np.sum(pred != test_Y) / test_Y.shape[0]
print('Test error using softmax = {}'.format(error_rate))

# do the same thing again, but output probabilities
param['objective'] = 'multi:softprob'
bst = xgb.train(param, xg_train, num_round, watchlist)
# Note: this convention has been changed since xgboost-unity
# get prediction, this is a 1D array, need reshape to (ndata, nclass)
pred_prob = bst.predict(xg_test).reshape(test_Y.shape[0], 6)
pred_label = np.argmax(pred_prob, axis=1)
error_rate = np.sum(pred_label != test_Y) / test_Y.shape[0]
print('Test error using softprob = {}'.format(error_rate))
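The same experiment can also be run through the scikit-learn style wrapper that ships with the xgboost Python package. The following is only an illustrative sketch, not part of the original script; it assumes the same file path and reuses the same 70/30 split and parameters as above:

import numpy as np
import xgboost as xgb

data = np.loadtxt('./dermatology.data', delimiter=',',
                  converters={33: lambda x: int(x == '?'), 34: lambda x: int(x) - 1})
X, y = data[:, :33], data[:, 34]
split = int(X.shape[0] * 0.7)

# XGBClassifier infers the number of classes from the training labels.
clf = xgb.XGBClassifier(n_estimators=5, max_depth=6, learning_rate=0.1)
clf.fit(X[:split], y[:split])

pred = clf.predict(X[split:])          # hard labels, like multi:softmax
prob = clf.predict_proba(X[split:])    # (ndata, num_class) matrix, like multi:softprob
print('Test error = {}'.format(np.mean(pred != y[split:])))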

The code is quite simple. The points worth mentioning are the handling of the age column (the converter maps the '?' entries to a number so that loadtxt can parse the file), remapping the labels so that they start from 0, and the two training objectives (a short sketch of their difference follows this list):

multi:softprob
and
multi:softmax
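multi:softmax makes predict() return the hard class label directly, while multi:softprob returns one probability per class, which train.py reshapes to (ndata, num_class). Taking the argmax over the class axis recovers the same hard labels, and the row maximum gives a confidence score. A small illustrative sketch with made-up probabilities:

import numpy as np

# Made-up (ndata, num_class) matrix, as if produced by a multi:softprob model
# after the reshape in train.py.
prob = np.array([[0.05, 0.70, 0.05, 0.10, 0.05, 0.05],
                 [0.10, 0.10, 0.60, 0.10, 0.05, 0.05]])

labels = prob.argmax(axis=1)   # the hard labels multi:softmax would have returned
confidence = prob.max(axis=1)  # how strongly the model favours each prediction
print(labels)        # [1 2]
print(confidence)    # [0.7 0.6]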





