The path of machine learning: a Python linear classifier for predicting benign and malignant tumors

Last Update: 2018-04-29 Source: Internet

Author: User

Using Python 3 to learn the scikit-learn API for linear classifiers
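The scikit-learn objects in the article's code share one API: fit learns parameters from training data, transform (for preprocessors) or predict (for models) applies them, and fit_transform does both in one call. As a minimal sketch of that pattern, here is a hypothetical pure-Python SimpleStandardScaler that mirrors what StandardScaler does in step 3 of the code: it learns each feature's mean and standard deviation from the training rows only and reuses them for the test rows. It is an illustration on made-up numbers, not scikit-learn's implementation.

```python
import math


class SimpleStandardScaler:
    """Hypothetical pure-Python imitation of the fit/transform API (illustration only)."""

    def fit(self, rows):
        # Learn one mean and one (population) standard deviation per feature column.
        n = len(rows)
        columns = list(zip(*rows))
        self.mean_ = [sum(col) / n for col in columns]
        # A constant column gets scale 1.0 so transform never divides by zero.
        self.scale_ = [
            math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
            for col, m in zip(columns, self.mean_)
        ]
        return self

    def transform(self, rows):
        # Apply the statistics learned in fit(); nothing is re-estimated here.
        return [
            [(v - m) / s for v, m, s in zip(row, self.mean_, self.scale_)]
            for row in rows
        ]

    def fit_transform(self, rows):
        # Convenience method: fit on rows, then transform the same rows.
        return self.fit(rows).transform(rows)


# Made-up toy data: two features on very different scales.
train = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]
test = [[3.0, 70.0]]

scaler = SimpleStandardScaler()
train_std = scaler.fit_transform(train)  # statistics come from train only
test_std = scaler.transform(test)        # test reuses train's mean and scale
print(test_std)  # [[0.0, 2.449...]]
```

Fitting only on the training rows keeps information from the test set out of the preprocessing; calling fit_transform on the test rows instead would rescale them with their own statistics.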

Benign and malignant tumors are predicted with two linear classifiers: logistic regression (LogisticRegression) and stochastic parameter estimation, i.e. a linear model trained by stochastic gradient descent (SGDClassifier).
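Both models are linear classifiers: they compute a weighted sum w·x + b and pick a class from its sign. The article's code comments contrast them by speed and accuracy, and much of that trade-off comes from how the weights are estimated. The sketch below is a simplified illustration on made-up data, not scikit-learn's code: it fits a logistic (log-loss) model in two ways, batch gradient descent, which looks at every sample before each update, and stochastic gradient descent, which updates after each single sample. Assumptions to keep in mind: scikit-learn's LogisticRegression uses its own solvers (liblinear or lbfgs depending on the version) with L2 regularization, SGDClassifier uses hinge loss (a linear SVM) by default rather than log loss, and the learning rate, epoch counts and seed here are arbitrary choices.

```python
import math
import random


def sigmoid(z):
    # Logistic function: maps a linear score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


def score(w, b, x):
    # Linear decision function shared by both models: w . x + b.
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    # Class 1 (e.g. malignant) when the score is positive, else class 0 (benign).
    return 1 if score(w, b, x) > 0 else 0


def batch_gradient_descent(X, y, learning_rate=0.5, epochs=200):
    # Each update uses the log-loss gradient averaged over ALL training samples.
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi
            grad_w = [g + err * v for g, v in zip(grad_w, xi)]
            grad_b += err
        w = [wj - learning_rate * g / n for wj, g in zip(w, grad_w)]
        b -= learning_rate * grad_b / n
    return w, b


def stochastic_gradient_descent(X, y, learning_rate=0.5, epochs=20, seed=0):
    # Each update uses the gradient of ONE sample, visited in shuffled order.
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = sigmoid(score(w, b, X[i])) - y[i]
            w = [wj - learning_rate * err * v for wj, v in zip(w, X[i])]
            b -= learning_rate * err
    return w, b


# Made-up, linearly separable toy data (two features, labels 0/1).
X = [[-2.0, -1.0], [-1.5, -2.0], [-1.0, -0.5], [1.0, 0.5], [1.5, 2.0], [2.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]

w_batch, b_batch = batch_gradient_descent(X, y)
w_sgd, b_sgd = stochastic_gradient_descent(X, y)
print([predict(w_batch, b_batch, x) for x in X])  # [0, 0, 0, 1, 1, 1]
print([predict(w_sgd, b_sgd, x) for x in X])      # [0, 0, 0, 1, 1, 1]
```

A batch step costs a full pass over the data, while a stochastic step costs one sample, so SGD makes many cheap but noisier updates.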

I downloaded the dataset locally. The source code and dataset can be downloaded from my GitHub repository: https://github.com/linyi0604/kaggle
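The article's code reads a comma-separated file without a header row, treats '?' as a missing value, and drops every incomplete row; the training and test sizes it reports then follow from how train_test_split turns test_size=0.25 into a row count (scikit-learn rounds a fractional test size up). The sketch below reproduces both steps with only the standard library. The three data rows are made up for illustration and are not taken from the real file, load_complete_rows and split_sizes are hypothetical helper names, and 683 is simply 512 + 171, the training and test counts the article reports.

```python
import csv
import io
import math

# Made-up rows in the layout the article's code expects:
# sample code number, nine features, class (2 = benign, 4 = malignant), no header.
RAW = "101,5,1,1,1,2,1,3,1,1,2\n102,5,4,4,5,7,?,3,2,1,2\n103,8,10,10,8,7,10,9,7,1,4\n"


def load_complete_rows(text):
    """Parse header-less CSV text and keep only rows with no '?' field."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        if "?" in record:
            continue  # like replace('?', np.nan) followed by dropna(how='any')
        rows.append([int(field) for field in record])
    return rows


def split_sizes(n_samples, test_size=0.25):
    """(n_train, n_test) for a fractional test_size, rounding the test size up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


rows = load_complete_rows(RAW)
print(len(rows))         # 2: the row containing '?' is dropped
print(split_sizes(683))  # (512, 171)
```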

import numpy as np
import pandas as pd
# scikit-learn 0.20 removed sklearn.cross_validation; train_test_split lives in sklearn.model_selection
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import classification_report

'''
Linear classifiers
The most basic and commonly used machine learning models.
They are constrained by the assumption of a linear relationship between the data features and the classification target.
Logistic regression: longer computation time, slightly better model performance.
Stochastic parameter estimation (SGDClassifier): shorter computation time, slightly worse model performance.
'''

'''
1 Data preprocessing
'''
# Create the list of column names
column_names = ['Sample Code Number', 'Clump Thickness', 'Uniformity of Cell Size',
                'Uniformity of Cell Shape', 'Marginal Adhesion', 'Single Epithelial Cell Size',
                'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class']
# Read the dataset with pandas.read_csv (the file has no header row)
data = pd.read_csv('./data/breast/breast-cancer-wisconsin.data', names=column_names)
# Replace '?' with the standard missing-value representation (NaN)
data = data.replace(to_replace='?', value=np.nan)
# Drop every row that has a missing value in any dimension
data = data.dropna(how='any')
# Print the number of samples and dimensions of the data
# print(data.shape)

'''
2 Preparing training and test data for benign and malignant tumors
'''
# Randomly sample 25% of the data for testing and 75% for training.
# Features: columns 1-9 (the sample code number is not a feature); target: 'Class'
X_train, X_test, y_train, y_test = train_test_split(data[column_names[1:10]],
                                                    data[column_names[10]],
                                                    test_size=0.25,
                                                    random_state=33)
# Check the number and class distribution of the training and test samples
# print(y_train.value_counts())
# print(y_test.value_counts())
'''
Training set: 512 samples in total, of which 344 are benign and 168 malignant
2    344
4    168
Name: Class, dtype: int64
Test set: 171 samples in total, of which 100 are benign and 71 malignant
2    100
4     71
Name: Class, dtype: int64
'''

'''
3 Machine learning models for prediction
'''
# Standardize the data so that every feature has mean 0 and variance 1;
# this keeps features with large values from dominating the predictions
ss = StandardScaler()
X_train = ss.fit_transform(X_train)  # learn the scaling rule from X_train and apply it
X_test = ss.transform(X_test)  # apply the rule learned from X_train, without refitting

# Learn and predict with two methods: logistic regression and stochastic parameter estimation
lr = LogisticRegression()  # initialize the logistic regression model
sgdc = SGDClassifier()  # initialize the stochastic parameter estimation (SGD) model

# Train logistic regression on the training set
lr.fit(X_train, y_train)
# After training, save the predictions for the test set in lr_y_predict
lr_y_predict = lr.predict(X_test)

# Train stochastic parameter estimation on the training set
sgdc.fit(X_train, y_train)
# After training, save the predictions for the test set in sgdc_y_predict
sgdc_y_predict = sgdc.predict(X_test)

'''
4 Performance analysis
'''
# The classifier's built-in score method returns its accuracy on the test set
print("Logistic regression accuracy:", lr.score(X_test, y_test))
# Other metrics for logistic regression
print("Other metrics for logistic regression:\n",
      classification_report(y_test, lr_y_predict, target_names=["Benign", "Malignant"]))

# Performance analysis of stochastic parameter estimation
print("Stochastic parameter estimation accuracy:", sgdc.score(X_test, y_test))
# Other metrics for stochastic parameter estimation
print("Other metrics for stochastic parameter estimation:\n",
      classification_report(y_test, sgdc_y_predict, target_names=["Benign", "Malignant"]))

'''
recall: the fraction of samples of a class that are predicted as that class
precision: the fraction of samples predicted as a class that actually belong to it
f1-score: the harmonic mean of precision and recall
support: the number of test samples that actually belong to the class

Logistic regression accuracy: 0.9707602339181286
Other metrics for logistic regression:
              precision    recall  f1-score   support

     Benign       0.96      0.99      0.98       100
  Malignant       0.99      0.94      0.96        71

avg / total       0.97      0.97      0.97       171

Stochastic parameter estimation accuracy: 0.9649122807017544
Other metrics for stochastic parameter estimation:
              precision    recall  f1-score   support

     Benign       0.97      0.97      0.97       100
  Malignant       0.96      0.96      0.96        71

avg / total       0.96      0.96      0.96       171
'''
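The per-class numbers in these reports can be computed by hand from the true and predicted labels. The sketch below is a minimal re-implementation on made-up labels that use the dataset's coding (2 = benign, 4 = malignant); per_class_report and accuracy are hypothetical helpers, not scikit-learn functions. The score method of a scikit-learn classifier returns mean accuracy, which is what accuracy computes here.

```python
def per_class_report(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1, support)}, the per-class rows of a report."""
    report = {}
    for label in labels:
        # True positives: samples of this class that were predicted as this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)  # predicted as this class
        support = sum(1 for t in y_true if t == label)    # actually in this class
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = (precision, recall, f1, support)
    return report


def accuracy(y_true, y_pred):
    # Fraction of correct predictions, the value a classifier's score() reports.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels using the dataset's coding: 2 = benign, 4 = malignant.
y_true = [2, 2, 2, 2, 4, 4, 4, 4]
y_pred = [2, 2, 2, 2, 4, 4, 4, 2]

report = per_class_report(y_true, y_pred, labels=[2, 4])
for label, name in [(2, "Benign"), (4, "Malignant")]:
    precision, recall, f1, support = report[label]
    print(f"{name:>10} {precision:.2f} {recall:.2f} {f1:.2f} {support}")
print("accuracy", accuracy(y_true, y_pred))  # accuracy 0.875
```

On these toy labels, benign gets precision 0.80 and recall 1.00, while malignant gets precision 1.00 and recall 0.75: the one malignant sample predicted as benign lowers malignant recall and benign precision at the same time.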

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness, ownership, or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
