The path of machine learning: A Python linear classifier for predicting benign and malignant tumors

Source: Internet
Author: User

Using Python 3 to learn the scikit-learn API for linear classifiers

Benign and malignant tumors are predicted with two models: logistic regression (LogisticRegression) and a linear classifier trained by stochastic gradient descent (SGDClassifier)
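To make the idea behind both models concrete, here is a minimal sketch in plain Python of a linear classifier whose weights are learned by stochastic gradient descent on the logistic loss. It is not scikit-learn's implementation. The function names `sigmoid`, `train_sgd` and `predict`, the learning rate, the epoch count and the toy data are all assumptions made for illustration. Note also that `SGDClassifier` uses the hinge loss by default, so this sketch shows the general technique rather than that exact configuration.

```python
import math
import random


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(X, y, lr=0.1, epochs=500, seed=0):
    """Fit weights and bias with one-sample-at-a-time gradient steps on log loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a fresh random order each epoch
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            err = sigmoid(score) - y[i]  # gradient of log loss w.r.t. the score
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical, linearly separable toy data: label 1 when both features are large.
X_toy = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
y_toy = [0, 0, 0, 1, 1, 1]

w, b = train_sgd(X_toy, y_toy)
preds = [predict(w, b, x) for x in X_toy]
print(preds)
```

Updating on one sample at a time is what makes the stochastic approach cheap per step. A solver that works on the whole training set at once typically costs more per iteration but gives a more precise fit.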

I downloaded the dataset locally. The source code and the dataset are both available from my Git repository: https://github.com/linyi0604/kaggle

    import numpy as np
    import pandas as pd
    # sklearn.cross_validation was removed in scikit-learn 0.20;
    # train_test_split now lives in sklearn.model_selection
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression, SGDClassifier
    from sklearn.metrics import classification_report

    '''
    Linear classifiers
    Among the most basic and commonly used machine learning models.
    Limited by the assumption of a linear relationship between the features and the classification target.
    Logistic regression: longer computation time, slightly higher model performance.
    Stochastic gradient descent: shorter computation time, slightly lower model performance.
    '''

    '''
    1 Data preprocessing
    '''
    # Create the list of column names
    column_names = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
                    'Uniformity of Cell Shape', 'Marginal Adhesion', 'Single Epithelial Cell Size',
                    'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class']
    # Read the dataset with pandas.read_csv
    data = pd.read_csv('./data/breast/breast-cancer-wisconsin.data', names=column_names)
    # Replace '?' with the standard missing-value representation
    data = data.replace(to_replace='?', value=np.nan)
    # Drop every row that has a missing value in any dimension
    data = data.dropna(how='any')
    # Print the number of rows and dimensions
    # print(data.shape)

    '''
    2 Prepare the training and test data
    '''
    # Randomly sample 25% of the data for testing and 75% for training
    X_train, X_test, y_train, y_test = train_test_split(data[column_names[1:10]],
                                                        data[column_names[10]],
                                                        test_size=0.25,
                                                        random_state=33)
    # Check the number and class distribution of the training and test samples
    # print(y_train.value_counts())
    # print(y_test.value_counts())
    '''
    Training samples: 512 in total, of which 344 are benign and 168 malignant
    2    344
    4    168
    Name: Class, dtype: int64
    Test samples: 171 in total, of which 100 are benign and 71 malignant
    2    100
    4     71
    Name: Class, dtype: int64
    '''

    '''
    3 Train the models and predict
    '''
    # Standardize the data so that every feature has mean 0 and variance 1;
    # the prediction is then not dominated by features with large values
    ss = StandardScaler()
    X_train = ss.fit_transform(X_train)  # fit the scaler on X_train and standardize it
    X_test = ss.transform(X_test)        # standardize X_test with the rules learned from X_train, without refitting

    # Learn and predict with both logistic regression and stochastic gradient descent
    lr = LogisticRegression()  # initialize the logistic regression model
    sgdc = SGDClassifier()     # initialize the stochastic gradient descent classifier

    # Train logistic regression on the training set
    lr.fit(X_train, y_train)
    # After training, store the predictions for the test set in lr_y_predict
    lr_y_predict = lr.predict(X_test)

    # Train the SGD classifier on the training set
    sgdc.fit(X_train, y_train)
    # After training, store the predictions for the test set in sgdc_y_predict
    sgdc_y_predict = sgdc.predict(X_test)

    '''
    4 Performance analysis
    '''
    # The model's own score function gives the accuracy on the test set
    print("Logistic regression accuracy:", lr.score(X_test, y_test))
    # Other metrics for logistic regression
    print("Other metrics for logistic regression:\n",
          classification_report(y_test, lr_y_predict, target_names=["Benign", "Malignant"]))

    # Performance analysis of the SGD classifier
    print("SGD classifier accuracy:", sgdc.score(X_test, y_test))
    # Other metrics for the SGD classifier
    print("Other metrics for the SGD classifier:\n",
          classification_report(y_test, sgdc_y_predict, target_names=["Benign", "Malignant"]))

    '''
    Columns of the report: precision, recall, f1-score, support

    Logistic regression accuracy: 0.9707602339181286
    Other metrics for logistic regression:
                 precision    recall  f1-score   support

         Benign       0.96      0.99      0.98       100
      Malignant       0.99      0.94      0.96        71

    avg / total       0.97      0.97      0.97       171

    SGD classifier accuracy: 0.9649122807017544
    Other metrics for the SGD classifier:
                 precision    recall  f1-score   support

         Benign       0.97      0.97      0.97       100
      Malignant       0.96      0.96      0.96        71

    avg / total       0.96      0.96      0.96       171
    '''
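The report columns above can be reproduced by hand, which helps when reading them. The following is a minimal sketch, assuming the dataset's label coding (2 = benign, 4 = malignant). The helper `precision_recall_f1` and the toy label lists are hypothetical and are not part of scikit-learn or of the original script.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall and F1 for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels: one benign sample is wrongly predicted as malignant.
y_true_toy = [2, 2, 2, 2, 4, 4, 4, 4]
y_pred_toy = [2, 2, 2, 4, 4, 4, 4, 4]

for label, name in [(2, "Benign"), (4, "Malignant")]:
    p, r, f = precision_recall_f1(y_true_toy, y_pred_toy, positive=label)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Precision is the share of predicted members of a class that really belong to it. Recall is the share of true members that were found. For tumor screening, recall on the malignant class is usually the number to watch, since a missed malignant case is the costlier error.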
