Data Mining: A Classic Introductory Python Study on Breast Cancer Classification

Source: Internet
Author: User

This example builds a prediction model that classifies a tumor as benign or malignant from its attributes. The model is learned from the records of 699 patients whose tumor properties are known. It is then applied to patients it has not seen before, using their attributes to decide whether the tumor is malignant.

Data used: Link: http://pan.baidu.com/s/1c26Dbjy Password: gllb
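
The program in this article reads each record as 11 comma-separated fields: a patient ID, nine integer attributes, and a diagnosis code, where '4' means malignant and anything else is treated as benign. Records containing '?' are skipped as bad data. As a minimal sketch of that parsing step, assuming this layout (the helper name parse_record and the sample lines are made up for illustration, not taken from the data file):

```python
def parse_record(line):
    """Parse one record into (id, 'm' or 'b', nine ints).

    Returns None for blank lines and for records with a missing
    value, which the data marks with '?'.
    """
    line = line.strip()
    if not line or '?' in line:
        return None
    fields = line.split(',')
    patient_id, attributes, diag = fields[0], fields[1:10], fields[10]
    label = 'm' if diag == '4' else 'b'  # '4' is the malignant code
    return (patient_id, label) + tuple(int(a) for a in attributes)

# Hypothetical sample lines, for illustration only
good_line = "1000001,5,1,1,1,2,1,3,1,1,2"
bad_line = "1000002,5,1,1,1,2,?,3,1,1,4"

print(parse_record(good_line))
print(parse_record(bad_line))
```

Keeping the ID and the label in the first two positions means the attribute values always start at index 2, which is the layout the rest of the program relies on.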

###########################################
# Classifier: benign or malignant tumors
###########################################

###########################################
# Read the dataset and return a list of patient tuples
###########################################
def readset(filename):
    trainset = []
    # "with" closes the file even if a line fails to parse
    with open(filename) as trainfile:
        for line in trainfile:
            line = line.strip()  # remove the trailing '\n'
            # Skip blank lines and bad records containing a question mark.
            # Note: no space inside the quotation marks.
            if not line or '?' in line:
                continue
            patientid, a1, a2, a3, a4, a5, a6, a7, a8, a9, diag = line.split(',')  # split on commas
            if diag == '4':
                diagmorb = 'm'  # malignant
            else:
                diagmorb = 'b'  # benign
            patienttuple = (patientid, diagmorb, int(a1), int(a2), int(a3), int(a4), int(a5),
                            int(a6), int(a7), int(a8), int(a9))
            trainset.append(patienttuple)
    return trainset

###########################################
# Train the classifier
###########################################
def sumlists(list1, list2):
    # Element-wise sum of two sequences of 9 attribute values
    listofsums = [0.0] * 9
    for index in range(9):
        listofsums[index] = list1[index] + list2[index]
    return listofsums

def makeaverages(listofsums, total):
    # Divide each of the 9 sums by the number of records
    averagelist = [0.0] * 9
    for index in range(9):
        averagelist[index] = listofsums[index] / float(total)
    return averagelist

def classifier(trainset):
    benignsums = [0] * 9
    benigncount = 0
    malignantsums = [0] * 9
    malignantcount = 0
    for patienttup in trainset:
        if patienttup[1] == 'b':
            benignsums = sumlists(benignsums, patienttup[2:])
            benigncount += 1
        else:
            malignantsums = sumlists(malignantsums, patienttup[2:])
            malignantcount += 1
    benignavgs = makeaverages(benignsums, benigncount)
    malignantavgs = makeaverages(malignantsums, malignantcount)
    # The cutoff for each attribute is the midpoint between
    # the benign average and the malignant average
    classifier = makeaverages(sumlists(benignavgs, malignantavgs), 2)
    return classifier

###########################################
# Test the classifier
###########################################
def test(testset, classifier):
    results = []
    for patient in testset:
        benigncount = 0
        malignantcount = 0
        for index in range(9):
            # Note: attribute values start at index 2 of the tuple
            if patient[index + 2] > classifier[index]:
                malignantcount += 1
            else:
                benigncount += 1
        resulttuple = (patient[0], benigncount, malignantcount, patient[1])
        results.append(resulttuple)
    return results

###########################################
# Format and print the test results
###########################################
def showresult(result):
    totalcount = 0
    wrongcount = 0
    for r in result:
        totalcount += 1
        if r[1] > r[2]:        # more benign votes: predicted benign
            if r[3] == 'm':    # but the actual diagnosis is malignant
                wrongcount += 1
        elif r[3] == 'b':      # predicted malignant, actually benign
            wrongcount += 1
    print("%d patients, there were %d wrong" % (totalcount, wrongcount))

###########################################
# Main function
###########################################
def main():
    print("Reading in train data ...")
    trainfilename = "C:\\python36\\code\\ruxian\\fulltraindata.txt"
    trainset = readset(trainfilename)
    # print(trainset)
    print("Read trainset done!")
    print("Begin training ...")
    # Stored as "cutoffs" so the classifier() function is not shadowed inside main()
    cutoffs = classifier(trainset)
    print("Train classifier done!")
    print("Reading in test data ...")
    testfilename = "C:\\python36\\code\\ruxian\\fulltestdata.txt"
    testset = readset(testfilename)
    print("Read testset done!")
    print("Begin testing ...")
    result = test(testset, cutoffs)
    # print(result)
    print("Test done!")
    showresult(result)
    print("program finished.\n")

if __name__ == "__main__":
    main()
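
To see how the cutoffs and the vote work on a small scale, here is a minimal sketch on made-up data with three attributes instead of nine. The function names midpoint_cutoffs and vote and all the values are illustrative assumptions, not part of the program above. The sketch follows the same idea: each cutoff is the midpoint between the benign and malignant averages, and each attribute above its cutoff counts as one vote for malignant.

```python
def midpoint_cutoffs(benign_rows, malignant_rows):
    """Per-attribute cutoff: midpoint of the benign and malignant averages."""
    n = len(benign_rows[0])
    benign_avg = [sum(r[i] for r in benign_rows) / len(benign_rows) for i in range(n)]
    malignant_avg = [sum(r[i] for r in malignant_rows) / len(malignant_rows) for i in range(n)]
    return [(b + m) / 2 for b, m in zip(benign_avg, malignant_avg)]

def vote(attributes, cutoffs):
    """Each attribute above its cutoff votes malignant; the majority wins."""
    malignant_votes = sum(1 for a, c in zip(attributes, cutoffs) if a > c)
    benign_votes = len(attributes) - malignant_votes
    return 'b' if benign_votes > malignant_votes else 'm'

# Toy training data, invented for this example
benign = [(1, 2, 1), (3, 2, 1)]
malignant = [(8, 7, 9), (6, 9, 7)]

cutoffs = midpoint_cutoffs(benign, malignant)
print(cutoffs)                    # [4.5, 5.0, 4.5]
print(vote((2, 1, 3), cutoffs))   # b
print(vote((9, 8, 2), cutoffs))   # m
```

With an odd number of attributes, such as the nine used in the program, the vote can never end in a tie.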

Reference: "Python Introductory Classics" study book
