As a Python beginner, I spent a whole day stuck on a problem with deleting items from a list while writing a decision tree program. I am writing it up in today's post in the hope that you can avoid the same problem.
Here is the code outline:
def read_txt(filename):  # defines a function that reads a txt file; it converts a txt table into a Python list.
The result is a nested list. Here are a few rows pasted from it:
[' Si ', '? ', ' 180211 ', ' some-college ', ' ten ', ' Married-civ-spouse ', '? ', ' Husband ', ' asian-pac-islander ', ' Male ', ' 0 ', ' 0 ', ' >50k ', ' South ', '
[' + ', ' Private ', ' 84154 ', ' some-college ', ' ten ', ' Married-civ-spouse ', ' Sales ', ' Husband ', ' white ', ' Male ', ' 0 ', ' 0 ', ' a ', '? ', ' >50k ']
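The body of read_txt is not shown in the post. As a rough sketch only, a function that produces nested lists like the rows above could look like the following. The comma separator, the one-record-per-line layout, and the stripping of whitespace are all assumptions made for illustration, not details taken from the original code.

```python
def read_txt(filename):
    # Assumed file format (not confirmed by the post): one record per line,
    # fields separated by commas.
    rows = []
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split the record into fields and trim surrounding whitespace.
            rows.append([field.strip() for field in line.split(',')])
    return rows
```

Each row comes back as a list of strings, so a missing value shows up as the string '?' inside the row.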
This list contains many '?' entries. They mark missing values and would interfere with my later processing, so I took a simple approach: delete every row that contains a '?'. The following helper function does that.
def computena(items):
    # detect missing values in the list and handle them
    for item in items:
        if item.count('?') > 0:
            items.remove(item)
    return items
Just a few lines of code, and the intent is very clear. Next:
filename = r"C:\data.txt"
data_list = read_txt(filename)
data_filtered = computena(data_list)
While testing the code I found:
print(len(data_list))      # result: 32561
print(len(data_filtered))  # result: 30310
I thought the function had handled the missing values nicely, but during later processing I discovered that data_filtered still contained '?'. Do you know why?
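The symptom can be reproduced on a much smaller list. The sketch below uses made-up rows (not the author's data) in which two adjacent rows contain '?'. When the loop removes the first of them, the rows after it shift left by one position, so the loop's next step lands past the second one and never checks it.

```python
def computena(items):
    # Buggy version from the post: removes rows from the same list
    # that the for loop is iterating over.
    for item in items:
        if item.count('?') > 0:
            items.remove(item)
    return items


# Made-up rows for illustration; the two middle rows both contain '?'.
rows = [
    ['39', 'Private', '<=50K'],
    ['54', '?', '>50K'],
    ['32', '?', '<=50K'],
    ['25', 'Private', '<=50K'],
]

filtered = computena(rows)
print(len(filtered))                       # 3
print([r for r in filtered if '?' in r])   # [['32', '?', '<=50K']]
print(filtered is rows)                    # True
```

The last line also shows that the function modifies its argument in place: the returned list and the list passed in are the same object.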
Next I started working on this bug:
I added one more statement, data_filtered2 = computena(data_filtered), and then print(len(data_filtered2)) gave 30162. Strangely, the missing values in data_list were now all gone, so I concluded that data_list has to be filtered twice to remove every missing value. That seemed to make some sense, but I knew it was a bug. I tried many things without fixing it, and I even began to suspect a serious flaw in Python itself. After being stuck like this for a whole day, I finally found a sentence in the documentation that ships with Python that addresses exactly this situation.
In short, it says that if you want to change the contents of a list inside a loop over that list, you had better iterate over a copy of it. So you only need to replace "for item in items:" with "for item in items[:]:" and the bug is solved. I hope you will watch out for this when you use lists in the future.
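Applied to the function from this post, the fix looks like the sketch below. The rows are made up for illustration, and drop_missing is a hypothetical alternative that is not part of the original code: it builds a new list with a list comprehension instead of removing rows in place.

```python
def computena(items):
    # Fixed version: loop over a shallow copy (items[:]) so that removing
    # rows from the original list cannot shift the loop's position.
    for item in items[:]:
        if item.count('?') > 0:
            items.remove(item)
    return items


def drop_missing(items):
    # Hypothetical alternative (not from the post): build a new list and
    # leave the input list unchanged.
    return [item for item in items if '?' not in item]


# Made-up rows for illustration; two adjacent rows contain '?'.
rows = [
    ['39', 'Private', '<=50K'],
    ['54', '?', '>50K'],
    ['32', '?', '<=50K'],
    ['25', 'Private', '<=50K'],
]

kept = drop_missing(rows)
print(len(kept), len(rows))    # 2 4
print(len(computena(rows)))    # 2
```

Both versions remove every row containing '?' in a single pass. The difference is that computena still modifies the list passed to it, while drop_missing returns a separate list, which may be easier to reason about if the unfiltered data is needed later.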
This article is from the "Lu Yao" blog. Please be sure to keep this source link: http://cwxfly.blog.51cto.com/6113982/1692027
Traps for lists in Python