Lookup speed issues with Python dictionaries and lists
While processing genome data recently, I needed to read a large file (2.7 GB) into a dictionary and then match the records of a processed file against the dictionary's keys: after reading each row of the processed file, I check whether its key is in the dictionary. The following two snippets differ enormously in efficiency:
First snippet:
if pos in fre_dist.keys():
    newvalue = fre_dist[pos]
Second snippet:
if pos in fre_dist:
    newvalue = fre_dist[pos]
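When processing 30,000 records, the second snippet is far faster than the first.
The reason: in the first snippet, fre_dist.keys() returns a list (this is Python 2 behaviour), and Python searches a list by scanning it element by element, which is slow. In the second snippet, fre_dist is a dictionary, and a membership test on a dictionary is a hash lookup, which is fast. In Python 3, keys() returns a view object whose membership test is also a hash lookup, so the gap largely disappears there, but writing pos in fre_dist is still the idiomatic form.
A lesson learned the hard way.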
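The difference described above can be reproduced with a small timing sketch. The data here is a hypothetical stand-in for the genome file (30,000 integer positions), and key_list plays the role of the list that fre_dist.keys() returned in Python 2; absolute timings will vary by machine.

```python
import timeit

# Hypothetical stand-in for the real data: 30,000 integer positions.
fre_dist = {pos: pos * 2 for pos in range(30000)}

# In Python 2, fre_dist.keys() built a list like this one on every call.
key_list = list(fre_dist)

# Two keys that exist (one near the end of the list) and one that does not.
queries = [29999, 15000, -1]

def found_via_list():
    # Membership test on a list: a linear scan over the elements.
    return [q in key_list for q in queries]

def found_via_dict():
    # Membership test on a dict: a hash lookup.
    return [q in fre_dist for q in queries]

list_time = timeit.timeit(found_via_list, number=100)
dict_time = timeit.timeit(found_via_dict, number=100)

print("list scan: %.4f s" % list_time)
print("dict hash: %.4f s" % dict_time)
```

Both functions return the same answers; only the cost of each membership test differs.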
When it comes to traversing a dict, I think most people reach for the for key in dictobj form, and it does work in most cases. It is not completely safe, though; see the following example:
The code is as follows:
# Initialize a dict here
>>> d = {'a': 1, 'b': 0, 'c': 1, 'd': 0}
# The intent is to traverse the dict and delete every element whose value is 0.
>>> for k in d:
...     if d[k] == 0:
...         del d[k]
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
# An exception is raised, and of the two elements with value 0, only one was deleted.
>>> d
{'a': 1, 'c': 1, 'd': 0}
>>> d = {'a': 1, 'b': 0, 'c': 1, 'd': 0}
# In Python 2, d.keys() returns a list of the keys
>>> d.keys()
['a', 'c', 'b', 'd']
# Traversing this way causes no problem, because the loop actually walks the list returned by d.keys(), which does not change.
>>> for k in d.keys():
...     if d[k] == 0:
...         del d[k]
...
>>> d
{'a': 1, 'c': 1}
# The result is correct too
>>>
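This example is actually a simplified version; I ran into the problem in a multi-threaded program. My advice: when traversing a dict that may be modified inside the loop, get into the habit of writing for k in d.keys(). (In Python 3, keys() returns a live view rather than a list, so write for k in list(d) to get the same snapshot behaviour.)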
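As a minimal sketch of that advice on Python 3, the loop below iterates over a snapshot of the keys taken with list(d), so deleting from d inside the loop does not raise RuntimeError. The second half shows an alternative that avoids mutation altogether by building a new dict; the names original and filtered are only illustrative.

```python
d = {'a': 1, 'b': 0, 'c': 1, 'd': 0}

# list(d) copies the keys up front, so the loop never iterates over d itself.
for k in list(d):
    if d[k] == 0:
        del d[k]

print(d)  # {'a': 1, 'c': 1}

# Alternative: leave the original untouched and build a filtered copy.
original = {'a': 1, 'b': 0, 'c': 1, 'd': 0}
filtered = {k: v for k, v in original.items() if v != 0}

print(filtered)  # {'a': 1, 'c': 1}
```

The comprehension is often the simpler choice when nothing else holds a reference to the original dict.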
But is this absolutely safe in a multi-threaded program? Not necessarily: once both threads have called d.keys(), if they both try to delete the same key, the first delete succeeds and the second is certain to raise KeyError. This apparently has to be guarded against by other means.
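One possible "other means" is sketched below, under the assumption that every thread touching the dict agrees to use the same lock. The names shared, lock and remove_zeros are hypothetical. The check and the delete happen together while the lock is held, and pop(k, None) tolerates a key that another thread has already removed.

```python
import threading

shared = {'a': 1, 'b': 0, 'c': 1, 'd': 0}
lock = threading.Lock()

def remove_zeros(d):
    # Take a snapshot of the keys while holding the lock.
    with lock:
        keys = list(d)
    for k in keys:
        with lock:
            # Another thread may already have deleted k, so use get() for
            # the check and pop() with a default for the delete.
            if d.get(k) == 0:
                d.pop(k, None)

threads = [threading.Thread(target=remove_zeros, args=(shared,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared)  # {'a': 1, 'c': 1}
```

Without the lock, d.pop(k, None) alone already avoids the KeyError from a double delete; the lock additionally keeps the check and the delete from being interleaved with another thread's changes.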
Another article: performance comparison of two ways of traversing a dict
On the nagging question of whether writing the loop variables with or without parentheses affects dict traversal performance
The code is as follows:
# With parentheses around the loop variables
for (d, x) in dict.items():
    print("key:" + d + ", value:" + str(x))
# Without parentheses
for d, x in dict.items():
    print("key:" + d + ", value:" + str(x))
The measurements showed that when the dict has fewer than 200 entries, the version with parentheses performs slightly better, but beyond 200 entries the version without parentheses takes less time.
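One way to sanity-check a measurement like this is to compare what Python's parser produces for the two spellings. The sketch below uses the standard ast module on two source strings (the variable name data is only illustrative and never evaluated). If the syntax trees are identical, the interpreter runs the same code for both forms, and any timing difference most likely comes from measurement noise rather than from the parentheses.

```python
import ast

with_parens = "for (d, x) in data.items():\n    pass\n"
without_parens = "for d, x in data.items():\n    pass\n"

# ast.dump() omits line and column attributes by default, so only the
# structure of the two loops is compared.
same_tree = ast.dump(ast.parse(with_parens)) == ast.dump(ast.parse(without_parens))

print(same_tree)  # True
```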
A dictionary is written with curly braces ({}). Its items come in pairs, each key corresponding to one value. A key and its value are separated by a colon (:), and different items are separated by commas (,).
Python Shell:
>>> n = {'username': 'zz', 'password': 123}
>>> n.keys()
dict_keys(['username', 'password'])
>>> n.values()
dict_values(['zz', 123])
>>> n.items()
dict_items([('username', 'zz'), ('password', 123)])
>>> for (k, v) in n.items():
...     print("This's key:%r" % k)
...     print("This's value:%r" % v)
...
This's key:'username'
This's value:'zz'
This's key:'password'
This's value:123
zip(): takes the elements of each sequence in turn and combines them into tuples
>>> n = [1, 2, 3]
>>> m = ['a', 'b', 'c']
>>> a = zip(m, n)
>>> for i in a:
...     print(i)
...
('a', 1)
('b', 2)
('c', 3)
>>> n = [1, 2, 3]
>>> m = ['a', 'b', 'c']
>>> a = zip(m, n)
>>> for (m, n) in a:
...     print(m, n)
...
a 1
b 2
c 3
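A related detail worth keeping in mind on Python 3: zip() returns a single-pass iterator, which is why the two loops above each build a fresh zip object. The short sketch below illustrates this and also shows zip() feeding dict() to build a dictionary from two lists; the names pairs, first_pass, second_pass and lookup are only illustrative.

```python
n = [1, 2, 3]
m = ['a', 'b', 'c']

pairs = zip(m, n)

first_pass = list(pairs)   # consumes the iterator
second_pass = list(pairs)  # nothing left to yield

print(first_pass)   # [('a', 1), ('b', 2), ('c', 3)]
print(second_pass)  # []

# zip() pairs keys with values, and dict() accepts those pairs directly.
lookup = dict(zip(m, n))
print(lookup)  # {'a': 1, 'b': 2, 'c': 3}
```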
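Merging ranges:
# Wrap each range in list() so that + concatenates them on Python 3 as well as Python 2
for i in list(range(48, 58)) + list(range(65, 91)):
    c8 = chr(i)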
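As an alternative sketch for merging ranges, itertools.chain from the standard library walks several ranges one after another without building intermediate lists. The code points are the same as above (48 to 57 are the digits, 65 to 90 the uppercase letters); the name chars is only illustrative.

```python
from itertools import chain

# chain() yields every value of the first range, then every value of the second.
chars = [chr(i) for i in chain(range(48, 58), range(65, 91))]

print(''.join(chars))  # 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
```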