Python's built-in collection types are list, tuple, set, and dict.
List: looks like an array, but is more powerful than an array; it supports indexing, slicing, searching, appending, and so on.
Tuple: functionally similar to a list, but once created its length and elements are immutable (although a mutable element it contains can still be modified internally); it can be seen as a more lightweight, safer list.
Dict: a key-value structure backed by a hash table; by the nature of a hash table, keys are unordered and unique, and insertion, deletion, and update are convenient and fast.
Set: an unordered collection of unique elements, essentially a dict with keys but no values. Java's HashSet is implemented on top of HashMap; hopefully Python does not do the same, since a set needs no values and could save a lot of pointers.
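A quick illustration of the four types (a minimal sketch, not from the original post):

data_list = [1, 2, 3, 2]              # ordered, mutable, duplicates allowed
print data_list[0], data_list[1:3]    # 1 [2, 3] -- indexing and slicing
data_tuple = ([1], 2, 3)              # the tuple itself is immutable...
data_tuple[0].append(4)               # ...but a mutable element inside it can still change
# data_tuple[0] = 0 would raise TypeError: tuples do not support assignment
data_dict = {'a': 1, 'b': 2}          # unique keys mapped to values, O(1) average lookup
data_set = {1, 2, 2, 3}               # duplicates collapse: set([1, 2, 3])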
Generator:
Created, for example, by a generator expression (the lazy counterpart of a list comprehension), this is a special type in Python. It is not actually a data structure: it holds only the algorithm and its paused state, and it supports iteration.
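For example (a minimal sketch, not from the original post):

squares = (x * x for x in xrange(1000000))  # generator expression: no values computed yet
print squares.next()                        # 0 -- values are produced on demand, one at a time
print squares.next()                        # 1
print sum(x * x for x in xrange(10))        # 285 -- usable anywhere an iterable is expected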
First, compare their memory usage by building a set, dict, generator, tuple, and list of 100,000 elements each. The memory consumed decreases in the order dict, set, list, tuple, and sys.getsizeof reports the same ordering for the resulting object sizes. The generator consumes almost nothing, because it never builds the actual data table:
import sys
from memory_profiler import profile

@profile
def create_data(data_size):
    data_generator = (x for x in xrange(data_size))
    data_set = {x for x in xrange(data_size)}
    data_dict = {x: None for x in xrange(data_size)}
    data_tuple = tuple(x for x in xrange(data_size))
    data_list = [x for x in xrange(data_size)]
    return data_set, data_dict, data_generator, data_tuple, data_list

data_size = 100000
for data in create_data(data_size):
    print data.__class__, sys.getsizeof(data)

Line #    Mem usage    Increment   Line Contents
================================================
           14.6 MiB      0.0 MiB   @profile
                                   def create_data(data_size):
           14.7 MiB      0.0 MiB       data_generator = (x for x in xrange(data_size))
           21.4 MiB      6.7 MiB       data_set = {x for x in xrange(data_size)}
           29.8 MiB      8.5 MiB       data_dict = {x: None for x in xrange(data_size)}
           33.4 MiB      3.6 MiB       data_tuple = tuple(x for x in xrange(data_size))
           38.2 MiB      4.8 MiB       data_list = [x for x in xrange(data_size)]
           38.2 MiB      0.0 MiB       return data_set, data_dict, data_generator, data_tuple, data_list

<type 'set'> 4194528
<type 'dict'> 6291728
<type 'generator'> 72
<type 'tuple'> 800048
<type 'list'> 824464
Next, look at lookup performance: dict and set offer constant lookup time (O(1)), while list and tuple take linear time (O(n)). Build objects of the given size and search them for randomly generated numbers:
import time
import random

def create_data(data_size):
    data_set = {x for x in xrange(data_size)}
    data_dict = {x: None for x in xrange(data_size)}
    data_tuple = tuple(x for x in xrange(data_size))
    data_list = [x for x in xrange(data_size)]
    return data_set, data_dict, data_tuple, data_list

def cost_time(func):
    def cost(*args, **kwargs):
        start = time.time()
        r = func(*args, **kwargs)
        cost = time.time() - start
        print 'find in %s cost time %s' % (r, cost)
        return r, cost  # return the data type and the time the call consumed
    return cost

@cost_time
def test_find(test_data, data):
    for d in test_data:
        if d in data:
            pass
    return data.__class__.__name__

data_size = 100
test_size = 10000000
test_data = [random.randint(0, data_size) for x in xrange(test_size)]
for data in create_data(data_size):
    test_find(test_data, data)

Output:
----------------------------------------------
find in <type 'set'> cost time 0.47200012207
find in <type 'dict'> cost time 0.429999828339
find in <type 'tuple'> cost time 5.36500000954
find in <type 'list'> cost time 5.53399991989
With collections of 100 elements searched 10,000,000 times each, the gap is already obvious. However, every one of those random numbers can be found in the collection. Modify the random-number generator so that half of the values can be found and half cannot. The output shows that in this half-miss case, list and tuple perform even worse:
def randint(index, data_size):
    # even indexes produce values inside the collection's range, odd indexes produce values outside it
    return random.randint(0, data_size) if (index % 2) == 0 else random.randint(data_size, data_size * 2)

test_data = [randint(x, data_size) for x in xrange(test_size)]

Output:
----------------------------------------------
find in <type 'set'> cost time 0.450000047684
find in <type 'dict'> cost time 0.397000074387
find in <type 'tuple'> cost time 7.83299994469
find in <type 'list'> cost time 8.27800011635
Now grow the number of elements from 10 to 500, timing 100,000 lookups at each size and fitting a curve to the time consumed. The result confirms that dict and set keep constant lookup time regardless of the element count, while list and tuple show lookup time growing linearly as the elements increase:
import matplotlib.pyplot as plot
from numpy import *

data_size = array([x for x in xrange(10, 500, 10)])  # element counts 10..500 (step garbled in the source; 10 assumed)
test_size = 100000
cost_result = {}
for size in data_size:
    test_data = [randint(x, size) for x in xrange(test_size)]
    for data in create_data(size):
        name, cost = test_find(test_data, data)  # the decorator returns the call's execution time
        cost_result.setdefault(name, []).append(cost)

plot.figure(figsize=(10, 6))
xline = data_size
for data_type, result in cost_result.items():
    yline = array(result)
    plot.plot(xline, yline, label=data_type)
plot.ylabel('Time Spend')
plot.xlabel('Find Times')
plot.grid()
plot.legend()
plot.show()
For iteration, the differences are negligible, with dict and set taking slightly more time:
@cost_time
def test_iter(data):
    for d in data:
        pass
    return data.__class__.__name__

data_size = array([x for x in xrange(1, 500000, 10000)])  # step garbled in the source; 10000 assumed
cost_result = {}
for size in data_size:
    for data in create_data(size):
        name, cost = test_iter(data)
        cost_result.setdefault(name, []).append(cost)

# fit and plot the curves
plot.figure(figsize=(10, 6))
xline = data_size
for data_type, result in cost_result.items():
    yline = array(result)
    plot.plot(xline, yline, label=data_type)
plot.ylabel('Time Spend')
plot.xlabel('Iter Times')
plot.grid()
plot.legend()
plot.show()
The time consumed deleting elements is shown in the figure below, deleting 1000 elements at random. The tuple type cannot delete elements, so it is not compared:
When half of the elements are deleted at random, the list's curve grows quadratically (O(n²)):
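The deletion test code did not survive in the original post; below is a minimal sketch along the same lines, where the helper names (test_list_del and so on) and the sizes are assumptions:

@cost_time
def test_list_del(test_data, data):
    for d in test_data:
        data.remove(d)    # O(n): linear search plus shifting the tail
    return data.__class__.__name__

@cost_time
def test_dict_del(test_data, data):
    for d in test_data:
        del data[d]       # O(1) on average: hash lookup
    return data.__class__.__name__

@cost_time
def test_set_del(test_data, data):
    for d in test_data:
        data.discard(d)   # O(1) on average
    return data.__class__.__name__

data_size = 100000
delete_size = 1000        # assumed; switch to data_size / 2 for the half-deletion case
test_data = random.sample(xrange(data_size), delete_size)  # distinct values to delete
data_set, data_dict, data_tuple, data_list = create_data(data_size)
test_set_del(test_data, data_set)
test_dict_del(test_data, data_dict)
test_list_del(test_data, data_list)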
The time consumed adding elements is illustrated below, measuring insertion time as the element count grows in steps of 10000. All three show linear growth with no visible difference between them. The tuple type cannot add new elements, so it is not compared:
@cost_time
def test_dict_add(test_data, data):
    for d in test_data:
        data[d] = None
    return data.__class__.__name__

@cost_time
def test_set_add(test_data, data):
    for d in test_data:
        data.add(d)
    return data.__class__.__name__

@cost_time
def test_list_add(test_data, data):
    for d in test_data:
        data.append(d)
    return data.__class__.__name__

# initialize the data, pairing each type with the method that adds its elements
def init_data():
    test_data = {
        'list': (list(), test_list_add),
        'set': (set(), test_set_add),
        'dict': (dict(), test_dict_add),
    }
    return test_data

# measure add time as the element count grows in steps of 10000
data_size = array([x for x in xrange(10000, 1000000, 10000)])
cost_result = {}
for size in data_size:
    test_data = [x for x in xrange(size)]
    for data_type, (data, add) in init_data().items():
        name, cost = add(test_data, data)  # the decorator returns the call's execution time
        cost_result.setdefault(data_type, []).append(cost)

plot.figure(figsize=(10, 6))
xline = data_size
for data_type, result in cost_result.items():
    yline = array(result)
    plot.plot(xline, yline, label=data_type)
plot.ylabel('Time Spend')
plot.xlabel('Add Times')
plot.grid()
plot.legend()
plot.show()