This article describes the Python collection types: the built-in list, tuple, dict, and set, plus the generator.
List: looks like an array, but is more powerful than one. It supports indexing, slicing, searching, and appending.
Tuple: similar in function to a list, but once created, its length and elements are immutable (although mutable elements contained inside it can still change). It can be seen as a more lightweight and safer list.
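The parenthetical about nested elements can be demonstrated directly (a Python 3 sketch, not part of the original article):

```python
# A tuple's top-level structure is fixed, but a mutable element
# inside it can still change.
point = (1, 2, [3, 4])

try:
    point[0] = 10          # rebinding a slot is forbidden
except TypeError as e:
    print("cannot assign:", e)

point[2].append(5)         # the contained list is still mutable
print(point)               # (1, 2, [3, 4, 5])
```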
Dictionary (dict): a hash table of key-value pairs, with the same properties as a hash table: keys are unordered and unique, and addition, deletion, modification, and lookup are all fast.
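A minimal Python 3 sketch of those dict operations (the names here are invented for illustration):

```python
ages = {"alice": 30, "bob": 25}

ages["carol"] = 27        # add
ages["bob"] = 26          # modify
del ages["alice"]         # delete
print("bob" in ages)      # membership check by key, average O(1): True
print(ages.get("dave"))   # missing key -> None instead of KeyError
```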
Set: an unordered collection of unique elements; conceptually a dict with only keys and no values. Java's HashSet is implemented on top of HashMap; hopefully Python's set is not, since a set needs no values and skipping them saves a lot of pointers.
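A short Python 3 sketch of set behavior as "keys without values": duplicates collapse automatically, membership is average O(1) like dict keys, and set algebra comes for free.

```python
# Sets keep unique elements only, so they are a natural dedup tool.
seen = set([1, 2, 2, 3, 3, 3])
print(seen == {1, 2, 3})       # True: duplicates collapsed

# Set algebra.
a = {1, 2, 3}
b = {2, 3, 4}
print(a & b)                   # intersection
print(a | b == {1, 2, 3, 4})   # True: union
print(2 in a)                  # True: average O(1) membership
```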
Generator: produced by a generator expression (often confused with a list comprehension), it is a special type in Python. Strictly speaking it is not a data structure: it holds only an algorithm and its temporary state, and it supports iteration.
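A Python 3 sketch (not from the original article) showing the lazy behavior described above: values are computed on demand, and the generator resumes from its saved state between calls.

```python
# A generator expression stores a recipe plus its current state,
# not the produced values.
squares = (x * x for x in range(1000000))

print(next(squares))   # 0, computed on demand
print(next(squares))   # 1, resumed from saved state
print(next(squares))   # 4

# Like any iterator, it can be consumed only once; further iteration
# continues from where it stopped (here: x == 3).
```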
First, let's look at memory usage. Generate a set, dict, generator, tuple, and list of 100000 elements each (the article's code is Python 2, hence xrange and the print statement). The memory consumed decreases in the order dict, set, list, tuple, while the generator object's size stays the same regardless of element count: because a generator does not materialize the data, it consumes almost no memory:
```python
import sys
from memory_profiler import profile

@profile
def create_data(data_size):
    data_generator = (x for x in xrange(data_size))
    data_set = {x for x in xrange(data_size)}
    data_dict = {x: None for x in xrange(data_size)}
    data_tuple = tuple(x for x in xrange(data_size))
    data_list = [x for x in xrange(data_size)]
    return data_set, data_dict, data_generator, data_tuple, data_list

data_size = 100000
for data in create_data(data_size):
    print data.__class__, sys.getsizeof(data)
```

memory_profiler output:

```
Line #    Mem usage    Increment   Line Contents
================================================
          14.6 MiB      0.0 MiB   @profile
                                  def create_data(data_size):
          14.7 MiB      0.0 MiB       data_generator = (x for x in xrange(data_size))
          21.4 MiB      6.7 MiB       data_set = {x for x in xrange(data_size)}
          29.8 MiB      8.5 MiB       data_dict = {x: None for x in xrange(data_size)}
          33.4 MiB      3.6 MiB       data_tuple = tuple(x for x in xrange(data_size))
          38.2 MiB      4.8 MiB       data_list = [x for x in xrange(data_size)]
          38.2 MiB      0.0 MiB       return data_set, data_dict, data_generator, data_tuple, data_list
```
sys.getsizeof output (the type labels were stripped during extraction; they follow the order in which create_data returns the objects):

```
<type 'set'> 4194528
<type 'dict'> 6291728
<type 'generator'> 72
<type 'tuple'> 800048
<type 'list'> 824464
```
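The numbers above were measured on Python 2; exact sizes vary with interpreter version and platform. A quick Python 3 sanity check with sys.getsizeof shows the same ordering, and also that getsizeof reports only the container object itself, not the elements it refers to:

```python
import sys

n = 100000
data_generator = (x for x in range(n))
data_set = {x for x in range(n)}
data_dict = {x: None for x in range(n)}
data_tuple = tuple(range(n))
data_list = list(range(n))

# The generator's size is independent of n: it stores no elements.
for obj in (data_set, data_dict, data_generator, data_tuple, data_list):
    print(obj.__class__.__name__, sys.getsizeof(obj))
```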
Now look at query performance. dict and set have constant lookup time, O(1), while list and tuple have linear lookup time, O(n). Generate collections of a given size and probe them with random numbers:
```python
import time
import sys
import random
from memory_profiler import profile

def create_data(data_size):
    data_set = {x for x in xrange(data_size)}
    data_dict = {x: None for x in xrange(data_size)}
    data_tuple = tuple(x for x in xrange(data_size))
    data_list = [x for x in xrange(data_size)]
    return data_set, data_dict, data_tuple, data_list

def cost_time(func):
    def cost(*args, **kwargs):
        start = time.time()
        r = func(*args, **kwargs)
        cost = time.time() - start
        print 'find in %s cost time %s' % (r, cost)
        return r, cost  # return the data type name and the time the call took
    return cost

@cost_time
def test_find(test_data, data):
    for d in test_data:
        if d in data:
            pass
    return data.__class__.__name__

data_size = 100
test_size = 10000000
test_data = [random.randint(0, data_size) for x in xrange(test_size)]
for data in create_data(data_size):
    test_find(test_data, data)
```

Output (the type names were stripped during extraction; they follow the order in which create_data returns the objects):

```
find in set cost time 0.47200012207
find in dict cost time 0.429999828339
find in tuple cost time 5.36500000954
find in list cost time 5.53399991989
```
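The timings above come from Python 2; the same ordering is easy to reproduce on Python 3 with time.perf_counter (a sketch; absolute numbers depend on the machine). Probing only absent values makes the list scan its full length every time, which is its worst case:

```python
import time

data_size = 100000
data_list = list(range(data_size))
data_set = set(data_list)
# Probe only absent values: the worst case for a linear scan.
probes = [data_size + 1] * 200

def lookup_cost(container):
    start = time.perf_counter()
    for p in probes:
        _ = p in container
    return time.perf_counter() - start

list_time = lookup_cost(data_list)
set_time = lookup_cost(data_set)
print("list: %.4fs  set: %.6fs" % (list_time, set_time))
```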
With a collection of only 100 elements and 10 million lookups, set and dict finish in under half a second while tuple and list take over five seconds; the gap is obvious. However, every random number above can actually be found in the collection. Change the random-number generator so that half of the probes hit and half miss. The timings show that in this half-miss scenario, closer to the worst case, list and tuple do even worse:
```python
def randint(index, data_size):
    return random.randint(0, data_size) if (index % 2) == 0 \
        else random.randint(data_size, data_size * 2)

test_data = [randint(x, data_size) for x in xrange(test_size)]
```

Output:

```
find in set cost time 0.450000047684
find in dict cost time 0.397000074387
find in tuple cost time 7.83299994469
find in list cost time 8.27800011635
```
Next, grow the number of elements from 10 to 500, record the time consumed by each query batch, and fit the timings into a curve. The result: dict and set show constant lookup time no matter how many elements there are, while list and tuple show lookup time growing linearly as the elements increase:
```python
import matplotlib.pyplot as plot
from numpy import *

data_size = array([x for x in xrange(10, 500, 10)])
test_size = 100000
cost_result = {}
for size in data_size:
    test_data = [randint(x, size) for x in xrange(test_size)]
    for data in create_data(size):
        name, cost = test_find(test_data, data)  # the decorator returns (type name, cost)
        cost_result.setdefault(name, []).append(cost)

plot.figure(figsize=(10, 6))
xline = data_size
for data_type, result in cost_result.items():
    yline = array(result)
    plot.plot(xline, yline, label=data_type)
plot.ylabel('Time spend')
plot.xlabel('find times')
plot.grid()
plot.legend()
plot.show()
```
Iteration time differs only slightly between the types; dict and set take a little longer:
```python
@cost_time
def test_iter(data):
    for d in data:
        pass
    return data.__class__.__name__

data_size = array([x for x in xrange(1, 500000, 1000)])
cost_result = {}
for size in data_size:
    for data in create_data(size):
        name, cost = test_iter(data)
        cost_result.setdefault(name, []).append(cost)

# fit the timings into a curve
plot.figure(figsize=(10, 6))
xline = data_size
for data_type, result in cost_result.items():
    yline = array(result)
    plot.plot(xline, yline, label=data_type)
plot.ylabel('Time spend')
plot.xlabel('iter times')
plot.grid()
plot.legend()
plot.show()
```
The time consumed by adding elements is shown in the next figure. Measuring the time to add elements in increments of 10000, all three types grow linearly and no meaningful difference shows up. Tuples cannot have elements added, so they are excluded from this comparison:
```python
@cost_time
def test_dict_add(test_data, data):
    for d in test_data:
        data[d] = None
    return data.__class__.__name__

@cost_time
def test_set_add(test_data, data):
    for d in test_data:
        data.add(d)
    return data.__class__.__name__

@cost_time
def test_list_add(test_data, data):
    for d in test_data:
        data.append(d)
    return data.__class__.__name__

# initialize an empty container and the add method for each type
def init_data():
    test_data = {
        'list': (list(), test_list_add),
        'set': (set(), test_set_add),
        'dict': (dict(), test_dict_add),
    }
    return test_data

# measure the add time for every 10000-element increment
data_size = array([x for x in xrange(10000, 1000000, 10000)])
cost_result = {}
for size in data_size:
    test_data = [x for x in xrange(size)]
    for data_type, (data, add) in init_data().items():
        name, cost = add(test_data, data)  # the decorator returns the execution time
        cost_result.setdefault(data_type, []).append(cost)

plot.figure(figsize=(10, 6))
xline = data_size
for data_type, result in cost_result.items():
    yline = array(result)
    plot.plot(xline, yline, label=data_type)
plot.ylabel('Time spend')
plot.xlabel('add times')
plot.grid()
plot.legend()
plot.show()
```
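Why tuples are excluded: "adding" to a tuple always builds a brand-new object, so growing a tuple element by element copies everything each time and is O(n²) overall, while a list extends in place. A Python 3 sketch (not from the original article):

```python
t = (1, 2)
tuple_id = id(t)
t += (3,)                      # builds a brand-new tuple object
print(t, id(t) != tuple_id)    # (1, 2, 3) True: a different object

lst = [1, 2]
list_id = id(lst)
lst += [3]                     # extends the existing list in place
print(lst, id(lst) == list_id) # [1, 2, 3] True: same object
```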
The above is a detailed look at the Python collection types (list, tuple, dict, set, generator).