Detailed graphical description of the Python collection types (list, tuple, dict, set, generator)

Source: Internet
Author: User
This article describes the Python collection types (list, tuple, dict, set, generator). The collection types built into Python include list, tuple, set, and dict.

List: looks like an array but is more powerful. It supports indexing, slicing, searching, and adding elements.
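As a small illustration of the operations just listed, here is a minimal sketch. It assumes Python 3 (the article's own listings use Python 2), and the values are made up for the example.

```python
# Illustrative values only; not taken from the article.
numbers = [10, 20, 30, 40]

first = numbers[0]            # indexing
middle = numbers[1:3]         # slicing returns a new list
has_30 = 30 in numbers        # membership search (linear scan)
position = numbers.index(30)  # index of the first match

numbers.append(50)            # add one element at the end
combined = numbers + [60]     # concatenation builds a new list
```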

Tuple: similar to a list, but once created, its length and its elements cannot be changed (mutable objects stored inside it can still be modified). It can be seen as a more lightweight and safer list.
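The distinction between the tuple itself and the objects it holds can be sketched as follows. This assumes Python 3, and the variable names are made up for the example.

```python
# Illustrative values only; not taken from the article.
point = (1, 2, [3, 4])

try:
    point[0] = 99        # replacing an element of a tuple is not allowed
except TypeError as exc:
    error = str(exc)     # keep the message to show that the assignment failed

point[2].append(5)       # the list stored inside the tuple is still mutable
```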

Dictionary (dict): a hash table of key-value pairs. Keys are unique and, in the Python 2 used throughout this article, unordered. Adding, deleting, modifying, and looking up entries are all convenient.
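A minimal sketch of the add, delete, modify, and lookup operations on a dict, assuming Python 3 and made-up keys:

```python
# Illustrative values only; not taken from the article.
ages = {"alice": 30}

ages["bob"] = 25            # add a new key
ages["alice"] = 31          # modify: keys are unique, so the old value is replaced
del ages["bob"]             # delete a key
found = ages.get("carol")   # lookup of a missing key returns None with get()
has_alice = "alice" in ages # membership test on keys
```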

Set: an unordered collection with no duplicate elements, in effect a dict with only keys and no values. Java's HashSet is implemented on top of HashMap; I hope Python does not do the same, since a set needs no values and can save a lot of pointers.
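The "no duplicates" property and the usual set operations can be sketched like this, assuming Python 3 and made-up values:

```python
# Illustrative values only; not taken from the article.
seen = set([3, 1, 3, 2, 1])   # duplicates collapse to a single entry each
seen.add(4)

evens = {2, 4, 6}
common = seen & evens         # intersection
either = seen | evens         # union
only_seen = seen - evens      # difference
```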

Generator:

A generator can be written as a generator expression, which looks like a list comprehension but uses parentheses. It is a special data type in Python and is not really a data structure: it holds only the algorithm and its temporary state, and it can be iterated.
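The difference between a list comprehension and a generator expression can be sketched as follows. This assumes Python 3, and the names are made up for the example.

```python
squares_list = [x * x for x in range(5)]   # list comprehension: all values built at once
squares_gen = (x * x for x in range(5))    # generator expression: values computed on demand

first = next(squares_gen)    # computes only the first value
second = next(squares_gen)   # resumes from the saved state
rest = list(squares_gen)     # consumes the remaining values
again = list(squares_gen)    # the generator is now exhausted
```

A generator can be iterated only once, which is the price of not keeping the data around.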

First, let's look at memory usage. Generator expressions and comprehensions are used to build a set, dict, generator, tuple, and list of 100000 elements. Memory consumption decreases in the order dict, set, list, tuple, and the object sizes reported by sys.getsizeof follow the same order. Because the generator does not build the data, it consumes almost no memory:

    import sys
    from memory_profiler import profile

    @profile
    def create_data(data_size):
        data_generator = (x for x in xrange(data_size))
        data_set = {x for x in xrange(data_size)}
        data_dict = {x: None for x in xrange(data_size)}
        data_tuple = tuple(x for x in xrange(data_size))
        data_list = [x for x in xrange(data_size)]
        return data_set, data_dict, data_generator, data_tuple, data_list

    data_size = 100000
    for data in create_data(data_size):
        print data.__class__, sys.getsizeof(data)

Profiler output:

    Mem usage    Increment   Line Contents
    ================================================
     14.6 MiB      0.0 MiB   @profile
                             def create_data(data_size):
     14.7 MiB      0.0 MiB       data_generator = (x for x in xrange(data_size))
     21.4 MiB      6.7 MiB       data_set = {x for x in xrange(data_size)}
     29.8 MiB      8.5 MiB       data_dict = {x:None for x in xrange(data_size)}
     33.4 MiB      3.6 MiB       data_tuple = tuple(x for x in xrange(data_size))
     38.2 MiB      4.8 MiB       data_list = [x for x in xrange(data_size)]
     38.2 MiB      0.0 MiB       return data_set, data_dict, data_generator, data_tuple, data_list

Output of the print statement:

    <type 'set'> 4194528
    <type 'dict'> 6291728
    <type 'generator'> 72
    <type 'tuple'> 800048
    <type 'list'> 824464
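For readers without memory_profiler installed, the size comparison can be approximated with sys.getsizeof alone. The sketch below assumes Python 3 (range instead of xrange) and is not the article's original script. Note that sys.getsizeof reports only the container's own memory, not the integers it refers to, and the exact numbers vary by Python version and platform.

```python
import sys

data_size = 100000

# Build one container of each kind with the same elements.
containers = {
    "generator": (x for x in range(data_size)),
    "set": {x for x in range(data_size)},
    "dict": {x: None for x in range(data_size)},
    "tuple": tuple(range(data_size)),
    "list": list(range(data_size)),
}

# Shallow size of each container, in bytes.
sizes = {name: sys.getsizeof(obj) for name, obj in containers.items()}

# Print from smallest to largest.
for name, size in sorted(sizes.items(), key=lambda item: item[1]):
    print(name, size)
```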
     
    
   
  
 

Next, let's look at query performance. dict and set have constant lookup time (O(1)), while list and tuple have linear lookup time (O(n)). Build objects with a given number of elements and look up random numbers in them:

    import time
    import sys
    import random
    from memory_profiler import profile

    def create_data(data_size):
        data_set = {x for x in xrange(data_size)}
        data_dict = {x: None for x in xrange(data_size)}
        data_tuple = tuple(x for x in xrange(data_size))
        data_list = [x for x in xrange(data_size)]
        return data_set, data_dict, data_tuple, data_list

    def cost_time(func):
        def cost(*args, **kwargs):
            start = time.time()
            r = func(*args, **kwargs)
            elapsed = time.time() - start
            print 'find in %s cost time %s' % (r, elapsed)
            return r, elapsed  # return data type and time consumed by method execution
        return cost

    @cost_time
    def test_find(test_data, data):
        for d in test_data:
            if d in data:
                pass
        return data.__class__.__name__

    data_size = 100
    test_size = 0000000  # as given in the source; the leading digit(s) appear to be missing
    test_data = [random.randint(0, data_size) for x in xrange(test_size)]
    # print test_data
    for data in create_data(data_size):
        test_find(test_data, data)

Output:

    find in set cost time 0.47200012207
    find in dict cost time 0.429999828339
    find in tuple cost time 5.36500000954
    find in list cost time 5.53399991989
    
   
  
 

Even with a collection of only 100 elements, the gap is very obvious. However, almost all of these random numbers can be found in the collection. Modify the random number function so that half of the generated values can be found and half cannot. The printed results show that when half of the lookups hit the worst case (a miss), list and tuple perform even worse.

    def randint(index, data_size):
        # even index: value in [0, data_size]
        # odd index: value in [data_size, 2 * data_size], which is never in the collection
        return random.randint(0, data_size) if (index % 2) == 0 else random.randint(data_size, data_size * 2)

    test_data = [randint(x, data_size) for x in xrange(test_size)]

Output:

    find in set cost time 0.450000047684
    find in dict cost time 0.397000074387
    find in tuple cost time 7.83299994469
    find in list cost time 8.27800011635
    
   
  
 

Increase the number of elements from 10 to 500 and record the time taken by 100000 lookups at each size. Plotting the time curves gives the following result: dict and set take constant lookup time no matter how many elements they hold, while list and tuple take time that grows linearly with the number of elements:

    import matplotlib.pyplot as plot
    # import only what is used; a star import would also pull in numpy's own `random`
    from numpy import array

    data_size = array([x for x in xrange(10, 500, 10)])
    test_size = 100000
    cost_result = {}
    for size in data_size:
        test_data = [randint(x, size) for x in xrange(test_size)]
        for data in create_data(size):
            name, cost = test_find(test_data, data)  # returned by the decorator function
            cost_result.setdefault(name, []).append(cost)

    plot.figure(figsize=(10, 6))
    xline = data_size
    for data_type, result in cost_result.items():
        yline = array(result)
        plot.plot(xline, yline, label=data_type)

    plot.ylabel('Time spend')
    plot.xlabel('find times')
    plot.grid()
    plot.legend()
    plot.show()
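The same trend can be checked without plotting by using the standard timeit module. This is a minimal sketch, assuming Python 3; the helper lookup_time and the chosen sizes are made up for illustration and are not part of the article's script.

```python
import timeit

def lookup_time(container, probe, repeats=2000):
    """Total seconds for `repeats` membership tests of `probe` in `container`."""
    return timeit.timeit(lambda: probe in container, number=repeats)

size = 5000
as_list = list(range(size))
as_set = set(as_list)
missing = -1   # not in either container, so the list must be scanned completely

list_cost = lookup_time(as_list, missing)
set_cost = lookup_time(as_set, missing)
print("list: %.6f s, set: %.6f s" % (list_cost, set_cost))
```

Looking up a value that is absent is the worst case for the list, which is why the gap is largest there.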

Iteration time differs only slightly; dict and set take a little longer:

    @cost_time
    def test_iter(data):
        for d in data:
            pass
        return data.__class__.__name__

    data_size = array([x for x in xrange(1, 500000, 1000)])
    cost_result = {}
    for size in data_size:
        for data in create_data(size):
            name, cost = test_iter(data)
            cost_result.setdefault(name, []).append(cost)

    # plot the curves
    plot.figure(figsize=(10, 6))
    xline = data_size
    for data_type, result in cost_result.items():
        yline = array(result)
        plot.plot(xline, yline, label=data_type)

    plot.ylabel('Time spend')
    plot.xlabel('iter times')
    plot.grid()
    plot.legend()
    plot.show()

The time taken to add elements is shown in the figure below. The number of elements added grows in increments of 10000, and the measured time grows linearly for every type, with no visible difference between them. The tuple type cannot add new elements, so it is not compared:

    @cost_time
    def test_dict_add(test_data, data):
        for d in test_data:
            data[d] = None
        return data.__class__.__name__

    @cost_time
    def test_set_add(test_data, data):
        for d in test_data:
            data.add(d)
        return data.__class__.__name__

    @cost_time
    def test_list_add(test_data, data):
        for d in test_data:
            data.append(d)
        return data.__class__.__name__

    # initialize the data and specify the add method for each type
    def init_data():
        test_data = {
            'list': (list(), test_list_add),
            'set': (set(), test_set_add),
            'dict': (dict(), test_dict_add),
        }
        return test_data

    # measure the add time in increments of 10000 elements
    data_size = array([x for x in xrange(10000, 1000000, 10000)])
    cost_result = {}
    for size in data_size:
        test_data = [x for x in xrange(size)]
        for data_type, (data, add) in init_data().items():
            name, cost = add(test_data, data)  # return the method execution time
            cost_result.setdefault(data_type, []).append(cost)

    plot.figure(figsize=(10, 6))
    xline = data_size
    for data_type, result in cost_result.items():
        yline = array(result)
        plot.plot(xline, yline, label=data_type)

    plot.ylabel('Time spend')
    plot.xlabel('add times')
    plot.grid()
    plot.legend()
    plot.show()
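A smaller version of the add test can be run without matplotlib. The sketch below assumes Python 3; the helper time_adds, the adders table, and the element counts are made up for illustration. It prints one timing per container type and count, which should grow roughly in proportion to the count.

```python
import time

def time_adds(make_container, add, count):
    """Add `count` integers to a fresh container; return (seconds, final length)."""
    container = make_container()
    start = time.perf_counter()
    for value in range(count):
        add(container, value)
    return time.perf_counter() - start, len(container)

# One (factory, add function) pair per container type; tuple is omitted
# because it cannot grow.
adders = {
    "list": (list, lambda c, v: c.append(v)),
    "set": (set, lambda c, v: c.add(v)),
    "dict": (dict, lambda c, v: c.__setitem__(v, None)),
}

results = {}
for name, (factory, add) in adders.items():
    for count in (10000, 20000, 40000):
        elapsed, length = time_adds(factory, add, count)
        results[(name, count)] = (elapsed, length)
        print(name, count, "%.6f" % elapsed)
```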

The above is a detailed description of the Python collection types (list, tuple, dict, set, generator).
