Python standard library collections package usage tutorial

Source: Internet
Author: User


Preface

Python provides four basic built-in data structures: list, tuple, dict, and set. When processing large amounts of data, however, these four structures are often too limited. For example, because a list is implemented as an array, inserting elements at the front is inefficient, and sometimes we need a dict that keeps its keys in a particular order. In such cases we can turn to the collections package in the Python standard library, which provides several useful container classes. Becoming familiar with them not only makes our code more Pythonic but can also make our programs run faster.

defaultdict

defaultdict(default_factory) adds a default_factory to an ordinary dict: when a key that does not exist is accessed, default_factory() is called to create a default value for it automatically. default_factory can be list, set, int, or any other callable that takes no arguments.
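As a minimal sketch of default_factory in action, the snippet below counts letters with defaultdict(int): int() returns 0, so each missing key starts at 0 before it is incremented. The input string is only an illustration. It also shows that only d[key] lookups create defaults; `in` and .get() do not.

```python
from collections import defaultdict

# int() returns 0, so a missing key starts at 0 before += 1 runs
counts = defaultdict(int)
for ch in 'mississippi':
    counts[ch] += 1

print(dict(counts))  # {'m': 1, 'i': 4, 's': 4, 'p': 2}

# Membership tests and .get() do NOT call default_factory,
# so they do not insert new keys
print('z' in counts)      # False
print(counts.get('z'))    # None
print(len(counts))        # 4

# Indexing a missing key DOES call default_factory and inserts the key
print(counts['z'])        # 0
print(len(counts))        # 5
```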

Suppose we have the following list. Although it contains five pairs, there are only three colors, and each color corresponds to several values. We want to convert the list into a dict whose keys are the colors and whose values are lists holding all the values for each color. defaultdict(list) solves this neatly:

>>> from collections import defaultdict
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
...     d[k].append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

This is equivalent to the following code, which uses a plain dict and setdefault():

>>> d = {}
>>> for k, v in s:
...     d.setdefault(k, []).append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

If you do not want duplicate values, consider using defaultdict(set) instead. Unlike a list, a set cannot contain the same element twice.

>>> from collections import defaultdict
>>> s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
>>> d = defaultdict(set)
>>> for k, v in s:
...     d[k].add(v)
...
>>> sorted(d.items())
[('blue', {2, 4}), ('red', {1, 3})]

OrderedDict

Before Python 3.7, dict did not guarantee any ordering of its keys (CPython 3.6 preserves insertion order only as an implementation detail), yet sometimes we need to keep a dict in order. In that case we can use OrderedDict, a dict subclass that remembers the order in which keys were inserted. Let's look at how it is used.

>>> from collections import OrderedDict
>>> # regular unsorted dictionary
>>> d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}
>>> # dictionary sorted by key
>>> OrderedDict(sorted(d.items(), key=lambda t: t[0]))
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
>>> # dictionary sorted by value
>>> OrderedDict(sorted(d.items(), key=lambda t: t[1]))
OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])
>>> # dictionary sorted by length of the key string
>>> OrderedDict(sorted(d.items(), key=lambda t: len(t[0])))
OrderedDict([('pear', 1), ('apple', 4), ('banana', 3), ('orange', 2)])

popitem(last=True) removes and returns key-value pairs in LIFO (last-in, first-out) order, that is, the most recently inserted pair is removed first. With last=False, pairs are removed in FIFO (first-in, first-out) order instead.
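A minimal sketch of the two popitem modes, assuming an OrderedDict built by plain insertion (the key names are illustrative only):

```python
from collections import OrderedDict

# Keys remember insertion order: first, second, third
od = OrderedDict()
od['first'] = 1
od['second'] = 2
od['third'] = 3

# last=True (default): LIFO, removes the most recently inserted pair
print(od.popitem())            # ('third', 3)

# last=False: FIFO, removes the oldest remaining pair
print(od.popitem(last=False))  # ('first', 1)

print(list(od.items()))        # [('second', 2)]
```

Calling popitem(last=False) repeatedly therefore drains the dict like a queue, while the default behaves like a stack.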

>>> d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}
>>> # dictionary sorted by key
>>> d = OrderedDict(sorted(d.items(), key=lambda t: t[0]))
>>> d
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
>>> d.popitem()
('pear', 1)
>>> d.popitem(last=False)
('apple', 4)

move_to_end(key, last=True) changes the position of an existing key in an OrderedDict: with last=True (the default) the key-value pair is moved to the end, and with last=False it is moved to the beginning. A KeyError is raised if the key does not exist.

>>> d = OrderedDict.fromkeys('abcde')
>>> d
OrderedDict([('a', None), ('b', None), ('c', None), ('d', None), ('e', None)])
>>> d.move_to_end('b')
>>> d
OrderedDict([('a', None), ('c', None), ('d', None), ('e', None), ('b', None)])
>>> ''.join(d.keys())
'acdeb'
>>> d.move_to_end('b', last=False)
>>> ''.join(d.keys())
'bacde'
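A common application of move_to_end combined with popitem(last=False) is a least-recently-used (LRU) cache. The class below is a hypothetical sketch, not part of the collections package: every access moves a key to the end, so the front of the OrderedDict always holds the least recently used entry, which is evicted when the cache is full.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fixed-size cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        # Mark as most recently used by moving it to the end
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            # The front of the OrderedDict holds the least recently used key
            self._data.popitem(last=False)


cache = LRUCache(2)
cache.put('a', 1)
cache.put('b', 2)
cache.get('a')            # 'a' becomes the most recently used key
cache.put('c', 3)         # evicts 'b', the least recently used key
print(list(cache._data))  # ['a', 'c']
```

For simply memoizing function calls, the standard library already provides functools.lru_cache; a hand-written class like this is mainly useful when you need direct control over the stored entries.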

deque

The advantage of storing data in a list is fast access to elements by index. However, because list is implemented as an array, inserting or deleting elements anywhere other than at the end is slow: every element after that position has to be shifted. deque is a double-ended queue designed for fast insertion and deletion at both ends. It is well suited to implementing queues and stacks, and its append and pop operations are thread-safe.

list only provides efficient insertion and removal at the end, via append and pop. deque adds methods such as appendleft and popleft so that we can also insert and remove elements efficiently at the beginning. Appending or popping at either end of a deque costs approximately O(1), whereas list operations that change the list length and shift data toward the front, such as pop(0) and insert(0, v), cost O(n).
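The difference can be sketched with a FIFO queue and a LIFO stack built on deque; the task names are illustrative only. The final lines show the list equivalent, which works but shifts every remaining element on each pop(0).

```python
from collections import deque

# FIFO queue: add on the right, remove from the left; both ends are O(1)
queue = deque()
queue.append('task1')
queue.append('task2')
queue.append('task3')
print(queue.popleft())   # task1
print(queue.popleft())   # task2

# LIFO stack: add and remove on the same (right) end
stack = deque()
stack.append(1)
stack.append(2)
stack.append(3)
print(stack.pop())       # 3

# A list can act as a queue too, but list.pop(0) shifts every remaining
# element one slot to the left, which is O(n) per call
items = ['task1', 'task2', 'task3']
print(items.pop(0))      # task1
```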

>>> from collections import deque
>>> dq = deque(range(10), maxlen=10)
>>> dq
deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)
>>> dq.rotate(3)
>>> dq
deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6], maxlen=10)
>>> dq.rotate(-4)
>>> dq
deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 0], maxlen=10)
>>> dq.appendleft(-1)
>>> dq
deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)
>>> dq.extend([11, 22, 33])
>>> dq
deque([3, 4, 5, 6, 7, 8, 9, 11, 22, 33], maxlen=10)
>>> dq.extendleft([10, 20, 30, 40])
>>> dq
deque([40, 30, 20, 10, 3, 4, 5, 6, 7, 8], maxlen=10)
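As the session shows, a full deque with maxlen silently drops items from the opposite end. That makes it a convenient way to keep only the last n items of a stream. The helper name last_n below is hypothetical, and the generated log lines are illustrative input.

```python
from collections import deque


def last_n(iterable, n):
    """Hypothetical helper: return the last n items of any iterable.

    A deque with maxlen=n discards items from the opposite end once it
    is full, so memory stays bounded by n no matter how long the input is.
    """
    return list(deque(iterable, maxlen=n))


log_lines = (f'line {i}' for i in range(1, 1001))  # illustrative input
print(last_n(log_lines, 3))  # ['line 998', 'line 999', 'line 1000']
```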

Counter

Counter is a dict subclass that counts how many times each element occurs.

>>> from collections import Counter
>>> ct = Counter('abracadabra')
>>> ct
Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
>>> ct.update('aaaaazzz')
>>> ct
Counter({'a': 10, 'z': 3, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
>>> ct.most_common(2)
[('a', 10), ('z', 3)]
>>> ct.elements()
<itertools.chain object at 0x7fbaad4b44e0>
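A slightly larger sketch: counting words with most_common, reading a missing key, expanding counts with elements(), and combining Counters with arithmetic. The sentence and the counts are illustrative input.

```python
from collections import Counter

text = 'the cat and the hat and the bat'  # illustrative input
words = Counter(text.split())

print(words.most_common(2))   # [('the', 3), ('and', 2)]
print(words['dog'])           # 0: missing keys count as zero, no KeyError

# elements() repeats each key as many times as its count
print(sorted(Counter('aab').elements()))  # ['a', 'a', 'b']

# Counters support arithmetic: + adds counts, - keeps only positive results
a = Counter(a=3, b=1)
b = Counter(a=1, b=2)
print(a + b)   # Counter({'a': 4, 'b': 3})
print(a - b)   # Counter({'a': 2})
```

Note that reading words['dog'] does not insert the key, unlike defaultdict.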

namedtuple

namedtuple(typename, field_names) creates a tuple subclass whose elements can also be accessed by name, which makes programs more readable.

>>> from collections import namedtuple
>>> City = namedtuple('City', 'name country population coordinates')
>>> tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
>>> tokyo
City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))
>>> tokyo.population
36.933
>>> tokyo.coordinates
(35.689722, 139.691667)
>>> tokyo[1]
'JP'
>>> City._fields
('name', 'country', 'population', 'coordinates')
>>> LatLong = namedtuple('LatLong', 'lat long')
>>> delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))
>>> delhi = City._make(delhi_data)
>>> delhi._asdict()  # a regular dict since Python 3.8 (an OrderedDict in earlier versions)
{'name': 'Delhi NCR', 'country': 'IN', 'population': 21.935, 'coordinates': LatLong(lat=28.613889, long=77.208889)}
>>> for key, value in delhi._asdict().items():
...     print(key + ':', value)
...
name: Delhi NCR
country: IN
population: 21.935
coordinates: LatLong(lat=28.613889, long=77.208889)
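Because a namedtuple is still a tuple, its fields cannot be reassigned; to "change" a field you create a new instance with _replace(). A minimal sketch, using a hypothetical Point type:

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])  # field names may also be a list
p = Point(1, 2)

# Fields are read-only, like any tuple element
try:
    p.x = 10
except AttributeError as exc:
    print('immutable:', exc)

# _replace() returns a NEW instance with selected fields changed
p2 = p._replace(x=10)
print(p, p2)         # Point(x=1, y=2) Point(x=10, y=2)

# It is still a tuple: unpacking and indexing work as usual
x, y = p2
print(x + y, p2[1])  # 12 2
```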

ChainMap

ChainMap groups multiple dictionaries into a single view, so they can be searched as if they had been merged, without copying them.
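A minimal sketch of the lookup behavior, assuming a hypothetical two-level configuration where user settings override defaults: lookups search the mappings from left to right and return the first match, and because nothing is copied, later changes to the underlying dicts are visible through the ChainMap.

```python
from collections import ChainMap

# Illustrative settings; the names are hypothetical
defaults = {'color': 'red', 'user': 'guest'}
user_settings = {'user': 'alice'}

# Lookups search the mappings left to right and return the first hit
settings = ChainMap(user_settings, defaults)
print(settings['user'])    # alice  (found in user_settings)
print(settings['color'])   # red    (falls through to defaults)

# No copying: the ChainMap is a live view of the underlying dicts
defaults['color'] = 'blue'
print(settings['color'])   # blue
```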

>>> from collections import ChainMap
>>> d = ChainMap({'zebra': 'black'}, {'elephant': 'blue'}, {'lion': 'yellow'})
>>> d['lion'] = 'orange'
>>> d['snake'] = 'red'
>>> d
ChainMap({'zebra': 'black', 'lion': 'orange', 'snake': 'red'}, {'elephant': 'blue'}, {'lion': 'yellow'})
>>> del d['lion']
>>> del d['elephant']
Traceback (most recent call last):
  ...
KeyError: 'elephant'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  ...
KeyError: "Key not found in the first mapping: 'elephant'"

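The error raised by del d['elephant'] shows that ChainMap's mutating operations only act on the first mapping, self.maps[0]: deletion only looks there, and assignment (as with d['lion'] = 'orange' above) always writes there, even when the key already exists in a later mapping. New key-value pairs are likewise added to the first dictionary. Let's improve ChainMap so that updates and deletions apply to the first mapping that actually contains the key: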

from collections import ChainMap


class DeepChainMap(ChainMap):
    'Variant of ChainMap that allows direct updates to inner scopes'

    def __setitem__(self, key, value):
        # Update the key in the first mapping that already contains it
        for mapping in self.maps:
            if key in mapping:
                mapping[key] = value
                return
        # Otherwise add it to the first (topmost) mapping
        self.maps[0][key] = value

    def __delitem__(self, key):
        # Delete the key from the first mapping that contains it
        for mapping in self.maps:
            if key in mapping:
                del mapping[key]
                return
        raise KeyError(key)

>>> d = DeepChainMap({'zebra': 'black'}, {'elephant': 'blue'}, {'lion': 'yellow'})
>>> d['lion'] = 'orange'   # update an existing key two levels down
>>> d['snake'] = 'red'     # new keys get added to the topmost dict
>>> del d['elephant']      # remove an existing key one level down
>>> d
DeepChainMap({'zebra': 'black', 'snake': 'red'}, {}, {'lion': 'orange'})

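A ChainMap stores references to the original dictionaries rather than copies. Its new_child() method does not deep-copy a ChainMap either: it returns a new ChainMap with a new empty dict in front, followed by the same mappings as the current instance: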

>>> from collections import ChainMap
>>> a = {'a': 'A', 'c': 'C'}
>>> b = {'b': 'B', 'c': 'D'}
>>> m = ChainMap(a, b)
>>> m
ChainMap({'a': 'A', 'c': 'C'}, {'b': 'B', 'c': 'D'})
>>> m['c']
'C'
>>> m.maps
[{'a': 'A', 'c': 'C'}, {'b': 'B', 'c': 'D'}]
>>> a['c'] = 'E'
>>> m['c']
'E'
>>> m
ChainMap({'a': 'A', 'c': 'E'}, {'b': 'B', 'c': 'D'})
>>> m2 = m.new_child()
>>> m2['c'] = 'f'
>>> m2
ChainMap({'c': 'f'}, {'a': 'A', 'c': 'E'}, {'b': 'B', 'c': 'D'})
>>> m
ChainMap({'a': 'A', 'c': 'E'}, {'b': 'B', 'c': 'D'})
>>> m2.parents
ChainMap({'a': 'A', 'c': 'E'}, {'b': 'B', 'c': 'D'})
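new_child() and parents map naturally onto nested variable scopes: pushing a child adds an inner scope whose writes shadow outer names, and parents drops that inner scope again. The scope names below are hypothetical; the sketch also confirms that the child shares, rather than copies, the outer dicts.

```python
from collections import ChainMap

# Model nested variable scopes: new_child() pushes an empty dict in front,
# and .parents drops the front dict again
global_scope = ChainMap({'x': 1, 'y': 2})
local_scope = global_scope.new_child()
local_scope['x'] = 100          # shadows x; writes go to the new front dict

print(local_scope['x'], local_scope['y'])  # 100 2
print(global_scope['x'])                   # 1: outer scope is untouched

# new_child() shares the existing dicts rather than copying them
local_scope.maps[1]['y'] = 20
print(global_scope['y'])                   # 20

print(local_scope.parents['x'])            # 1
```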

UserDict

Next, let's build an improved dictionary that converts non-str keys to str when looking them up:

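class StrKeyDict0(dict):

    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key is not found
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]  # retry the lookup with the key as a str

    def get(self, key, default=None):
        # Delegate to self[key] so that __missing__ is used
        try:
            return self[key]
        except KeyError:
            return default

    def __contains__(self, key):
        # Check the key as given, then its str form
        return key in self.keys() or str(key) in self.keys()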

A few notes on the program above:

  • The isinstance(key, str) check in __missing__ is required. Why? Without it, if a str key were missing, self[str(key)] would call __getitem__ again with the same key, which would call __missing__ again, causing infinite recursion.
  • __contains__ must also be overridden, because k in d calls __contains__, and the inherited dict version never calls __missing__ when the lookup fails. Another detail: our __contains__ checks key in self.keys() rather than the usual k in my_dict style, because writing str(key) in self would call __contains__ recursively.
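To see these rules in action, the sketch below repeats StrKeyDict0 so it runs on its own and exercises indexing, get(), and `in` with int keys. The sample keys '2' and '4' are illustrative.

```python
class StrKeyDict0(dict):
    # Same class as above, repeated so this snippet runs on its own
    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def __contains__(self, key):
        return key in self.keys() or str(key) in self.keys()


d = StrKeyDict0([('2', 'two'), ('4', 'four')])
print(d['2'], d[4])        # two four  (d[4] -> __missing__ -> d['4'])
print(d.get(4), d.get(1))  # four None
print(2 in d, 1 in d)      # True False

try:
    d[1]                   # __missing__ tries d['1'], which raises KeyError
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: '1'
```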

In Python 2.x, dict.keys() returns a list, so k in my_list has to scan the whole list. In Python 3.x, dict.keys() returns a set-like view instead, which makes membership tests fast. See the official documentation for details.
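A short sketch of what "set-like view" means in Python 3: membership is a hash lookup, set operators work directly on the view, and the view reflects later changes to the dict. The dict contents are illustrative.

```python
d = {'a': 1, 'b': 2, 'c': 3}
keys = d.keys()          # a view, not a copy

print('b' in keys)       # True: hash lookup, no list scan
print(keys & {'b', 'c', 'z'} == {'b', 'c'})  # True: set operations work

d['z'] = 26
print('z' in keys)       # True: the view reflects later changes
```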

The preceding example can be rewritten by subclassing UserDict and storing every key as a str. This approach is more common and more concise:

import collections


class StrKeyDict(collections.UserDict):

    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def __contains__(self, key):
        # All keys are stored as str, so we can search self.data directly
        return str(key) in self.data

    def __setitem__(self, key, item):
        # Convert every key to str before storing it
        self.data[str(key)] = item

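UserDict is not a subclass of dict: it subclasses MutableMapping (which in turn subclasses Mapping) and keeps its items in a plain dict stored in its data attribute. It inherits important methods such as MutableMapping.update and a get method that works like the one we wrote for StrKeyDict0, which is why we did not override get above. Looking at the source code, you can see that its implementation is similar to ours.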
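A usage sketch, repeating StrKeyDict so the snippet runs on its own. It shows that the inherited update() (and the constructor, which calls update()) goes through our __setitem__, so every key ends up stored as a str, and that get() works without being overridden. The sample keys and values are illustrative.

```python
import collections


class StrKeyDict(collections.UserDict):
    # Same class as above, repeated so this snippet runs on its own
    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def __contains__(self, key):
        return str(key) in self.data

    def __setitem__(self, key, item):
        self.data[str(key)] = item


d = StrKeyDict()
d[1] = 'one'                   # stored under the key '1'
d.update({2: 'two'})           # inherited update() calls our __setitem__
print(d.data)                  # {'1': 'one', '2': 'two'}
print(d[2], d.get(1), 3 in d)  # two one False
print(d.get(3, 'missing'))     # missing: get() is inherited, not overridden
```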

Summary

The above is all the content of this article. I hope the content of this article will help you in your study or work. If you have any questions, please leave a message. Thank you for your support.
