Tutorial: Using the Counter Container Class in Python's collections Module

Source: Internet
Author: User

1. collections Module

The collections module was introduced in Python 2.4. It provides specialized container types beyond the built-in dict, set, list, and tuple:

OrderedDict class: an ordered dictionary that remembers insertion order; a subclass of dict. Added in 2.7.
namedtuple() function: a factory function that creates tuple subclasses with named fields. Added in 2.6.
Counter class: counts hashable objects; a subclass of dict. Added in 2.7.
deque: a double-ended queue. Added in 2.4.
defaultdict: a dictionary that calls a factory function to supply values for missing keys, so you do not have to handle them yourself. Added in 2.5.
For more information, see http://docs.python.org/2/library/collections.html.
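
As a quick orientation, the sketch below touches each of the five types once. The variable names and sample data are illustrative assumptions, not part of the original tutorial.

```python
from collections import Counter, OrderedDict, defaultdict, deque, namedtuple

# OrderedDict: a dict subclass that remembers insertion order.
od = OrderedDict()
od['first'] = 1
od['second'] = 2
ordered_keys = list(od)

# namedtuple(): a factory function that builds a tuple subclass with named fields.
Point = namedtuple('Point', ['x', 'y'])   # 'Point' is a made-up example type
p = Point(x=1, y=2)

# Counter: a dict subclass that counts hashable objects.
letter_counts = Counter('hello')

# deque: a double-ended queue with fast appends and pops at both ends.
dq = deque([1, 2, 3])
dq.appendleft(0)
dq.append(4)

# defaultdict: missing keys are filled in by calling the factory function.
groups = defaultdict(list)
groups['even'].append(2)
```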

2. Counter class

The Counter class tracks how many times each value occurs. It is an unordered collection stored as dictionary key-value pairs, where the elements are the keys and their counts are the values. A count can be any integer, including zero and negative values. Counter is similar to the bag or multiset types found in other languages.
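
A minimal sketch of this idea, using a made-up sentence as sample data: the words become keys, their frequencies become values, and a count can be pushed below zero.

```python
from collections import Counter

# Hypothetical sample data: count how often each word appears.
words = 'the quick brown fox jumps over the lazy dog the end'.split()
word_counts = Counter(words)

the_count = word_counts['the']

# Counts are ordinary integers, so zero and negative values are allowed.
word_counts['fox'] -= 2             # 1 - 2 == -1
fox_count = word_counts['fox']
```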

2.1 Creating a Counter

The following code shows four ways to create a Counter object:

>>> from collections import Counter
>>> c = Counter()                    # create an empty Counter
>>> c = Counter('gallahad')          # create from an iterable (list, tuple, dict, string, etc.)
>>> c = Counter({'a': 4, 'b': 2})    # create from a dictionary
>>> c = Counter(a=4, b=2)            # create from keyword arguments

2.2 Accessing counts and missing keys

Looking up a key that does not exist returns 0 instead of raising KeyError; otherwise the key's count is returned.

>>> c = Counter("abcdefgab")
>>> c["a"]
2
>>> c["c"]
1
>>> c["h"]
0
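
For contrast, here is a small sketch comparing a plain dict with a Counter on a missing key. It also checks that the lookup does not silently add the key. The variable names are illustrative.

```python
from collections import Counter

plain = {'a': 2}
counts = Counter({'a': 2})

# A plain dict raises KeyError for a missing key.
try:
    plain['h']
    plain_raised = False
except KeyError:
    plain_raised = True

# A Counter returns 0 and leaves its contents unchanged.
missing = counts['h']
key_was_added = 'h' in counts
```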

2.3 Updating counts (update and subtract)

You can update the counts from an iterable or from another Counter object.

Updates can either increase or decrease the counts. To increase them, use the update() method:

>>> c = Counter('which')
>>> c.update('witch')      # update from another iterable
>>> c['h']
3
>>> d = Counter('watch')
>>> c.update(d)            # update from another Counter object
>>> c['h']
4

To decrease the counts, use the subtract() method:

>>> c = Counter('which')
>>> c.subtract('witch')    # subtract using another iterable
>>> c['h']
1
>>> d = Counter('watch')
>>> c.subtract(d)          # subtract using another Counter object
>>> c['a']
-1

2.4 Deleting keys

Setting a count to 0 does not remove the element from the Counter. Use del to remove it entirely.

>>> c = Counter("abcdcba")
>>> c
Counter({'a': 2, 'b': 2, 'c': 2, 'd': 1})
>>> c["b"] = 0
>>> c
Counter({'a': 2, 'c': 2, 'd': 1, 'b': 0})
>>> del c["a"]
>>> c
Counter({'c': 2, 'd': 1, 'b': 0})
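
The difference between a zero count and a deleted key shows up in membership tests. The following sketch (variable names are illustrative) makes that explicit:

```python
from collections import Counter

c = Counter('abcdcba')

c['b'] = 0
present_with_zero = 'b' in c     # the key still exists, with count 0

del c['b']
present_after_del = 'b' in c     # the key is gone
count_after_del = c['b']         # a lookup still reports 0
```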

 
2.5 elements()

Returns an iterator over the elements, repeating each one as many times as its count. Since Python 3.7 the elements come out in the order they were first encountered; in earlier versions the order is arbitrary. Elements whose count is less than 1 are skipped.

>>> c = Counter(a=4, b=2, c=0, d=-2)
>>> list(c.elements())
['a', 'a', 'a', 'a', 'b', 'b']

2.6 most_common([n])

Returns a list of the n most common elements and their counts, from most to least common. If n is omitted, all elements are returned. Since Python 3.7, elements with equal counts are ordered by first occurrence; in earlier versions their relative order is arbitrary.

>>> c = Counter('abracadabra')
>>> c.most_common()
[('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]
>>> c.most_common(3)
[('a', 5), ('b', 2), ('r', 2)]

2.7 fromkeys

This class method is not implemented for Counter objects.
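
A short sketch of what happens in practice: calling Counter.fromkeys raises NotImplementedError. The workaround shown afterwards, building a plain dict first, is one possible alternative rather than something the original tutorial prescribes.

```python
from collections import Counter

# Counter.fromkeys is deliberately left unimplemented.
try:
    Counter.fromkeys('abc', 1)
    raised = False
except NotImplementedError:
    raised = True

# One possible alternative: build a plain dict, then wrap it in a Counter.
c = Counter(dict.fromkeys('abc', 1))
```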

2.8 Shallow copy

The copy() method returns a shallow copy of the Counter.

>>> c = Counter("abcdcba")
>>> c
Counter({'a': 2, 'b': 2, 'c': 2, 'd': 1})
>>> d = c.copy()
>>> d
Counter({'a': 2, 'b': 2, 'c': 2, 'd': 1})
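
The sketch below illustrates that the copy is a separate Counter: changing a count in the copy leaves the original untouched. The variable names follow the example above.

```python
from collections import Counter

c = Counter('abcdcba')
d = c.copy()

# Modify the copy only.
d['a'] += 10

original_a = c['a']
copied_a = d['a']
copy_type = type(d)
```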

2.9 Arithmetic and set operations

The +, -, &, and | operators can also be used with Counter objects. The & and | operators return the minimum and maximum, respectively, of the corresponding counts in the two Counter objects. Note that the resulting Counter drops every element whose count is less than 1.

>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d    # add counts: c[x] + d[x]
Counter({'a': 4, 'b': 3})
>>> c - d    # subtract (only elements with positive counts are kept)
Counter({'a': 2})
>>> c & d    # intersection: min(c[x], d[x])
Counter({'a': 1, 'b': 1})
>>> c | d    # union: max(c[x], d[x])
Counter({'a': 3, 'b': 2})
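
The - operator and the subtract() method from section 2.3 behave differently with non-positive results. The following sketch, reusing the same sample counters, shows the contrast:

```python
from collections import Counter

c = Counter(a=3, b=1)
d = Counter(a=1, b=2)

# The - operator returns a new Counter and drops counts below 1.
diff = c - d

# subtract() works in place and keeps zero and negative counts.
c.subtract(d)
```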

3. Common Operations

The following common Counter operations are taken from the official Python documentation.

sum(c.values())                 # total of all counts
c.clear()                       # reset the Counter (removes all elements; the object itself is not deleted)
list(c)                         # convert the keys in c to a list
set(c)                          # convert the keys in c to a set
dict(c)                         # convert the key-value pairs in c to a regular dictionary
c.items()                       # convert to a list of (elem, cnt) pairs
Counter(dict(list_of_pairs))    # convert a list of (elem, cnt) pairs to a Counter
c.most_common()[:-n-1:-1]       # the n least common elements
c += Counter()                  # remove zero and negative counts
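
To make a few of these idioms concrete, here is a runnable sketch on a made-up Counter. The sample counts and the choice n = 2 are assumptions for illustration.

```python
from collections import Counter

c = Counter(a=5, b=2, c=0, d=-1, e=1)

total = sum(c.values())                      # 5 + 2 + 0 - 1 + 1

n = 2
least_common = c.most_common()[:-n-1:-1]     # the n smallest counts, smallest first

# Round trip through a list of (elem, cnt) pairs.
pairs = list(c.items())
rebuilt = Counter(dict(pairs))

# Adding an empty Counter in place strips zero and negative counts.
c += Counter()
```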

4. Examples
4.1 Determining whether two strings are anagrams

from collections import Counter

def is_anagram(word1, word2):
    """Checks whether the words are anagrams.

    word1: string
    word2: string
    returns: boolean
    """
    return Counter(word1) == Counter(word2)

When Counter is given a string, it counts the occurrences of each character in that string. If two strings are rearrangements of the same letters, their Counter results are equal.
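
A brief usage sketch follows. The function is repeated so that the block runs on its own, and the sample words are illustrative. Note that the comparison is case-sensitive, since 'L' and 'l' are different characters.

```python
from collections import Counter

def is_anagram(word1, word2):
    """Same check as above: equal letter counts mean the words are anagrams."""
    return Counter(word1) == Counter(word2)

same_letters = is_anagram('listen', 'silent')
extra_letter = is_anagram('apple', 'apples')
different_case = is_anagram('Listen', 'silent')
```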

4.2 Multisets
A multiset is a set in which the same element may appear more than once. Counter represents a multiset very naturally. In addition, Counter can be extended with set-like operations such as is_subset.

from collections import Counter

class Multiset(Counter):
    """A multiset is a set where elements can appear more than once."""

    def is_subset(self, other):
        """Checks whether self is a subset of other.

        other: Multiset
        returns: boolean
        """
        for char, count in self.items():
            if other[char] < count:
                return False
        return True

    # map the <= operator to is_subset
    __le__ = is_subset
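
As a usage sketch, the block below checks whether a small collection of letters can be drawn from a larger one. The class is restated compactly (using all() instead of an explicit loop) so the block is self-contained, and the sample strings are made up.

```python
from collections import Counter

class Multiset(Counter):
    """Compact restatement of the class above so this block runs alone."""

    def is_subset(self, other):
        # Every element must occur in other at least as often as in self.
        return all(other[elem] >= count for elem, count in self.items())

    __le__ = is_subset

needed = Multiset('aab')

enough = needed <= Multiset('aabbc')     # two a's and one b are available
not_enough = needed <= Multiset('abc')   # only one 'a' is available
```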

4.3 Probability mass function
A probability mass function (PMF) gives the probability that a discrete random variable takes each specific value. Counter can be used to represent a probability mass function.

from collections import Counter

class Pmf(Counter):
    """A Counter with probabilities."""

    def normalize(self):
        """Normalizes the PMF so the probabilities add to 1."""
        total = float(sum(self.values()))
        for key in self:
            self[key] /= total

    def __add__(self, other):
        """Adds two distributions.

        The result is the distribution of sums of values from the
        two distributions.

        other: Pmf
        returns: new Pmf
        """
        pmf = Pmf()
        for key1, prob1 in self.items():
            for key2, prob2 in other.items():
                pmf[key1 + key2] += prob1 * prob2
        return pmf

    def __hash__(self):
        """Returns an integer hash value."""
        return id(self)

    def __eq__(self, other):
        return self is other

    def render(self):
        """Returns values and their probabilities, suitable for plotting."""
        return zip(*sorted(self.items()))

normalize: scales the probabilities so that they sum to 1.
__add__: returns a new probability mass function for the sum of two independent random variables.
render: returns the values and their probabilities as two parallel sequences sorted by value, which is convenient for plotting.
The following uses a die as an example.

d6 = Pmf([1, 2, 3, 4, 5, 6])
d6.normalize()
d6.name = 'one die'
print(d6)

Output:

Pmf({1: 0.16666666666666666, 2: 0.16666666666666666, 3: 0.16666666666666666, 4: 0.16666666666666666, 5: 0.16666666666666666, 6: 0.16666666666666666})

Using the overloaded + operator (__add__), we can compute the distribution of the sum of two dice:

d6_twice = d6 + d6
d6_twice.name = 'two dice'
for key, prob in d6_twice.items():
    print(key, prob)
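
The two-dice distribution can be cross-checked without the Pmf class. The sketch below is an independent calculation, not the author's code: it enumerates all 36 equally likely ordered pairs and uses exact fractions, which is an assumption made here to avoid floating-point rounding.

```python
from collections import Counter
from fractions import Fraction

faces = range(1, 7)

# Each ordered pair of faces has probability 1/36.
two_dice = Counter()
for a in faces:
    for b in faces:
        two_dice[a + b] += Fraction(1, 36)

p_seven = two_dice[7]                          # six of the 36 pairs sum to 7
total = sum(two_dice.values())                 # probabilities must sum to 1
most_likely = max(two_dice, key=two_dice.get)
```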

With numpy.sum, we can directly compute the distribution of the sum of three dice:

import numpy as np

d6_thrice = np.sum([d6] * 3)
d6_thrice.name = 'three dice'

Finally, we can use render to get the values and probabilities and plot them with matplotlib:

from matplotlib import pyplot

for die in [d6, d6_twice, d6_thrice]:
    xs, ys = die.render()
    pyplot.plot(xs, ys, label=die.name, linewidth=3, alpha=0.5)

pyplot.xlabel('Total')
pyplot.ylabel('Probability')
pyplot.legend()
pyplot.show()

The result is as follows:

[Figure: probability against total for one, two, and three dice]

4.4 Bayesian statistics
We continue with the dice example to show how Counter can be used for Bayesian statistics. Suppose a box contains five different dice with 4, 6, 8, 12, and 20 sides. We draw one die at random from the box, roll it, and it shows 6. What is the probability that the die we drew is each of the five dice?
(1) First, generate the probability mass function for each die:

def make_die(num_sides):
    die = Pmf(range(1, num_sides + 1))
    die.name = 'd%d' % num_sides
    die.normalize()
    return die

dice = [make_die(x) for x in [4, 6, 8, 12, 20]]
print(dice)

(2) Next, define an abstract class Suite. A Suite is a probability mass function that represents a set of hypotheses and their probabilities. It provides a bayesian_update method that updates the probability of each hypothesis based on new data.

class Suite(Pmf):
    """Map from hypothesis to probability."""

    def bayesian_update(self, data):
        """Performs a Bayesian update.

        Note: called bayesian_update to avoid overriding dict.update

        data: result of a die roll
        """
        for hypo in self:
            like = self.likelihood(data, hypo)
            self[hypo] *= like
        self.normalize()

Suite calls a likelihood method that it does not define itself; each subclass must implement it with its own calculation.

(3) Define the DiceSuite class, which inherits from Suite and implements the likelihood method.

class DiceSuite(Suite):

    def likelihood(self, data, hypo):
        """Computes the likelihood of the data under the hypothesis.

        data: result of a die roll
        hypo: Die object
        """
        return hypo[data]

The likelihood method takes two arguments:
data: the number shown by the observed roll (6 in this example)
hypo: the die that may have been rolled

(4) Pass the dice created in step (1) to DiceSuite, then update it with the observed roll to obtain the result.

dice_suite = DiceSuite(dice)
dice_suite.bayesian_update(6)
# Sort by number of sides; Pmf objects have no natural ordering of their own.
for die, prob in sorted(dice_suite.items(), key=lambda item: len(item[0])):
    print(die.name, prob)

Output (probabilities shown to 12 significant digits):

d4 0.0
d6 0.392156862745
d8 0.294117647059
d12 0.196078431373
d20 0.117647058824

As expected, the probability of the four-sided die is 0 (its faces only run from 1 to 4), while the six-sided die is now the most likely, followed by the eight-sided die. Now suppose we roll the same die again and this time it shows 8. Recalculate the probabilities:

dice_suite.bayesian_update(8)
for die, prob in sorted(dice_suite.items(), key=lambda item: len(item[0])):
    print(die.name, prob)

Output (probabilities shown to 12 significant digits):

d4 0.0
d6 0.0
d8 0.623268698061
d12 0.277008310249
d20 0.0997229916898

Now the six-sided die is excluded as well, and the eight-sided die is the most likely.
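
These posteriors can be verified independently with exact fractions. The sketch below is a cross-check written for this purpose, not the tutorial's Suite class: the update function and its names are hypothetical, and it assumes a uniform prior over the five dice and fair dice throughout.

```python
from fractions import Fraction

sides = [4, 6, 8, 12, 20]

# Uniform prior: each die is equally likely to be drawn from the box.
prior = {n: Fraction(1, len(sides)) for n in sides}

def update(belief, roll):
    """Multiply each hypothesis by its likelihood, then renormalize."""
    unnormalized = {
        n: p * (Fraction(1, n) if 1 <= roll <= n else 0)
        for n, p in belief.items()
    }
    total = sum(unnormalized.values())
    return {n: p / total for n, p in unnormalized.items()}

after_six = update(prior, 6)          # d6 becomes 20/51
after_eight = update(after_six, 8)    # d8 becomes 225/361
```

Converting the fractions to floats gives the same figures as the printed results, for example 20/51 ≈ 0.3922 and 225/361 ≈ 0.6233.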
The examples above demonstrate how useful Counter can be. In practice Counter is used fairly rarely, but when applied appropriately it is very convenient.
