This article describes Python iterators, generators, and the related itertools module.
The Python language attracts many mathematicians. For example, tuples, lists, sets, and other containers are written with notation similar to traditional mathematics, and list comprehensions resemble set-builder notation.
Iterators, generators, and the related itertools module are another feature of Python that appeals to mathematics enthusiasts. These tools make it easy to write and work with mathematical objects such as infinite sequences, stochastic processes, recurrence relations, and combinatorial structures. This article collects my notes on iterators and generators, along with experience I accumulated while learning them.
Iterators
An iterator is an object that lets you iterate over a collection. Because the elements are produced one at a time, the whole collection never needs to be loaded into memory, and the collection can therefore even be infinite. You can find the relevant documentation in the Iterator Types section of the official Python documentation.
Let's make the definition more precise. An object is an iterable if it defines an __iter__ method that returns an iterator. An iterator is an object that implements both __iter__ and next (__next__ in Python 3): the former returns the iterator object, and the latter returns the next element of the iteration. As far as I know, iterators always return self from __iter__, because they are their own iterators.
Generally, you should avoid calling __iter__ and next directly. Instead, use a for loop or a list comprehension, and Python will call these methods for you automatically. If you do need to call them manually, use the Python built-in functions iter and next, passing the iterator or collection object as the argument. For example, if c is an iterable, write iter(c) instead of c.__iter__(), and if a is an iterator, write next(a) instead of a.next(). This is analogous to the use of len.
Speaking of len, it is worth noting that length is not a meaningful concept for iterators, so they usually do not implement __len__. If you need to count the elements, you must do so manually or use sum. An example is given after the itertools section at the end of this article.
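For instance, one common idiom counts the elements by consuming the iterator with sum — a minimal sketch (written so it also runs under Python 3):

```python
# Count the elements of an iterator by consuming it with sum.
# Caution: this exhausts the iterator, and it never terminates
# if the iterator is infinite.
it = iter([3, 1, 4, 1, 5])
count = sum(1 for _ in it)
print(count)  # 5
```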
Some iterables use other objects as their iterators instead of being iterators themselves. For example, a list object is an iterable but not an iterator (it implements __iter__ but not next). The following example shows how a list uses the listiterator type as its iterator. Note also that list defines a length, while listiterator does not.
>>> a = [1, 2]
>>> type(a)
<type 'list'>
>>> type(iter(a))
<type 'listiterator'>
>>> it = iter(a)
>>> next(it)
1
>>> next(it)
2
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> len(a)
2
>>> len(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'listiterator' has no len()
When the iteration is exhausted but the iterator is accessed again, the Python interpreter raises a StopIteration exception. However, as mentioned above, an iterator can iterate over an infinite collection, in which case the user is responsible for making sure the iteration does not turn into an infinite loop. Consider the following example:
class count_iterator(object):
    n = 0

    def __iter__(self):
        return self

    def next(self):
        y = self.n
        self.n += 1
        return y
Here is an example of its usage. Note that the last line tries to convert the iterator object to a list, which leads to an infinite loop because this iterator never stops.
>>> counter = count_iterator()
>>> next(counter)
0
>>> next(counter)
1
>>> next(counter)
2
>>> next(counter)
3
>>> list(counter)  # This will result in an infinite loop!
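If you want only finitely many elements from an infinite iterator like this, itertools.islice is the safe way to take them; a small sketch using the equivalent built-in itertools.count (written so it also runs under Python 3):

```python
import itertools

# Take the first five values from an infinite iterator without
# ever converting the whole thing to a list.
counter = itertools.count()  # behaves like count_iterator above
first_five = list(itertools.islice(counter, 5))
print(first_five)  # [0, 1, 2, 3, 4]
```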
Finally, a modification of the above: if an object has no __iter__ method but does define __getitem__, the object is still iterable. In this case, the Python built-in function iter returns an iterator corresponding to the object, which uses the __getitem__ method to traverse the elements one by one. Iteration stops when a StopIteration or IndexError exception is raised. Consider the following example:
class SimpleList(object):
    def __init__(self, *items):
        self.items = items

    def __getitem__(self, i):
        return self.items[i]
Usage:
>>> a = SimpleList(1, 2, 3)
>>> it = iter(a)
>>> next(it)
1
>>> next(it)
2
>>> next(it)
3
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
Now let's look at a more interesting example: generating the Hofstadter Q sequence from given initial conditions using an iterator. Hofstadter first mentioned this nested recurrence in his book "Gödel, Escher, Bach: An Eternal Golden Braid", and the problem of proving that the sequence is well defined for all n has been open ever since. The following code uses an iterator to generate the Hofstadter Q sequence for a given initial condition, where the sequence is defined by:
Q(n) = Q(n - Q(n-1)) + Q(n - Q(n-2))
Given an initial condition, for example, qsequence([1, 1]) will generate the Q sequence proper. The StopIteration exception signals that the sequence cannot be generated any further, because a valid index into the list is required to produce the next element. For example, if the initial condition is [1, 3], generation of the sequence stops immediately.
class qsequence(object):
    def __init__(self, s):
        self.s = s[:]

    def next(self):
        try:
            q = self.s[-self.s[-1]] + self.s[-self.s[-2]]
            self.s.append(q)
            return q
        except IndexError:
            raise StopIteration()

    def __iter__(self):
        return self

    def current_state(self):
        return self.s
Usage:
>>> Q = qsequence([1, 1])
>>> next(Q)
2
>>> next(Q)
3
>>> [next(Q) for __ in xrange(10)]
[3, 4, 5, 5, 6, 6, 6, 8, 8, 8]
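The current_state method makes the internal list observable at any point. A small sketch (the class is repeated so the snippet is self-contained, with a __next__ alias so it also runs under Python 3):

```python
class qsequence(object):
    def __init__(self, s):
        self.s = s[:]

    def __next__(self):
        try:
            q = self.s[-self.s[-1]] + self.s[-self.s[-2]]
            self.s.append(q)
            return q
        except IndexError:
            raise StopIteration()

    next = __next__  # Python 2 compatibility

    def __iter__(self):
        return self

    def current_state(self):
        return self.s

Q = qsequence([1, 1])
next(Q); next(Q); next(Q)
print(Q.current_state())  # [1, 1, 2, 3, 3]
```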
Generators
A generator is an iterator defined with simpler, function-like syntax, using the yield expression. Instead of returning a value once, a generator yields each result as it becomes available. Python's internal machinery remembers the generator's context, that is, its current control flow and the values of its local variables. Each time the next value is requested, the generator resumes and yield produces the next value of the iteration. The __iter__ method is implemented by default, which means a generator can be used anywhere an iterator can. The following example implements the same functionality as the count_iterator class above, but the code is more compact and readable.
def count_generator():
    n = 0
    while True:
        yield n
        n += 1
Let's take a look at the usage:
>>> counter = count_generator()
>>> counter
<generator object count_generator at 0x106bf1aa0>
>>> next(counter)
0
>>> next(counter)
1
>>> iter(counter)
<generator object count_generator at 0x106bf1aa0>
>>> iter(counter) is counter
True
>>> type(counter)
<type 'generator'>
Now let's try to implement Hofstadter's Q sequence with a generator. This implementation is very simple, but we cannot implement anything like current_state. As far as I know, there is no way to directly access the internal state of a generator from the outside, so a function like current_state is not possible (you can dig into data structures such as gi_frame.f_locals, but this is a CPython implementation detail, not a standard part of the language, so its use is not recommended). If you need access to the internal state, one possibility is to yield the whole state each time. I will leave this as an exercise.
def hofstadter_generator(s):
    a = s[:]
    while True:
        try:
            q = a[-a[-1]] + a[-a[-2]]
            a.append(q)
            yield q
        except IndexError:
            return
Note that the generator ends with a plain return statement carrying no value. Internally, this raises a StopIteration exception.
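As a sketch of the exercise suggested above, one option is to yield the new value together with a copy of the state; the name hofstadter_with_state is my own, and the snippet is written to also run under Python 3:

```python
def hofstadter_with_state(s):
    # A variant of hofstadter_generator that exposes its internal
    # list by yielding (value, state) pairs.
    a = s[:]
    while True:
        try:
            q = a[-a[-1]] + a[-a[-2]]
            a.append(q)
            yield q, a[:]  # yield a copy so callers cannot mutate our state
        except IndexError:
            return

g = hofstadter_with_state([1, 1])
q, state = next(g)
print(q, state)  # 2 [1, 1, 2]
```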
The next example comes from a Groupon interview question. Here we first use a generator to implement a Bernoulli process: an infinite sequence of random Boolean values where True appears with probability p and False with probability q = 1 - p. We then implement a von Neumann extractor, which takes a Bernoulli process with 0 < p < 1 as input and returns another Bernoulli process with p = 0.5.
import random

def bernoulli_process(p):
    if p > 1.0 or p < 0.0:
        raise ValueError("p should be between 0.0 and 1.0.")
    while True:
        yield random.random() < p

def von_neumann_extractor(process):
    while True:
        x, y = process.next(), process.next()
        if x != y:
            yield x
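A quick sanity check of the extractor (a sketch rewritten with the built-in next() so it also runs under Python 3): even with a strongly biased input process, the extracted output should be roughly balanced.

```python
import random

def bernoulli_process(p):
    if p > 1.0 or p < 0.0:
        raise ValueError("p should be between 0.0 and 1.0.")
    while True:
        yield random.random() < p

def von_neumann_extractor(process):
    while True:
        x, y = next(process), next(process)
        if x != y:
            yield x

random.seed(0)  # fixed seed, for reproducibility of this sketch
extractor = von_neumann_extractor(bernoulli_process(0.2))
sample = [next(extractor) for _ in range(1000)]
frac_true = sum(sample) / float(len(sample))
print(frac_true)  # close to 0.5 despite the biased (p = 0.2) input
```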
Finally, generators are a very useful tool for implementing discrete dynamical systems. The following example shows how the famous tent map is implemented with a generator. (As an aside, notice how numerical inaccuracy creeps into the values and grows exponentially as the map is iterated; this is a key feature of chaotic dynamical systems like the tent map.)
>>> def tent_map(mu, x0):
...     x = x0
...     while True:
...         yield x
...         x = mu * min(x, 1.0 - x)
...
>>>
>>> t = tent_map(2.0, 0.1)
>>> for __ in xrange(30):
...     print t.next()
...
0.1
0.2
0.4
0.8
0.4
0.8
0.4
0.8
0.4
0.8
0.4
0.8
0.4
0.8
0.4
0.8
0.4
0.799999999999
0.400000000001
0.800000000003
0.399999999994
0.799999999988
0.400000000023
0.800000000047
0.399999999907
0.799999999814
0.400000000373
0.800000000745
0.39999999851
0.79999999702
Another similar example is the Collatz sequence.
def collatz(n):
    yield n
    while n != 1:
        n = n / 2 if n % 2 == 0 else 3 * n + 1
        yield n
In this example, the StopIteration exception is not raised manually, because it is raised automatically when the control flow reaches the end of the function.
Usage:
>>> # If the Collatz conjecture is true then list(collatz(n)) for any n will
... # always terminate (though your machine might run out of memory first!)
>>> list(collatz(7))
[7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
>>> list(collatz(13))
[13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
>>> list(collatz(17))
[17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
>>> list(collatz(19))
[19, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
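Because a generator is itself an iterator, it composes directly with built-ins; for instance, counting the terms of the n = 7 trajectory shown above (a sketch using // so it also runs under Python 3):

```python
def collatz(n):
    yield n
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        yield n

# The n = 7 trajectory listed above has 17 terms.
print(len(list(collatz(7))))  # 17
```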
Recursive Generators
Generators can be recursive, just like other functions. Let's look at a hand-rolled simple version of itertools.permutations, a generator that yields all permutations of a given list of items (in practice, use itertools.permutations, which is faster). The basic idea is simple: for each element of the list, we move it into the first position by swapping it with the first element, and then recursively permute the rest of the list.
def permutations(items):
    if len(items) == 0:
        yield []
    else:
        pi = items[:]
        for i in xrange(len(pi)):
            pi[0], pi[i] = pi[i], pi[0]
            for p in permutations(pi[1:]):
                yield [pi[0]] + p
>>> for p in permutations([1, 2, 3]):
...     print p
...
[1, 2, 3]
[1, 3, 2]
[2, 1, 3]
[2, 3, 1]
[3, 1, 2]
[3, 2, 1]
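A quick property check on this recursive generator (a sketch using range so it also runs under Python 3): it should yield exactly n! permutations, all distinct.

```python
import math

def permutations(items):
    if len(items) == 0:
        yield []
    else:
        pi = items[:]
        for i in range(len(pi)):
            pi[0], pi[i] = pi[i], pi[0]
            for p in permutations(pi[1:]):
                yield [pi[0]] + p

perms = list(permutations([1, 2, 3, 4]))
print(len(perms))                   # 24 == 4!
print(len(set(map(tuple, perms))))  # 24, so all permutations are distinct
```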
Generator Expressions
Generator expressions let you define a generator with a simple, one-line declaration, much like list comprehensions. For example, the following code defines a generator that iterates over all perfect squares. Note that the result of a generator expression is an object of type generator, which implements both the next and __iter__ methods.
>>> import itertools
>>> g = (x ** 2 for x in itertools.count(1))
>>> g
<generator object <genexpr> at 0x1029a5fa0>
>>> next(g)
1
>>> next(g)
4
>>> iter(g)
<generator object <genexpr> at 0x1029a5fa0>
>>> iter(g) is g
True
>>> [g.next() for __ in xrange(10)]
[9, 16, 25, 36, 49, 64, 81, 100, 121, 144]
A generator expression can also implement a Bernoulli process, here with p = 0.4. If a generator expression needs another iterator as a loop counter and the sequence is infinite, itertools.count is a good choice; otherwise, xrange will do.
>>> g = (random.random() < 0.4 for __ in itertools.count())
>>> [g.next() for __ in xrange(10)]
[False, False, False, True, True, False, True, False, False, True]
As mentioned above, generator expressions can be used wherever an iterator is expected as an argument. For example, we can compute the sum of the first ten perfect squares with the following code:
>>> sum(x ** 2 for x in xrange(10))
285
More examples of generator expressions are provided in the next section.
Itertools Module
The itertools module provides a collection of iterators that make it easy to work with permutations, combinations, Cartesian products, and other combinatorial structures.
Before starting the following sections, note that none of the code given here is optimized; it serves only as illustration. In practice, you should avoid enumerating permutations and combinations yourself unless you have no better alternative, because the number of enumerated items grows extremely quickly.
Let's start with some interesting use cases. The first example shows how to write two common patterns: looping over all index tuples of a three-dimensional array, and looping over all index tuples satisfying 0 ≤ i < j < k < n.
from itertools import combinations, product

n = 4
d = 3

def visit(*indices):
    print indices

# Loop through all possible indices of a 3-D array
for i in xrange(n):
    for j in xrange(n):
        for k in xrange(n):
            visit(i, j, k)

# Equivalent using itertools.product
for indices in product(*([xrange(n)] * d)):
    visit(*indices)

# Now loop through all indices 0 <= i < j < k < n
for i in xrange(n):
    for j in xrange(i + 1, n):
        for k in xrange(j + 1, n):
            visit(i, j, k)

# And equivalent using itertools.combinations
for indices in combinations(xrange(n), d):
    visit(*indices)
The itertools versions bring two benefits: the loop collapses to a single line, and it extends easily to higher dimensions. I have not compared the performance of the for-loop and itertools approaches; it probably depends heavily on n. If you care, test and judge for yourself.
The second example addresses an interesting mathematical exercise: using generator expressions, itertools.combinations, and itertools.permutations to compute the number of inversions of a permutation, and to sum the inversion counts over all permutations of a list. As OEIS A001809 shows, this sum equals n! · n(n-1)/4. In practice, using that formula directly is far more efficient than the code below, but I wrote this example to practice with the itertools enumerators.
import itertools
import math

def inversion_number(A):
    """Return the number of inversions in list A."""
    return sum(1 for x, y in itertools.combinations(xrange(len(A)), 2)
               if A[x] > A[y])

def total_inversions(n):
    """Return the total number of inversions over all permutations of n."""
    return sum(inversion_number(A) for A in itertools.permutations(xrange(n)))
The usage is as follows:
>>> [total_inversions(n) for n in xrange(10)]
[0, 0, 1, 9, 72, 600, 5400, 52920, 564480, 6531840]
>>> [math.factorial(n) * n * (n - 1) / 4 for n in xrange(10)]
[0, 0, 1, 9, 72, 600, 5400, 52920, 564480, 6531840]
The third example computes the rencontres numbers by brute-force counting; the rencontres numbers count the permutations of n elements with a given number of fixed points. First, we write a function that uses a generator expression inside sum to count the fixed points of a permutation. Then, using itertools.permutations and another generator expression inside sum, we count the permutations of n that have exactly k fixed points. Of course, this implementation is inefficient and not recommended for practical use; once again, it is just meant to demonstrate generator expressions and the itertools functions.
def count_fixed_points(p):
    """Return the number of fixed points of p as a permutation."""
    return sum(1 for x in p if p[x] == x)

def count_partial_derangements(n, k):
    """Return the number of permutations of n with k fixed points."""
    return sum(1 for p in itertools.permutations(xrange(n))
               if count_fixed_points(p) == k)
Usage:
>>> [count_partial_derangements(6, i) for i in xrange(7)]
[265, 264, 135, 40, 15, 0, 1]
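As a sanity check, the counts over all k must sum to n!, since every permutation of n has some number of fixed points; indeed, 265 + 264 + 135 + 40 + 15 + 0 + 1 = 720 = 6!. A sketch verifying this for n = 6 (rewritten with range so it also runs under Python 3):

```python
import itertools
import math

def count_fixed_points(p):
    # p is a permutation of range(n); count indices x with p[x] == x.
    return sum(1 for x in p if p[x] == x)

def count_partial_derangements(n, k):
    # Count permutations of range(n) with exactly k fixed points.
    return sum(1 for p in itertools.permutations(range(n))
               if count_fixed_points(p) == k)

counts = [count_partial_derangements(6, k) for k in range(7)]
print(counts)                            # [265, 264, 135, 40, 15, 0, 1]
print(sum(counts) == math.factorial(6))  # True
```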