Python iterators, generators, and the related itertools module in detail

Last Update:2017-01-19 Source: Internet

Author: User

Tags: data structures, generator, stdin

For mathematicians, Python has a lot to offer. To name a few examples: built-in containers such as tuples, lists, and sets use notation similar to traditional mathematics, and list comprehensions have a syntax similar to mathematical set-builder notation.

Other features that appeal to math enthusiasts are iterators, generators, and the related itertools module. These tools make it easy to write elegant code for mathematical objects such as infinite sequences, stochastic processes, recurrence relations, and combinatorial structures. This article covers my notes on iterators and generators, along with some of the experience I have accumulated while learning them.

Iterators

An iterator is an object that can iterate over a collection. The collection does not need to be loaded into memory, and for that reason it can be practically infinite. The documentation is in the "Iterator Types" section of the official Python documentation.

Let's be more precise about the definition: an object is iterable if it defines an __iter__ method that returns an iterator. An iterator is an object that implements both __iter__ and next (__next__ in Python 3): the former returns an iterator object, and the latter returns the next element of the iteration. As far as I know, iterators always simply return themselves (self) from __iter__, because they are their own iterators.

In general, you should avoid calling __iter__ and next directly. Instead, use a for loop or a list comprehension, so that Python calls both methods for you automatically. If you need to call them manually, use the built-in functions iter and next, passing the target iterator or collection as the argument. For example, if c is an iterable, use iter(c) rather than c.__iter__(); similarly, if a is an iterator, use next(a) rather than a.next() to get the next element. This is similar to how len is used.
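
To make the roles of iter and next concrete, here is a rough sketch of what a for loop does behind the scenes. It is written in Python 3 syntax; the helper name manual_for is made up for this illustration, and the real loop is implemented inside the interpreter rather than in Python code.

```python
def manual_for(iterable, visit):
    """Roughly what `for x in iterable: visit(x)` does (illustrative helper)."""
    it = iter(iterable)          # ask the iterable for an iterator
    while True:
        try:
            x = next(it)         # fetch the next element
        except StopIteration:    # the iterator is exhausted
            break
        visit(x)

seen = []
manual_for([1, 2, 3], seen.append)
print(seen)  # [1, 2, 3]
```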

Speaking of len, it is worth noting that length is generally not well defined for iterators, so they usually do not implement __len__. If you need to count the elements, you must either do it manually or use sum. An example is given at the end of this article, after the itertools module.
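
As a small sketch of the sum approach (Python 3 syntax; the helper name count_items is an assumption made for this example), the elements of an iterator can be counted by summing 1 for each of them. Note that this consumes the iterator and never finishes on an infinite one.

```python
def count_items(iterator):
    """Count the elements of a finite iterator by consuming it."""
    return sum(1 for _ in iterator)

it = iter([10, 20, 30])
print(count_items(it))  # 3
print(count_items(it))  # 0, because the iterator is now exhausted
```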

Some iterables are not iterators themselves, but use other objects as their iterators. For example, a list is an iterable but not an iterator (it implements __iter__ but does not implement next). The following example shows that a list uses a listiterator as its iterator. It is also worth noting that a list has a well-defined length, but a listiterator does not.

>>> a = [1, 2]
>>> type(a)
<type 'list'>
>>> type(iter(a))
<type 'listiterator'>
>>> it = iter(a)
>>> next(it)
1
>>> next(it)
2
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> len(a)
2
>>> len(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'listiterator' has no len()

A StopIteration exception is raised when the iteration has ended but next is called again. However, as mentioned earlier, an iterator can iterate over an infinite collection, and for such an iterator it is the user's responsibility to make sure that no infinite loop occurs. See the following example:

class count_iterator(object):
  n = 0

  def __iter__(self):
    return self

  def next(self):
    y = self.n
    self.n += 1
    return y

Here is a usage example. Note that the last line attempts to convert the iterator to a list, which results in an infinite loop because the iterator never stops.

>>> counter = count_iterator()
>>> next(counter)
0
>>> next(counter)
1
>>> next(counter)
2
>>> next(counter)
3
>>> list(counter)  # This will result in an infinite loop!
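
One way to stay safe with an iterator that never stops is to take a bounded slice of it with itertools.islice. The sketch below is a Python 3 port of the class above (so the method is named __next__), renamed CountIterator and using an instance attribute instead of the class attribute; treat it as an illustration rather than a drop-in replacement.

```python
from itertools import islice

class CountIterator:
    """Python 3 port of count_iterator: counts 0, 1, 2, ... forever."""

    def __init__(self):
        self.n = 0

    def __iter__(self):
        return self

    def __next__(self):
        y = self.n
        self.n += 1
        return y

counter = CountIterator()
first_five = list(islice(counter, 5))   # stops after five elements
after = next(counter)                   # the iterator keeps its position
print(first_five)                       # [0, 1, 2, 3, 4]
print(after)                            # 5
```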

Finally, to be accurate, we should amend the above: an object that has no __iter__ method but defines __getitem__ is still iterable. In this case, Python's built-in function iter returns a corresponding iterator for the object, which uses __getitem__ to go through all the elements. The iteration stops when StopIteration or IndexError is raised. Let's look at the following example:

class SimpleList(object):
  def __init__(self, *items):
    self.items = items

  def __getitem__(self, i):
    return self.items[i]

Usage here:

>>> a = SimpleList(1, 2, 3)
>>> it = iter(a)
>>> next(it)
1
>>> next(it)
2
>>> next(it)
3
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
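
The same fallback also drives for loops and list(). A minimal sketch in Python 3 syntax (the class name Squares and its limit parameter are invented for this example): __getitem__ is called with 0, 1, 2, ... until it raises IndexError.

```python
class Squares:
    """Iterable through __getitem__ only; no __iter__ is defined."""

    def __init__(self, limit):
        self.limit = limit

    def __getitem__(self, i):
        if i >= self.limit:
            raise IndexError(i)   # signals the end of the iteration
        return i * i

squares = list(Squares(5))
print(squares)  # [0, 1, 4, 9, 16]
```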

Now let's look at a more interesting example: using an iterator to generate the Hofstadter Q sequence from given initial conditions. Hofstadter first mentioned this nested recurrence in his book Gödel, Escher, Bach: An Eternal Golden Braid, and since then it has been an open problem to prove that the sequence is well defined for all n. The following code uses an iterator to generate the Hofstadter sequence, which is defined for a given n as follows:

Q(n) = Q(n - Q(n-1)) + Q(n - Q(n-2))

Given initial conditions, for example, qsequence([1, 1]) generates the Hofstadter Q sequence. We use the StopIteration exception to indicate that the sequence cannot be continued because generating the next element would require an invalid index. For example, with the initial conditions [1, 3], the first step needs the element three places from the end of a two-element list, so sequence generation stops immediately.

class qsequence(object):
  def __init__(self, s):
    self.s = s[:]

  def next(self):
    try:
      q = self.s[-self.s[-1]] + self.s[-self.s[-2]]
      self.s.append(q)
      return q
    except IndexError:
      raise StopIteration()

  def __iter__(self):
    return self

  def current_state(self):
    return self.s

Usage here:

>>> q = qsequence([1, 1])
>>> next(q)
2
>>> next(q)
3
>>> [next(q) for __ in xrange(10)]
[3, 4, 5, 5, 6, 6, 6, 8, 8, 8]
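
For readers on Python 3, here is a hedged port of the class (renamed QSequence, with __next__ in place of next). It also shows the stopping behaviour on initial conditions that need an out-of-range index: starting from [1, 3], the very first step asks for index -3 of a two-element list, so the IndexError is turned into StopIteration and nothing is produced.

```python
from itertools import islice

class QSequence:
    """Python 3 port of the Hofstadter Q iterator above."""

    def __init__(self, s):
        self.s = s[:]                # copy, so the caller's list is untouched

    def __iter__(self):
        return self

    def __next__(self):
        try:
            q = self.s[-self.s[-1]] + self.s[-self.s[-2]]
        except IndexError:
            raise StopIteration      # no valid index, so the sequence ends
        self.s.append(q)
        return q

first = list(islice(QSequence([1, 1]), 12))
print(first)                     # [2, 3, 3, 4, 5, 5, 6, 6, 6, 8, 8, 8]
print(list(QSequence([1, 3])))   # []
```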

Generators

A generator is an iterator that is defined with a simpler function notation. More specifically, a generator is a function that uses a yield expression in its body. A generator does not hand back its values with return; instead, it yields each result when it is needed. Python automatically remembers the generator's context, that is, where its control flow currently is and the values of its local variables. Each time the generator is advanced with next, it yields the next value in the iteration. The __iter__ method is implemented automatically, meaning that you can use generators wherever you can use iterators. The following example implements the same functionality as count_iterator above, but the code is more compact and more readable.

def count_generator():
  n = 0
  while True:
    yield n
    n += 1

Take a look at the usage:

>>> counter = count_generator()
>>> counter
<generator object count_generator at 0x106bf1aa0>
>>> next(counter)
0
>>> next(counter)
1
>>> iter(counter)
<generator object count_generator at 0x106bf1aa0>
>>> iter(counter) is counter
True
>>> type(counter)
<type 'generator'>

Now let's try implementing Hofstadter's Q sequence with a generator. The implementation is simple, but we can no longer provide a method like current_state. As far as I know, the variables inside a generator cannot be accessed directly from outside, so functions such as current_state cannot be implemented. (There are structures such as gi_frame.f_locals that make it possible, but they are CPython implementation details rather than a standard part of the language, so relying on them is not recommended.) If you need access to the internal variables, one possible approach is to yield the whole state along with each result; I'll leave this as an exercise.

def hofstadter_generator(s):
  a = s[:]
  while True:
    try:
      q = a[-a[-1]] + a[-a[-2]]
      a.append(q)
      yield q
    except IndexError:
      return

Note that the generator ends its iteration with a plain return statement that returns no data. Internally, this raises a StopIteration exception.
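
A minimal sketch of this behaviour in Python 3 syntax (the generator name first_n is made up for the example): once the return statement runs, the pending call to next raises StopIteration, which is also how a for loop knows when to stop.

```python
def first_n(n):
    """Yield 0, 1, ..., n - 1 and then finish with a bare return."""
    i = 0
    while True:
        if i >= n:
            return          # ends the generator; the caller sees StopIteration
        yield i
        i += 1

g = first_n(2)
values = [next(g), next(g)]
try:
    next(g)
    stopped = False
except StopIteration:
    stopped = True

print(values, stopped)  # [0, 1] True
```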

The next example comes from a Groupon interview question. Here we use two generators. The first implements a Bernoulli process, which is an infinite sequence of random Boolean values that are True with probability p and False with probability q = 1 - p. The second implements a von Neumann extractor, which takes a Bernoulli process with 0 < p < 1 as input and returns another Bernoulli process with p = 0.5.

import random

def bernoulli_process(p):
  if p > 1.0 or p < 0.0:
    raise ValueError("p should be between 0.0 and 1.0.")

  while True:
    yield random.random() < p

def von_neumann_extractor(process):
  while True:
    x, y = process.next(), process.next()
    if x != y:
      yield x
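
Why the extractor works: for two independent draws, P(True, False) = p(1 - p) = P(False, True), so once equal pairs are discarded, the first element of the pair is True exactly half the time. The following sketch is a Python 3 port (next(process) instead of process.next()) with an empirical check; the rng parameter, the seed, and the sample size are choices made for this example and are not part of the original code.

```python
import random
from itertools import islice

def bernoulli_process(p, rng=random):
    """Infinite stream of booleans that are True with probability p."""
    if p > 1.0 or p < 0.0:
        raise ValueError("p should be between 0.0 and 1.0.")
    while True:
        yield rng.random() < p

def von_neumann_extractor(process):
    """Turn a biased Bernoulli process into a fair one."""
    while True:
        x, y = next(process), next(process)
        if x != y:          # (True, False) and (False, True) are equally likely
            yield x

rng = random.Random(12345)                       # fixed seed for repeatability
fair = von_neumann_extractor(bernoulli_process(0.2, rng))
sample = list(islice(fair, 10000))
frequency = sum(sample) / len(sample)
print(frequency)                                 # should be close to 0.5
```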

Finally, generators are a very useful tool for implementing discrete dynamical systems. The following example shows how the famous tent map dynamical system can be implemented with a generator. (As an aside, notice how numerical inaccuracy starts to creep in and then grows exponentially, a key feature of dynamical systems such as the tent map.)

>>> def tent_map(mu, x0):
...   x = x0
...   while True:
...     yield x
...     x = mu * min(x, 1.0 - x)
...
>>>
>>> t = tent_map(2.0, 0.1)
>>> for _ in xrange(30):
...   print t.next()
...
0.1
0.2
0.4
0.8
0.4
0.8
0.4
0.8
0.4
0.8
0.4
0.8
0.4
0.8
0.4
0.8
0.4
0.799999999999
0.400000000001
0.800000000003
0.399999999994
0.799999999988
0.400000000023
0.800000000047
0.399999999907
0.799999999814
0.400000000373
0.800000000745
0.39999999851
0.79999999702
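
To see that the drift in the output comes from floating-point arithmetic and not from the map itself, the same generator can be run on exact rationals. This is a sketch in Python 3 using fractions.Fraction from the standard library; starting from 1/10, the orbit settles into the exact cycle 2/5, 4/5 and stays there.

```python
from fractions import Fraction
from itertools import islice

def tent_map(mu, x0):
    """Iterate x -> mu * min(x, 1 - x), starting from x0."""
    x = x0
    while True:
        yield x
        x = mu * min(x, 1 - x)

exact = list(islice(tent_map(2, Fraction(1, 10)), 30))
print(exact[:4])    # [Fraction(1, 10), Fraction(1, 5), Fraction(2, 5), Fraction(4, 5)]
print(exact[-2:])   # [Fraction(2, 5), Fraction(4, 5)]
```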

Another similar example is the Collatz sequence.

def collatz(n):
  yield n
  while n != 1:
    n = n / 2 if n % 2 == 0 else 3 * n + 1
    yield n

Note that in this example we still do not raise the StopIteration exception manually, because it is raised automatically when the control flow reaches the end of the function.

Usage:

>>> # If the Collatz conjecture is true then list(collatz(n)) for any n will
... # always terminate (though your machine might run out of memory first!)
>>> list(collatz(7))
[7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
>>> list(collatz(5))
[5, 16, 8, 4, 2, 1]
>>> list(collatz(17))
[17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
>>> list(collatz(19))
[19, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]

Recursive generators

Generators can be recursive, just like other functions. Let's look at a simple version of itertools.permutations: a generator that produces all permutations of a given list of items (in practice, use itertools.permutations, which is faster). The basic idea is simple: for each element of the list, we move it to the first position by swapping it with the first element, and then recursively permute the rest of the list.

def permutations(items):
  if len(items) == 0:
    yield []
  else:
    pi = items[:]
    for i in xrange(len(pi)):
      pi[0], pi[i] = pi[i], pi[0]
      for p in permutations(pi[1:]):
        yield [pi[0]] + p

>>> for p in permutations([1, 2, 3]):
...   print p
...
[1, 2, 3]
[1, 3, 2]
[2, 1, 3]
[2, 3, 1]
[3, 1, 2]
[3, 2, 1]
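
As a sanity check, a Python 3 port of this recursive generator can be compared against itertools.permutations. This is only a sketch: the function is renamed my_permutations here to avoid clashing with the library name, and the comparison sorts both results so that it does not depend on the order in which permutations are produced.

```python
import itertools

def my_permutations(items):
    """Recursive generator yielding every permutation of items as a list."""
    if len(items) == 0:
        yield []
    else:
        pi = items[:]
        for i in range(len(pi)):
            pi[0], pi[i] = pi[i], pi[0]          # move element i to the front
            for p in my_permutations(pi[1:]):    # permute the remainder
                yield [pi[0]] + p

ours = sorted(tuple(p) for p in my_permutations([1, 2, 3, 4]))
reference = sorted(itertools.permutations([1, 2, 3, 4]))
print(len(ours), ours == reference)  # 24 True
```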

Generator Expressions

A generator expression lets you define a generator with a simple, single-line declaration. It is very similar to a list comprehension in Python. For example, the following code defines a generator that iterates over all perfect squares. Note that the result of a generator expression is a generator object, which implements both the next and __iter__ methods.

>>> g = (x ** 2 for x in itertools.count(1))
>>> g
<generator object <genexpr> at 0x1029a5fa0>
>>> next(g)
1
>>> next(g)
4
>>> iter(g)
<generator object <genexpr> at 0x1029a5fa0>
>>> iter(g) is g
True
>>> [g.next() for __ in xrange(10)]
[9, 16, 25, 36, 49, 64, 81, 100, 121, 144]

You can also use a generator expression to implement a Bernoulli process, with p = 0.4 in this example. If a generator expression needs another iterator as a loop counter and is meant to produce an infinite sequence, then itertools.count is a good choice. Otherwise, xrange is a good choice.

>>> g = (random.random() < 0.4 for _ in itertools.count())
>>> [g.next() for __ in xrange(10)]
[False, False, False, True, True, False, True, False, False, True]

As mentioned earlier, a generator expression can be used anywhere an iterator is needed as an argument. For example, we can compute the sum of the first 10 perfect squares (starting from 0) with the following code:

>>> sum(x ** 2 for x in xrange(10))
285
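
Other built-ins that accept an iterator work the same way. Here is a short sketch in Python 3 syntax (range instead of xrange) that checks the sum against the closed form (n-1)n(2n-1)/6 for the squares of 0 to n-1; the variable names are chosen for this example.

```python
n = 10
total = sum(x ** 2 for x in range(n))
closed_form = (n - 1) * n * (2 * n - 1) // 6
largest = max(x ** 2 for x in range(n))
has_square_over_50 = any(x ** 2 > 50 for x in range(n))

print(total, closed_form)    # 285 285
print(largest)               # 81
print(has_square_over_50)    # True
```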

More examples of generator expressions are given in the next section.

Itertools Module

The itertools module provides a set of iterators that make it easy to work with permutations, combinations, Cartesian products, and other combinatorial structures.

Before starting, note that none of the code given here is optimized; it serves only as an illustration. In practice, you should avoid enumerating permutations and combinations unless you have a good reason to, because the number of items enumerated grows exponentially.

Let's start with some interesting use cases. The first example shows how to write a common pattern: looping through all the indices of a three-dimensional array, and looping through all the indices that satisfy 0 ≤ i < j < k < n.

from itertools import combinations, product

n = 4
d = 3

def visit(*indices):
  print indices

# Loop through all possible indices of a 3-D array
for i in xrange(n):
  for j in xrange(n):
    for k in xrange(n):
      visit(i, j, k)

# Equivalent using itertools.product
for indices in product(*([xrange(n)] * d)):
  visit(*indices)

# Now loop through all indices 0 <= i < j < k < n
for i in xrange(n):
  for j in xrange(i + 1, n):
    for k in xrange(j + 1, n):
      visit(i, j, k)

# And equivalent using itertools.combinations
for indices in combinations(xrange(n), d):
  visit(*indices)
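
As a quick check on what the two enumerators produce, the sketch below (Python 3, so range replaces xrange) counts the tuples: product yields n**d index tuples, and combinations yields C(n, d) strictly increasing ones. It also uses the repeat keyword of product, which is an alternative spelling of product(*([range(n)] * d)).

```python
from itertools import combinations, product

n = 4
d = 3

all_indices = list(product(range(n), repeat=d))
increasing = list(combinations(range(n), d))

print(len(all_indices))   # 64, i.e. n ** d
print(increasing)         # [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
```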

The enumerators provided by the itertools module have two advantages: the code fits on a single line, and it scales easily to higher dimensions. I have not compared the performance of the for-loop approach and the itertools approach; it probably depends heavily on n. If you care, please run your own tests and judge for yourself.

The second example does some interesting mathematics: it uses generator expressions, itertools.combinations, and itertools.permutations to compute the number of inversions of a permutation, and then to sum the number of inversions over all permutations of a list. As OEIS A001809 shows, the sum equals n! n(n-1)/4. In practice it is far more efficient to use this formula directly than the code below, but I wrote this example to practice using the itertools enumerators.

import itertools
import math

def inversion_number(A):
  """Return the number of inversions in list A."""
  return sum(1 for x, y in itertools.combinations(xrange(len(A)), 2) if A[x] > A[y])

def total_inversions(n):
  """Return total number of inversions in permutations of n."""
  return sum(inversion_number(A) for A in itertools.permutations(xrange(n)))

Usage is as follows:

>>> [total_inversions(n) for n in xrange(10)]
[0, 0, 1, 9, 72, 600, 5400, 52920, 564480, 6531840]

>>> [math.factorial(n) * n * (n - 1) / 4 for n in xrange(10)]
[0, 0, 1, 9, 72, 600, 5400, 52920, 564480, 6531840]
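
The closed form has a short justification: for any fixed pair of positions i < j, exactly half of the n! permutations have the larger value first, and there are n(n-1)/2 such pairs, giving n! n(n-1)/4 inversions in total. A brute-force sketch of that argument in Python 3 (the function name pair_inversion_counts is invented for this example):

```python
import itertools
import math

def pair_inversion_counts(n):
    """For each pair i < j, count the permutations p of range(n) with p[i] > p[j]."""
    perms = list(itertools.permutations(range(n)))
    return {
        (i, j): sum(1 for p in perms if p[i] > p[j])
        for i, j in itertools.combinations(range(n), 2)
    }

counts = pair_inversion_counts(4)
print(set(counts.values()))     # {12}, which is 4! / 2
print(sum(counts.values()))     # 72, matching 4! * 4 * 3 / 4
```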

The third example computes rencontres numbers by brute-force counting. The rencontres number for n and k is the number of permutations of n elements that have exactly k fixed points. First, we write a function that uses a generator expression inside a sum to count the fixed points of a permutation. Then we use itertools.permutations and another generator expression inside a sum to count the permutations of n elements that have k fixed points. Of course, this implementation is inefficient and is not recommended for practical use. Again, it is only meant as an example of how generator expressions and the itertools functions can be used.

def count_fixed_points(p):
  """Return the number of fixed points of p as a permutation."""
  return sum(1 for x in p if p[x] == x)

def count_partial_derangements(n, k):
  """Returns the number of permutations of n with k fixed points."""
  return sum(1 for p in itertools.permutations(xrange(n)) if count_fixed_points(p) == k)

Usage:

>>> [count_partial_derangements(6, i) for i in xrange(7)]
[265, 264, 135, 40, 15, 0, 1]
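
For comparison, the same row can be obtained without enumerating permutations. The sketch below (Python 3) relies on two standard identities that are not part of the original article: D(n, k) = C(n, k) D(n-k, 0), and the derangement recurrence D(m, 0) = (m-1)(D(m-1, 0) + D(m-2, 0)) with D(0, 0) = 1 and D(1, 0) = 0. The function names are made up for this example.

```python
from math import factorial

def derangements(m):
    """Number of permutations of m elements with no fixed point."""
    a, b = 1, 0                      # values for m = 0 and m = 1
    for i in range(2, m + 1):
        a, b = b, (i - 1) * (a + b)
    return a if m == 0 else b

def rencontres(n, k):
    """Permutations of n elements with exactly k fixed points."""
    choose = factorial(n) // (factorial(k) * factorial(n - k))
    return choose * derangements(n - k)

print([rencontres(6, k) for k in range(7)])  # [265, 264, 135, 40, 15, 0, 1]
```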

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
