A detailed description of the use of generators and yield statements in Python

Source: Internet
Author: User
Before the course began, I asked students to fill out a questionnaire gauging their understanding of several Python concepts. Some topics ("if/else control flow" or "defining and using functions") are no problem for most students. But there are a few topics that most students have had little or no exposure to, especially generators and the yield keyword. I would guess the same is true of most novice Python programmers.





It turns out that some people still don't understand generators and the yield keyword even after I've put a lot of effort into explaining them. I want to fix that. In this article, I'll explain exactly what the yield keyword is, why it's useful, and how to use it.



Note: In recent years, generators have become more powerful as features have been added through PEPs. In my next article, I'll use coroutines, cooperative multitasking, and asynchronous I/O (specifically GvR's "tulip" prototype implementation, which is still being researched) to show the true power of yield. But before we get there, we need a solid understanding of generators and yield.



Coroutines and subroutines



When we call a normal Python function, execution starts at the first line of the function and continues until a return statement, an exception, or the end of the function (which can be seen as an implicit return None). Once the function hands control back to the caller, that's it. All the work done in the function and all the data stored in its local variables are lost. When you call the function again, everything is created from scratch.



This is fairly standard behavior for functions in computer programming. Such a function can only return a single value, but sometimes it is helpful to have a function that produces a sequence of values. To do that, the function needs to be able to "save its own work".



I say "produce a sequence" because such a function does not return in the usual sense. return implies that the function is handing control of execution back to the point where it was called. yield, by contrast, implies that the transfer of control is temporary and voluntary, and that our function expects to regain control in the future.




In Python, a "function" with this ability is called a generator, and generators are very useful. Generators (and the yield statement) were originally introduced to give programmers a more straightforward way to write code that produces a sequence of values. Previously, to implement something like a random number generator, you had to write a class or a module that both generated values and kept track of state between calls. With generators, this becomes much simpler.
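To make the contrast concrete, here is a minimal sketch of the two styles. The names CountdownIterator and countdown are hypothetical and used only for illustration; they are not part of the example that follows in this article.

```python
class CountdownIterator:
    """Class-based version: state is kept in an attribute between calls."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """Generator version: the local variable survives between yields."""
    while start > 0:
        yield start
        start -= 1


print(list(CountdownIterator(3)))  # [3, 2, 1]
print(list(countdown(3)))          # [3, 2, 1]
```

Both produce the same values, but the generator version needs no explicit bookkeeping for its state.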



To better understand the problem that generators solve, let's look at an example. As you work through it, keep in mind the problem we are trying to solve: generating a sequence of values.



Note: Outside of Python, all but the simplest generators would be called coroutines. I will use that term later in this article. Keep in mind that, in Python terms, everything described here as a coroutine is still a generator. Python's formal term is generator; coroutine is used only to make the discussion easier and is not formally defined at the language level.



Example: Interesting primes



Suppose your boss asks you to write a function that takes a list of ints and returns an iterable containing the prime numbers found in that list.



Remember, an iterable is just an object that is able to return its members one at a time.
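As a quick illustration of "one member at a time", the sketch below uses only the built-in iter() and next() functions; the variable names are arbitrary.

```python
numbers = [2, 3, 5]

# iter() asks an iterable for an iterator; next() pulls one member at a time.
iterator = iter(numbers)
first = next(iterator)   # 2
second = next(iterator)  # 3

# Strings are iterable too; a plain int is not.
letters = list("abc")    # ['a', 'b', 'c']
try:
    iter(42)
    int_is_iterable = True
except TypeError:
    int_is_iterable = False

print(first, second, letters, int_is_iterable)
```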



"This is very simple," you think, and quickly write the following code:


import math

def get_primes(input_list):
  result_list = list()
  for element in input_list:
    if is_prime(element):
      result_list.append(element)

  return result_list

# Or better ...

def get_primes(input_list):
  return (element for element in input_list if is_prime(element))

# Here is an implementation of is_prime ...

def is_prime(number):
  if number > 1:
    if number == 2:
      return True
    if number % 2 == 0:
      return False
    # Only odd divisors up to the square root need to be checked
    for current in range(3, int(math.sqrt(number) + 1), 2):
      if number % current == 0:
        return False
    return True
  return False





Either implementation of get_primes above fully satisfies the requirement, so we told the boss it was done. She reported that our function worked properly and was exactly what she wanted.



Handling Infinite Sequences



Or was it? A few days later, the boss came back to tell us she had run into a small problem: she wanted to use our get_primes function on a very large list of numbers. In fact, the list was so large that merely creating it would consume all of the system's memory. To get around this, she wanted to call get_primes with a start parameter and have it return all the primes larger than start (perhaps she is solving Project Euler problem 10).



Looking at this new requirement, it's clear that a simple tweak to get_primes won't do. Naturally, we cannot return a list of all the primes from start to infinity (although operating on infinite sequences has many useful applications). The chances of solving this problem with an ordinary function look slim.




Before we give up, let's identify the core obstacle that prevents us from writing a function that meets the boss's new requirement. After some thought, we arrive at this: a function gets only one chance to return results, and therefore must return all of its results at once. That may seem like a pointless conclusion to draw; "that's just how functions work," we usually think. But you don't learn if you don't ask: what if they didn't work that way?



Imagine what we could do if get_primes could simply return the next value instead of returning all the values at once. We would no longer need to create a list. No list, no memory problem. And since the boss told us she only needs to iterate over the results, she would never know the difference.



Unfortunately, this doesn't seem possible. Even if we had a magical function that let us iterate from n to infinity, we would get stuck after returning the first value:


def get_primes(start):
  for element in magical_infinite_range(start):
    if is_prime(element):
      return element

Suppose you call get_primes like this:

def solve_number_10():
  # She *is* working on Project Euler #10, I knew it!
  total = 2
  for next_prime in get_primes(3):
    if next_prime < 2000000:
      total += next_prime
    else:
      print(total)
      return





Clearly, in get_primes we would immediately hit the case where element equals 3 and return on line 4 of the function. Instead of a plain return, we need a way to hand back a value and, when asked for the next one, pick up where we left off.



But functions can't do that. When a function returns, it is done for good. We can of course call the function again, but we have no way of saying, "This time, instead of starting at the first line as usual, pick up at line 4, where we left off." A function has a single entry point: its first line.

Enter the generator



This kind of problem is so common that a construct was added to Python specifically to solve it: the generator. A generator "generates" values. Creating one was made as simple as possible through the concept of generator functions.



A generator function is defined much like a normal function, except that it uses the yield keyword instead of return whenever it wants to generate a value. If the body of a def contains yield, the function automatically becomes a generator function (even if it also contains a return). No extra step is needed to create one.
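One way to check this rule for yourself is with the standard inspect module. In the sketch below, plain and with_yield are hypothetical names chosen for illustration.

```python
import inspect


def plain():
    return 1


def with_yield():
    print("body started")  # not executed until the first next()
    yield 1
    return  # allowed: it simply ends the generator


print(inspect.isgeneratorfunction(plain))       # False
print(inspect.isgeneratorfunction(with_yield))  # True

gen = with_yield()               # nothing is printed here
print(inspect.isgenerator(gen))  # True
print(next(gen))                 # prints "body started", then 1
```

Note that calling with_yield() does not run any of its body; it only builds the generator object.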



A generator function creates a generator iterator. That is probably the last time you will see the term "generator iterator", though, since these are almost always just called "generators". It is important to note that a generator is a special kind of iterator. As an iterator, a generator must define certain methods, one of which is __next__(). To get the next value from a generator, we use the same built-in function as for any iterator: next(). (next() takes care of calling the generator's __next__() method.) Since a generator is an iterator, it can also be used in a for loop.



Whenever next() is called on a generator, the generator is responsible for passing a value back to the caller. It does so with yield (for example, yield 7). The simplest way to remember what yield does is to think of it as a special return (plus a little bit of magic) for generator functions.



Again: yield is just return (plus a little bit of magic) for generator functions.



The following is a simple generator function:


>>> def simple_generator_function():
...   yield 1
...   yield 2
...   yield 3

Here are two simple ways to use it:

>>> for value in simple_generator_function():
...   print(value)
1
2
3
>>> our_generator = simple_generator_function()
>>> next(our_generator)
1
>>> next(our_generator)
2
>>> next(our_generator)
3
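Because a generator is an iterator, the usual iterator facts apply to it. The short sketch below checks a few of them; simple is a hypothetical three-value generator function used only here.

```python
def simple():
    yield 1
    yield 2
    yield 3


gen = simple()

# A generator is its own iterator.
print(iter(gen) is gen)   # True

# next(gen) and gen.__next__() do the same thing.
print(next(gen))          # 1
print(gen.__next__())     # 2

# Anything that consumes an iterator also accepts a generator.
print(sum(simple()))      # 6
```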





Magic?



So where's the magic? I'm glad you asked! When a generator function reaches a yield, the "state" of the generator function is frozen: the values of all its variables are saved, and the next line of code to execute is recorded, until next() is called again. Once it is, the generator function resumes exactly where it left off. If next() is never called again, the state saved at the yield is eventually discarded.
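The freezing and resuming can be observed directly by recording what the generator does. This is only a sketch; counter and log are hypothetical names, and the log exists purely to make the order of execution visible.

```python
log = []


def counter():
    log.append("started")
    count = 0
    while True:
        count += 1
        log.append(f"about to yield {count}")
        yield count
        log.append(f"resumed after yielding {count}")


gen = counter()
# Nothing has run yet: log is still empty at this point.
first = next(gen)   # runs the body up to the first yield
second = next(gen)  # resumes after the yield; count is still remembered

print(first, second)  # 1 2
print(log)
```

After the two calls, the log reads: started, about to yield 1, resumed after yielding 1, about to yield 2.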



Let's rewrite get_primes(), this time as a generator function. Note that we no longer need the magical_infinite_range function. With a simple while loop, we can create our own infinite sequence.

def get_primes(number):
  while True:
    if is_prime(number):
      yield number
    number += 1



If a generator function calls return or reaches the end of its body, a StopIteration exception is raised. This tells the caller of next() that the generator has no more values (which is normal iterator behavior). That is why the while True: loop is present in get_primes(). Without it, the generator function would run to the end of its body and raise StopIteration the next time we called next(). Once a generator is exhausted, calling next() on it raises an error, so each generator can only be consumed once. The following will not work:





>>> our_generator = simple_generator_function()
>>> for value in our_generator:
...   print(value)
1
2
3

>>> # Our generator has no next value ...
>>> print(next(our_generator))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

>>> # However, we can always create another generator
>>> # Just call the generator function again

>>> new_generator = simple_generator_function()
>>> print(next(new_generator)) # works fine
1





So the while loop is there to make sure the generator function never reaches the end of its body. Every time next() is called, the generator produces another value. This is a very common idiom for dealing with infinite sequences.
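Since an infinite generator never stops on its own, the caller decides how much to take. One option, sketched below, is the standard itertools module; is_prime and get_primes are repeated from the article so that the snippet runs on its own.

```python
import itertools
import math


def is_prime(number):
    if number > 1:
        if number == 2:
            return True
        if number % 2 == 0:
            return False
        for current in range(3, int(math.sqrt(number) + 1), 2):
            if number % current == 0:
                return False
        return True
    return False


def get_primes(number):
    while True:
        if is_prime(number):
            yield number
        number += 1


# Take a fixed number of values from the infinite sequence.
first_five = list(itertools.islice(get_primes(3), 5))
print(first_five)  # [3, 5, 7, 11, 13]

# Or take values for as long as a condition holds.
below_twenty = list(itertools.takewhile(lambda p: p < 20, get_primes(3)))
print(below_twenty)  # [3, 5, 7, 11, 13, 17, 19]
```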



Execution process



Let's go back to the code that calls get_primes: solve_number_10.


def solve_number_10():
  # She *is* working on Project Euler #10, I knew it!
  total = 2
  for next_prime in get_primes(3):
    if next_prime < 2000000:
      total += next_prime
    else:
      print(total)
      return





It helps to visualize how the first few elements are created when get_primes is called from the for loop in solve_number_10. When the for loop requests the first value from get_primes, we enter get_primes just as we would enter a normal function.


    • We enter the while loop on line 2
    • The if condition holds (3 is prime)
    • We yield the value 3, along with control of execution, back to solve_number_10


Next, back in solve_number_10:


    • The for loop receives the yielded value 3
    • The for loop assigns it to next_prime
    • next_prime is added to total
    • The for loop requests the next value from get_primes


This time, when we enter get_primes, we do not start from the beginning. We resume at line 5, right where we left off.


def get_primes(number):
  while True:
    if is_prime(number):
      yield number
    number += 1 # <<<<<<<<<<





Most importantly, number still has the value it had when we last called yield (that is, 3). Remember that yield both passes a value to the caller of next() and saves the "state" of the generator function. Next, number is incremented to 4, we return to the top of the while loop, and we keep incrementing until we reach the next prime (5). Once again we yield the value of number to the for loop in solve_number_10. This cycle continues until the for loop stops (at the first prime greater than 2,000,000).
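The walkthrough above can be replayed on a small scale. In the sketch below, sum_primes_below is a hypothetical variant of solve_number_10 that takes the limit as a parameter and returns the total, events is an illustrative log, and the one-line is_prime is a compact stand-in for the article's version.

```python
def is_prime(number):
    # Compact stand-in for the article's is_prime.
    return number > 1 and all(
        number % d for d in range(2, int(number ** 0.5) + 1)
    )


events = []


def get_primes(number):
    while True:
        if is_prime(number):
            events.append(f"yield {number}")
            yield number
            events.append(f"resume with number == {number}")
        number += 1


def sum_primes_below(limit):
    total = 2
    for next_prime in get_primes(3):
        if next_prime < limit:
            total += next_prime
        else:
            return total


print(sum_primes_below(10))  # 17, i.e. 2 + 3 + 5 + 7
print(events)
```

The log shows the generator yielding 3, 5, 7 and 11 in turn, resuming after each of the first three with number unchanged. It is never resumed after yielding 11, because the for loop has already returned.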



More power



Support for sending a value into a generator was added in PEP 342. PEP 342 lets a generator, in a single statement, yield a value (as before), receive a value, or both yield a value and receive a value.
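As a small sketch of the "yield and receive in one statement" form, consider a hypothetical running_total generator (not part of the prime example). Here next() is used to advance the generator to its first yield before anything is sent.

```python
def running_total():
    total = 0
    while True:
        # Yield the current total AND wait for the next value in one statement.
        received = yield total
        if received is not None:
            total += received


acc = running_total()
start = next(acc)         # advance to the first yield; the generator yields 0
after_five = acc.send(5)  # the yield expression evaluates to 5 inside the generator
after_ten = acc.send(10)

print(start, after_five, after_ten)  # 0 5 15
```

A bare `yield` with no value on the right is the "receive only" form, and `yield total` used as a plain statement is the familiar "yield only" form.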



To show how values are sent to a generator, let's return to our prime number example. This time, instead of simply generating every prime larger than some number, we will find the smallest prime larger than each successive power of a number (for 10, say, we want the smallest prime larger than 10, then 100, then 1000, then 10000, and so on). We start the same way as with get_primes:





def print_successive_primes(iterations, base=10):
  # Like normal functions, generator functions can take arguments

  prime_generator = get_primes(base)
  # Missing code ...
  for power in range(iterations):
    # Missing code ...

def get_primes(number):
  while True:
    if is_prime(number):
    # ... what goes here?

The last few lines of get_primes need some explanation. The yield keyword yields the value of number, and a statement of the form other = yield foo means, "yield foo to the caller and, when a value is sent to me, set other to that value." You can "send" a value to a generator using its send method.

def get_primes(number):
  while True:
    if is_prime(number):
      number = yield number
    number += 1





In this way, we can set number to a different value each time the generator yields. Now we can fill in the missing code in print_successive_primes:


def print_successive_primes(iterations, base=10):
  prime_generator = get_primes(base)
  prime_generator.send(None)
  for power in range(iterations):
    print(prime_generator.send(base ** power))





Two things to note here. First, we print the result of prime_generator.send. This works because send both sends a value to the generator and returns the value the generator yields next (mirroring how the yield expression behaves inside the generator function).



Second, take a look at prime_generator.send(None). When you use send to "start" a generator (that is, to run the code from the first line of the generator function up to the first yield statement), you must send None. This makes sense: by definition the generator has not yet reached its first yield, so if we sent a real value there would be nothing to "receive" it. Once the generator has been started, we can send values as shown above.
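Both points can be checked with a self-contained sketch. Here successive_primes is a hypothetical variant of print_successive_primes that returns a list instead of printing, and the one-line is_prime is a compact stand-in for the article's version.

```python
def is_prime(number):
    # Compact stand-in for the article's is_prime.
    return number > 1 and all(
        number % d for d in range(2, int(number ** 0.5) + 1)
    )


def get_primes(number):
    while True:
        if is_prime(number):
            number = yield number
        number += 1


def successive_primes(iterations, base=10):
    prime_generator = get_primes(base)
    prime_generator.send(None)  # start the generator; next() would also work
    return [prime_generator.send(base ** power) for power in range(iterations)]


print(successive_primes(4))  # [2, 11, 101, 1009]

# Sending a real value before the first yield has been reached fails.
fresh = get_primes(10)
try:
    fresh.send(5)
    send_failed = False
except TypeError:
    send_failed = True
print(send_failed)  # True
```

One caveat with this version of get_primes: once it has been started, it expects every resumption to carry a number. Driving it with plain next() after that point would assign None to number, and the following increment would fail.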



Review



In the second part of this series, we will discuss some of the more advanced uses of yield and their effects. yield has become one of the most powerful keywords in Python. Now that we have a good understanding of how yield works, we have what we need to understand some of its more "confusing" uses.



Believe it or not, we have barely scratched the surface of what yield can do. For example, while send does work as described above, it is almost never used in a scenario like our example, where a simple sequence is generated. Below is some code that shows how send is more commonly used. I won't say much about how or why it works; it will serve as a good warm-up for the second part.


import random

def get_data():
  """Return 3 random integers between 0 and 9"""
  return random.sample(range(10), 3)

def consume():
  """Display a running average across the lists of integers sent to it"""
  running_sum = 0
  data_items_seen = 0

  while True:
    data = yield
    data_items_seen += len(data)
    running_sum += sum(data)
    print('The running average is {}'.format(running_sum / float(data_items_seen)))

def produce(consumer):
  """Produce a set of values and forward them to the consumer"""
  while True:
    data = get_data()
    print('Produced {}'.format(data))
    consumer.send(data)
    yield

if __name__ == '__main__':
  consumer = consume()
  consumer.send(None)
  producer = produce(consumer)

  for _ in range(10):
    print('Producing ...')
    next(producer)






Please keep in mind that ...



I hope you can get some key ideas from the discussion in this article:


    • Generators are used to generate a series of values.
    • yield is like the return of a generator function.
    • The only other thing yield does is save the state of the generator function.
    • A generator is a special type of iterator.
    • As with iterators, we can get the next value from a generator by using next().
    • A for loop gets values by calling next() implicitly.
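To make the last point concrete, here is roughly what a for loop does with a generator, written out by hand. This is a sketch of the idea rather than the interpreter's actual implementation; collected and iterator are illustrative names.

```python
def simple_generator_function():
    yield 1
    yield 2
    yield 3


# A hand-written equivalent of: for value in simple_generator_function(): ...
collected = []
iterator = iter(simple_generator_function())
while True:
    try:
        value = next(iterator)
    except StopIteration:
        break  # the generator is exhausted; the loop ends quietly
    collected.append(value)

print(collected)  # [1, 2, 3]
```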


I hope this article has been useful. If you had never heard of generators, I hope you now understand what they are, why they are useful, and how to use them. If you were already somewhat familiar with generators, I hope any remaining confusion has been cleared up.

