A detailed introduction to yield and generators in Python

This article explains yield and generators in Python in detail. It should be a useful reference for anyone who needs to work with them.





Preface



This article covers yield and generators in detail: what a generator is, how to create one, what its characteristics are, basic and advanced use cases, and pitfalls to watch for. It does not cover enhanced generators (PEP 342), which are left for a later article.



Generator basics



In Python, any function whose body contains a yield expression is a generator function, and the object returned by calling it is called a generator. This differs from an ordinary function call, for example:

def gen_generator():
    yield 1

def gen_value():
    return 1

if __name__ == '__main__':
    ret = gen_generator()
    print ret, type(ret)  # <generator object gen_generator at 0x02645648> <type 'generator'>
    ret = gen_value()
    print ret, type(ret)  # 1 <type 'int'>

As the code above shows, calling gen_generator returns a generator instance rather than the value 1.

Generators have the following special properties:

• They follow the iterator protocol, which requires implementing the __iter__ and next methods (next is spelled __next__ in Python 3)

• They can be entered and left many times: execution of the function body can be suspended and later resumed
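As a small illustration of the first point, the sketch below checks the iterator protocol on a generator. It is written for Python 3, where the method is named __next__ and is normally called through the built-in next(). The function name count_to_two is made up for this example.

```python
def count_to_two():
    yield 1
    yield 2

gen = count_to_two()

# An iterator returns itself from __iter__ ...
is_own_iterator = iter(gen) is gen
# ... and exposes __next__ (spelled next in Python 2).
has_next = hasattr(gen, '__next__')

first = next(gen)  # equivalent to gen.__next__()
print(is_own_iterator, has_next, first)
```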

Let's take a look at the test code:

>>> def gen_example():
...     print 'before any yield'
...     yield 'first yield'
...     print 'between yields'
...     yield 'second yield'
...     print 'no yield anymore'
...
>>> gen = gen_example()
>>> gen.next()  # first call to next
before any yield
'first yield'
>>> gen.next()  # second call to next
between yields
'second yield'
>>> gen.next()  # third call to next
no yield anymore
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Calling gen_example() prints nothing, which shows that the function body has not started executing yet. When the generator's next method is called, the generator runs up to the next yield expression, returns the yielded value, and then suspends at that point. That is why the first call to next prints the first message and returns 'first yield'. Suspension means that the function's local variables, instruction pointer, and execution environment are saved until the next call to next resumes them. The second call to next suspends at the last yield. The third call resumes from there, runs to the end of the function body (printing the last message), and raises StopIteration.
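To make the point about saved local state concrete, here is a minimal sketch written for Python 3 (so next(gen) replaces gen.next()). The generator running_total is a hypothetical example, not part of the original code. Its local variable total survives every suspension.

```python
def running_total(values):
    total = 0  # local state that is preserved while the generator is suspended
    for v in values:
        total += v
        yield total  # suspend here; total keeps its value until resumed

gen = running_total([1, 2, 3])
first = next(gen)
second = next(gen)
third = next(gen)

# A fourth call finds no more yields and raises StopIteration.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)
```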

Because the for statement catches the StopIteration exception automatically, the most common way to use a generator (or any iterator) is in a loop:

def generator_example():
    yield 1
    yield 2

if __name__ == '__main__':
    for e in generator_example():
        print e
    # output: 1 2

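What the for statement does with StopIteration can be approximated by hand. The following Python 3 sketch is only an illustration: manual_for is a hypothetical helper that collects items the way a for loop would visit them, and it is not how the interpreter is actually implemented.

```python
def generator_example():
    yield 1
    yield 2

def manual_for(iterable):
    """Collect items by calling next() until StopIteration, like a for loop."""
    collected = []
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break  # the for statement swallows this exception silently
        collected.append(item)
    return collected

result = manual_for(generator_example())
print(result)
```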
How does a generator created by a generator function differ from an ordinary function?

(1) A function runs from its first line on every call, whereas a generator resumes from the point where the last yield suspended it

(2) A function call returns one value (or one group of values) in a single step, whereas a generator can yield values many times

(3) A function can be called any number of times, whereas a generator instance cannot be resumed once it has yielded its last value or returned
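The third difference is easy to check. In this Python 3 sketch, square and squares are made-up names used only for illustration.

```python
def square(x):
    return x * x

def squares(n):
    for x in range(n):
        yield x * x

# An ordinary function can be called again and again.
calls = [square(3), square(3)]

# A generator instance can only be consumed once.
gen = squares(3)
first_pass = list(gen)
second_pass = list(gen)  # already exhausted, so nothing is left

# Calling the generator function again creates a fresh generator.
fresh_pass = list(squares(3))
print(calls, first_pass, second_pass, fresh_pass)
```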

Using yield in a function and then calling that function is one way to create a generator. Another common way is a generator expression, for example:

>>> gen = (x * x for x in xrange(5))
>>> print gen
<generator object <genexpr> at 0x02655710>

Generator applications

Basic generator applications

Why use a generator? The most important reason is that results can be produced and "returned" on demand instead of all at once, and sometimes the full set of return values cannot even be known in advance.

For example, consider the following code:

RANGE_NUM = 100

for i in [x * x for x in range(RANGE_NUM)]:  # first method: iterate over a list
    # do something, for example
    print i

for i in (x * x for x in range(RANGE_NUM)):  # second method: iterate over a generator
    # do something, for example
    print i

In the code above, both for statements produce the same output, and on the surface the only difference is square brackets versus parentheses. The difference is significant, though: the first form builds a complete list, while the second creates a generator object. As RANGE_NUM grows, the list built by the first form grows with it and takes more memory, whereas the generator's own footprint stays constant. (In Python 2, range itself returns a list, so use xrange there to keep the second form fully lazy.)
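The memory difference can be observed roughly with sys.getsizeof. This is a Python 3 sketch. Note that getsizeof reports only the size of the container object itself (for a list, its array of element references), not the integers it refers to, so it understates the list's true cost.

```python
import sys

RANGE_NUM = 100000

squares_list = [x * x for x in range(RANGE_NUM)]
squares_gen = (x * x for x in range(RANGE_NUM))

list_size = sys.getsizeof(squares_list)  # grows with RANGE_NUM
gen_size = sys.getsizeof(squares_gen)    # does not depend on RANGE_NUM

# Both forms produce the same values; the generator computes them lazily.
same_total = sum(squares_list) == sum(squares_gen)
print(list_size, gen_size, same_total)
```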

Let's look at an example that can "return" an infinite number of times:

def fib():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b

This generator can produce an unlimited number of values, and the caller decides when to stop iterating.
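One way for the caller to decide when to stop is the standard itertools module. The sketch below (Python 3) takes a fixed number of values with islice and values below a bound with takewhile. The limits 10 and 100 are arbitrary choices for the example.

```python
from itertools import islice, takewhile

def fib():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b

# Stop after a fixed number of values.
first_ten = list(islice(fib(), 10))

# Stop as soon as a value reaches the bound.
below_100 = list(takewhile(lambda n: n < 100, fib()))

print(first_ten)
print(below_100)
```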

Advanced generator applications

Use scenario one:

Generators can be used to produce data streams. A generator does not produce its values immediately but only when they are needed, which amounts to an on-demand pull model. For example, suppose there is a log file with one record per line. Different departments may process each record differently, but we can provide one common, on-demand data stream to all of them.

def gen_data_from_file(file_name):
    # yield the file's lines one at a time instead of reading it all at once
    with open(file_name) as f:
        for line in f:
            yield line

def gen_words(line):
    for word in (w for w in line.split() if w.strip()):
        yield word

def count_words(file_name):
    word_map = {}
    for line in gen_data_from_file(file_name):
        for word in gen_words(line):
            if word not in word_map:
                word_map[word] = 0
            word_map[word] += 1
    return word_map

def count_total_chars(file_name):
    total = 0
    for line in gen_data_from_file(file_name):
        total += len(line)
    return total

if __name__ == '__main__':
    print count_words('test.txt'), count_total_chars('test.txt')

The example above comes from a lecture at PyCon in 2008. gen_words and gen_data_from_file are the data producers, and count_words and count_total_chars are the data consumers. As you can see, data is pulled only when it is needed rather than prepared in advance. In addition, the expression (w for w in line.split() if w.strip()) inside gen_words is itself a generator expression.
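The same producer and consumer arrangement can be tried without a file on disk. The Python 3 sketch below is an assumption-laden variant, not the lecture's code: it reads from an in-memory io.StringIO stream, the sample text is invented, and gen_lines is a hypothetical stand-in for gen_data_from_file.

```python
import io
from collections import Counter

def gen_lines(stream):
    # Producer: hand out one line at a time.
    for line in stream:
        yield line

def gen_words(lines):
    # Producer stacked on another producer: pull lines, hand out words.
    for line in lines:
        for word in line.split():
            yield word

sample = io.StringIO("a b a\nc a\n")

# Consumer: Counter pulls words one by one; no intermediate list is built.
word_counts = Counter(gen_words(gen_lines(sample)))
print(dict(word_counts))
```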

Use scenario two:

In some programming scenarios, a task needs to execute part of its logic, then wait for a period of time, an asynchronous result, or some state change, and then continue with the rest of its logic. For example, in a microservice architecture, service A executes some logic, requests data from service B, and then continues executing on service A. Or in game programming, a skill is divided into multiple stages: some actions (effects) are performed first, and the skill continues after a delay. When we need to wait but do not want to block, we generally use callbacks. Here is a simple example:

def do(a):
    print 'do', a
    CallBackMgr.callback(5, lambda a=a: post_do(a))

def post_do(a):
    print 'post_do', a

Here CallBackMgr registers a 5-second timer and calls the lambda function when it expires. As you can see, one piece of logic is split across two functions, and context (such as the parameter a) has to be passed along. Let us rewrite this example with yield, where the yielded value represents the waiting time.

@yield_dec
def do(a):
    print 'do', a
    yield 5
    print 'post_do', a

This requires implementing a YieldManager: the yield_dec decorator registers the generator with the YieldManager, which calls its next method after 5 seconds. The yield version does the same job as the callback version but reads much more clearly.

Here is a simple implementation for reference:

# -*- coding: utf-8 -*-
import sys
# import Timer
import types
import time

class YieldManager(object):
    def __init__(self, tick_delta=0.01):
        # maps each suspended generator to the time at which it should resume
        self.generator_dict = {}
        # self._tick_timer = Timer.addRepeatTimer(tick_delta, lambda: self.tick())

    def tick(self):
        cur = time.time()
        # items() returns a list in Python 2, so the dict can be modified while looping
        for gene, t in self.generator_dict.items():
            if cur >= t:
                self._do_resume_generator(gene, cur)

    def _do_resume_generator(self, gene, cur):
        try:
            self.on_generator_execute(gene, cur)
        except StopIteration:
            self.remove_generator(gene)
        except Exception as e:
            print 'unexpected error', type(e)
            self.remove_generator(gene)

    def add_generator(self, gen, deadline):
        self.generator_dict[gen] = deadline

    def remove_generator(self, gene):
        del self.generator_dict[gene]

    def on_generator_execute(self, gen, cur_time=None):
        t = gen.next()  # run to the next yield; t is the requested delay
        cur_time = cur_time or time.time()
        self.add_generator(gen, t + cur_time)

g_yield_mgr = YieldManager()

def yield_dec(func):
    def _inner_func(*args, **kwargs):
        gen = func(*args, **kwargs)
        if type(gen) is types.GeneratorType:
            g_yield_mgr.on_generator_execute(gen)

        return gen
    return _inner_func

@yield_dec
def do(a):
    print 'do', a
    yield 2.5
    print 'post_do', a
    yield 3
    print 'post_do again', a

if __name__ == '__main__':
    do(1)
    for i in range(1, 10):
        print 'simulate a timer, %s seconds passed' % i
        time.sleep(1)
        g_yield_mgr.tick()

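For experimenting without real delays, the same idea can be sketched in Python 3 with a simulated clock. Everything here is an illustrative assumption rather than part of the reference implementation: SimulatedScheduler, its methods, and the log list are hypothetical names, and a heap replaces the dictionary so that the generator with the earliest wake-up time is always resumed first.

```python
import heapq
import itertools

class SimulatedScheduler:
    """Resume generators once their yielded delay has elapsed on a fake clock."""

    def __init__(self):
        self.now = 0.0
        self._queue = []               # heap of (wake_time, sequence, generator)
        self._seq = itertools.count()  # tie-breaker so generators are never compared

    def start(self, gen):
        self._step(gen)

    def _step(self, gen):
        try:
            delay = next(gen)          # run until the next yield
        except StopIteration:
            return                     # generator finished; do not reschedule
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), gen))

    def run(self):
        while self._queue:
            wake_time, _, gen = heapq.heappop(self._queue)
            self.now = wake_time       # jump the fake clock forward
            self._step(gen)

log = []
scheduler = SimulatedScheduler()

def do(a):
    log.append(('do', a, scheduler.now))
    yield 2.5
    log.append(('post_do', a, scheduler.now))
    yield 3
    log.append(('post_do again', a, scheduler.now))

scheduler.start(do(1))
scheduler.run()
print(log)
```

Because the clock is simulated, the run finishes immediately and the recorded times are deterministic, which makes this variant convenient for tests.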
Precautions:

(1) Generators do not nest automatically: a recursive call to a generator function must be iterated explicitly

def visit(data):
    for elem in data:
        if isinstance(elem, tuple) or isinstance(elem, list):
            visit(elem)  # the value returned here is a generator
        else:
            yield elem

if __name__ == '__main__':
    for e in visit([1, 2, (3, 4), 5]):
        print e

The code above is meant to visit every element of a nested sequence. We expect the output 1 2 3 4 5, but the actual output is 1 2 5. Why? As the comment indicates, visit is a generator function, so the call on line 4 merely returns a generator object, and the code never iterates over it. The fix is to iterate over this temporary generator:

def visit(data):
    for elem in data:
        if isinstance(elem, tuple) or isinstance(elem, list):
            for e in visit(elem):
                yield e
        else:
            yield elem

Alternatively, in Python 3.3 and later you can use yield from, a syntax added by PEP 380:

def visit(data):
    for elem in data:
        if isinstance(elem, tuple) or isinstance(elem, list):
            yield from visit(elem)
        else:
            yield elem

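The pitfall and its fix can be compared side by side. In this Python 3 sketch, visit_buggy is a name invented for the broken variant, and the two isinstance checks are merged into one call with a tuple of types.

```python
def visit_buggy(data):
    for elem in data:
        if isinstance(elem, (tuple, list)):
            visit_buggy(elem)  # creates a generator and discards it
        else:
            yield elem

def visit(data):
    for elem in data:
        if isinstance(elem, (tuple, list)):
            yield from visit(elem)  # delegate to the nested generator
        else:
            yield elem

nested = [1, 2, (3, 4), 5]
buggy_result = list(visit_buggy(nested))
fixed_result = list(visit(nested))
print(buggy_result, fixed_result)
```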
(2) Using return in a generator function

The Python documentation states explicitly that return may be used in a generator function. When execution reaches the return statement, the generator raises StopIteration.

def gen_with_return(range_num):
    if range_num < 0:
        return
    else:
        for i in xrange(range_num):
            yield i

if __name__ == '__main__':
    print list(gen_with_return(-1))  # []
    print list(gen_with_return(1))   # [0]

However, in Python 2 a return statement inside a generator function cannot carry a value:



def gen_with_return(range_num):
    if range_num < 0:
        return 0
    else:
        for i in xrange(range_num):
            yield i

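In Python 2 the code above fails with: SyntaxError: 'return' with argument inside generator. Since Python 3.3 (PEP 380), a return value is allowed in a generator function and is attached to the StopIteration exception as its value attribute.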
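For comparison, here is how the same function behaves on Python 3.3 or later. This is a sketch under that version assumption, with xrange replaced by range. The returned value does not appear among the yielded items but travels on the StopIteration exception.

```python
def gen_with_return(range_num):
    if range_num < 0:
        return 0  # allowed since Python 3.3; becomes StopIteration.value
    for i in range(range_num):
        yield i

gen = gen_with_return(-1)
try:
    next(gen)
    returned = None
except StopIteration as stop:
    returned = stop.value  # the value passed to return

# list() consumes StopIteration silently, so the return value is not included.
items_negative = list(gen_with_return(-1))
items_two = list(gen_with_return(2))
print(returned, items_negative, items_two)
```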
