1. Why generators exist
Following the principle of asking why before asking how, let's first look at why Python has generators.
Python is very popular in data science, and I think generators deserve part of the credit.
We can store data in a list, but when the data set is very large, building the whole list takes up a lot of memory. This is where a generator comes in handy: it produces values one at a time instead of holding them all in memory at once.
2. A simple generator
We can use a list comprehension to initialize a list:
list5 = [x for x in range(5)]
print(list5) #output: [0, 1, 2, 3, 4]
We use a similar method to generate a generator, but this time we replace the above [] with ():
gen = (x for x in range(5))
print(gen)
#output: <generator object <genexpr> at 0x0000000000AA20F8>
Notice that print(gen) does not output the values; it only tells us that this is a generator object. So how do we get the values out of gen?
There are two ways:
The first is a for loop:
for item in gen:
    print(item)
#output:
0
1
2
3
4
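The second is to call next() repeatedly. A generator can only be consumed once, so if gen has already been looped over, create it again first:
gen = (x for x in range(5))
print(next(gen))  #output: 0
print(next(gen))  #output: 1
print(next(gen))  #output: 2
print(next(gen))  #output: 3
print(next(gen))  #output: 4
print(next(gen))  #output: Traceback (most recent call last): StopIteration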
All right. Now let's consider the principle behind it.
The fact that the first method works with for tells us that a generator is iterable. More precisely, it is an iterator.
We can verify:
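from collections.abc import Iterable, Iterator  # these classes live in collections.abc; importing them from collections fails on Python 3.10+
print(isinstance(gen, Iterable))  #output: True
print(isinstance(gen, Iterator))  #output: True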
str, list, tuple, dict, set are all iterable, that is, you can use for to access each element in it. But they are not iterators.
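The difference between an iterable and an iterator can be checked directly. The snippet below is a minimal sketch; the variable names numbers and it are only for illustration.

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]

# A list is iterable, but it is not itself an iterator.
print(isinstance(numbers, Iterable))  # True
print(isinstance(numbers, Iterator))  # False

# iter() asks the list for a fresh iterator over its elements.
it = iter(numbers)
print(isinstance(it, Iterator))  # True
print(next(it))  # 1
print(next(it))  # 2
```

A for loop calls iter() behind the scenes, which is why a list can be looped over even though next() cannot be called on the list itself.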
What is an iterator?
Think about the steps we follow when doing an everyday task.
For example, making tea:
First, boil the water.
Then, take out the tea set and the tea leaves.
Then, once the water has boiled, brew the tea.
Finally, taste the tea.
Suppose we define a function (an iterator) for making tea and encapsulate the steps in it. Each time it is called, it returns one step and saves the current execution state. If we are interrupted, say by a phone call after step two, then the next call still gives us step three (the water has boiled, so brew the tea), because the state was saved. We can keep calling it until all the steps have been returned.
An object that hands out its results step by step and remembers where it left off: that is an iterator.
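The tea example can be sketched as a hand-written iterator. The class name TeaMaking and the wording of the steps are hypothetical, chosen only to mirror the description above; the __iter__ and __next__ methods are what Python's iterator protocol requires.

```python
class TeaMaking:
    """Hypothetical iterator that hands out one tea-making step per call."""

    STEPS = (
        "boil water",
        "take out the tea set and tea leaves",
        "brew the tea",
        "taste the tea",
    )

    def __init__(self):
        self._position = 0  # the saved state: index of the next step

    def __iter__(self):
        return self  # an iterator returns itself from __iter__

    def __next__(self):
        if self._position >= len(self.STEPS):
            raise StopIteration  # no steps left
        step = self.STEPS[self._position]
        self._position += 1
        return step


tea = TeaMaking()
first = next(tea)
second = next(tea)
# ... the phone rings here; the position is kept inside the object ...
third = next(tea)
print(first, second, third, sep="\n")
```

After the interruption, the third call still returns the third step, because the position lives in the object rather than in the caller.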
Returning to the second access method above: at the sixth print(next(gen)), Python raises StopIteration (shown as Traceback (most recent call last): StopIteration). That means gen has reached the end and cannot be iterated any further.
The generator itself is an iterator
The algorithm is encapsulated inside it, and it returns one result to the caller each time a value is requested. (x for x in range(5)) works this way: it does not first build (0, 1, 2, 3, 4) and then hand the values out one by one; it generates each value on demand. This is why next(gen) works.
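One way to see this on-demand behaviour is to record when the work actually happens. In this small sketch, the helper square and the list calls are made up for illustration.

```python
calls = []

def square(x):
    calls.append(x)  # record every time the function actually runs
    return x * x

gen = (square(x) for x in range(5))
print(calls)      # [] : nothing has been computed yet
print(next(gen))  # 0
print(next(gen))  # 1
print(calls)      # [0, 1] : only the two requested values were computed
```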
3. Application
1. As mentioned earlier, a generator can save memory when there is a large amount of data. Is that really so? Consider the following code:
import time

def get_time(func):
    # Decorator that prints how long func takes to run
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        end_time = time.time()
        print("Spend:", end_time - start_time)
        return result
    return wrapper

@get_time
def _list(n):
    l1 = [list(range(n)) for i in range(n)]  # builds all n inner lists right away
    del l1

@get_time
def _generator(n):
    ge = (tuple(range(n)) for i in range(n))  # builds nothing until it is iterated
    del ge

_list(1000)
_generator(1000)
Haha, if the decorator is hard to follow, look at the code further down, which is equivalent to the above.
Off-topic: I use the get_time function above quite often. I keep it in my own toolkit (building my own wheels): whenever I want to see how long a function takes to run, I just decorate it with get_time. So if you find yourself reusing a function often, I suggest building your own wheels too. Here is the equivalent code without the decorator:
import time

def _list(n):
    l1 = [list(range(n)) for i in range(n)]
    del l1

def _generator(n):
    ge = (tuple(range(n)) for i in range(n))
    del ge

start_time = time.time()
_list(1000)
end_time = time.time()
print("Spend:", end_time - start_time)

start_time = time.time()
_generator(1000)
end_time = time.time()
print("Spend:", end_time - start_time)
Running the code prints something like this (the exact times depend on the machine):
Spend: 0.04300236701965332
Spend: 0.0
The reason is that the list comprehension builds all 1000 inner lists (each holding 0-999) immediately, so it takes more time.
The generator expression only encapsulates the algorithm. Nothing is computed until a value is requested, so creating it takes almost no time, and only one value exists at a time, which saves memory.
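The timing above measures speed rather than memory. As a rough check of the memory side, sys.getsizeof can compare the two objects. This is a sketch under two assumptions: it runs on CPython, and it only counts the container itself, not the integers a list refers to. The names n, as_list and as_gen are illustrative.

```python
import sys

n = 1_000_000
as_list = [x for x in range(n)]
as_gen = (x for x in range(n))

# The list holds a reference to every element; the generator holds only its state.
print(sys.getsizeof(as_list))  # grows with n (several megabytes here)
print(sys.getsizeof(as_gen))   # small and independent of n

# Both produce the same values when consumed.
print(sum(as_list) == sum(as_gen))  # True
```

The trade-off is that the generator can be consumed only once, while the list can be indexed and reused.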
2. The yield keyword
So far we have only used () to create a generator. What if we want to define our own generator function? return will not do, because it ends the function. Fortunately, Python has the yield keyword. Like return, it hands a value back to the caller, but after yielding, the function keeps the state it had at the yield. On the next call, execution continues from where it stopped, until it meets the next yield or the function ends.
A simple example:
def test():
    yield 1
    yield 2
    yield 3

t = test()
print(next(t))  #output: 1
print(next(t))  #output: 2
print(next(t))  #output: 3
print(next(t))  #output: Traceback (most recent call last): StopIteration
That does not look very useful yet! But generators exist for a reason. Many sequences in mathematics are infinite (the natural numbers, for example); we cannot list all of their elements one by one, and this is where generators can help us.
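The natural numbers mentioned above can be sketched as an infinite generator. The function name naturals and its start parameter are hypothetical; itertools.islice is from the standard library and is used here only to cut the infinite stream off.

```python
from itertools import islice

def naturals(start=0):
    """Hypothetical infinite generator: start, start + 1, start + 2, ..."""
    n = start
    while True:
        yield n  # pause here until the caller asks for the next value
        n += 1

# islice stops after the first five values, so this terminates.
first_five = list(islice(naturals(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```

The while True loop never ends on its own; it is safe only because the function is suspended at each yield and the caller decides when to stop asking.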
For example: Yang Hui's triangle (also known as Pascal's triangle).
It has infinitely many rows. We can encapsulate its algorithm in a generator and produce each row only when it is needed, so that it does not occupy much memory.
The code is given below:
def triangle():
    _list, new_list = [], []
    while True:
        length = len(_list)
        if length == 0:
            new_list.append(1)
        else:
            for times in range(length + 1):
                if times == 0:
                    new_list.append(1)
                elif times == length:
                    new_list.append(1)
                else:
                    temp = _list[times-1] + _list[times]
                    new_list.append(temp)
        yield new_list  # Return the value, then suspend the function and wait for the next call
        _list = new_list.copy()  # After the call, it will continue to execute
        new_list.clear()

n = 0
for result in triangle():
    n += 1
    print(result)
    if n == 10:
        break
result:
[1]
[1, 1]
[1, 2, 1]
[1, 3, 3, 1]
[1, 4, 6, 4, 1]
[1, 5, 10, 10, 5, 1]
[1, 6, 15, 20, 15, 6, 1]
[1, 7, 21, 35, 35, 21, 7, 1]
[1, 8, 28, 56, 70, 56, 28, 8, 1]
[1, 9, 36, 84, 126, 126, 84, 36, 9, 1]
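One caveat: triangle() yields the same list object every time and clears it afterwards, so printing each row immediately works, but storing the yielded rows would leave every stored entry pointing at that one list. The sketch below is a possible compact variant, not the original algorithm: the name pascal_rows is hypothetical, each row is a new list, and itertools.islice replaces the manual counter.

```python
from itertools import islice

def pascal_rows():
    """Hypothetical compact variant: yields each row as a new list."""
    row = [1]
    while True:
        yield row
        # Each inner entry is the sum of two neighbours in the previous row.
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]

# Take the first five rows; they can be stored safely.
rows = list(islice(pascal_rows(), 5))
for r in rows:
    print(r)
```

Because every row is a separate list, rows keeps the first five rows intact, from [1] to [1, 4, 6, 4, 1].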