The scope of the index variable in the for loop in Python

Source: Internet
Author: User
This article introduces the scope of the index variable of a for loop in Python, a basic topic when learning the language, and uses Python 3 examples to help readers understand it. Let's start with a quiz. What does the following function do?

def foo(lst):
    a = 0
    for i in lst:
        a += i
    b = 1
    for t in lst:
        b *= i
    return a, b

If you think it computes the sum and the product of all elements in lst, do not be discouraged. The error is usually hard to spot, and it is even harder to spot when it is buried in a large body of real code, where you do not know that it is a quiz.

The error here is that i, not t, is used in the body of the second loop. Wait, how does this even work? Shouldn't i be invisible outside the first loop? [1] Well, no. In fact, Python formally acknowledges that the names defined as for loop targets (more rigorously called "index variables") leak into the enclosing function scope. Consider the following code:

for i in [1, 2, 3]:
    pass
print(i)

This code is valid and prints 3. In this article, I would like to explore why this is the case, why it is unlikely to change, and use it as a tracer bullet to explore some interesting parts of the CPython compiler.

By the way, if you do not believe that such behavior may cause real problems, consider this code snippet:

def foo():
    lst = []
    for i in range(4):
        lst.append(lambda: i)
    print([f() for f in lst])

If you expect the above code to print [0, 1, 2, 3], you will be disappointed: it prints [3, 3, 3, 3]. There is only one i in the scope of foo, and that single i is what all the lambdas capture.
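A minimal sketch of the capture behavior and of one common workaround. The function names late_binding and early_binding are made up for this illustration. The workaround binds the current value of i as a default argument, which is evaluated when each lambda is created:

```python
def late_binding():
    callbacks = []
    for i in range(4):
        # Every lambda refers to the single variable i in this function's scope.
        callbacks.append(lambda: i)
    return [f() for f in callbacks]


def early_binding():
    callbacks = []
    for i in range(4):
        # The default value is evaluated now, so each lambda keeps its own copy.
        callbacks.append(lambda i=i: i)
    return [f() for f in callbacks]


print(late_binding())   # [3, 3, 3, 3]
print(early_binding())  # [0, 1, 2, 3]
```

The default-argument trick is only one option; functools.partial or a small factory function achieves the same effect.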

Official description

The section on for loops in the Python reference documentation explicitly documents this behavior:

The for loop makes assignments to the variables in the target list. ... Names in the target list are not deleted when the loop is finished, but if the sequence is empty, they will not have been assigned to at all by the loop.

Pay attention to the last sentence. Let's try it:

for i in []:
    pass
print(i)

Indeed, the code above raises a NameError. Later, we will see that this is a natural consequence of the way the Python VM executes its bytecode.
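The same effect can be observed inside a function, as in this small sketch (last_seen is a hypothetical helper, not part of the original article). Inside a function, the error raised is UnboundLocalError, which is a subclass of NameError:

```python
def last_seen(seq):
    for i in seq:
        pass
    try:
        return i
    except NameError:
        # Reached only when seq was empty, so i was never assigned.
        return None


print(last_seen([1, 2, 3]))  # 3
print(last_seen([]))         # None
```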

Why?

Actually, I asked Guido van Rossum about the reason for this behavior, and he generously told me some of the historical background (thanks, Guido!). The motivation is to keep Python's approach to variables and scopes simple, without resorting to hacks (for example, deleting all variables defined in a loop after the loop completes; think about the exceptions that could cause) or more complex scoping rules.

Python's scoping rules are simple and elegant: the code blocks of modules, classes, and functions introduce scopes. Within a function, a variable is visible from its definition to the end of the block (including nested blocks, such as nested functions). The rules differ slightly for local, global, and nonlocal variables, of course, but that has little bearing on this discussion.

The most important point here is that the innermost possible scope is a function body. Not a for loop body. Not a with block. Unlike other programming languages (such as C and its descendants), Python has no nested lexical scopes below the level of a function.

So if you simply set out to implement Python, this is the behavior you would most likely end up with. Here is another enlightening snippet:

for i in range(4):
    d = i * 2
print(d)

The variable d is visible and accessible after the for loop ends. Are you surprised by this? No, this is exactly how Python works. So why would the index variable be treated any differently?
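One way to see that i and d share the same function scope is to inspect locals() after the loop. This is only an illustrative sketch, and names_after_loop is a made-up name:

```python
def names_after_loop():
    for i in range(4):
        d = i * 2
    # Both the index variable and the name assigned in the body are
    # ordinary locals of the function once the loop has finished.
    return sorted(locals().items())


print(names_after_loop())  # [('d', 6), ('i', 3)]
```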

By the way, index variables in list comprehensions also leaked to the enclosing scope, or more precisely, they did before Python 3.

Python 3 included many breaking changes, and one of them fixed the leakage of variables from list comprehensions. Make no mistake, changing such behavior is a major break in backward compatibility. This is why I think the current behavior of for loops has stuck and will not be changed.
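The Python 3 behavior of list comprehensions can be checked with a short sketch such as the following (comprehension_leaks and idx are hypothetical names chosen for this example, and it assumes no global named idx exists):

```python
def comprehension_leaks():
    squares = [idx * idx for idx in range(3)]
    try:
        idx  # In Python 3 the comprehension variable is not visible here.
    except NameError:
        return squares, False
    return squares, True


print(comprehension_leaks())  # ([0, 1, 4], False)
```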

In addition, many people still find this a useful feature of Python. Consider the following code:

for i, item in enumerate(somegenerator()):
    dostuffwith(i, item)
print('The loop executed {0} times!'.format(i+1))

If you do not know how many items somegenerator returns, this is a simple way to find out. Otherwise, you would have to keep a separate counter.
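One caveat follows from the reference text quoted earlier: if the generator yields nothing, i is never assigned and the print call raises a NameError. A hedged variant that also handles the empty case initializes the counter first and uses the start parameter of enumerate. The helper name count_items is an assumption made for this sketch:

```python
def count_items(iterable):
    count = 0  # Stays 0 if the iterable is empty.
    for count, _item in enumerate(iterable, start=1):
        pass  # Real code would process _item here.
    return count


print(count_items(iter([])))  # 0
print(count_items("abc"))     # 3
```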

Here is another example:

for i in somegenerator():
    if isinteresting(i):
        break
dostuffwith(i)

This pattern is an effective way to search for an item in a loop and use it afterwards. [2]
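As written, the snippet above also calls dostuffwith when nothing interesting was found. A minimal sketch of the same search pattern that separates "found" from "not found" with the for/else clause might look like this (first_interesting and its parameters are hypothetical names):

```python
def first_interesting(items, is_interesting):
    for item in items:
        if is_interesting(item):
            break
    else:
        # The loop ended without break: no match, and item may be unbound.
        return None
    # Reached only via break, so item holds the match.
    return item


print(first_interesting([1, 4, 6], lambda n: n % 2 == 0))  # 4
print(first_interesting([1, 3], lambda n: n % 2 == 0))     # None
```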

Over the years, many users have wanted to keep this feature. It is hard to introduce breaking changes even for features that developers agree are harmful. When many people find a feature useful and it is used in a large amount of real-world code, it is not going to be removed.

Under the hood

Now comes the most interesting part. Let's see how the Python compiler and VM work together to make this behavior possible. In this particular case, I think the clearest way to present it is to work backwards from the bytecode. I hope this example also shows how to dig for information in Python's internals [3] (which is great fun!).

Let's look at part of the function presented at the beginning of this article:

def foo(lst):
    a = 0
    for i in lst:
        a += i
    return a

The generated bytecode is:

  0 LOAD_CONST               1 (0)
  3 STORE_FAST               1 (a)
  6 SETUP_LOOP              24 (to 33)
  9 LOAD_FAST                0 (lst)
 12 GET_ITER
 13 FOR_ITER                16 (to 32)
 16 STORE_FAST               2 (i)
 19 LOAD_FAST                1 (a)
 22 LOAD_FAST                2 (i)
 25 INPLACE_ADD
 26 STORE_FAST               1 (a)
 29 JUMP_ABSOLUTE           13
 32 POP_BLOCK
 33 LOAD_FAST                1 (a)
 36 RETURN_VALUE

As a reminder, LOAD_FAST and STORE_FAST are the opcodes Python uses to access names that are used only within a function. Since the Python compiler knows statically (at compile time) how many such names exist in each function, they can be accessed through offsets into a static array instead of a hash table, which makes access faster (hence the _FAST suffix). But I digress. What really matters here is that the variables a and i are treated identically. Both are read with LOAD_FAST and modified with STORE_FAST. There is absolutely no reason to think that their visibility differs. [4]
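The array slots mentioned above can be inspected from Python itself. The following sketch reads the co_varnames attribute of the function's code object, where the position of each name is the numeric argument shown in the listing. The variable name slots is made up for this example:

```python
def foo(lst):
    a = 0
    for i in lst:
        a += i
    return a


# Map each local name to its index in the function's local-variable array.
slots = {name: index for index, name in enumerate(foo.__code__.co_varnames)}
print(slots)  # {'lst': 0, 'a': 1, 'i': 2}
```

Calling dis.dis(foo) prints the full disassembly. The exact opcodes and offsets vary between CPython versions, so the output on a recent interpreter will not match the listing above line for line.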

So how does this come about? Why does the compiler think that i is just another local variable of foo? This logic lives in the symbol table code, which the compiler runs on the AST before it creates a control flow graph and then generates bytecode. This process is covered in more detail in my article on symbol tables, so I will only mention the key points here.

The symbol table code does not treat the for statement as anything special. The following code is found in symtable_visit_stmt:

case For_kind:
    VISIT(st, expr, s->v.For.target);
    VISIT(st, expr, s->v.For.iter);
    VISIT_SEQ(st, stmt, s->v.For.body);
    if (s->v.For.orelse)
        VISIT_SEQ(st, stmt, s->v.For.orelse);
    break;

The index variable is visited like any other expression. Since this code visits the AST, it is worth looking at what the for statement node contains:

For(target=Name(id='i', ctx=Store()),
    iter=Name(id='lst', ctx=Load()),
    body=[AugAssign(target=Name(id='a', ctx=Store()),
                    op=Add(),
                    value=Name(id='i', ctx=Load()))],
    orelse=[])

So i lives in a node of type Name. The symbol table code handles these nodes with the following clause in symtable_visit_expr:

case Name_kind:
    if (!symtable_add_def(st, e->v.Name.id,
                          e->v.Name.ctx == Load ? USE : DEF_LOCAL))
        VISIT_QUIT(st, 0);
    /* ... */

The variable i is clearly marked as DEF_LOCAL. This is evident from the *_FAST opcodes emitted to access it, and it is also easy to observe by dumping the symbol table with the symtable module. The code above therefore evidently calls symtable_add_def with DEF_LOCAL as the third argument. Now glance back at the AST above and notice the ctx=Store part of the Name node for i. It is the AST that already carries the information that i is stored into, in the target part of the For node. Let's see how that comes to be.
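A small sketch of such a symbol table dump, using the standard symtable module. SOURCE, foo_table and flags are names invented for this example:

```python
import symtable

SOURCE = """
def foo(lst):
    a = 0
    for i in lst:
        a += i
    return a
"""

module_table = symtable.symtable(SOURCE, "<example>", "exec")
foo_table = module_table.get_children()[0]  # The symbol table of foo.

# For each name, record whether the compiler treats it as a local of foo.
flags = {name: foo_table.lookup(name).is_local() for name in ("a", "i")}
print(foo_table.get_name(), flags)  # foo {'a': True, 'i': True}
```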

The AST-building part of the compiler walks over the parse tree (a fairly low-level representation of the source code) and, among other things, sets the expr_context attribute on some nodes, most notably Name nodes. Consider the following statement:

foo = bar + 1

Both foo and bar end up in Name nodes. However, bar is only loaded in this code, while foo is actually stored into. The symbol table code uses the expr_context attribute to tell the two kinds of use apart [5].
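The contexts can be observed from Python with the standard ast module, which exposes expr_context as the ctx attribute of a node. In this sketch, assign, foo_ctx and bar_ctx are names chosen for illustration:

```python
import ast

assign = ast.parse("foo = bar + 1").body[0]

# The assignment target is stored into; the name on the right is only loaded.
foo_ctx = type(assign.targets[0].ctx).__name__
bar_ctx = type(assign.value.left.ctx).__name__
print(foo_ctx, bar_ctx)  # Store Load
```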

Back to the index variable of our for loop. It is handled in ast_for_for_stmt, the function that creates the AST for for statements. Here are the relevant parts of that function:

static stmt_ty
ast_for_for_stmt(struct compiling *c, const node *n)
{
    asdl_seq *_target, *seq = NULL, *suite_seq;
    expr_ty expression;
    expr_ty target, first;

    /* ... */

    node_target = CHILD(n, 1);
    _target = ast_for_exprlist(c, node_target, Store);
    if (!_target)
        return NULL;
    /* Check the # of children rather than the length of _target, since
       for x, in ... has 1 element in _target, but still requires a Tuple. */
    first = (expr_ty)asdl_seq_GET(_target, 0);
    if (NCH(node_target) == 1)
        target = first;
    else
        target = Tuple(_target, Store, first->lineno, first->col_offset, c->c_arena);

    /* ... */

    return For(target, expression, suite_seq, seq, LINENO(n), n->n_col_offset,
               c->c_arena);
}

The Store context is set in the call to ast_for_exprlist, the function that creates the node for the index variable. (Note that the index variable of a for loop may also be a tuple of several variables rather than a single variable.)
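The tuple case can be checked with the ast module as well. In this sketch, the loop source and the names loop, target_kind and contexts are illustrative assumptions:

```python
import ast

loop = ast.parse("for key, value in pairs:\n    pass\n").body[0]

# The target is a Tuple node, and every name inside it has a Store context.
target_kind = type(loop.target).__name__
contexts = [(elt.id, type(elt.ctx).__name__) for elt in loop.target.elts]
print(target_kind, contexts)  # Tuple [('key', 'Store'), ('value', 'Store')]
```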

This is the final piece of the explanation of why for loop variables are treated the same as any other variable assigned in the loop. Once they are marked in the AST, the symbol table code and the VM handle loop variables exactly as they handle other variables.

Conclusion

This article discussed a particular behavior of Python that some may consider a "gotcha". I hope it explained how Python variables and scopes behave, why this behavior is useful and unlikely to change, and how the internals of the Python compiler make it work. Thank you for reading!

[1] I am tempted to make a Microsoft Visual C++ 6 joke here, but the fact that most readers of this blog in 2015 would not get it is somewhat disturbing (it reflects my age, not the abilities of my readers).

[2] You may say that dostuffwith(i) could go inside the if, right before the break. However, this is not always convenient. Besides, as Guido explained, there is a nice separation of concerns here: the loop is used for searching, and only for searching. What happens with the found value after the search is complete is not the loop's concern. I think this is a very good point.

[3] As usual, the code in my articles is based on Python 3. Specifically, I am looking at the default branch of the Python repository, where work on the next release (3.5) is being done. For this particular topic, however, the source code of any release in the 3.x series should do.

[4] Another thing that is obvious from the disassembly is why i remains invisible if the loop never executes. The GET_ITER and FOR_ITER pair of opcodes treats the thing we loop over as an iterator and calls its __next__ method. If that call raises StopIteration, the VM catches it and ends the loop. Only when an actual value is returned does the VM go on to execute the STORE_FAST to i, which brings i into existence so that subsequent code can refer to it.

[5] This is a curious design. I suspect it has its roots in keeping the recursive visiting code relatively clean in AST consumers such as the symbol table code and the CFG generator.
