The scope of index variables in Python's for loops

Let's start with a test. What does the following function do?




def foo(lst):
  a = 0
  for i in lst:
    a += i
  b = 1
  for t in lst:
    b *= i
  return a, b





If you think it "computes the sum and product of all the elements in lst", don't be depressed. The error here is often difficult to spot, and if you can find it in a big heap of real code, when you don't know in advance that this is a test, that's quite impressive.



The mistake is using i instead of t in the second loop body. But wait, how does this even work? Shouldn't i be invisible outside of the first loop? [1] Well, no. In fact, Python formally acknowledges that the names defined as for loop targets (a more formally correct term for "index variables") leak into the enclosing function scope. So the following code:


for i in [1, 2, 3]:
  pass
print(i)





This code is valid and prints 3. In this article I want to explore why this is so, why it's unlikely to ever change, and also use it as a tracer bullet to dig into some interesting parts of the CPython compiler.



By the way, if you don't believe this behavior can lead to real problems, consider this snippet of code:


def foo():
  lst = []
  for i in range(4):
    lst.append(lambda: i)
  print([f() for f in lst])





If you expected this code to print [0, 1, 2, 3], your expectations will be dashed: it prints [3, 3, 3, 3], because there is just one i in the scope of foo, and that single i is what all the lambdas capture.
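As an aside (this workaround is my own addition, not part of the original example), the usual idiom for capturing the current value of i is to bind it with a default argument:


def foo():
  lst = []
  for i in range(4):
    lst.append(lambda i=i: i)  # the default argument freezes the current value of i
  print([f() for f in lst])    # now prints [0, 1, 2, 3]
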
Official notes



This behavior is explicitly documented in the "for statement" section of the Python reference manual:



The for-loop makes assignments to the variables in the target list. ... Names in the target list are not deleted when the loop is finished, but if the sequence is empty, they will not have been assigned to at all by the loop.



Note the last sentence; let's try it:


for i in []:
  pass
print(i)





Indeed, the code above throws a NameError. Later on we'll see that this is a natural consequence of how the Python VM executes the bytecode.
Why would that be?



I actually asked Guido van Rossum about the reasoning behind this behavior, and he generously shared some of the historical background (thanks, Guido!). The motivation is to keep Python's handling of variables and scopes simple, without resorting to hacks (such as deleting all the variables defined in a loop once it finishes; think of the exceptions that could raise) or to more complex scoping rules.



Python's scoping rules are simple and elegant: the bodies of modules, classes, and functions introduce scopes. Within a function body, a variable is visible from its definition to the end of the block, including in nested blocks such as nested functions. The rules differ slightly for local variables, globals, and other nonlocal variables, of course, but that has little bearing on our discussion.



The important point here is that the innermost possible scope is a function body, not a for loop body, and not a with block. Unlike other languages, such as C and its descendants, Python has no nested lexical scopes below the level of a function.



So if you simply went ahead and implemented Python based on these rules, your code would most likely end up behaving this way anyway. Here's another illuminating snippet:


for i in range(4):
  d = i * 2
print(d)





The variable d is visible and accessible after the for loop ends. Does this discovery surprise you? No, that's just how Python works. So why would the scope of the index variable be treated any differently?



By the way, the index variables of list comprehensions also leak into their enclosing scope; or, more precisely, they did, before Python 3.



Python 3 contains plenty of breaking changes, and the leaking of list comprehension variables was fixed along the way. There is no doubt that this breaks backward compatibility, which is why I don't believe the current behavior of for loops will ever change.
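A quick sketch of the Python 3 behavior (under Python 2, the same print would have shown 2 rather than raising):


[x for x in range(3)]
print(x)  # NameError in Python 3: the comprehension now has its own scope
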



Besides, many people find this to be a useful feature of Python. Consider the following code:


for i, item in enumerate(somegenerator()):
  dostuffwith(i, item)
print('The loop executed {0} times!'.format(i+1))





If you don't know in advance how many items somegenerator returns, this is a concise way to find out once the loop is done; otherwise you would need a separate counter.



Here's an additional example:


for i in somegenerator():
  if isinteresting(i):
    break
dostuffwith(i)





This idiom uses the loop to search for an item and then uses that item afterwards. [2]



Both of these patterns have been cited over the years by users who wanted this behavior preserved. Breaking changes are hard to introduce even for features the developers consider harmful; when many people find a feature useful and it sees plenty of use in real-world code, removing it is simply not going to happen.
Under the hood



Now for the most interesting part: let's see how the Python compiler and VM cooperate to make this behavior possible. In this particular case, I think the clearest way to present it is to work backwards from the bytecode. I'd also like to use this example to show how to dig into Python's internals [3] (it's a lot of fun!).



Let's take a simplified version of the function this article opened with:


def foo(lst):
  a = 0
  for i in lst:
    a += i
  return a
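
To reproduce the listing that follows, you can run the function through the standard dis module (the exact opcodes and offsets vary across CPython versions; the listing shown here matches an older 3.x release):


import dis
dis.dis(foo)  # disassembles foo, printing a listing like the one below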





The resulting byte code is:


 0 LOAD_CONST        1 (0)
 3 STORE_FAST        1 (a)
 
 6 SETUP_LOOP       24 (to 33)
 9 LOAD_FAST        0 (lst)
12 GET_ITER
13 FOR_ITER        16 (to 32)
16 STORE_FAST        2 (i)
 
19 LOAD_FAST        1 (a)
22 LOAD_FAST        2 (i)
25 INPLACE_ADD
26 STORE_FAST        1 (a)
29 JUMP_ABSOLUTE      13
32 POP_BLOCK
 
33 LOAD_FAST        1 (a)
36 RETURN_VALUE





As a reminder, LOAD_FAST and STORE_FAST are the opcodes Python uses to access variables that are only used within a function. Since the Python compiler knows, at compile time, how many such variables each function has, they can be accessed via static array offsets rather than a hash table, which makes the access faster (hence the _FAST suffix). But I digress. What really matters here is that the variables a and i are treated identically: both are read with LOAD_FAST and written with STORE_FAST. There is absolutely no reason to believe their visibility differs in any way. [4]
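One more way to observe this (my own addition, not from the listing above): both names show up together in the function's co_varnames, the tuple of fast locals that those opcodes index into:


print(foo.__code__.co_varnames)  # ('lst', 'a', 'i') -- a and i are equally plain locals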



So how does this come to be? Why does the compiler consider i to be just another local variable of foo? The answer lies in the symbol table code, which the compiler runs over the AST before it builds the control flow graph from which the bytecode is eventually generated. More details of this process are in my article about symbol tables, so I'll only mention the essentials here.



The symbol table code doesn't treat for statements as anything special. In symtable_visit_stmt we have:


case For_kind:
  VISIT(st, expr, s->v.For.target);
  VISIT(st, expr, s->v.For.iter);
  VISIT_SEQ(st, stmt, s->v.For.body);
  if (s->v.For.orelse)
    VISIT_SEQ(st, stmt, s->v.For.orelse);
  break;





The index variable is visited like any other expression. Since this code visits the AST, it's worth looking at what the For statement node contains:


For(target=Name(id='i', ctx=Store()),
  iter=Name(id='lst', ctx=Load()),
  body=[AugAssign(target=Name(id='a', ctx=Store()),
          op=Add(),
          value=Name(id='i', ctx=Load()))],
  orelse=[])
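
A dump like this can be produced with the standard ast module (the exact fields shown vary by Python version; this is a minimal sketch):


import ast

tree = ast.parse("for i in lst:\n  a += i\n")
print(ast.dump(tree.body[0]))  # the For node, with ctx=Store() on the Name node for i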





So i lives in a Name node. These are handled by the symbol table code with the following clause in symtable_visit_expr:


case Name_kind:
  if (!symtable_add_def(st, e->v.Name.id,
             e->v.Name.ctx == Load ? USE : DEF_LOCAL))
    VISIT_QUIT(st, 0);
  /* ... */





Since the variable i is clearly marked as DEF_LOCAL (easy to observe because the *_FAST opcodes are used to access it, and also easy to check with the symtable module if disassembly isn't available; see the sketch below), the code above evidently calls symtable_add_def with DEF_LOCAL as its third argument. Now look back at the AST above and note the ctx=Store part of the Name node for i. So it is the AST that already carries the information that i is stored to in the target part of the For node. Let's see how that comes about.
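Here is that symtable check, as a minimal sketch (my own addition; the source is the simplified foo from above):


import symtable

src = """
def foo(lst):
  a = 0
  for i in lst:
    a += i
  return a
"""
mod = symtable.symtable(src, "<example>", "exec")
foo_table = mod.get_children()[0]        # the symbol table for foo
print(foo_table.lookup("i").is_local())  # True: i is an ordinary local of foo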



The AST-building part of the compiler walks the parse tree (a fairly low-level representation of the source code) and, among other things, sets the expr_context attribute on certain nodes, most notably Name nodes. Think of it this way: in the statement


foo = bar + 1





both foo and bar end up in Name nodes. But bar is only being loaded in this code, while foo is being stored to. The expr_context attribute is what distinguishes these uses, for later consumption by the symbol table code [5].
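This distinction is easy to see from Python itself; a minimal sketch using the ast module:


import ast

assign = ast.parse("foo = bar + 1").body[0]
print(type(assign.targets[0].ctx).__name__)  # Store -- foo is being assigned to
print(type(assign.value.left.ctx).__name__)  # Load  -- bar is only being read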



Back to the index variables of our for loops. These are handled in ast_for_for_stmt, the function that creates the AST for for statements. Here is the relevant part of that function:


static stmt_ty
ast_for_for_stmt(struct compiling *c, const node *n)
{
  asdl_seq *_target, *seq = NULL, *suite_seq;
  expr_ty expression;
  expr_ty target, first;
 
  /* ... */
 
  node_target = CHILD(n, 1);
  _target = ast_for_exprlist(c, node_target, Store);
  if (!_target)
    return NULL;
  /* Check the # of children rather than the length of _target, since
    for x, in ... has 1 element in _target, but still requires a Tuple. */
  first = (expr_ty)asdl_seq_GET(_target, 0);
  if (NCH(node_target) == 1)
    target = first;
  else
    target = Tuple(_target, Store, first->lineno, first->col_offset, c->c_arena);
 
  /* ... */
 
  return For(target, expression, suite_seq, seq, LINENO(n), n->n_col_offset,
        c->c_arena);
}





The Store context is created by the call to ast_for_exprlist, which builds the node(s) for the index variable (note that a for loop's index may also be a tuple or other sequence of variables, not just a single one).



This function is the final piece in explaining why the loop variable of a for loop is treated the same as any other variable inside the loop. After this tagging happens in the AST, the code that handles the loop variable in the symbol table and in the VM is exactly the code that handles every other variable.
Conclusion



This article discussed a particular behavior of Python that some may consider a wart. I hope it explained how Python's treatment of variables and scopes produces this behavior, why the behavior is useful and very unlikely to ever change, and how the internals of the Python compiler make it work. Thanks for reading!



[1] I was tempted to make a Microsoft Visual C++ 6 joke here, but the somewhat uncomfortable truth is that in 2015 most readers of this blog would not get it (which reflects my age, not my readers' abilities).



[2] You could argue that dostuffwith(i) can go inside the if, right before the break. But that's not always convenient. Besides, according to Guido's explanation, there is a nice separation of concerns here: the loop is used for searching, and only for searching. What happens to the variable after the search is over is no business of the loop's. I think that's a very good point.



[3] As usual, the code in my articles is based on Python 3; specifically, on the default branch of the Python repository where the next version (3.5) is being developed. For this particular topic, however, the source code of any version in the 3.x series should do.



[4] Another thing made obvious by the disassembly is why i remains invisible if the loop doesn't execute: the GET_ITER and FOR_ITER pair of opcodes treats whatever we're looping over as an iterator and calls its __next__ method. If that call ends up raising StopIteration, the VM catches it and ends the loop. Only when an actual value is returned does the VM go on to execute STORE_FAST on i, binding the value so that subsequent code can refer to it.
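Roughly, in Python-level terms (my paraphrase, with seq standing in for whatever is being iterated over):


it = iter(seq)             # GET_ITER
while True:
  try:
    value = next(it)       # FOR_ITER calls __next__
  except StopIteration:
    break                  # loop ends; nothing was bound this time around
  i = value                # STORE_FAST runs only when a value was actually produced
  # ... loop body ...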



[5] This may seem like a curious design; I suspect the point of it is to enable relatively clean recursive visits of the AST by code such as the symbol table builder and the CFG generator.

