The scope of index variables in Python's for loops


Let's start with a quiz: what does the following function do?

def foo(lst):
    a = 0
    for i in lst:
        a += i

    b = 1
    for t in lst:
        b *= i

    return a, b

If you thought its purpose was "to compute the sum and product of all the elements in lst", don't be discouraged. The error here is often very hard to find. Spotting it in a pile of real code is impressive, and it is much harder to notice when you don't know you are being tested.

The error is using i instead of t in the body of the second loop. But wait, how does that even work? Shouldn't i be invisible outside the first loop? [1] Well, no. In fact, Python formally acknowledges that the names defined as for loop targets (their more rigorous official name being "index variables") leak into the enclosing function scope. So the following code:

for i in [1, 2, 3]:
    pass
print(i)

This code is valid and prints 3. In this article I want to explore why that is so, why it is unlikely to ever change, and also use it as a tracer bullet to dig into some interesting parts of the CPython compiler.

Incidentally, if you doubt that this behavior can cause real problems, consider this snippet:

def foo():
    lst = []
    for i in range(4):
        lst.append(lambda: i)
    print([f() for f in lst])

If you expect this code to print [0, 1, 2, 3], your expectations will be dashed: it prints [3, 3, 3, 3], because there is only a single i in the scope of foo, and that single i is what all the lambdas capture.
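A quick way to convince yourself that the single shared i is the culprit: bind the current value as a default argument so each lambda keeps its own copy (a minimal sketch of a common idiom, not something the rest of this article depends on):

def foo():
    lst = []
    for i in range(4):
        lst.append(lambda i=i: i)  # i is evaluated and bound here, once per iteration
    print([f() for f in lst])

foo()  # prints [0, 1, 2, 3]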
Official description

This behavior is explicitly documented in the "for statement" section of the Python reference manual:

The for-loop makes assignments to the variables in the target list. [...] Names in the target list are not deleted when the loop is finished, but if the sequence is empty, they will not have been assigned to at all by the loop.

Note the last sentence; let's try it:

for i in []:
    pass
print(i)

Indeed, this code raises a NameError. Later we will see that this is a natural consequence of the way the Python VM executes bytecode.
Why would that be?

I actually asked Guido van Rossum about the reasoning behind this behavior, and he generously shared some of the historical background (thanks, Guido!). The motivation is to keep Python's handling of variables and scopes simple, without resorting to hacks (such as deleting all the variables defined in a loop once it completes; think about the exceptions that could raise) or to more complex scoping rules.

Python's scoping rules are simple and elegant: the blocks of modules, classes, and functions introduce scopes. Within a function body, variables are visible from the point of their definition to the end of the block, including in nested blocks such as nested functions. The rules are slightly different for local variables, global variables, and other nonlocal variables, of course, but that doesn't matter much for our discussion.

The important point here is: the innermost possible scope is a function body. Not a for loop body. Not a with block. Unlike some other languages (C and its descendants, for example), Python has no nested lexical scopes below the level of a function.

So if you just went and implemented Python naively, you would probably end up with this behavior anyway. Here is another enlightening snippet:

for i in range(4):
    d = i * 2
print(d)

Are you surprised that d is visible and accessible after the for loop ends? No, that's just how Python works. So why should the scope of the index variable be treated any differently?

By the way, the index variables of list comprehensions also leak into the enclosing scope. Or, more precisely, they did before Python 3.
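A minimal illustration of the difference:

[x for x in range(3)]
print(x)   # Python 2: prints 2, because x leaked out; Python 3: NameError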

Python 3 contains many backwards-incompatible changes, and the leakage of comprehension variables was fixed along with them. Such a fix undoubtedly breaks backward compatibility, which is why I don't believe the current behavior of for loops will ever change.

Besides, many people find this to be a genuinely useful feature of Python. Consider the following code:

for i, item in enumerate(somegenerator()):
    dostuffwith(i, item)
print('The loop executed {0} times!'.format(i + 1))

If somegenerator does not report how many items it yields, this is a concise idiom for counting them. Otherwise you would need a separate counter.

Here's another example:

for i in somegenerator():
    if isinteresting(i):
        break
dostuffwith(i)

This is an effective pattern for using a loop to find an item and then using that item afterwards. [2]

Over the years, many users have asked for this behavior to stay. It is hard to introduce breaking changes even for features the core developers consider harmful; removing a feature that many people find useful and that is used heavily in real-world code is simply not on the table.
Under the Hood

Now for the most interesting part: let's look at how the Python compiler and VM cooperate to make this behavior possible. In this particular case, I think the clearest way to present it is to work backwards from the bytecode. I also want to use this example to show how to dig around inside Python's internals [3] (it's so much fun!).

Let's take a cut-down variant of the function from the beginning of this article:

def foo(lst):
    a = 0
    for i in lst:
        a += i
    return a

The resulting byte code is:

 0 LOAD_CONST        1 (0)
 3 STORE_FAST        1 (a)

 6 SETUP_LOOP       24 (to 33)
 9 LOAD_FAST         0 (lst)
12 GET_ITER
13 FOR_ITER         16 (to 32)
16 STORE_FAST        2 (i)

19 LOAD_FAST         1 (a)
22 LOAD_FAST         2 (i)
25 INPLACE_ADD
26 STORE_FAST        1 (a)
29 JUMP_ABSOLUTE    13
32 POP_BLOCK

33 LOAD_FAST         1 (a)
36 RETURN_VALUE

As a reminder, LOAD_FAST and STORE_FAST are the opcodes Python uses to access variables that are only used within a function. Since the Python compiler knows statically (at compile time) how many such variables each function has, they can be accessed through static array offsets rather than a hash table, which makes the access faster (hence the _fast suffix). But I digress. What really matters here is that the variables a and i are treated identically. Both are fetched with LOAD_FAST and modified with STORE_FAST. There is absolutely no reason to believe that their visibility differs. [4]
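By the way, you can reproduce a listing like the one above yourself with the standard dis module (a minimal sketch; the exact opcodes and offsets vary between CPython versions):

import dis

def foo(lst):
    a = 0
    for i in lst:
        a += i
    return a

dis.dis(foo)  # prints the disassembly of foo's bytecode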

So how does this come about? Why does the compiler decide that the variable i is simply a local of foo? The logic lives in the symbol table code, which runs while the compiler walks the AST, before it creates a control flow graph and generates bytecode from it. More details of that process are covered in my article on symbol tables, so I will only mention the highlights here.

The symbol table code does not treat for statements as anything special. Here is the relevant code from symtable_visit_stmt:

case For_kind:
    VISIT(st, expr, s->v.For.target);
    VISIT(st, expr, s->v.For.iter);
    VISIT_SEQ(st, stmt, s->v.For.body);
    if (s->v.For.orelse)
        VISIT_SEQ(st, stmt, s->v.For.orelse);
    break;

The index variable is visited just like any other expression. Since this code visits the AST, it is worth looking at what sits inside the For node of our statement:

For(target=Name(id='i', ctx=Store()),
    iter=Name(id='lst', ctx=Load()),
    body=[AugAssign(target=Name(id='a', ctx=Store()),
                    op=Add(),
                    value=Name(id='i', ctx=Load()))],
    orelse=[])
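A dump like this can be produced with the standard ast module (a minimal sketch; the exact formatting of the dump varies between Python versions):

import ast

tree = ast.parse("for i in lst:\n    a += i")
print(ast.dump(tree.body[0]))  # the For node, with its Name/Store/Load contexts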

So i sits in a Name node. Name nodes are handled by the symbol table code in the following clause of symtable_visit_expr:

case Name_kind:
    if (!symtable_add_def(st, e->v.Name.id,
                          e->v.Name.ctx == Load ? USE : DEF_LOCAL))
        VISIT_QUIT(st, 0);
    /* ... */

Since the variable i is clearly marked as DEF_LOCAL (this is easy to observe from the *_FAST opcodes, and also, when the bytecode is not available, by examining the symbol table with the symtable module), the code above evidently calls symtable_add_def with DEF_LOCAL as its third argument. Now look back at the AST above and note the ctx=Store part of the Name node for i. So it is the AST that already carries the information that i is being stored to in the target part of the For node. Let's see how that happens.
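Here is a minimal sketch of that symtable check, using the same cut-down foo as above:

import symtable

src = """
def foo(lst):
    a = 0
    for i in lst:
        a += i
    return a
"""

mod = symtable.symtable(src, '<string>', 'exec')
foo_table = mod.get_children()[0]          # the symbol table of foo
print(foo_table.lookup('i').is_local())    # True: i is an ordinary local
print(foo_table.lookup('a').is_local())    # True: and so is a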

The AST-building part of the compiler walks the parse tree (a fairly low-level representation of the source code; some background on it can be found here) and, among other things, sets the expr_context attribute on some nodes, most notably Name nodes. Think about it: in the following statement,

foo = bar + 1

both foo and bar end up in Name nodes. But bar is only loaded by this code, whereas foo is actually stored to. The expr_context attribute is what the symbol table code (among other consumers) later uses to tell these uses apart [5].
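You can see the two contexts directly in a dump of that statement (a minimal sketch; the exact shape of the dump depends on the Python version):

import ast

print(ast.dump(ast.parse("foo = bar + 1").body[0]))
# Roughly: Assign(targets=[Name(id='foo', ctx=Store())],
#                 value=BinOp(left=Name(id='bar', ctx=Load()), op=Add(), ...))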

Back to the index variable of our for loop. It is handled in ast_for_for_stmt, the function that creates the AST for for statements. Here are the relevant parts of that function:

static stmt_ty
ast_for_for_stmt(struct compiling *c, const node *n)
{
    asdl_seq *_target, *seq = NULL, *suite_seq;
    expr_ty expression;
    expr_ty target, first;

    /* ... */

    node_target = CHILD(n, 1);
    _target = ast_for_exprlist(c, node_target, Store);
    if (!_target)
        return NULL;
    /* Check the # of children rather than the length of _target, since
       for x, in ... has 1 element in _target, but still requires a Tuple. */
    first = (expr_ty)asdl_seq_get(_target, 0);
    if (NCH(node_target) == 1)
        target = first;
    else
        target = Tuple(_target, Store, first->lineno, first->col_offset,
                       c->c_arena);

    /* ... */

    return For(target, expression, suite_seq, seq, LINENO(n), n->n_col_offset,
               c->c_arena);
}

Note the Store context passed into the call to ast_for_exprlist, the function that creates the node(s) for the index variable (a for loop's target can also be a tuple of variables, not just a single variable).
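To see the tuple case, dump a loop with multiple targets (a minimal sketch; pairs is just a placeholder name):

import ast

print(ast.dump(ast.parse("for x, y in pairs:\n    pass").body[0]))
# The target is Tuple(elts=[Name(id='x', ctx=Store()),
#                           Name(id='y', ctx=Store())], ctx=Store())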

ast_for_for_stmt is the last link in the chain explaining why a for loop's variable is treated the same as any other variable in the loop. Once it has been marked this way in the AST, the code that handles the loop variable in the symbol table and in the VM is the same code that handles all other variables.
Conclusion

This article discussed a particular behavior of Python that some might consider a gotcha. I hope it explained how Python's variables and scopes behave here, why this behavior is useful and unlikely to ever change, and how the internals of the Python compiler make it work. Thanks for reading!

[1] Here I wanted to make a joke about Microsoft Visual C++ 6, but the disturbing truth is that most readers of this blog in 2015 would not get it (which reflects my age, not my readers' abilities).

[2] You might argue that dostuffwith(i) could simply go inside the if, before the break. However, that is not always convenient. Besides, as Guido explained, there is a nice separation of concerns here: the loop is used for searching, and only for searching. What happens to the loop variable once the search is over is no longer the loop's concern. I think that is a very good point.

[3] As usual, the code in my articles refers to Python 3. Specifically, I am looking at the default branch of the CPython repository, where work on the next release (3.5) is being done. For this particular topic, however, the source code of any version in the 3.x series should do.

[4] Another thing that is obvious from the disassembly is why i remains invisible if the loop does not execute. The GET_ITER and FOR_ITER pair of opcodes treat whatever we loop over as an iterator and call its __next__ method. If that call ends up raising StopIteration, the VM catches it and ends the loop. Only when an actual value is returned does the VM go on to execute STORE_FAST into i, bringing the name into existence for subsequent code to refer to.
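In rough pure-Python terms, this is what happens for the empty sequence:

it = iter([])   # what GET_ITER does
next(it)        # raises StopIteration immediately, so the assignment to i never runs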

[5] This is a somewhat curious design, and I suspect its essence is to allow a relatively clean, recursive visit of the AST by consumers such as the symbol table code and the CFG generator.
