Excessive discussion of the order in which C/C++ expressions are evaluated is unnecessary

Source: Internet
Author: User

Qiu Zongyan: Expression evaluation in C/C++

I often see questions like the following in discussion groups: "Who knows what value the following C statement assigns to n?"
m = 1; n = m++ + m++;
Recently, someone I do not know sent me an email asking why a certain C++ system prints two 4s for the following statement, rather than 4 and 5:
a = 4; cout << a++ << a;
Doesn't C++ specify that << is left-associative? Is C++ itself wrong here, or is there a problem with this system's implementation?
To understand this, you need to understand the following question: if a variable is modified somewhere in a program (through assignment, increment/decrement, and so on), from what point on does reading the variable yield the new value? Someone may say, "What kind of question is that? I modified the variable, so reading it afterwards of course gives the modified value!" In fact, it is not that simple.
C/C++ is an expression-based language: all computation (including assignment) is done in expressions. The statement "x = 1;" is just the expression "x = 1" followed by a semicolon that marks the end of the statement. To understand what a program means, you must first understand what its expressions mean, namely 1) the computation that the expression determines, and 2) its effect on the environment (which can be viewed as the set of all variables available at that moment). If an expression (or subexpression) only computes a value and does not change the environment, we say it is referentially transparent; such an expression does not affect other computations (it does not change the environment in which they run, though its own value may of course be affected by other computations). If an expression not only computes a value but also modifies the environment, we say it has a side effect (because it does something extra). a++ is an expression with a side effect. These remarks apply equally to similar questions in other languages.
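The difference between a referentially transparent computation and one with a side effect can be illustrated with a minimal C++ sketch. The names sum, next_value, and demo are hypothetical and do not come from the article; next_value merely mimics what counter++ does.

```cpp
#include <cassert>

// Referentially transparent: only reads its inputs and computes a value.
int sum(int x, int y) {
    return x + y;
}

// Has a side effect: returns the old value of counter and also modifies it,
// which is what the expression counter++ does.
int next_value(int& counter) {
    int old = counter;
    counter = old + 1;
    return old;
}

// Each side effect below sits in its own full expression.
void demo() {
    int a = 4;
    assert(sum(a, a) == 8);      // a is unchanged by this computation
    assert(sum(a, a) == 8);      // so repeating it gives the same value
    assert(next_value(a) == 4);  // yields the old value ...
    assert(a == 5);              // ... and has changed the environment
}
```

Calling demo() runs the checks; none of the statements depends on an unspecified evaluation order.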
Now the question becomes: if some (sub)expression in a C/C++ program has a side effect, when does that side effect actually become visible? To make the problem concrete, suppose the program contains a fragment "...a[i]++ ... a[j]...", suppose the values of i and j happen to be equal at that point (so a[i] and a[j] refer to the same array element), suppose a[i]++ really is evaluated before a[j], and suppose nothing else modifies a[i] in between. Under these assumptions, is the modification of a[i] by a[i]++ guaranteed to be reflected in the evaluation of a[j]? Note that because it cannot be determined statically whether i and j are equal, the object code must access the two array elements (that is, access memory) through two separate pieces of code. Modern computers do their computation in registers, so the question becomes: before the code that fetches the value of a[j] runs, has the updated value of a[i] already been stored (from a register) back to memory? The answer is clear once you know what the language requires.
A language of this kind usually specifies the latest moment at which a modification of a variable must have taken effect during execution (called a sequence point or execution point). Program execution passes through a series of sequence points (moments in time). The language guarantees that when execution reaches a sequence point, all modifications (side effects) that occurred before it have taken effect (they must be visible to subsequent accesses to the same storage location), and that none of the modifications after it has yet occurred. Between two sequence points there is no such guarantee. The concept of sequence points is especially important for languages such as C/C++ that allow expressions to have side effects.
Now the answer to the question above is clear: if there is a sequence point between a[i]++ and a[j], then a[j] is guaranteed to obtain the modified value; otherwise there is no guarantee.
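One simple way to obtain that guarantee is to put the modification and the read into separate statements, so that the end of the first full expression supplies the sequence point. The following is only a sketch; bump_then_read and demo are hypothetical names introduced for illustration.

```cpp
#include <cassert>

// Increments a[i] in its own statement, then reads a[j].
int bump_then_read(int* a, int i, int j) {
    a[i]++;        // end of a full expression: a sequence point
    return a[j];   // guaranteed to see the new value when i == j
}

void demo() {
    int a[3] = {10, 20, 30};
    assert(bump_then_read(a, 1, 1) == 21);  // same element: new value is seen
    assert(bump_then_read(a, 0, 2) == 30);  // different elements: a[2] untouched
    assert(a[0] == 11);
}
```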
The C/C++ language definition (the language reference manual) defines sequence points precisely. Sequence points occur:
1. At the end of each full expression. Full expressions include variable initializers, expression statements, the expression in a return statement, and the controlling expressions of conditional, loop, and switch statements (a for header has three controlling expressions);
2. After the evaluation of the first operand of the operators &&, ||, ?: and the comma operator;
3. In a function call, after all the arguments and the expression naming the function (the function to be called may itself be given by an expression) have been evaluated, and before execution enters the function body.
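The second item in the list above is what makes some familiar idioms safe. The sketch below shows one function for each of &&, the comma operator, and ?:. All function names are hypothetical, and nullptr assumes C++11 or later.

```cpp
#include <cassert>

// &&: the left operand is completely evaluated before the right one,
// so *p is only read when p is known to be non-null.
bool first_is_positive(const int* p) {
    return p != nullptr && *p > 0;
}

// Comma operator: the increment of x is complete before x * 2 reads x.
int after_comma(int x) {
    return (x++, x * 2);
}

// ?: : the increment in the condition is complete before i is read
// in the selected branch.
int after_condition(int i, int limit) {
    return (i++ < limit) ? i : -1;
}

void demo() {
    int v = 3;
    assert(!first_is_positive(nullptr));
    assert(first_is_positive(&v));
    assert(after_comma(3) == 8);          // x becomes 4, then 4 * 2
    assert(after_condition(0, 1) == 1);   // 0 < 1, then i is 1
    assert(after_condition(5, 1) == -1);  // 5 < 1 is false
}
```

The same expressions written with + or * in place of these operators would have no such ordering guarantee.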
Suppose Ti and Ti+1 are two successive sequence points. By the time execution reaches Ti+1, any C/C++ system (VC and BC are both C/C++ systems) must have carried out all side effects that occur after Ti. However, it is free to carry out each of those side effects at any moment within the interval [Ti, Ti+1], because the C/C++ language permits all of these choices.
In the discussion above we assumed that a[i]++ is evaluated before a[j]. Whether a[i]++ really is evaluated first in a given program fragment depends on the computation that the expression determines. We are all familiar with the rules of precedence, associativity, and parentheses in C/C++, but the order in which the operands of an operator are evaluated is often overlooked. Consider the following examples:
(a + b) * (c + d)    fun(a++, b, a+5)
Which of the two operands of "*" is evaluated first? In what order are fun and its three arguments evaluated? For the first expression the order does not matter, because its subexpressions are referentially transparent. The argument expressions in the second example have a side effect, so the order of evaluation matters a great deal. A few languages specify the evaluation order of operands explicitly (Java specifies left to right), whereas C/C++ deliberately leaves it unspecified: it specifies neither the evaluation order of the two operands of most binary operators (the exceptions are &&, || and the comma operator) nor the evaluation order of function arguments and the called function. To evaluate the second expression, the system first evaluates fun, a++, b, and a+5 in some order, then comes a sequence point, and then execution enters the function.
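Because the language does not say whether a+5 in fun(a++, b, a+5) sees a before or after the increment, the programmer has to state the intended meaning explicitly. The sketch below shows both readings with the side effect moved into its own statement. The body of fun and the names call_with_old_a, call_with_new_a, and demo are assumptions made for illustration only.

```cpp
#include <cassert>

// Hypothetical stand-in for fun: packs its arguments into one number
// so that the values actually passed are easy to check.
int fun(int x, int y, int z) {
    return 100 * x + 10 * y + z;
}

// Reading A: the third argument uses the value of a before the increment.
int call_with_old_a(int& a, int b) {
    int first = a;
    int third = a + 5;
    a++;                       // side effect in its own statement
    return fun(first, b, third);
}

// Reading B: the third argument uses the value of a after the increment.
int call_with_new_a(int& a, int b) {
    int first = a;
    a++;                       // side effect in its own statement
    int third = a + 5;
    return fun(first, b, third);
}

void demo() {
    int a = 1;
    assert(call_with_old_a(a, 2) == 126);  // fun(1, 2, 6)
    assert(a == 2);
    a = 1;
    assert(call_with_new_a(a, 2) == 127);  // fun(1, 2, 7)
    assert(a == 2);
}
```

Either version gives the same result on every conforming compiler, which the original one-line call does not.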
Many books (including some popular ones) get these issues wrong. Some say, for example, that C/C++ evaluates the left (or right) side first, or that a particular C/C++ system evaluates one particular side first. These statements are all wrong! A C/C++ system may always evaluate the left side first, or always the right, or sometimes the left and sometimes the right, or even choose differently within a single expression. Different systems may adopt different orders, different versions of the same system may differ, and the same version may use different orders under different optimization settings or at different places in a program, because all of these choices conform to the language standard. We must also keep sequence points in mind here: even if the expression on one side is evaluated first, its side effect may not yet have been written to memory and so may have no effect on the evaluation of the other side.
Return to the earlier question: "Who knows what value the following C statement assigns to n?"
m = 1; n = m++ + m++;
The correct answer is: nobody knows! The language does not specify what this should compute; the result depends entirely on how a particular system handles it in a particular context. Both the evaluation order of the operands and the moment at which the modifications of the variable take effect are involved. As for:
cout << a++ << a;
we know that it is equivalent to
(cout.operator<<(a++)).operator<<(a);
First look at the outer function call. Here the system must evaluate both the function to be called (which is obtained from the subexpression cout.operator<<(a++)) and the value of the argument a. The language does not specify which is evaluated first. If the function is evaluated first, that evaluation itself involves another function call, and there is a sequence point before the body of the called function is executed, so the side effect of a++ will have taken effect by then and the later read of a yields 5. If the argument is evaluated first, the value 4 of a is obtained, and the later side effect cannot change it (in this case two 4s are output). Of course, these are only hypothetical scenarios. What we really should say is that such code should not be written at all, and that discussing its effect is pointless.
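If the intent is to print 4 and then 5, or to print 4 twice, the output statement can be split so that the increment stands alone. The following is a minimal sketch with hypothetical function names; it writes to a std::ostream parameter so that std::cout or a string stream can be passed in. Note that C++17 and later evaluate the operands of << from left to right, so the original statement is well defined there, while under older standards it is not. The sketch behaves the same under both.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Prints the old value of a, then the incremented value.
void print_old_then_new(std::ostream& out, int& a) {
    out << a;   // full expression: prints the old value
    a++;        // side effect in its own statement
    out << a;   // prints the new value
}

// Prints the old value of a twice, then increments a.
void print_old_twice(std::ostream& out, int& a) {
    out << a << a;  // a is only read here, never modified
    a++;
}

void demo() {
    std::ostringstream s1, s2;
    int a = 4;
    print_old_then_new(s1, a);
    assert(s1.str() == "45");
    assert(a == 5);

    a = 4;
    print_old_twice(s2, a);
    assert(s2.str() == "44");
    assert(a == 5);
}
```

For console output, pass std::cout (from the iostream header) as the first argument.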
Some may ask why the designers of C/C++ did not simply fix the order and spare everyone this trouble. C/C++ deliberately allows the compiler to choose any evaluation order, so that it can arrange the instructions that implement an expression as needed and generate more efficient code. Strictly specifying the order and effect of expression evaluation, as Java does, not only restricts how the language can be implemented but also requires more frequent memory accesses (to carry out side effects), and both can cost considerable efficiency. On this issue C/C++ and Java each follow their own design principles, and each gains something (potential efficiency for C/C++, clearer program behavior for Java) and, of course, loses something. It should also be pointed out that most programming languages in fact adopt rules similar to those of C/C++.
After all this discussion, what conclusion should we draw? The C/C++ language tells us that the result of any expression that depends on a particular evaluation order, or on the effect of a modification between two sequence points, is not guaranteed. The rule to follow in programming is: if a "variable" is referenced more than once within a "full expression" (that is, a computation terminated by a sequence point), then the expression should contain no side effect on that "variable". Otherwise there is no guarantee of getting the expected result. Note that this is not a matter of trying it out on some system, because we cannot test every possible combination of expressions in every possible context. What is under discussion here is the language, not a particular implementation. In short, never write such expressions; otherwise you will run into trouble sooner or later.
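Applying this rule to the statement n = m++ + m++ means deciding what was actually intended and writing it so that each full expression modifies m at most once and does not read it anywhere else. The sketch below shows two plausible readings; the function names are hypothetical, and the article does not say which reading the original questioner had in mind.

```cpp
#include <cassert>

// Reading 1: both operands use the original value of m,
// and m is advanced by two afterwards.
int both_operands_old(int& m) {
    int n = m + m;   // m is only read here
    m += 2;          // the only modification, in its own statement
    return n;
}

// Reading 2: the second operand sees the effect of the first increment.
int left_then_right(int& m) {
    int first = m++;   // initializer is a full expression
    int second = m++;  // so this reads the already incremented m
    return first + second;
}

void demo() {
    int m = 1;
    assert(both_operands_old(m) == 2);  // 1 + 1
    assert(m == 3);
    m = 1;
    assert(left_then_right(m) == 3);    // 1 + 2
    assert(m == 3);
}
```

Both versions leave m at 3 but assign different values to n, which is exactly the ambiguity the one-line form hides.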
Postscript: At an academic conference last year, I saw colleagues presenting papers that discussed the order in which a certain C system evaluates expressions and summarized some "rules" from it. From the discussion I learned that a certain "programmer proficiency test" contains questions of this kind. This left me uneasy. This year I lectured to a class for teachers and found that many teachers of specialized courses are not clear about this basic issue, which convinced me that the problem really is serious. I therefore put together this short article for reference.
Note: More than four years have passed, and many textbooks, new and old, are still tirelessly discussing meaningless questions about C of the kind described in this article. Those who want to learn and use C should take care not to fall into this trap.
