Qiu Zongyan: Expression Evaluation in the C/C++ Languages

You can often see questions like the following in discussion groups: "Who knows what value the C statement below assigns to n?"
m = 1; n = m+++m++;
Recently a friend I have never met sent me an email asking why, on a certain C++ system, the following statement prints two 4s instead of 4 and 5:
a = 4; cout << a++ << a;
Doesn't C++ specify that the << operator is left-associative? Is the C++ book wrong, or is the implementation at fault?
To understand this, one thing has to be clear: if a variable is modified somewhere in a program (by assignment, increment, decrement, and so on), from what point on can the new value be fetched from that variable? One might say, "What kind of question is that? I modified the variable, and when I then read it, of course I get the modified value." It is not that simple.
C/C++ is an expression-based language: all computation (including assignment) is done in expressions. "x = 1;" is the expression "x = 1" followed by a semicolon that marks the end of the statement. To understand what a program means, we must first understand what its expressions mean, that is: 1) the computation that the expression specifies, and 2) its effect on the environment (think of the environment as all the variables available at that moment). If an expression (or subexpression) only computes a value and does not change the environment, we say it is referentially transparent; such an expression does not affect other computations (though its own value may of course be affected by them). If an expression not only computes a value but also modifies the environment, we say it has a side effect (because it does extra work). a++ is an expression with a side effect. These notions apply equally to similar problems in other languages.
The question now becomes: if some expression (or part of one) in a C/C++ program has a side effect, when is that side effect guaranteed to be visible to later uses? To make the problem concrete, suppose a program contains the fragment "... a[i]++ ... a[j] ...", that i and j happen to be equal (so a[i] and a[j] refer to the same array element), that a[i]++ really is evaluated before a[j], and that nothing else modifies a[i] in between. Under these assumptions, will the change that a[i]++ makes to a[i] be reflected in the evaluation of a[j]? Note that because the equality of i and j cannot be determined statically, the two array element accesses (memory accesses) in the object code must be performed by two separate pieces of code. Modern computers do their computation in registers, so the question becomes: before the code that fetches a[j] executes, has the updated value of a[i] already been stored from the register back to memory? The answer is clear once you know what the language specifies on this point.
Programming languages usually specify the latest moment by which a modification of a variable must take effect (called a sequence point, order point, or execution point). The execution of a program passes through a series of sequence points (moments). Once execution reaches a sequence point, all modifications (side effects) initiated before it must have been carried out (and must be visible to subsequent accesses to the same storage location), and none of the modifications that follow it may have taken place yet. Between two sequence points there is no guarantee at all. The concept of a sequence point is especially important for languages such as C/C++ that allow expressions to have side effects.
Now the answer to the question above is clear: if there is a sequence point between a[i]++ and a[j], then a[j] is guaranteed to see the modified value; otherwise there is no such guarantee.
The C/C++ language definition (the language reference manual) defines the concept of a sequence point precisely. Sequence points are located:
1. At the end of each full expression. Full expressions include variable initializers, expression statements, the expression of a return statement, and the controlling expressions of conditional, loop, and switch statements (a for header has three controlling expressions);
2. After the first operand of the operators &&, ||, ?: and the comma operator has been evaluated;
3. In a function call, after all the actual arguments and the function-name expression have been evaluated (the function to be called may itself be designated by an expression) and before the function body is entered.
Suppose ti and ti+1 are two consecutive sequence points. By the time ti+1 is reached, any C/C++ system (VC, BC, and so on are all C/C++ systems) must have carried out all side effects initiated after ti. Of course, it need not wait until ti+1: it may carry out the side effects of this interval at any moment within [ti, ti+1], because the C/C++ language allows these choices.
The discussion so far assumed that a[i]++ is performed before a[j]. Whether a[i]++ really is performed first in a given program fragment also depends on the computation specified by the expression it appears in. We are all familiar with the rules of precedence, associativity, and parentheses in C/C++, but the order in which the operands of an operator are evaluated is often overlooked. Consider the following examples:
(a + b) * (c + d)
fun(a++, b, a+5)
Which of the two operands of "*" is evaluated first? In what order are fun and its three arguments evaluated? For the first expression the order does not matter, because all of its subexpressions are referentially transparent. In the second, an argument expression has a side effect, so the order of evaluation matters a great deal. A few languages do specify the order in which operands are evaluated (Java prescribes left to right), but C/C++ deliberately leaves unspecified the order in which the two operands of most binary operators are evaluated (the exceptions are &&, || and the comma operator), as well as the order in which the arguments of a function call and the expression designating the called function are evaluated. When the second expression is evaluated, fun, a++, b, and a+5 are first evaluated in some order, then comes a sequence point, and then the function body is entered.
Many books get these matters wrong (including some very popular ones). For example, they say that C/C++ evaluates the left operand (or the right operand) first, or that a certain C/C++ system evaluates a certain side first. Such statements are all wrong. A C/C++ system may always evaluate the left side first, or always the right side first; it may sometimes evaluate the left first and sometimes the right, even within a single expression. Different systems may use different orders (all of them conform to the language standard); different versions of the same system may differ; and the same version may use different orders under different optimization settings, because all of these choices conform to the language specification. Keep the sequence point issue in mind here as well: even if the expression on one side is evaluated first, its side effect may not yet be reflected in memory, and so may have no influence on the evaluation of the other side.
Back to the earlier example: "Who knows what value the C statement below assigns to n?"
m = 1; n = m++ + m++;
The correct answer is: nobody knows. The language does not prescribe what the result should be; it depends entirely on how the particular system handles the statement in the particular context, which involves both the order of evaluation and the moment at which the variable modifications take effect. As for:
cout << a++ << a;
we know that it is shorthand for
(cout.operator<<(a++)).operator<<(a);
Looking at the outer function call, both the function to be called (obtained from the parenthesized part, cout.operator<<(a++)) and the value of a have to be evaluated, and the language does not say which is evaluated first. If the function part is evaluated first, that evaluation contains another function call, and there is a sequence point before that function's body is executed, so the side effect of a++ will have taken effect. If the argument is evaluated first, a yields 4, and side effects that occur later cannot change that value (in this case two 4s are output). Of course, all this is hypothetical. The practical point is that code like this should not be written in the first place, so there is little sense in discussing its effect.
One might ask: why did the designers of C/C++ not specify the order clearly and avoid these problems? The C/C++ approach is entirely deliberate. Its purpose is to let the compiler choose any order of evaluation, so that during optimization it can rearrange the instruction sequence that implements an expression as needed and produce more efficient code. Strictly specifying the order and effect of expression evaluation, as Java does, not only restricts how the language can be implemented but also requires more frequent memory accesses (to make side effects take effect), which can cost considerable efficiency. It is fair to say that on this issue C/C++ and Java each made the choice that follows from its own design principles; each gained something (potential efficiency for C/C++, clearer program behavior for Java) and each, of course, lost something. It should also be noted that most programming languages actually adopt rules similar to those of C/C++.
After all this discussion, what conclusion should we draw? The rules of C/C++ tell us that the result of any expression that depends on a particular order of evaluation, or on a modification taking effect between two sequence points, is not guaranteed. The rule to follow in programming is: if a "variable" is referenced more than once within a "full expression" (one whose evaluation ends at a sequence point), then that expression should contain no side effect on that "variable". Otherwise the expected result cannot be guaranteed. Note that this is not a question that can be settled by trying it out on some system, because we cannot test every possible combination of expressions and every possible context. What is being discussed here is the language, not a particular implementation. In short, never write expressions like these, or sooner or later you will run into trouble in some environment.
Postscript: At an academic conference last year, I saw a colleague present an article discussing the order in which a certain C system evaluates such expressions, summing up a number of "laws". I also learned from a discussion that a certain "programmer proficiency test" contained questions of this kind. This made me very uneasy. While teaching a class for teachers this year, I found that many professional teachers are not clear about this basic issue either, which convinced me even more that the problem is serious. Hence this essay, put together for your reference.
Second postscript: More than four years have passed, and many textbooks, new and old, still take pains to discuss these meaningless questions in C (the kind pointed out in this article). Those who want to learn and use C should not get caught up in them.