C++ side effects and sequence points

Source: Internet
Author: User
Tags openssl api volatile
Http://allchange.blog.sohu.com/156456875.html

Side effects and sequence points: shared notes

When execution reaches certain specified points called sequence points, all side effects of previous evaluations must be complete, and no side effects of subsequent evaluations may have taken place.

A simple example of an expression side effect: given int b, a = 5; b = ++a * --a; the value of b might be 25, because a is incremented and then decremented back to 5 before either operand is read, giving 5 * 5. It might equally be 30, because a is incremented to 6 and that value is taken as the left operand, then decremented to 5 for the right operand, giving 6 * 5. The result cannot be relied upon, because the C standard does not say at which moment the value of a is taken. The only thing the structure of the expression implies is that the multiplication consumes the results of ++a and --a; since a is modified twice with no sequence point in between, the behavior is undefined.


The side effect of a function is a change that calling it makes to the value of an actual argument, or some other specific action the call performs. When a function call is used as a statement, its whole purpose lies in its side effects, so calling a function that has no side effects as a statement is pointless.

In C, the term side effect refers to the modification of a data object or file. For example, the statement var = 99;
has the side effect of changing the value of var to 99. Evaluating an expression can also have side effects, such as:
SE = 100
The side effect of evaluating this expression is that the value of SE is set to 100.
A sequence point is a point in program execution at which all side effects of earlier evaluations are complete and no side effects of later evaluations have yet taken place.
The semicolon (;) that ends a C statement marks a sequence point. That is, the side effects of any assignment, increment, or decrement in a statement must be complete before the semicolon. Some operators also contain sequence points, as discussed later. The end of the evaluation of any full expression is also a sequence point. A full expression is an expression that is not a subexpression of another expression; a subexpression is an expression inside an expression. For example:
f = ++e % 3
The whole expression is a full expression. ++e, 3, and ++e % 3 are all subexpressions of it.
With the concept of sequence points in hand, consider a very common error:
int x = 1, y;
y = x++ + x++;
Here y = x++ + x++ is a full expression and each x++ is a subexpression of it. The end of this full expression is a sequence point, and so is the ; in int x = 1, y;. In other words, x++ + x++ lies between two sequence points. The standard specifies that between two sequence points the stored value of an object may be modified at most once. In this example x is clearly modified twice between two sequence points, so the behavior is undefined. Compiling this code with different compilers may give different values of y; 2 and 3 are the commonly seen results. The problem is not analyzed further here: just remember that this is wrong and do not write it. For more detail, see the material quoted below.
The C language standard defines side effects and sequence points as follows:
Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.
In other words:
Accessing a volatile object, modifying an object or a file, and calling a function that performs any of these operations are all side effects; each of them changes the state of the execution environment. Evaluating an expression can also produce side effects. Certain specified points in the execution sequence are called sequence points. At a sequence point, the side effects of all evaluations before that point must be complete, and the side effects of later evaluations must not yet have taken place.

----------------------------------------------------------------------------------------------
Let's take a look at the following code:
int i=7; printf ("%d\n", i++ * i++);
What do you think this prints? 56? No: with the compiler in question the output is 49. Many people will ask why it is not 56. The C FAQ (comp.lang.c FAQ) explains this in detail; the root cause lies in C's sequence points.
Although the postfix increment and decrement operators ++ and -- perform their operations after yielding the old value, the meaning of "after" is often misunderstood. There is no guarantee that the increment or decrement is performed immediately after the old value is yielded and before any other part of the expression is evaluated. The only guarantee is that the update will be performed at some point before the expression is considered "finished" (in ANSI C terms, before the next "sequence point"). In this example, the compiler chose to multiply the old value by itself and to perform both increments afterwards. An increment is guaranteed to have really taken place only once a sequence point has been reached.
The behavior of code that contains multiple, ambiguous side effects has always been undefined. (Loosely speaking, "multiple side effects" means any combination of increment, decrement, and assignment operators that causes the same object to be modified twice, or modified and then inspected, within one expression. This is only a rough definition.) Don't even try to find out how your compiler implements such things (contrary to the ill-advised exercises in many C textbooks); as K&R wisely point out, "if you don't know how they are done on various machines, that innocence may help to protect you."
So what exactly is a sequence point?
A sequence point is a point in time (at the end of the evaluation of a full expression, at the ||, &&, ?:, or comma operators, or just before a function call) at which the dust has settled and all side effects are guaranteed to be complete. The ANSI/ISO C standard states:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored.
The second sentence is the more puzzling one. It says that if an object is written to within a full expression, every access to it within the same expression must be directly involved in computing the value to be written. This rule effectively restricts legal expressions to those in which the accesses demonstrably precede the modification.
For example, i = i + 1 is legal, while a[i] = i++ is not. Why doesn't a[i] = i++; work? The subexpression i++ has a side effect: it modifies the value of i. Because i is also referenced elsewhere in the same expression, the behavior is undefined: there is no way to tell whether the reference (in a[i] on the left-hand side) sees the old value or the new one. So for a[i] = i++; we don't know which element of a[] gets written, but i does get incremented by 1, right?
Not necessarily! Once an expression or program becomes undefined, all aspects of it become undefined.
Why do the && and || operators produce sequence points? These operators have a special short-circuit exception: if the left subexpression determines the final result (true for ||, false for &&), the right subexpression is not evaluated. Left-to-right evaluation is therefore guaranteed, and the same holds for the comma operator. Furthermore, all of these operators (along with ?:) introduce an additional internal sequence point.

Sequence points in C/C++


0. What are side effects

C99 defines them as follows
Accessing a volatile object, modifying an object, modifying a file, or
calling a function that does any of those operations are all side effects,
which are changes in the state of the execution environment.

C++2003 defines them as follows
Accessing an object designated by a volatile lvalue, modifying an object,
calling a library I/O function, or calling a function that does any of
those operations are all side effects, which are changes in the state of
the execution environment.

As can be seen, C99 and C++2003 define side effects in essentially the same way. A program can be viewed as a
state machine: at any moment its state consists of the contents of all its objects and all its files (standard
input and output are files too), and a side effect causes a state transition.

Once a variable is declared with a volatile-qualified type, its value may be changed by events outside the
program. A value read from it is valid only at the moment of the read; if the value is needed again later, it
must be read again and the previous value cannot be reused. For this reason, reading a volatile-qualified
variable is also considered a side effect, not only writing it.

Note that the state of the program is generally not considered to include the contents of CPU registers, unless
a register represents a variable, for example
void foo() {
    register int i = 0;  // i is placed directly in a register, i.e. it is a register variable.
                         // Note that register is only a hint: the variable does not necessarily end up
                         // in a register, and an auto variable declared without the register keyword
                         // may be placed in one as well.
                         // For this example, assume i really is in a register.
    i = 1;               // The register's contents change and so does the program state:
                         // this statement has a side effect.
    i + 1;               // Compilers generally warn here: "warning: expression has no effect".
                         // If the CPU executed this statement it would certainly change some register,
                         // but the program state would not change: apart from the register that
                         // represents i, the program state does not include the contents of other
                         // registers, so this statement has no side effect.
}
In particular, C99 and C++2003 state that an expression with no effect need not be executed
An actual implementation need not evaluate part of an expression if it
can deduce that its value is not used and that no needed side effects
are produced (including any caused by calling a function or accessing
a volatile object).


1. What is a sequence point

C99 and C++2003 define sequence points identically
At certain specified points in the execution sequence called sequence
points, all side effects of previous evaluations shall be complete and
no side effects of subsequent evaluations shall have taken place.

In plain terms, a sequence point is a specially designated position at which all side effects of the
evaluations before it must be complete and none of the side effects of the evaluations after it may have
started.

For example, C/C++ specify that there is a sequence point at the end of a full expression
extern int i, j;
i = 0;
j = i;
In the code above, i = 0 and j = i are each a full expression, and the ; marks the end of each, so there is a
sequence point at each ;. By the definition of a sequence point, at the sequence point after i = 0 and before
j = i, the evaluation of i = 0 and all its side effects (writing 0 into i) must be complete, and none of the
side effects of j = i may have started. The side effect of j = i is to assign the value of i to j, and the
side effect of i = 0 is to set i to 0. If the side effect of i = 0 took place after j = i, j would end up
holding the old value of i, which is clearly wrong.

From the definitions of sequence points and side effects it is easy to see that at a sequence point every
action that may affect the program state has completed. Does it follow that the state of a program at a
sequence point is deterministic? Not necessarily: it depends on how the code is written. However, if the
program's state at a sequence point cannot be determined, the standard classifies such a program as having
undefined behavior, as explained later.


2. The relative order of expression evaluation and side effects

C99 and C++2003 both stipulate
Except where noted, the order of evaluation of operands of individual
operators and subexpressions of individual expressions, and the order
in which side effects take place, is unspecified.

In other words, in the general case C/C++ leave unspecified both the order in which operands are evaluated and
the order in which side effects take place. Why don't C/C++ define these orders in detail? Because C/C++ are
languages that pursue efficiency above almost everything else, and leaving these orders open gives the
compiler more room to optimize, for example
extern int *p;
extern int i;
*p = i++;    // (1)
As said above, in expression (1) it is up to the compiler whether *p or i++ is evaluated first, and the order
in which the two side effects (the assignment to *p and the increment of i) take place is also up to the
compiler. Even the evaluation of the subexpression i++ (which yields the original value of i) and its side
effect (adding 1 to i) need not happen together: the compiler may first assign the original value of i (the
value of the subexpression i++) to *p and only then add 1 to i, which splits the computation of i++ into two
non-adjacent steps. Compilers usually do implement it this way, because evaluating i++ on its own differs from
evaluating it inside *p = i++. For a standalone i++, the usual sequence (ignoring any inc instruction) is:
load i into a register A (this step can be skipped if i is a register variable), add 1 to register A, and
write the new value of register A back to the address of i. For *p = i++, if the subexpression i++ had to be
completed first, then because the value of the expression i++ is the old value of i, an additional register B
and an additional instruction would be needed to carry out *p = i++. If instead the value loaded into A is
first written to *p and the add-1 instruction is executed afterwards, only one register is needed. This
matters on many platforms because the number of registers is limited, especially when someone writes a
statement such as
extern int i, j, k, x;
x = (i++) + (j++) + (k++);
The compiler can first compute the value of (i++) + (j++) + (k++), then add 1 to each of i, j, and k, and
finally write i, j, k, and x back to memory, which is more efficient than completing the full semantics of
each ++ in turn.


3. Restrictions that sequence points place on side effects

C99 and C++2003 have similar provisions, as follows
Between the previous and next sequence point a scalar object shall
have its stored value modified at most once by the evaluation of an
expression. Furthermore, the prior value shall be accessed only to
determine the value to be stored. The requirements of this paragraph
shall be met for each allowable ordering of the subexpressions of a
full expression; otherwise the behavior is undefined.

In other words, an object may be modified only once between two adjacent sequence points, and if an object is
modified, the only purpose for which it may be read between those two sequence points is to determine its new
value (for example, i++ must first read i in order to determine that the new value of i is the old value plus
1). In particular, the standard requires that every allowable order of execution satisfy this condition;
otherwise the behavior of the code is undefined.

Sequence points restrict side effects in this way because the C/C++ standards do not specify the order in
which subexpressions are evaluated or the order in which their side effects take place, for example
extern int i, a[];
extern int foo(int, int);
i = ++i + 1;             // Both modifications of i must be written back to the object, and the final
                         // value of i depends on which write-back happens last. If the assignment is
                         // written back last, i ends up as its old value plus 2; if ++i is written back
                         // last, i ends up as its old value plus 1. The behavior of this expression is
                         // therefore undefined.
a[i++] = i;              // If the left-hand side is evaluated first and the side effect of i++ is
                         // completed, the value on the right is the old value of i plus 1; if the side
                         // effect of i++ is completed last, the value on the right is the old value of
                         // i. The result is again indeterminate, so the behavior of this expression is
                         // undefined.
foo(foo(0, i++), i++);   // For function calls the standard does not specify the order in which
                         // arguments are evaluated, but it does specify a sequence point after all
                         // arguments have been evaluated and before the function body is entered.
                         // This expression can therefore execute in two ways. One is to evaluate the
                         // i++ of the outer foo call first, then the i++ of foo(0, i++), and only then
                         // enter foo(0, i++). There is a sequence point at that entry, but in this
                         // order i has been modified twice between two adjacent sequence points:
                         // undefined.
                         // The other is to evaluate foo(0, i++) first. That call contains a sequence
                         // point, so the second i++ is evaluated after the new sequence point and i is
                         // not modified twice between two adjacent sequence points.
                         // However, as the standard text quoted above says, the behavior is defined
                         // only if every allowable ordering satisfies the requirement, so this code is
                         // still undefined.

It was said earlier that the state of a program at a sequence point is not necessarily deterministic. The
reason is that several side effects may take place between two adjacent sequence points, and the order in
which they take place is unspecified. If more than one side effect modifies the same object, as in the sample
code i = ++i + 1, the result of the program depends on the order in which the side effects take place. And if
an expression both modifies an object and reads it, and the value read is not used to determine the object's
new value, the relative order of the read and the modification can likewise leave the program state not
uniquely determined.
Fortunately, the rule that "an object may be modified only once between two adjacent sequence points, and if
an object is modified, it may be read between those two sequence points only to determine its new value"
guarantees that a program that obeys it has a determinable state at every sequence point.

Note that because operator overloading exists for UDT (user-defined) types and function-call semantics provide
new sequence points, some expressions that are undefined behavior for built-in types may be well-defined for
a UDT, for example
i = i++;    // If i is an object of a built-in type, the expression modifies i twice between two adjacent
            // sequence points: undefined.
            // If i has a UDT type, the expression may be i.operator=(i.operator++(int)). There is a
            // sequence point after the function argument is evaluated, so the expression does not
            // modify i twice between two adjacent sequence points: OK.

This shows that common questions such as printf("%d,%d", i++, i++) are simply wrong, and it makes no sense to
use them as written-test or interview questions.
A similar problem arises in cout << i++ << i++. If overload resolution selects the member function
operator<<, it is equivalent to (cout.operator<<(i++)).operator<<(i++); otherwise it is equivalent to
operator<<(operator<<(cout, i++), i++). If i is an object of a built-in type, this has the same problem as
foo(foo(0, i++), i++): it is undefined behavior, because there is an execution path on which i is modified
twice between two adjacent sequence points. If i is a UDT it is well-defined, just like i = i++, but writing
it is still not recommended: the standard leaves the order of evaluation of function arguments unspecified,
so you cannot predict which i++ is computed first, which remains a portability problem and should be avoided.


4. Compiler optimization across sequence points

According to the preceding discussion, the permitted behaviors for one variable i within a single expression are
A. No reads, one write, for example
i = 0;
B. One or more reads and one write, where every read is used only to determine the new value written, for example
i = i + 1;          // one read, one write
i = i & (i - 1);    // two reads, one write; thanks to Puke for the example
C. No writes, one or more reads, for example
j = i & (i - 1);

In cases B and C the compiler has some freedom to optimize: it may read the value of the variable only once
and then use that value several times.

However, when the variable has a volatile-qualified type, no definitive answer has been found so far as to
exactly what the compiler is permitted to do. ctrlz holds that reading the same volatile-qualified object
several times between two adjacent sequence points is still undefined behavior, because a read has a side
effect and that side effect is equivalent to modifying the object. roachcock holds that reading the same
volatile-qualified object several times between two adjacent sequence points should be legal, but must not be
optimized into a single read. An example of code that is common in embedded development:
extern volatile int i;
if (i != i) {    // detect whether i has changed within a very short time
    // ...
}
If i != i is optimized into a single read, the result is always false, so roachcock holds that the compiler
may not optimize multiple reads of a volatile-qualified variable into one. ctrlz holds that this code is
itself incorrect and should be rewritten as
int j = i;
if (j != i) {    // separate the multiple reads of the volatile-qualified variable with a sequence point
    // ...
}

Although it cannot yet be determined whether reading a volatile-qualified variable several times between two
adjacent sequence points is legal, or how it will be optimized (in any case, such code should be avoided as
far as possible), it is certain that a volatile-qualified variable must be re-read after each sequence point:
volatile exists to stop the compiler from optimizing across sequence points. Multiple reads of a
non-volatile-qualified variable across sequence points, on the other hand, may be optimized into a single
read (until some statement or function changes the variable, the compiler may assume that a
non-volatile-qualified variable does not change, because the current C/C++ abstract machine model is
single-threaded), for example
bool flag = true;
void foo() {
    while (flag) {    // (2)
        // ...
    }
}
If the compiler detects that no statement in foo() (including the functions foo() calls) modifies flag, it
may optimize (2) so that flag is read only once on entry to foo() rather than once per iteration. This
cross-sequence-point optimization is likely to produce an infinite loop. Yet such code is common in
multithreaded programs: although foo() does not modify flag, a function running in another thread may modify
it to terminate the loop. To avoid the error this optimization introduces, flag should be declared
volatile bool. C++2003 describes volatile as follows
[Note: volatile is a hint to the implementation to avoid aggressive
optimization involving the object because the value of the object
might be changed by means undetectable by an implementation. See 1.9
for detailed semantics. In general, the semantics of volatile are
intended to be the same in C++ as they are in C.]


5. List of sequence points defined by C99

- The call to a function, after the arguments have been evaluated.
- The end of the first operand of the following operators:
  logical AND &&;
  logical OR ||;
  conditional ?;
  comma ,.
- The end of a full declarator:
  declarators;
- The end of a full expression:
  an initializer;
  the expression in an expression statement;
  the controlling expression of a selection statement (if or switch);
  the controlling expression of a while or do statement;
  each of the expressions of a for statement;
  the expression in a return statement.
- Immediately before a library function returns.
- After the actions associated with each formatted input/output function
  conversion specifier.
- Immediately before and immediately after each call to a comparison
  function, and also between any call to a comparison function and any
  movement of the objects passed as arguments to that call.


6. List of sequence points defined by C++2003

All sequence points defined by C99 are also sequence points in C++2003.
In addition, C99 only specifies a sequence point immediately before a library function returns and does not
say that there is one when an ordinary function returns, whereas C++2003 states explicitly that there is a
sequence point on function entry and on function exit, that is, there is also a sequence point after the
return value of a function has been copied.

Note that operator||, operator&&, and operator, can be overloaded. When they are, function-call semantics
apply and the sequence points of the built-in operators are not provided; there is only a single sequence
point after all the function's arguments have been evaluated. Function-call semantics also do not support
the short-circuit behavior of || and &&. These changes can lead to errors that are hard to find, so
overloading these operators is generally not recommended.


7. Effect on sequence points of two lvalue-related changes in C++2003

In C the result of an assignment operator is not an lvalue; C++2003 changed the result of assignment
operators to an lvalue. It is unclear what this change is meant to achieve for built-in types, but it makes
a good deal of legal C code undefined behavior in current C++, for example
extern int i;
extern int j;
i = j = 1;
Because the result of (j = 1) is an lvalue and it serves as the right operand of the assignment to i, an
lvalue-to-rvalue conversion is required. This conversion has read semantics, so i = j = 1 means: first
assign 1 to j, then read the value of j and assign it to i. The behavior is undefined, because the standard
says that a read between two adjacent sequence points may only be used to determine the new value of the
modified object, and not to re-read it after the modification.
Because C++2003 makes the result of assignment operators an lvalue, the following code, which is illegal in
C99, compiles in C++2003
extern int i;
(i += 1) += 2;
According to C++2003 the behavior of this code is clearly undefined: it modifies i twice between two
adjacent sequence points.

A similar problem arises with the prefix ++/-- operators on built-in types. C++2003 changed the result of
prefix ++/-- from an rvalue to an lvalue, which makes even the following code undefined behavior
extern int i;
extern int j;
i = ++j;
Again the reason is that an lvalue used as the right operand of an assignment operator requires an
lvalue-to-rvalue conversion, which results in a read that takes place after the object has been modified.

This change in C++ was clearly ill-considered and turns many C idioms into undefined behavior. In 1999
Andrew Koenig submitted a proposal to the C++ standards committee to add new sequence points to assignment
operators, but so far the committee has not reached agreement on the issue. I will attach Andrew Koenig's
proposal; anyone who has the time and interest can read it, and nothing is lost by skipping it :-)