Data types are the most basic components of a programming language, yet they are also the most easily overlooked. Programmers will spend nearly all of their energy on algorithm research, program flow control, and similar matters, but rarely give data types a second thought.
Details determine success or failure. One faulty screw can bring down an aircraft, and one data type error can crash a huge software system.
The data type rules in MISRA-C fall mainly into two groups: programming style related to data types, and conversion between different data types. The latter is the focus. This article introduces some of MISRA-C's rules on data types; for the rest, see MISRA-C:2004.
In what follows, every rule is mandatory (required) unless stated otherwise. Advisory rules are marked "(recommended)".
Before the discussion begins, here are two questions to keep in mind while reading this chapter.
Question 1: What is the value of result_8 when executing the following program?
uint8_t port = 0x5a;
uint8_t result_8;
result_8 = (~port) >> 4;
/* Note: uint8_t indicates an 8-bit unsigned integer */
Question 2: What is the value of d when executing the following program?
uint16_t a = 10;
uint16_t b = 65531;
uint32_t c = 0;
uint32_t d;
d = a + b + c;
/* Note: uint16_t indicates a 16-bit unsigned integer, and uint32_t indicates a 32-bit unsigned integer */
1. Programming Style Related to Data Types
Rule 6.3 (recommended): Every data type should be explicitly identified with a typedef that indicates its size and signedness, so that the standard data types are not used directly.
For example, on a system with 32-bit integers, the types can be defined as follows:
typedef char char_t;
typedef signed char int8_t;
typedef signed short int16_t;
typedef signed int int32_t;
typedef signed long int64_t;
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned int uint32_t;
typedef unsigned long uint64_t;
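Definitions like these are only correct for one particular compiler, so it can be useful to have the compiler confirm them. The following is a minimal sketch, assuming a C11 compiler (for _Static_assert) and a target where short is 16 bits and int is 32 bits. The my_ prefix and the helper my_uint32_bits are hypothetical names chosen here to avoid clashing with <stdint.h>; they are not part of MISRA-C.

```c
#include <limits.h>   /* CHAR_BIT */

/* Hypothetical project typedefs; the my_ prefix is for illustration only. */
typedef unsigned char  my_uint8_t;
typedef unsigned short my_uint16_t;
typedef unsigned int   my_uint32_t;

/* Compilation stops here if a typedef does not have the width its name promises. */
_Static_assert(sizeof(my_uint8_t)  * CHAR_BIT == 8,  "my_uint8_t must be 8 bits");
_Static_assert(sizeof(my_uint16_t) * CHAR_BIT == 16, "my_uint16_t must be 16 bits");
_Static_assert(sizeof(my_uint32_t) * CHAR_BIT == 32, "my_uint32_t must be 32 bits");

/* Returns the width in bits of my_uint32_t as seen by this compiler. */
unsigned int my_uint32_bits(void)
{
    return (unsigned int)(sizeof(my_uint32_t) * CHAR_BIT);
}
```

On a 16-bit compiler where int is 16 bits, the third assertion would fail at compile time, which is the signal that the typedef file has to be adapted for that target.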
The reason for replacing standard type names such as signed short and unsigned int with int16_t and uint32_t is that different compilers define different lengths for the standard data types. For example, a 16-bit system may define both short and int as 16 bits and long as 32 bits, which differs from the lengths on the 32-bit system above. Defining variables with names such as int16_t and uint32_t has two benefits. First, the program becomes more readable: the programmer and other readers can be sure of the size and signedness of every piece of data. Second, porting the program between systems becomes easier, which saves development time and reduces risk.
Rule 7.1: Do not use octal constants (other than 0) or octal escape sequences.
Consider the following array:
code[1] = 109;
code[2] = 100;
code[3] = 052;
code[4] = 071;
/* Note: an octal constant is written with a leading 0 */
The actual value of code[3] is 42 (decimal), and the actual value of code[4] is 57 (decimal). Many readers, however, would probably read code[3] as 52 (decimal) and code[4] as 71 (decimal).
Octal numbers are used far less often in C programs than decimal and hexadecimal numbers. To keep programs readable and safe, programmers are not allowed to use octal constants or octal escape sequences.
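The misreading described above can be checked directly. This is a small sketch; the function name fill_codes and the extra element code[0] are assumptions added here so that the array is fully initialized.

```c
/* Fills the array as in the example; elements 3 and 4 use octal constants. */
void fill_codes(int code[5])
{
    code[0] = 0;     /* plain zero is the one constant the rule exempts */
    code[1] = 109;
    code[2] = 100;
    code[3] = 052;   /* octal: 5*8 + 2 = 42 decimal */
    code[4] = 071;   /* octal: 7*8 + 1 = 57 decimal */
}
```

Writing 42 and 57 directly, or 0x2A and 0x39, states the same values without the ambiguity.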
2. Data Type Conversion
If the programmer clearly understands data type conversion and writes correct explicit casts wherever they are needed, the program is safe. Sometimes, however, through negligence or too much trust in the compiler's "wisdom", expressions end up containing many implicit conversions (that is, conversions with no explicit cast), and an implicit conversion can easily hide a critical defect. The data type conversion rules in MISRA-C focus on preventing the defects caused by implicit conversion.
Before turning to MISRA-C's rules on data type conversion, we first introduce the "balance" principle for integer operands. Under this principle, the compiler widens the operands of an implicit expression according to fixed rules, and int and unsigned int play a central role in the process.
Take the simple implicit integer expression c = a + b as an example (assuming that a occupies no more storage bits than b). The compiler processes this expression as follows:
If b is a small integer (a type narrower than int, such as char or short) or an integer (int or unsigned int), then a is also a small integer or an integer. Before the "+" is executed, both a and b are widened to an integer (int or unsigned int), and the sum is assigned to c. If c is not of type int or unsigned int, the assignment also involves an implicit widening or truncation.
If b is a long integer (a type wider than int), a is widened to the same long integer type as b before the "+" is executed, and the result is assigned to c (again possibly with implicit widening or truncation).
Most operators follow this "balance" principle when applied to integer operands, including the arithmetic, bitwise, and relational operators.
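The two cases of the "balance" principle can be observed through the size of the result type. The sketch below assumes a compiler where int is narrower than 64 bits; the function names are hypothetical and used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Two 8-bit operands: both are widened to int before the addition. */
size_t small_sum_size(void)
{
    uint8_t a = 1U;
    uint8_t b = 2U;
    return sizeof(a + b);   /* sizeof inspects only the type; nothing is added */
}

/* One 8-bit and one 64-bit operand: the narrower one is widened to 64 bits. */
size_t mixed_sum_size(void)
{
    uint8_t a = 1U;
    int64_t b = 2;
    return sizeof(a + b);
}
```

Comparing small_sum_size() with sizeof(int) and mixed_sum_size() with sizeof(int64_t) shows which type the compiler actually used for each addition.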
Logical operators, however, do not follow the "balance" principle. The left shift (<<) and right shift (>>) operators do not follow it either: the type of the result depends only on the integer operand on the left of the shift operator. For example, if an 8-bit unsigned integer holds 0xF5 (hexadecimal), shifting it right by four bits gives 0x0F (hexadecimal).
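The behaviour of the shift operators can be sketched in the same way. The function names below are hypothetical, and the sketch again assumes that int is narrower than 64 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* The shift count is 64 bits wide, yet the result type follows the left operand. */
size_t shift_result_size(void)
{
    uint8_t value = 0xF5U;
    int64_t count = 4;
    return sizeof(value >> count);   /* type only; the shift is not executed */
}

/* The example from the text: an 8-bit unsigned value shifted right by four bits. */
uint8_t shift_right_four(uint8_t value)
{
    return (uint8_t)(value >> 4);
}
```

Here shift_result_size() matches sizeof(int), not sizeof(int64_t), and shift_right_four(0xF5) yields 0x0F.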
With this background in place, return to Question 1 from the beginning of this article. Most people with embedded C experience will see that the code is meant to invert the port value and shift it right by 4 bits before storing it in result_8 (for example, when an I/O port drives a common-anode device). The expected result is clearly result_8 = 0x0A. However, because of the integer "balance" principle, ~port is 0xFFA5 with a 16-bit compiler and 0xFFFFFFA5 with a 32-bit compiler. In either case the final result, after the 4-bit right shift and the truncation on assignment to result_8, is result_8 = 0xFA rather than the 0x0A the programmer expected.
If the last line of code is changed to result_8 = (uint8_t)(~port) >> 4; then result_8 gets the expected value.
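Both variants of Question 1 can be put side by side. This is a sketch with hypothetical function names. Note one assumption: right-shifting a negative int is implementation-defined in C, and the value 0xFA relies on the compiler shifting arithmetically, as common compilers do.

```c
#include <stdint.h>

/* Question 1 as written: ~port is computed in int, so the high bits are set. */
uint8_t invert_and_shift_naive(uint8_t port)
{
    uint8_t result_8;
    result_8 = (~port) >> 4;            /* shifts a negative int, then truncates */
    return result_8;
}

/* With the cast: the inverted value is cut back to 8 bits before the shift. */
uint8_t invert_and_shift_cast(uint8_t port)
{
    uint8_t result_8;
    result_8 = (uint8_t)(~port) >> 4;   /* the cast binds tighter than >> */
    return result_8;
}
```

For port = 0x5a, the first function gives 0xFA under the stated assumption, and the second gives 0x0A.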
To address this situation, MISRA-C provides a corresponding rule.
Rule 10.5: When the bitwise operator ~ is used together with a shift operator (<< or >>) on an operand of type unsigned char or unsigned short, the result of the intermediate step must immediately be cast explicitly to the expected small integer type.
In order to deepen our understanding of the "balance" principle, let's analyze "Question 2 ".
If the program is compiled with a 32-bit compiler, the final result is d = 65541, and the programmer is "lucky" enough to get the expected result. With a 16-bit compiler, the result is d = 5.
Because the "+" operator is left-associative, d = a + b + c is equivalent to d = (a + b) + c: a + b is evaluated first, c is then added to that sum, and the final result is assigned to d. The problem lies in the intermediate step a + b. Both a and b are 16-bit integers (and the compiler's int is also 16 bits), so the result of a + b is a 16-bit integer as well, and its value is 0x0005 (the addition overflows). That value is then widened to the 32-bit integer 0x00000005 and added to c, giving d = 5, which is not what the programmer expected.
Therefore, with a 16-bit compiler, the code in Question 2 can cause a serious error. The programmer could use parentheses to set the order of evaluation, writing the last line as d = a + (b + c), and this would also avoid the overflow, but it treats the symptom rather than the cause. Code is only guaranteed to be safe when the actual data type of every operand is stated explicitly.
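The 16-bit result can be reproduced on any compiler by truncating the intermediate sum by hand. The sketch below does that and contrasts it with the explicit version; both function names are hypothetical, and the first function is only a model of a 16-bit int, not code one would ship.

```c
#include <stdint.h>

/* Models a compiler whose int is 16 bits: a + b is cut to 16 bits before c is added. */
uint32_t sum_with_16bit_int(uint16_t a, uint16_t b, uint32_t c)
{
    uint16_t partial = (uint16_t)(a + b);   /* wraps modulo 65536 */
    return (uint32_t)partial + c;
}

/* Every operand is widened explicitly, so no intermediate step can overflow. */
uint32_t sum_explicit(uint16_t a, uint16_t b, uint32_t c)
{
    return (uint32_t)a + (uint32_t)b + c;
}
```

With a = 10, b = 65531, and c = 0, the first function returns 5 and the second returns 65541, independent of the width of int.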
MISRA-C imposes strict restrictions on implicit data type conversion in expressions.
Rule 10.1: The value of an integer expression must not be implicitly converted to a different type if any of the following holds:
① the conversion is not to a wider integer type of the same signedness;
② the expression is a complex expression;
③ the expression is not a constant expression and is a function argument;
④ the expression is not a constant expression and is the return expression of a function.
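One way to stay clear of these conditions is to write every conversion as a cast. The sketch below is an illustration of that habit for conditions ①, ② and ③, based on a reading of the conditions as listed above, not an official MISRA-C example. All function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper that takes a 32-bit parameter. */
static uint32_t scale(uint32_t x)
{
    return x * 2U;
}

/* Condition 1, change of signedness: the conversion is written out. */
uint16_t to_unsigned(int16_t s)
{
    return (uint16_t)s;
}

/* Condition 2, complex expression: cast each operand before adding,
 * instead of letting a 16-bit sum be widened implicitly afterwards. */
uint32_t add_wide(uint16_t a, uint16_t b)
{
    return (uint32_t)a + (uint32_t)b;
}

/* Condition 3, non-constant function argument: convert at the call site. */
uint32_t scale_u16(uint16_t v)
{
    return scale((uint32_t)v);
}
```

In each function the type of the returned expression already matches the declared return type, so condition ④ is not triggered either.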
Rule 10.2: The value of a floating-point expression must not be implicitly converted to a different type if any of the following holds:
① the conversion is not to a wider floating-point type;
② the expression is a complex expression;
③ the expression is a function argument;
④ the expression is the return expression of a function.
The rule for integer expressions is similar to the rule for floating-point expressions, but the floating-point rule is stricter: it restricts constant expressions as well.
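The floating-point case can be sketched in the same spirit. The example assumes IEEE 754 arithmetic, where float carries a 24-bit significand, so that 16777216 + 1 cannot be represented in a float. The function names are hypothetical.

```c
/* The sum is formed in float and only then widened: precision may already be lost. */
double add_then_widen(float a, float b)
{
    float sum = a + b;
    return (double)sum;
}

/* Each operand is widened first, so the addition itself is carried out in double. */
double add_as_double(float a, float b)
{
    return (double)a + (double)b;
}
```

Called with 16777216.0f and 1.0f, add_as_double returns 16777217.0, while add_then_widen typically returns 16777216.0 because the float sum is rounded before it is converted.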
These two rules refer to the concept of a "complex expression". Note that this concept in MISRA-C differs from the one used in other books on C programming standards. In MISRA-C, a non-complex expression is essentially limited to a constant expression or the return value of a function. To clarify what "complex expression" and "return expression" mean in the rules above, here is an example. Define a function uint16_t Foo(void) with the following body:
uint16_t Foo(void) {
    return (a + b + c);
}
The a + b + c in the function's final statement, return (a + b + c), is the return expression. If a statement such as a = Foo() appears elsewhere in the C program, it uses the return value of the Foo() function. In MISRA-C, such an assignment expression is not a "complex expression".
For reasons of space, the case of expressions used as function parameters is not discussed in detail here.
On balance, where data type conversion is involved, it is better to use explicit casts to state the actual data type of every operand than to spend a great deal of effort working out whether an implicit expression is on the MISRA-C "blacklist". This is the safest approach. In short, the central idea of MISRA-C's data type conversion rules is to require the programmer to make the actual data type of every operand explicit.
3. Conclusion
The first step toward being a good programmer is to treat every piece of data in the program rigorously and to understand what matters in every data operation, and so to write the clearest, most understandable, and safest code. The MISRA-C rules on data types help ensure that programmers do not stumble at this step.