[C] 02-program structure and preprocessing

Source: Internet
Author: User

Before diving into C syntax, it is worth looking at the overall shape of a C program and the elements it is made of. This material is unfamiliar to most people, yet it is the starting point and skeleton of C. The background and details involved could each be expanded into a separate topic; here I only give a brief overview.

1. C Program Composition

Every program first exists as source files, which are ordinary text files. A C program generally consists of a set of files with the suffixes .c and .h. The former contain the executable content of the program, and the latter contain declarations and definitions. The file names themselves do not matter to the language; the extensions are only a convention. Still, I recommend keeping this style: first, it makes the program easy to read, and second, the extensions have become the way integrated development environments (IDEs) identify C files.

A text file consists of characters, and how those characters are encoded is determined by the editor. What matters to the C compiler is the characters rather than the encoding itself: before preprocessing begins, the file content is mapped to the source character set (typically UTF-8). Preprocessing is performed in this character set. After preprocessing, character constants and string literals are mapped to the execution character set, which is determined by the target platform but is generally the same as the source character set.

Both character sets contain the basic character set, that is, the English letters, digits, and symbols we normally use (their encoding is the same in both sets); letters are case sensitive. The newer standard also supports extended characters, which may appear in both character sets. For example, Unicode characters can be used in several places in the source character set: identifiers, character constants, string literals, header names, comments, and preprocessing tokens. The following is an example (the compiler must support it, possibly behind a switch), but this coding style is not recommended.

// Define variable α
wchar_t \u03B1 = L'α';

C is compiled one translation unit at a time; a translation unit is a preprocessed .c file. Each unit is compiled independently, and the linker finally combines the units with libraries into an executable. I plan to cover compiling, linking, and debugging in a separate topic. The following is a common error in multi-file programs. Neither the compiler nor the linker reports it, because types are not checked across units. At run time, file 2 treats the contents of the first elements of array a as an address, and the program fails.

// File 1
int a[3];

// File 2
extern int *a;  // wrong, should be: extern int a[];

C programs may run standalone (embedded) or under an operating system. The C standard has slightly different requirements for the two cases, called freestanding implementation and hosted implementation. The latter requires more of the library and the program must have a main function. The former requires only a few essential headers, and the program entry point is not specified (though using main is still recommended). main can take the following two forms; for the second form, the standard requires that argv[0] represents the program name and that argv[argc] is NULL.

int main(void);
int main(int argc, char* argv[]);  // or char** argv
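As a small illustration of the argv[argc] == NULL guarantee, the sketch below walks an argument vector without looking at argc. The function name count_args is made up for this example; in a real program it would be called from main with the argv it received.

```c
#include <stddef.h>

/* Counts the arguments that follow the program name by walking argv
   until the terminating NULL pointer. The name count_args is
   hypothetical, chosen only for this illustration. */
int count_args(char *argv[])
{
    int n = 0;

    if (argv[0] == NULL)            /* a host may pass no program name */
        return 0;
    for (char **p = argv + 1; *p != NULL; ++p)
        ++n;
    return n;
}
```

For a command line such as "prog -v file.c", the function would return the same value as argc - 1.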

When the program runs, memory holds a heap and a stack in addition to the constant area (code and string literals) and the data area. Generally the stack grows from high addresses toward low addresses. The addresses seen by a hosted program are generally logical addresses, and the OS is responsible for mapping them to physical addresses at run time.

2. Preprocessing Steps

Step 1: character set mapping. The characters in the source text file are mapped to the source character set, which includes unifying the encoding of line breaks. C also requires every line to end with a line break. If the file does not end with one, the compiler issues a warning (when that warning is enabled).

Step 2: trigraph sequences. To support some old keyboards, C uses ??x sequences to represent the symbols those keyboards lack. So avoid writing a ?? sequence in your code; a question mark can be escaped as \? when needed. The following table lists the trigraph sequences supported by the standard; a ?? sequence that is not in the table is left unchanged.

Trigraph      ??(  ??)  ??<  ??>  ??=  ??/  ??'  ??!  ??-
Replacement   [    ]    {    }    #    \    ^    |    ~
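The \? escape mentioned above can be sketched as follows. The string is meant to hold two question marks followed by an exclamation mark; writing the second question mark as \? keeps the source text free of a trigraph, so the result is the same whether or not the compiler has trigraph support switched on. The names safe_text and safe_text_length are made up for this example.

```c
#include <string.h>

/* The escape \? separates the two question marks in the source text,
   so the literal is never read as a trigraph. */
static const char safe_text[] = "?\?!";

/* Returns the number of characters stored in safe_text. */
size_t safe_text_length(void)
{
    return strlen(safe_text);
}
```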

Step 3: remove "\ + newline". Each backslash immediately followed by a line break is removed, without adding or deleting any spaces. Therefore even an identifier can be split across lines, but the feature is generally used for macro definitions and for breaking long strings (see the schematic code). Note that the leading white space of the next line is retained, so inside a string you cannot add spaces to align the continuation.

#define INC(a)           \
{                        \
    a++;                 \
}
char str[] = "this is a \
long string";
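A compilable sketch of the same idea, assuming nothing beyond the standard library: a macro body continued over several lines, and a string literal whose continuation starts at column 0 so that no extra spaces end up inside it. The names CLAMP_TO_ZERO, spliced, and the two helper functions are invented for this example.

```c
#include <string.h>

/* A macro body continued over several lines with backslash-newline. */
#define CLAMP_TO_ZERO(x) \
    do {                 \
        if ((x) < 0)     \
            (x) = 0;     \
    } while (0)

/* The literal continues at the very start of the next line; any
   indentation there would become part of the string. */
static const char spliced[] = "this is a \
long string";

/* Returns 1 if the spliced literal equals the one-line spelling. */
int spliced_is_joined(void)
{
    return strcmp(spliced, "this is a long string") == 0;
}

/* Returns v, or 0 when v is negative. */
int clamp_demo(int v)
{
    CLAMP_TO_ZERO(v);
    return v;
}
```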

Step 4: preprocessing tokens. Each comment is replaced with a single space, and the text is parsed into preprocessing tokens (pp-tokens) and white space. White space includes spaces, line breaks, tabs, and so on. Line breaks are retained; how other white space is handled is up to the implementation. /* */ comments can span several lines but cannot be nested. If you want to temporarily disable a piece of code, it is better to use #if 0. A // comment runs to the end of the line; old versions of C do not support it.

// should use #if 0
/*
int a; /* declare var */
*/
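The #if 0 approach can be sketched like this. The disabled region may itself contain comments, which is exactly what a surrounding /* */ pair cannot handle. The function name active_value is made up for this example.

```c
/* Returns 1: everything between #if 0 and #endif is skipped. */
int active_value(void)
{
    int value = 1;
#if 0
    value = 2;      /* disabled code may contain comments */
    value += 40;    /* and as many lines as needed */
#endif
    return value;
}
```

Changing the 0 to 1 switches the region back on without touching the code inside it.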

Step 5: preprocessing directives. Macros are expanded, included files are brought in, and the preprocessing directives are executed until the end of the unit. A preprocessing directive starts with # and extends only to the end of its line. White space may appear before and after the #.

Step 6: string processing. Characters in character constants and string literals are mapped to the execution character set, including escape sequences such as \x codes. Adjacent string literals are concatenated into one string, which receives only one terminator '\0'.

char str[] = "This is a "
             "long string";
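A short sketch of the concatenation rule, including a literal that comes from a macro. The names GREETING, message, and the helper functions are invented for this example; the point is that the joined string has a single terminator.

```c
#include <string.h>

#define GREETING "Hello"

/* Three adjacent literals (one produced by a macro) form one string. */
static const char message[] = GREETING ", " "world";

/* Size of the array: 12 characters plus one terminating '\0'. */
size_t message_size(void)
{
    return sizeof message;
}

/* Returns 1 if the joined literal equals the one-piece spelling. */
int message_matches(void)
{
    return strcmp(message, "Hello, world") == 0;
}
```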

Step 7: C tokens. The pp-tokens are converted to C tokens, and white space is discarded.

3. Token Parsing

A programming language is analyzed lexically, syntactically, and semantically on top of its character set. Lexical analysis in C breaks the program into tokens, and it is completed in the preprocessing phase. At this stage tokens do not carry much meaning; they are only roughly classified by the shape of their character sequences, and the compiler later parses them based on these classes. Tokenization follows the greedy principle (also called the longest-match principle): each token is the longest character sequence that can form a token, so the character following a token cannot combine with it into a valid token. A segmentation that violates the greedy principle is never adopted, even if it would make sense. During preprocessing, tokens are roughly divided into four categories: identifier, data constant, string, and punctuator.

An identifier is a name. It covers directives, keywords, objects, functions, tags, members, type names, labels, macros, and so on; put simply, it names a thing. The lexical rules are familiar to everyone: an identifier consists of letters, digits, and '_', but cannot begin with a digit. Identifiers should not be too long, because some compilers truncate them. In addition, when naming, avoid forms such as __xxx (two leading underscores) and _Axx (an underscore followed by an uppercase letter), which are reserved for the system.

Data constants are the various constants, including integer, floating-point, and character constants. Each has its own format, which will be described in the next chapter. Header names and string literals use <> or "" as delimiters and may contain spaces. In all other cases, spaces and symbols usually mark the boundaries of tokens.

Punctuators are the various symbols. They also follow the greedy principle: the longest meaningful string of symbols is taken as one token. Note that a punctuator cannot contain white space. In addition, C supports digraphs (see the following table), but unlike trigraphs they are handled during token parsing.

Digraph       <:   :>   <%   %>   %:   %:%:
Replacement   [    ]    {    }    #    ##

It should be emphasized that tokenization is completed in the preprocessing phase and is not redone later unless stated otherwise. Some adjustments (escape characters, discarding white space, and so on) are made when pp-tokens are converted to C tokens, but the segmentation into tokens is already final. In other words, once tokenization is complete, the building block of the program is the token, not the character. It follows that macro definition is not just simple string replacement; at the very least it interacts with tokenization. The following example illustrates the content of this section.

#define plus     +
a+++b;           // (a++) + b
a+ ++b;          // a + (++b)
a+ + +b;         // a + (+(+b)), legal: the last two + are unary
a plus++b;       // a + (++b)
a+ =b;           // illegal
a plus=b;        // illegal
a/*p;            // starts a comment, should be a/ *p
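The first two lines of the example can be checked with a small sketch. Both functions start from a = 1 and b = 2 and report the final values through their pointer parameters; the function names are made up for this illustration.

```c
/* a+++b is tokenized as a ++ + b, so a is post-incremented. */
int greedy_demo(int *a_after, int *b_after)
{
    int a = 1, b = 2;
    int sum = a+++b;       /* (a++) + b */

    *a_after = a;
    *b_after = b;
    return sum;
}

/* A space changes the token boundaries: now b is pre-incremented. */
int spaced_demo(int *a_after, int *b_after)
{
    int a = 1, b = 2;
    int sum = a+ ++b;      /* a + (++b) */

    *a_after = a;
    *b_after = b;
    return sum;
}
```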

4. Preprocessing Directives

4.1 Macros

Macros are the most complex and most powerful feature of preprocessing, so their use is explained separately here. In short, a macro replaces the macro identifier with the token sequence of its definition. White space at the beginning and end of the replacement token sequence is removed, and white space in the middle may be merged (the same applies to macro arguments); this differs from the plain "text substitution" we usually have in mind. In addition, macros are only replaced; constant expressions in them are not evaluated during preprocessing.

#define r    1
#define C    (2*r*3.14)        // (2 * 1 * 3.14), not 6.28
#define mul(a, b)             (    (a)     *     (b)    )
mul(  2   +    3   ,   5   );  // ( (2 + 3) * (5) ), attention to the space
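Because a macro substitutes tokens rather than values, the parentheses in mul above matter. The following sketch contrasts a version without them; MUL_BAD and MUL_GOOD are names invented for this comparison.

```c
#define MUL_BAD(a, b)   a * b
#define MUL_GOOD(a, b)  ((a) * (b))

/* Expands to 2 + 3 * 5, so precedence gives 17. */
int mul_bad(void)
{
    return MUL_BAD(2 + 3, 5);
}

/* Expands to ((2 + 3) * (5)), which is 25. */
int mul_good(void)
{
    return MUL_GOOD(2 + 3, 5);
}
```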

Two operators inside a macro can change tokens: # and ##. # is called the stringify operator; it turns a macro argument into a string. The argument may be a token sequence that itself contains a string literal. In that case, each " and \ inside the string literal is escaped with \ (those outside a string literal are not). # applies only to macro parameters and cannot be used on ordinary tokens.

#define s      #hi    // expands to # hi, not "hi"
#define str(a) #a
str(\a"hi\!");        // "\a\"hi\\!\""
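The two effects of # described above (white space normalization and escaping inside string literals) can be verified with a sketch like the following. STR and the two checking functions are names chosen for this example only.

```c
#include <string.h>

#define STR(x) #x

/* White space between the argument's tokens collapses to one space. */
int spacing_is_normalized(void)
{
    return strcmp(STR(a   +   b), "a + b") == 0;
}

/* The quotes and the backslash of a string literal argument each
   receive an extra backslash, so its spelling is preserved. */
int quotes_are_escaped(void)
{
    return strcmp(STR("hi\n"), "\"hi\\n\"") == 0;
}
```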

## is called the token pasting operator and is used to merge tokens. Its left and right operands can be macro parameters or ordinary tokens, and the white space between them is removed. ## can even be chained to concatenate more operands. # and ## only work while the macro is being expanded. If a # or ## appears in the result of the expansion, it no longer acts as an operator.

#define twoj  # ## #                     // ##
#define fun(pre, post)   pre##_f_##post
fun(res, get)();                         // res_f_get()
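A typical use of ## is generating a family of similarly named functions. The sketch below is one possible way to do that; DEFINE_GETTER and the generated get_width and get_height are hypothetical names.

```c
/* Pastes the prefix get_ onto the given name to form a function name. */
#define DEFINE_GETTER(name, value) \
    int get_##name(void) { return (value); }

DEFINE_GETTER(width, 640)    /* defines get_width()  */
DEFINE_GETTER(height, 480)   /* defines get_height() */
```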

Macros without parameters are called object-like macros, and macros with parameters are called function-like macros. In the definition of a function-like macro, there must be no white space between the macro name and the (, while the parameter list may be empty. In an invocation, white space is allowed between the macro name and the (, and the newer standard allows a macro argument to be empty. A function-like macro is best written so that the user can freely add ';' after it; for details, see the do while statement.

#define add1 (a, b)   (a+b)
add1(1, 1);                        // wrong. (a, b) (a+b)(1, 1)
#define add2(a, b)    (a+b)
add2 (1, 1);                       // ok, (1+1)
#define fun1()                     // ok
#define fun2(a, b)    add##a##b
fun2(,);                           // add
fun2(1,);                          // add1
fun2(, 2);                         // add2
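The do while form mentioned above can be sketched as follows: wrapping a multi-statement macro in do { } while (0) makes the invocation plus its ';' behave as a single statement, so it fits safely between if and else. SWAP_INT and sort2 are names invented for this illustration.

```c
/* Swaps two int lvalues; the do-while wrapper needs the caller's ';'. */
#define SWAP_INT(a, b)      \
    do {                    \
        int tmp_ = (a);     \
        (a) = (b);          \
        (b) = tmp_;         \
    } while (0)

/* Orders *x and *y so that *x <= *y. */
void sort2(int *x, int *y)
{
    if (*x > *y)
        SWAP_INT(*x, *y);   /* the ';' completes the do-while statement */
    else
        (void)0;            /* the else still pairs with the if */
}
```

With a plain { } block instead of do while, the ';' after the invocation would end the if statement and leave the else without a matching if.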

The newer standard supports macros with a variable number of arguments. You only need to write ... at the end of the parameter list to declare one. Unlike variadic functions in C, the macro does not need a leading named parameter. In the definition, __VA_ARGS__ stands for the actual arguments. The actual arguments are a token sequence (including the ','): there is no white space at the beginning or end, and there may be white space in the middle.

#define show(...)    printf(#__VA_ARGS__)
show(  hi, there!  );                    // printf("hi, there!")
#define fun(a, ...) a##__VA_ARGS__
fun(1, 2, 3);                            // 12, 3
fun(1, );                                // 1
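A common use of __VA_ARGS__ is forwarding a format string and its arguments to a printf-style function. The sketch below wraps the standard snprintf; FORMAT_INTO and describe_point are hypothetical names for this example.

```c
#include <stdio.h>

/* Forwards the format string and all further arguments to snprintf. */
#define FORMAT_INTO(buf, size, ...) snprintf((buf), (size), __VA_ARGS__)

/* Writes "x=<x>, y=<y>" into out and returns snprintf's result,
   the number of characters the full text needs. */
int describe_point(char *out, size_t size, int x, int y)
{
    return FORMAT_INTO(out, size, "x=%d, y=%d", x, y);
}
```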

Nested macros are the most complex case of macro expansion, but in fact you only need to keep three points in mind: (1) when a macro parameter is an operand of # or ##, the operator takes effect immediately and the argument is not expanded; (2) in all other cases, a macro argument is fully expanded before it is substituted into the result; (3) the result is then scanned again, where the macro that is currently being expanded is not expanded again while other macros are. Combining (1) and (2), if you want an argument to be expanded before # or ## is applied, wrap the # or ## operation itself in another macro. In the following code, the outer M has not started expanding when the inner M(0) is expanded as its argument, so the inner M(0) can be expanded. After f() has been expanded to f, the resulting f() cannot be expanded again.

#define one      1
#define show(a)  printf(#a" = %d\n", a)
show(one);                               // printf("one = %d\n", 1)

#undef show
#define name     edward
#define str(s)   #s
#define show(a)  printf(str(a))
show(name);                              // printf("edward")

#define M(x)     x
#define f()      f
M(M(0));                                 // 0
f()();                                   // f()
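The "wrap # in another macro" technique can be sketched with the usual two-level pair. BUFFER_SIZE is a hypothetical configuration macro, and STR, XSTR, and the checking functions are names chosen for this example.

```c
#include <string.h>

#define BUFFER_SIZE 64            /* hypothetical configuration value */
#define STR(x)   #x               /* stringizes x exactly as written */
#define XSTR(x)  STR(x)           /* expands x first, then stringizes */

/* Rule (1): the argument of # is not expanded. */
int direct_gives_name(void)
{
    return strcmp(STR(BUFFER_SIZE), "BUFFER_SIZE") == 0;
}

/* Rule (2): XSTR's argument is expanded before STR sees it. */
int indirect_gives_value(void)
{
    return strcmp(XSTR(BUFFER_SIZE), "64") == 0;
}
```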

Macro expansion happens early, so the #include and #if directives can use macro definitions. Since the entire header name (including the <> or "") is one token, a macro used for this purpose must define the complete header name. A macro cannot be redefined unless it is first removed with #undef or the new definition is identical. Identical here means that the parameters have the same number and spelling and that the token sequences are the same, including where white space separates the tokens.

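#define name1       stdio
#define name2       <stdio.h>
#include <name1.h>             // <name1.h>
#include name2                 // <stdio.h>

#define add()
#undef add
#define add(a, b)   a+b     // ok
#define add(a, b)   a+b     // ok, identical definition
#define add(x, y)   x+y     // illegal, parameter spelling differs
#define add(a, b)   a + b   // illegal, white space separation differs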
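A compilable sketch of an #include whose header name comes from a macro. STRING_HEADER and name_length are names invented for this example; the macro holds the complete header name, angle brackets included.

```c
#define STRING_HEADER <string.h>   /* the complete header name */
#include STRING_HEADER

/* Uses strlen, which is declared by the header included above. */
size_t name_length(const char *name)
{
    return strlen(name);
}
```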

The system provides some predefined macros that can be used in programs. The following are the macros that the standard requires to be defined. These macros cannot be removed with #undef. Macros whose value keeps changing (such as __LINE__) are effectively system variables. The standard also requires every function to behave as if it began with the hidden definition static const char __func__[] = "function-name";.

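Macro               Type             Description
__FILE__            "path\name"      Name of the file actually being processed, may include the path
__LINE__            integer          Line number in the file actually being processed
__DATE__            "Mmm dd yyyy"    Compilation date, a day below 10 is padded with a space
__TIME__            "hh:mm:ss"       Compilation time, fields are padded with zeros
__STDC__            0 or 1           Whether the compiler conforms to the standard
__STDC_VERSION__    yyyymmL          Version of the standard that is supported
__STDC_HOSTED__     0 or 1           Whether the implementation is hosted (runs under an OS)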
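Two of these names can be observed directly in a small sketch. The function names are made up for this example; the second function relies on its two __LINE__ uses sitting on consecutive lines.

```c
#include <string.h>

/* Returns 1 if __func__ spells the name of the enclosing function. */
int func_matches(void)
{
    return strcmp(__func__, "func_matches") == 0;
}

/* __LINE__ expands to the line it appears on, so two uses on
   consecutive lines differ by exactly one. */
int line_step(void)
{
    int first = __LINE__;
    int second = __LINE__;

    return second - first;
}
```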

Because macros only replace tokens, they come with a number of disadvantages, and other methods are sometimes a better choice. An integer defined by a macro cannot be displayed by name during debugging; it can be replaced with an enumeration constant. A string constant defined by a macro may produce multiple copies; it can be replaced with a const char array. Function-like macros have no parameter checking and can have side effects; they can be replaced with inline functions.
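The side-effect problem can be made visible with a counter. In the sketch below, the macro repeats its argument text, so the argument's function call runs twice, while the inline function evaluates its argument once. All names are invented for this example.

```c
static int calls = 0;

/* Has a visible side effect: counts how often it is called. */
static int next_value(void)
{
    return ++calls;
}

#define SQUARE_MACRO(x) ((x) * (x))

static inline int square(int x)
{
    return x * x;
}

/* The macro copies its argument text, so next_value() runs twice. */
int macro_call_count(void)
{
    calls = 0;
    (void)SQUARE_MACRO(next_value());
    return calls;
}

/* The function argument is evaluated once before the call. */
int inline_call_count(void)
{
    calls = 0;
    (void)square(next_value());
    return calls;
}
```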

4.2 Other Directives

The #include directive brings a header file into the current unit; it is followed by the header name. With <>, the file is searched for in the library directories, which you can specify in the build environment. With "", the file is searched for in the current directory first and then in the library directories, so that your own version of a library is found first. To prevent repeated inclusion, you can use a macro (see the example) or a compiler extension.

To increase portability and flexibility, preprocessing supports conditional compilation. It starts with an #if, #ifdef, or #ifndef branch, followed by zero or more #elif branches, followed by at most one #else branch, and ends with #endif. The condition is an integer expression, which can contain integer constants, character constants, macros, and the defined operator; string literals and floating-point constants are not allowed. defined is the one keyword specific to preprocessing. It has two forms: defined(M) and defined M.

The #line directive is followed by a line number and an optional file name. It is generally used in a generated .c file, so that line numbers (and file names) point to the original file rather than to the generated .c file. The #error directive is followed by an arbitrary token sequence. Macros in these tokens are not expanded. #error makes preprocessing fail and displays the token sequence.

#pragma directives are defined by the compiler. Those starting with STDC are reserved for the standard, which has already defined some feature switches. A #pragma directive cannot be produced by a macro. To make that possible, the newer standard adds the _Pragma("command") operator, which is equivalent to #pragma command and can be used inside macro definitions.

// headfile, only included once
#ifndef HEAD_H
#define HEAD_H
// ...
#endif

#if defined X            // the same as #ifdef X
#elif defined(Y)
// ...
#else
#endif

#line 100
#line 100 "test.c"

#define name  edward
#error fail, name!       // show fail, name!

#define PRAGMA(cmd)  _Pragma(#cmd)
PRAGMA(align(4))         // the same as #pragma align(4)
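A compilable sketch of a conditional compilation chain. LOG_LEVEL is a hypothetical build setting; in a real project it would more likely come from the compiler command line than from a #define in the file. The #error branch is only reached when the setting is missing.

```c
#define LOG_LEVEL 2               /* hypothetical build setting */

#if !defined(LOG_LEVEL)
#  error "LOG_LEVEL must be defined"
#elif LOG_LEVEL >= 2
#  define LOG_NAME "verbose"
#elif LOG_LEVEL == 1
#  define LOG_NAME "normal"
#else
#  define LOG_NAME "quiet"
#endif

/* Returns the name selected at preprocessing time. */
const char *log_name(void)
{
    return LOG_NAME;
}
```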
 
