Copyright notice: This is an original article by Hangyuan Li. When reprinting, please indicate the source:
Original article link: https://www.qcloud.com/community/article/109
Source: Tengyun https://www.qcloud.com/community
The purpose of this article is to share some of what I learned from analyzing the shell's source code while I was at school. I want to help you understand how a shell works, and to look, from a software-design standpoint, at some of the design strengths and weaknesses of long-lived software like the shell. This article is not about shell syntax; I believe many colleagues already know the shell very well.
Limitations of this article: given my limited technical level and time, there are surely errors and omissions. While annotating the source, I came across places I did not understand or had doubts about, and corrections are welcome. On the whole, though, the main logic and flow should be clear.
Version analyzed: I chose the most popular shell, bash, specifically the Bash 4.2 release.
About the bash code: by an earlier count, the shell source is about 100,000 lines, of which the core logic accounts for a little over 10,000 lines; that core is what this article analyzes. The rest consists of the bundled readline library (also open source, it handles input), the parser generated with the yacc parser-generator tool (also open source; many readers will know it from a compilers course), and a large amount of optimization and auxiliary code that makes the user interface friendlier (such as ! history expansion).
Recommendation: while you study how the shell works, also look at it from a software-design point of view, and you will find many things that could be optimized and improved (this is of course related to the shell's long development history and various historical factors). In particular, readers of the sections below should notice that the command-parsing code could use C++ object-oriented design to build a sensible class hierarchy of commands, greatly reducing the amount of code and simplifying the logic. Interested readers could even try writing their own replacement for this part.
I. Start-up process
shell.c is the file that contains the shell's main function, so shell startup can be thought of as beginning in shell.c. The main function does the following work. It checks the startup environment (whether the shell was started by sshd, whether it is running under Emacs or Cygwin, whether it is interactive, whether it is a login shell, and whether it is a restricted shell) and checks for memory leaks. It reads the configuration files: first /etc/profile, then ~/.bash_profile, ~/.bash_login, or ~/.profile, tried in that order, where only the first of these three that exists is read. It sets the global variables needed at run time (the current environment variables, the shell name, the start time, the input and output file descriptors, and localization settings). It processes arguments and options (the positional arguments and options such as -c, -s, and --debugger) and sets their values: run_shopt_alist() calls shopt_setopt to set each option's value, and the positional parameters are bound. It then enters one of the following branches according to the startup arguments:
1. If only word expansion is requested, without executing any command, run_wordexp is called to perform the expansion, and then exit_shell(last_command_exit_value) exits with the exit status of the last command executed.
2. If the shell is started with the -c option, there are two cases. First, if a command string was supplied with -c, run_one_command(command_execution_string) is called to execute it, where command_execution_string holds the string given with -c; once execution finishes, exit_shell(last_command_exit_value) is called. Second, if the user is expected to type the commands to execute, control goes to branch 3.
3. shell_initialized is set to 1 to indicate that shell initialization is complete. Then reader_loop(), defined in eval.c, is called to read and parse user input continuously. If reader_loop returns, exit_shell(last_command_exit_value) is called and the shell exits.
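The branch selection above can be modeled in a few lines of C. This is a minimal sketch, not bash's main(): the enum startup_branch, the struct startup_opts, and the function choose_branch are hypothetical names, and the real bash functions (run_wordexp, run_one_command, reader_loop, exit_shell) appear only in comments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the three branches described above. */
enum startup_branch {
    BRANCH_WORDEXP = 1,     /* branch 1 */
    BRANCH_DASH_C = 2,      /* branch 2, first case */
    BRANCH_READER_LOOP = 3  /* branch 3 (also reached by branch 2, second case) */
};

struct startup_opts {
    int wordexp_only;                      /* expand words only, run nothing */
    const char *command_execution_string;  /* string given with -c, or NULL */
};

static int shell_initialized = 0;

/* Decide which branch the tail of main() would take in this model. */
enum startup_branch choose_branch(const struct startup_opts *o)
{
    assert(o != NULL);
    if (o->wordexp_only)
        return BRANCH_WORDEXP;     /* run_wordexp(), then exit_shell() */
    if (o->command_execution_string != NULL)
        return BRANCH_DASH_C;      /* run_one_command(), then exit_shell() */
    shell_initialized = 1;         /* initialization is complete */
    return BRANCH_READER_LOOP;     /* reader_loop(), then exit_shell() */
}
```

In this sketch, calling choose_branch with {0, "echo hi"} yields BRANCH_DASH_C and leaves shell_initialized at 0, because that path exits before the reader loop is reached.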
II. Command parsing and execution flow
1. Main related files
eval.c, command.h, copy_cmd.c, execute_cmd.c, make_cmd.c
2. Shell command structure
The shell uses the following structure to represent a command:
```c
typedef struct command {
  enum command_type type;  /* type of the command */
  int flags;               /* flags that affect the command's execution environment */
  int line;                /* line number on which the command starts */
  REDIRECT *redirects;     /* associated redirections */
  /* The union below holds the actual "command body", which may be a for loop,
     a case statement, a while loop, and so on. Only one member of a union is
     valid at a time, so the command types are listed side by side. */
  union {
    struct for_com *For;
    struct case_com *Case;
    struct while_com *While;
    struct if_com *If;
    struct connection *Connection;
    struct simple_com *Simple;
    struct function_def *Function_def;
    struct group_com *Group;
#if defined (SELECT_COMMAND)
    struct select_com *Select;
#endif
#if defined (DPAREN_ARITHMETIC)
    struct arith_com *Arith;
#endif
#if defined (COND_COMMAND)
    struct cond_com *Cond;
#endif
#if defined (ARITH_FOR_COMMAND)
    struct arith_for_com *ArithFor;
#endif
    struct subshell_com *Subshell;
    struct coproc_com *Coproc;
  } value;
} COMMAND;
```
The key members are type, which records the kind of command, and the union value, which holds a pointer to the command's specific contents. Judging from the union's members, bash defines 14 kinds of command, such as for loops, case statements, while loops, function definitions, and coprocesses (coproc, asynchronous commands).
After analyzing all of the command execution paths, I concluded that, once substitutions have been performed, a command of type simple is the most atomic command operation; every other type of command is composed of several simple commands.
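To make the tagged-union idea concrete, here is a minimal sketch that keeps only two of the 14 command kinds. It assumes a simplified simple_com holding a single string (bash's real simple_com holds a word list and more) and a hypothetical helper, count_simple, that recursively counts the simple commands a compound command is built from.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed-down model of COMMAND: only two of bash's command kinds. */
enum command_type { cm_simple, cm_connection };

struct command;                                      /* completed below */
struct simple_com { const char *words; };            /* simplified: one string */
struct connection {
    struct command *first, *second;                  /* the two joined commands */
    int connector;                                   /* e.g. '|' or ';' */
};

typedef struct command {
    enum command_type type;     /* selects which union member is valid */
    union {
        struct simple_com *Simple;
        struct connection *Connection;
    } value;
} COMMAND;

/* Count the simple commands that a (possibly compound) command is built from. */
int count_simple(const COMMAND *c)
{
    if (c == NULL)
        return 0;
    switch (c->type) {
    case cm_simple:
        return 1;
    case cm_connection:
        return count_simple(c->value.Connection->first)
             + count_simple(c->value.Connection->second);
    }
    return 0;
}
```

A pipeline such as `ls | wc -l` would be one cm_connection whose two children are cm_simple commands, so count_simple returns 2 for it.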
After startup, the shell enters either branch 2 or branch 3 above, and in both cases the command is ultimately executed by execute_command_internal, defined in execute_cmd.c. Branch 1 does not involve parsing commands, so it is not analyzed here.
3. The first case of branch 2:
run_one_command(command_execution_string) calls parse_and_execute (defined in evalstring.c) to parse and execute the command, and parse_and_execute in turn calls execute_command_internal to actually execute it.
4. The second case of branch 2, and branch 3:
reader_loop calls read_command to parse a command; read_command calls parse_command(), which calls yyparse() in the parser y.tab.c (this function is generated automatically by yacc, so we do not look inside it). The parsed command is stored in the global variable global_command, and then execute_command (defined in execute_cmd.c) is called to execute it; execute_command in turn calls execute_command_internal. At this point, branch 2 and branch 3 converge on execute_command_internal.
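The read, parse, execute cycle of reader_loop can be sketched as follows. This is an illustrative model under loud assumptions: the input is an in-memory array of lines instead of a terminal, "parsing" just stores the line in global_command, and execute_command treats "false" as failing and everything else as succeeding. The functions reuse bash's names for readability, but their signatures are made up for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for bash's globals; here the "parsed command" is just a string. */
static const char *global_command;
static int last_command_exit_value;

/* Stand-in for read_command()/parse_command(): take the next input line.
   Returns 0 on success, EOF when the input is exhausted. */
static int read_command(const char **input, size_t *pos, size_t n)
{
    if (*pos >= n)
        return EOF;
    global_command = input[(*pos)++];
    return 0;
}

/* Stand-in for execute_command(): "false" fails, anything else succeeds. */
static int execute_command(const char *cmd)
{
    return strcmp(cmd, "false") == 0 ? 1 : 0;
}

/* The loop shape described above: read and parse, then execute, until EOF.
   Returns the number of commands executed. */
int reader_loop(const char **input, size_t n)
{
    size_t pos = 0;
    int executed = 0;
    while (read_command(input, &pos, n) == 0) {
        last_command_exit_value = execute_command(global_command);
        executed++;
    }
    return executed;
}
```

After the loop returns, last_command_exit_value holds the status of the last command, which is exactly what exit_shell(last_command_exit_value) uses in branch 3.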
5. Internal flow of execute_command_internal:
This function is where commands are actually executed in the shell source. It examines the command structure passed in as its argument and, according to the command's type, calls the execution function for that type, which then interprets the specific command held in value.
Specifically, if the command is a simple command, execute_simple_command is called directly. execute_simple_command then calls execute_builtin or execute_disk_command, depending on whether the command is a builtin or an external command on disk. When executing an external command, execute_disk_command calls make_child, which forks a child process to run it.
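The fork-and-run step for an external command can be sketched with plain POSIX calls. This is not bash's make_child, which also handles job control, signals, and process groups; run_disk_command is a hypothetical name, and returning 127 when exec fails and 128 plus the signal number for a killed child follows common shell conventions rather than anything stated above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run an external command in a forked child and return its exit status,
   or -1 if fork or waitpid fails. argv must be NULL-terminated. */
int run_disk_command(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = fork();            /* the role make_child plays in bash */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);     /* searches PATH; returns only on failure */
        _exit(127);                /* conventional "command not found" status */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);  /* shell convention for signal deaths */
    return -1;
}
```

The parent blocks in waitpid, which corresponds to a foreground command; a background command would skip the wait and record the child's pid instead.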
For any other type, the function for that type is called to handle the control flow. For example, if the command is a for command (type cm_for), that is, a for-loop control structure, execute_for_command is called. This function iterates over the words in the loop's list and, for each one, calls execute_command again on the loop body (its action member). In other words, functions such as execute_for_command implement the expansion and flow control of a compound command and call execute_command recursively.
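The type-based dispatch and the recursion through execute_command can be sketched as follows. Only two command types are modeled; loop_var and simple_runs are hypothetical globals standing in for variable binding and real execution, and the function bodies are simplified stand-ins that borrow bash's names, not bash's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum command_type { cm_simple, cm_for };

struct command;
struct simple_com { const char *name; };
struct for_com {
    const char *const *words;   /* the word list after "in", NULL-terminated */
    struct command *action;     /* the loop body */
};

typedef struct command {
    enum command_type type;
    union { struct simple_com *Simple; struct for_com *For; } value;
} COMMAND;

static int simple_runs;         /* how many simple commands were "executed" */
static const char *loop_var;    /* stand-in for binding the loop variable */

static int execute_command(COMMAND *c);

static int execute_simple_command(struct simple_com *s)
{
    (void)s;                    /* a real shell would run the command here */
    simple_runs++;
    return 0;
}

/* Iterate over the word list and recurse into the body for each word. */
static int execute_for_command(struct for_com *f)
{
    int status = 0;
    for (const char *const *w = f->words; *w != NULL; w++) {
        loop_var = *w;                        /* bind the loop variable */
        status = execute_command(f->action);  /* recursive call */
    }
    return status;
}

/* Dispatch on the command type, in the spirit of execute_command_internal. */
static int execute_command(COMMAND *c)
{
    assert(c != NULL);
    switch (c->type) {
    case cm_simple: return execute_simple_command(c->value.Simple);
    case cm_for:    return execute_for_command(c->value.For);
    }
    return 1;
}
```

A loop like `for x in a b c; do echo; done` maps to one cm_for node whose action is a cm_simple node, so the body runs three times through the same dispatcher.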
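Therefore, the main flow from the main function to the execution of a command can be summarized as follows.
6. Function-level flowchart from startup to command interpretation:
The file in which each function is defined is given in parentheses.
main (shell.c)
    branch 2, -c with a command string:
        run_one_command (shell.c) -> parse_and_execute (evalstring.c) -> execute_command_internal (execute_cmd.c)
    branch 3:
        reader_loop (eval.c)
            -> read_command (eval.c) -> parse_command (eval.c) -> yyparse (y.tab.c)
            -> execute_command (execute_cmd.c) -> execute_command_internal (execute_cmd.c)
execute_command_internal (execute_cmd.c)
    -> execute_simple_command (execute_cmd.c) -> execute_builtin or execute_disk_command (execute_cmd.c) -> make_child (jobs.c)
    -> execute_for_command and the other compound-command functions (execute_cmd.c) -> execute_command (recursive)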
III. Variable control
1. Key related files
variables.c, variables.h
2. Important data structures
Bash describes variables mainly through two structs, the variable context and the variable, which are described in turn below.
Variable context: a context can also be understood as a scope, comparable to function scope and global scope in C. Variables in a context are visible within that context.
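Definition of the variable context structure:
```c
typedef struct var_context {
  char *name;               /* if empty, this is bash's global context; otherwise
                               it is the local context of the function named name */
  int scope;                /* depth of this context in the call stack: 0 is the
                               global context, and each nested function call adds 1 */
  int flags;                /* flags recording whether the context is local, belongs
                               to a function or a builtin, is temporary, and so on */
  struct var_context *up;   /* the context above this one in the call stack
                               (a more deeply nested function call) */
  struct var_context *down; /* the context below this one, toward the global context */
  HASH_TABLE *table;        /* hash table of all variables in this context,
                               i.e. name-value pairs */
} VAR_CONTEXT;
```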
This structure describes the scope of variables. All variables in a context are stored in the table member of VAR_CONTEXT.
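The up and down links can be illustrated with a pair of hypothetical helpers, push_context and pop_context. This sketch drops the hash table and flags and only shows how a new context is linked on top of the stack with a scope one greater than the context below it; bash's own push_var_context and pop_var_context do considerably more.

```c
#include <assert.h>
#include <stdlib.h>

/* Trimmed VAR_CONTEXT: flags and the variable hash table are omitted. */
typedef struct var_context {
    const char *name;           /* NULL for the global context */
    int scope;                  /* 0 = global, +1 per nested function call */
    struct var_context *up;     /* toward the top of the stack (deeper call) */
    struct var_context *down;   /* toward the global context */
} VAR_CONTEXT;

/* Push a new context for function NAME on top of *TOP and return it,
   or NULL if allocation fails. */
VAR_CONTEXT *push_context(VAR_CONTEXT **top, const char *name)
{
    VAR_CONTEXT *vc = calloc(1, sizeof *vc);
    if (vc == NULL)
        return NULL;
    vc->name = name;
    vc->scope = (*top)->scope + 1;
    vc->down = *top;
    (*top)->up = vc;
    *top = vc;
    return vc;
}

/* Pop the top context; its local variables would be discarded with it. */
void pop_context(VAR_CONTEXT **top)
{
    VAR_CONTEXT *vc = *top;
    assert(vc->down != NULL);   /* the global context always stays at the bottom */
    *top = vc->down;
    (*top)->up = NULL;
    free(vc);
}
```

Pushing foo and then fun onto a global context gives scopes 0, 1, and 2 from bottom to top, matching the stack pictures in the next subsection.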
Variables: bash variables are not strongly typed and can all be thought of as strings. They are stored in the following structure:
```c
typedef struct variable {
  char *name;                          /* the variable's name */
  char *value;                         /* the variable's value */
  char *exportstr;                     /* a string of the form "name=value" */
  sh_var_value_func_t *dynamic_value;  /* for variables with dynamic values such as
                                          $SECONDS or $RANDOM, points to the function
                                          that generates the value */
  sh_var_assign_func_t *assign_func;   /* for special variables, the callback
                                          invoked when the variable is assigned */
  int attributes;                      /* read-only, visibility, and other attributes */
  int context;                         /* which level of the local-variable stack
                                          this variable belongs to */
} SHELL_VAR;
```
Because every variable is essentially represented as a string, an attributes member records the variable's properties: for example, att_readonly marks a read-only variable, att_array an array variable, att_function a function, att_integer an integer variable, and so on.
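The following sketch shows how attribute bits can guard an assignment. The bit values are illustrative assumptions, not copied from bash's headers, and assign_value is a hypothetical helper that refuses to change a read-only variable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative attribute bits (values assumed for this sketch). */
#define att_readonly 0x1
#define att_array    0x2
#define att_function 0x4
#define att_integer  0x8

typedef struct variable {
    char *name;
    char *value;        /* every value is stored as a string */
    int attributes;     /* bitwise OR of att_* flags */
} SHELL_VAR;

/* Assign a copy of VALUE to VAR unless VAR is read-only.
   Returns 0 on success, -1 if read-only or out of memory. */
int assign_value(SHELL_VAR *var, const char *value)
{
    assert(var != NULL && value != NULL);
    if (var->attributes & att_readonly)
        return -1;
    size_t len = strlen(value) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, value, len);
    free(var->value);   /* free(NULL) is a no-op */
    var->value = copy;
    return 0;
}
```

Keeping the properties in one integer means a variable can carry several at once, for example att_readonly | att_integer, and each check is a single bitwise AND.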
3. Mechanism
Shell execution is accompanied by context switches, and variable control in the bash source is built on this: each variable is bound to one context.
For example, at first there is only the default global context, which holds the variables passed in through main's arguments or set by the configuration files. When execution enters a function foo, foo takes the variables to be exported from the global context, adds its own new variables to form foo's local context, and pushes foo's context onto the call stack. At this point the call stack looks like this:
Stack top: foo context (contains all local variables of foo)
Bottom of stack: global context (contains all global variables)
To go one level deeper, suppose foo calls the function fun. fun first takes the variables to be exported from foo, adds its own new variables to form fun's local context, and then pushes fun's context onto the top of the call stack. The call stack now looks like this:
Stack top: fun context (contains all local variables of fun)
Middle of stack: foo context (contains all local variables of foo)
Bottom of stack: global context (contains all global variables)
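When fun finishes, its context is popped off the stack and all of its local variables become invalid. The call stack then looks like this:
Stack top: foo context (contains all local variables of foo)
Bottom of stack: global context (contains all global variables)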
Variable lookup order: from the top of the stack to the bottom. If the variable is not found in the context at the top of the stack, the next context down is searched, and so on; if it is not found anywhere in the call stack, the lookup fails. For example, if the context at the top of the stack has a PWD variable (the current working directory), the global PWD is never consulted, which guarantees the correct semantics when a local variable overrides a global one.
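The top-to-bottom lookup can be sketched as a walk along the down links. To stay short, this sketch assumes each context stores its variables in a small NULL-terminated array instead of bash's HASH_TABLE; lookup and struct binding are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified storage: a name/value array instead of a hash table. */
struct binding { const char *name; const char *value; };

typedef struct var_context {
    const struct binding *vars;  /* terminated by a {NULL, NULL} entry */
    struct var_context *down;    /* next context toward the global one */
} VAR_CONTEXT;

/* Search from the top of the call stack to the bottom; the first hit wins,
   so a local variable shadows a global one with the same name. */
const char *lookup(const VAR_CONTEXT *top, const char *name)
{
    assert(name != NULL);
    for (const VAR_CONTEXT *vc = top; vc != NULL; vc = vc->down)
        for (const struct binding *b = vc->vars; b->name != NULL; b++)
            if (strcmp(b->name, name) == 0)
                return b->value;
    return NULL;                 /* not found anywhere: the lookup fails */
}
```

With a global PWD of /home and a local PWD of /tmp in foo, looking up PWD from foo's context returns /tmp, while HOME, defined only globally, is still found further down the stack.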
4. Special variables
Bash defines several special variables: after one of them is modified, some additional work is needed to keep the system consistent. For example, when TZ, the variable that holds the time zone, is modified, the tzset function must be called to update the corresponding time-zone settings. Bash provides a callback interface for this class of variables and invokes the callback whenever the value changes, much like a trigger in a database. In bash, the special variables are stored in the global array special_vars, whose element type is defined as follows:
```c
struct name_and_function {
  char *name;              /* variable name */
  sh_sv_func_t *function;  /* callback to run when the variable's value changes */
};
```
This structure describes one special variable and is used to build the special_vars array. Each callback is usually named sv_ followed by the variable name, for example sv_tz for TZ.
```c
static struct name_and_function special_vars[] = {
  { "BASH_XTRACEFD", sv_xtracefd },
#if defined (READLINE)
#  if defined (STRICT_POSIX)
  { "COLUMNS", sv_winsize },
#  endif
  { "COMP_WORDBREAKS", sv_comp_wordbreaks },
#endif
  { "FUNCNEST", sv_funcnest },
  { "GLOBIGNORE", sv_globignore },
#if defined (HISTORY)
  { "HISTCONTROL", sv_history_control },
  { "HISTFILESIZE", sv_histsize },
  { "HISTIGNORE", sv_histignore },
  { "HISTSIZE", sv_histsize },
  { "HISTTIMEFORMAT", sv_histtimefmt },
#endif
#if defined (__CYGWIN__)
  { "HOME", sv_home },
#endif
#if defined (READLINE)
  { "HOSTFILE", sv_hostfile },
#endif
  { "IFS", sv_ifs },
  { "IGNOREEOF", sv_ignoreeof },
  { "LANG", sv_locale },
  { "LC_ALL", sv_locale },
  { "LC_COLLATE", sv_locale },
  { "LC_CTYPE", sv_locale },
  { "LC_MESSAGES", sv_locale },
  { "LC_NUMERIC", sv_locale },
  { "LC_TIME", sv_locale },
#if defined (READLINE) && defined (STRICT_POSIX)
  { "LINES", sv_winsize },
#endif
  { "MAIL", sv_mail },
  { "MAILCHECK", sv_mail },
  { "MAILPATH", sv_mail },
  { "OPTERR", sv_opterr },
  { "OPTIND", sv_optind },
  { "PATH", sv_path },
  { "POSIXLY_CORRECT", sv_strict_posix },
#if defined (READLINE)
  { "TERM", sv_terminal },
  { "TERMCAP", sv_terminal },
  { "TERMINFO", sv_terminal },
#endif /* READLINE */
  { "TEXTDOMAIN", sv_locale },
  { "TEXTDOMAINDIR", sv_locale },
#if defined (HAVE_TZSET) && defined (PROMPT_STRING_DECODE)
  { "TZ", sv_tz },
#endif
#if defined (HISTORY) && defined (BANG_HISTORY)
  { "histchars", sv_histchars },
#endif /* HISTORY && BANG_HISTORY */
  { "ignoreeof", sv_ignoreeof },
  { (char *)0, (sh_sv_func_t *)0 }
};
```
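The callback mechanism can be sketched with a two-entry table. The names in the array above appear to be in sorted order (all uppercase names first, then histchars and ignoreeof), which would let a linear search stop as soon as it passes the target name; this sketch assumes that property. notify_special_variable, the sh_sv_func_t typedef, and the counting callbacks are hypothetical, and a real sv_tz would call tzset() rather than increment a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void sh_sv_func_t(const char *);   /* assumed callback signature */

static int tz_updates, path_updates;

/* Hypothetical callbacks that only record that they ran. */
static void sv_path(const char *name) { (void)name; path_updates++; }
static void sv_tz(const char *name)   { (void)name; tz_updates++; }

struct name_and_function { const char *name; sh_sv_func_t *function; };

/* Kept sorted by name so the search below can stop early. */
static const struct name_and_function special_vars[] = {
    { "PATH", sv_path },
    { "TZ",   sv_tz },
    { NULL,   NULL }
};

/* Call after NAME has been assigned: run its callback if it is special.
   Returns 1 if a callback ran, 0 otherwise. */
int notify_special_variable(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; special_vars[i].name != NULL; i++) {
        int r = strcmp(special_vars[i].name, name);
        if (r == 0) {
            special_vars[i].function(name);
            return 1;
        }
        if (r > 0)
            break;      /* sorted table: no later entry can match */
    }
    return 0;
}
```

This is the trigger analogy in miniature: assignment code stays generic, and only the table decides which names need extra work after they change.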
Shell main logic source-level analysis (1): the shell running process