Linux Learning Summary
After a semester of studying Linux I have gained a great deal. In this course we tried a new way of learning, the Linux flipped classroom: every week we followed the teacher's video lectures on the MOOC website and completed the accompanying experiments. Little by little I came to understand the Linux operating system, and I also mastered some basic commands. I have my own understanding of, and takeaways from, every unit. During the weeks spent on the Linux kernel, I learned a lot that I had never known before.
About how the computer works:
A computer system consists of two parts: hardware and software. John von Neumann laid down the basic structure of modern computers, known as the von Neumann architecture, which is characterized by:
1) A single processing unit carries out computation, storage, and communication.
2) Storage is organized linearly in fixed-length units.
3) Storage units are addressed directly.
4) A low-level machine language is used, in which each instruction performs a simple operation selected by its operation code.
5) Computation is controlled centrally and sequentially.
6) The hardware consists of five components, namely the arithmetic unit, memory, controller, input devices, and output devices, each with a defined basic function.
7) Data and instructions are represented in binary.
As I understand it, when the computer runs, it fetches the first instruction from memory and, as that instruction requires, reads data from memory, performs the specified arithmetic or logic operation, and writes the result back to memory at the given address. It then fetches the second instruction and completes the specified operation under the direction of the controller, and so on, much like a for loop in C, until a halt instruction is encountered.
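The fetch and execute cycle described above can be sketched as a loop in C. This is only a toy model under my own assumptions: the opcodes (OP_LOAD, OP_ADD, OP_STORE, OP_HALT), the single accumulator, and the function names are invented for illustration and do not correspond to any real instruction set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy machine (not a real instruction set). */
enum { OP_LOAD, OP_ADD, OP_STORE, OP_HALT };

struct instr {
    int op;    /* operation code */
    int addr;  /* address of the operand in data memory */
};

/* Run until OP_HALT. Returns the number of instructions executed,
 * or -1 if an unknown opcode is met. */
int toy_run(const struct instr *program, int *memory)
{
    assert(program != NULL && memory != NULL);

    int pc = 0;        /* program counter: index of the next instruction */
    int acc = 0;       /* accumulator register */
    int executed = 0;

    for (;;) {
        struct instr cur = program[pc++];   /* fetch, then advance */
        executed++;
        switch (cur.op) {                   /* decode and execute */
        case OP_LOAD:  acc = memory[cur.addr];  break;
        case OP_ADD:   acc += memory[cur.addr]; break;
        case OP_STORE: memory[cur.addr] = acc;  break;
        case OP_HALT:  return executed;
        default:       return -1;
        }
    }
}

/* Example program: memory[2] = memory[0] + memory[1]. */
int toy_sum(int a, int b)
{
    int memory[3] = { a, b, 0 };
    const struct instr program[] = {
        { OP_LOAD, 0 }, { OP_ADD, 1 }, { OP_STORE, 2 }, { OP_HALT, 0 },
    };
    toy_run(program, memory);
    return memory[2];
}
```

Calling toy_sum(2, 3) walks through four instructions: load, add, store, halt. The result ends up in memory rather than being returned by the machine itself, which mirrors the "send the result back to memory by address" step.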
About how the operating system works:
Suppose an interrupt arrives while a process is executing in user mode. The CPU first switches ESP to the process's kernel stack, whose address it reads from the TSS, and pushes the user-mode context (including the current ESP and EIP) onto that stack; EIP is then pointed at the entry of the interrupt handler. The TSS is the key piece here, because it tells the CPU where the kernel stack is. Pushing the user-mode EIP and ESP amounts to saving the state the process was in before the interrupt, so that the previous task can continue after the interrupt returns. The kernel then uses SAVE_ALL to push the remaining registers onto the stack and handles the interrupt in the handler that CS:EIP now points to. Before the interrupt returns, the kernel may reschedule through schedule(), whose most important step is switch_to, which performs the actual process switch. When interrupt handling is finished and control is to return to user mode, execution has to resume at the point where the interrupt occurred: RESTORE_ALL pops the values previously saved on the kernel stack, and iret restores the remaining registers, so we are back in user mode and the task that was paused by the interrupt resumes.
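To make the save and restore order concrete, here is a small simulation in C. It is a sketch under heavy simplification: the three-register struct toy_cpu, the fixed-size struct toy_kstack, and the function names are all my own inventions, and the real hardware and the kernel's SAVE_ALL and RESTORE_ALL macros save more state than this.

```c
#include <assert.h>

/* Toy register file: only three registers, for illustration. */
struct toy_cpu { unsigned long eip, esp, eax; };

/* Toy kernel stack with a fixed number of slots. */
struct toy_kstack { unsigned long slot[8]; int top; };

static void ks_push(struct toy_kstack *s, unsigned long v)
{
    assert(s->top < 8);
    s->slot[s->top++] = v;
}

static unsigned long ks_pop(struct toy_kstack *s)
{
    assert(s->top > 0);
    return s->slot[--s->top];
}

/* Entry: the hardware saves ESP and EIP, then the SAVE_ALL step saves
 * the remaining registers (just EAX here) and the handler starts. */
void toy_interrupt_enter(struct toy_cpu *c, struct toy_kstack *ks,
                         unsigned long handler)
{
    ks_push(ks, c->esp);   /* hardware: user stack pointer */
    ks_push(ks, c->eip);   /* hardware: interrupted instruction pointer */
    ks_push(ks, c->eax);   /* SAVE_ALL step */
    c->eip = handler;      /* jump to the interrupt handler */
}

/* Exit: the RESTORE_ALL step and iret pop everything in reverse order. */
void toy_interrupt_return(struct toy_cpu *c, struct toy_kstack *ks)
{
    c->eax = ks_pop(ks);   /* RESTORE_ALL step */
    c->eip = ks_pop(ks);   /* iret: resume where the interrupt hit */
    c->esp = ks_pop(ks);   /* iret: back on the user stack */
}
```

Whatever the handler does to the registers in between, the interrupted task sees the same EIP, ESP, and EAX after toy_interrupt_return, which is the whole point of saving the context on the kernel stack.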
I think the core of an operating system is the process: a process is a program in dynamic execution, and many operating systems have their own core processes. The system's most important job is to keep its core processes running, and the other programs are derived from them. A core process can start its derived processes, switch between them, and even terminate them; when a core process ends, the operation of the entire operating system is affected.
About the Linux system start-up process:
start_kernel is found in init/main.c. Whichever part of the kernel you analyze, you will run into start_kernel, because it is where kernel initialization begins.
trap_init, under arch/x86, sets up many interrupt gates, and the system call is one of them. init is process 1 of the Linux system. When no process in the system is runnable, the scheduler switches to the idle process. The context that runs start_kernel exists from the very beginning of the kernel: this is process 0. Process 0 then creates process 1, init, and the system is up. This is the boot process of the kernel.
init_task is the process descriptor used by process 0, and it is the first process descriptor in the Linux system. It is defined in arch/powerpc/kernel/init_task.c, and the code snippet is as follows:
struct task_struct init_task = INIT_TASK(init_task);
The init_task descriptor is initialized with the INIT_TASK macro from the init_task.h file. init_task is the first thread in the Linux kernel and runs through the initialization of the entire system; it is the only process in Linux that is not created with the kernel_thread() function. Late in its execution, init_task calls kernel_thread() to create the first kernel thread, kernel_init, while init_task itself continues to initialize the system. After initialization is complete, init_task degrades into the cpu_idle process, which gets the CPU when there are no other processes in the ready queue of core 0. The newly created process 1, kernel_init, starts the secondary CPUs one by one and eventually creates the user process.
About the system call method:
User program --> C library (i.e. the API): int 0x80 --> system_call --> system call service routine --> kernel code
First of all, what we usually call the user API is in fact the C library provided by the system.
The system call is implemented through the software interrupt instruction int 0x80, and this instruction is wrapped inside the functions of the C library.
(Software interrupts differ from what we usually call hardware interrupts in that they are triggered by an instruction, not by a hardware peripheral.)
Executing int 0x80 makes the system jump to a preset kernel-space address, which points to the system call handler, the system_call function.
(Note: the system call handler system_call is not a system call service routine. A service routine is the kernel function that implements one specific system call, whereas the handler is the common entry stage that runs before any service routine: it is where int 0x80 leads, for all system calls. Put simply, every system call starts by calling a function in the C library, which contains an int 0x80 instruction; control then passes to the system call handler system_call, which goes on to execute the specific service routine according to the system call number.)
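The difference between the C library wrapper and the system call number can be seen from user space. The sketch below assumes a Linux system with glibc, whose syscall() function takes a call number such as SYS_getpid; the helper names are my own. Note that the instruction actually used to enter the kernel depends on the architecture and kernel version (int 0x80 is the classic 32-bit x86 path), and the C library hides that detail.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask for the process ID through the ordinary C library wrapper. */
pid_t pid_via_wrapper(void)
{
    return getpid();
}

/* Ask for the same thing by passing the system call number directly. */
pid_t pid_via_number(void)
{
    return (pid_t)syscall(SYS_getpid);
}

/* Both paths end in the same service routine, so the results agree. */
void check_same_pid(void)
{
    assert(pid_via_wrapper() == pid_via_number());
}
```

Here SYS_getpid plays the role of the system call number that the handler uses to pick the service routine.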
The system_call function looks up the system call table, sys_call_table, by system call number. Before the software interrupt instruction int 0x80 executes, the system call number is placed in the EAX register. system_call reads the number from EAX and multiplies it by 4 (the size of one table entry on 32-bit x86) to get an offset; taking sys_call_table as the base address and adding the offset gives the table entry that holds the address of the specific service routine.
Then the service routine is executed. Note that the service routine takes its parameters only from the stack. The parameters are therefore placed in registers before system_call runs, and system_call pushes those registers onto the stack first. After system_call exits, the user program can read the (possibly modified) values from the registers.
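The table lookup and the parameter passing can be modelled in a few lines of C. This is a user-space toy, not kernel code: struct toy_regs, toy_call_table, and the two service routines are hypothetical names I made up, and indexing a C array of function pointers does the "multiply by the entry size and add the base" arithmetic implicitly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical frame of registers saved at entry (only three here). */
struct toy_regs { long eax, ebx, ecx; };

typedef long (*toy_service_fn)(const struct toy_regs *);

/* Two made-up service routines; they read parameters from the saved frame. */
static long toy_svc_add(const struct toy_regs *r) { return r->ebx + r->ecx; }
static long toy_svc_neg(const struct toy_regs *r) { return -r->ebx; }

/* Toy call table: the call number is the index. */
static const toy_service_fn toy_call_table[] = { toy_svc_add, toy_svc_neg };
#define TOY_NR_CALLS (sizeof toy_call_table / sizeof toy_call_table[0])

/* Toy handler: validate the number in eax, dispatch, put the result in eax. */
void toy_system_call(struct toy_regs *r)
{
    assert(r != NULL);
    if (r->eax < 0 || (size_t)r->eax >= TOY_NR_CALLS) {
        r->eax = -ENOSYS;              /* no such call */
        return;
    }
    r->eax = toy_call_table[r->eax](r);
}

/* "User side": load the registers, trap, read the result back from eax. */
long toy_invoke(long nr, long a, long b)
{
    struct toy_regs r = { nr, a, b };
    toy_system_call(&r);
    return r.eax;
}
```

The bounds check before the lookup matters: without it, a bad call number would index past the end of the table.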
In addition, although a system call enters the kernel through the software interrupt int 0x80, jumps to the system call handler system_call, and then executes the corresponding service routine, it runs on behalf of the user process. Its execution is therefore in process context, not interrupt context. As a result, during a system call the kernel can access much of the user process's information, can be preempted by other processes, and can sleep.
Once the system call is complete, and before control is handed back to the user process that made the call, the kernel performs a scheduling check. If a higher-priority process is runnable or the current process has used up its time slice, the kernel selects the higher-priority process or picks a new process to run.
About how the Linux kernel creates a new process:
In Linux a new process is created by calling fork (or one of its relatives, vfork and clone). fork() is called once but returns twice.
In the parent process, the return value is the process ID of the child; in the child process, the return value is 0. The return value can therefore be used to tell whether the current process is the parent or the child.
At the end of fork the new task is set to the ready state. Because fork() is a system call, you can see in the system call entry code (system_call.s) that after the service function returns, the scheduling function schedule() may be called; schedule() finds the new process in the ready state and uses switch_to() to switch to it.
The child process obtained from fork is a replica of the parent: it takes over the parent's whole address space, including the process context, process stack, memory information, open file descriptors, signal handling settings, process priority, process group ID, current working directory, root directory, resource limits, controlling terminal, and so on. Only the process ID, the resource usage counters, and the timers are unique to the child. Duplicating all of this is why fork is considered an expensive call, although Linux reduces the cost by sharing the code, data, and stack pages copy-on-write and copying a page only when one side writes to it.
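The two return values of fork() are easiest to see in code. The sketch below assumes a POSIX system; the helper name fork_and_collect is my own. The child exits immediately with a chosen code and the parent collects it with waitpid().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code` (0..255).
 * Returns the exit status collected by the parent, or -1 on error. */
int fork_and_collect(int code)
{
    assert(code >= 0 && code <= 255);

    pid_t pid = fork();
    if (pid < 0)
        return -1;        /* fork failed: no child was created */

    if (pid == 0)
        _exit(code);      /* child: fork() returned 0 */

    /* parent: fork() returned the child's process ID */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The same function body runs in both processes after the fork; only the value of pid tells them apart.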
About how Linux loads and starts an executable program:
In Linux, the parent process (for example the shell) first creates a new process with the fork() system call, and the new process then calls the execve() system call to execute the specified ELF file. The parent waits for the new process to finish and then waits for the user to enter the next command. The execve() system call is declared in unistd.h, and its prototype is as follows:
int execve(const char *filename, char *const argv[], char *const envp[]);
Its three parameters are the file name of the program to execute, the execution arguments, and the environment variables. glibc wraps the execve() system call and provides five other forms of the exec family of APIs: execl(), execlp(), execle(), execv(), and execvp(). They differ only in how the parameters are passed, and all of them eventually call the execve() system call.
Calling execve() enters the kernel through the entry point sys_execve(), which checks and copies some parameters and then calls do_execve(). Executables are not only ELF files; there are also Java programs, scripts that start with "#!", and so on. do_execve() therefore first examines the file: it reads the first 128 bytes, in particular the 4-byte magic number at the start, to determine the format of the executable. For a script in an interpreted language, the first two bytes "#!" form the magic number; once the system recognizes them, it parses the rest of the line to determine the path of the interpreter.
In the Linux environment, executables are in ELF format. The file header describes the information needed to load the file into memory, and it is followed by the code and data organized as segments, which are divided mainly by the read and write attributes they have once loaded. The execve system call is responsible for dispatching the executable: it first passes the relevant parameters and prepares the calling environment, then reads the executable's information and looks for a matching parsing module. An ELF executable is loaded into the corresponding address space as its format requires. If it is statically linked, execution starts at the entry address given in the file header; if it depends on dynamic link libraries, execution starts at the entry address of the dynamic linker ld.
execve is a special system call. A child created by fork returns to a specific point (the instruction after the fork call), whereas a program that calls execve traps into the kernel and, when execve returns, is at the starting point of the new executable. When the shell executes execve, the system call traps into the kernel and calls sys_execve and then do_execve, which loads the file header of the executable and searches the list of registered format handlers for a kernel module that can parse it.
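The fork, execve, wait pattern that a shell follows can be sketched as below. The sketch assumes a POSIX system where /bin/sh exists; the helper names run_program and run_shell_command, and the convention of exiting with 127 when execve fails, are choices made for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* the caller's environment, passed on unchanged */

/* Run the program at `path` with `argv` in a child process.
 * Returns the child's exit status, or -1 on error. */
int run_program(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace this process image with the new program. */
        execve(path, argv, environ);
        _exit(127);       /* reached only if execve() failed */
    }

    /* Parent: wait for the child, as a shell does before prompting again. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Convenience wrapper: run `command` through /bin/sh -c. */
int run_shell_command(const char *command)
{
    char *const argv[] = { "sh", "-c", (char *)command, NULL };
    return run_program("/bin/sh", argv);
}
```

On success execve() never returns to the code that called it, which is why the line after it only handles failure.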
In the course of learning Linux, I came to understand that Linux gives the user a great deal of privilege and freedom, so you have to know what you are doing at every step, because attackers exploit vulnerabilities in Linux to obtain information. Take the buffer overflow experiment as an example. A buffer overflow happens when the data written into a buffer exceeds the buffer's capacity, so that the excess overwrites legitimate data. Ideally, a program checks the length of its input and refuses anything longer than the buffer; however, most programs assume that the data always fits the allocated space, and that assumption is the hidden danger behind buffer overflows. The buffer area in question is the program's stack, where data is held temporarily between operations, and this kind of overflow occurs on the stack. By writing more content into a program's buffer than it can hold, an attacker corrupts the program's stack and makes the program execute other instructions, which is the purpose of the attack. In other words, a buffer overflow is a situation in which a program tries to write data beyond the end of a pre-allocated, fixed-length buffer. Malicious users can exploit this to alter the program's control flow or even to execute arbitrary code. The vulnerability arises because the data buffer and the return address are stored close together on the stack, so an overflow can overwrite the return address.
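The length check that the paragraph above says programs should perform can look like the following. It is a minimal sketch, and the function name copy_checked is my own; the idea is simply to compare the input length against the buffer size before copying anything.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `src` into `dst` only if it fits, including the terminating NUL.
 * Returns 0 on success, -1 if the input is too long (dst is left untouched). */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t len = strlen(src);
    if (len >= dst_size)
        return -1;                 /* would overflow: reject the input */

    memcpy(dst, src, len + 1);     /* len characters plus the NUL byte */
    return 0;
}
```

An unchecked strcpy() into a fixed-size stack buffer is the classic way this goes wrong: nothing stops the copy at the end of the buffer, so the bytes after it, possibly including the return address, get overwritten.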
Linux is a very capable system, but it also has weak points. Set-UID, for example, is an important security mechanism: when a Set-UID program runs, it runs with the permissions of the program's owner, so anyone who runs it obtains the owner's permissions. This also gives attackers an opening for stealing information.
All of this means we have to keep learning and keep making progress. My knowledge of Linux is far from sufficient, and my grasp of the details is not thorough enough. I hope that in my future studies I will gain a deeper understanding of the Linux kernel and also learn more about information security. I feel that only by studying information security can one really grasp the mechanisms inside a program, because looking at programming from the attacker's angle makes you examine it more carefully: only when you understand the logic of a program's normal operation can you see where it can be broken. A solid foundation is therefore the right way to truly master information security. It cannot be achieved overnight, and without a good foundation it is impossible to learn information security well. In short, this semester's Linux kernel course gave me new experiences and a great deal to take away. Thank you, teacher!