In Linux user space, programs make system calls all the time. This article traces the read system call through Linux kernel 2.6.37; the implementation differs slightly between kernel versions. Some applications contain a definition like the following:
#define real_read(fd, buf, count ) (syscall(SYS_read, (fd), (buf), (count)))
This calls the C library function syscall() with the call number SYS_read, which ends up in the kernel's sys_read() function. In Linux 2.6.37, sys_read() is defined through several layers of macros.
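As a rough user-space illustration of the wrapper above, the sketch below pushes a few bytes through a pipe and reads them back with real_read(). The helper name read_back_through_pipe is made up for this example, and the pipe is only there to make the snippet self-contained.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* make glibc declare syscall() */
#endif
#include <sys/syscall.h>       /* SYS_read */
#include <sys/types.h>         /* ssize_t */
#include <unistd.h>            /* syscall(), pipe(), write(), close() */

/* Same wrapper as in the text: skip libc's read() and issue the raw call. */
#define real_read(fd, buf, count) (syscall(SYS_read, (fd), (buf), (count)))

/*
 * Hypothetical helper: write `len` bytes of `msg` into a pipe and read them
 * back with real_read(). Returns the number of bytes read, or -1 on failure.
 */
static long read_back_through_pipe(const char *msg, size_t len,
                                   char *out, size_t out_size)
{
    int fds[2];
    long n;

    if (pipe(fds) != 0)
        return -1;

    if (write(fds[1], msg, len) != (ssize_t)len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* This is the call that enters the kernel and reaches sys_read(). */
    n = real_read(fds[0], out, out_size);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Calling read_back_through_pipe("hello", 5, buf, sizeof(buf)) should hand back the same five bytes that were written.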
The Linux system call interface (SCI) works by multiplexing and demultiplexing. The multiplexing point is the entry for interrupt 0x80 (on the x86 architecture): every system call made from user space funnels through interrupt 0x80, and the specific system call number is saved along the way. When the interrupt 0x80 handler runs, it uses that number to dispatch to the kernel function that implements the requested call.
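The dispatch-by-number idea can be modelled in user space with a table of function pointers. This is only a simplified sketch, not kernel code: the table_* names, the call numbers, and the toy handlers are all invented for the illustration.

```c
#include <errno.h>   /* ENOSYS */
#include <stddef.h>  /* NULL */

/* Hypothetical call numbers for this sketch; the kernel uses __NR_* constants. */
enum { TABLE_NR_READ = 0, TABLE_NR_WRITE = 1, TABLE_NR_MAX = 2 };

typedef long (*table_syscall_fn)(long a, long b, long c);

/* Toy handlers: each returns a distinct marker so callers can tell them apart. */
static long table_sys_read(long fd, long buf, long count)
{
    (void)fd; (void)buf;
    return 1000 + count;
}

static long table_sys_write(long fd, long buf, long count)
{
    (void)fd; (void)buf;
    return 2000 + count;
}

/* One slot per call number, like a miniature system call table. */
static const table_syscall_fn table_calls[TABLE_NR_MAX] = {
    [TABLE_NR_READ]  = table_sys_read,
    [TABLE_NR_WRITE] = table_sys_write,
};

/* Single entry point: every "call" arrives here and is fanned out by number. */
static long table_dispatch(unsigned long nr, long a, long b, long c)
{
    if (nr >= TABLE_NR_MAX || table_calls[nr] == NULL)
        return -ENOSYS;          /* unknown call number */
    return table_calls[nr](a, b, c);
}
```

All callers go through table_dispatch(), which plays the role of the single entry point; the number alone decides which handler runs.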
There are two ways to trigger a system call:
(1) int $0x80, the only method available in older Linux kernel versions
(2) the sysenter assembly instruction
In the Linux kernel, the read system call is defined with the following macro:
SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
{
    struct file *file;
    ssize_t ret = -EBADF;
    int fput_needed;

    file = fget_light(fd, &fput_needed);
    if (file) {
        loff_t pos = file_pos_read(file);
        ret = vfs_read(file, buf, count, &pos);
        file_pos_write(file, pos);
        fput_light(file, fput_needed);
    }

    return ret;
}
The SYSCALL_DEFINE3 macro is defined as follows:
#define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)
The preprocessor substitutes the macro arguments directly. ## is the token-pasting operator: it joins the tokens on either side of it into a single token. With name = read, _##name becomes _read. In the same way, __NR_##name becomes __NR_read, where __NR_##name denotes the system call number: the preprocessor first substitutes the actual system call name for name, and then expands the resulting __NR_... constant. For example, name = ioctl gives __NR_ioctl.
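A small stand-alone sketch of token pasting may help here. The PASTE_* macros and the numbers 3 and 54 are demo values chosen for this example only; the kernel takes its real numbers from its own unistd headers.

```c
/* Demo-only numbers; not read from any kernel header. */
#define PASTE_NR_read   3
#define PASTE_NR_ioctl  54

/* PASTE_NR(read) is pasted into PASTE_NR_read, which then expands to 3. */
#define PASTE_NR(name)  PASTE_NR_##name

/*
 * Two-level definition, mirroring SYSCALL_DEFINE3 -> SYSCALL_DEFINEx:
 * the outer macro pastes "_" onto the name, and the inner macro pastes
 * "paste_sys" in front of the result.
 */
#define PASTE_DEFINEx(sname)  static long paste_sys##sname(void)
#define PASTE_DEFINE(name)    PASTE_DEFINEx(_##name)

/* Expands to: static long paste_sys_read(void) { return 3; } */
PASTE_DEFINE(read)
{
    return PASTE_NR(read);
}
```

The function name paste_sys_read never appears literally in the source; it only exists after the preprocessor has pasted the pieces together, which is why searching the kernel tree for "sys_read(" finds no definition.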
#ifdef CONFIG_FTRACE_SYSCALLS
#define SYSCALL_DEFINEx(x, sname, ...) \
    static const char *types_##sname[] = { \
        __SC_STR_TDECL##x(__VA_ARGS__) \
    }; \
    static const char *args_##sname[] = { \
        __SC_STR_ADECL##x(__VA_ARGS__) \
    }; \
    SYSCALL_METADATA(sname, x); \
    __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
#else
#define SYSCALL_DEFINEx(x, sname, ...) \
    __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
#endif
Whether or not CONFIG_FTRACE_SYSCALLS is defined, SYSCALL_DEFINEx ends up expanding the following macro:
__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
#ifdef CONFIG_HAVE_SYSCALL_WRAPPERS
#define SYSCALL_DEFINE(name) static inline long SYSC_##name
#define __SYSCALL_DEFINEx(x, name, ...) \
    asmlinkage long sys##name(__SC_DECL##x(__VA_ARGS__)); \
    static inline long SYSC##name(__SC_DECL##x(__VA_ARGS__)); \
    asmlinkage long SyS##name(__SC_LONG##x(__VA_ARGS__)) \
    { \
        __SC_TEST##x(__VA_ARGS__); \
        return (long) SYSC##name(__SC_CAST##x(__VA_ARGS__)); \
    } \
    SYSCALL_ALIAS(sys##name, SyS##name); \
    static inline long SYSC##name(__SC_DECL##x(__VA_ARGS__))
#else /* CONFIG_HAVE_SYSCALL_WRAPPERS */
#define SYSCALL_DEFINE(name) asmlinkage long sys_##name
#define __SYSCALL_DEFINEx(x, name, ...) \
    asmlinkage long sys##name(__SC_DECL##x(__VA_ARGS__))
#endif /* CONFIG_HAVE_SYSCALL_WRAPPERS */
In either branch, the expansion contains the following function prototype:
asmlinkage long sys##name(__SC_DECL##x(__VA_ARGS__))
With name = _read, this is the sys_read() function mentioned earlier.
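To see how the argument list is built, here is a simplified user-space imitation of the macro chain. The MINI_* macros are stand-ins written for this sketch (they assume the kernel's __SC_DECLx helpers peel off one type/name pair per step), asmlinkage and __user are left out, and the function body is a toy rather than the real read logic.

```c
#include <errno.h>   /* EBADF */
#include <stddef.h>  /* size_t */

/* Each step consumes one (type, name) pair and emits "type name". */
#define MINI_SC_DECL1(t1, a1)       t1 a1
#define MINI_SC_DECL2(t2, a2, ...)  t2 a2, MINI_SC_DECL1(__VA_ARGS__)
#define MINI_SC_DECL3(t3, a3, ...)  t3 a3, MINI_SC_DECL2(__VA_ARGS__)

/* x selects MINI_SC_DECL3; name arrives as "_read" and is pasted onto mini_sys. */
#define MINI_SYSCALL_DEFINEx(x, name, ...) \
    static long mini_sys##name(MINI_SC_DECL##x(__VA_ARGS__))

#define MINI_SYSCALL_DEFINE3(name, ...) \
    MINI_SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)

/*
 * Expands to:
 *   static long mini_sys_read(unsigned int fd, char * buf, size_t count)
 */
MINI_SYSCALL_DEFINE3(read, unsigned int, fd, char *, buf, size_t, count)
{
    size_t i;

    if (fd != 0)                 /* toy rule: only descriptor 0 is "open" */
        return -EBADF;
    for (i = 0; i < count; i++)  /* toy data: fill the buffer with 'x' */
        buf[i] = 'x';
    return (long)count;
}
```

The comma-separated type and name arguments in the macro invocation are what let one macro family generate both the parameter list and, in the wrapper configuration, the casts around it.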
asmlinkage tells the compiler to take this function's arguments only from the stack, not from registers. Every system call requires this qualifier. This is similar to the macro definition in quagga discussed in the previous article.
The function body is the following code from the macro definition:
struct file *file;
ssize_t ret = -EBADF;
int fput_needed;

file = fget_light(fd, &fput_needed);
if (file) {
    loff_t pos = file_pos_read(file);
    ret = vfs_read(file, buf, count, &pos);
    file_pos_write(file, pos);
    fput_light(file, fput_needed);
}

return ret;
Code walkthrough:
- fget_light(): looks up the file object in the current process descriptor, using fd as the index (see figure 3).
- If no matching file object is found, an error (-EBADF) is returned.
- If the file object is found:
  - Call file_pos_read() to get the file's current read/write position.
  - Call vfs_read() to perform the read. This function ultimately calls the function that file->f_op->read points to. The code is as follows:
    if (file->f_op->read)
        ret = file->f_op->read(file, buf, count, pos);
  - Call file_pos_write() to update the file's current read/write position.
  - Call fput_light() to update the file's reference count.
- Finally, the number of bytes read is returned.
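The steps above can be modelled in plain user-space C. This is a simplified sketch under heavy assumptions, not the kernel's code: every model_* name is invented, the descriptor table is a fixed array, the "file" is a memory buffer, and reference counting (fget_light/fput_light) is left out entirely.

```c
#include <errno.h>      /* EBADF, EINVAL */
#include <stddef.h>     /* size_t, NULL */
#include <string.h>     /* memcpy */
#include <sys/types.h>  /* ssize_t */

struct model_file;

/* Per-file-system method table, standing in for struct file_operations. */
struct model_file_operations {
    ssize_t (*read)(struct model_file *file, char *buf, size_t count, long *pos);
};

struct model_file {
    const struct model_file_operations *f_op;
    long f_pos;         /* current read/write position */
    const char *data;   /* backing bytes of this toy "file" */
    size_t size;
};

#define MODEL_MAX_FDS 4
static struct model_file *model_fd_table[MODEL_MAX_FDS]; /* index = fd */

/* Toy "file system" read method: copy from the memory buffer at *pos. */
static ssize_t model_mem_read(struct model_file *file, char *buf,
                              size_t count, long *pos)
{
    size_t left;

    if (*pos < 0 || (size_t)*pos >= file->size)
        return 0;                           /* end of file */
    left = file->size - (size_t)*pos;
    if (count > left)
        count = left;
    memcpy(buf, file->data + *pos, count);
    *pos += (long)count;
    return (ssize_t)count;
}

static const struct model_file_operations model_mem_fops = {
    .read = model_mem_read,
};

/* Like vfs_read(): hand off to whatever f_op->read points to. */
static ssize_t model_vfs_read(struct model_file *file, char *buf,
                              size_t count, long *pos)
{
    if (!file->f_op || !file->f_op->read)
        return -EINVAL;
    return file->f_op->read(file, buf, count, pos);
}

/* Same shape as the sys_read() body: look up, read position, read, write back. */
static ssize_t model_sys_read(unsigned int fd, char *buf, size_t count)
{
    struct model_file *file;
    ssize_t ret = -EBADF;

    file = (fd < MODEL_MAX_FDS) ? model_fd_table[fd] : NULL;
    if (file) {
        long pos = file->f_pos;             /* file_pos_read()  */
        ret = model_vfs_read(file, buf, count, &pos);
        file->f_pos = pos;                  /* file_pos_write() */
    }
    return ret;
}
```

To try it, fill in a struct model_file that uses model_mem_fops, store its address in a slot of model_fd_table, and call model_sys_read() repeatedly: the position advances with each call until the read returns 0.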
At this point the virtual file system (VFS) layer has finished its work, and control passes to the ext2 file system layer.
http://blogold.chinaunix.net/u3/104447/showart_2527011.html