Socket creation in the Linux kernel: source analysis (summary)

Source: Internet
Author: User

Http://www.jianshu.com/p/5d82a685b5b6

After spending a long time analyzing the socket creation source code, I put together this summary. My blog also has a separate, more detailed source analysis based on kernel version 3.9; if you are interested, read it after finishing this article. Do not read the other post on its own.

One: Call chain:

Two: Data structure

Take a look at the meaning of each data structure:

1) Relationship between socket, sock, inet_sock, and tcp_sock
Once the sk variable has been created, control returns to the inet_create function:

Here the address of the inet_sock variable is obtained from the sk variable. It is pointed out here to help distinguish between the various structures.
A. struct socket: the basic BSD socket, facing user space. This is the structure an application creates through the socket system call, and it is built on the virtual file system;
There are three types: stream, datagram, and raw sockets;

B. struct sock: the network-layer socket, corresponding to TCP, UDP, and raw, and facing the kernel drivers;

Its state is more fine-grained than that of the socket structure:

C. struct inet_sock: the socket representation of the inet domain, an extension of struct sock that adds inet-domain properties such as TTL, multicast list, IP address, and port;
D. struct raw_sock: the socket representation of the raw protocol, an extension of struct inet_sock, which handles ICMP-related content;
E. struct udp_sock: the socket representation of the UDP protocol, an extension of struct inet_sock;
F. struct inet_connection_sock: the representation of all connection-oriented sockets, an extension of struct inet_sock;

G. struct tcp_sock: the socket representation of the TCP protocol, an extension of struct inet_connection_sock that mainly adds TCP-specific properties such as the sliding window and congestion control;
H. struct inet_timewait_sock: the network-layer socket representation used for timeout control;
I. struct tcp_timewait_sock: the TCP socket representation used for timeout control;
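The "extension" relationship in the list above can be sketched in user-space C. This is only a minimal illustration, assuming each extended structure embeds its parent as the first member; the structure names mirror the kernel's, but every field shown here is a simplified, hypothetical stand-in rather than the real kernel layout.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real kernel structures have many more fields. */
struct sock { int sk_state; };
struct inet_sock { struct sock sk; unsigned short inet_sport; };
struct inet_connection_sock { struct inet_sock icsk_inet; int icsk_retransmits; };
struct tcp_sock { struct inet_connection_sock inet_conn; unsigned int snd_wnd; };

/* Each extension keeps its parent at offset 0, checked at compile time. */
_Static_assert(offsetof(struct inet_sock, sk) == 0, "sock must come first");
_Static_assert(offsetof(struct inet_connection_sock, icsk_inet) == 0,
               "inet_sock must come first");
_Static_assert(offsetof(struct tcp_sock, inet_conn) == 0,
               "inet_connection_sock must come first");

/* Because of the zero offsets, the same address can be viewed at any level. */
static struct inet_sock *inet_sk(struct sock *sk)
{
    assert(sk != NULL);
    return (struct inet_sock *)sk;
}

static struct tcp_sock *tcp_sk(struct sock *sk)
{
    assert(sk != NULL);
    return (struct tcp_sock *)sk;
}
```

Given a struct tcp_sock object, a pointer to it can be passed around as a struct sock pointer and viewed again as an inet_sock or tcp_sock through the helpers.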

Three: The specific process

1. Function entry:
1) The sample code is as follows:

int server_sockfd = socket(AF_INET, SOCK_STREAM, 0);


2) Entry point:
net/socket.c:sys_socketcall(). When a socket is created, the sys_socket() function is executed according to the sub-call number;

2. Allocating the socket structure:
1) Call chain:
net/socket.c:sys_socket() -> sock_create() -> __sock_create() -> sock_alloc();

2) Create the inode in the socket file system:

inode = new_inode(sock_mnt->mnt_sb);
Here, new_inode is a generic file system function that creates an inode in the given file system. Its main code is as follows (fs/inode.c):

It contains a condition, if (sb->s_op->alloc_inode): if the superblock of the current file system provides its own inode allocation function, that function is called to allocate the inode; otherwise an inode is allocated from the common cache;
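The optional-hook pattern described above can be sketched in user-space C as follows. This is a simplified model rather than the kernel code: the structure layouts are reduced to the one field that matters, and generic_alloc, sockfs_like_alloc, and the custom_allocs counter are hypothetical names introduced for illustration.

```c
#include <stdlib.h>

struct inode { unsigned long i_ino; };
struct super_block;

struct super_operations {
    /* Optional: a file system may provide its own inode allocator. */
    struct inode *(*alloc_inode)(struct super_block *sb);
};

struct super_block { const struct super_operations *s_op; };

static int custom_allocs; /* counts calls to the file-system-specific hook */

/* Stand-in for allocating from the common inode cache. */
static struct inode *generic_alloc(void)
{
    return calloc(1, sizeof(struct inode));
}

/* Stand-in for a file system's own allocator (as sockfs provides one). */
static struct inode *sockfs_like_alloc(struct super_block *sb)
{
    (void)sb;
    custom_allocs++;
    return calloc(1, sizeof(struct inode));
}

/* Use the superblock's hook if present, otherwise the generic path. */
static struct inode *alloc_inode(struct super_block *sb)
{
    if (sb->s_op->alloc_inode)
        return sb->s_op->alloc_inode(sb);
    return generic_alloc();
}
```

A superblock whose operation table leaves alloc_inode unset takes the generic path, while one that sets it gets its own allocator called instead.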

3) Create the socket-specific inode:
As mentioned in the article "Socket file system registration", when the socket file system is mounted, its superblock is initialized and the s_op operation pointer is set to the sockfs_ops structure. Allocating the inode therefore calls the sock_alloc_inode function, which actually allocates a socket_alloc structure containing both a socket and an inode, and returns the inode member. The socket structure and the inode structure are thus allocated together. As a result, the application can perform operations such as read()/write() on the socket through a file descriptor, with the virtual file system (VFS) doing the dispatch.

3. Obtain the socket object from the inode:
Creating an inode is generic file system logic, so the return value is a pointer to the inode object. After the inode for the socket has been created, the socket object has to be obtained from that inode. This is done by the inline function SOCKET_I, which relies on two important macros, container_of and offsetof.
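The step from inode back to socket can be illustrated with a user-space sketch. It assumes the combined socket_alloc layout described above, with heavily simplified fields; the container_of macro here is a plain-C approximation of the kernel macro, and release_socket_inode is a hypothetical helper added only so the sketch can free what it allocates.

```c
#include <stddef.h>
#include <stdlib.h>

/* Plain-C approximation of the kernel's container_of macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct socket { int state; };
struct inode { unsigned long i_ino; };

/* The socket and its inode live in one allocation. */
struct socket_alloc {
    struct socket socket;
    struct inode vfs_inode;
};

/* Allocate the combined object but hand back only the inode member. */
static struct inode *sock_alloc_inode(void)
{
    struct socket_alloc *ei = calloc(1, sizeof(*ei));
    return ei ? &ei->vfs_inode : NULL;
}

/* Recover the socket that shares an allocation with this inode. */
static struct socket *SOCKET_I(struct inode *inode)
{
    return &container_of(inode, struct socket_alloc, vfs_inode)->socket;
}

/* Hypothetical helper: free the whole combined object via its inode. */
static void release_socket_inode(struct inode *inode)
{
    free(container_of(inode, struct socket_alloc, vfs_inode));
}
```

The generic code only ever sees a struct inode pointer, yet the socket is always reachable by subtracting the offset of vfs_inode within socket_alloc.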


4. Use the protocol family to initialize the socket:

1) Register the AF_INET protocol domain:

The article "Socket file system registration" describes the work done at system initialization; AF_INET registration is completed as part of that work;

The initialization entry point is in net/ipv4/af_inet.c, where the sock_register function is called to complete the registration:

Based on its family, the AF_INET protocol domain's inet_family_ops is registered into the kernel's net_families array. Its definition is as follows:

static struct net_proto_family inet_family_ops = {
	.family = PF_INET,
	.create = inet_create,
	.owner  = THIS_MODULE,
};

Here family specifies the protocol domain type, and create points to the socket creation function of that protocol domain;

2) Socket type

Within the same protocol domain there may be several socket types. The AF_INET domain, for example, has stream sockets (SOCK_STREAM), datagram sockets (SOCK_DGRAM), and raw sockets (SOCK_RAW); the protocols built on these three types are TCP, UDP, ICMP/IGMP, and so on.

In the Linux kernel, struct proto represents a socket type within a domain. It provides all operations and related data for that type of socket (the corresponding cache is allocated when the kernel is initialized; see the inet_init function mentioned above).

The three socket types of the AF_INET domain are defined through struct inet_protosw (net/ipv4/af_inet.c). There, tcp_prot (net/ipv4/tcp_ipv4.c), udp_prot (net/ipv4/udp.c), and raw_prot (net/ipv4/raw.c) each represent one of the three socket types, together with its operations and related data. The ops member provides the set of operations for the protocol domain; for the three socket types there are three different sets, inet_stream_ops, inet_dgram_ops, and inet_sockraw_ops, all defined in net/ipv4/af_inet.c;

At kernel initialization, inet_init stores the different socket types in the global variable inetsw for unified management. inetsw is an array of linked lists: each element is a linked list of struct inet_protosw, and there are SOCK_MAX elements in total. When inet_init initializes the AF_INET domain, it calls inet_register_protosw to register every socket type defined in the array inetsw_array into inetsw. Sockets of the same type but different protocols are stored in the same linked list, and the array is indexed by socket type. At run time the system uses only inetsw, not inetsw_array;
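The registration of inetsw_array into inetsw can be modelled in user space roughly as follows. This is a sketch under several assumptions: the list is singly linked for brevity (the kernel uses its own list type), MOCK_SOCK_MAX is an assumed bound, the prot_name field is a hypothetical stand-in for the real prot and ops pointers, and the four entries are illustrative.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

#define MOCK_SOCK_MAX 11 /* assumed upper bound on socket type numbers */

/* Simplified entry: one (type, protocol) pair plus a label. */
struct inet_protosw {
    struct inet_protosw *next;
    int type;
    int protocol;
    const char *prot_name; /* hypothetical stand-in for prot and ops */
};

/* One list head per socket type, indexed by type. */
static struct inet_protosw *inetsw[MOCK_SOCK_MAX];

static struct inet_protosw inetsw_array[] = {
    { .type = SOCK_STREAM, .protocol = IPPROTO_TCP,  .prot_name = "TCP"  },
    { .type = SOCK_DGRAM,  .protocol = IPPROTO_UDP,  .prot_name = "UDP"  },
    { .type = SOCK_DGRAM,  .protocol = IPPROTO_ICMP, .prot_name = "PING" },
    { .type = SOCK_RAW,    .protocol = IPPROTO_IP,   .prot_name = "RAW"  },
};

/* Append an entry to the tail of the list for its socket type. */
static int inet_register_protosw(struct inet_protosw *p)
{
    if (p->type < 0 || p->type >= MOCK_SOCK_MAX)
        return -1;
    struct inet_protosw **link = &inetsw[p->type];
    while (*link)
        link = &(*link)->next;
    p->next = NULL;
    *link = p;
    return 0;
}

/* Register every entry of inetsw_array; intended to be called once. */
static void register_all(void)
{
    for (size_t i = 0; i < sizeof(inetsw_array) / sizeof(inetsw_array[0]); i++)
        inet_register_protosw(&inetsw_array[i]);
}

/* Count how many protocols are registered for one socket type. */
static int protocols_for_type(int type)
{
    int n = 0;
    for (struct inet_protosw *p = inetsw[type]; p; p = p->next)
        n++;
    return n;
}
```

After register_all() runs, the SOCK_DGRAM slot holds two entries on the same list, which is the "same type, different protocol" case mentioned above.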

3) Use the protocol domain to initialize the socket

With that background, we return to net/socket.c:sys_socket() -> sock_create() -> __sock_create():

pf = rcu_dereference(net_families[family]);
err = pf->create(net, sock, protocol);

The code above looks up the protocol domain that was registered when the kernel was initialized, and then calls its create method;
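The register-then-dispatch pattern can be sketched in user-space C like this. It is a simplified model: there is no RCU, no net namespace argument, and no module owner; MOCK_NPROTO and the names mock_inet_create and mock_sock_create are assumptions introduced for illustration, and the error codes are chosen to resemble the kernel's.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

#define MOCK_NPROTO 64 /* assumed size of the family table */

struct socket { int type; int state; };

struct net_proto_family {
    int family;
    int (*create)(struct socket *sock, int protocol);
};

/* Table of registered protocol families, indexed by family number. */
static const struct net_proto_family *net_families[MOCK_NPROTO];

static int sock_register(const struct net_proto_family *ops)
{
    if (ops->family < 0 || ops->family >= MOCK_NPROTO)
        return -ENOBUFS;
    if (net_families[ops->family])
        return -EEXIST; /* the slot is already taken */
    net_families[ops->family] = ops;
    return 0;
}

/* Hypothetical stand-in for inet_create(). */
static int mock_inet_create(struct socket *sock, int protocol)
{
    (void)protocol;
    sock->state = 1; /* stand-in for SS_UNCONNECTED */
    return 0;
}

static const struct net_proto_family inet_family_ops = {
    .family = PF_INET,
    .create = mock_inet_create,
};

/* Look up the family registered earlier and call its create method. */
static int mock_sock_create(int family, struct socket *sock, int protocol)
{
    if (family < 0 || family >= MOCK_NPROTO || !net_families[family])
        return -EAFNOSUPPORT;
    return net_families[family]->create(sock, protocol);
}
```

Creating a socket for a family that has not been registered fails, and succeeds once sock_register() has stored the family's operations in the table.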

5. Allocating the sock structure:

sk is the network-layer representation of the socket. struct sock is fairly large and is not covered in detail here. Two important members are sk_prot and sk_prot_creator, which point to the set of protocol-specific handler functions; their type is struct proto, and the protocol stack described here has three variables of this type. The call chain is as follows:

net/socket.c:sys_socket() -> sock_create() -> __sock_create() -> net/ipv4/af_inet.c:inet_create();

inet_create() mainly does the following:

1) Set the state of the socket to SS_UNCONNECTED;

sock->state = SS_UNCONNECTED;

2) Locate the matching socket type according to the type of the socket:

Because sockets of the same type but different protocols are stored in the same linked list in inetsw, the list has to be traversed to find a match. In the example above, the protocol (passed as 0) is reassigned to answer->protocol, that is IPPROTO_TCP, whose value is 6;
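The matching rule can be sketched as a small user-space function. This is an illustration of the wildcard handling as I understand it, not the kernel code: the list is a plain array, and the names protosw_entry and match_protocol are hypothetical. A requested protocol of IPPROTO_IP (0) acts as a wildcard and is replaced by the protocol of the first entry found.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>

/* Hypothetical, simplified list entry for one socket type. */
struct protosw_entry { int protocol; const char *name; };

/*
 * Walk the entries registered for one socket type.
 * Returns 0 and stores the match in *answer (possibly rewriting
 * *protocol), or -EPROTONOSUPPORT if nothing matches.
 */
static int match_protocol(const struct protosw_entry *list, size_t n,
                          int *protocol, const struct protosw_entry **answer)
{
    for (size_t i = 0; i < n; i++) {
        const struct protosw_entry *e = &list[i];

        if (*protocol == e->protocol) {
            /* Exact match, unless both sides are the wildcard. */
            if (*protocol != IPPROTO_IP) {
                *answer = e;
                return 0;
            }
        } else {
            /* Caller passed the wildcard: adopt the entry's protocol. */
            if (*protocol == IPPROTO_IP) {
                *protocol = e->protocol;
                *answer = e;
                return 0;
            }
            /* Entry is a wildcard: it accepts any requested protocol. */
            if (e->protocol == IPPROTO_IP) {
                *answer = e;
                return 0;
            }
        }
    }
    return -EPROTONOSUPPORT;
}
```

With a stream list that contains only a TCP entry, a request for protocol 0 resolves to IPPROTO_TCP, while a request for IPPROTO_UDP is rejected.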

3) Initialize sk with the matching protocol family operation set;

According to the source code, the ops member of the sock variable points to the inet_stream_ops structure variable;

4) Allocate the sock structure variable: net/socket.c:sys_socket() -> sock_create() -> __sock_create() -> net/ipv4/af_inet.c:inet_create() -> net/core/sock.c:sk_alloc():

Here answer_prot points to the tcp_prot structure variable;

sk_prot_alloc allocates the sock structure variable. Because inet_init sets up a cache for each socket type, the sock structure is allocated from that cache. After the allocation completes, some initialization is performed:

i) Initialize sk_prot and sk_prot_creator of the sk variable;
ii) Initialize the wait queue of the sk variable;
iii) Set the network namespace structure and increase the reference count;

6. Establish the relationship between the socket structure and the sock structure:

inet = inet_sk(sk);

Why can the sock structure variable be cast directly to an inet_sock structure variable? There is only one possibility: when the sock structure variable was allocated, what was really allocated was an inet_sock or a larger structure that contains one;

Let's go back to the code that allocates the sock structure (see item 4 of section 5 above; net/core/sock.c):

static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority, int family)
{
	struct sock *sk;
	struct kmem_cache *slab;

	slab = prot->slab;
	if (slab != NULL)
		sk = kmem_cache_alloc(slab, priority);
	else
		sk = kmalloc(prot->obj_size, priority);

	return sk;
}

The code above allocates the sock structure in one of two ways: from the TCP-specific cache, or directly from memory with kmalloc. In the first case the object size is set to prot->obj_size when the cache is initialized; in the second, the size is passed explicitly as prot->obj_size.

Based on this, let's look at obj_size in the tcp_prot variable (net/ipv4/tcp_ipv4.c):

.obj_size = sizeof(struct tcp_sock),

In other words, the structure actually allocated is a tcp_sock. Because each of tcp_sock, inet_connection_sock, and inet_sock places the structure it extends at offset 0, with sock at the very start, a tcp_sock pointer can be converted directly to an inet_sock pointer.
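How obj_size drives the allocation can be shown with a short user-space model. This is a sketch only: calloc stands in for both kernel allocation paths, the structures carry a few hypothetical fields, and the mock_ prefixed names are assumptions introduced here.

```c
#include <stdlib.h>

/* Simplified stand-ins with hypothetical fields. */
struct sock { int sk_state; };
struct inet_sock { struct sock sk; unsigned short inet_sport; };
struct udp_sock { struct inet_sock inet; int pending; };
struct tcp_sock { struct inet_sock inet; unsigned int snd_wnd; unsigned int snd_cwnd; };

/* Each protocol records how large its own socket object is. */
struct proto { const char *name; size_t obj_size; };

static const struct proto mock_tcp_prot = { "TCP", sizeof(struct tcp_sock) };
static const struct proto mock_udp_prot = { "UDP", sizeof(struct udp_sock) };

/*
 * The allocator only knows struct sock, but it reserves obj_size bytes,
 * so the caller really gets a full protocol-specific object.
 */
static struct sock *mock_sk_prot_alloc(const struct proto *prot)
{
    return calloc(1, prot->obj_size);
}
```

The same allocator serves both protocols; only the obj_size recorded in the proto decides whether the returned memory is large enough to be treated as a tcp_sock or a udp_sock.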


2) Establish the relationship between socket and sock
After the sock variable is created, the sock structure is initialized and a reference relationship between the sock and the socket is established. The call chain is as follows:
net/socket.c:sys_socket() -> sock_create() -> __sock_create() -> net/ipv4/af_inet.c:inet_create() -> net/core/sock.c:sock_init_data():
The main work of this function is:
A. Initialize the buffers, queues, and so on of the sock structure;
B. Initialize the state of the sock structure to TCP_CLOSE;
C. Establish the mutual references between the socket and sock structures;
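Steps B and C above can be sketched as follows. This is a reduced user-space model, not the kernel function: buffers and queues are omitted, the structures keep only the fields involved, and MOCK_TCP_CLOSE and mock_sock_init_data are hypothetical names (the value 7 is assumed to mirror the kernel's TCP_CLOSE).

```c
#include <stddef.h>

enum { MOCK_TCP_CLOSE = 7 }; /* assumed to mirror the kernel's TCP_CLOSE */

struct sock;

/* User-facing side: points down to the network-layer object. */
struct socket { int type; struct sock *sk; };

/* Network-layer side: points back up to the BSD socket. */
struct sock { int sk_state; int sk_type; struct socket *sk_socket; };

static void mock_sock_init_data(struct socket *sock, struct sock *sk)
{
    sk->sk_state = MOCK_TCP_CLOSE; /* step B: initial state */
    sk->sk_socket = sock;          /* step C: sock -> socket */
    if (sock) {
        sk->sk_type = sock->type;
        sock->sk = sk;             /* step C: socket -> sock */
    }
}
```

After the call, either structure can be reached from the other, which is what lets later code move between the two layers.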


7. Initialize the sock with the TCP protocol:
Finally, inet_create() initializes the sock structure with the matching protocol by calling the init hook of tcp_prot, net/ipv4/tcp_ipv4.c:tcp_v4_init_sock(), which mainly initializes tcp_sock and inet_connection_sock;

8. Associating the socket with the file system:

Once the socket-related structures have been created, they need to be associated with the file system; see the sock_map_fd() function:

1) Allocate a file descriptor, and allocate the file and dentry structures;
2) Attach the socket's file operation table and dentry operation table;
3) Point file->private_data to the socket;

Once the socket is associated with the file system, it can be operated on through the file system's read/write calls;
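The effect of this association is visible from user space: a socket descriptor accepts the ordinary read() and write() calls. The sketch below assumes a POSIX system and uses socketpair() with AF_UNIX instead of an AF_INET connection, purely so that it needs no network peer; the helper name echo_through_socket is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Write msg into one end of a connected socket pair with write() and
 * read it back from the other end with read().
 * Returns the number of bytes read, or -1 on failure.
 */
static ssize_t echo_through_socket(const char *msg, char *out, size_t out_len)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    ssize_t n = -1;
    size_t len = strlen(msg);
    if (write(sv[0], msg, len) == (ssize_t)len)
        n = read(sv[1], out, out_len);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

No socket-specific send or receive call is involved; the descriptors behave like any other file descriptor for these two operations.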
