The sock Structure and the socket Structure

Source: Internet
Author: User

Some time ago I read through parts of the TCP/IP protocol stack. There is a lot I would like to write about, but I have not had the time.


//************************************** ************************************

/* 1. Every open file, socket, and so on is represented by a file data structure. Files and sockets are distinguished by which member of the union inode->u is in use:
struct inode {
...
union {
struct ext2_inode_info ext2_i;
struct ext3_inode_info ext3_i;
struct socket socket_i;
...
} u; };
2. Each socket data structure has a member pointing to a sock data structure. The sock is an extension of the socket, and the two correspond one to one: socket->sk points to the corresponding sock, and sock->socket
points back to the corresponding socket.
3. socket and sock are two sides of the same thing, so why not merge them into one data structure? Because the socket is part of the inode structure: it is one member of
the union inside the inode. Because of the special nature of socket operations, this data structure needs a large number of fields. If all of these fields were put into the socket
structure, the union in the inode, and therefore every inode, would become very large, while the union members of other file systems need nothing like that much space.
That would waste a great deal of memory, because the system uses far more inodes than sockets. The solution is therefore to split the socket into two parts:
the fields closely tied to the file system go into the socket structure, and the fields closely tied to communication go into a separate structure, sock;
*/
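The size argument in point 3 follows from how unions work: a union is at least as large as its largest member, so one large member enlarges every inode. The sketch below uses hypothetical, greatly simplified structures (small_fs_info, big_socket, mini_socket, mini_sock, attach_sock() are all made up for illustration) to compare the two layouts and to show the one-to-one socket->sk / sock->socket link from point 2.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; the real kernel structures are far larger. */
struct small_fs_info {            /* per-inode data a disk file system might keep */
    unsigned long block[4];
};

struct big_socket {               /* a socket that carries ALL protocol state itself */
    int state;
    char protocol_state[1024];    /* stand-in for the many sock fields */
};

union fat_u {                     /* inode->u if socket and sock were one structure */
    struct small_fs_info fs_i;
    struct big_socket socket_i;
};

struct mini_sock;                 /* communication half, allocated separately */

struct mini_socket {              /* file-system half, embedded in the inode union */
    int state;
    struct mini_sock *sk;         /* plays the role of socket->sk */
};

struct mini_sock {
    struct mini_socket *socket;   /* plays the role of sock->socket */
    char protocol_state[1024];
};

union lean_u {                    /* inode->u with the split design */
    struct small_fs_info fs_i;
    struct mini_socket socket_i;
};

/* Allocate the sock half and link the two halves one to one. */
struct mini_sock *attach_sock(struct mini_socket *sock)
{
    struct mini_sock *sk = calloc(1, sizeof *sk);
    if (sk == NULL)
        return NULL;
    sock->sk = sk;
    sk->socket = sock;
    assert(sock->sk->socket == sock);  /* the back pointers agree */
    return sk;
}
```

Only the small mini_socket lives in every inode's union; the bulky per-connection state is paid for only by inodes that really are sockets.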

struct socket
{
socket_state state; // the current state of the socket, one of:
typedef enum {
SS_FREE = 0, /* not allocated */
SS_UNCONNECTED, /* unconnected to any socket */
SS_CONNECTING, /* in process of connecting */
SS_CONNECTED, /* connected to socket */
SS_DISCONNECTING /* in process of disconnecting */
} socket_state;
unsigned long flags; // flag bits numbered as below; they record whether the socket is busy, e.g. waiting for buffer space or for data
#define SOCK_ASYNC_NOSPACE 0
#define SOCK_ASYNC_WAITDATA 1
#define SOCK_NOSPACE 2
struct proto_ops *ops; // protocol-family-specific operations bound to the socket according to its protocol; for IPv4 TCP this is inet_stream_ops
struct inode *inode; // the inode to which the socket belongs
struct fasync_struct *fasync_list; // asynchronous wake-up list
struct file *file; // file pointer
struct sock *sk; // pointer to the corresponding sock
wait_queue_head_t wait; // wait queue of the socket; processes sleep on it when TCP needs to wait
short type; // the type of the socket within its protocol family, such as SOCK_STREAM
unsigned char passcred; // not needed for TCP analysis
};
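The SOCK_* values listed for flags are bit numbers rather than masks: kernel code manipulates them with set_bit(), clear_bit() and test_bit(), for example set_bit(SOCK_NOSPACE, &sock->flags). As a user-space sketch, the hypothetical helpers flag_set(), flag_clear() and flag_test() below do the same with plain bit operations; unlike the kernel helpers, they are not atomic.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Bit numbers from the flags member of struct socket. */
#define SOCK_ASYNC_NOSPACE  0
#define SOCK_ASYNC_WAITDATA 1
#define SOCK_NOSPACE        2

#define FLAG_BITS ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical, NON-atomic stand-ins for set_bit()/clear_bit()/test_bit(). */
void flag_set(unsigned long *flags, int nr)
{
    assert(nr >= 0 && nr < FLAG_BITS);
    *flags |= 1UL << nr;              /* turn bit 'nr' on */
}

void flag_clear(unsigned long *flags, int nr)
{
    assert(nr >= 0 && nr < FLAG_BITS);
    *flags &= ~(1UL << nr);           /* turn bit 'nr' off */
}

bool flag_test(unsigned long flags, int nr)
{
    assert(nr >= 0 && nr < FLAG_BITS);
    return (flags >> nr) & 1UL;       /* is bit 'nr' set? */
}
```

So marking the send buffer as full sets bit 2, i.e. the value 1UL << 2 == 4, not the value 2.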

//************************************** ************************************

struct sock {
/* The five fields the socket uses to match incoming packets */
__u32 daddr; // destination IP: the foreign IPv4 address
__u32 rcv_saddr; // the local IPv4 address bound to the socket
__u16 dport; // destination (foreign) port
unsigned short num; /* the local port number of the socket. Ports below 1024 are privileged ports, and only a privileged user can bind them.
If the port is set to zero when binding, the system assigns an unused user port. For a raw socket, num is reused to store
the protocol argument of socket(int family, int type, int protocol) instead of a port number.
The source port number is assigned to this member, and the sport member takes its value from it */
int bound_dev_if; // index of the bound device, if != 0

/* Main hash chain. Allocated ports are indexed by tcp_hashinfo.__tcp_bhash. Each index slot is a tcp_bind_hashbucket, and each port binding is described by a tcp_bind_bucket,
which contains a pointer (owners) to the sockets bound to that port; the socket's sk->prev pointer points back to this binding structure.
*/
struct sock *next;
struct sock **pprev;
/* sk->bind_next and sk->bind_pprev link the sockets bound to the same port, as in an HTTP server */
struct sock *bind_next;
struct sock **bind_pprev;
struct sock *prev;

volatile unsigned char state, zapped; // connection state; zapped can be ignored in TCP analysis
__u16 sport; // source port; see num

unsigned short family; // protocol family, such as PF_INET
unsigned char reuse; // whether the local address can be reused (set by SO_REUSEADDR)
unsigned char shutdown; // whether the connection has been shut down in one direction or in both
#define SHUTDOWN_MASK 3
#define RCV_SHUTDOWN 1
#define SEND_SHUTDOWN 2
atomic_t refcnt; // reference count
socket_lock_t lock; // socket lock. Each socket has a spin lock that synchronizes user context with softirq processing.
typedef struct {
spinlock_t slock;
unsigned int users;
wait_queue_head_t wq;
} socket_lock_t;
wait_queue_head_t *sleep; // wait queue on which the threads using this sock sleep
struct dst_entry *dst_cache; // cached route to the destination
rwlock_t dst_lock; // lock taken when a dst_entry is assigned to the socket

/* Sending and receiving both consume memory, namely the send buffer and the receive buffer. The system limits memory usage, and each sock normally draws on a quota
pre-allocated to it in forward_alloc:
1) When an SKB is received, its size is added to rmem_alloc and deducted from forward_alloc. Once it has been processed (for example, read from user mode), the SKB is released and
tcp_rfree() returns its memory to forward_alloc.
2) When an SKB is sent, it is first held in the send buffer; its size is added to wmem_queued and deducted from forward_alloc. After it has actually been sent, it is released
and its memory is returned to forward_alloc. When deducting from forward_alloc, forward_alloc may be too small; in that case tcp_mem_schedule() is called to
increase it. The increase is not always granted: the system imposes an overall limit on the memory used by all of TCP, sysctl_tcp_mem[3], and per-sock
limits, sysctl_tcp_rmem[3] and sysctl_tcp_wmem[3]. forward_alloc can be increased only when these limits (which allow some flexibility)
are respected. When memory runs short, tcp_mem_reclaim() is called to reclaim the pre-allocated forward_alloc quota.
*/
int rcvbuf; // receive buffer size (bytes)
int sndbuf; // send buffer size (bytes)
atomic_t rmem_alloc; // bytes of data held in the receive queue
atomic_t wmem_alloc; // bytes of data held in the send queue
int wmem_queued; // total bytes of all data queued for sending
int forward_alloc; // pre-allocated bytes remaining

struct sk_buff_head receive_queue; // receive queue
struct sk_buff_head write_queue; // send queue
atomic_t omem_alloc; // can be ignored; the "o" stands for "option" or "other"

__u32 saddr; /* the actual source address used for sending. Note that rcv_saddr records the address bound to the socket, which may be a broadcast or
multicast address; outgoing packets can only use an interface's IP address, not a broadcast or multicast address */
unsigned int allocation; // allocation mode used for this sock's SKBs, such as GFP_ATOMIC or GFP_KERNEL

volatile char dead, // see tcp_close(), tcp_listen_stop(), inet_sock_release()
done, // set to 1 once the socket has received a FIN
urginline, // if 1, urgent data is placed in the normal data stream instead of being handled separately
keepopen, // whether the keepalive timer is enabled (SO_KEEPALIVE)
linger, // whether lingering is enabled; lingertime gives the time to linger after close()
destroy, // can be ignored in TCP analysis
no_check, // whether checksum computation is disabled for sent SKBs; valid only for UDP
broadcast, // whether broadcast is allowed; valid only for UDP
bsdism; // not needed for TCP analysis
unsigned char debug; // can be skipped in TCP analysis
unsigned char rcvtstamp; // whether to pass the timestamp of received SKBs to the application
unsigned char use_write_queue; // initialized to 1 in init
unsigned char userlocks; // a combination of the following flags, recording which settings (buffer sizes, bound address or port) the user has set explicitly
#define SOCK_SNDBUF_LOCK 1
#define SOCK_RCVBUF_LOCK 2
#define SOCK_BINDADDR_LOCK 4
#define SOCK_BINDPORT_LOCK 8
int route_caps; // capabilities of the route (output device) used by the sock
int proc; // PID of the owning user process
unsigned long lingertime; // used together with linger: the time to linger after close()
int hashent; // hash value of the 4-tuple
struct sock *pair; // can be skipped in TCP analysis

struct { // while the sock is locked, received packets are put here first
struct sk_buff *head;
struct sk_buff *tail;
} backlog;

rwlock_t callback_lock; // lock protecting the sock's callback-related operations
struct sk_buff_head error_queue; // error message queue, rarely used
struct proto *prot; // for example, points to tcp_prot

union { // private TCP-related data
struct tcp_opt af_tcp;
...
} tp_pinfo;

int err, // stores errors such as ECONNRESET (connection reset by peer), which affect subsequent processing
err_soft; // stores soft errors such as EPROTO (protocol error), which affect subsequent processing
unsigned short ack_backlog; // number of connections currently waiting in the accept queue
unsigned short max_ack_backlog; // maximum length of the accept queue (the backlog passed to listen())
__u32 priority; /* packet queueing priority, also used to set the TOS field. Packets with a higher priority may be processed first, depending on the device's queueing discipline. See SO_PRIORITY */
unsigned short type; // for example, SOCK_STREAM, SOCK_DGRAM, or SOCK_RAW
unsigned char localroute; // route locally only if set; set by the SO_DONTROUTE option
unsigned char protocol; // the protocol argument of socket(int family, int type, int protocol)
struct ucred peercred; // can be skipped in TCP analysis
int rcvlowat; /* the minimum number of bytes that must be in the buffer before the socket layer passes data to the protocol when sending (SO_SNDLOWAT)
or to the user when receiving (SO_RCVLOWAT). Both default to 1 byte; SO_SNDLOWAT cannot be changed on Linux, and SO_RCVLOWAT can be changed only since Linux 2.4 */
long rcvtimeo; // receive timeout; an error is returned when it expires
long sndtimeo; // send timeout; an error is returned when it expires

union { // private INET-related data
struct inet_opt af_inet;
...
} protinfo;

/* The timer is used for SO_KEEPALIVE (i.e., sending occasional keepalive probes to a remote site; by default, set to 2 hours).
stamp is simply the time at which the last packet was received. */
struct timer_list timer;
struct timeval stamp;
struct socket *socket; // the corresponding socket
void *user_data; // private data; can be ignored in TCP analysis

/* state_change is called whenever the state of the socket changes. Similarly, data_ready is called
when data has been received, write_space when the free memory available for writing has increased, error_report
when an error occurs, backlog_rcv to process the SKBs that were put on the backlog while the socket was locked, and destruct when this sock is released */
void (*state_change)(struct sock *sk);
void (*data_ready)(struct sock *sk, int bytes);
void (*write_space)(struct sock *sk);
void (*error_report)(struct sock *sk);
int (*backlog_rcv)(struct sock *sk, struct sk_buff *skb);
void (*destruct)(struct sock *sk);
};

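The forward_alloc quota scheme described in the memory comment inside struct sock can be modelled in user space. This is a minimal sketch, not kernel code: acct_sock, sk_charge(), sk_uncharge(), mem_schedule(), mem_reclaim(), MEM_QUANTUM (assumed to be one 4096-byte page) and the single global limit are hypothetical. mem_schedule() and mem_reclaim() only mirror the roles of tcp_mem_schedule() and tcp_mem_reclaim(), and the per-sock limits sysctl_tcp_rmem[] and sysctl_tcp_wmem[] are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define MEM_QUANTUM 4096              /* assumed unit of pre-allocation */

/* Hypothetical global pool: a stand-in for the TCP-wide total that
 * sysctl_tcp_mem[] limits. Only one hard limit is modelled. */
static long total_allocated = 0;      /* bytes currently granted to socks */
static const long total_limit = 8L * MEM_QUANTUM;

/* Hypothetical subset of struct sock's accounting fields. */
struct acct_sock {
    int forward_alloc;                /* pre-allocated bytes not yet used */
    int rmem_alloc;                   /* bytes charged to the receive queue */
    int wmem_queued;                  /* bytes charged to the send queue */
};

/* Plays the role of tcp_mem_schedule(): grant whole quanta covering 'size'. */
static bool mem_schedule(struct acct_sock *sk, int size)
{
    int amt = (size + MEM_QUANTUM - 1) / MEM_QUANTUM * MEM_QUANTUM;
    if (total_allocated + amt > total_limit)
        return false;                 /* global limit reached: refuse */
    total_allocated += amt;
    sk->forward_alloc += amt;
    return true;
}

/* Charge 'size' bytes to a queue counter (rmem_alloc or wmem_queued),
 * topping up forward_alloc first if it is too small. */
bool sk_charge(struct acct_sock *sk, int *counter, int size)
{
    if (sk->forward_alloc < size && !mem_schedule(sk, size))
        return false;
    sk->forward_alloc -= size;
    *counter += size;
    return true;
}

/* The skb was read or transmitted and then freed: return its bytes to the
 * sock's quota (on the receive side the text names tcp_rfree() for this). */
void sk_uncharge(struct acct_sock *sk, int *counter, int size)
{
    assert(*counter >= size);
    *counter -= size;
    sk->forward_alloc += size;
}

/* Plays the role of tcp_mem_reclaim(): hand whole unused quanta back. */
void mem_reclaim(struct acct_sock *sk)
{
    int whole = sk->forward_alloc / MEM_QUANTUM * MEM_QUANTUM;
    total_allocated -= whole;
    sk->forward_alloc -= whole;
}
```

Working in whole quanta means most charges and uncharges only touch the sock's own forward_alloc; the shared global counter changes only when a sock needs more quota or gives some back.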