Linux/UNIX System Programming Manual: Socket Chapter Reading Notes

Source: Internet
Author: User
Tags ack epoll

Socket Chapter Reading notes

The Linux/UNIX System Programming Manual is highly recommended; it is often called a classic that goes beyond APUE (Advanced Programming in the UNIX Environment).

Meaning of backlog
#include <sys/socket.h>
int listen(int sockfd, int backlog);

The backlog parameter limits the number of pending connections (connections that have not yet been accepted). Within that limit, connect() succeeds immediately.
The upper limit on Linux is 128, defined in
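A minimal sketch of the point above, written in Python because its socket module wraps the same system calls: the client's connect() completes even though the server never calls accept(), because the kernel queues the pending connection. The loopback address, the backlog of 5 and the function name are assumptions made for illustration.

```python
import socket

def connect_before_accept(backlog=5):
    """Return True if connect() completes while the server never calls accept()."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))        # port 0: let the kernel pick a free port
    server.listen(backlog)               # the kernel now queues pending connections
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(2.0)
    try:
        client.connect(server.getsockname())   # accept() has not been called
        connected = True
    except OSError:
        connected = False
    finally:
        client.close()
        server.close()
    return connected

if __name__ == "__main__":
    print(connect_before_accept())
```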

Connected UDP sockets

A UDP socket can also call connect(); the result is called a connected socket, and the kernel records the address of the socket's peer.
Once a UDP socket is connected:

1. Datagrams can be sent with write() or send() and go to the peer automatically. As with sendto(), each write() sends a separate datagram.
2. Only datagrams sent by the peer socket can be read on this socket.
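The two properties above can be sketched as follows. This is an illustration on the loopback interface, not a definitive test: the three sockets (a, b, stranger) and the function name are hypothetical, and the filtering of the stranger's datagram is the behavior expected on Linux.

```python
import socket

def connected_udp_demo():
    def udp():
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("127.0.0.1", 0))
        s.settimeout(1.0)
        return s

    a, b, stranger = udp(), udp(), udp()
    try:
        a.connect(b.getsockname())        # the kernel records b as a's peer
        a.send(b"ping")                   # no destination address needed
        request, sender = b.recvfrom(100)
        # A datagram from a socket that is not the peer should never reach a.
        stranger.sendto(b"noise", a.getsockname())
        b.sendto(b"pong", a.getsockname())
        reply = a.recv(100)               # only the peer's datagram is readable
        return request, sender == a.getsockname(), reply
    finally:
        for s in (a, b, stranger):
            s.close()

if __name__ == "__main__":
    print(connected_udp_demo())
```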

Port 0 (ephemeral ports)

If bind() is not called, the kernel assigns the TCP or UDP socket an ephemeral port, as if it had been bound to port 0 (a TCP server may also skip bind()).
The range of ephemeral ports can be changed through /proc/sys/net/ipv4/ip_local_port_range. For high-concurrency tests, the test machine may need a wider range to support enough simultaneous connections.
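A small sketch of the ephemeral-port assignment described above: the sender never calls bind(), and the kernel gives it a port on the first send. The function name and the loopback receiver are assumptions for illustration.

```python
import socket

def port_assigned_on_first_send():
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)   # bind() is never called
    try:
        before = sender.getsockname()[1]              # 0: no port assigned yet
        sender.sendto(b"x", receiver.getsockname())   # the kernel picks an ephemeral port here
        after = sender.getsockname()[1]
        return before, after
    finally:
        sender.close()
        receiver.close()

if __name__ == "__main__":
    print(port_assigned_on_first_send())
```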

Avoiding IP fragmentation with UDP

UDP applications usually take a conservative approach to avoid IP fragmentation: they keep the transmitted IP datagram smaller than the IPv4 minimum reassembly buffer size of 576 bytes. Of those 576 bytes, 8 are used by the UDP header and at least 20 by the IP header, leaving 548 bytes for the UDP payload. In practice, 512 bytes is generally chosen as the payload limit.
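The arithmetic above, written out as a short sketch. The constant and function names are made up for illustration; the optional argument only shows how a longer IP header (one carrying options, up to 60 bytes) shrinks the payload further.

```python
IPV4_MIN_REASSEMBLY = 576   # bytes every IPv4 host must be able to reassemble
IP_HEADER_MIN = 20          # IPv4 header without options
UDP_HEADER = 8

def max_safe_udp_payload(ip_header_len=IP_HEADER_MIN):
    """Largest UDP payload that keeps the whole IP datagram within 576 bytes."""
    return IPV4_MIN_REASSEMBLY - ip_header_len - UDP_HEADER

if __name__ == "__main__":
    print(max_safe_udp_payload())     # header without options
    print(max_safe_udp_payload(60))   # header with the maximum amount of options
```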

Differences between shutdown() and close()

Calling shutdown() with SHUT_WR produces what is called a half-closed socket: once the peer has read all outstanding data it sees end-of-file, which tells it that the local write end is closed.
One important difference between shutdown() and close(): shutdown() closes the socket channel regardless of whether other file descriptors also refer to the socket.
If sockfd refers to a connected stream socket and the following calls are made, the connection remains open, and I/O can still be performed on it through the file descriptor fd2.

fd2 = dup(sockfd);
close(sockfd);

However, after the following calls, both directions of the connection are closed and I/O can no longer be performed through fd2.

fd2 = dup(sockfd);
shutdown(sockfd, SHUT_RDWR);

The same applies when the socket file descriptor is duplicated by fork(): if one process calls shutdown() with SHUT_RDWR on its copy of the descriptor after the fork(), the other process can no longer perform I/O on its descriptor either.

Note that shutdown() does not close the file descriptor; close() must still be called to do that.
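The difference between the two call sequences can be sketched in Python, whose socket objects expose dup(), close() and shutdown() directly. As an assumption for illustration, this uses socket.socketpair() (a connected UNIX domain stream socket) rather than a TCP connection; the function names are hypothetical.

```python
import socket

def dup_then_close():
    a, b = socket.socketpair()
    a2 = a.dup()
    a.close()                      # only this descriptor goes away
    a2.sendall(b"still open")      # the connection is still usable via the duplicate
    data = b.recv(100)
    a2.close()
    b.close()
    return data

def dup_then_shutdown():
    a, b = socket.socketpair()
    a2 = a.dup()
    a.shutdown(socket.SHUT_RDWR)   # closes the channel for every descriptor
    try:
        a2.sendall(b"x")
        sent = True
    except OSError:                # typically BrokenPipeError (EPIPE)
        sent = False
    peer_sees_eof = b.recv(100) == b""
    for s in (a, a2, b):           # shutdown() did not close the descriptors
        s.close()
    return sent, peer_sees_eof

if __name__ == "__main__":
    print(dup_then_close(), dup_then_shutdown())
```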

TCP_CORK socket option

An HTTP server that sends files with sendfile() can improve bandwidth utilization by setting TCP_CORK, which buffers the HTTP headers and the file data so that they are combined into a single segment.

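int optval = 1;
setsockopt(sockfd, IPPROTO_TCP, TCP_CORK, &optval, sizeof(optval));
write(sockfd, ...);      // write http headers
sendfile(sockfd, ...);   // send data
optval = 0;
setsockopt(sockfd, IPPROTO_TCP, TCP_CORK, &optval, sizeof(optval));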
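A runnable sketch of the cork, write headers, sendfile, uncork sequence in Python. It assumes Linux (socket.TCP_CORK is only defined there) and Python 3.8 or later; the header text, the temporary file and the function names are invented for the example, and the test only checks that the bytes arrive intact, not how they were segmented.

```python
import os
import socket
import tempfile

HEADER = b"HTTP/1.1 200 OK\r\nContent-Length: 18\r\n\r\n"
BODY = b"<html>hello</html>"

def send_corked(conn, header, path):
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)      # cork: hold partial segments
    try:
        conn.sendall(header)                                     # write HTTP headers
        with open(path, "rb") as f:
            conn.sendfile(f)                                     # send file data
    finally:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)  # uncork: flush what is buffered

def cork_demo():
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(BODY)
    listener = socket.create_server(("127.0.0.1", 0))
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    try:
        send_corked(conn, HEADER, tmp.name)
        conn.close()
        chunks = []
        while True:                       # read until the server's end-of-file
            data = client.recv(4096)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks)
    finally:
        client.close()
        listener.close()
        os.unlink(tmp.name)

if __name__ == "__main__":
    print(cork_demo())
```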

TIME_WAIT state

The TIME_WAIT state occurs only on the side that performs the active close; the transition to the CLOSED state takes 2MSL (twice the maximum segment lifetime). The MSL is the assumed maximum time an IP packet can survive in the network before its TTL limit is exhausted.
On Linux, TIME_WAIT lasts 60 seconds.
Purposes:

1. Implement reliable connection termination

If the final ACK is lost, the peer retransmits its FIN; waiting for 2MSL lets the active closer resend the final ACK. If the active closer's endpoint no longer existed, TCP would answer with an RST segment instead, and the peer would interpret the RST as an error.

2. Let old duplicate segments expire in the network so that they are not accepted by a new connection

While a TCP endpoint is in the TIME_WAIT state, a new connection cannot be established with the same IP address and port number. The SO_REUSEADDR option can be used to avoid EADDRINUSE errors while still allowing the TIME_WAIT state to provide its reliability guarantees.
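A minimal sketch of the usual server-restart case, assuming Linux behavior: the server closes a connection first (so its end enters TIME_WAIT), and a new listening socket can still bind the same port because SO_REUSEADDR is set before bind(). The function names are hypothetical.

```python
import socket

def make_listener(port=0):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)   # must be set before bind()
    s.bind(("127.0.0.1", port))
    s.listen(5)
    return s

def restart_on_same_port():
    first = make_listener()
    port = first.getsockname()[1]
    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = first.accept()
    conn.close()          # active close by the server: its end heads for TIME_WAIT
    client.recv(1)        # returns b"" once the server's FIN has arrived
    client.close()
    first.close()
    second = make_listener(port)   # the "restarted" server binds the same port
    try:
        return second.getsockname()[1] == port
    finally:
        second.close()

if __name__ == "__main__":
    print(restart_on_same_port())
```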

Out-of-band data

send() and recv() take the MSG_OOB flag to send and receive out-of-band data. When out-of-band data arrives on a socket, the kernel generates a SIGURG signal for the socket owner (usually the process that uses the socket).

Out-of-band data is a feature of TCP sockets that lets the sender mark transmitted data as high priority. TCP identifies urgent (out-of-band) data through the URG flag and sets the urgent pointer to point at it, but TCP does not indicate the length of the urgent data, so urgent data is considered to consist of a single byte. telnet, ftp and similar programs use this feature to abort previously transmitted commands.

s.send(b"hello", socket.MSG_OOB)
c.recv(100, socket.MSG_OOB)   # b"o"
c.recv(100)                   # b"hell"
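The snippet above assumes an existing connection between s and c. A self-contained sketch over loopback TCP might look like the following; waiting in select() on the exceptional-condition set before reading the urgent byte is an added precaution, and the function name is hypothetical.

```python
import select
import socket

def oob_demo():
    listener = socket.create_server(("127.0.0.1", 0))
    s = socket.create_connection(listener.getsockname())
    c, _ = listener.accept()
    try:
        s.send(b"hello", socket.MSG_OOB)                  # only the last byte is urgent
        _, _, urgent = select.select([], [], [c], 2.0)    # wait until OOB data is reported
        oob = c.recv(100, socket.MSG_OOB)                 # the single urgent byte
        normal = c.recv(100)                              # the ordinary data before it
        return bool(urgent), oob, normal
    finally:
        for sock in (s, c, listener):
            sock.close()

if __name__ == "__main__":
    print(oob_demo())
```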

Scatter-gather I/O

readv() and writev() can read and write multiple non-contiguous buffers in a single system call.

#include <sys/uio.h>
ssize_t readv(int fd, const struct iovec *iov, int iovcnt);
ssize_t writev(int fd, const struct iovec *iov, int iovcnt);

struct iovec {
    void  *iov_base;   // buffer
    size_t iov_len;    // size of buffer
};
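Python exposes the same calls as os.readv() and os.writev(), which makes for a short sketch: two separate buffers are written with one call and read back into two separate buffers with one call. The pipe and the buffer contents are assumptions for illustration.

```python
import os

def scatter_gather_demo():
    r, w = os.pipe()
    try:
        written = os.writev(w, [b"HEAD", b"payload"])   # gather: two buffers, one call
        head, body = bytearray(4), bytearray(7)
        read = os.readv(r, [head, body])                # scatter: buffers are filled in order
        return written, read, bytes(head), bytes(body)
    finally:
        os.close(r)
        os.close(w)

if __name__ == "__main__":
    print(scatter_gather_demo())
```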

I/O multiplexing, signal-driven I/O, and epoll

Level-triggered notification:

A file descriptor is considered ready if an I/O system call on it can be performed without blocking. Because the notification is repeated for as long as the descriptor stays ready, there is no need to read as much data as possible each time, which makes it easy to balance reads across multiple descriptors.

Edge-triggered notification:

A notification is triggered if there has been new I/O activity on the file descriptor since the last time its state was checked. Because only new activity is reported, as much data as possible should be read on each notification; otherwise the remaining data may go unnoticed until more arrives. Draining one busy descriptor in this way can, however, starve the other descriptors.

I/O model            Level-triggered   Edge-triggered
select(), poll()     Yes               No
Signal-driven I/O    No                Yes
epoll                Yes               Yes
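The difference between the two notification styles can be observed with epoll, which supports both. In this sketch (Linux only; the function name is hypothetical) one burst of data is sent and deliberately never read, and the descriptor is then polled three times.

```python
import select
import socket

def count_notifications(edge_triggered, polls=3):
    a, b = socket.socketpair()
    ep = select.epoll()
    try:
        mask = select.EPOLLIN | (select.EPOLLET if edge_triggered else 0)
        ep.register(b.fileno(), mask)
        a.send(b"data")     # one burst of input that is never read
        # Level-triggered: reported on every poll while data remains unread.
        # Edge-triggered: reported once, for the new activity only.
        return sum(len(ep.poll(0.1)) for _ in range(polls))
    finally:
        ep.close()
        a.close()
        b.close()

if __name__ == "__main__":
    print(count_notifications(False), count_notifications(True))
```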

select
#include <sys/time.h>
#include <sys/select.h>
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
Returns the number of ready descriptors (across the three sets), 0 on timeout, or -1 on error.

exceptfds generally applies in only two situations:

1. A state change occurs on a pseudoterminal slave connected to a master that is in packet mode
2. Out-of-band data is received on a TCP socket

nfds must be the highest file descriptor number in the three sets plus 1. This makes select() more efficient, because the kernel does not have to check whether descriptors above this value belong to any of the sets.
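A short sketch of the select() return behavior using Python's select.select(), which takes the three sets and a timeout and computes nfds internally. The socket pair and function name are assumptions for illustration; the second and third calls show that an unread descriptor keeps being reported.

```python
import select
import socket

def select_demo():
    a, b = socket.socketpair()
    try:
        idle, _, _ = select.select([b], [], [], 0)       # timeout 0: just poll, nothing to read
        a.send(b"x")
        first, _, _ = select.select([b], [], [], 1.0)    # b is now readable
        second, _, _ = select.select([b], [], [], 1.0)   # still readable: data was not consumed
        return len(idle), len(first), len(second)
    finally:
        a.close()
        b.close()

if __name__ == "__main__":
    print(select_demo())
```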

epoll

max_user_watches: the total number of file descriptors that each user can register across epoll instances (/proc/sys/fs/epoll/max_user_watches; the default value is calculated from the system's available memory).

For further study, see https://github.com/cloudwu/socket-server and the Tornado source code, both of which use level-triggered notification. For edge-triggered notification, study nginx or Golang; nginx is the more suitable of the two.

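Typical asynchronous (nonblocking) I/O reads and writes:

// In the write path:
for (;;) {
    ssize_t n = write(fd, buffer, sz);
    if (n < 0) {
        switch (errno) {
        case EINTR:
            continue;     // interrupted by a signal: retry
        case EAGAIN:
            return -1;    // would block: try again later
        }
        // other errors: handle as in the read path
    }
    break;
}

// In the read path:
ssize_t n = read(fd, buffer, sz);
if (n < 0) {
    switch (errno) {
    case EINTR:
        break;
    case EAGAIN:
        fprintf(stderr, "socket-server: EAGAIN capture.\n");
        break;
    default:
        // close when error
        return SOCKET_ERROR;
    }
    return -1;
}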
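The same read-side pattern can be sketched in Python, where EAGAIN surfaces as BlockingIOError and EINTR as InterruptedError. The drain() helper is hypothetical; note that Python 3.5 and later normally retries interrupted calls on its own, so the EINTR branch is kept only to mirror the C code.

```python
import socket

def drain(sock, chunk=4096):
    """Read everything currently available from a nonblocking socket."""
    parts = []
    while True:
        try:
            data = sock.recv(chunk)
        except InterruptedError:     # EINTR: retry
            continue
        except BlockingIOError:      # EAGAIN/EWOULDBLOCK: nothing more for now
            break
        if not data:                 # end-of-file: the peer closed the connection
            break
        parts.append(data)
    return b"".join(parts)

if __name__ == "__main__":
    a, b = socket.socketpair()
    b.setblocking(False)
    a.sendall(b"abc")
    print(drain(b))
    a.close()
    b.close()
```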

Configuration parameters
    1. System limits: /etc/sysctl.conf
      Shared by the Gopush group
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
kernel.msgmnb = 1048576
kernel.msgmax = 1048576
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
fs.file-max = 1048576
kernel.pid_max = 1048576
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = -
net.ipv4.tcp_keepalive_time = -
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_max_tw_buckets = the
net.ipv4.tcp_wmem = 4096 4096 1677216
net.ipv4.tcp_rmem = 4096 4096 1677216
net.ipv4.tcp_mem = 94500000 91500000 92700000
net.ipv4.tcp_max_orphans = 3276800
net.core.netdev_max_backlog = 32768
net.core.somaxconn = 32768
net.core.wmem_default = 8388608
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
vm.overcommit_memory = 1
    2. Login user limits: /etc/security/limits.conf
      soft is the warning threshold and hard is the strict limit (from VBird's Linux guide)
* soft nofile 150000
* hard nofile 150000

Alternatively, the limit can be set directly for the current shell with ulimit -n 10000.
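From inside a process, the soft and hard nofile limits can be inspected and the soft limit raised up to the hard limit. A sketch using Python's resource module follows; the helper name is hypothetical, and the target of 150000 simply mirrors the value in the notes above.

```python
import resource

def raise_nofile_soft_limit(target):
    """Raise the soft RLIMIT_NOFILE toward target, never beyond the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard                      # already unlimited
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        # An unprivileged process may raise its soft limit up to the hard limit.
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)

if __name__ == "__main__":
    print(raise_nofile_soft_limit(150000))
```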

Copyright notice: This is the blogger's original article and may not be reproduced without the blogger's permission.
