Linux IPv4 protocol implementation

Source: Internet
Author: User
Name

ip - Linux IPv4 protocol implementation

Synopsis

#include <sys/socket.h>
#include <netinet/in.h>

tcp_socket = socket(PF_INET, SOCK_STREAM, 0);
raw_socket = socket(PF_INET, SOCK_RAW, protocol);
udp_socket = socket(PF_INET, SOCK_DGRAM, protocol);

Description

Linux implements the Internet Protocol, version 4, described in RFC 791 and RFC 1122. ip contains a level 2 multicasting implementation conforming to RFC 1112. It also contains an IP router including a packet filter.

The programmer's interface is compatible with BSD sockets. For more information on sockets, see socket(7).

An IP socket is created by calling the socket(2) function as socket(PF_INET, socket_type, protocol). Valid socket types are SOCK_STREAM to open a tcp(7) socket, SOCK_DGRAM to open a udp(7) socket, or SOCK_RAW to open a raw(7) socket that accesses the IP protocol directly. protocol is the IP protocol in the IP header to be received or sent. The only valid values for protocol are 0 and IPPROTO_TCP for TCP sockets, and 0 and IPPROTO_UDP for UDP sockets. For SOCK_RAW you may specify a valid IANA IP protocol number as defined in the RFC 1700 assigned numbers.

When a process wants to receive new incoming packets or connections, it should bind a socket to a local interface address using bind(2). Only one IP socket may be bound to any given local (address, port) pair. When INADDR_ANY is specified in the bind call, the socket is bound to all local interfaces. When listen(2) or connect(2) is called on an unbound socket, the socket is automatically bound to a random free port with the local address set to INADDR_ANY.

A TCP local socket address that has been bound is unavailable for some time after closing, unless the SO_REUSEADDR flag has been set. Care should be taken when using this flag, as it makes TCP less reliable.
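The binding rules above can be sketched in C as follows. This is a minimal illustration, assuming a Linux system with the usual glibc headers; the helper names bind_any and local_port are hypothetical and not part of any library. Passing port 0 lets the kernel choose a free port, which can then be read back with getsockname(2).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket, set SO_REUSEADDR, and bind it
 * to INADDR_ANY on the given port (host byte order; 0 lets the kernel pick).
 * Returns the socket descriptor, or -1 on failure. */
int bind_any(unsigned short port)
{
    int on = 1;
    struct sockaddr_in addr;
    int fd = socket(PF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* all local interfaces */

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: return the bound local port in host byte order,
 * or -1 on failure. */
int local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}
```

A caller would typically follow bind_any with listen(2); SO_REUSEADDR is set here only to show where the flag goes, and the caveat about reliability still applies.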

Address format

An IP socket address is defined as a combination of an IP interface address and a port number. The basic IP protocol does not supply port numbers; they are implemented by higher-level protocols such as udp(7) and tcp(7). On raw sockets, sin_port is set to the IP protocol.

    struct sockaddr_in {
        sa_family_t    sin_family; /* address family: AF_INET */
        u_int16_t      sin_port;   /* port in network byte order */
        struct in_addr sin_addr;   /* Internet address */
    };

    /* Internet address. */
    struct in_addr {
        u_int32_t      s_addr;     /* address in network byte order */
    };

sin_family is always set to AF_INET. This is required; in Linux 2.2 most networking functions return EINVAL when this setting is missing. sin_port contains the port in network byte order. Port numbers below 1024 are called reserved ports. Only processes with an effective user ID of 0 or the CAP_NET_BIND_SERVICE capability may bind(2) to these ports. Note that the raw IPv4 protocol as such has no concept of a port; ports are implemented only by higher protocols such as tcp(7) and udp(7).

sin_addr is the IP host address. The s_addr member of struct in_addr contains the host interface address in network byte order. in_addr should be accessed only through the inet_aton(3), inet_addr(3), and inet_makeaddr(3) library functions, or directly through the name resolver (see gethostbyname(3)). IPv4 addresses are divided into unicast, broadcast, and multicast addresses. A unicast address specifies a single interface of a host, a broadcast address specifies all hosts on a network, and a multicast address addresses all hosts in a multicast group. Datagrams to broadcast addresses can be sent or received only when the SO_BROADCAST socket flag is set. In the current implementation, connection-oriented sockets may use only unicast addresses.

Note that the address and the port are always stored in network byte order. In particular, this means that you need to call htons(3) on the number that is assigned to a port. All address and port manipulation functions in the standard library work in network byte order.

There are several special addresses: INADDR_LOOPBACK (127.0.0.1) always refers to the local host via the loopback device; INADDR_ANY (0.0.0.0) means any address for binding; INADDR_BROADCAST (255.255.255.255) means any host and has the same effect on bind as INADDR_ANY.
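As a small sketch of the byte-order rules above, the following C fragment fills a sockaddr_in from a dotted-quad string and a host-order port. The helper name make_sockaddr is hypothetical; inet_aton(3) and htons(3) are the library calls named in the text.

```c
#define _GNU_SOURCE            /* makes sure inet_aton() is declared */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill *out from a dotted-quad string and a port given
 * in host byte order. Returns 0 on success, -1 if the string is not a valid
 * IPv4 address. */
int make_sockaddr(const char *dotted, unsigned short port,
                  struct sockaddr_in *out)
{
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;      /* always required */
    out->sin_port = htons(port);    /* host to network byte order */
    if (inet_aton(dotted, &out->sin_addr) == 0)
        return -1;                  /* inet_aton returns 0 on bad input */
    return 0;
}
```

For example, make_sockaddr("127.0.0.1", 8080, &sa) yields an address whose s_addr equals htonl(INADDR_LOOPBACK).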

Socket options

IP supports some protocol-specific socket options that can be set with setsockopt(2) and read with getsockopt(2). The socket option level for IP is SOL_IP.

IP_OPTIONS
Sets or gets the IP options to be sent with every packet from this socket. The arguments are a pointer to a memory buffer containing the options and the option length. The setsockopt(2) call sets the IP options associated with a socket. The maximum option size for IPv4 is 40 bytes. See RFC 791 for the allowed options. When the initial connection request packet for a SOCK_STREAM socket contains IP options, the IP options are set automatically to the options from the initial packet, with routing headers reversed. Incoming packets are not allowed to change options after the connection is established. Processing of all incoming source routing options is disabled by default and can be enabled with the accept_source_route sysctl. Other options, such as timestamps, are still handled. For datagram sockets, IP options can be set only by the local user. Calling getsockopt(2) with IP_OPTIONS puts the current IP options used for sending into the supplied buffer.

 

IP_PKTINFO
Passes an IP_PKTINFO ancillary message that contains a pktinfo structure supplying some information about the incoming packet. This works only for datagram-oriented sockets.

    struct in_pktinfo {
        unsigned int   ipi_ifindex;  /* interface index */
        struct in_addr ipi_spec_dst; /* routing destination address */
        struct in_addr ipi_addr;     /* header destination address */
    };

ipi_ifindex is the unique index of the interface the packet was received on. ipi_spec_dst is the destination address of the routing table entry, and ipi_addr is the destination address in the packet header. If IP_PKTINFO is passed to sendmsg(2), the outgoing packet is sent over the interface specified in ipi_ifindex with the destination address set to ipi_spec_dst.
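The ancillary message described above is read with recvmsg(2) and the CMSG_* macros. The sketch below assumes glibc headers on Linux; the helper names enable_pktinfo and find_pktinfo are hypothetical. It uses IPPROTO_IP as the option level, which has the same value as SOL_IP on Linux.

```c
#define _GNU_SOURCE            /* exposes struct in_pktinfo in glibc */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Ask the kernel to attach an IP_PKTINFO message to received datagrams. */
int enable_pktinfo(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
}

/* Hypothetical helper: walk the control messages of a msghdr that was
 * filled by recvmsg(2) and copy the first in_pktinfo into *out.
 * Returns 0 if one was found, -1 otherwise. */
int find_pktinfo(struct msghdr *msg, struct in_pktinfo *out)
{
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            memcpy(out, CMSG_DATA(c), sizeof(*out));
            return 0;
        }
    }
    return -1;
}
```

In use, the caller points msg_control at a buffer of at least CMSG_SPACE(sizeof(struct in_pktinfo)) bytes before calling recvmsg(2), then passes the same msghdr to find_pktinfo.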

 

IP_RECVTOS
If enabled, the IP_TOS ancillary message is passed with incoming packets. It contains a byte that specifies the Type of Service/Precedence field of the packet header. Expects a boolean integer flag.

IP_RECVTTL
When this flag is set, an IP_RECVTTL control message is passed with the time-to-live field of the received packet as a byte. Not supported for SOCK_STREAM sockets.

IP_RECVOPTS
Passes all incoming IP options to the user in an IP_OPTIONS control message. The routing header and other options are already filled in for the local host. Not supported for SOCK_STREAM sockets.

IP_RETOPTS
Identical to IP_RECVOPTS, but returns raw unprocessed options, with timestamp and route record options not filled in for this hop.

 

IP_TOS
Sets or receives the Type-of-Service (TOS) field that is sent with every IP packet originating from this socket. It is used to prioritize packets on the network. TOS is a byte. Some standard TOS flags are defined: IPTOS_LOWDELAY to minimize delay for interactive traffic, IPTOS_THROUGHPUT to optimize throughput, IPTOS_RELIABILITY to optimize for reliability, and IPTOS_MINCOST for "filler data" where slow transmission does not matter. At most one of these TOS values can be specified. Other bits are invalid and should be cleared. By default, Linux sends IPTOS_LOWDELAY datagrams first, but the exact behavior depends on the configured queueing discipline. Some high-priority levels may require an effective user ID of 0 or the CAP_NET_ADMIN capability. The priority can also be set in a protocol-independent way with the (SOL_SOCKET, SO_PRIORITY) socket option (see socket(7)).

IP_TTL
Sets or retrieves the current time-to-live field that is sent in every packet sent from this socket.
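Integer options such as IP_TTL and IP_TOS follow the same set-then-read pattern. The sketch below is an illustration under the assumption of glibc headers on Linux; set_and_get_ip_opt is a hypothetical helper, and IPPROTO_IP is used as the level because it has the same value as SOL_IP.

```c
#include <netinet/in.h>
#include <netinet/ip.h>        /* IPTOS_* constants */
#include <sys/socket.h>

/* Hypothetical helper: set an integer IP-level option, then read it back.
 * Returns the value reported by the kernel, or -1 on failure. */
int set_and_get_ip_opt(int fd, int optname, int value)
{
    int result = 0;
    socklen_t len = sizeof(result);

    if (setsockopt(fd, IPPROTO_IP, optname, &value, sizeof(value)) < 0)
        return -1;
    if (getsockopt(fd, IPPROTO_IP, optname, &result, &len) < 0)
        return -1;
    return result;
}
```

For instance, set_and_get_ip_opt(fd, IP_TOS, IPTOS_LOWDELAY) on a UDP socket marks outgoing datagrams as low-delay traffic, and set_and_get_ip_opt(fd, IP_TTL, 32) limits them to 32 hops.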

 

IP_HDRINCL
If enabled, the user supplies an IP header in front of the user data. Only valid for SOCK_RAW sockets. See raw(7) for more information. When this flag is enabled, the values set by IP_OPTIONS and IP_TOS are ignored.

 

IP_RECVERR
Enables extended reliable error message passing. When enabled on a datagram socket, all generated errors are queued in a per-socket error queue. When the user receives an error from a socket operation, the errors can be retrieved by calling recvmsg(2) with the MSG_ERRQUEUE flag set. The sock_extended_err structure describing the error is passed in an ancillary message with the type IP_RECVERR and the level SOL_IP. This is useful for reliable error handling on unconnected sockets. The received data portion of the error queue contains the error packet.

IP uses the sock_extended_err structure as follows: ee_origin is set to SO_EE_ORIGIN_ICMP for errors received as an ICMP packet, or SO_EE_ORIGIN_LOCAL for locally generated errors. ee_type and ee_code are set from the type and code fields of the ICMP header. ee_info contains the discovered MTU for EMSGSIZE errors. ee_data is currently not used. When the error originated from the network, all IP options (IP_OPTIONS, IP_TTL, etc.) enabled on the socket and contained in the error packet are passed as control messages. The payload of the packet that caused the error is returned as normal data.

On SOCK_STREAM sockets, IP_RECVERR has slightly different semantics. Instead of saving the errors for the next timeout, it passes all incoming errors immediately to the user. This may be useful for very short-lived TCP connections that need fast error handling. Use this option with care: it makes TCP unreliable by not allowing it to recover properly from routing shifts and other normal conditions, and it breaks the protocol specification. Note that TCP has no error queue; MSG_ERRQUEUE is illegal on SOCK_STREAM sockets. All errors are therefore returned by the socket function return value or by SO_ERROR only.

For raw sockets, IP_RECVERR enables passing of all received ICMP errors to the application; otherwise errors are reported only on connected sockets.

It sets or retrieves an integer boolean flag. IP_RECVERR defaults to off.

 

IP_PMTU_DISCOVER
Sets or receives the path MTU discovery setting for a socket. When enabled, Linux performs path MTU discovery as defined in RFC 1191 on this socket. The don't-fragment flag is set on all outgoing datagrams. The system-wide default is controlled by the ip_no_pmtu_disc sysctl for SOCK_STREAM sockets, and path MTU discovery is disabled on all others. For non-SOCK_STREAM sockets it is the user's responsibility to packetize the data in MTU-sized chunks and to retransmit if necessary. If this flag is set, the kernel rejects packets that are bigger than the known path MTU (with EMSGSIZE).

    Path MTU discovery flag   Meaning
    IP_PMTUDISC_WANT          Use per-route settings.
    IP_PMTUDISC_DONT          Never do path MTU discovery.
    IP_PMTUDISC_DO            Always do path MTU discovery.

When path MTU discovery is enabled, the kernel automatically keeps track of the path MTU per destination host. When the socket is connected to a specific peer with connect(2), the currently known path MTU can be retrieved conveniently with the IP_MTU socket option (for example, after an EMSGSIZE error has occurred). It may change over time. For connectionless sockets with many destinations, the new MTU for a given destination can also be accessed through the error queue (see IP_RECVERR). A new error is queued for every incoming MTU update.

While MTU discovery is in progress, initial packets from datagram sockets may be dropped. Applications using UDP should be aware of this and account for it in their packet retransmission strategy.

To bootstrap the path MTU discovery process on unconnected sockets, it is possible to start with a big datagram size (up to 64 KB minus headers) and let it shrink through updates of the path MTU.

To get an initial estimate of the path MTU, connect a datagram socket to the destination address using connect(2) and retrieve the MTU by calling getsockopt(2) with the IP_MTU option.
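The discovery mode and the IP_MTU query described above can be sketched as follows, assuming glibc headers on Linux. The helper names set_pmtu_mode, get_pmtu_mode, and path_mtu are hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Select a path MTU discovery mode (one of the IP_PMTUDISC_* values). */
int set_pmtu_mode(int fd, int mode)
{
    return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof(mode));
}

/* Return the current discovery mode, or -1 on failure. */
int get_pmtu_mode(int fd)
{
    int mode = 0;
    socklen_t len = sizeof(mode);

    if (getsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, &len) < 0)
        return -1;
    return mode;
}

/* Return the currently known path MTU of a connected socket, or -1 on
 * failure (for example, when the socket has not been connected). */
int path_mtu(int fd)
{
    int mtu = 0;
    socklen_t len = sizeof(mtu);

    if (getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) < 0)
        return -1;
    return mtu;
}
```

A typical sequence would be set_pmtu_mode(fd, IP_PMTUDISC_DO), then connect(2) to the destination, then path_mtu(fd) to obtain the initial estimate.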

 

IP_MTU
Retrieves the currently known path MTU of the current socket. Only valid when the socket has been connected. Returns an integer. Only valid as a getsockopt(2).

IP_ROUTER_ALERT
Passes all to-be-forwarded packets with the IP Router Alert option set to this socket. Only valid for raw sockets. This is useful, for instance, for user-space RSVP daemons. The tapped packets are not forwarded by the kernel; it is the user's responsibility to send them out again. Socket binding is ignored; such packets are filtered only by protocol. Expects an integer flag.

IP_MULTICAST_TTL
Sets or reads the time-to-live value of outgoing multicast packets for this socket. It is very important for multicast packets to use the smallest TTL possible. The default is 1, which means that multicast packets do not leave the local network unless the user program explicitly requests it. The argument is an integer.

IP_MULTICAST_LOOP
Sets or reads a boolean integer argument that determines whether sent multicast packets should be looped back to the local sockets.

IP_ADD_MEMBERSHIP
Joins a multicast group. The argument is a struct ip_mreqn structure.

    struct ip_mreqn {
        struct in_addr imr_multiaddr; /* IP multicast group address */
        struct in_addr imr_address;   /* IP address of local interface */
        int            imr_ifindex;   /* interface index */
    };

imr_multiaddr contains the address of the multicast group the application wants to join or leave. It must be a valid multicast address. imr_address is the address of the local interface with which the system should join the multicast group; if it is equal to INADDR_ANY, the system chooses an appropriate interface. imr_ifindex is the index of the interface that should join or leave the imr_multiaddr group, or 0 to indicate any interface.

For compatibility, the old ip_mreq structure is still supported. It differs from ip_mreqn only in that it does not include the imr_ifindex field. Only valid as a setsockopt(2).

IP_DROP_MEMBERSHIP
Leaves a multicast group. The argument is an ip_mreqn or ip_mreq structure, as for IP_ADD_MEMBERSHIP.

IP_MULTICAST_IF
Sets the local device for a multicast socket. The argument is an ip_mreqn or ip_mreq structure, as for IP_ADD_MEMBERSHIP.

When an invalid socket option is passed, ENOPROTOOPT is returned.
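Joining and leaving a multicast group with struct ip_mreqn might be sketched as below, assuming glibc headers on Linux. The helper names make_mreqn, join_group, and leave_group are hypothetical; whether the join itself succeeds depends on the host having a multicast-capable interface.

```c
#define _GNU_SOURCE            /* exposes struct ip_mreqn and inet_aton() */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill *out for the given group (dotted quad), letting
 * the system pick the local address; ifindex 0 means any interface.
 * Returns 0 on success, -1 if group is not a valid multicast address. */
int make_mreqn(const char *group, int ifindex, struct ip_mreqn *out)
{
    memset(out, 0, sizeof(*out));
    if (inet_aton(group, &out->imr_multiaddr) == 0)
        return -1;
    /* IN_MULTICAST expects the address in host byte order. */
    if (!IN_MULTICAST(ntohl(out->imr_multiaddr.s_addr)))
        return -1;
    out->imr_address.s_addr = htonl(INADDR_ANY);
    out->imr_ifindex = ifindex;
    return 0;
}

/* Join the group described by *req on socket fd. */
int join_group(int fd, const struct ip_mreqn *req)
{
    return setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, req, sizeof(*req));
}

/* Leave the group described by *req on socket fd. */
int leave_group(int fd, const struct ip_mreqn *req)
{
    return setsockopt(fd, IPPROTO_IP, IP_DROP_MEMBERSHIP, req, sizeof(*req));
}
```

Validating the group address before the setsockopt(2) call is a design choice of this sketch; the kernel performs its own check as well.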

 

Sysctls

The IP protocol supports the sysctl interface to configure some global options. The sysctls can be accessed by reading or writing the /proc/sys/net/ipv4/* files or by using the sysctl(2) interface.
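Reading one of these values through the /proc files could be done as in the sketch below. The helper name read_ipv4_sysctl is hypothetical, and the sketch assumes /proc is mounted; it reads only the first integer of the file, which is enough for single-valued entries.

```c
#include <stdio.h>

/* Hypothetical helper: read the first integer stored in
 * /proc/sys/net/ipv4/<name>. Returns 0 on success, -1 if the file cannot
 * be opened or does not start with an integer. */
int read_ipv4_sysctl(const char *name, int *value)
{
    char path[256];
    FILE *f;
    int ok;

    if (snprintf(path, sizeof(path), "/proc/sys/net/ipv4/%s", name)
            >= (int)sizeof(path))
        return -1;              /* name too long for the buffer */

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Writing a value works the same way with fopen(path, "w") and fprintf, but normally requires root privileges.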

ip_default_ttl
Sets the default time-to-live value of outgoing packets. This can be changed per socket with the IP_TTL option.

ip_forward
Enables IP forwarding with a boolean flag. IP forwarding can also be set on a per-interface basis.

ip_dynaddr
Enables dynamic socket address and masquerading entry rewriting on interface address change. This is useful for dial-up interfaces with changing IP addresses. 0 means no rewriting, 1 turns it on, and 2 enables verbose mode.

ip_autoconfig
Not documented.

ip_local_port_range
Contains two integers that define the default local port range allocated to sockets. Allocation starts with the first number and ends with the second number. Note that these should not conflict with the ports used by masquerading (although the case is handled). Arbitrary choices may also cause problems with some firewall packet filters that make assumptions about the local ports in use. The first number should be at least greater than 1024, and preferably greater than 4096, to avoid clashes with well-known ports and to minimize firewall problems.

ip_no_pmtu_disc
If enabled, path MTU discovery is not performed for TCP sockets by default. Path MTU discovery may fail if misconfigured firewalls (that drop all ICMP packets) or misconfigured interfaces (for example, a point-to-point link where the two ends do not agree on the MTU) are on the path. It is better to fix the broken routers on the path than to turn off path MTU discovery globally, because not doing it incurs a high cost to the network.

ipfrag_high_thresh, ipfrag_low_thresh
If the amount of queued IP fragments reaches ipfrag_high_thresh, the queue is pruned down to ipfrag_low_thresh. Each contains an integer with the number of bytes.

ip_always_defrag
[New with kernel 2.2.13; in earlier kernel versions this feature was controlled at compile time by the CONFIG_IP_ALWAYS_DEFRAG option.]

When this boolean flag is enabled (not equal to 0), incoming fragments (parts of IP packets that arose when some host between origin and destination decided that the packets were too large and cut them into pieces) are reassembled (defragmented) before being processed, even if they are about to be forwarded.

Enable this only when running either a firewall that is the sole link to your network or a transparent proxy; never turn it on for a normal router or host. Otherwise, fragmented communication may be disturbed when the fragments travel over different links. Defragmentation also has a large memory and CPU time cost.

This is turned on automatically when masquerading or transparent proxying is configured.

neigh/*
See arp(7).

 

IOCTLs

All ioctls described in socket(7) apply to ip.

The ioctls used to configure firewalling are documented in ipfw(7) from the ipchains package.

The ioctls used to configure generic device parameters are described in netdevice(7).

Notes

Be very careful with the SO_BROADCAST option: it is not privileged in Linux. It is easy to overload the network with careless broadcasts. For new application protocols it is better to use a multicast group instead of broadcasting. Broadcasting is discouraged.

Some other BSD sockets implementations provide the IP_RCVDSTADDR and IP_RECVIF socket options to get the destination address and the interface of received datagrams. Linux has the more general IP_PKTINFO for the same task.

Errors

ENOTCONN
The operation is defined only on a connected socket, but the socket was not connected.

EINVAL
An invalid argument was passed. For send operations, this can be caused by sending to a blackhole route.

EMSGSIZE
The datagram is bigger than an MTU on the path and cannot be fragmented.

EACCES
The user tried to execute an operation without the necessary permissions. These include: sending a packet to a broadcast address without having the SO_BROADCAST flag set; sending a packet via a prohibit route; modifying firewall settings without CAP_NET_ADMIN or an effective user ID of 0; and binding to a reserved port without the CAP_NET_BIND_SERVICE capability or an effective user ID of 0.

EADDRINUSE
Tried to bind to an address already in use.

ENOMEM and ENOBUFS
Not enough memory available.

ENOPROTOOPT and EOPNOTSUPP
An invalid socket option was passed.

EPERM
The user does not have permission to set a high priority, change the configuration, or send signals to the requested process or group.

EADDRNOTAVAIL
A nonexistent interface was requested, or the requested source address was not local.

EAGAIN
The operation on a non-blocking socket would block.

ESOCKTNOSUPPORT
The socket is not configured, or an unknown socket type was requested.

EISCONN
connect(2) was called on an already connected socket.

EALREADY
A connection operation on a non-blocking socket is already in progress.

ECONNABORTED
A connection was closed during an accept(2).

EPIPE
The connection was unexpectedly closed or shut down by the other end.

ENOENT
SIOCGSTAMP was called on a socket where no packet has arrived.

EHOSTUNREACH
No valid routing table entry matches the destination address. This error can be caused by an ICMP message from a remote router or by the local routing table.

ENODEV
The network device is not available or is not capable of sending IP.

ENOPKG
A kernel subsystem was not configured.

ENOBUFS, ENOMEM
Not enough free memory. This often means that the memory allocation is limited by the socket buffer limits rather than by the system memory, but this is not 100% consistent.

Other errors may be generated by the overlaying protocols; see tcp(7), raw(7), udp(7), and socket(7).
