Go Language TCP Network Programming in Detail

Source: Internet
Author: User

I. Preamble
One of Golang's main design goals is building large-scale back-end services, and network communication is an essential part of any server program. In everyday use, the net package and the packages under its subdirectories are among the most frequently used and most indispensable in Go. TCP sockets are the mainstream of network programming: even if you never use the TCP socket interfaces in net directly, you almost certainly use net/http, and HTTP is still implemented on top of TCP sockets.

In network programming, TCP socket programming is what we use most often. Since the POSIX standard appeared, sockets have been well supported on all major OS platforms. The best reference on TCP programming is W. Richard Stevens's network programming bible, "UNIX Network Programming, Volume 1: The Sockets Networking API", which explains the usage, behavior patterns, and error handling of the TCP socket interface in great detail. Go is a cross-platform language with a runtime, and the TCP socket API it exposes to users is built on top of the OS native TCP socket interface. Because of the needs of Go runtime scheduling, Go's TCP socket interface differs from the native interface in its behavior and error handling. The goal of this post is to sort out how to use Go TCP sockets in various scenarios, how they behave, and what to watch out for.

II. The Model

Since the birth of TCP sockets, the network programming architecture model has evolved roughly as follows: "one connection per process" -> "one connection per thread" -> "non-blocking + I/O multiplexing" (Linux epoll / Windows IOCP / FreeBSD and Darwin kqueue / Solaris event ports). With each step of this evolution, server programs have become more capable, supporting more connections with better processing performance.

Today, mainstream web servers generally use "non-blocking + I/O multiplexing" (some combined with multiple threads or processes). But I/O multiplexing also brings a lot of complexity to the user, which is why there are many high-performance I/O multiplexing frameworks, such as libevent, libev, and libuv, to simplify development and reduce the developer's mental burden. Go's designers, however, seem to think that splitting control flow through callbacks, as I/O multiplexing does, is still complex and runs counter to "ordinary logic" design, so Go hides this complexity inside the runtime. Go developers need not care whether a socket is non-blocking, nor register file-descriptor callbacks themselves; they simply handle each connection with "blocking I/O" in the goroutine dedicated to that connection, which greatly reduces the mental burden. A typical Go server program looks roughly like this:

    //go-tcpsock/server.go
    package main

    import (
        "fmt"
        "net"
    )

    func HandleConn(conn net.Conn) {
        defer conn.Close()
        for {
            // read from the connection
            // ... ...
            // write to the connection
            // ... ...
        }
    }

    func main() {
        listen, err := net.Listen("tcp", ":8888")
        if err != nil {
            fmt.Println("listen error: ", err)
            return
        }

        for {
            conn, err := listen.Accept()
            if err != nil {
                fmt.Println("accept error: ", err)
                break
            }
            // start a new goroutine to handle the new connection
            go HandleConn(conn)
        }
    }

The "blocking socket" that a goroutine sees at the user level is actually "simulated" by the Go runtime's netpoller, which is built on non-blocking sockets plus I/O multiplexing. The real underlying socket is non-blocking, but the runtime intercepts the error codes of the underlying socket system calls and, through the netpoller and goroutine scheduling, makes the goroutine appear to "block" on the socket fd. For example, when user code reads from a socket fd that has no data, the runtime adds the fd to the netpoller and parks the corresponding goroutine. When the runtime later receives a notification that data is ready on that fd, it wakes up the goroutine waiting to read from it. From the goroutine's point of view, the read operation simply blocked on the socket fd until data arrived. Implementation details are filled in for the individual scenarios below.

III. Establishing a TCP Connection

As is well known, establishing a TCP connection requires a three-way handshake between the client and the server. During connection establishment, the server side uses the standard listen + accept structure (see the code above), while a Go client uses net.Dial() or net.DialTimeout() to establish the connection.

Blocking dial:

    conn, err := net.Dial("tcp", "www.baidu.com:80")
    if err != nil {
        //handle error
    }
    //read or write on conn

Dial with a timeout:

    conn, err := net.DialTimeout("tcp", "www.baidu.com:80", 2*time.Second)
    if err != nil {
        //handle error
    }
    //read or write on conn

On the client side, connection establishment may run into the following scenarios:

1. The network is unreachable or the peer service is not running
If it can be determined immediately that the address passed to Dial is unreachable, or that no service is running and listening on the target port, Dial returns an error almost immediately, for example:

    //go-tcpsock/conn_establish/client1.go
    package main

    import (
        "log"
        "net"
    )

    func main() {
        log.Println("begin dial...")
        conn, err := net.Dial("tcp", ":8888")
        if err != nil {
            log.Println("dial error:", err)
            return
        }
        defer conn.Close()
        log.Println("dial ok")
    }

If no program is listening on port 8888, running the program above makes Dial return an error right away:

    $ go run client1.go
    2015/11/16 14:37:41 begin dial...
    2015/11/16 14:37:41 dial error: dial tcp :8888: getsockopt: connection refused

2. The peer's listen backlog is full
Another scenario is that the server is very busy: a large number of clients are trying to connect at the same time, the server does not accept in time, and its listen backlog queue fills up. (As long as the number of pending connections stays within the backlog, connect succeeds even if the server never calls Accept, because the new connection has already been placed in the server-side listen queue; Accept merely takes a connection off that queue.) Once the backlog is full, the client's Dial blocks. Let's observe Dial's behavior through an example.
Server-side code:

    //go-tcpsock/conn_establish/server2.go
    package main

    import (
        "log"
        "net"
        "time"
    )

    func main() {
        l, err := net.Listen("tcp", ":8888")
        if err != nil {
            log.Println("error listen:", err)
            return
        }
        defer l.Close()
        log.Println("listen ok")

        var i int
        for {
            time.Sleep(time.Second * 10)
            if _, err := l.Accept(); err != nil {
                log.Println("accept error:", err)
                break
            }
            i++
            log.Printf("%d: accept a new connection\n", i)
        }
    }

Client code:

    //go-tcpsock/conn_establish/client2.go
    package main

    import (
        "log"
        "net"
        "time"
    )

    func establishConn(i int) net.Conn {
        conn, err := net.Dial("tcp", ":8888")
        if err != nil {
            log.Printf("%d: dial error: %s", i, err)
            return nil
        }
        log.Println(i, ":connect to server ok")
        return conn
    }

    func main() {
        var sl []net.Conn
        for i := 1; i < 1000; i++ {
            conn := establishConn(i)
            if conn != nil {
                sl = append(sl, conn)
            }
        }

        time.Sleep(time.Second * 10000)
    }

As the code shows, after Listen succeeds the server calls Accept once every 10 seconds, while the client tries to establish connections one after another. Running the two programs on Darwin gives:

    $ go run server2.go
    2015/11/16 21:55:41 listen ok
    2015/11/16 21:55:51 1: accept a new connection
    2015/11/16 21:56:01 2: accept a new connection
    ... ...

    $ go run client2.go
    2015/11/16 21:55:44 1 :connect to server ok
    2015/11/16 21:55:44 2 :connect to server ok
    2015/11/16 21:55:44 3 :connect to server ok
    ... ...
    2015/11/16 21:55:44 126 :connect to server ok
    2015/11/16 21:55:44 127 :connect to server ok
    2015/11/16 21:55:44 128 :connect to server ok
    2015/11/16 21:55:52 129 :connect to server ok
    2015/11/16 21:56:03 130 :connect to server ok
    2015/11/16 21:56:14 131 :connect to server ok
    ... ...

We can see that the client first establishes 128 connections in one burst, after which each new connection blocks for nearly 10 s before it succeeds. In other words, when the server's backlog is full (because the server is not accepting in time), the client blocks in Dial until the server accepts a connection. The number 128 comes from Darwin's default backlog limit, the kern.ipc.somaxconn sysctl, which defaults to 128.
When I ran the same server program on Ubuntu 14.04, the client could initially establish 499 connections in one burst.

If the server never accepts at all, does the client block forever? With the Accept call removed, the client on Darwin blocks for about 1 minute and then returns a timeout error.
If the server runs on Ubuntu 14.04, however, the client seems to block indefinitely: I waited more than 10 minutes and it still had not returned. Whether Dial blocks, and for how long, appears to depend on the server-side network implementation and settings.

3. High network latency: Dial blocking and timeout
If network latency is high, the TCP handshake becomes slower and bumpier (with various packet drops), so it naturally takes longer. Dial blocks during this time, and if the connection still cannot be established after a long while, Dial returns a "getsockopt: operation timed out" error.

During connection setup, plain Dial meets our needs most of the time, even if it blocks for a while. Some programs, however, require a strict limit on connection time: if a connection cannot be established within a certain period, the program must run some "exception" handling logic instead. That is what DialTimeout is for. In the following example, Dial's maximum blocking time is limited to 2 s; beyond that, it returns a timeout error:

    //go-tcpsock/conn_establish/client3.go
    package main

    import (
        "log"
        "net"
        "time"
    )

    func main() {
        log.Println("begin dial...")
        conn, err := net.DialTimeout("tcp", "104.236.176.96:80", 2*time.Second)
        if err != nil {
            log.Println("dial error:", err)
            return
        }
        defer conn.Close()
        log.Println("dial ok")
    }

The result is shown below; reproducing it requires simulating a high-latency network environment:

    $ go run client3.go
    2015/11/17 09:28:34 begin dial...
    2015/11/17 09:28:36 dial error: dial tcp 104.236.176.96:80: i/o timeout

IV. Socket Reading and Writing

Once the connection is established, we read and write on the conn to carry out the business logic. As mentioned earlier, the Go runtime hides the complexity of I/O multiplexing, and users only need the goroutine + blocking I/O model to meet the requirements of most scenarios. When Dial succeeds, it returns a value of the net.Conn interface type whose dynamic type is *TCPConn:

    //$GOROOT/src/net/tcpsock_posix.go
    type TCPConn struct {
        conn
    }

TCPConn embeds an unexported type, conn, so TCPConn "inherits" conn's Read and Write methods; the Read and Write calls we later make on the value returned by Dial are therefore net.conn's methods:

    //$GOROOT/src/net/net.go
    type conn struct {
        fd *netFD
    }

    func (c *conn) ok() bool { return c != nil && c.fd != nil }

    // Implementation of the Conn interface.

    // Read implements the Conn Read method.
    func (c *conn) Read(b []byte) (int, error) {
        if !c.ok() {
            return 0, syscall.EINVAL
        }
        n, err := c.fd.Read(b)
        if err != nil && err != io.EOF {
            err = &OpError{Op: "read", Net: c.fd.net, Source: c.fd.laddr, Addr: c.fd.raddr, Err: err}
        }
        return n, err
    }

    // Write implements the Conn Write method.
    func (c *conn) Write(b []byte) (int, error) {
        if !c.ok() {
            return 0, syscall.EINVAL
        }
        n, err := c.fd.Write(b)
        if err != nil {
            err = &OpError{Op: "write", Net: c.fd.net, Source: c.fd.laddr, Addr: c.fd.raddr, Err: err}
        }
        return n, err
    }

1. Behavior of Conn.Read

1.1. No data in the socket
After the connection is established, if the peer sends no data, the receiver (the server) blocks in Read, which is consistent with the "model" described earlier: the goroutine performing the Read is parked, and the runtime monitors the socket until data arrives, then reschedules the goroutine associated with that socket to complete the read. For reasons of space the code is not listed here; see client1.go and server1.go under go-tcpsock/read_write.

1.2. The socket holds some data
If the socket holds some data, but less than the length the Read call asks for, Read successfully reads that partial data and returns it, rather than waiting until all of the expected data has arrived.

1.3. There is enough data in the socket
If the socket holds data whose length is greater than or equal to the length the Read call asks for, Read reads that much data and returns successfully. This is the case that best matches what we expect from Read: it fills the slice we pass in and returns, for example n = 10, err = nil for a 10-byte slice.

1.4. Socket closed
If the client actively closes its socket, what does Read on the server side return?
There are two cases: closing with unread data and closing without data.

Closing with data means that when the client closes its socket, the socket still holds data that the server has not read yet. In the example, the client closes its socket and exits while the server has not yet started reading. When the server reads 10 s later, the first Read successfully returns all of the data; the second Read returns an io.EOF error because the client's socket has been closed.

Closing without data: Read returns an io.EOF error directly.

1.5. Read timeout
Some situations strictly limit how long Read may block. How does Read behave then? When it returns a timeout error, has it also read part of the data?
In our experiments, Read never returned partial data together with a timeout error.

2. Behavior of Conn.Write

2.1. Successful write
The previous examples focused on Read, and the client did not check Write's return values. A "successful write" means that Write returns an n equal to the length of the data to be written and err == nil. This is the most common case when calling Write, so no example is given.

2.2. Write blocking
The OS at each end of a TCP connection keeps data buffers for the connection. When one end calls Write, the data is actually copied into the send buffer of the OS protocol stack. TCP is full-duplex, so each direction has its own buffers. When the sender's data has filled both the receiver's receive buffer and its own send buffer, Write blocks.

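2.3. Partial writes
Write may also write only part of the data rather than all of it. In that case Write returns n < len(b) together with a non-nil err, and the program has to deal with the bytes that were (and were not) written, for example by writing the remainder b[n:] again in a loop when the error is recoverable.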

To sum up, although Go gives us the convenience of blocking I/O, when calling Read and Write we still need to consider both the n and the err they return in order to handle each case correctly. net.Conn implements the io.Reader and io.Writer interfaces, so you can also use wrappers for socket reads and writes, such as the Reader and Writer in the bufio package or the functions in io and io/ioutil.

V. Goroutine Safety

In a goroutine-based network architecture, a conn may end up shared between different goroutines, so are Conn's Read and Write goroutine-safe? Before digging into that, let's first consider whether goroutine safety of reads and writes is even necessary at the application level.

For reads, because TCP is byte-stream oriented, Conn.Read cannot identify the boundaries of business messages in the data, so having several goroutines read the same conn makes little sense: a goroutine that reads an incomplete business packet only makes processing harder. For writes, on the other hand, several goroutines writing concurrently is a realistic case.

Looking at the implementation, each Write is protected by a lock until all of its data has been written. So at the application level, to make concurrent writes by several goroutines on one conn safe, each Write call must carry one complete "business packet". Once writing a packet is split across several Write calls, there is no guarantee that one goroutine's packet is sent contiguously on the conn.

Reads are lock-protected as well. Concurrent Reads by several goroutines on the same conn never return overlapping content, but where the content is cut between goroutines depends on runtime scheduling. One third of a business packet may be read by goroutine-1 and the other two thirds by goroutine-2. For example, for a complete packet "world", if each goroutine's read slice is smaller than 5 bytes, one goroutine may read "worl" and the other "d".

VI. Socket Options
The native socket API provides a rich set of sockopt interfaces, but Go has its own network architecture model, and the socket option interfaces Go provides are the settings that are necessary under that model. They include:
SetKeepAlive
SetKeepAlivePeriod
SetLinger
SetNoDelay (default: no delay, i.e. Nagle's algorithm is disabled)
SetWriteBuffer
SetReadBuffer

These methods are defined on *TCPConn, not on the net.Conn interface, so to use them you need a type assertion:

    tcpConn, ok := conn.(*net.TCPConn)
    if !ok {
        //error handle
    }
    tcpConn.SetNoDelay(true)

For listening sockets, Go sets SO_REUSEADDR by default, so restarting a server does not fail with an "address already in use" error. The default listen backlog is taken from the system settings and varies by system, for example 128 on Mac and 512 on Linux.

VII. Closing the Connection
Compared with the operations above, closing a connection is the simplest. But because a socket is full-duplex, closing your own socket and having the peer close its socket lead to different results on the client and server sides. Look at the following example:

    //go-tcpsock/conn_close/client1.go
    // ... ...
    func main() {
        log.Println("begin dial...")
        conn, err := net.Dial("tcp", ":8888")
        if err != nil {
            log.Println("dial error:", err)
            return
        }
        conn.Close()
        log.Println("close ok")

        var buf = make([]byte, 32) // size garbled in the source; any size shows the behavior
        n, err := conn.Read(buf)
        if err != nil {
            log.Println("read error:", err)
        } else {
            log.Printf("read %d bytes, content is %s\n", n, string(buf[:n]))
        }

        n, err = conn.Write(buf)
        if err != nil {
            log.Println("write error:", err)
        } else {
            log.Printf("write %d bytes, content is %s\n", n, string(buf[:n]))
        }

        time.Sleep(time.Second * 1000) // duration garbled in the source; keeps the client alive
    }

    //go-tcpsock/conn_close/server1.go
    // ... ...
    func handleConn(c net.Conn) {
        defer c.Close()

        // read from the connection
        var buf = make([]byte, 10)
        log.Println("start to read from conn")
        n, err := c.Read(buf)
        if err != nil {
            log.Println("conn read error:", err)
        } else {
            log.Printf("read %d bytes, content is %s\n", n, string(buf[:n]))
        }

        n, err = c.Write(buf)
        if err != nil {
            log.Println("conn write error:", err)
        } else {
            log.Printf("write %d bytes, content is %s\n", n, string(buf[:n]))
        }
    }
    // ... ...

The execution results are as follows:

    $ go run server1.go
    2015/11/17 17:00:51 accept a new connection
    2015/11/17 17:00:51 start to read from conn
    2015/11/17 17:00:51 conn read error: EOF
    2015/11/17 17:00:51 write 10 bytes, content is

    $ go run client1.go
    2015/11/17 17:00:51 begin dial...
    2015/11/17 17:00:51 close ok
    2015/11/17 17:00:51 read error: read tcp 127.0.0.1:64195->127.0.0.1:8888: use of closed network connection
    2015/11/17 17:00:51 write error: write tcp 127.0.0.1:64195->127.0.0.1:8888: use of closed network connection

From the client's results, both Read and Write on a socket that has already been closed locally get a "use of closed network connection" error.

From the server's results, Read on a socket whose peer has closed gets an EOF error, but Write still succeeds: the data is successfully written into the server's own kernel socket buffer, even though it will never be delivered to the peer, because the server's own socket is not closed. Therefore, once the peer has closed its socket, you should close your own socket properly; continuing to write to it is meaningless.

VIII. Summary
This article is fairly basic but important: Go targets large-scale back-end services, so a thorough understanding of every detail of the communication path pays off. In addition, Go's goroutine + blocking I/O communication model reduces the developer's mental burden and simplifies communication code, which is especially important.
