Network Programming Guide--Network Socket Interface (Internet Sockets)

Introduction

Hey! Is socket programming getting you down? Is it too hard to dig anything useful out of the man pages? You want to write up-to-date Internet programs, but you don't have time to wade through a pile of structs just to work out whether you have to call bind() before you call connect(), and so on.

Well, I'm here now, and I'm going to share what I know with everyone. If you understand C and want to wade through the network programming swamp, you've come to the right place.

Audience

This document is written as a tutorial, not as a reference. If you are just starting out with socket programming and are looking for a primer, you are my reader. It is not a complete book on socket programming.

Platform and Compiler

Most of the code in this article was compiled successfully with GNU gcc on a Linux PC. It was also compiled successfully with gcc on HP-UX. Note, however, that not every code snippet has been tested individually.

Contents

What is a socket?
Two Types of Internet Sockets
Network Theory
structs--Know them, or wait for the aliens to invade Earth
Convert the Natives!
IP Addresses and How to Deal With Them
socket()--Get the File Descriptor!
bind()--What port am I on?
connect()--Hello!
listen()--Will somebody please call me?
accept()--"Thank you for calling port 3490."
send() and recv()--Talk to me, baby!
sendto() and recvfrom()--Talk to me, DGRAM-style
close() and shutdown()--Go away!
getpeername()--Who are you?
gethostname()--Who am I?
DNS--You say "White House", I say "198.137.240.100"
Client-Server Background
A Simple Server
A Simple Client
Datagram Sockets
Blocking
select()--Multiplexed synchronous I/O. Cool!
References
Disclaimer and Call for Help

What is a socket?

You hear people talk about "sockets" all the time, and perhaps you wonder what exactly they mean. Well, here it is: a socket is a way to talk to other programs using standard Unix file descriptors.

What?

OK, you may have heard some Unix hacker say, "Jeez, everything in Unix is a file!" What that person may have been talking about is the fact that when a Unix program does any sort of I/O, it does it by reading or writing a file descriptor. A file descriptor is simply an integer associated with an open file. But (and here's the catch) that file can be a network connection, a FIFO, a pipe, a terminal, a real file on disk, or just about anything else. Everything in Unix is a file! So when you want to communicate with another program over the Internet, you're going to do it through a file descriptor. You'd better believe it.

Now you may be thinking, "Where do I get this file descriptor for network communication, smart guy?" I'll answer that anyway: you call the socket() system call. It returns the socket descriptor, and you communicate through it using the send() and recv() socket calls.

"But hey," you might be exclaiming right about now, "if it's a file descriptor, why can't I just use the normal read() and write() calls to communicate through the socket?" The short answer is, "You can!" The longer answer is, "You can, but send() and recv() offer much greater control over your data transmission."

The fact is that there are many kinds of sockets. There are DARPA Internet addresses (Internet sockets), path names on a local node (Unix sockets), CCITT X.25 addresses (X.25 sockets, which you can safely ignore), and probably others depending on which Unix flavor you run. This document deals only with the first kind: Internet sockets.
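
Coming back to the read()/write() question, here is a minimal sketch showing that a socket descriptor really does behave like any other file descriptor. It assumes socketpair(), which this guide does not cover, purely as a convenient way to get two connected stream sockets of the local (Unix) family without touching the network. The helper name echo_through_socketpair is made up for illustration.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: pushes msg through a connected pair of stream
 * sockets, writing with plain write() and reading with recv().
 * Returns the number of bytes received, or -1 on error. */
ssize_t echo_through_socketpair(const char *msg, char *out, size_t outlen)
{
    int fds[2];
    ssize_t n;

    /* socketpair() is an assumption here: it yields two connected sockets */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return -1;

    /* generic file call on one end ... */
    if (write(fds[0], msg, strlen(msg)) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* ... socket-specific call on the other; the last argument is the
     * flags word that read() has no room for */
    n = recv(fds[1], out, outlen, 0);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Calling echo_through_socketpair("hello", buf, sizeof buf) should hand back the same five bytes. The extra flags argument of recv() is one place where the "greater control" comes from.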

Two Types of Internet Sockets

What's this? There are two types of Internet sockets? Yes. Well, no, I'm lying. There are more, but I didn't want to scare you. I'm only going to talk about two types here. Except for this sentence, where I'll tell you that "raw sockets" are also very powerful and you should look them up.

All right, all right. What are the two types? One is "stream sockets" and the other is "datagram sockets", referred to from here on as "SOCK_STREAM" and "SOCK_DGRAM" respectively. Datagram sockets are sometimes called "connectionless sockets" (though you can still call connect() on them if you really want to).

Stream sockets are reliable, two-way, connected communication streams. If you output two items into the socket in the order "1, 2", they will arrive at the other end in the order "1, 2". They will also arrive error-free, because the protocol has its own error control.

What uses stream sockets? Well, you may have heard of the telnet application, yes? It uses stream sockets. All the characters you type need to arrive in the same order you typed them, right? Web browsers also use stream sockets, through the HTTP protocol. In fact, if you telnet to a web site on the standard HTTP port (80) and type "GET pagename", it will send the HTML back to you.

How do stream sockets achieve this high level of data transmission quality? They use the "Transmission Control Protocol", otherwise known as "TCP" (see RFC 793 for details). TCP makes sure your data arrives in order and error-free. You may have heard of "TCP" as one half of "TCP/IP", where "IP" stands for "Internet Protocol" (see RFC 791). IP deals only with Internet routing.

What about datagram sockets? Why are they called connectionless? Why are they unreliable? Well, here are the facts: if you send a datagram, it may arrive. It may arrive out of order. If it does arrive, the data within the packet will be error-free.

Datagram sockets also use IP for routing, but they don't use TCP. They use the "User Datagram Protocol", or "UDP" (see RFC 768).

Why are they connectionless? Basically, because you don't have to maintain an open connection as you do with stream sockets. You just build a packet, put an IP header on it with the destination information, and send it out. No connection needed. Sample applications: TFTP, BOOTP, and so on.

"Enough!" you may scream. "How do these programs even work if datagrams might get lost?" Well, my friend, each program has its own protocol on top of UDP. For example, the TFTP protocol says that for each packet sent, the recipient has to send back a packet that says, "I got it!" (an "ACK" packet). If the sender of the original packet gets no reply within, say, five seconds, it retransmits the packet until it finally gets an ACK. This acknowledgment procedure is very important when implementing SOCK_DGRAM applications.

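The resend-until-ACK idea can be sketched without any real network code. The block below is only an illustration under stated assumptions: the callback type send_and_wait_fn, the helper send_reliably, the stand-in lossy_channel, and the max_tries cutoff are all hypothetical names and choices made for this example, not part of TFTP or of the sockets API.

```c
#include <stddef.h>

/* Hypothetical callback: sends one packet and waits up to the timeout
 * for an ACK; returns 1 if the ACK arrived, 0 if the wait timed out. */
typedef int (*send_and_wait_fn)(const void *pkt, size_t len, void *ctx);

/* Resend pkt until it is acknowledged or max_tries is used up.
 * Returns the number of attempts made, or -1 if no ACK ever arrived. */
int send_reliably(send_and_wait_fn xmit, void *ctx,
                  const void *pkt, size_t len, int max_tries)
{
    int attempt;

    for (attempt = 1; attempt <= max_tries; attempt++) {
        if (xmit(pkt, len, ctx))
            return attempt;      /* got the ACK */
        /* timed out: loop around and resend the same packet */
    }
    return -1;                   /* give up */
}

/* Stand-in for a lossy network: "loses" the first *drops packets. */
int lossy_channel(const void *pkt, size_t len, void *ctx)
{
    int *drops = ctx;

    (void)pkt;
    (void)len;
    if (*drops > 0) {
        (*drops)--;
        return 0;                /* no ACK came back */
    }
    return 1;                    /* ACK received */
}
```

In a real SOCK_DGRAM program the callback would wrap the actual send and a timed wait for the reply. The retry loop itself stays this simple.
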
Network Theory

Since I just mentioned the layering of protocols, it's time to talk about how networks really work, and to show how SOCK_DGRAM packets are built. Of course, you can skip this section if you think you already know it.

Friends, it's time to learn about data encapsulation. This is very, very important. It's so important that you might just learn about it if you take the networks course here at Chico State ;-). Basically, it says this: a packet is born; the packet is wrapped ("encapsulated") in a header by the first protocol (say, TFTP); then the whole thing (TFTP header included) is encapsulated again by the next protocol (say, UDP); then again by the next (IP); then again by the final protocol on the hardware (physical) layer (say, Ethernet).

When another computer receives the packet, the hardware strips the Ethernet header, the kernel strips the IP and UDP headers, the TFTP program strips the TFTP header, and it finally has the data.
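
To make the nesting concrete, here is a toy sketch of the wrapping order. It is an assumption-laden illustration only: the helpers wrap and build_frame are hypothetical, and the bracketed text tags stand in for headers that are really binary structures with their own fields.

```c
#include <stddef.h>
#include <string.h>

/* Toy encapsulation: copies hdr, then payload, into out.
 * Returns the total length, or 0 if out is too small. */
size_t wrap(const char *hdr, const void *payload, size_t len,
            char *out, size_t outlen)
{
    size_t hlen = strlen(hdr);

    if (hlen + len > outlen)
        return 0;
    memcpy(out, hdr, hlen);
    memcpy(out + hlen, payload, len);
    return hlen + len;
}

/* Builds Ethernet(IP(UDP(TFTP(data)))) using placeholder header tags.
 * The result is not NUL-terminated; the return value is its length. */
size_t build_frame(const char *data, char *frame, size_t framelen)
{
    char a[256], b[256];
    size_t n;

    n = wrap("[TFTP]", data, strlen(data), a, sizeof a);  /* innermost */
    if (n == 0) return 0;
    n = wrap("[UDP]", a, n, b, sizeof b);
    if (n == 0) return 0;
    n = wrap("[IP]", b, n, a, sizeof a);
    if (n == 0) return 0;
    return wrap("[Ethernet]", a, n, frame, framelen);     /* outermost */
}
```

The receiver undoes the same steps in reverse: the outermost header comes off first and the application's own header comes off last.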

Now I can finally talk about the infamous Layered Network Model. This model describes a system of network functionality that has many advantages over other models. For instance, you can write sockets programs without caring how the data is physically transmitted (serial port, Ethernet, Attachment Unit Interface (AUI), or some other medium), because programs on lower levels deal with it for you. The actual network hardware and topology are transparent to the socket programmer.

Without any further ado, here are the layers of the full model. Remember them for your network class exams:

Application
Presentation
Session
Transport
Network
Data Link
Physical

The Physical Layer is the hardware (serial port, Ethernet, etc.). The Application Layer is about as far from the physical layer as you can get; it is where users interact with the network.

This model is so general that you could probably use it as a car repair guide if you really wanted to. A layered model more consistent with Unix might be:

Application Layer (telnet, ftp, etc.)
Host-to-Host Transport Layer (TCP, UDP)
Internet Layer (IP and routing)
Network Access Layer (the Network, Data Link, and Physical layers)

At this point, you can probably see how these layers correspond to the encapsulation of the original data.

See how much work there is in building a simple packet? And you have to type in the packet headers yourself using "cat"! Just kidding. All you have to do for stream sockets is send() the data out. All you have to do for datagram sockets is encapsulate the packet in the method of your choosing and sendto() it out. The kernel builds the Transport Layer and the Internet Layer for you, and the hardware handles the Network Access Layer. Ah, modern technology.

So ends our crash course in network theory. Oh yes, I forgot to tell you about routing. But I'm not going to talk about it. If you really want to know, refer to the IP RFC. If you never learn about it, well, you'll live.

structs

Well, we're finally here. It's time to talk about programming. In this section, I'll cover the various data types used by the sockets interface, because some of them are too important to skip.

First the easy one: a socket descriptor. A socket descriptor is the following type:

    int

Just a regular int.

From here on, things get weird, so bear with me. Know this: there are two byte orderings, most significant byte first or least significant byte first (a byte is sometimes called an "octet"). The former is called "Network Byte Order" (NBO). Some machines store their numbers internally in Network Byte Order, and some don't. When I say something has to be in NBO, you have to call a function (such as htons()) to change it from "Host Byte Order". If I don't say NBO, leave the value in Host Byte Order.

My First Struct(TM)--struct sockaddr. This structure holds socket address information for many types of sockets:

    struct sockaddr {
        unsigned short    sa_family;    /* address family, AF_xxx       */
        char              sa_data[14];  /* 14 bytes of protocol address */
    };

sa_family can be a variety of things, but in this document it is always "AF_INET". sa_data holds the destination address and port number for the socket. This is rather unwieldy, isn't it?

To deal with struct sockaddr, programmers created a parallel structure: struct sockaddr_in ("in" for "Internet").

    struct sockaddr_in {
        short int          sin_family;  /* Address family               */
        unsigned short int sin_port;    /* Port number                  */
        struct in_addr     sin_addr;    /* Internet address             */
        unsigned char      sin_zero[8]; /* Same size as struct sockaddr */
    };

This structure makes it easy to reference the elements of the socket address. Note that sin_zero (which is included to pad the structure to the length of a struct sockaddr) should be set to all zeros with bzero() or memset(). Also, and this is the important bit, a pointer to a struct sockaddr_in can be cast to a pointer to a struct sockaddr and vice versa. So even though calls such as bind() and connect() want a struct sockaddr *, you can still use a struct sockaddr_in and cast it at the last minute! Also, note that sin_family corresponds to sa_family in a struct sockaddr and should be set to "AF_INET". Finally, sin_port and sin_addr must be in Network Byte Order.
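
The fill-then-cast pattern can be sketched as follows. The helper names fill_inet_addr and family_via_generic_pointer are made up for this example, and memset() over the whole structure is one assumed way of zeroing sin_zero along with everything else.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fills *addr from a host-order port and a
 * host-order IPv4 address. */
void fill_inet_addr(struct sockaddr_in *addr, unsigned short port,
                    unsigned long ip_host_order)
{
    memset(addr, 0, sizeof *addr);                 /* also zeroes sin_zero */
    addr->sin_family = AF_INET;                    /* host byte order */
    addr->sin_port = htons(port);                  /* network byte order */
    addr->sin_addr.s_addr = htonl(ip_host_order);  /* network byte order */
}

/* Reads the family back through the generic pointer type that the
 * socket calls expect. */
int family_via_generic_pointer(struct sockaddr_in *addr)
{
    struct sockaddr *generic = (struct sockaddr *)addr;

    return generic->sa_family;
}
```

After fill_inet_addr(&a, 3490, INADDR_LOOPBACK), the family read through the cast pointer is still AF_INET, which is why the cast at the last minute is safe.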

"But," you object, "how can the entire structure, struct in_addr sin_addr, be in Network Byte Order?" To answer that, we need to look more carefully at struct in_addr, which has a reputation as one of the worst unions around:

    /* Internet address (a structure for historical reasons) */
    struct in_addr {
        unsigned long s_addr;
    };

Well, it used to be a union, but those days are gone. So if you have declared "ina" to be of type struct sockaddr_in, then "ina.sin_addr.s_addr" references the 4-byte IP address (in Network Byte Order). Even if your system still uses the dreaded union for struct in_addr, you can still reference the 4-byte IP address in exactly the same way (thanks to #defines).

Convert the Natives!

We've now reached the next section. There has been a lot of talk about converting between Network and Host Byte Order; now it's time for action.

There are two types you can convert: short (two bytes) and long (four bytes). These functions work for the unsigned variations as well. Say you want to convert a short from Host Byte Order to Network Byte Order. Start with "h" for "host", follow it with "to", then "n" for "network", and "s" for "short": h-to-n-s, or htons() (read: "Host to Network Short").

It's too easy ...

You can use every sensible combination of "n", "h", "s", and "l". There is no stolh() ("Short to Long Host") function, for example, but there are:

htons()--"Host to Network Short"
htonl()--"Host to Network Long"
ntohs()--"Network to Host Short"
ntohl()--"Network to Host Long"

Now, you may think you've got these figured out. You might think, "What do I do if I have to change the byte order of a char?" Then you might think, "My 68000 machine already uses Network Byte Order, so I don't need to call htonl() on my IP addresses." You would be right, but if you port your program to a machine with the reverse byte order, it will fail. Be portable! This is the Unix world. Remember: put your bytes in Network Byte Order before you put them on the network.
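
Here is a small sketch of what the conversion does to the bytes in memory. The helper name network_bytes_of_short is hypothetical; htons() itself is the standard call discussed in this section.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the in-memory bytes of htons(host_value)
 * into out[0] and out[1]. */
void network_bytes_of_short(uint16_t host_value, unsigned char out[2])
{
    uint16_t net = htons(host_value);   /* host -> network byte order */

    memcpy(out, &net, sizeof net);
}
```

For port 3490 (0x0DA2) the most significant byte, 0x0D, ends up first in memory on any machine, whatever its native byte order. On a host that already uses Network Byte Order, htons() simply changes nothing, so calling it everywhere costs nothing and keeps the program portable.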

A final point: why do sin_addr and sin_port need to be in Network Byte Order in a struct sockaddr_in, but sin_family does not? The answer: sin_addr and sin_port get encapsulated in the packet at the IP and UDP layers, respectively, so they must be in Network Byte Order. The sin_family field, however, is only used by the kernel to determine what type of address the structure contains, so it should be in Host Byte Order. Also, since sin_family is never sent out on the network, it can stay in Host Byte Order.

IP Addresses and How to Deal With Them

Fortunately for you, there are a number of functions that make it easy to manipulate IP addresses. There's no need to work them out by hand and stuff them into a long with the << operator.

First, say you have a struct sockaddr_in ina and you want to store the IP address "132.241.5.10" in it. The function you want is inet_addr(), which converts an IP address in numbers-and-dots notation into an unsigned long. The assignment can be made like this:

    ina.sin_addr.s_addr = inet_addr("132.241.5.10");

Note that inet_addr() returns the address already in Network Byte Order, so you do not have to call htonl() on it.

The code above is not very robust because there is no error checking. inet_addr() returns -1 on error. Remember binary numbers? (unsigned)-1 just happens to correspond to the IP address 255.255.255.255, which is the broadcast address. Remember to do your error checking properly.
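
One way to do that check is sketched below. The helper parse_ipv4 is hypothetical, and treating the literal string "255.255.255.255" as the only legitimate source of the -1 value is an assumption of this sketch, not something inet_addr() does for you.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: parses numbers-and-dots text into *out (network
 * byte order). Returns 0 on success, -1 if the text is not an address. */
int parse_ipv4(const char *text, struct in_addr *out)
{
    in_addr_t a = inet_addr(text);

    /* (in_addr_t)-1 is both the error value and 255.255.255.255, so
     * accept it only when the input really was the broadcast address */
    if (a == (in_addr_t)-1 && strcmp(text, "255.255.255.255") != 0)
        return -1;

    out->s_addr = a;
    return 0;
}
```

With this wrapper, parse_ipv4("132.241.5.10", &ina.sin_addr) succeeds, while a string that is not an address at all is rejected instead of silently turning into the broadcast address.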

All right, now you can convert string IP addresses to longs. What about the other way around? What if you have a struct in_addr and you want to print it in numbers-and-dots notation? In that case, you'll want the function inet_ntoa() ("ntoa" means "network to ASCII"):

    printf("%s", inet_ntoa(ina.sin_addr));

That will print the IP address. Note that inet_ntoa() takes a struct in_addr as an argument, not a long. Also note that it returns a pointer to a char. The pointer refers to a statically stored char array inside inet_ntoa(), so each time you call inet_ntoa() it overwrites the previous result. For example:

    char *a1, *a2;
    a1 = inet_ntoa(ina1.sin_addr);  /* this is 198.92.129.1 */
    a2 = inet_ntoa(ina2.sin_addr);  /* this is 132.241.5.10 */
    printf("address 1: %s\n", a1);
    printf("address 2: %s\n", a2);

will print:

    address 1: 132.241.5.10
    address 2: 132.241.5.10

If you need to keep the address, use strcpy() to copy it into your own character array.
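
A minimal sketch of that copy step follows. The helper addr_to_text is a made-up name, and it uses the length-bounded strncpy() rather than strcpy() as an extra precaution chosen for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: copies the numbers-and-dots text for addr into
 * buf right away, before another inet_ntoa() call can overwrite it.
 * buflen must be at least 1; 16 bytes holds any IPv4 address. */
void addr_to_text(struct in_addr addr, char *buf, size_t buflen)
{
    strncpy(buf, inet_ntoa(addr), buflen - 1);
    buf[buflen - 1] = '\0';
}
```

Calling it once per address, each with its own buffer, keeps "198.92.129.1" and "132.241.5.10" distinct when both are printed later.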

That's all on this topic for now. Later, you'll learn to convert a string like "whitehouse.gov" into its corresponding IP address (see the DNS section).

socket()--Get the File Descriptor!

I guess I can't put it off any longer: I have to talk about the socket() system call. Here's the breakdown:

    #include <sys/types.h> 
    #include <sys/socket.h> 

    int socket(int domain, int type, int protocol);

But what are these arguments? First, domain should be set to "AF_INET", just like in the struct sockaddr_in above. Next, the type argument tells the kernel what kind of socket this is: SOCK_STREAM or SOCK_DGRAM. Finally, just set protocol to "0". (Notes: there are many more domains and types than I've listed here; see the socket() man page. There is also a "better" way to get the protocol; see the getprotobyname() man page.)

socket() simply returns a socket descriptor that you can use in later system calls, or -1 on error. The global variable errno is set to the error's value (see the perror() man page).
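
As a minimal sketch of that error check, assuming a made-up wrapper name open_stream_socket:

```c
#include <netinet/in.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: opens a stream (TCP) Internet socket and reports
 * any failure through perror(). Returns the descriptor, or -1 on error. */
int open_stream_socket(void)
{
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);

    if (sockfd == -1)
        perror("socket");    /* prints a message based on errno */
    return sockfd;
}
```

The caller still has to test the return value for -1 before using the descriptor. The wrapper only makes sure the reason is printed.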

bind()--What port am I on?

Once you have a socket, you might have to associate it with a port on your machine. (This is commonly done if you're going to listen() for incoming connections on a specific port; MUDs do this when they tell you to "telnet to x.y.z port 6969".) If you're only going to connect(), this step may be unnecessary. Read it anyway, just for kicks.

Here is the synopsis for the bind() system call:

    #include <sys/types.h> 
    #include <sys/socket.h> 

    int bind(int sockfd, struct sockaddr *my_addr, int addrlen);

sockfd is the socket file descriptor returned by socket(). my_addr is a pointer to a struct sockaddr that contains information about your address, namely the port and IP address. addrlen can be set to sizeof(struct sockaddr).

Simple, isn't it? Let's look at an example:

    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>   /* struct sockaddr_in, htons() */
    #include <arpa/inet.h>    /* inet_addr() */

    #define MYPORT 3490

    int main(void)
    {
        int sockfd;
        struct sockaddr_in my_addr;

        sockfd = socket(AF_INET, SOCK_STREAM, 0); /* do some error checking! */

        my_addr.sin_family = AF_INET;         /* host byte order */
        my_addr.sin_port = htons(MYPORT);     /* short, network byte order */
        my_addr.sin_addr.s_addr = inet_addr("132.241.5.10");
        memset(&(my_addr.sin_zero), 0, 8);    /* zero the rest of the struct */

        /* don't forget your error checking for bind(): */
        bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr));

        return 0;
    }

There are a few things to notice here. my_addr.sin_port is in Network Byte Order, and so is my_addr.sin_addr.s_addr. Another thing to watch out for is that header files may differ from system to system; to be sure, check your local man pages.

Lastly, on the topic of bind(), I should mention that some of the work of getting your own IP address and/or port can be automated:

        my_addr.sin_port = 0;                 /* choose an unused port at random */
        my_addr.sin_addr.s_addr = INADDR_ANY; /* use my IP address */

By setting my_addr.sin_port to zero, you tell bind() to choose the port for you. Likewise, by setting my_addr.sin_addr.s_addr to INADDR_ANY, you tell it to fill in the IP address of the machine the process is running on automatically.

If you have been paying close attention, you may have noticed that I did not put INADDR_ANY into Network Byte Order. That's because I have inside information: INADDR_ANY is really zero, and zero is still zero even if you rearrange its bytes. However, purists will say that safety comes first, so have a look at this code:

        my_addr.sin_port = htons(0);                 /* choose an unused port at random */
        my_addr.sin_addr.s_addr = htonl(INADDR_ANY); /* use my IP address */

Now the code is so portable you probably wouldn't believe it.

bind() also returns -1 on error and sets the global variable errno to the error's value.
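
Putting the automatic choices and the error checks together might look like the sketch below. The helper bind_any_port is a made-up name, and getsockname(), which this guide does not cover, is used here only as an assumed way to find out which port the kernel picked.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: binds a new stream socket to any local address
 * and a kernel-chosen port. Returns the descriptor, or -1 on error.
 * If port_out is non-NULL, stores the chosen port in host byte order. */
int bind_any_port(unsigned short *port_out)
{
    struct sockaddr_in my_addr;
    socklen_t len = sizeof my_addr;
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);

    if (sockfd == -1) {
        perror("socket");
        return -1;
    }

    memset(&my_addr, 0, sizeof my_addr);
    my_addr.sin_family = AF_INET;
    my_addr.sin_port = htons(0);                  /* let the kernel pick */
    my_addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local address */

    if (bind(sockfd, (struct sockaddr *)&my_addr, sizeof my_addr) == -1) {
        perror("bind");
        close(sockfd);
        return -1;
    }

    /* getsockname() (an assumption here) reports the address bound above */
    if (port_out != NULL &&
        getsockname(sockfd, (struct sockaddr *)&my_addr, &len) == 0)
        *port_out = ntohs(my_addr.sin_port);

    return sockfd;
}
```

The port comes back in Network Byte Order, so it goes through ntohs() before being handed to the caller.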

Another thing to watch out for when calling bind(): don't pick a port number below 1024. All ports below 1024 are reserved by the system. You can use any port number from 1024 up to 65535 (provided it isn't already being used by another program).

One more thing to be aware of is that sometimes you don't need to call bind() at all. If you are using connect() to talk to a remote machine and you don't care what your local port is (as is the case with telnet), you can simply call connect(). It checks whether the socket is bound and, if it isn't, binds it to an unused local port.

connect()--Hello!

Let's pretend for a moment that you're a telnet application. Your user commands you (just like in the movie TRON) to get a socket file descriptor. You comply and call socket(). Next, the user tells you to connect to "132.241.5.10" on port 23 (the standard telnet port). What do you do now?

Luckily for you, program, you're now reading the section on connect(): how to connect to a remote host. You don't want to disappoint your user.

The connect() system call looks like this:

    #include <sys/types.h> 
    #include <sys/socket.h> 

    int connect(int sockfd, struct sockaddr *serv_addr, int addrlen);

sockfd is the socket file descriptor returned by the socket() call. serv_addr is a struct sockaddr containing the destination port and IP address. addrlen can be set to sizeof(struct sockaddr).

Let's take a look at an example:

    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>   /* struct sockaddr_in, htons() */
    #include <arpa/inet.h>    /* inet_addr() */

    #define DEST_IP   "132.241.5.10"
    #define DEST_PORT 23      /* the standard telnet port mentioned above */

    int main(void)
    {
        int sockfd;
        struct sockaddr_in dest_addr;   /* will hold the destination addr */

        sockfd = socket(AF_INET, SOCK_STREAM, 0); /* do some error checking! */

        dest_addr.sin_family = AF_INET;          /* host byte order */
        dest_addr.sin_port = htons(DEST_PORT);   /* short, network byte order */
        dest_addr.sin_addr.s_addr = inet_addr(DEST_IP);
        memset(&(dest_addr.sin_zero), 0, 8);     /* zero the rest of the struct */

        /* don't forget to error check the connect()! */
        connect(sockfd, (struct sockaddr *)&dest_addr, sizeof(struct sockaddr));

        return 0;
    }


Again, be sure to check the return value of connect(): it returns -1 on error and sets the global variable errno.

Also, notice that I didn't call bind(). Basically, I don't care about the local port number; I only care about where I'm going. The kernel will choose a local port for me, and the site we connect to will automatically get this information from us.
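
A client helper with the error checks filled in might be sketched like this. The name connect_to and its parameters are hypothetical, and closing the socket on a failed connect() is a choice made for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connects to ip:port (port in host byte order).
 * Returns a connected descriptor, or -1 on error. */
int connect_to(const char *ip, unsigned short port)
{
    struct sockaddr_in dest_addr;
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);

    if (sockfd == -1) {
        perror("socket");
        return -1;
    }

    memset(&dest_addr, 0, sizeof dest_addr);
    dest_addr.sin_family = AF_INET;             /* host byte order */
    dest_addr.sin_port = htons(port);           /* network byte order */
    dest_addr.sin_addr.s_addr = inet_addr(ip);  /* already network byte order */

    /* no bind(): the kernel picks the local port during connect() */
    if (connect(sockfd, (struct sockaddr *)&dest_addr, sizeof dest_addr) == -1) {
        perror("connect");
        close(sockfd);
        return -1;
    }
    return sockfd;
}
```

A telnet-like client would call connect_to("132.241.5.10", 23) and then use the returned descriptor with send() and recv().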

listen()--Will somebody please call me?

OK, time for a change of pace. What if you don't want to connect to a remote host? Say, just for kicks, that you want to wait for incoming connection
