Byte order in Network Communication

Source: Internet
Author: User
Tags: comparison table, intel, pentium
Communication between programs is, in essence, sending and receiving streams of data. We generally regard the byte as the smallest unit of data (a byte, of course, contains 8 bits). On a 32-bit processor the word length is 32 bits, that is, 4 bytes, and memory is typically read and written in aligned 4-byte units. So in what order are those 4 bytes stored in memory? This is the question of byte order.

 

I. Byte order

As the name implies, byte order is the order in which the bytes of a multi-byte value are stored in memory (for a single-byte value there is, of course, no order to speak of).

The following table lists common data types and their lengths:

wtypes.h type   Unmanaged C type   .NET managed class   Length
HANDLE          void *             System.IntPtr        32-bit (on 32-bit Windows; pointer-sized)
BYTE            unsigned char      System.Byte          8-bit (two hex digits of 4 bits each, for example 0x6a)
SHORT           short              System.Int16         16-bit (2 bytes)
WORD            unsigned short     System.UInt16        16-bit
INT             int                System.Int32         32-bit
UINT            unsigned int       System.UInt32        32-bit
LONG            long               System.Int32         32-bit
BOOL            long               System.Int32         32-bit
DWORD           unsigned long      System.UInt32        32-bit
ULONG           unsigned long      System.UInt32        32-bit
CHAR            char               System.Byte          8-bit
FLOAT           float              System.Single        32-bit
DOUBLE          double             System.Double        64-bit (8 bytes)

 

Note:

In a Linux program, the C/C++ types long, long double, and pointers have different lengths in 32-bit and 64-bit builds. For example, long is 4 bytes on 32-bit x86 and 8 bytes on x64 (amd64).
In a Windows program, only pointer types differ in length between 32-bit and 64-bit builds.

 

II. Types of byte order

There are three types: big-endian, little-endian, and middle-endian. Big-endian and little-endian are the common ones. In Chinese they are called high-byte order and low-byte order (or big-end and small-end). The standard definitions of big-endian and little-endian are as follows:
A) Little-endian: the low-order byte is stored at the low memory address and the high-order byte at the high memory address.
B) Big-endian: the high-order byte is stored at the low memory address and the low-order byte at the high memory address.

The notions of host byte order and network byte order are also used. Network byte order is defined as big-endian. Host byte order depends on the CPU: x86 processors (including most Intel and AMD PC processors) are little-endian, while architectures such as PowerPC, SPARC V9, and MIPS have traditionally been big-endian. Some architectures, including ARM, PowerPC, and MIPS, can be configured for either order, and ARM and Alpha systems usually run little-endian, so check the target platform rather than assuming.
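As a minimal sketch of how a C program might check its own host byte order at run time, the function below stores a known 32-bit value and inspects the byte at its lowest address. The name host_is_little_endian is made up for this illustration and is not part of any standard library.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little-endian host, 0 otherwise. */
int host_is_little_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char first;

    /* Copy the byte stored at the lowest address of probe. */
    memcpy(&first, &probe, 1);

    /* A little-endian host stores the low-order byte (0x04) first;
       a big-endian host stores the high-order byte (0x01) first. */
    return first == 0x04;
}
```

On an x86 PC this is expected to return 1. Code that serializes data with explicit shifts, or with htons/htonl, does not need such a check at all.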

 

III. Byte order illustrated

Take a variable defined in C as an example: unsigned int value = 0x6a7b8c9d. From the type table above, an unsigned int in C is 32 bits, that is, 4 bytes. Conceptually the value consists of these four bytes:

unsigned char buf[4] = {0x6a, 0x7b, 0x8c, 0x9d}; // not literally the same declaration; it only lists the four bytes from most to least significant

Let A be the starting memory address of the variable. The four bytes are then laid out as follows:
A) Little-endian: the low address stores the low-order byte:
(High address)--------------------------
A + 3 ----- buf[3] (0x6a) -- high-order byte (like the tens digit of a decimal number)
A + 2 ----- buf[2] (0x7b)
A + 1 ----- buf[1] (0x8c)
A --------- buf[0] (0x9d) -- low-order byte (like the ones digit of a decimal number)
(Low address)--------------------------

 

B) Big-endian: the low address stores the high-order byte:
(High address)--------------------------
A + 3 ----- buf[3] (0x9d) -- low-order byte (like the ones digit of a decimal number)
A + 2 ----- buf[2] (0x8c)
A + 1 ----- buf[1] (0x7b)
A --------- buf[0] (0x6a) -- high-order byte (like the tens digit of a decimal number)
(Low address)--------------------------
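The two layouts above can be reproduced in C with shifts and masks, which give the same result on any host. This is only a sketch; store_le32 and store_be32 are hypothetical helper names chosen for this example.

```c
#include <stdint.h>

/* Hypothetical helper: write value into buf in little-endian order. */
void store_le32(unsigned char buf[4], uint32_t value)
{
    buf[0] = (unsigned char)(value & 0xFF);          /* low-order byte at the lowest address */
    buf[1] = (unsigned char)((value >> 8) & 0xFF);
    buf[2] = (unsigned char)((value >> 16) & 0xFF);
    buf[3] = (unsigned char)((value >> 24) & 0xFF);  /* high-order byte at the highest address */
}

/* Hypothetical helper: write value into buf in big-endian (network) order. */
void store_be32(unsigned char buf[4], uint32_t value)
{
    buf[0] = (unsigned char)((value >> 24) & 0xFF);  /* high-order byte at the lowest address */
    buf[1] = (unsigned char)((value >> 16) & 0xFF);
    buf[2] = (unsigned char)((value >> 8) & 0xFF);
    buf[3] = (unsigned char)(value & 0xFF);          /* low-order byte at the highest address */
}
```

Calling store_le32(buf, 0x6a7b8c9d) fills buf with 0x9d, 0x8c, 0x7b, 0x6a, and store_be32 with the same value fills it with 0x6a, 0x7b, 0x8c, 0x9d, matching diagrams A and B.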

On most processors the stack grows downward, so the bottom of the stack is at a high address and the top of the stack at a low address.

 

IV. Converting between byte orders

If programs on different platforms or written in different languages, such as C/C++ and C#, need to communicate over a network, you need to consider byte order conversion.
Here is what happens without conversion, using a C client and server as an example.

The client (a little-endian host, such as an Intel Pentium series processor) defines short x = 1, which is 2 bytes and is stored in memory as [0x01] [0x00] (low address first), and sends those bytes to the server unchanged. The server (a big-endian host, such as an embedded processor running in big-endian mode) receives [0x01] [0x00] (low address first) and, following the big-endian rule that the low address holds the high-order byte, parses them as 0x0100, so x = 256. This does not match the value that was sent.
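The mismatch can be reproduced on a single machine by reading the same two bytes both ways. The sketch below uses two hypothetical helpers, load_le16 and load_be16, that assemble a 16-bit value from a byte buffer in a stated byte order.

```c
#include <stdint.h>

/* Hypothetical helper: interpret two bytes as a little-endian 16-bit value. */
uint16_t load_le16(const unsigned char buf[2])
{
    return (uint16_t)(buf[0] | (buf[1] << 8));
}

/* Hypothetical helper: interpret two bytes as a big-endian 16-bit value. */
uint16_t load_be16(const unsigned char buf[2])
{
    return (uint16_t)((buf[0] << 8) | buf[1]);
}
```

For the bytes {0x01, 0x00} sent by the client, load_le16 yields 1 (what the client meant) and load_be16 yields 256 (what the big-endian server sees).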

Byte order conversion therefore has to be considered when writing cross-platform or cross-language programs. Before sending data, convert it from host byte order (little-endian in this example) to network byte order (big-endian). After receiving data, convert it from network byte order back to host byte order. The common methods in .NET and C/C++ are listed below.

4.1. In .NET:

Host byte order to network byte order: short/int/long IPAddress.HostToNetworkOrder(short/int/long)
Network byte order to host byte order: short/int/long IPAddress.NetworkToHostOrder(short/int/long)

4.2. In C/C++, there are the following functions for the different types:

ntohs = net to host short, 16-bit
htons = host to net short, 16-bit
ntohl = net to host long, 32-bit
htonl = host to net long, 32-bit

One of them, ntohs, is described briefly below.

ntohs converts an unsigned short integer from network byte order to host byte order.
 #include <winsock.h>
 u_short PASCAL FAR ntohs(u_short netshort);
 netshort: a 16-bit number in network byte order.
Note:
  This function converts a 16-bit number from network byte order to host byte order.
Return value:
   ntohs() returns the number in host byte order.
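A short sketch of how these four functions might be used together when packing and unpacking a message header. It assumes a POSIX system, where the functions are declared in <arpa/inet.h> (on Windows they come from the Winsock headers instead). The 6-byte header layout and the names pack_header and unpack_header are invented for this example.

```c
#include <arpa/inet.h>  /* htons, htonl, ntohs, ntohl on POSIX systems */
#include <stdint.h>
#include <string.h>

/* Hypothetical header: a 16-bit message type followed by a 32-bit length. */
#define HEADER_SIZE 6

/* Convert both fields to network byte order and copy them into out. */
void pack_header(unsigned char out[HEADER_SIZE], uint16_t type, uint32_t length)
{
    uint16_t net_type = htons(type);
    uint32_t net_length = htonl(length);

    memcpy(out, &net_type, sizeof net_type);
    memcpy(out + 2, &net_length, sizeof net_length);
}

/* Copy both fields out of in and convert them back to host byte order. */
void unpack_header(const unsigned char in[HEADER_SIZE], uint16_t *type, uint32_t *length)
{
    uint16_t net_type;
    uint32_t net_length;

    memcpy(&net_type, in, sizeof net_type);
    memcpy(&net_length, in + 2, sizeof net_length);

    *type = ntohs(net_type);
    *length = ntohl(net_length);
}
```

Using memcpy rather than casting the buffer pointer avoids alignment problems when a field does not start on a suitably aligned address. After pack_header(hdr, 1, 0x6a7b8c9d), the buffer holds 00 01 6a 7b 8c 9d on any host.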

 

V. Notes and common questions about conversion in practice

5.1: Which C# type corresponds to unsigned char in C++?
A: char in C# is a 16-bit Unicode character, while char in C++ is generally 8 bits. An unsigned char in C++ can therefore map either to char or to byte in C#: char suits an unsigned char that holds a character, and byte suits an unsigned char that holds an integer. Which one to use depends on the case. For example, the C++ declaration unsigned char para = 0x4a holds two hexadecimal digits of 4 bits each, that is, 8 bits (the range of an unsigned char is 0x00-0xff, that is, 0-255). Four such bytes, as in unsigned char para[4] = {0x67, 0x89, 0xab, 0xcd}, hold 32 bits.

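5.2: Why must byte order be considered in network programming? Do data types such as double, float, and string need conversion between host order and network order?
A: Byte order matters whenever a multi-byte value is sent as raw bytes between hosts that may store it differently. A string of single-byte characters is already a byte sequence, so it needs no conversion. For float and double, compilers generally use the 4-byte and 8-byte IEEE 754 formats, so the bit pattern of a value is the same on every platform that follows the standard. The order in which those 4 or 8 bytes sit in memory, however, normally follows the host's byte order. Conversion can be skipped only when both ends are known to store floating-point values the same way; otherwise treat the bits as a 32-bit or 64-bit integer and convert them like one.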
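For cases where the sender and receiver cannot be assumed to store floating-point values in the same byte order, one common approach is to copy the float's bit pattern into a 32-bit integer and write that integer in network byte order. The sketch below assumes that float is a 32-bit IEEE 754 value and that the host stores floats and integers in the same byte order, which holds on common platforms. The names store_float_be and load_float_be are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a float into buf in big-endian order.
   Assumes float is a 32-bit IEEE 754 value. */
void store_float_be(unsigned char buf[4], float value)
{
    uint32_t bits;

    memcpy(&bits, &value, sizeof bits);  /* reinterpret the bit pattern safely */
    buf[0] = (unsigned char)((bits >> 24) & 0xFF);
    buf[1] = (unsigned char)((bits >> 16) & 0xFF);
    buf[2] = (unsigned char)((bits >> 8) & 0xFF);
    buf[3] = (unsigned char)(bits & 0xFF);
}

/* Hypothetical helper: read a big-endian float back from buf. */
float load_float_be(const unsigned char buf[4])
{
    uint32_t bits = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
                  | ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    float value;

    memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, 1.0f has the IEEE 754 bit pattern 0x3f800000, so store_float_be writes the bytes 3f 80 00 00.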

5.3: BinaryWriter and BinaryReader
BinaryReader and BinaryWriter read and write data in little-endian (low byte) order.
For example:
// requires: using System.IO;
var stream = new MemoryStream(new byte[] {4, 1, 0, 0}); // four bytes; each byte acts as one base-256 digit
var reader = new BinaryReader(stream);
int i = reader.ReadInt32(); // i == 260
// Because BinaryReader reads in little-endian order, i = 4 + 256 × 1 = 260.

5.4: BitConverter and ASCIIEncoding.ASCII.GetBytes
BitConverter is mainly used to convert between byte[] and other basic types in .NET; it does not handle the string type. It is often used in network communication. Note that it uses the byte order of the host (see BitConverter.IsLittleEndian).
ASCIIEncoding.ASCII is usually used to convert between strings and byte[].

 

5.X: "We can regard 4 bytes of data as one 32-bit integer, 2 Unicode (UTF-16) characters, or 4 ASCII characters." Further notes:
The biggest difference between the fixed-width Unicode encodings and UTF-8 is storage. The fixed-width encodings store every character in the same number of bytes (2 for UCS-2, 4 for UTF-32), while UTF-8 is a variable-length encoding: the number of bytes per character varies (for example, English letters take 1 byte and most Chinese characters take 3), and the leading bits of the first byte indicate how many bytes the character occupies. For example, a 3-byte Chinese character is encoded in UTF-8 (binary) as follows:
1110xxxx 10xxxxxx 10xxxxxx
The three leading 1 bits of the first byte indicate that the character is encoded in 3 bytes.

UTF-16 uses 2 bytes for most characters and 4 bytes (a surrogate pair) for characters outside the Basic Multilingual Plane. The encoding that Windows and .NET call "Unicode" is usually UTF-16.

ASCII characters are encoded identically in UTF-8, using one byte. Characters outside the ASCII range take multiple bytes, and the first byte determines how many: up to 4 in the current standard (the original design allowed up to 6). The patterns are as follows:
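UTF-8:
1 byte: 0xxxxxxx (ASCII)
2 bytes: 110xxxxx 10xxxxxx
3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
(5-byte and 6-byte forms existed in the original design but are no longer valid)
UTF-16, one 2-byte code unit: xxxxxxxx xxxxxxxx
ASCII character stored as UTF-16, low byte first: 0xxxxxxx 00000000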
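The UTF-8 bit patterns listed above translate fairly directly into code. The following is a minimal sketch of an encoder for a single code point, limited to the 1-byte to 4-byte forms. The function name utf8_encode is hypothetical, and rejecting surrogates and values above U+10FFFF is a choice made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Returns the number of bytes written (1 to 4), or 0 if cp is a
   surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, the Chinese character U+4E2D encodes to the three bytes e4 b8 ad, and the letter "A" (U+0041) to the single byte 41. Because UTF-8 is defined as a sequence of bytes, it is not affected by host byte order.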

 

An ASCII code is usually written as two hexadecimal digits, that is, one byte. For example, ASCII code 0x31 corresponds to the character "1", and ASCII code 0x32 corresponds to the character "2".

 

 
