Different computer architectures store the bytes of a multi-byte value in different orders. This raises an important question in computer communication: in what order should the units of information (bit, byte, word, double word, and so on) be transmitted between the communicating parties? If the two sides do not agree on a rule, they cannot interpret each other's data correctly and communication fails. Two byte storage mechanisms are in common use across today's systems: big-endian and little-endian. Let's start with byte order.
1. What is byte order?
Byte order is, as the name implies, the order of bytes. More precisely, it is the order in which data types larger than one byte are stored in memory (a single byte of data needs no ordering). In practice, most developers rarely deal with byte order directly; it becomes a concern only in cross-platform and network programs.
Articles about byte order all mention that it falls into two categories: big-endian and little-endian. The standard definitions are as follows:
A) Little-endian: the low byte is placed at the low address end of memory, and the high byte at the high address end.
B) Big-endian: the high byte is placed at the low address end of memory, and the low byte at the high address end.
C) Network byte order: the TCP/IP protocols define their byte order as big-endian, so the byte order used in TCP/IP is usually called network byte order.
1.1 What is high/low address?
First, we need to know the memory layout of a C program image. It is described in books such as Expert C Programming and Advanced Programming in the UNIX Environment, for example:
----------------------- Maximum memory address 0xffffffff
Stack bottom
Stack
Stack top
-----------------------
(unused)
-----------------------
Heap
-----------------------
Uninitialized data
----------------------- Collectively referred to as the data segment
Initialized data
-----------------------
Text section (code segment)
----------------------- Minimum Memory Address 0x00000000
For example, if we allocate unsigned char Buf[4] on the stack, how is this array laid out on the stack? See:
Stack bottom (high address)
----------
Buf [3]
Buf [2]
Buf [1]
Buf [0]
----------
Stack top (low address)
1.2 What is high/low byte?
Now that high and low addresses are clear, consider high and low bytes. Some articles call the low byte the least significant byte and the high byte the most significant byte. Given the 32-bit unsigned integer 0x12345678, which part is high and which is low?
It is actually very simple. In decimal we say that digits on the left are high and digits on the right are low, and the same holds in other bases. For 0x12345678, the bytes from high to low are 0x12, 0x34, 0x56, and 0x78.
With high/low addresses and high/low bytes both clarified, let's return to the definitions of big-endian and little-endian and illustrate the two byte orders:
Taking unsigned int value = 0x12345678 as an example, we can use unsigned char Buf[4] to show how the value is stored under each byte order:
Big-endian: the high byte is stored at the low address:
Stack bottom (high address)
---------------
Buf [3] (0x78) -- low
Buf [2] (0x56)
Buf [1] (0x34)
Buf [0] (0x12) -- high
---------------
Stack top (low address)
Little-endian: the low byte is stored at the low address:
Stack bottom (high address)
---------------
Buf [3] (0x12) -- high
Buf [2] (0x34)
Buf [1] (0x56)
Buf [0] (0x78) -- low
--------------
Stack top (low address)
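The two layouts above can be observed on a real machine. The following is a minimal sketch, not part of the original article: the helper name dump_bytes is made up for illustration, and it assumes a 32-bit value copied byte for byte into a 4-byte buffer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the in-memory bytes of value into buf,
 * so that buf[0] holds the byte at the lowest address and buf[3] the
 * byte at the highest address. */
void dump_bytes(uint32_t value, unsigned char buf[4])
{
    memcpy(buf, &value, sizeof value);
}
```

After dump_bytes(0x12345678, buf), a little-endian host holds 78 56 34 12 in buf[0] through buf[3], and a big-endian host holds 12 34 56 78.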
2. The various endian modes
2.1 big-Endian
A term used in computer architecture to describe the storage order of multi-byte data. In this mechanism, the most important (most significant) byte (MSB) is stored at the lowest address. The IBM 3700 series, the PDP-10, the Motorola microprocessor series, and many others are processors that use this mechanism.
+ ---------- +
| 0x34 | <-- 0x00000021
+ ---------- +
| 0x12 | <-- 0x00000020
+ ---------- +
Figure 1: the two-byte value 0x1234 stored at starting address 0x00000020 in big-endian mode
In big-endian, the bits of a bit sequence are numbered as follows (taking the two-byte value 0x8b8a as an example):
Bit   0  1  2  3  4  5  6  7    8  9 10 11 12 13 14 15
    + ------------------------------------------------- +
Val | 1  0  0  0  1  0  1  1 |  1  0  0  0  1  0  1  0 |
    + ------------------------------------------------- +
Figure 2: bit numbering in big-endian
2.2 little-Endian
A term used in computer architecture to describe the storage order of multi-byte data. In this mechanism, the least important (least significant) byte (LSB) is stored at the lowest address. Processors that use this mechanism include the PDP-11, VAX, Intel microprocessors, and some network communication devices. Besides multi-byte storage order, the term is also often used to describe the order in which the bits of a single byte are transmitted.
+ ---------- +
| 0x12 | <-- 0x00000021
+ ---------- +
| 0x34 | <-- 0x00000020
+ ---------- +
Figure 3: the two-byte value 0x1234 stored at starting address 0x00000020 in little-endian mode
In little-endian, the bits of a bit sequence are numbered in the opposite direction to big-endian, as follows (taking the two-byte value 0x8b8a as an example):
Bit  15 14 13 12 11 10  9  8    7  6  5  4  3  2  1  0
    + ------------------------------------------------- +
Val | 1  0  0  0  1  0  1  1 |  1  0  0  0  1  0  1  0 |
    + ------------------------------------------------- +
Figure 4: bit numbering in little-endian
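The two numbering schemes in Figures 2 and 4 can be expressed as two small accessors. This is only an illustrative sketch: the names bit_msb0 and bit_lsb0 are invented here, and the code assumes a 16-bit value as in the figures.

```c
#include <stdint.h>

/* Bit i when bit 0 is the most significant bit (numbering of Figure 2). */
unsigned bit_msb0(uint16_t v, unsigned i)
{
    return (unsigned)(v >> (15u - i)) & 1u;
}

/* Bit i when bit 0 is the least significant bit (numbering of Figure 4). */
unsigned bit_lsb0(uint16_t v, unsigned i)
{
    return (unsigned)(v >> i) & 1u;
}
```

For 0x8b8a, bit 0 is 1 under the first numbering and 0 under the second; in general bit i in one scheme is bit 15 - i in the other.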
NOTE 2: On many hosts, such as Intel x86 machines, the host byte order is little-endian. When two hosts communicate over TCP/IP, they therefore need to call the corresponding functions to convert between host order (little-endian here) and network order (big-endian).
NOTE 3: Because the two mechanisms number the bits of the same sequence in opposite directions, the rendering of MSB in the modern English-Chinese dictionary is a poor fit, so this article defines it as "the most important bit/byte".
2.3 middle-Endian
Besides big-endian and little-endian, there is a third multi-byte storage order: middle-endian. Taking 4 bytes as an example, middle-endian stores them in the order 3-4-1-2 or 2-1-4-3. This storage order occasionally appears in the packed decimal formats of some minicomputers.
Embedded system developers should be familiar with both the little-endian and big-endian modes.
In little-endian mode, the CPU stores an operand from its low byte to its high byte; in big-endian mode, it stores the operand from its high byte to its low byte. For example, a little-endian CPU stores the 32-bit value 0x12345678 in memory as follows (assuming it starts at address 0x4000):
Memory address | 0x4000 | 0x4001 | 0x4002 | 0x4003 |
Stored content | 0x78   | 0x56   | 0x34   | 0x12   |
A big-endian CPU stores the same value in memory as follows:
Memory address | 0x4000 | 0x4001 | 0x4002 | 0x4003 |
Stored content | 0x12   | 0x34   | 0x56   | 0x78   |
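The two tables can be reproduced on any host by writing the bytes out explicitly with shifts instead of relying on the CPU's own layout. The sketch below is an illustration under that assumption; store32_le and store32_be are hypothetical names, not functions from the original article.

```c
#include <stdint.h>

/* Writes v into out[0..3] in little-endian order: low byte first. */
void store32_le(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v & 0xFFu);
    out[1] = (unsigned char)((v >> 8) & 0xFFu);
    out[2] = (unsigned char)((v >> 16) & 0xFFu);
    out[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Writes v into out[0..3] in big-endian order: high byte first. */
void store32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}
```

Because shifts operate on the value rather than on its memory image, both functions give the same result on little-endian and big-endian hosts.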
3. Advantages and disadvantages of big-endian and little-endian
Big-endian advantages: because the high byte comes first, you can always tell whether a number is positive or negative by looking at the byte at offset 0. You do not need to know how long the value is, or skip over any bytes to find the one that holds the sign. The values are also stored in the order in which they are printed, so binary-to-decimal conversion routines are particularly efficient. Therefore, different access methods are designed for machines with different requirements.
Little-endian advantages: assembly instructions that fetch a one-, two-, four-byte or longer value proceed in the same way for every size: first fetch the lowest-order byte at offset 0. Because address offset and byte number correspond one-to-one (offset 0 is byte 0), multiple-precision math routines are relatively easy to write.
If a number grows in value, digits may have to be added on its left (a larger non-exponential number needs more digits). With big-endian storage it is therefore often necessary to add the digits and shift the whole number in memory to the right, which increases the computer's workload. With little-endian storage, the least important bytes can stay where they are, and the new digits are added to the right, at the higher addresses. This means that some computations can become simpler and faster.
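The multiple-precision argument can be made concrete with a short sketch. It assumes numbers are kept as byte arrays with the least important byte at index 0 (little-endian order); the function name add_le is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds two n-byte unsigned numbers stored low byte first.
 * The carry moves toward higher indexes, so offset i is always byte i
 * of the number, whatever its length. Returns the final carry (0 or 1). */
unsigned add_le(uint8_t *sum, const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned t = (unsigned)a[i] + (unsigned)b[i] + carry;
        sum[i] = (uint8_t)(t & 0xFFu);
        carry = t >> 8;
    }
    return carry;
}
```

If the result needs one more byte, it is simply appended at index n; none of the existing bytes has to move.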
4. How do I check whether the processor is big-Endian or little-Endian?
Because all members of a union are stored starting from the same low address, a union makes it easy to find out whether the CPU reads and writes memory in little-endian or big-endian mode. For example:
int checkcpuendian(void)
{
    union {
        unsigned int a;
        unsigned char b;
    } c;
    c.a = 1;            /* the low byte of a holds 1 */
    return (c.b == 1);  /* b overlays the byte at the lowest address */
}   /* return 1: little-endian, return 0: big-endian */
5. Converting between big-endian and little-endian
Intel x86 uses little-endian, while Sun machines, for example, use big-endian. How can we convert the byte order in a cross-platform or network program? This is easy to implement with C shift operations, for example with the following macros:
/* uint16 and uint32 are assumed to be 16-bit and 32-bit unsigned typedefs. */
#if defined(BIG_ENDIAN) && !defined(LITTLE_ENDIAN)
#define htons(A) (A)
#define htonl(A) (A)
#define ntohs(A) (A)
#define ntohl(A) (A)
#elif defined(LITTLE_ENDIAN) && !defined(BIG_ENDIAN)
#define htons(A) ((((uint16)(A) & 0xff00) >> 8) | \
                  (((uint16)(A) & 0x00ff) << 8))
#define htonl(A) ((((uint32)(A) & 0xff000000) >> 24) | \
                  (((uint32)(A) & 0x00ff0000) >> 8)  | \
                  (((uint32)(A) & 0x0000ff00) << 8)  | \
                  (((uint32)(A) & 0x000000ff) << 24))
#define ntohs htons
#define ntohl htonl
#else
#error "Either BIG_ENDIAN or LITTLE_ENDIAN must be #defined, but not both."
#endif
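The same shift-and-mask idea can also be written as ordinary functions, which avoids evaluating a macro argument several times. This is a sketch rather than the article's own code: swap16 and swap32 are hypothetical names, and the fixed-width types from <stdint.h> stand in for the uint16 and uint32 typedefs.

```c
#include <stdint.h>

/* Reverses the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Reverses the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0xFF000000u) >> 24) |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x000000FFu) << 24);
}
```

Swapping twice returns the original value, which is why the host-to-network and network-to-host conversions can share one implementation.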
Network byte order
1, Bits within a byte are not affected by byte order.
For example, the byte 1000 0000 (80h in hexadecimal) has the same representation in memory regardless of byte order.
2, Only data types larger than 1 byte have a byte order problem.
For example, a variable of type byte is only one byte long, so as stated above it has no byte order problem. Byte order therefore refers to the relative order between bytes.
3, Data larger than 1 byte can be laid out in two byte orders:
For example, short b is a two-byte data type, so the relative order of its bytes matters.
Network byte order is "what you see is what you get". Intel CPUs use the opposite byte order.
For example, take short b = 0102h from above (in hexadecimal, every two digits are one byte wide). What we see is "0102". By ordinary mathematical convention the number axis increases from left to right, so if memory addresses also increase from left to right, the byte order of short b in memory is:
01 02
This is network byte order. The order we see is the same as the order in memory!
With the opposite byte order things are different: the order in memory is 02 01.
Now assume that packet capture yields a two-byte stream of network data: 01 02
If this represents two separate byte variables, there is no byte order to consider. If it represents one short variable, byte order matters. According to the "WYSIWYG" rule of network byte order, the value of this variable is 0102.
If the local host is of the Intel type, representing this variable takes a little more work:
Define a variable short x and let pt be the address of the byte stream. Reading the memory in order gives x = *((short *)pt);
The memory layout of x is then of course 01 02. On a host that does not follow the "WYSIWYG" rule, this layout is read as 0201, which is obviously wrong, so the two bytes must be swapped. You can write the swap yourself, but it is more convenient to use the existing APIs.
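One way to sidestep the problem in the packet-capture example is to assemble the value from individual bytes instead of casting the pointer. The following is a minimal sketch under that assumption; read_be16 is a hypothetical helper name.

```c
#include <stdint.h>

/* Reads a 16-bit value sent in network byte order (high byte first)
 * from a byte stream. The shift works on values, not on memory layout,
 * so the result is the same on Intel-type and other hosts. */
uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}
```

For the captured stream 01 02 this yields 0x0102 on every host, matching the "WYSIWYG" reading.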
Network byte order and host byte order
NBO and HBO. Network byte order (NBO): data is stored in order from high byte to low byte, and a single unified byte order is used on the network to avoid compatibility problems. Host byte order (HBO): differs from machine to machine and depends on the CPU design. There are two storage priorities: high byte first and low byte first. Data on the Internet is transmitted high byte first, so a machine that stores data low byte first must convert it before sending it over the Internet.
htonl()
Brief description:
Converts an unsigned long integer from host byte order to network byte order.
#include <winsock.h>
u_long PASCAL FAR htonl(u_long hostlong);
hostlong: a 32-bit number in host byte order.
Note:
This function converts a 32-bit number from host byte order to network byte order.
Return value:
htonl() returns the value in network byte order.
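The effect of htonl() can be checked by looking at the bytes of its result. The sketch below is written for a POSIX system, where htonl() is declared in <arpa/inet.h> rather than <winsock.h>; that header choice and the helper name to_network_bytes are assumptions made for this illustration.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Converts host_value to network byte order and copies the resulting
 * in-memory bytes to out, lowest address first. */
void to_network_bytes(uint32_t host_value, unsigned char out[4])
{
    uint32_t net = htonl(host_value);
    memcpy(out, &net, sizeof net);
}
```

For 0x12345678 the bytes in out are 12 34 56 78 on both little-endian and big-endian hosts, because network byte order is big-endian.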
inet_ntoa()
Brief description:
Converts a network address to a dotted string.
#include <winsock.h>
char FAR * PASCAL FAR inet_ntoa(struct in_addr in);
in: a structure that represents an Internet host address.
Note:
This function converts the Internet address structure given by the in parameter to a string separated by ".", such as "a.b.c.d". Note that the string returned by inet_ntoa() is stored in memory allocated by the Windows Sockets interface. The application should make no assumptions about how this memory is allocated. The data is guaranteed to be valid only until the next Windows Sockets call on the same thread.
Return value:
If no error occurs, inet_ntoa() returns a character pointer. Otherwise, NULL is returned. The data should be copied before the next Windows Sockets call.
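The advice to copy the returned string can be followed as in the sketch below. It targets a POSIX system, where inet_ntoa() is declared in <arpa/inet.h>, and it takes the address as a host-order 32-bit number; both choices, and the helper name format_ipv4, are assumptions made for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Formats a host-order IPv4 address as a dotted string and copies it
 * into the caller's buffer, because the buffer returned by inet_ntoa()
 * may be overwritten by a later call. */
void format_ipv4(uint32_t host_order_addr, char *out, size_t out_size)
{
    struct in_addr in;
    const char *text;

    if (out_size == 0) {
        return;
    }
    in.s_addr = htonl(host_order_addr);  /* s_addr is in network byte order */
    text = inet_ntoa(in);
    strncpy(out, text, out_size - 1);
    out[out_size - 1] = '\0';
}
```

With a 16-byte buffer, format_ipv4(0xC0A80001, buf, sizeof buf) leaves "192.168.0.1" in buf.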
Some data transmitted over the network has the same byte order as the local host, and some has the opposite order. To keep data consistent, convert local data to the network format before sending it, and convert received data back to the local format before using it. The basic library provides functions for this conversion: htons(), htonl(), ntohs(), and ntohl(). Here n stands for network and h for host. htons() and htonl() convert from local byte order to network byte order; s means short, a 2-byte operation, and l means long, a 4-byte operation. Likewise, ntohs() and ntohl() convert from network byte order to the local format.
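A send-and-receive round trip with these four functions might look like the sketch below. The 6-byte header layout (a 16-bit type followed by a 32-bit length) and the names pack_header and unpack_header are invented for this illustration, and the POSIX header <arpa/inet.h> is assumed.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Sender side: convert each field to network byte order, then copy it
 * into the outgoing buffer. */
void pack_header(unsigned char out[6], uint16_t type, uint32_t length)
{
    uint16_t net_type = htons(type);
    uint32_t net_length = htonl(length);
    memcpy(out, &net_type, sizeof net_type);
    memcpy(out + 2, &net_length, sizeof net_length);
}

/* Receiver side: copy each field out of the buffer, then convert it
 * back to host byte order before using it. */
void unpack_header(const unsigned char in[6], uint16_t *type, uint32_t *length)
{
    uint16_t net_type;
    uint32_t net_length;
    memcpy(&net_type, in, sizeof net_type);
    memcpy(&net_length, in + 2, sizeof net_length);
    *type = ntohs(net_type);
    *length = ntohl(net_length);
}
```

Using memcpy instead of casting the buffer pointer keeps the code free of alignment assumptions; the bytes on the wire are the same on every host.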