Reposted from: http://blog.csdn.net/aklixiaoyao/article/details/7548860
Different computer architectures store bytes, words and other units in different ways. This raises an important question in communication: in what order should the units of information (bits, bytes, words, double words and so on) be transmitted between the two parties? Unless both sides agree on a rule, neither can encode or decode correctly and communication fails. Two byte-storage schemes are in common use, Big-endian and Little-endian, and to understand them we start with byte order.
I. What is byte order
Byte order is, as the name implies, the order of bytes: more precisely, the order in which the bytes of a data item longer than one byte are stored in memory (a single-byte item has no order to speak of). In practice most developers rarely deal with byte order directly; it matters mainly in cross-platform and network programs.
Every article on byte order mentions two kinds, Big-endian and Little-endian. The standard definitions are as follows:
a) Little-endian: the low-order byte is stored at the lower memory address and the high-order byte at the higher memory address.
b) Big-endian: the high-order byte is stored at the lower memory address and the low-order byte at the higher memory address.
c) Network byte order: the TCP/IP protocols define their byte order as Big-endian, so the byte order used in TCP/IP is usually called network byte order.
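The two definitions can be sketched in C as a pair of store routines. This is only an illustration: put_be32 and put_le32 are hypothetical helper names, not functions from any library, and the sketch assumes 8-bit bytes.

```c
#include <stdint.h>

/* Big-endian: the high-order byte goes to the lowest address (dst[0]). */
static void put_be32(unsigned char *dst, uint32_t v)
{
    dst[0] = (unsigned char)(v >> 24);
    dst[1] = (unsigned char)(v >> 16);
    dst[2] = (unsigned char)(v >> 8);
    dst[3] = (unsigned char)v;
}

/* Little-endian: the low-order byte goes to the lowest address (dst[0]). */
static void put_le32(unsigned char *dst, uint32_t v)
{
    dst[0] = (unsigned char)v;
    dst[1] = (unsigned char)(v >> 8);
    dst[2] = (unsigned char)(v >> 16);
    dst[3] = (unsigned char)(v >> 24);
}
```

Because the routines use shifts rather than memory reinterpretation, they produce the same bytes on any host.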
1.1 What are the high and low address ends
First we need to know how memory is laid out in the image of a C program. Both "Expert C Programming" and "Advanced Programming in the UNIX Environment" describe the layout of the memory space, roughly as follows:
-----------------------Maximum memory address 0xFFFFFFFF
Bottom of Stack
Stack
Top of Stack
-----------------------
(unused gap)
-----------------------
Heap
-----------------------
Uninitialized data
-----------------------collectively referred to as data segments
Initialized data
-----------------------
Text segment (code segment)
-----------------------Minimum memory address 0x00000000
For example, if we allocate an unsigned char buf[4] on the stack, how is this array laid out on the stack? It looks like this:
Bottom of stack (high address)
----------
buf[3]
buf[2]
buf[1]
buf[0]
----------
Top of stack (low address)
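The point of the picture is that buf[0] always sits at the lowest address and each later element at a higher one, whichever way the stack itself grows. A small sketch that checks this (index_follows_address is a hypothetical name chosen for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every element lies at a higher address than the one before it. */
static int index_follows_address(const unsigned char *buf, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if ((uintptr_t)&buf[i] <= (uintptr_t)&buf[i - 1])
            return 0;
    }
    return 1;
}
```

C guarantees this ordering for array elements, so the function should return 1 for any array.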
1.2 What are the high and low bytes
With high and low addresses clear, consider high and low bytes. Some articles put it this way: the low byte is the least significant byte and the high byte is the most significant byte. Given the 32-bit unsigned integer 0x12345678, which part is high and which is low? It is simple: in decimal the digits on the left are the high-order ones and those on the right are the low-order ones, and the same holds in any other base. So for 0x12345678 the bytes from high to low are 0x12, 0x34, 0x56 and 0x78.
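High and low bytes are a property of the value, not of memory, so they can be picked out with shifts alone. A minimal sketch, assuming 8-bit bytes (byte_of is a hypothetical helper name):

```c
#include <stdint.h>

/* Byte n of v, where n = 0 is the low (least significant) byte
   and n = 3 is the high (most significant) byte. */
static unsigned char byte_of(uint32_t v, unsigned n)
{
    return (unsigned char)((v >> (8u * n)) & 0xffu);
}
```

For 0x12345678, byte_of(v, 3) yields 0x12 and byte_of(v, 0) yields 0x78 on every machine.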
With the high/low address ends and the high/low bytes both clear, let us review the definitions of Big-endian and Little-endian and illustrate the two byte orders.
Take unsigned int value = 0x12345678 as an example and look at how it is stored under each byte order. We can use unsigned char buf[4] to represent value:
Big-endian: the low address holds the high-order byte:
Bottom of stack (high address)
---------------
buf[3] (0x78)--low
buf[2] (0x56)
buf[1] (0x34)
buf[0] (0x12)--high
---------------
Top of stack (low address)
Little-endian: the low address holds the low-order byte:
Bottom of stack (high address)
---------------
buf[3] (0x12)--high
buf[2] (0x34)
buf[1] (0x56)
buf[0] (0x78)--low
--------------
Top of stack (low address)
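To see which of the two pictures applies on a given machine, the bytes of value can be copied into buf and inspected. A sketch, assuming a 32-bit value and 8-bit bytes (host_bytes is a hypothetical helper name):

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of value into buf, lowest address first. */
static void host_bytes(uint32_t value, unsigned char buf[4])
{
    memcpy(buf, &value, sizeof value);
}
```

After host_bytes(0x12345678, buf), a Big-endian host has buf[0] == 0x12 and a Little-endian host has buf[0] == 0x78.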
II. The various byte orders
2.1 Big-endian
A computer-architecture term for the multi-byte storage order in which the most significant byte (MSB) is stored at the lowest address. Processors that use this scheme include the IBM3700 series, the PDP-10, the Motorola microprocessor family and many RISC processors.
+----------+
| 0x34 |<--0x00000021
+----------+
| 0x12 |<--0x00000020
+----------+
Figure 1: Double-byte number 0x1234 in Big-endian mode with start address 0x00000020
In Big-endian, the bits of a sequence are numbered as follows (taking the double-byte value 0x8b8a as an example):
Bit 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+-----------------------------------------+
Val | 1 0 0 0 1 0 1 1 | 1 0 0 0 1 0 1 0 |
+-----------------------------------------+
Figure 2: Bit numbering for Big-endian
2.2 Little-endian
A computer-architecture term for the multi-byte storage order in which the least significant byte (LSB) is stored at the lowest address. Processors that use this scheme include the PDP-11, the VAX, Intel microprocessors and some network communication devices. Besides multi-byte storage order, the term is also often used to describe the order of the bits within a byte.
+----------+
| 0x12 |<--0x00000021
+----------+
| 0x34 |<--0x00000020
+----------+
Figure 3: Double-byte number 0x1234 in Little-endian mode with start address 0x00000020
In Little-endian, the bits are numbered in exactly the opposite direction from Big-endian, as follows (again taking the double-byte value 0x8b8a):
Bit 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
+-----------------------------------------+
Val | 1 0 0 0 1 0 1 1 | 1 0 0 0 1 0 1 0 |
+-----------------------------------------+
Figure 4: Bit numbering for Little-endian
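The two numbering schemes in Figures 2 and 4 label the same bits from opposite ends. A sketch of both for a 16-bit value (bit_msb0 and bit_lsb0 are hypothetical helper names):

```c
#include <stdint.h>

/* Figure 2 style: bit 0 is the most significant bit. */
static int bit_msb0(uint16_t v, unsigned i)
{
    return (v >> (15u - i)) & 1;
}

/* Figure 4 style: bit 0 is the least significant bit. */
static int bit_lsb0(uint16_t v, unsigned i)
{
    return (v >> i) & 1;
}
```

For any i from 0 to 15, bit_msb0(v, i) equals bit_lsb0(v, 15 - i); only the label changes, not the value.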
Note 2: The host byte order we usually speak of follows the Little-endian rule. Therefore, when two hosts communicate over TCP/IP, they must call the appropriate functions to convert between host order (Little-endian) and network order (Big-endian).
Note 3: Because the two schemes number the bits of the same sequence in opposite directions, the translation of MSB in the "Modern English-Chinese Dictionary" as "the most effective bit" is inadequate, so this article uses "the most significant bit/byte" instead.
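The conversion mentioned in Note 2 might look like the following sketch. It assumes a POSIX system, where <arpa/inet.h> declares htonl and ntohl; to_wire and from_wire are hypothetical helper names.

```c
#include <arpa/inet.h>  /* POSIX: htonl, ntohl */
#include <stdint.h>
#include <string.h>

/* Convert value to network order and copy its bytes into out. */
static void to_wire(uint32_t value, unsigned char out[4])
{
    uint32_t net = htonl(value);
    memcpy(out, &net, sizeof net);
}

/* Read four bytes in network order and convert them to host order. */
static uint32_t from_wire(const unsigned char in[4])
{
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);
}
```

On a Big-endian host the two calls change nothing; on a Little-endian host they swap the bytes. Either way the bytes on the wire come out in the same order.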
2.3 Middle-endian
Besides Big-endian and Little-endian there is also a Middle-endian multi-byte storage order. Taking 4 bytes as an example, Middle-endian stores them in the order 3-4-1-2 or 2-1-4-3. This order occasionally appears in the packed decimal formats of some minicomputer architectures.
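As an illustration of the 2-1-4-3 order, here is a sketch that assumes the bytes are numbered from 1 for the most significant byte to 4 for the least significant. The text does not state the numbering, so this is only one possible reading, and put_me32 is a hypothetical helper name.

```c
#include <stdint.h>

/* Store v in the order 2-1-4-3, with byte 1 taken to be the most
   significant byte (an assumed numbering). */
static void put_me32(unsigned char *dst, uint32_t v)
{
    dst[0] = (unsigned char)(v >> 16);  /* byte 2 */
    dst[1] = (unsigned char)(v >> 24);  /* byte 1 */
    dst[2] = (unsigned char)v;          /* byte 4 */
    dst[3] = (unsigned char)(v >> 8);   /* byte 3 */
}
```

Under that assumption 0x12345678 is stored as 0x34, 0x12, 0x78, 0x56 from the low address upward.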
Embedded system developers should be thoroughly familiar with the Little-endian and Big-endian modes. A CPU in Little-endian mode stores an operand from the low byte to the high byte, while one in Big-endian mode stores it from the high byte to the low byte.
In the memory of a Little-endian CPU, the 32-bit number 0x12345678 (assumed to be stored starting at address 0x4000) is laid out as follows:
Memory address | 0x4000 | 0x4001 | 0x4002 | 0x4003 |
Stored content | 0x78   | 0x56   | 0x34   | 0x12   |
In the memory of a Big-endian CPU, the same number is laid out as follows:
Memory address | 0x4000 | 0x4001 | 0x4002 | 0x4003 |
Stored content | 0x12   | 0x34   | 0x56   | 0x78   |
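Reading the two tables back into a number can be sketched as follows. The array stands in for the bytes at 0x4000 to 0x4003, and load_le32 and load_be32 are hypothetical helper names.

```c
#include <stdint.h>

/* Interpret p[0..3] as a Little-endian 32-bit number. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Interpret p[0..3] as a Big-endian 32-bit number. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Applied to the matching table, each function gives back 0x12345678; applied to the other table, it gives the byte-swapped value 0x78563412.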
III. Advantages and disadvantages of Big-endian and Little-endian
Big-endian advantages: because the high-order byte comes first, you can always tell whether a number is positive or negative by looking at the byte at offset 0. You do not need to know how long the number is, nor skip over bytes to find the one holding the sign bit. The value is also stored in the order in which it is printed, so binary-to-decimal routines are particularly efficient. Machines designed for different requirements therefore make different choices of storage order.
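The sign test can be sketched like this, assuming two's-complement integers and 8-bit bytes (is_negative_be and is_negative_le are hypothetical helper names). The Big-endian version needs no length argument; the Little-endian version does.

```c
#include <stddef.h>

/* Big-endian: the sign bit is always in the byte at offset 0. */
static int is_negative_be(const unsigned char *p)
{
    return (p[0] & 0x80) != 0;
}

/* Little-endian: the sign bit is in the last byte, so the length is needed. */
static int is_negative_le(const unsigned char *p, size_t len)
{
    return (p[len - 1] & 0x80) != 0;
}
```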
Little-endian advantages: assembly instructions that fetch one, two, four or more bytes all work the same way: the lowest-order byte is fetched first, at offset 0. Because address offset and byte number correspond one-to-one, multiple-precision math routines are relatively easy to write.
If a number grows, it may need more digits on the left, at the high-order end (a larger non-exponential number needs more digits). In Big-endian memory, adding digits often means shifting every existing byte toward the higher addresses to make room, which adds work for the computer. In Little-endian memory the less significant bytes can stay where they are, and the new digits are simply placed to their right, at the higher addresses. This means some computations can become simpler and faster.
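The multiple-precision point can be illustrated with a byte-wise addition over Little-endian arrays: the loop index, the address offset and the byte's weight all rise together, and a final carry would simply go into the next higher address. This is a sketch assuming 8-bit bytes; add_le is a hypothetical helper name.

```c
#include <stddef.h>

/* Add two n-byte Little-endian numbers into sum; returns the final carry. */
static unsigned add_le(unsigned char *sum, const unsigned char *a,
                       const unsigned char *b, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned t = (unsigned)a[i] + (unsigned)b[i] + carry;
        sum[i] = (unsigned char)(t & 0xffu);  /* low 8 bits stay at offset i */
        carry = t >> 8;                       /* overflow moves to offset i + 1 */
    }
    return carry;
}
```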
IV. How to check whether the processor is Big-endian or Little-endian?
All members of a union are stored starting from the same low address. This property makes it easy to find out whether the CPU reads and writes memory in Little-endian or Big-endian mode. For example:
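/* Returns 1 if the CPU is Little-endian, 0 if it is Big-endian. */
int checkCPUendian(void)
{
    union
    {
        unsigned int a;
        unsigned char b;
    } c;
    c.a = 1;            /* the low-order byte of a is 1 */
    return (c.b == 1);  /* b overlays the byte at the lowest address */
}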
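The same question can sometimes be answered at compile time. The sketch below assumes the __BYTE_ORDER__ macros that GCC and Clang predefine; other compilers may not provide them, in which case the function reports "unknown". It is paired with a runtime check for comparison. Both function names are hypothetical.

```c
/* Runtime check: look at the byte at the lowest address of an int holding 1. */
static int runtime_is_little(void)
{
    unsigned int one = 1;
    return *(unsigned char *)&one == 1;
}

/* Compile-time check via compiler-specific macros (assumed: GCC/Clang).
   Returns 1 for Little-endian, 0 for Big-endian, -1 if unknown. */
static int compile_time_is_little(void)
{
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    defined(__ORDER_BIG_ENDIAN__)
#  if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return 1;
#  elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return 0;
#  else
    return -1;
#  endif
#else
    return -1;
#endif
}
```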
V. Big-endian and Little-endian Conversions
Existing platforms such as Intel's x86 are Little-endian, while others such as Sun's SPARC are Big-endian. So how is byte-order conversion implemented in a cross-platform or network program? It is easy to do with C shift operations, for example with the following macros:
#if defined(BIG_ENDIAN) && !defined(LITTLE_ENDIAN)
/* Host order already matches network order: nothing to do. */
#define htons(A) (A)
#define htonl(A) (A)
#define ntohs(A) (A)
#define ntohl(A) (A)
#elif defined(LITTLE_ENDIAN) && !defined(BIG_ENDIAN)
/* uint16 and uint32 are assumed to be 16- and 32-bit unsigned typedefs. */
#define htons(A) ((((uint16)(A) & 0xff00) >> 8) | \
                  (((uint16)(A) & 0x00ff) << 8))
#define htonl(A) ((((uint32)(A) & 0xff000000) >> 24) | \
                  (((uint32)(A) & 0x00ff0000) >> 8)  | \
                  (((uint32)(A) & 0x0000ff00) << 8)  | \
                  (((uint32)(A) & 0x000000ff) << 24))
#define ntohs htons
#define ntohl htonl
#else
#error "Either BIG_ENDIAN or LITTLE_ENDIAN must be #defined, and not both."
#endif
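Macros like these evaluate their argument more than once, which can surprise callers who pass an expression with side effects. One alternative is to write the same shifts as inline functions. This is a sketch using the <stdint.h> types; swap16 and swap32 are hypothetical names.

```c
#include <stdint.h>

/* Exchange the two bytes of a 16-bit value. */
static inline uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Reverse the four bytes of a 32-bit value. */
static inline uint32_t swap32(uint32_t v)
{
    return ((v & 0xff000000u) >> 24) |
           ((v & 0x00ff0000u) >> 8)  |
           ((v & 0x0000ff00u) << 8)  |
           ((v & 0x000000ffu) << 24);
}
```

Swapping twice gives back the original value, which is why the same routine can serve both the host-to-network and the network-to-host direction on a Little-endian host.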