Byte Order (Big-endian and Little-endian)


Different computer architectures store bytes, words, and larger units in different ways. This raises an important problem in computer communication: in what order should the units of information (bits, bytes, words, double words, and so on) be transmitted? Unless both sides agree on a consistent rule, they cannot encode and decode the data correctly, and the communication fails. Two byte storage mechanisms are in common use today: Big-endian and Little-endian. Both are best introduced by starting from byte order.

I. What is byte order

Byte order, as the name implies, is the order of bytes. More precisely, it is the order in which the bytes of a data item larger than one byte are stored in memory (a single-byte item has no ordering to speak of). In practice, most developers rarely deal with byte order directly. It only has to be considered in cross-platform and network programs.

Articles on byte order generally divide it into two categories, Big-endian and Little-endian. The standard definitions are as follows:
a) Little-endian: the low-order byte is placed at the lower memory address and the high-order byte at the higher memory address.
b) Big-endian: the high-order byte is placed at the lower memory address and the low-order byte at the higher memory address.
c) Network byte order: the protocols at every layer of TCP/IP define byte order as Big-endian, so the byte order used in TCP/IP is usually called network byte order.

1.1 What is high/low address end

First we need to understand how a C program image is laid out in memory. Following the descriptions in "Expert C Programming" and "Advanced Programming in the UNIX Environment", the layout is roughly as follows:
-----------------------Maximum memory address 0xFFFFFFFF
Bottom of stack
Stack
Top of stack
-----------------------

(unused)
-----------------------
Heap
-----------------------
Uninitialized data
-----------------------(together called the data segment)
Initialized data
-----------------------
Text segment (code)
-----------------------Minimum memory address 0x00000000

For example, if we allocate unsigned char buf[4] on the stack, how is the array laid out on the stack? See the diagram below:
Bottom of stack (high address)
----------
buf[3]
buf[2]
buf[1]
buf[0]
----------
Top of stack (low address)

1.2 What is high/low byte

Now that the high/low address ends are clear, consider the high/low bytes. Some articles call the low-order byte the least significant byte and the high-order byte the most significant byte. Given the 32-bit unsigned integer 0x12345678, which end is high and which is low? It is simple: in decimal, the digits on the left are high-order and the digits on the right are low-order, and the same holds in other bases. For 0x12345678, the bytes from high to low are 0x12, 0x34, 0x56, and 0x78.
With the high/low address ends and the high/low bytes both clear, let us revisit the definitions of Big-endian and Little-endian and illustrate the two byte orders graphically.
Take unsigned int value = 0x12345678 as an example. To see how it is stored under each byte order, we can view value as unsigned char buf[4]:

Big-endian: the low address holds the high-order byte, as shown below:
Bottom of stack (high address)
---------------
buf[3] (0x78)--low-order byte
buf[2] (0x56)
buf[1] (0x34)
buf[0] (0x12)--high-order byte
---------------
Top of stack (low address)

Little-endian: the low address holds the low-order byte, as shown below:
Bottom of stack (high address)
---------------
buf[3] (0x12)--high-order byte
buf[2] (0x34)
buf[1] (0x56)
buf[0] (0x78)--low-order byte
---------------
Top of stack (low address)
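
The two diagrams above can be checked on a real machine. The following is a minimal sketch, assuming a C99 compiler and a 4-byte uint32_t; the helper names value_to_bytes and layout_of are hypothetical and not part of the original article. It copies the in-memory representation of the value into buf so that buf[0] holds the byte at the lowest address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of value into buf[0..3].
   buf[0] receives the byte stored at the lowest address. */
void value_to_bytes(uint32_t value, unsigned char buf[4])
{
    assert(buf != NULL);
    memcpy(buf, &value, sizeof value);
}

/* Classify the layout of 0x12345678 found in buf:
   returns 'B' for big-endian, 'L' for little-endian, '?' otherwise. */
char layout_of(const unsigned char buf[4])
{
    if (buf[0] == 0x12 && buf[1] == 0x34 && buf[2] == 0x56 && buf[3] == 0x78)
        return 'B';   /* high-order byte at the low address */
    if (buf[0] == 0x78 && buf[1] == 0x56 && buf[2] == 0x34 && buf[3] == 0x12)
        return 'L';   /* low-order byte at the low address */
    return '?';       /* some other (for example mixed) layout */
}
```

Calling value_to_bytes(0x12345678, buf) and then layout_of(buf) should give 'L' on a little-endian host and 'B' on a big-endian one.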

II. Various endian types

2.1 Big-endian

In computer architecture, Big-endian is a term describing a multibyte storage order in which the most significant (most important) byte (MSB) is placed at the lowest address. Processors that use this mechanism include the IBM 3700 series, the PDP-10, the Motorola microprocessor series, and most RISC processors.
+----------+
| 0x34 |<--0x00000021
+----------+
| 0x12 |<--0x00000020
+----------+
Figure 1: The two-byte number 0x1234 stored in Big-endian order at start address 0x00000020

In Big-endian, the bits of a bit sequence are numbered as follows (taking the two-byte number 0x8b8a as an example):
Bit 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+-----------------------------------------+
Val | 1 0 0 0 1 0 1 1 | 1 0 0 0 1 0 1 0 |
+-----------------------------------------+
Figure 2: Bit numbering in Big-endian

2.2 Little-endian

In computer architecture, Little-endian is a term describing a multibyte storage order in which the least significant (least important) byte (LSB) is placed at the lowest address. Processors that use this mechanism include the PDP-11, the VAX, the Intel microprocessor series, and some network communication devices. Besides byte storage order, the term is also often used to describe the order in which the bits within a byte are arranged.

+----------+
| 0x12 |<--0x00000021
+----------+
| 0x34 |<--0x00000020
+----------+

Figure 3: The two-byte number 0x1234 stored in Little-endian order at start address 0x00000020

In Little-endian, the bits of a bit sequence are numbered in exactly the opposite direction from Big-endian, as shown for the two-byte number 0x8b8a:

Bit 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
+-----------------------------------------+
Val | 1 0 0 0 1 0 1 1 | 1 0 0 0 1 0 1 0 |
+-----------------------------------------+
Figure 4: Bit numbering in Little-endian
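
The two bit-numbering diagrams differ only in which end is called bit 0. As a minimal sketch (the function names bit_msb0 and bit_lsb0 are hypothetical), the following C helpers read bit i of a 16-bit value under each convention. The stored value is the same; only the index is mirrored.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i when bit 0 is the most significant bit
   (the numbering used in the Big-endian diagram). */
unsigned bit_msb0(uint16_t v, unsigned i)
{
    assert(i < 16);
    return (v >> (15u - i)) & 1u;
}

/* Bit i when bit 0 is the least significant bit
   (the numbering used in the Little-endian diagram). */
unsigned bit_lsb0(uint16_t v, unsigned i)
{
    assert(i < 16);
    return (v >> i) & 1u;
}
```

For 0x8b8a (binary 1000 1011 1000 1010), bit_msb0(0x8b8a, 0) is 1 while bit_lsb0(0x8b8a, 0) is 0, and in general bit_msb0(v, i) equals bit_lsb0(v, 15 - i).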

Note 2: The host order we usually talk about follows the Little-endian rule (as on x86 hosts). Therefore, when two hosts communicate over TCP/IP, the corresponding functions must be called to convert between host order (Little-endian in this case) and network order (Big-endian).
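
As an illustration of the conversion mentioned in Note 2, here is a minimal sketch that uses the POSIX functions htonl and ntohl from <arpa/inet.h>. It assumes a POSIX system; the wrapper names put_u32_net and get_u32_net are hypothetical. On a big-endian host the two library calls do nothing, so the same code works on either kind of machine.

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Write value into out[0..3] in network (big-endian) byte order. */
void put_u32_net(uint32_t value, unsigned char out[4])
{
    uint32_t net = htonl(value);     /* host order -> network order */
    memcpy(out, &net, sizeof net);
}

/* Read a 32-bit value from in[0..3], which holds network byte order. */
uint32_t get_u32_net(const unsigned char in[4])
{
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);               /* network order -> host order */
}
```

After put_u32_net(0x12345678, wire), the buffer holds 0x12, 0x34, 0x56, 0x78 regardless of the host, and get_u32_net(wire) returns 0x12345678 again.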

Note 3: Because the two mechanisms number the bits of the same bit sequence in opposite directions, translating MSB as "the highest position", as some modern English-Chinese dictionaries do, is misleading. This article therefore takes MSB to mean the most significant (most important) bit/byte.

2.3 Middle-endian

Besides Big-endian and Little-endian, there is a further multibyte storage order called Middle-endian. Taking 4 bytes as an example, storing them in an order such as 3-4-1-2 or 2-1-4-3 is Middle-endian. This storage order occasionally appears in the packed (compressed) decimal formats of some minicomputer systems.
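
To make the 2-1-4-3 order concrete, here is a minimal sketch under a stated assumption: the digits number the bytes of a 32-bit value from most significant (1) to least significant (4). That numbering is an interpretation, not something the article specifies, and the function name store_mid32 is hypothetical. Under this reading, 2-1-4-3 is the layout the PDP-11 used for 32-bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v in the mixed order 2-1-4-3, where byte 1 is the most
   significant byte of v and byte 4 the least significant one. */
void store_mid32(unsigned char *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (unsigned char)((v >> 16) & 0xffu); /* byte 2 */
    p[1] = (unsigned char)((v >> 24) & 0xffu); /* byte 1 (most significant) */
    p[2] = (unsigned char)(v & 0xffu);         /* byte 4 (least significant) */
    p[3] = (unsigned char)((v >> 8) & 0xffu);  /* byte 3 */
}
```

With this sketch, 0x12345678 is stored as 0x34, 0x12, 0x78, 0x56 from the low address upward.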

Embedded system developers should understand the Little-endian and Big-endian modes well. A CPU in Little-endian mode stores an operand from the low-order byte to the high-order byte, whereas a CPU in Big-endian mode stores it from the high-order byte to the low-order byte. In the memory of a Little-endian CPU, the 32-bit number 0x12345678 (assuming it starts at address 0x4000) is stored as follows:

Memory address	0x4000	0x4001	0x4002	0x4003
Storage content	0x78	0x56	0x34	0x12

In the memory of a Big-endian CPU, the same number is stored as follows:

Memory address	0x4000	0x4001	0x4002	0x4003
Storage content	0x12	0x34	0x56	0x78
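
The two tables can also be produced without depending on the host CPU at all. The following is a minimal sketch using shifts; the helper names store_le32, store_be32, load_le32 and load_be32 are hypothetical. Because the code addresses each byte explicitly, it gives the same result on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v with the low-order byte at the lowest address. */
void store_le32(unsigned char *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v & 0xffu);
    p[1] = (unsigned char)((v >> 8) & 0xffu);
    p[2] = (unsigned char)((v >> 16) & 0xffu);
    p[3] = (unsigned char)((v >> 24) & 0xffu);
}

/* Store v with the high-order byte at the lowest address. */
void store_be32(unsigned char *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (unsigned char)((v >> 24) & 0xffu);
    p[1] = (unsigned char)((v >> 16) & 0xffu);
    p[2] = (unsigned char)((v >> 8) & 0xffu);
    p[3] = (unsigned char)(v & 0xffu);
}

/* Rebuild the value from a little-endian byte sequence. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Rebuild the value from a big-endian byte sequence. */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Here store_le32 reproduces the first table (0x78, 0x56, 0x34, 0x12) and store_be32 the second (0x12, 0x34, 0x56, 0x78).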

III. Advantages and disadvantages of Big-endian and Little-endian

Big-endian advantage: because the high-order byte comes first, you can always tell whether a number is positive or negative by looking at the byte at offset 0. You do not need to know how long the number is, nor skip over any bytes to find the one that contains the sign bit. The value is also stored in the order in which it is printed, so binary-to-decimal routines are particularly efficient. Thus, machines with different requirements adopt different storage designs.
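
The sign test described above can be sketched as follows, assuming two's-complement signed integers stored as raw bytes; the function names is_negative_be and is_negative_le are hypothetical. The Big-endian version needs no length argument, while the Little-endian version must be told where the number ends.

```c
#include <assert.h>
#include <stddef.h>

/* Big-endian: the sign bit is the top bit of the byte at offset 0,
   whatever the length of the number. */
int is_negative_be(const unsigned char *p)
{
    assert(p != NULL);
    return (p[0] & 0x80u) != 0;
}

/* Little-endian: the sign bit is in the last byte, so the length
   of the number must be known. */
int is_negative_le(const unsigned char *p, size_t len)
{
    assert(p != NULL && len > 0);
    return (p[len - 1] & 0x80u) != 0;
}
```

For example, the 32-bit value -2 is 0xff 0xff 0xff 0xfe in Big-endian order and 0xfe 0xff 0xff 0xff in Little-endian order, and both functions report it as negative.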

Little-endian advantage: assembly instructions that fetch one, two, four, or more bytes all proceed in the same way: first fetch the lowest-order byte at offset 0. Because address offsets and byte numbers correspond one to one, multiple-precision math routines are relatively easy to write.

If you increase a numeric value, you may need to add digits on the left (a larger non-exponential number has more digits). As a result, adding two numbers often requires moving all the digits of a Big-endian number in memory, shifting everything to the right, which increases the computer's workload. In a Little-endian number, however, the less significant bytes can stay where they are, and the new digits can be added to their right, at a higher address. This means that some computer operations can be simpler and faster.
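
The point about growing a number at the high-address end can be sketched with a byte-wise multiple-precision addition. This is only an illustration; the function name add_le is hypothetical. Both operands are Little-endian byte arrays, so the loop walks upward from offset 0 and any final carry simply becomes a new byte after the existing ones.

```c
#include <assert.h>
#include <stddef.h>

/* Add b into a. Both are little-endian byte arrays of n bytes
   (byte 0 is the least significant). Returns the final carry
   (0 or 1), which the caller may store as a new byte at a[n]. */
unsigned add_le(unsigned char *a, const unsigned char *b, size_t n)
{
    unsigned carry = 0;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i] + carry;
        a[i] = (unsigned char)(sum & 0xffu); /* low 8 bits stay in place */
        carry = sum >> 8;                    /* overflow moves one byte up */
    }
    return carry;
}
```

Adding 0x0001 to 0xffff this way leaves both existing bytes at 0x00 and returns a carry of 1, which extends the number by one byte at the next higher address without moving anything already stored.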

IV. How to check whether the processor is Big-endian or Little-endian

All members of a union are stored starting from the same low address, which makes it easy to determine whether the CPU reads and writes memory in Little-endian or Big-endian mode. For example:
/* Returns 1 on a Little-endian CPU and 0 on a Big-endian CPU. */
int checkCPUendian(void) {
    union {
        unsigned int a;
        unsigned char b;   /* overlaps the lowest-addressed byte of a */
    } c;
    c.a = 1;
    return (c.b == 1);     /* the low-order byte is at the low address only on Little-endian */
}

V. Big-endian and Little-endian Conversions

Among existing platforms, Intel's x86 is Little-endian, while Sun SPARC uses Big-endian. So how do you implement byte-order conversion in cross-platform or network programs? It is easy to do with shift operations in C, for example with the following macros:

/* uint16 and uint32 are assumed to be 16-bit and 32-bit unsigned typedefs defined elsewhere. */
#if defined(BIG_ENDIAN) && !defined(LITTLE_ENDIAN)

/* Host order already equals network order: nothing to do. */
#define htons(A) (A)
#define htonl(A) (A)
#define ntohs(A) (A)
#define ntohl(A) (A)

#elif defined(LITTLE_ENDIAN) && !defined(BIG_ENDIAN)

/* Swap the bytes to convert between host order and network order. */
#define htons(A) ((((uint16)(A) & 0xff00) >> 8) | \
                  (((uint16)(A) & 0x00ff) << 8))
#define htonl(A) ((((uint32)(A) & 0xff000000) >> 24) | \
                  (((uint32)(A) & 0x00ff0000) >> 8)  | \
                  (((uint32)(A) & 0x0000ff00) << 8)  | \
                  (((uint32)(A) & 0x000000ff) << 24))
#define ntohs htons
#define ntohl htonl

#else

#error "Either BIG_ENDIAN or LITTLE_ENDIAN must be #defined, but not both."

#endif
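
The same byte swaps can also be written as ordinary functions instead of macros. The following is a minimal sketch using the fixed-width types from <stdint.h>; the names swap16 and swap32 are hypothetical. Functions evaluate their argument exactly once, which avoids the usual macro pitfall of an argument with side effects being expanded several times.

```c
#include <stdint.h>

/* Exchange the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0xff000000u) >> 24) |
           ((v & 0x00ff0000u) >> 8)  |
           ((v & 0x0000ff00u) << 8)  |
           ((v & 0x000000ffu) << 24);
}
```

For example, swap16(0x1234) gives 0x3412 and swap32(0x12345678) gives 0x78563412. Applying either function twice returns the original value.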

Network byte order
1. The bits within a byte are not affected by byte order
For example, the byte 1000 0000 (80H in hexadecimal) is represented the same way in memory regardless of the byte order.

2. Only data types larger than 1 byte have a byte order problem
For example, a variable of type byte is only one byte long, so by the previous point it has no byte order problem. Byte order therefore means "the relative order between bytes."

3. Byte order of data types larger than 1 byte
