Big-endian and Little-endian [repost]

Last Update:2016-02-09 Source: Internet

Author: User

Tags binary to decimal htons

Original source:

Byte order (endianness), Big-endian, and Little-endian

Http://www.cppblog.com/tx7do/archive/2009/01/06/71276.html

Computer architectures differ in how they store bytes, words, and larger units, which raises an important question in communications: in what order should the units of information exchanged by two parties (bits, bytes, words, double words, and so on) be transmitted? If the two sides do not agree on a rule, neither can encode or decode correctly and communication fails. Two byte storage mechanisms are in common use: Big-endian and Little-endian. The discussion starts with byte order itself.

First, what is byte order

Byte order, as the name implies, is the order of bytes: more precisely, the order in which data wider than one byte is stored in memory (a single byte of data has no order to speak of). In practice, most developers rarely deal with byte order directly. It only needs to be considered in cross-platform and network programs.

Every article that introduces byte order describes two categories, Big-endian and Little-endian. The standard definitions are as follows:
a) Little-endian places the low-order byte at the lower memory address and the high-order byte at the higher memory address.
b) Big-endian places the high-order byte at the lower memory address and the low-order byte at the higher memory address.
c) Network byte order: the TCP/IP protocols define their byte order as Big-endian, so the byte order used in TCP/IP is usually called network byte order.

1.1 What are the high and low address ends

First we need to know how memory is laid out in a C program image. Both Expert C Programming and Advanced Programming in the UNIX Environment describe the layout of the memory space, roughly as follows:
----------------------- Maximum memory address 0xFFFFFFFF
Bottom of stack
Stack
Top of stack
-----------------------

NULL (unused hole)
-----------------------
Heap
-----------------------
Uninitialized data
----------------------- (uninitialized and initialized data are collectively called the data segment)
Initialized data
-----------------------
Text segment (code)
----------------------- Minimum memory address 0x00000000

For example, if we allocate unsigned char buf[4] on the stack, how is this array laid out? See below:
Bottom of stack (high address)
----------
buf[3]
buf[2]
buf[1]
buf[0]
----------
Top of stack (low address)

1.2 What are the high and low bytes

Now that the high and low addresses are clear, what about the high and low bytes? Some articles describe the low byte as the least significant byte and the high byte as the most significant byte. Given the 32-bit unsigned integer 0x12345678, which part is high and which is low? It is actually very simple. In decimal, digits on the left are high and digits on the right are low, and the same holds in any other base. For 0x12345678, the bytes from high to low are 0x12, 0x34, 0x56, and 0x78.
With high/low addresses and high/low bytes both clear, let's revisit the definitions of Big-endian and Little-endian and illustrate the two byte orders.
Take unsigned int value = 0x12345678 as an example and look at how it is stored under each byte order. We can use unsigned char buf[4] to represent value:

Big-endian: the low address holds the high byte:
Bottom of stack (high address)
---------------
buf[3] (0x78) -- low byte
buf[2] (0x56)
buf[1] (0x34)
buf[0] (0x12) -- high byte
---------------
Top of stack (low address)

Little-endian: the low address holds the low byte:
Bottom of stack (high address)
---------------
buf[3] (0x12) -- high byte
buf[2] (0x34)
buf[1] (0x56)
buf[0] (0x78) -- low byte
---------------
Top of stack (low address)
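
The two layouts above can be observed on a real machine by copying the bytes of value into a byte array. The following is a minimal sketch and not part of the original article: the helper names dump_bytes and print_layout are hypothetical, and uint32_t from <stdint.h> is assumed in place of unsigned int so that the width is exactly 4 bytes.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the bytes of value into buf[0..3]
 * in increasing address order, exactly as they sit in memory. */
void dump_bytes(uint32_t value, unsigned char buf[4])
{
    memcpy(buf, &value, 4);
}

/* Hypothetical helper: print buf from the high address down to
 * the low address, in the same orientation as the diagrams. */
void print_layout(const unsigned char buf[4])
{
    int i;
    for (i = 3; i >= 0; i--)
        printf("buf[%d] (0x%02x)\n", i, buf[i]);
}
```

On a Little-endian host, dump_bytes(0x12345678, buf) leaves 0x78 in buf[0]; on a Big-endian host it leaves 0x12 there.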

Second, the various endian orders

2.1 Big-endian

A computer architecture term describing a multi-byte storage order in which the most significant byte (MSB) is stored at the lowest address. Processors that use this mechanism include the IBM 370 series, the PDP-10, the Motorola microprocessor family, and the vast majority of RISC processors.
+----------+
| 0x34 |<--0x00000021
+----------+
| 0x12 |<--0x00000020
+----------+
Figure 1: Double-byte number 0x1234 in Big-endian mode with start address 0x00000020

In Big-endian, the bits in a bit sequence are numbered as follows (taking the double-byte value 0x8b8a as an example):
Bit 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+-----------------------------------------+
Val | 1 0 0 0 1 0 1 1 | 1 0 0 0 1 0 1 0 |
+-----------------------------------------+
Figure 2: Bit numbering for Big-endian

2.2 Little-endian

A computer architecture term describing a multi-byte storage order in which the least significant byte (LSB) is stored at the lowest address. Processors that use this mechanism include the PDP-11, the VAX, Intel's microprocessor families, and some network communication devices. Besides multi-byte storage order, the term is also often used to describe the order of the bits within a byte.

+----------+
| 0x12 |<--0x00000021
+----------+
| 0x34 |<--0x00000020
+----------+

Figure 3: Double-byte number 0x1234 in Little-endian mode with start address 0x00000020

In Little-endian, the bit numbering is exactly the opposite of Big-endian, as follows (taking the double-byte value 0x8b8a as an example):

Bit 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
+-----------------------------------------+
Val | 1 0 0 0 1 0 1 1 | 1 0 0 0 1 0 1 0 |
+-----------------------------------------+
Figure 4: Bit numbering for Little-endian
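
The two numbering schemes in Figure 2 and Figure 4 label the same bits with opposite indices. A minimal sketch of both follows; the function names bit_msb0 and bit_lsb0 are hypothetical, the value is assumed to be 16 bits wide, and n is assumed to be in the range 0 to 15.

```c
#include <stdint.h>

/* Figure 4 numbering: bit 0 is the least significant bit. */
int bit_lsb0(uint16_t v, int n)
{
    return (v >> n) & 1;
}

/* Figure 2 numbering: bit 0 is the most significant bit,
 * so index n maps to shift position 15 - n. */
int bit_msb0(uint16_t v, int n)
{
    return (v >> (15 - n)) & 1;
}
```

For any 16-bit v, bit_msb0(v, n) equals bit_lsb0(v, 15 - n): only the label of each bit changes, not the stored value.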

Note 2: The host byte order we usually talk about follows the Little-endian rule. Therefore, when two hosts communicate over TCP/IP, they must call the appropriate functions to convert between host order (Little-endian) and network order (Big-endian).

Note 3: Because these two mechanisms number the same bit sequence in opposite orders, the rendering of MSB in the "Modern English-Chinese Dictionary" as "the most effective bit" is flawed. This article therefore treats it as "the most important" (most significant) bit/byte.

2.3 Middle-endian

Besides Big-endian and Little-endian, there is also a Middle-endian multi-byte storage order. Taking 4 bytes as an example, Middle-endian stores them in the order 3-4-1-2 or 2-1-4-3. This order occasionally appears in the packed decimal formats of some minicomputer architectures.

Embedded system developers should be thoroughly familiar with the Little-endian and Big-endian modes. A CPU in Little-endian mode stores operands from low byte to high byte, while a CPU in Big-endian mode stores them from high byte to low byte. The 32-bit number 0x12345678, stored starting at address 0x4000, is laid out in the memory of a Little-endian CPU as follows:

Memory address	0x4000	0x4001	0x4002	0x4003
Store content	0x78	0x56	0x34	0x12

In the memory of a Big-endian CPU, the same number is laid out as follows:

Memory address	0x4000	0x4001	0x4002	0x4003
Store content	0x12	0x34	0x56	0x78
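
The two tables above can be reproduced without depending on the CPU the code runs on, by extracting each byte with shifts. A minimal sketch follows; store_le32 and store_be32 are hypothetical helper names, and out[0] plays the role of address 0x4000.

```c
#include <stdint.h>

/* Little-endian layout: least significant byte at the lowest address. */
void store_le32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v & 0xff);
    out[1] = (unsigned char)((v >> 8) & 0xff);
    out[2] = (unsigned char)((v >> 16) & 0xff);
    out[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Big-endian layout: most significant byte at the lowest address. */
void store_be32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)((v >> 24) & 0xff);
    out[1] = (unsigned char)((v >> 16) & 0xff);
    out[2] = (unsigned char)((v >> 8) & 0xff);
    out[3] = (unsigned char)(v & 0xff);
}
```

Because the shifts operate on the value rather than on its in-memory representation, both functions produce the same bytes on any host.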

Third, pros and cons of Big-endian and Little-endian

Big-endian advantages: because the high-order byte comes first, you can always tell whether a number is positive or negative by looking at the byte at offset 0. You do not need to know how long the number is, and you do not need to skip over any bytes to find the one that contains the sign bit. The value is also stored in the order in which it is printed, so binary-to-decimal routines are particularly efficient. Machines with different requirements therefore adopt different storage designs.

Little-endian advantages: the assembly instructions for fetching a one-byte, two-byte, four-byte, or longer value proceed in exactly the same way for every width: first fetch the lowest-order byte at offset 0. Because address offsets and byte numbers are in a one-to-one relationship (offset 0 is byte 0), multiple-precision math routines are relatively easy to write.

If you increase a numeric value, you may need to add digits on the left (a larger non-exponential number has more digits). Adding two numbers therefore often requires moving all the digits of a Big-endian number in memory, shifting everything to the right, which adds work for the computer. In a number stored in Little-endian order, the least significant bytes can stay where they are and the new digits can be added on the right, at a higher address. This means that some computer operations can be simpler and faster.
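
The sign-test advantage of Big-endian described above can be sketched as follows. This assumes two's-complement integers held as raw byte arrays; the function names are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Big-endian: the sign bit is always in the byte at offset 0,
 * so the length of the number is not needed. */
int is_negative_be(const unsigned char *p)
{
    return (p[0] & 0x80) != 0;
}

/* Little-endian: the sign bit is in the last byte, so the
 * caller must supply the length (len >= 1 is assumed). */
int is_negative_le(const unsigned char *p, size_t len)
{
    return (p[len - 1] & 0x80) != 0;
}
```

The Little-endian version needs the extra len parameter, which is exactly the cost the paragraph on Big-endian advantages refers to.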

Fourth, how to check whether the processor is Big-endian or Little-endian

Because all members of a union are stored starting from the same low address, a union makes it easy to find out whether the CPU reads and writes memory in Little-endian or Big-endian mode. For example:
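int checkCPUendian(void)
{
    union {
        unsigned int a;
        unsigned char b;
    } c;
    c.a = 1;            /* the 1 lands in either the lowest or the highest byte of a */
    return (c.b == 1);  /* b overlays the byte at the lowest address of a */
}   /* return 1: little-endian, return 0: big-endian */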

Fifth, conversion between Big-endian and Little-endian

Among existing platforms, Intel's x86 is Little-endian, while platforms such as Sun's SPARC are Big-endian. So how do you implement byte-order conversion in a cross-platform or network program? It is easy to do with shift operations in C, for example with the following macros:

#if defined(BIG_ENDIAN) && !defined(LITTLE_ENDIAN)

/* Host order already matches network order: nothing to do. */
#define HTONS(A) (A)
#define HTONL(A) (A)
#define NTOHS(A) (A)
#define NTOHL(A) (A)

#elif defined(LITTLE_ENDIAN) && !defined(BIG_ENDIAN)

/* UInt16 and UInt32 are assumed to be 16-bit and 32-bit unsigned typedefs. */
#define HTONS(A) ((((UInt16)(A) & 0xff00) >> 8) | \
                  (((UInt16)(A) & 0x00ff) << 8))
#define HTONL(A) ((((UInt32)(A) & 0xff000000) >> 24) | \
                  (((UInt32)(A) & 0x00ff0000) >> 8)  | \
                  (((UInt32)(A) & 0x0000ff00) << 8)  | \
                  (((UInt32)(A) & 0x000000ff) << 24))
#define NTOHS HTONS
#define NTOHL HTONL

#else

#error "Either BIG_ENDIAN or LITTLE_ENDIAN must be #defined, but not both."

#endif
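
The same byte swaps can also be written as functions, which avoids evaluating a macro argument more than once. This is a sketch under stated assumptions rather than the article's own code: the names swap16 and swap32 are hypothetical, and the fixed-width types come from <stdint.h>.

```c
#include <stdint.h>

/* Exchange the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t a)
{
    return (uint16_t)(((a & 0xff00u) >> 8) |
                      ((a & 0x00ffu) << 8));
}

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t a)
{
    return ((a & 0xff000000u) >> 24) |
           ((a & 0x00ff0000u) >> 8)  |
           ((a & 0x0000ff00u) << 8)  |
           ((a & 0x000000ffu) << 24);
}
```

Applying either function twice returns the original value, which is why the same routine serves for both the host-to-network and the network-to-host direction on a Little-endian host.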

Network byte order
1. The bits within a byte are not affected by byte order
For example, the byte 1000 0000 (80H in hexadecimal) has the same representation in memory regardless of byte order.

2. Byte order only matters for data types larger than 1 byte
For example, byte a is only one byte long, so by the previous point it has no byte order problem. Byte order therefore means "the relative order between bytes".

3. Data types larger than 1 byte have two possible byte orders
For example, short b is a two-byte data type, so its bytes have a relative order.
Network byte order is the "WYSIWYG" (what you see is what you get) order. The byte order of Intel-type CPUs is the opposite.
Take short b = 0102H as an example (hexadecimal, where each pair of digits is one byte wide). What you see is "0102". By ordinary mathematical convention a number axis increases from left to right; if memory addresses likewise increase from left to right, the bytes of short b are ordered in memory as:
01 02
This is network byte order: the order you see matches the order in memory.
In the opposite byte order, the same value is ordered in memory as: 02 01

Now suppose a two-byte stream of network data is obtained by capturing packets: 01 02

If this stream represents two variables of type byte, there is naturally no byte order problem to consider. If it represents a single short variable, byte order must be considered. By the "WYSIWYG" rule of network byte order, the value of this variable is 0102H.

Assuming the local host is an Intel-type machine, representing this variable takes a little more work:
Define a variable short x and let pt be the address of the byte stream; reading memory in order gives x = *((short*)pt);
The memory order of x is then of course 01 02. By the host's non-"WYSIWYG" rule, that memory order reads as the value 0201H, which is obviously not what you see, so the two bytes must be swapped. You can write the swap yourself, but it is more convenient to use the existing API.
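
One way to get the "WYSIWYG" value from a captured stream on any host is to assemble it byte by byte instead of casting the pointer. A minimal sketch follows; read_be16 is a hypothetical helper name, and pt is assumed to point at two or more readable bytes.

```c
#include <stdint.h>

/* Hypothetical helper: interpret pt[0], pt[1] as a 16-bit value
 * sent in network byte order (high byte first). */
uint16_t read_be16(const unsigned char *pt)
{
    return (uint16_t)((pt[0] << 8) | pt[1]);
}
```

For the stream 01 02 this yields 0102H on both Intel-type and Big-endian hosts, and it does not require pt to be aligned for a short.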

Network byte order vs. host byte order
Network byte order (NBO) stores values from the high-order byte to the low-order byte; using one uniform byte order on the network avoids compatibility problems. Host byte order (HBO) differs from machine to machine and depends on the CPU design, which stores data with one of two byte priorities: high byte first or low byte first. Data on the Internet is transmitted high byte first, so a machine that stores data internally low byte first must convert it before sending it over the Internet.

htonl()
Summary:
Converts an unsigned long from host byte order to network byte order.
#include <winsock.h>
u_long PASCAL FAR htonl(u_long hostlong);
hostlong: a 32-bit number in host byte order.
Comments:
This function converts a 32-bit number from host byte order to network byte order.
Return value:
htonl() returns the value in network byte order.
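
A short usage sketch of htonl() follows. It assumes a POSIX system, where the function is declared in <arpa/inet.h> rather than <winsock.h>, and the helper name put_u32_network is hypothetical.

```c
#include <arpa/inet.h>  /* POSIX header; an assumption, the text above uses <winsock.h> */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a host-order value into out[0..3]
 * in network byte order, ready to be sent. */
void put_u32_network(uint32_t host_value, unsigned char out[4])
{
    uint32_t net_value = htonl(host_value);  /* no-op on a Big-endian host */
    memcpy(out, &net_value, 4);
}
```

Whatever the host byte order, the buffer ends up with the high byte first; the receiver recovers the value by applying ntohl() to the received 32 bits.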

inet_ntoa()
Summary:
Converts a network address to a string in dotted "." format.
#include <winsock.h>
char FAR * PASCAL FAR inet_ntoa(struct in_addr in);
in: a structure that represents an Internet host address.
Comments:
This function converts the Internet address structure given by the in parameter into a string separated by ".", such as "a.b.c.d". Note that the string returned by inet_ntoa() is stored in memory allocated by the Windows socket implementation. The application should not make any assumptions about how that memory is allocated. The data is guaranteed to remain valid until the next Windows socket interface call on the same thread.
Return value:
If no error occurs, inet_ntoa() returns a character pointer. Otherwise, NULL is returned. The data should be copied before the next Windows socket interface call.
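
The advice to copy the returned string can be sketched as follows. This assumes a POSIX system (headers <arpa/inet.h> and <netinet/in.h> instead of <winsock.h>); format_ipv4 is a hypothetical helper name.

```c
#include <arpa/inet.h>   /* POSIX headers; an assumption, the text above uses <winsock.h> */
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: format addr as dotted text into a buffer
 * owned by the caller. Returns 0 on success, -1 on failure or
 * if the buffer is too small. */
int format_ipv4(struct in_addr addr, char *out, size_t out_size)
{
    const char *s = inet_ntoa(addr);  /* points at storage owned by the library */
    if (s == NULL || strlen(s) + 1 > out_size)
        return -1;
    strcpy(out, s);                   /* copy before the next socket call */
    return 0;
}
```

Note that the s_addr field of struct in_addr holds the address in network byte order, so a host-order constant must go through htonl() before being stored there.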

Data is sometimes stored on the local host in the same byte order used on the network and sometimes in a very different one. To keep data consistent, local data must be converted to the network format before it is sent, and received data must be converted back before it is used. The basic library provides byte-conversion functions for this purpose: htons(), htonl(), ntohs(), and ntohl(). Here n stands for network and h for host; s stands for short, a 2-byte operation, and l stands for long, a 4-byte operation. htons() and htonl() convert from host byte order to network byte order, and ntohs() and ntohl() convert from network byte order back to host byte order.
