Big-endian and little-endian

Source: Internet
Author: User

Over the past few days I have started working on network code. In network programming, byte order must always be kept in mind, so I am reposting an article here for reference.

Different CPUs use different byte orders. Byte order is the order in which the bytes of a multi-byte integer are stored in memory. The order a particular machine uses is called its host byte order.
The two most common are:
1. Little endian: stores the low-order byte at the starting address.
2. Big endian: stores the high-order byte at the starting address.
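
As a minimal sketch of how a program might check which of the two orders its own host uses, the Python snippet below packs the integer 1 in native order and looks at the byte at the starting address. The function name host_byte_order is made up for this illustration; sys.byteorder is the standard library's own answer and is used only as a cross-check.

```python
import struct
import sys


def host_byte_order():
    """Return "little" or "big" by inspecting how this host stores the integer 1."""
    # "=H" packs an unsigned 16-bit integer in the host's native byte order.
    first_byte = struct.pack("=H", 1)[0]
    # If the low-order byte (0x01) sits at the starting address, the host is little endian.
    return "little" if first_byte == 1 else "big"


if __name__ == "__main__":
    print("detected host order:", host_byte_order())
    print("sys.byteorder says: ", sys.byteorder)
```

The two printed values should agree on any machine; which value appears depends on the CPU the script runs on.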

LE: little-endian
The byte order that best matches how people reason about values
The low address stores the low-order part of the value
The high address stores the high-order part of the value
This is the byte order that best matches human reasoning, because it pairs like with like.
The low-order part of the value carries the smaller weight, so it goes where the memory address is smaller, that is, at the low address.
Conversely, the high-order part of the value goes where the memory address is larger, that is, at the high address.

BE: big-endian
The most intuitive byte order
The low address stores the high-order part of the value
The high address stores the low-order part of the value
Why is it intuitive? Set the mapping between weight and address aside.
Write the memory addresses from left to right in ascending order.
Write the value in the usual way, from high-order digits to low-order digits.
Then line the two up and fill in the bytes one by one.

Example: how the double word (DWORD) 0x01020304 is stored in memory

Memory address   4000  4001  4002  4003
LE               04    03    02    01
BE               01    02    03    04

Example: if we write 0x1234abcd to memory starting at address 0x0000, the result is:
Address   Big-endian   Little-endian
0x0000    0x12         0xcd
0x0001    0x34         0xab
0x0002    0xab         0x34
0x0003    0xcd         0x12
x86 series CPUs use little-endian byte order.
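
The two storage examples above can be reproduced with a short Python sketch. It relies on the built-in int.to_bytes method; the helper name memory_image and its base parameter are hypothetical and exist only for this illustration.

```python
def memory_image(value, size, byteorder, base=0):
    """Return (address, byte) pairs for value stored at base in the given order."""
    raw = value.to_bytes(size, byteorder)  # byteorder is "big" or "little"
    return [(base + offset, byte) for offset, byte in enumerate(raw)]


if __name__ == "__main__":
    big = memory_image(0x1234ABCD, 4, "big")
    little = memory_image(0x1234ABCD, 4, "little")
    print("address  big-endian  little-endian")
    for (address, be_byte), (_, le_byte) in zip(big, little):
        print(f"0x{address:04x}   0x{be_byte:02x}        0x{le_byte:02x}")
```

Calling memory_image(0x01020304, 4, "little", 4000) yields the bytes 04 03 02 01 at addresses 4000 to 4003, matching the LE row of the first example.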

Network byte order is a data representation format specified by TCP/IP. It is independent of the CPU type and operating system, which ensures that data is interpreted correctly when it is transmitted between different hosts. Network byte order is big endian.

The BSD socket API provides the following four conversion functions:
htons converts an unsigned short (16 bits) from host byte order to network byte order
htonl converts an unsigned 32-bit integer (historically an unsigned long) from host byte order to network byte order
ntohs converts an unsigned short (16 bits) from network byte order to host byte order
ntohl converts an unsigned 32-bit integer (historically an unsigned long) from network byte order to host byte order

On little-endian systems, these functions swap the byte order.
On big-endian systems, they are defined as empty macros, because host byte order already matches network byte order.
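
Python's socket module exposes the same four functions under the same names, so the behaviour described above can be checked with a small sketch. The helpers swap16 and expected_htons are hypothetical names introduced here; expected_htons simply encodes the rule "swap on little endian, do nothing on big endian".

```python
import socket
import sys


def swap16(value):
    """Exchange the two bytes of a 16-bit value."""
    return ((value & 0x00FF) << 8) | ((value & 0xFF00) >> 8)


def expected_htons(value):
    """What htons should return on this host according to the rule above."""
    return swap16(value) if sys.byteorder == "little" else value


if __name__ == "__main__":
    port = 0x1234
    print("host order:      ", sys.byteorder)
    print("htons(0x1234):   ", hex(socket.htons(port)))
    print("rule predicts:   ", hex(expected_htons(port)))
    # Converting to network order and back recovers the original value on any host.
    print("ntohs(htons(x)): ", hex(socket.ntohs(socket.htons(port))))
```

The first two hexadecimal values differ between little-endian and big-endian hosts, but they should always match each other, and the round trip should always print 0x1234.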

In network or cross-platform development, make sure that both sides agree on a single byte order. Otherwise the two parties will interpret the same bytes differently, which causes bugs.
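
One way to keep both parties on a single byte order is to fix it in the encoding and decoding code itself. The sketch below uses Python's struct module, where the "!" prefix means network (big-endian) order. The header layout (a 16-bit message type followed by a 32-bit sequence number) and the function names are assumptions made up for this example, not part of any real protocol.

```python
import struct

# Hypothetical wire format: 16-bit message type, then 32-bit sequence number.
# "!" selects network (big-endian) byte order with no padding between fields.
HEADER = struct.Struct("!HI")


def encode_header(msg_type, seq):
    """Serialize the two fields in network byte order, whatever the host order is."""
    return HEADER.pack(msg_type, seq)


def decode_header(data):
    """Parse the two fields back into host integers."""
    return HEADER.unpack(data)


if __name__ == "__main__":
    wire = encode_header(0x0102, 0x0A0B0C0D)
    print(wire.hex(" "))
    print(decode_header(wire))
```

Because the order is stated in the format string, the same bytes go on the wire from a little-endian host and from a big-endian host.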

Note:
1. Network and host byte order conversion functions: htons, ntohs, htonl, ntohl (s means short, l means long, h means host, n means network)
2. Different processors, and in some cases different operating systems on the same processor, use different byte orders. See the following table.
Processor          Operating system   Byte order
Alpha              all                little endian
HP-PA              NT                 little endian
HP-PA              UNIX               big endian
Intel x86          all                little endian   <----- x86 systems are little-endian
Motorola 680x0     all                big endian
MIPS               NT                 little endian
MIPS               UNIX               big endian
PowerPC            NT                 little endian
PowerPC            non-NT             big endian      <----- PPC systems are big-endian
RS/6000            UNIX               big endian
SPARC              UNIX               big endian
IXP1200 ARM core   all                little endian

2.

I. Definition of byte order

Byte order, as the name implies, is the order in which the bytes of a data type larger than one byte are stored in memory. (For single-byte data there is, of course, no order to speak of.)

In practice, most developers rarely deal with byte order directly. It only needs to be considered in cross-platform and network programs.

Every article on byte order mentions that there are two kinds: big-endian and little-endian. The standard definitions are as follows:
A) Little-endian places the low-order byte at the low-address end of memory and the high-order byte at the high-address end.
B) Big-endian places the high-order byte at the low-address end of memory and the low-order byte at the high-address end.
C) Network byte order: the four bytes of a 32-bit value are transmitted in the following order: first bits 0-7, then bits 8-15, then bits 16-23, and finally bits 24-31 (with bit 0 denoting the most significant bit, as in TCP/IP header diagrams). This transmission order is called big-endian byte order. Because all binary integers in TCP/IP headers must be transmitted in this order, it is also called network byte order. For example, the two-byte "Ethernet frame type" field in the Ethernet header indicates the type of the data that follows. For the frame type of an ARP request or reply, the bytes are transmitted over the network in the order 0x08, 0x06. Its image in memory is shown below:
Stack bottom (high address)
---------------
0x06 -- low
0x08 -- high
---------------
Stack top (low address)
The value of this field is 0x0806, and it is stored in memory in big-endian order.
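
The frame type example can be sketched in Python as follows. The value 0x0806 comes from the text. The offset of 12 relies on the usual Ethernet header layout of a 6-byte destination address followed by a 6-byte source address, which the text does not spell out, and the function names are hypothetical.

```python
import struct

ETHERTYPE_ARP = 0x0806  # frame type of an ARP request or reply, as given in the text


def frame_type_bytes(ethertype):
    """Return the two bytes of the frame type field in transmission order."""
    return struct.pack("!H", ethertype)  # "!" = network (big-endian) order


def parse_frame_type(header):
    """Read the frame type from an Ethernet header.

    Assumes the common layout: 6-byte destination, 6-byte source, 2-byte type.
    """
    (ethertype,) = struct.unpack_from("!H", header, 12)
    return ethertype


if __name__ == "__main__":
    print(frame_type_bytes(ETHERTYPE_ARP).hex(" "))
    fake_header = bytes(12) + b"\x08\x06"  # zeroed addresses, for illustration only
    print(hex(parse_frame_type(fake_header)))
```

The byte 0x08 comes first both on the wire and at the lower memory address, which is what the diagram above shows.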

II. High/low addresses and high/low bytes

First, we need to understand the memory layout of a C program image. It is described in "Expert C Programming" and in "Advanced Programming in the UNIX Environment", roughly as follows:
----------------------- Maximum memory address 0xffffffff
  | Stack bottom
  .
  .  Stack
  .
  Stack top
-----------------------
  |
  |
 \|/

  Unused (empty)

 /|\
  |
  |
-----------------------
  Heap
-----------------------
  Uninitialized data
---------------- (collectively referred to as the data segment)
  Initialized data
-----------------------
  Text segment (code)
----------------------- Minimum memory address 0x00000000

For example, if we allocate unsigned char Buf[4] on the stack, how is this array laid out on the stack? [NOTE 1] See:
Stack bottom (high address)
----------
Buf[3]
Buf[2]
Buf[1]
Buf[0]
----------
Stack top (low address)

Now that we have sorted out high and low addresses, we can sort out high and low bytes. Suppose we have the 32-bit unsigned integer 0x12345678 (that is, the four-byte Buf above viewed as one integer). Which end is high and which is low? It is actually very simple. In decimal, we say that the digits on the left are the high-order ones and the digits on the right are the low-order ones, and the same holds in any other base. Taking 0x12345678 as an example, the bytes from high to low are 0x12, 0x34, 0x56, and 0x78.
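
The split into high and low bytes is independent of how the value is stored; it follows from the value alone. A minimal Python sketch using shifts and masks (the function name bytes_high_to_low is made up for this illustration):

```python
def bytes_high_to_low(value, size=4):
    """Split an unsigned integer into its bytes, highest-order byte first."""
    # Shift the wanted byte down to the lowest position, then mask off the rest.
    return [(value >> (8 * index)) & 0xFF for index in reversed(range(size))]


if __name__ == "__main__":
    print([hex(b) for b in bytes_high_to_low(0x12345678)])
```

Byte order only enters the picture once these four bytes have to be assigned to memory addresses.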

With both high/low addresses and high/low bytes clarified, let us revisit the definitions of big-endian and little-endian and illustrate the two byte orders with diagrams.
Taking unsigned int value = 0x12345678 as an example, we can use unsigned char Buf[4] to show how the value is stored under each byte order:
Big-endian: the high-order byte is stored at the low address, as shown:
Stack bottom (high address)
---------------
Buf[3] (0x78) -- low
Buf[2] (0x56)
Buf[1] (0x34)
Buf[0] (0x12) -- high
---------------
Stack top (low address)

Little-endian: the low-order byte is stored at the low address, as shown:
Stack bottom (high address)
---------------
Buf[3] (0x12) -- high
Buf[2] (0x34)
Buf[1] (0x56)
Buf[0] (0x78) -- low
---------------
Stack top (low address)
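
Going the other way, from the four bytes of Buf back to an integer, depends entirely on which order the reader assumes. The sketch below is a hypothetical helper (read_u32 is not a standard function) that treats index 0 as the lowest address, as in the diagrams above.

```python
def read_u32(buf, byteorder):
    """Assemble buf[0..3] into a 32-bit integer; buf[0] is at the lowest address."""
    if byteorder == "big":
        # Big-endian: the byte at the lowest address is the highest-order byte.
        return (buf[0] << 24) | (buf[1] << 16) | (buf[2] << 8) | buf[3]
    # Little-endian: the byte at the lowest address is the lowest-order byte.
    return (buf[3] << 24) | (buf[2] << 16) | (buf[1] << 8) | buf[0]


if __name__ == "__main__":
    big_image = bytes([0x12, 0x34, 0x56, 0x78])
    little_image = bytes([0x78, 0x56, 0x34, 0x12])
    print(hex(read_u32(big_image, "big")))
    print(hex(read_u32(little_image, "little")))
    # Reading the big-endian image with the wrong assumption gives a different value.
    print(hex(read_u32(big_image, "little")))
```

The built-in int.from_bytes(buf, byteorder) performs the same assembly and could be used instead of the explicit shifts.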

Among current platforms, Intel x86 uses little-endian, while Sun's SPARC uses big-endian.

III. Example

Embedded system developers should be familiar with the little-endian and big-endian modes. A CPU in little-endian mode stores the bytes of a number from low-order to high-order as addresses increase, while a CPU in big-endian mode stores them from high-order to low-order.

For example, a CPU in little-endian mode stores the 16-bit value 0x1234 in memory as follows (assuming it starts at address 0x4000):

Memory address   Stored content
0x4001           0x12
0x4000           0x34

A CPU in big-endian mode stores it as follows:

Memory address   Stored content
0x4001           0x34
0x4000           0x12

A CPU in little-endian mode stores the 32-bit value 0x12345678 in memory as follows (assuming it starts at address 0x4000):

Memory address   Stored content
0x4003           0x12
0x4002           0x34
0x4001           0x56
0x4000           0x78

A CPU in big-endian mode stores it as follows:

Memory address   Stored content
0x4003           0x78
0x4002           0x56
0x4001           0x34
0x4000           0x12
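
To see which of these tables applies to the machine at hand, the bytes of a real in-memory integer can be inspected. The sketch below uses Python's ctypes module to create fixed-width C integers and copy out their raw bytes. The helper name native_image is hypothetical, and the base address 0x4000 is only a label borrowed from the tables, since the real address is chosen by the runtime.

```python
import ctypes
import sys


def native_image(value, ctype, base=0x4000):
    """Return (address, byte) pairs for value as this host actually stores it.

    base is a printing label only; it is not the object's real address.
    """
    raw = bytes(ctype(value))  # copy of the object's in-memory bytes, native order
    return [(base + offset, byte) for offset, byte in enumerate(raw)]


if __name__ == "__main__":
    print("this host is", sys.byteorder, "endian")
    for ctype, value in ((ctypes.c_uint16, 0x1234), (ctypes.c_uint32, 0x12345678)):
        # Print the highest address first, in the same style as the tables.
        for address, byte in reversed(native_image(value, ctype)):
            print(f"0x{address:04x}  0x{byte:02x}")
        print()
```

On an x86 machine the output should match the little-endian tables; on a big-endian machine it should match the big-endian ones.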

Value: 0x1245
On a big-endian system it is stored like this:
Memory address   Data
00               12
01               45

Data is read starting from the low address when sending.
Therefore, the sending order is 12 45, which is already network byte order.

On other systems (little-endian ones such as x86), it is stored like this:
Memory address   Data
00               45
01               12
If it is not converted to network byte order and is sent as 45 12, the receiver will interpret the value as 0x4512.
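
The mix-up can be reproduced with a short Python sketch. The three function names are hypothetical; "<H" packs a 16-bit value in little-endian order (what a little-endian sender transmits if it skips the conversion) and "!H" packs or unpacks it in network order.

```python
import struct


def send_raw_little_endian(value):
    """Bytes a little-endian host puts on the wire if it skips the conversion."""
    return struct.pack("<H", value)


def send_network_order(value):
    """Bytes on the wire after converting to network byte order."""
    return struct.pack("!H", value)


def receive_network_order(data):
    """Receiver that assumes network byte order, as the protocol requires."""
    return struct.unpack("!H", data)[0]


if __name__ == "__main__":
    value = 0x1245
    print(send_raw_little_endian(value).hex(" "), "->",
          hex(receive_network_order(send_raw_little_endian(value))))
    print(send_network_order(value).hex(" "), "->",
          hex(receive_network_order(send_network_order(value))))
```

Only the second path delivers the value the sender intended.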

 
