The origins of endian, big-endian, and little-endian

Source: Internet
Author: User
Tags: htons

Reprinted: http://www.eygle.com/digest/2007/01/whats_mean_endian.html

I. Introduction
Different computer architectures store bytes and words in different ways. This raises a very important issue in computer communication: in what order should the units of information (bits, bytes, words, double words, and so on) be transmitted? If the communicating parties do not agree on a rule, communication fails because messages cannot be encoded and decoded correctly. Two byte storage mechanisms are in common use across today's computer systems: big-endian and little-endian. This article briefly describes the origins, features, and differences of the two.

For convenience, two terms used throughout this article are defined first.
1. MSB
MSB is the abbreviation of most significant bit/byte. It denotes the bit or byte that has the greatest effect on the value of a bit sequence (for example, a byte is a sequence of 8 bits) or a byte sequence (for example, a word is a sequence of two bytes).
2. LSB
LSB is the abbreviation of least significant bit/byte. It denotes the bit or byte that has the least effect on the value of a bit sequence (for example, a byte is a sequence of 8 bits) or a byte sequence (for example, a word is a sequence of two bytes).
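The two definitions can be made concrete with a few shifts and masks. The following is a minimal sketch; the function names are made up for illustration and a 16-bit word is assumed, as in the definitions above.

```c
#include <stdint.h>

/* Most significant byte of a 16-bit word: the upper 8 bits. */
uint8_t msb_of_word(uint16_t w)
{
    return (uint8_t)(w >> 8);
}

/* Least significant byte of a 16-bit word: the lower 8 bits. */
uint8_t lsb_of_word(uint16_t w)
{
    return (uint8_t)(w & 0xFFu);
}

/* Most significant bit of a byte: the bit with weight 128. */
int msb_of_byte(uint8_t b)
{
    return (b >> 7) & 1;
}

/* Least significant bit of a byte: the bit with weight 1. */
int lsb_of_byte(uint8_t b)
{
    return b & 1;
}
```

For the word 0x1234, the MSB is 0x12 and the LSB is 0x34, regardless of how the machine stores the word in memory.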

II. Origins of endian
1. Definition
Endian: the ordering of bytes in a multi-byte number.
In other words, the term describes the order in which the bytes of a multi-byte number are stored in a given computer architecture.

2. Etymology
The term comes from Swift's "Gulliver's Travels" via the famous paper "On Holy Wars and a Plea for Peace" by Danny Cohen, USC/ISI IEN 137, 1980-04-01. The Lilliputians, being very small, had correspondingly small political problems. The Big-Endian and Little-Endian parties debated over whether soft-boiled eggs should be opened at the big end or the little end. [From: Free On-line Dictionary of Computing or Jargon File]

According to the Jargon File, the word endian comes from "Gulliver's Travels", the satirical novel Jonathan Swift wrote in 1726. While describing Gulliver's visit to Lilliput, the novel presents the following situation. The people of Lilliput are very small (6 inches tall), so their problems are correspondingly small. At one point, a dispute over whether a boiled egg should be opened at the big end or at the little end led to a war and split the country into two opposing camps: those who supported opening the egg at the big end were called Big-endians, and those who supported opening it at the little end were called Little-endians. (The suffix "-ian" marks a supporter of a certain point of view :-). The term endian comes from this story.

In 1980, Danny Cohen borrowed the term in his paper "On Holy Wars and a Plea for Peace", written to calm a debate about the order in which the bytes of a message should be transmitted. In the paper, Cohen calls the camp that supports transmitting a message starting from the MSB the Big-endians, and the camp that supports starting from the LSB the Little-endians. After this paper, the term endian came into wide use.

III. Kinds of endian
1. Big-Endian
A computer architecture in which, within a given multi-byte numeric representation, the most significant byte has the lowest address (the word is stored "big-end-first"). Most processors, including the IBM 370 family, the PDP-10, the Motorola microprocessor families, and most of the various RISC designs current in mid-1993, are big-endian. [From: Free On-line Dictionary of Computing or Jargon File]

Big-endian: a computer architecture term describing a storage order for multi-byte values in which the most significant byte (MSB) is stored at the lowest address. Processors using this mechanism include the IBM 370 family, the PDP-10, the Motorola microprocessor families, and most RISC designs of the time (mid-1993).

 

+----------+
|   0x34   | <-- 0x00000021
+----------+
|   0x12   | <-- 0x00000020
+----------+
Figure 1: The two-byte value 0x1234 stored at starting address 0x00000020 in big-endian mode
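The layout in Figure 1 can be produced on any host by writing the bytes explicitly instead of relying on how the CPU stores a 16-bit value. This is a minimal sketch; store_be16 and load_be16 are illustrative names, not functions from any library.

```c
#include <stdint.h>

/* Store v big-end-first: the MSB goes to the lowest address p[0]. */
void store_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);    /* most significant byte  */
    p[1] = (uint8_t)(v & 0xFFu); /* least significant byte */
}

/* Read back a 16-bit value that was stored big-end-first. */
uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}
```

Storing 0x1234 this way leaves 0x12 at the lower address and 0x34 at the higher one, matching the figure.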

In big-endian, the bits of a bit sequence are numbered as follows (using the two-byte value 0x8b8a as an example):
Bit    0  1  2  3  4  5  6  7    8  9 10 11 12 13 14 15
    +---------------------------------------------------+
Val |  1  0  0  0  1  0  1  1 |  1  0  0  0  1  0  1  0 |
    +---------------------------------------------------+
       ^       0x8b                      0x8a         ^
      MSB                                            LSB
Figure 2: Bit numbering in big-endian

Note 1: In general, network byte order in the TCP/IP protocol stack follows the big-endian rule. In TCP/IP network communication, both parties encode a message in the form shown in Figure 2 and then transmit it over the network in order from the MSB (bit 0) to the LSB.
2. Little-Endian
A computer architecture in which, within a given 16- or 32-bit word, bytes at lower addresses have lower significance (the word is stored "little-end-first"). The PDP-11 and VAX families of computers and Intel microprocessors and a lot of communications and networking hardware are little-endian. The term is sometimes used to describe the ordering of units other than bytes; most often, bits within a byte. [From: Free On-line Dictionary of Computing or Jargon File]

Little-endian: a computer architecture term describing a storage order for multi-byte values in which the least significant byte (LSB) is stored at the lowest address. Processors using this mechanism include the PDP-11, the VAX, Intel microprocessors, and some network communication devices. Besides the storage order of multi-byte values, the term is also often used to describe the ordering of the bits within a byte.

+----------+
|   0x12   | <-- 0x00000021
+----------+
|   0x34   | <-- 0x00000020
+----------+
Figure 3: The two-byte value 0x1234 stored at starting address 0x00000020 in little-endian mode
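The layout in Figure 3 can likewise be written out byte by byte, independent of the host CPU. This is a minimal sketch; store_le16 and load_le16 are illustrative names only.

```c
#include <stdint.h>

/* Store v little-end-first: the LSB goes to the lowest address p[0]. */
void store_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFFu); /* least significant byte */
    p[1] = (uint8_t)(v >> 8);    /* most significant byte  */
}

/* Read back a 16-bit value that was stored little-end-first. */
uint16_t load_le16(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[1] << 8) | p[0]);
}
```

Storing 0x1234 this way leaves 0x34 at the lower address and 0x12 at the higher one, matching the figure.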

In little-endian, the bits of a bit sequence are numbered in the opposite direction from big-endian, as follows:

Bit   15 14 13 12 11 10  9  8    7  6  5  4  3  2  1  0
    +---------------------------------------------------+
Val |  1  0  0  0  1  0  1  1 |  1  0  0  0  1  0  1  0 |
    +---------------------------------------------------+
       ^       0x8b                      0x8a         ^
      MSB                                            LSB
Figure 4: Bit numbering in little-endian
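The two numbering conventions in Figures 2 and 4 differ only in which end is called bit 0. The sketch below expresses both for a 16-bit value; the function names are hypothetical.

```c
#include <stdint.h>

/* Figure 2 convention: bit 0 is the MSB, bit 15 is the LSB. */
int bit_msb0(uint16_t v, unsigned i) /* i must be in 0..15 */
{
    return (int)((v >> (15u - i)) & 1u);
}

/* Figure 4 convention: bit 0 is the LSB, bit 15 is the MSB. */
int bit_lsb0(uint16_t v, unsigned i) /* i must be in 0..15 */
{
    return (int)((v >> i) & 1u);
}
```

For any value, bit i under one convention is bit 15 - i under the other, so the bit pattern itself is identical and only the labels change.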

Note 2: On many hosts (for example, Intel x86), host byte order follows the little-endian rule. Therefore, when two hosts communicate over TCP/IP, they need to call the corresponding functions to convert between host byte order (little-endian on such hosts) and network byte order (big-endian).
Note 3: Because the two mechanisms number the bits of the same bit sequence in opposite directions, the usual dictionary translation of MSB is not quite appropriate. This article therefore defines it as "the most important bit/byte".
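On POSIX systems, the conversion functions mentioned in Note 2 are htons, htonl, ntohs, and ntohl, declared in <arpa/inet.h>. The sketch below assumes a POSIX host; put_port and get_port are hypothetical helpers used only for illustration.

```c
#include <arpa/inet.h> /* POSIX: htons, htonl, ntohs, ntohl */
#include <stdint.h>
#include <string.h>

/* Write a 16-bit port number into a wire buffer in network byte order. */
void put_port(uint8_t out[2], uint16_t host_port)
{
    uint16_t net = htons(host_port); /* host order -> network order */
    memcpy(out, &net, sizeof net);
}

/* Read a 16-bit port number from a wire buffer back into host order. */
uint16_t get_port(const uint8_t in[2])
{
    uint16_t net;
    memcpy(&net, in, sizeof net);
    return ntohs(net);               /* network order -> host order */
}
```

On a big-endian host the conversions do nothing, and on a little-endian host they swap the bytes. In both cases the buffer holds the MSB first.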

3. Middle-Endian
Neither big-endian nor little-endian. Used of perverse byte orders such as 3-4-1-2 or 2-1-4-3, occasionally found in the packed-decimal formats of some minicomputer manufacturers. [From: Free On-line Dictionary of Computing or Jargon File]

Middle-endian: any multi-byte storage order other than big-endian and little-endian. Taking four bytes as an example, orders such as 3-4-1-2 or 2-1-4-3 are middle-endian. This storage order occasionally appears in the packed-decimal formats of some minicomputers.
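As an illustration of a 2-1-4-3 order, the sketch below assumes that the bytes of a 32-bit value are numbered 1 (most significant) to 4 (least significant) and that the digits give the order in which they appear at increasing addresses. That numbering is an assumption made here, since the quoted definition does not spell it out, and the function names are hypothetical.

```c
#include <stdint.h>

/* Store v in 2-1-4-3 order (assumed numbering: 1 = MSB, 4 = LSB). */
void store_2143(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 16); /* byte 2 */
    p[1] = (uint8_t)(v >> 24); /* byte 1 (most significant)  */
    p[2] = (uint8_t)(v);       /* byte 4 (least significant) */
    p[3] = (uint8_t)(v >> 8);  /* byte 3 */
}

/* Read back a value that was stored in 2-1-4-3 order. */
uint32_t load_2143(const uint8_t *p)
{
    return ((uint32_t)p[1] << 24) | ((uint32_t)p[0] << 16) |
           ((uint32_t)p[3] << 8)  |  (uint32_t)p[2];
}
```

Under this assumption, 0x12345678 is stored as the byte sequence 0x34, 0x12, 0x78, 0x56.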
IV. Conclusion
A full treatment of the two byte orders is beyond the scope of this article. Interested readers can refer to Danny Cohen's paper "On Holy Wars and a Plea for Peace", which describes in detail the history of the two orders, the mathematical basis of each, and the arguments of their respective advocates.

What is byte order?
Byte order, as the name implies, is the order in which the bytes of a value larger than one byte are stored in memory (for a single-byte value there is no order to discuss). In practice, most developers rarely deal with byte order directly; it mainly matters in cross-platform and network programs. Articles on the subject generally divide byte order into big-endian and little-endian. The standard definitions are as follows:
A) Little-endian: the low-order byte is placed at the low address of memory, and the high-order byte at the high address.
B) Big-endian: the high-order byte is placed at the low address of memory, and the low-order byte at the high address.
C) Network byte order: the TCP/IP protocols define their byte order as big-endian, so the byte order used in TCP/IP is usually called network byte order.
PS: Some articles call the low-order byte the least significant byte and the high-order byte the most significant byte.
Big endian means that the most significant byte of any multibyte data field is stored at the lowest memory address, which is also the address of the larger field.
Little endian means that the least significant byte of any multibyte data field is stored at the lowest memory address, which is also the address of the larger field.
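Definitions A) and B) translate directly into how a 32-bit field is decoded from a byte buffer. The following is a minimal sketch with illustrative function names; p[0] is the byte at the lowest address.

```c
#include <stdint.h>

/* Big-endian: the byte at the lowest address is the most significant. */
uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Little-endian: the byte at the lowest address is the least significant. */
uint32_t load_le32(const uint8_t *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

Given the bytes 0x12, 0x34, 0x56, 0x78 at increasing addresses, load_be32 yields 0x12345678 and load_le32 yields 0x78563412. A field received in network byte order would be decoded with the big-endian variant.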

What is a high/low address? What is a high/low byte?
First, we need to know how the memory space of a C program image is laid out. The layout is described in "Expert C Programming" and "Advanced Programming in the UNIX Environment", roughly as follows:
----------------------- Highest memory address 0xffffffff
 Stack bottom
 .
 Stack
 .
 Stack top
-----------------------
  |
  |
 \|/

 Unused (empty)

 /|\
  |
  |
-----------------------
 Heap
-----------------------
 Uninitialized data
----------------------- (together called the data segment)
 Initialized data
-----------------------
 Text segment (code segment)
----------------------- Lowest memory address 0x00000000

For example, if we allocate unsigned char buf[4] on the stack, how is this array laid out on the stack? See:
Stack bottom (high address)
----------
buf[3]
buf[2]
buf[1]
buf[0]
----------
Stack top (low address)
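The picture above can be checked with a short program: within an array, a higher index always means a higher address, even though the stack itself may grow toward lower addresses. This is a minimal sketch; the function name is made up for illustration, and the printed addresses vary from run to run.

```c
#include <stdio.h>

/* Print the address of each element of a local array and return 1
   if every element lies at a higher address than the one before it. */
int array_addresses_ascend(void)
{
    unsigned char buf[4] = {0, 1, 2, 3};
    int ascending = 1;

    for (int i = 0; i < 4; i++)
        printf("&buf[%d] = %p\n", i, (void *)&buf[i]);

    for (int i = 0; i < 3; i++) {
        if (!(&buf[i] < &buf[i + 1]))
            ascending = 0;
    }
    return ascending;
}
```

C guarantees this ordering for the elements of one array, so buf[0] is always at the low-address end.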
Now that the high/low address question is settled, consider high/low bytes. Given the 32-bit unsigned integer 0x12345678, which part is high and which is low? It is actually very simple. In decimal, we say that the left side is the high-order end and the right side is the low-order end, and the same holds for other bases. For 0x12345678, the bytes from high to low are 0x12, 0x34, 0x56, and 0x78.
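The high-to-low byte sequence of a value is a property of the number, not of memory, so it can be extracted with shifts alone. The following is a minimal sketch with a hypothetical function name.

```c
#include <stdint.h>

/* Return the n-th byte of v counting from the high-order end:
   n = 0 is the highest byte, n = 3 is the lowest. n must be in 0..3. */
uint8_t byte_from_high(uint32_t v, unsigned n)
{
    return (uint8_t)(v >> (8u * (3u - n)));
}
```

For 0x12345678, n = 0, 1, 2, 3 gives 0x12, 0x34, 0x56, 0x78 on every platform, because shifts operate on values rather than on their storage layout.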
With high/low addresses and high/low bytes both clarified, let us review the definitions of big-endian and little-endian and illustrate the two byte orders. Taking unsigned int value = 0x12345678 as an example, we can use unsigned char buf[4] to show how the value is stored under each byte order:

Big-endian: the high-order byte is stored at the low address:
Stack bottom (high address)
---------------
buf[3] (0x78) -- low-order byte
buf[2] (0x56)
buf[1] (0x34)
buf[0] (0x12) -- high-order byte
---------------
Stack top (low address)

Little-endian: the low-order byte is stored at the low address:
Stack bottom (high address)
---------------
buf[3] (0x12) -- high-order byte
buf[2] (0x34)
buf[1] (0x56)
buf[0] (0x78) -- low-order byte
---------------
Stack top (low address)
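Which of the two pictures applies on a given machine can be observed by copying the in-memory bytes of the value into buf. This is a minimal sketch; both function names are hypothetical, and a 32-bit uint32_t stands in for the unsigned int of the text.

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of value into buf, lowest address first. */
void bytes_in_memory(uint32_t value, unsigned char buf[4])
{
    memcpy(buf, &value, sizeof value);
}

/* For the value 0x12345678: return 'B' if buf matches the big-endian
   picture, 'L' if it matches the little-endian picture, '?' otherwise. */
char classify_layout(const unsigned char buf[4])
{
    if (buf[0] == 0x12 && buf[1] == 0x34 && buf[2] == 0x56 && buf[3] == 0x78)
        return 'B';
    if (buf[0] == 0x78 && buf[1] == 0x56 && buf[2] == 0x34 && buf[3] == 0x12)
        return 'L';
    return '?';
}
```

Calling bytes_in_memory(0x12345678, buf) and then classify_layout(buf) would be expected to report 'L' on an x86 host and 'B' on a big-endian host.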

Among existing platforms, Intel x86 uses little-endian, while Sun's SPARC uses big-endian. How can we convert byte order in a cross-platform or network program? This is easy to do with C shift operations, as in the following macros:

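/* uint16 and uint32 are assumed to be typedefs for 16-bit and 32-bit
   unsigned integers; BIG_ENDIAN or LITTLE_ENDIAN is set by the build. */
#if defined(BIG_ENDIAN) && !defined(LITTLE_ENDIAN)

/* Host order already equals network order: nothing to do. */
#define htons(A) (A)
#define htonl(A) (A)
#define ntohs(A) (A)
#define ntohl(A) (A)

#elif defined(LITTLE_ENDIAN) && !defined(BIG_ENDIAN)

/* Swap the two bytes of a 16-bit value. */
#define htons(A) ((((uint16)(A) & 0xff00) >> 8) | \
                  (((uint16)(A) & 0x00ff) << 8))
/* Reverse the four bytes of a 32-bit value. */
#define htonl(A) ((((uint32)(A) & 0xff000000) >> 24) | \
                  (((uint32)(A) & 0x00ff0000) >> 8)  | \
                  (((uint32)(A) & 0x0000ff00) << 8)  | \
                  (((uint32)(A) & 0x000000ff) << 24))
/* Swapping is its own inverse, so the ntoh macros reuse the hton ones. */
#define ntohs htons
#define ntohl htonl

#else

#error "Either BIG_ENDIAN or LITTLE_ENDIAN must be #defined, but not both."

#endif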
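The byte swapping inside these macros can also be written as ordinary functions, which avoids evaluating the argument more than once. The sketch below uses the <stdint.h> types instead of uint16/uint32; swap16 and swap32 are illustrative names, not part of any standard API.

```c
#include <stdint.h>

/* Exchange the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t a)
{
    return (uint16_t)((a >> 8) | (a << 8));
}

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t a)
{
    return ((a & 0xff000000u) >> 24) |
           ((a & 0x00ff0000u) >> 8)  |
           ((a & 0x0000ff00u) << 8)  |
           ((a & 0x000000ffu) << 24);
}
```

Applying either function twice returns the original value, which is why the same operation serves for both the host-to-network and network-to-host directions on a little-endian host.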

 

How do I check whether the processor is big-endian or little-endian?
Because all members of a union are stored starting from the same low address, a union makes it easy to tell whether the CPU reads and writes memory in little-endian or big-endian mode. For example:
int checkCPUendian(void)
{
    union {
        unsigned int a;
        unsigned char b;
    } c;
    c.a = 1;           /* the low-order byte of a is 1, all others are 0 */
    return (c.b == 1); /* b overlays the byte at the lowest address of a */
} /* Returns 1 for little-endian, 0 for big-endian */
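The same check can be made without a union by copying the first byte of an integer, and on GCC and Clang the answer is also available at compile time through the predefined macros __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__, and __ORDER_BIG_ENDIAN__. The sketch below shows both. The function names are hypothetical, and the compile-time variant returns -1 when those macros are not available.

```c
#include <string.h>

/* Runtime check: inspect the byte at the lowest address of the value 1. */
int is_little_endian_runtime(void)
{
    unsigned int one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Compile-time check using GCC/Clang predefined macros.
   Returns 1 (little-endian), 0 (big-endian), or -1 (unknown). */
int is_little_endian_compiletime(void)
{
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    defined(__ORDER_BIG_ENDIAN__)
#  if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return 1;
#  elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return 0;
#  else
    return -1;
#  endif
#else
    return -1;
#endif
}
```

When both are available, the two functions should agree. The compile-time form is useful for selecting code paths in the preprocessor, as the conversion macros earlier in this article do.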
