When it comes to byte order, two major CPU camps come to mind: Motorola's PowerPC series and Intel's x86 series. PowerPC CPUs store data big endian, while x86 CPUs store data little endian. So what are big endian and little endian?
Big endian stores the most significant byte (MSB) at the lowest address, while little endian stores the least significant byte (LSB) at the lowest address.
A description in words can feel abstract, so here are two diagrams. The number 0x12345678 is stored as follows on CPUs with the two different byte orders:
Big endian
Low address                  High address
----------------------------------------->
+----+----+----+----+
| 12 | 34 | 56 | 78 |
+----+----+----+----+
Little endian
Low address                  High address
----------------------------------------->
+----+----+----+----+
| 78 | 56 | 34 | 12 |
+----+----+----+----+
From the two diagrams above, we can see that storing data big endian matches the way we humans read numbers. As for little endian... !@#$%^&*, to hell with it -_-|
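To see which of the two layouts your own machine uses, you can copy the bytes of 0x12345678 out of memory and look at them in address order. The following is only a minimal sketch; the helper names bytes_in_memory, host_is_little_endian and print_bytes are made up for this illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy the in-memory bytes of a 32-bit value into out[0..3],
   lowest address first. memcpy sidesteps pointer aliasing issues. */
void bytes_in_memory(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Hypothetical helper: returns 1 if the LSB sits at the lowest address. */
int host_is_little_endian(void)
{
    unsigned char b[4];
    bytes_in_memory(0x12345678u, b);
    return b[0] == 0x78;
}

/* Print the four bytes in address order, separated by spaces. */
void print_bytes(uint32_t value)
{
    unsigned char b[4];
    bytes_in_memory(value, b);
    printf("%02x %02x %02x %02x\n", b[0], b[1], b[2], b[3]);
}
```

Calling print_bytes(0x12345678u) should reproduce the first diagram on a big-endian host and the second on a little-endian host.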
Why should we care about byte order? You may well ask. If the program you write runs only on a single host and never exchanges data with other programs, you can ignore byte order entirely. But what if your program has to interact with other programs? Two languages are worth discussing here. In C/C++, the order in which data is stored follows the CPU of the target platform, whereas Java's data streams always read and write multibyte values big endian. Imagine a C/C++ program on x86 communicating with a Java program. Take 0x12345678 again: the C program sends the four bytes exactly as they sit in memory (78 56 34 12), and because Java reads them as big endian, it interprets them as 0x78563412. A completely different number? Yes, that is the consequence. So the C program must convert the byte order before passing the data on to the Java program.
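One way to do that conversion on the C side, sketched here under the assumption that the peer reads a 32-bit integer big endian (as Java's DataInputStream.readInt does), is to build the byte sequence with shifts. Shifts work on the value, not on memory, so the result is the same on any host. The name put_be32 is made up for this example.

```c
#include <stdint.h>

/* Store v into buf[0..3] in big-endian order, whatever the host order is. */
void put_be32(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)((v >> 24) & 0xFF); /* most significant byte first */
    buf[1] = (unsigned char)((v >> 16) & 0xFF);
    buf[2] = (unsigned char)((v >> 8) & 0xFF);
    buf[3] = (unsigned char)(v & 0xFF);         /* least significant byte last */
}
```

The four bytes in buf can then be written to the socket or file as they are.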
Coincidentally, the TCP/IP protocols transmit multibyte integers big endian, so big endian is often called network byte order. When two hosts with different byte orders communicate, data must be converted to network byte order before transmission. The BSD socket API (not ANSI C itself) provides four functions for this conversion: htons, htonl, ntohs and ntohl.
Big endian: the most significant byte is at the lowest address and the least significant byte is at the highest address, so the bytes are stored in order.
Little endian: the least significant byte is at the lowest address and the most significant byte is at the highest address, so the bytes are stored in reverse order.
Endianness describes how logical units are mapped onto physical units when the smallest physical unit is smaller than the smallest logical unit. The smallest physical unit we usually deal with is the byte. In communications it is often the bit, but the principle is similar.
Example:
If we write 0x1234abcd to memory starting at address 0x0000, the result is:
Address   Big-endian   Little-endian
0x0000    0x12         0xcd
0x0001    0x34         0xab
0x0002    0xab         0x34
0x0003    0xcd         0x12
Currently, little endian is the mainstream, because the address does not have to be adjusted when a value is converted between data types of different widths (especially when pointers are cast).
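The pointer conversion point can be illustrated with a small sketch: read just the first byte at the address of a 32-bit variable. On a little-endian host that byte is the low-order byte, so a small value survives the narrowing without any address arithmetic; on a big-endian host the same read returns the high-order byte. The function name first_byte_at_address is made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Read only the byte stored at the lowest address of a 32-bit value. */
uint8_t first_byte_at_address(const uint32_t *p)
{
    uint8_t b;
    memcpy(&b, p, 1); /* same address, narrower type */
    return b;
}
```

For a variable holding 0x00000042, this returns 0x42 on a little-endian host and 0x00 on a big-endian one.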
PS:
These two terms come from Jonathan Swift's "Gulliver's Travels", in which two factions cannot agree on which end of a soft-boiled egg should be opened, the little end or the big end. :)
Swift was satirizing the constant conflicts between England and France in his day. Danny Cohen, an early pioneer of network protocols, was the first to use these two terms for byte order, and the terminology later became widely accepted.
From "Computer Systems: A Programmer's Perspective"
A good book :)
An Essay on Endian Order
Copyright (C) Dr. William T. Verts, April 19, 1996
Depending on which computing system you use, you will have to consider the byte order in which multibyte numbers are stored, particularly when you are writing those numbers to a file. The two orders are called "Little Endian" and "Big Endian".
The basics
"Little Endian" means that the low-order byte of the number is stored in memory at the lowest address, and the high-order byte at the highest address. (The little end comes first.) For example, a 4-byte LongInt
    Byte3 Byte2 Byte1 Byte0
will be arranged in memory as follows:
    Base Address+0   Byte0
    Base Address+1   Byte1
    Base Address+2   Byte2
    Base Address+3   Byte3
Intel processors (those used in PCs) use "Little Endian" byte order.
"Big Endian" means that the high-order byte of the number is stored in memory at the lowest address, and the low-order byte at the highest address. (The big end comes first.) Our LongInt would then be stored as:
    Base Address+0   Byte3
    Base Address+1   Byte2
    Base Address+2   Byte1
    Base Address+3   Byte0
Motorola processors (those used in Macs) use "Big Endian" byte order.
Which is better?
You may see a lot of discussion about the relative merits of the two formats, mostly religious arguments based on the relative merits of the PC versus the Mac. Both formats have their advantages and disadvantages.
In "Little Endian" form, assembly language instructions for picking up a 1, 2, 4, or longer byte number proceed in exactly the same way for all formats: first pick up the lowest-order byte at offset 0. Also, because of the 1:1 relationship between address offset and byte number (offset 0 is byte 0), multiple precision math routines are correspondingly easy to write.
In "Big Endian" form, by having the high-order byte come first, you can always test whether the number is positive or negative by looking at the byte at offset zero. You don't have to know how long the number is, nor do you have to skip over any bytes to find the byte containing the sign information. The numbers are also stored in the order in which they are printed out, so binary to decimal routines are particularly efficient.
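Both advantages can be sketched in a few lines of C. The first function assumes a two's complement number stored big endian and checks its sign from the byte at offset zero, without knowing the length. The second adds two numbers stored little endian, walking upward from offset 0 and carrying as it goes. The names be_is_negative and le_add are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Big endian: the sign bit is the top bit of the byte at offset 0,
   no matter how many bytes the number occupies. */
int be_is_negative(const uint8_t *num)
{
    return (num[0] & 0x80) != 0;
}

/* Little endian: add two n-byte numbers. Offset 0 is the least
   significant byte, so one upward loop propagates the carry.
   Returns the carry out of the top byte. */
unsigned le_add(uint8_t *sum, const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned t = (unsigned)a[i] + (unsigned)b[i] + carry;
        sum[i] = (uint8_t)(t & 0xFF);
        carry = t >> 8;
    }
    return carry;
}
```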
What does that mean for us?
What endian order means is that any time numbers are written to a file, you have to know how the file is supposed to be constructed. If you write out a graphics file (such as a .BMP file) on a machine with "Big Endian" integers, you must first reverse the byte order, or a "standard" program to read your file won't work.
The Windows .BMP format, since it was developed on a "Little Endian" architecture, insists on the "Little Endian" format. You must write your Save_BMP code this way, regardless of the platform you are using.
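A platform-independent way to honor that rule is to emit each field byte by byte instead of dumping an in-memory integer. The sketch below fills only the first six bytes of a BMP file header (the "BM" signature followed by the 32-bit file size); put_le32 and bmp_signature_and_size are hypothetical helper names, and a real Save_BMP routine would have many more fields to write.

```c
#include <stdint.h>

/* Store v into buf[0..3] in little-endian order on any host. */
void put_le32(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v & 0xFF);         /* least significant byte first */
    buf[1] = (unsigned char)((v >> 8) & 0xFF);
    buf[2] = (unsigned char)((v >> 16) & 0xFF);
    buf[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Fill the start of a BMP file header: 'B', 'M', then the file size. */
void bmp_signature_and_size(unsigned char hdr[6], uint32_t file_size)
{
    hdr[0] = 'B';
    hdr[1] = 'M';
    put_le32(hdr + 2, file_size);
}
```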
Common file formats and their endian order are as follows:
- Adobe Photoshop -- Big Endian
- BMP (Windows and OS/2 Bitmaps) -- Little Endian
- DXF (AutoCAD) -- Variable
- GIF -- Little Endian
- IMG (GEM Raster) -- Big Endian
- JPEG -- Big Endian
- FLI (Autodesk Animator) -- Little Endian
- MacPaint -- Big Endian
- PCX (PC Paintbrush) -- Little Endian
- PostScript -- Not Applicable (text!)
- POV (Persistence of Vision ray-tracer) -- Not Applicable (text!)
- QTM (QuickTime Movies) -- Little Endian (on a Mac!)
- Microsoft RIFF (.WAV & .AVI) -- Both
- Microsoft RTF (Rich Text Format) -- Little Endian
- SGI (Silicon Graphics) -- Big Endian
- Sun Raster -- Big Endian
- TGA (Targa) -- Little Endian
- TIFF -- Both, endian identifier encoded into file
- WPG (WordPerfect Graphics Metafile) -- Big Endian (on a PC!)
- XWD (X Window Dump) -- Both, endian identifier encoded into file
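As an example of an endian identifier encoded into the file, a TIFF file begins with the two bytes "II" for little endian or "MM" for big endian, followed by the 16-bit number 42 written in that order. The check below is a rough sketch that looks only at those first four bytes; the enum and the name tiff_byte_order are invented for this illustration.

```c
enum tiff_order { TIFF_INVALID = 0, TIFF_LITTLE, TIFF_BIG };

/* Inspect the first four bytes of a TIFF file and report its byte order. */
enum tiff_order tiff_byte_order(const unsigned char hdr[4])
{
    enum tiff_order order;
    unsigned magic;

    if (hdr[0] == 'I' && hdr[1] == 'I') {
        order = TIFF_LITTLE;
        magic = (unsigned)hdr[2] | ((unsigned)hdr[3] << 8); /* low byte first */
    } else if (hdr[0] == 'M' && hdr[1] == 'M') {
        order = TIFF_BIG;
        magic = ((unsigned)hdr[2] << 8) | (unsigned)hdr[3]; /* high byte first */
    } else {
        return TIFF_INVALID;
    }
    /* The magic number must read as 42 in the declared order. */
    return magic == 42 ? order : TIFF_INVALID;
}
```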
Correcting for the non-native order
It is pretty easy to reverse a multibyte integer if you find you need the other format. A single function can be used to switch from one to the other, in either direction. A simple and not very efficient version might look as follows:
Function Reverse (N: LongInt): LongInt;
  Var B0, B1, B2, B3: Byte;
Begin
  { Peel off the bytes one at a time, lowest-order byte first }
  B0 := N Mod 256;
  N  := N Div 256;
  B1 := N Mod 256;
  N  := N Div 256;
  B2 := N Mod 256;
  N  := N Div 256;
  B3 := N Mod 256;
  { Reassemble with B0 as the new highest-order byte }
  Reverse := (((B0 * 256 + B1) * 256 + B2) * 256 + B3);
End;
A more efficient version that depends on the presence of hexadecimal numbers, the bit masking operators AND, OR, and NOT, and the shift operators SHL and SHR might look as follows:
Function Reverse (N: LongInt): LongInt;
  Var B0, B1, B2, B3: Byte;
Begin
  { Mask out each byte and shift it down to the low position }
  B0 := (N And $000000FF) Shr 0;
  B1 := (N And $0000FF00) Shr 8;
  B2 := (N And $00FF0000) Shr 16;
  B3 := (N And $FF000000) Shr 24;
  { Shift the bytes back up in the opposite order }
  Reverse := (B0 Shl 24) Or (B1 Shl 16) Or (B2 Shl 8) Or (B3 Shl 0);
End;
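For readers working in C rather than Pascal, the same mask-and-shift idea might be sketched as follows. Using an unsigned 32-bit type avoids sign-extension surprises on the right shifts. The name reverse32 is simply a label for this example, not a standard function.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value with masks and shifts. */
uint32_t reverse32(uint32_t n)
{
    return ((n & 0x000000FFu) << 24) |  /* byte 0 moves to byte 3 */
           ((n & 0x0000FF00u) << 8)  |  /* byte 1 moves to byte 2 */
           ((n & 0x00FF0000u) >> 8)  |  /* byte 2 moves to byte 1 */
           ((n & 0xFF000000u) >> 24);   /* byte 3 moves to byte 0 */
}
```

Applying the function twice returns the original value, which is why one routine serves both directions.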
Network byte order and host byte order
Different CPUs use different byte orders. Byte order here means the order in which the bytes of an integer are stored in memory. This is called the host byte order.
The two most common are:
1. Little endian: stores the low-order byte at the starting address.
2. Big endian: stores the high-order byte at the starting address.
LE (little-endian)
The byte order that best fits the way people reason
The low address stores the low-order part of the value
The high address stores the high-order part of the value
This byte order fits the way people reason because it follows a natural rule:
the low-order part of the value is "small", so it goes where the memory address is small, that is, at the low address;
conversely, the high-order part goes where the memory address is large, that is, at the high address.
BE (big-endian)
The most intuitive byte order
The low address stores the high-order part of the value
The high address stores the low-order part of the value
Why is it intuitive? Because no mapping rule is needed.
Write the memory addresses from left to right in ascending order.
Write the value in its usual order, from the high-order byte to the low-order byte.
Then line the two up and fill in the bytes one at a time.
Example: how the double word (DWORD) 0x01020304 is stored in memory
Memory address   4000  4001  4002  4003
LE               04    03    02    01
BE               01    02    03    04
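Reading the table in the other direction, the same four bytes decode to different numbers depending on which order the reader assumes. Here is a small sketch with two made-up helpers, get_le32 and get_be32, that decode a byte buffer independently of the host's own order.

```c
#include <stdint.h>

/* Decode four bytes as a little-endian 32-bit value (buf[0] is the LSB). */
uint32_t get_le32(const unsigned char *buf)
{
    return (uint32_t)buf[0] |
           ((uint32_t)buf[1] << 8) |
           ((uint32_t)buf[2] << 16) |
           ((uint32_t)buf[3] << 24);
}

/* Decode four bytes as a big-endian 32-bit value (buf[0] is the MSB). */
uint32_t get_be32(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8) |
           (uint32_t)buf[3];
}
```

The LE row (04 03 02 01) passed to get_le32 and the BE row (01 02 03 04) passed to get_be32 both give 0x01020304; swapping the decoders gives 0x04030201.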
Example: if we write 0x1234abcd to memory starting at address 0x0000, the result is:
Address   Big-endian   Little-endian
0x0000    0x12         0xcd
0x0001    0x34         0xab
0x0002    0xab         0x34
0x0003    0xcd         0x12
x86 series CPUs use little-endian byte order.
Network byte order is the data representation format specified by TCP/IP. It is independent of the CPU type and the operating system, which ensures that data is interpreted correctly when it is transmitted between different hosts. Network byte order is big endian.
The BSD socket API provides the following four conversion functions:
htons converts a 16-bit unsigned short from host byte order to network byte order
htonl converts a 32-bit unsigned integer (an unsigned long in the original BSD API) from host byte order to network byte order
ntohs converts a 16-bit unsigned short from network byte order to host byte order
ntohl converts a 32-bit unsigned integer (an unsigned long in the original BSD API) from network byte order to host byte order
On little-endian systems, these functions swap the byte order.
On big-endian systems, these functions are defined as no-op macros.
When developing network programs or cross-platform programs, you should also make sure that both sides agree on a single byte order. Otherwise, the two parties will interpret the data differently, which leads to bugs.
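A typical use of these functions is to convert a value just before it is copied into an outgoing buffer and just after it is copied out of an incoming one. The following is a minimal sketch that assumes a POSIX system, where the functions are declared in <arpa/inet.h>; pack_u32 and unpack_u32 are hypothetical names.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Place a 32-bit host value into buf in network (big-endian) order. */
void pack_u32(unsigned char *buf, uint32_t host_value)
{
    uint32_t net = htonl(host_value);
    memcpy(buf, &net, sizeof net);
}

/* Read a 32-bit network-order value from buf and return it in host order. */
uint32_t unpack_u32(const unsigned char *buf)
{
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    return ntohl(net);
}
```

Because the conversion is a no-op on big-endian hosts and a swap on little-endian ones, the bytes in buf come out the same on both.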
Note:
1. Network and host byte order conversion functions: htons, ntohs, htonl, ntohl (s stands for short, l for long, h for host, n for network).
2. Different operating systems run on different CPUs, so their byte orders differ as well. For more information, see