int i = 1;
char *p = (char *)&i;   /* p points at the lowest-addressed byte of i */
if (*p == 1)
    printf("1");        /* little-endian */
else
    printf("2");        /* big-endian */
This is the endianness (byte-order) question. In little-endian mode (assuming int is at least two bytes long), the byte of i at the lowest address holds 1 and the other bytes hold 0. In big-endian mode, the 1 is stored in the byte of i at the highest address. A char is one byte, so casting &i to a char pointer makes p point at the lowest-addressed byte of i. If the value at p is 1, the machine is little-endian.
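The snippet above can be wrapped into a self-contained function. This is a minimal sketch, and the names `is_little_endian_ptr` and `print_endianness` are hypothetical. It uses `unsigned char` instead of plain `char`, which avoids sign surprises when inspecting raw bytes.

```c
#include <stdio.h>

/* Returns 1 on a little-endian host, 0 on a big-endian host.
 * Casting &i to unsigned char * makes p point at the lowest-addressed
 * byte of i, which holds 1 only if the least significant byte comes first. */
int is_little_endian_ptr(void)
{
    int i = 1;
    const unsigned char *p = (const unsigned char *)&i;
    return *p == 1;
}

/* Prints "1" on a little-endian host and "2" on a big-endian host,
 * matching the behavior of the original snippet. */
void print_endianness(void)
{
    fputs(is_little_endian_ptr() ? "1" : "2", stdout);
}
```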
Question: write a C function that returns 0 if the processor is big-endian and 1 if it is little-endian.
Answer:
int checkcpu(void)
{
    union w {
        int a;
        char b;
    } c;
    c.a = 1;
    return (c.b == 1);
}
Analysis: embedded system developers should be well aware of little-endian and big-endian modes. A little-endian CPU stores an operand from its low byte to its high byte (the least significant byte at the lowest address), while a big-endian CPU stores it from its high byte to its low byte. For example, the 16-bit value 0x1234 is stored in the memory of a little-endian CPU (assuming it starts at address 0x4000) as follows:
Memory address | 0x4000 | 0x4001
Stored content | 0x34 | 0x12
A big-endian CPU stores it as follows:
Memory address | 0x4000 | 0x4001
Stored content | 0x12 | 0x34
The 32-bit value 0x12345678 is stored in the memory of a little-endian CPU (again assuming it starts at address 0x4000) as follows:
Memory address | 0x4000 | 0x4001 | 0x4002 | 0x4003
Stored content | 0x78 | 0x56 | 0x34 | 0x12
A big-endian CPU stores it as follows:
Memory address | 0x4000 | 0x4001 | 0x4002 | 0x4003
Stored content | 0x12 | 0x34 | 0x56 | 0x78
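The layouts in these tables can be observed directly by copying a value's in-memory representation into a byte array. A minimal sketch; the helper names `bytes_of_u16`, `bytes_of_u32` and `dump` are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy the in-memory representation of a value into out[], lowest
 * address first, so the host's byte order can be inspected. */
void bytes_of_u16(uint16_t v, unsigned char out[2]) { memcpy(out, &v, sizeof v); }
void bytes_of_u32(uint32_t v, unsigned char out[4]) { memcpy(out, &v, sizeof v); }

/* Print each byte with its offset from the start address,
 * e.g. "+0: 0x78" for the lowest address. */
void dump(const unsigned char *b, size_t n)
{
    for (size_t k = 0; k < n; k++)
        printf("+%zu: 0x%02X\n", k, b[k]);
}
```

On an x86 host, dumping 0x12345678 this way shows offsets +0 to +3 holding 0x78, 0x56, 0x34, 0x12, matching the little-endian table.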
All members of a union are stored starting from the same, lowest address. The interviewer's solution exploits this to determine easily whether the CPU reads and writes memory in little-endian or big-endian mode. Anyone who can give this answer on the spot is a very capable programmer.

Supplement: big-endian mode means the low-order part of the data (the less significant bytes) is stored at the higher memory addresses, and the high-order part at the lower addresses. This resembles treating the data as a string: addresses increase from small to large while the data runs from high-order to low-order. Little-endian mode means the low-order part of the data is stored at the lower memory addresses and the high-order part at the higher addresses. This mode ties address order to byte significance: the higher-address part carries the higher weight and the lower-address part the lower weight, which is consistent with our usual logic.

Why do big-endian and little-endian modes exist at all? In a computer system, memory is addressed in bytes: each address unit corresponds to one byte, and one byte is 8 bits. But C has more than the 8-bit char; there is also the 16-bit short and the 32-bit long (the exact sizes depend on the compiler). Moreover, on processors wider than 8 bits, such as 16-bit or 32-bit processors, the registers are wider than one byte, so the question of how to arrange multiple bytes inevitably arises. The answer is the big-endian and little-endian storage modes. For example, suppose a 16-bit short x is at address 0x0010 and its value is 0x1122; then 0x11 is the high byte and 0x22 the low byte. In big-endian mode, 0x11 goes at the low address 0x0010 and 0x22 at the high address 0x0011. Little-endian mode is exactly the opposite. The commonly used x86 architecture is little-endian, while Keil C51 is big-endian.
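The union-based interview answer can be compiled as is. This is a minimal sketch that keeps the `checkcpu` name from the question; reading `c.b` after writing `c.a` relies on union type punning, which C permits.

```c
/* Returns 1 on a little-endian host, 0 on a big-endian host.
 * Both members start at the same (lowest) address, so b aliases
 * the lowest-addressed byte of a. */
int checkcpu(void)
{
    union w {
        int a;
        char b;
    } c;
    c.a = 1;
    return c.b == 1;
}
```

On x86 this returns 1; on a big-endian target such as a PowerPC in big-endian mode it returns 0.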
Many ARM and DSP processors are little-endian, and some ARM processors can select big-endian or little-endian mode in hardware. The following code tests whether your compiler (target) is big-endian or little-endian:
short int x;
char x0, x1;
x = 0x1122;
x0 = ((char *)&x)[0];   /* low-address unit */
x1 = ((char *)&x)[1];   /* high-address unit */
If x0 == 0x11, the mode is big-endian; if x0 == 0x22, it is little-endian. The program also shows that a multi-byte datum is addressed by the address of its lowest-addressed byte.
-----------------------------------------------------------------------------------------------
When do you need to convert byte order?
When communicating short or long data, it is best to always:
1. Use htons (or htonl) when sending;
2. Use ntohs (or ntohl) when receiving,
without worrying about whether the two ends of the communication actually need the conversion. Of course, I never communicate raw int data; I always communicate with strings: the sender builds the string with sprintf, and the receiver converts it back with atoi.
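The string approach avoids byte-order problems because text is a sequence of single bytes that reads the same on any host. A minimal sketch; `encode_int` and `decode_int` are hypothetical names, and `snprintf` is used in place of `sprintf` to guard against buffer overflow.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sender: format the number as decimal text. Characters are single
 * bytes, so the text is identical on big- and little-endian hosts. */
int encode_int(char *buf, size_t size, int value)
{
    return snprintf(buf, size, "%d", value);
}

/* Receiver: parse the text back. atoi() performs no error checking;
 * strtol() is the safer choice when input may be malformed. */
int decode_int(const char *buf)
{
    return atoi(buf);
}
```

The cost is size and parsing time: 0x1122 travels as the five characters "4386" instead of two bytes.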
-----------------------------------------------------------------------------------------------
The term endian comes from Jonathan Swift's Gulliver's Travels. In the book, people are divided into two camps by how they break their eggs: those who crack the egg at the round (big) end are the Big-Endians, and those who crack it at the pointed (little) end are the Little-Endians. A civil war in Lilliput broke out over whether eggs should be cracked at the big end or the little end. In the computer industry, big endian versus little endian nearly caused a war as well. In computing, endianness refers to the order in which the bytes of data are stored in memory. The following examples illustrate the difference between the two modes.
If the 32-bit integer 0x12345678 is stored in an int variable, the variable is laid out in memory in big-endian or little-endian order as shown in the following table. For simplicity, OP0 denotes the most significant byte (MSB) of the 32-bit value and OP3 the least significant byte (LSB).
Address offset | Big-endian mode | Little-endian mode
0x00 | 0x12 (OP0) | 0x78 (OP3)
0x01 | 0x34 (OP1) | 0x56 (OP2)
0x02 | 0x56 (OP2) | 0x34 (OP1)
0x03 | 0x78 (OP3) | 0x12 (OP0)
Little-endian: more significant bytes are stored at higher memory addresses, and less significant bytes at lower memory addresses.
Big-endian: more significant bytes are stored at lower memory addresses, and less significant bytes at higher memory addresses.
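The OP0 to OP3 layouts in the table can be produced explicitly with shifts. Because shifts operate on values rather than on memory, these functions write the same bytes on any host. A minimal sketch; `store_be32` and `store_le32` are hypothetical names.

```c
#include <stdint.h>

/* Write v into out[0..3] in big-endian order: OP0 (MSB) at offset 0. */
void store_be32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24); /* OP0 */
    out[1] = (unsigned char)(v >> 16); /* OP1 */
    out[2] = (unsigned char)(v >> 8);  /* OP2 */
    out[3] = (unsigned char)v;         /* OP3 */
}

/* Write v into out[0..3] in little-endian order: OP3 (LSB) at offset 0. */
void store_le32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)v;         /* OP3 */
    out[1] = (unsigned char)(v >> 8);  /* OP2 */
    out[2] = (unsigned char)(v >> 16); /* OP1 */
    out[3] = (unsigned char)(v >> 24); /* OP0 */
}
```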
If the 16-bit integer 0x1234 is stored in a short variable, the variable is laid out in memory in the two modes as shown in the following table. Here OP0 is the high byte and OP1 the low byte.
Address offset | Big-endian mode | Little-endian mode
0x00 | 0x12 (OP0) | 0x34 (OP1)
0x01 | 0x34 (OP1) | 0x12 (OP0)
As the tables show, the main difference between the two modes is byte order: big-endian stores the high-order byte at the low address, while little-endian stores the high-order byte at the high address. Big-endian storage matches normal human reading order, while little-endian storage is convenient for computer processing. To date, the relative merits of the two remain undecided.
Some processors store data in little-endian order, such as Intel's Pentium. Others store data in big-endian order, such as IBM Semiconductor and Freescale PowerPC processors. This is not limited to processors: some peripherals are also designed with a choice of big-endian or little-endian storage.
Therefore, a single processor system may contain big-endian and little-endian components at the same time. This causes considerable trouble for hardware and software design and requires system designers to understand the differences between the two modes thoroughly. The differences show up at the level of processor registers, instruction sets, system buses, and more.
"Using the function to determine whether the system is big Endian or Little Endian"
#include <stdbool.h>

/* Returns true if the byte order is big-endian,
 * false if it is little-endian. */
bool IsBig_Endian(void)
{
    unsigned short test = 0x1234;
    if (*(unsigned char *)&test == 0x12)  /* lowest-addressed byte */
        return true;
    else
        return false;
} // IsBig_Endian()
Note:
The unit of endianness is the byte. The bits inside each byte keep their normal order; only when bytes are assembled into an int or a long are the individual bytes placed in different positions.
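A byte swap makes this concrete: converting between the two orders moves whole bytes, while the bits inside each byte stay put. A minimal sketch with a hypothetical `swap32`.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. Whole bytes move; the
 * bits inside each byte keep their order (0x12 stays 0x12, it does
 * not become the bit-reversed 0x48). */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

For example, swap32(0x12345678) yields 0x78563412, and swapping twice returns the original value.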
------------------------------------------------------------------------------------------------
As is well known, the same data can be stored and presented in many different orders; that is, storage and presentation formats are diverse [1]. The same data, converted to binary, is also represented differently in the memory of different computers, which affects programming. This paper discusses how big-endian and little-endian storage modes affect programming and what countermeasures to take when programming.
1 Overview of big-endian and little-endian storage modes
A computer's memory consists of a large number of storage elements. A storage element is the smallest physical unit of memory and holds one binary digit, 0 or 1. These elements are grouped by a fixed number of bits (usually 1, 2, 4, or 8 times the 8-bit byte), and all elements in a group are read or written at the same time; such a group is a storage unit [2]. Each storage unit has a unique number called its unit address. The storage unit is the basic unit in which the CPU accesses memory: the CPU reads or writes a storage unit through its address. Different computers organize storage-unit addresses differently. If the smallest addressable unit is a word, the memory is word-addressable, as shown in Figure 1(a). If the smallest addressable unit is a byte, the memory is byte-addressable, and each access to a storage unit covers several individually addressed bytes. Figure 1(b) shows the PDP-11: a storage unit holds 2 bytes, the low byte at the even address and the high byte at the odd address, and the word address is a multiple of 2, namely the address of its low byte. Figure 1(c) shows the IBM-370: a storage unit holds 4 bytes, and the word address is a multiple of 4, namely the address of its high byte.

Figure 1: Different memory addressing methods

Byte addressing is the mainstream memory addressing method today, because the smallest unit of data processing is the byte. From the software point of view, memory is a large byte array. CPUs and compilers typically encode data in several formats, such as integers and floating-point numbers of different lengths, to support multiple data types.
When a program defines variables of these types, byte space is allocated for them in memory, and when the CPU reads or writes these variables it often transfers several bytes at once, depending on the data type. Data longer than one byte (usually 2^N bytes, N = 1, 2, 3) can be stored in memory in two ways, which are the two mapping mechanisms between half-words, words, and double-words and their bytes [3]. In the first, the low-order bytes of the data are stored at the low memory addresses and the high-order bytes at the high addresses; this is called little-endian order, or the little-endian storage mode. In the second, the high-order bytes are stored at the low addresses and the low-order bytes at the high addresses; this is called big-endian order, or the big-endian storage mode. It is easy to see that Figure 1(b) shows little-endian mode and Figure 1(c) shows big-endian mode. There is no technical reason for a processor to support one mode or the other; the choice reflects the manufacturer's stance and conventions.
2 Problems caused by different storage modes
For most programmers, a machine's byte order is completely invisible: whichever storage mode the processor uses, compiled programs give the same results. That is, when the same source code is compiled and run on a little-endian machine and on a big-endian machine, the results are the same. Although the same data is represented differently in memory, from the point of view of the application programmer and the user there is no difference in arithmetic and logic operations or in the data written and read. In some situations, however, byte order becomes an issue.
2.1 The UNIX problem: program portability
When an early version of UNIX was ported from the PDP-11 to an IBM machine, the data "UNIX", held on the 16-bit-word PDP-11 as two words of 2 characters each (4 bytes), became "NUXI" on the big-endian IBM machine. This is called the UNIX problem (also known as the NUXI problem). Therefore, when porting programs between processors with different byte orders, pay special attention to the effect of the storage mode.
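The NUXI effect can be simulated, assuming the string is handled as two 16-bit word values: the PDP-11 packs the first character of each pair into the low byte, and a big-endian machine stores the high byte of the same value first. A minimal sketch; all function names are hypothetical.

```c
#include <stdint.h>

/* Pack two characters into a 16-bit word the way a little-endian
 * PDP-11 holds them: the first character sits in the low byte. */
static uint16_t pack_le_word(char first, char second)
{
    return (uint16_t)((unsigned char)first | ((unsigned char)second << 8));
}

/* Lay a word out in memory high byte first, as a big-endian
 * machine does when it stores the same numeric value. */
static void store_word_be(char out[2], uint16_t w)
{
    out[0] = (char)(w >> 8);
    out[1] = (char)(w & 0xFF);
}

/* Treat a 4-character string as two PDP-11 words and produce the
 * bytes a big-endian machine stores for them: "UNIX" -> "NUXI". */
void nuxi(const char in[4], char out[5])
{
    store_word_be(out,     pack_le_word(in[0], in[1]));
    store_word_be(out + 2, pack_le_word(in[2], in[3]));
    out[4] = '\0';
}
```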
2.2 Problems reading, interpreting, and sharing binary data
When a disassembler reads the machine-level binary of the same executable, it shows different results on processors with different storage modes. When binary data is interpreted or shared, or when masks are applied, different byte orders also produce different results. For example, if a 32-bit little-endian machine writes the constant 0xe48623a0 to a binary file, reading it back in big-endian order gives 0xa02386e4. If the data is an IPv4 address, the bitwise AND with the appropriate mask must also take the storage mode into account.
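The example can be reproduced by decoding the four bytes a little-endian machine writes for 0xe48623a0 (A0 23 86 E4, lowest address first) in both orders. A minimal sketch with hypothetical `load_le32` and `load_be32`; building the value with shifts makes the result independent of the host that runs the decoder.

```c
#include <stdint.h>

/* Interpret 4 bytes (lowest address first) as a little-endian value. */
uint32_t load_le32(const unsigned char b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Interpret the same 4 bytes as a big-endian value. */
uint32_t load_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

A binary format that fixes its byte order should always be read with the matching loader, whatever the host.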
2.3 Problems with network data transmission
When binary data is transmitted over the network between processors with different storage modes, the high and low bytes get swapped. For example, suppose a 32-bit little-endian machine sends the constant 0x01234567; the send and receive buffers then hold the bytes 0x67, 0x45, 0x23, 0x01 (from low address to high). If the receiving machine is a 32-bit big-endian machine, it reads the value 0x67452301: compared with the original value, the high and low bytes have traded places.
3 Strategies for programming
When programming, consider whether the application depends on the storage mode. If you need to port programs, share data, or communicate over a network between processors with different storage modes, the following countermeasures can help. If the program is to be ported, add pre-compilation conditions for the different storage modes to the program, and define the storage mode of the current processor and compiler in a header file, for example:
#define Little_End
#ifdef Big_End
...
#endif
#ifdef Little_End
...
#endif
If data is to be shared, there are two solutions: (1) share data in a single storage order, so only one format has to be interpreted and decoding is simple; (2) allow each host to share data in its own storage order but mark which mode is used. The data does not need to be converted from its original order, so encoding is easy, and when the sender and receiver use the same storage mode, no byte-order change is needed when decoding, which improves communication efficiency.
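Instead of editing the #define by hand, the switch can be derived from the compiler. A minimal sketch, assuming GCC or Clang, which predefine `__BYTE_ORDER__`, `__ORDER_BIG_ENDIAN__` and `__ORDER_LITTLE_ENDIAN__`; other compilers need a different check. The macro names `Big_End` and `Little_End` follow the text, and `to_big_end16` is a hypothetical helper.

```c
#include <stdint.h>

/* Select Big_End or Little_End from the compiler's predefined macros. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define Big_End
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define Little_End
#else
#error "Unknown byte order: define Big_End or Little_End by hand"
#endif

/* Convert a 16-bit value between host order and big-endian order. */
uint16_t to_big_end16(uint16_t v)
{
#ifdef Little_End
    return (uint16_t)((v << 8) | (v >> 8)); /* swap high and low bytes */
#else
    return v; /* Big_End: already in big-endian order */
#endif
}
```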
For network communication, programs can follow the standard network byte order (big-endian) defined by TCP/IP: the sending processor first converts outgoing data to network byte order, and the receiving processor converts it from network byte order to its own internal representation. The Berkeley sockets API defines a set of conversion functions: htonl and htons convert 32-bit long and 16-bit short integer values from host byte order to network byte order, while ntohl and ntohs convert from network byte order to host byte order.
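Applied to the 0x01234567 example from section 2.3, converting with htonl before sending and ntohl after receiving makes the buffer hold 01 23 45 67 on every host. A minimal sketch, assuming a POSIX system where these functions are declared in `<arpa/inet.h>` (Winsock declares them in `<winsock2.h>`); `pack_u32` and `unpack_u32` are hypothetical names. htons and ntohs work the same way for 16-bit values.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Sender: convert to network byte order, then copy into the send buffer. */
void pack_u32(unsigned char buf[4], uint32_t host_value)
{
    uint32_t net = htonl(host_value);
    memcpy(buf, &net, sizeof net);
}

/* Receiver: copy out of the receive buffer, then convert to host order. */
uint32_t unpack_u32(const unsigned char buf[4])
{
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    return ntohl(net);
}
```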
4 Case Study
ZLG/IP can run on big-endian machines as well as on little-endian machines, which raises the question of program portability. ZLG/IP is an embedded network communication protocol stack, so it inevitably involves data sharing and network transmission among multiple hosts, and it therefore runs into the problems caused by different storage modes. How does it solve them? For reasons of space, only part of the source code of the IP header send, checksum, and receive functions in IP.c is excerpted here. The storage mode does not affect reading or writing the byte-sized members of the IP header, nor reading or writing the IP address (which is stored as a byte array in byte order), so only the 16-bit members of the IP header are analyzed.
4.1 Send function
The original version defines a local byte array. Whichever storage mode it is compiled and run in, each half-word (16-bit) variable is split into two bytes, with the high byte written at the low address and the low byte at the high address; that is, the bytes are written directly into the array in big-endian order. This keeps the memory representation in the transmit and receive buffers consistent, and therefore keeps the IP header transmitted over the network consistent.
eip e_ip;
uint8 IpHeadUint8[20];
...
e_ip.TotalLen = (*TxdData).Length + 20;
IpHeadUint8[2] = (e_ip.TotalLen & 0xff00) >> 8;  /* high byte at low address */
IpHeadUint8[3] = e_ip.TotalLen & 0x00ff;         /* low byte at high address */
...
e_ip.Crc = CreateIpHeadCrc(IpHeadUint8);
IpHeadUint8[10] = (e_ip.Crc & 0xff00) >> 8;
IpHeadUint8[11] = e_ip.Crc & 0x00ff;
......
TxdIpData.DAPTR = IpHeadUint8;
Send_Ip_To_LLC(&TxdIpData, e_ip.DestId, num);
The author's approach uses a union type, union IP_RC, which saves the space of the local byte array and handles the two modes separately. In big-endian mode the original value is written directly; in little-endian mode the original value is written first and then the high and low bytes of the member are swapped. The values finally written differ between the modes, but the memory representation is the same (the big-endian order of the original value).
union IP_RC { eip e_ip; struct { uint16 wordbuf[10]; } words; };
union IP_RC IpHead;
......
IpHead.e_ip.TotalLen = (*TxdData).Length + 20;
#ifdef Little_End
IpHead.e_ip.TotalLen = swap_int16(IpHead.e_ip.TotalLen);
#endif
...
IpHead.e_ip.Crc = CreateIpHeadCrc_1(IpHead.words.wordbuf);
#ifdef Little_End
IpHead.e_ip.Crc = swap_int16(IpHead.e_ip.Crc);
#endif
...
TxdIpData.DAPTR = (uint8 *)&IpHead.e_ip;
Send_Ip_To_LLC(&TxdIpData, IpHead.e_ip.DestId, num);
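The excerpt does not show swap_int16 or the full header layout. The sketch below is a hypothetical, much-reduced stand-in: a union viewing one 16-bit field as raw bytes, written with the "store the original value, then swap on little-endian" rule. It uses a runtime check in place of `#ifdef Little_End` so it compiles anywhere; `union hdr16` and `set_total_len` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical stand-in for ZLG/IP's swap_int16: exchange the high
 * and low bytes of a 16-bit value. */
uint16_t swap_int16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Reduced header view in the spirit of union IP_RC: the same storage
 * seen as a 16-bit field and as raw bytes. */
union hdr16 {
    uint16_t total_len;
    unsigned char bytes[2];
};

/* Store len so memory holds it in big-endian order on any host:
 * write the original value, then swap only on little-endian hosts. */
void set_total_len(union hdr16 *h, uint16_t len)
{
    const uint16_t one = 1;
    int little = *(const unsigned char *)&one == 1; /* runtime check */
    h->total_len = len;
    if (little)
        h->total_len = swap_int16(h->total_len);
}
```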
4.2 IP Header checksum calculation function
Regardless of the storage mode, to guarantee a unique checksum result, the original version reads the passed-in 8-bit byte array (whose memory representation is big-endian order by default) as 16-bit half-words in big-endian order and sums them.
union w Crc; // type union w is defined in IP.h, differently for each storage mode
Crc.dwords = 0;
for (i = 0; i < 10; i++)
    Crc.dwords = Crc.dwords + ((uint32)ip[2 * i] << 8) + (uint32)ip[2 * i + 1];
The author's checksum function is more concise and general. Its parameter is an array of 16-bit half-words whose memory representation is always big-endian. In big-endian mode, each iph[i] read is already the original value and can be accumulated directly; in little-endian mode, the high and low bytes of each iph[i] must be swapped to recover the original value before it is accumulated.
uint32 temp = 0;
for (i = 0; i < 10; i++)
{
#ifdef Big_End
    temp = temp + (uint32)iph[i];
#endif
#ifdef Little_End
    temp = temp + (uint32)swap_int16(iph[i]);
#endif
}
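For comparison, the standard Internet checksum (RFC 1071) can read the header bytes in pairs with shifts, which yields the same big-endian half-words on any host without an #ifdef. This is a different technique from both versions above, sketched with a hypothetical `ip_header_checksum`; the checksum field must be zero before calling.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over an IP header held as bytes in network
 * (big-endian) order. Each pair is read as (hi << 8) | lo, so the
 * 16-bit words are the same on big- and little-endian hosts. */
uint16_t ip_header_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    if (len & 1)                       /* odd length: pad with a zero byte */
        sum += (uint32_t)hdr[len - 1] << 8;
    while (sum >> 16)                  /* fold carries back into 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;             /* ones' complement of the sum */
}
```

For the 20-byte header 45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01 c0 a8 00 c7, the function returns 0xb861; with that value stored in bytes 10 and 11, recomputing over the whole header gives 0.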
4.3 Receive function
In the original version, the IP header passed in is byte-array data in big-endian order. Read in big-endian mode, it gives the correct original value. When compiled and run in little-endian mode, the memory representation is still big-endian but is read in little-endian order, so the result has its high and low bytes reversed relative to the original value; the bytes must be swapped back to get the correct original value.
#ifdef Big_End
PackedLength = ((eip *)RecData)->TotalLen;
#endif
#ifdef Little_End
PackedLength = ((eip *)RecData)->TotalLen;
ltemp = PackedLength & 0x00ff;                /* save the low byte */
PackedLength = (PackedLength & 0xff00) >> 8;  /* move the high byte down */
PackedLength = PackedLength + (ltemp << 8);   /* move the low byte up */
#endif
In the author's code, the read is the same whatever the storage mode; only in the special case of little-endian mode are the high and low bytes swapped afterwards to restore the original value.
PackedLength = ((eip *)RecData)->TotalLen;
#ifdef Little_End
PackedLength = swap_int16(PackedLength);
#endif
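Another option, not used in the excerpt, is to build the field from individual bytes of the receive buffer. This gives the same result on both kinds of host with no `#ifdef Little_End` branch, and it also avoids casting a byte buffer to a struct pointer, which can run into alignment issues on some processors. A minimal sketch; `read_be16` and `ip_total_len` are hypothetical names.

```c
#include <stdint.h>

/* Read a big-endian 16-bit field from a byte buffer. Combining the
 * bytes with shifts is independent of the host's byte order. */
uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* The IPv4 Total Length field occupies bytes 2 and 3 of the header. */
uint16_t ip_total_len(const uint8_t *rec_data)
{
    return read_be16(rec_data + 2);
}
```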
Reading the source shows ZLG/IP's convention: no matter which storage mode the machine it is compiled and run on uses, the IP header in the send and receive buffers is always in big-endian byte order.
There are two benefits to such a convention:
(1) The packet headers in network communication always have the same byte order, which forms a network byte-order standard; when the two sides of the communication read them in their own storage modes, each end system does its own conversion.
(2) It simplifies the header checksum calculation: regardless of storage mode, the header is always read in big-endian order, which guarantees a unique checksum.
5 Conclusion
The analysis above shows that the author's code saves local array space, is shorter, and makes the function modules more general. The same approach can therefore be used to further streamline the send, checksum, and receive source code for the TCP and UDP headers in ZLG/IP.
(Reposted) Big-endian and little-endian modes explained