The big-endian/little-endian problem in Java


1. Solving the endian problem: a summary
Everything in Java binary files is stored in big-endian order, sometimes called network order. This is good news if you use only Java: all files are processed the same way on all platforms (Mac, PC, Solaris, etc.), and you can freely exchange binary data, over the Internet or on a floppy disk, without worrying about endianness. Problems arise when you exchange data files with programs that are not written in Java, because those programs may use little-endian order, as C programs on the PC usually do. Some platforms use big-endian byte order internally (Mac, IBM 390) and some use little-endian byte order (Intel). Java hides the platform's endianness from the user.
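
The big-endian claim is easy to check with the standard java.io classes. The following is a minimal sketch, not code from the article; the class name BigEndianDemo and the helper writeInt are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; the name is not from the article.
final class BigEndianDemo {

    // Serialise one int with DataOutputStream and return the raw bytes.
    static byte[] writeInt(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value);
        } catch (IOException e) {
            // An in-memory stream should not fail; rethrow unchecked if it does.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // The most significant byte (12) is written first on every platform.
        for (byte b : writeInt(0x12345678)) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```

A C program on an Intel PC that dumps the same int straight from memory would typically produce the bytes in the opposite order, which is where the trouble described above starts.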

In binary files there are no delimiters between fields, and the contents are raw binary, not human-readable ASCII. If the data you want to read is not in Java's standard format, it was usually prepared by a non-Java program. You can choose from four options:

1. Rewrite the program that produces the input file so that it directly outputs a big-endian byte stream in DataOutputStream binary format, or a character format.

2. Write a separate translation program that reads the file and rearranges the bytes. It can be written in any language.

3. Read the data as bytes and rearrange them on the fly.

4. The easiest way is to use the LEDataInputStream, LEDataOutputStream and LERandomAccessFile classes that I have written. They mimic DataInputStream, DataOutputStream and RandomAccessFile, but use a little-endian byte stream. You can read about LEDataStream and download the code and source free. You can also get the File I/O Amanuensis to show you how to use the classes; just tell it you have little-endian binary data.
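
As one possible take on option 3, the standard library's java.nio.ByteBuffer can reinterpret bytes in little-endian order. The article does not mention this class, so treat the following as an alternative sketch rather than the author's method; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class; neither the name nor the methods come from the article.
final class ByteBufferLittleEndian {

    // Interpret the first four bytes of data as a little-endian int.
    static int readInt(byte[] data) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // Interpret the first eight bytes of data as a little-endian IEEE double.
    static double readDouble(byte[] data) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getDouble();
    }

    public static void main(String[] args) {
        byte[] raw = {0x78, 0x56, 0x34, 0x12};
        System.out.println(Integer.toHexString(readInt(raw))); // prints 12345678
    }
}
```

This suits data that is already in memory, or a file read in blocks; the stream-style routines later in the article fit better when fields are consumed one at a time.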

2. You may not even have a problem
Many Java novices coming from C think they need to consider whether the platform they are running on is big-endian or little-endian internally. This is not a problem in Java. Further, you cannot even find out how values are stored internally without the help of native classes. Java has no struct I/O, no unions and none of the other endian-sensitive language constructs.

Endian issues need to be considered only when communicating with legacy C/C++ applications. The following code produces the same result on a big-endian or a little-endian machine:

// Take a 16-bit short apart into two 8-bit bytes.
short x = (short) 0xABCD; // 0xABCD is an int literal, so it needs a cast
byte high = (byte) (x >>> 8);
byte low = (byte) x; /* cast implies & 0xFF */
System.out.println("x=" + x + " high=" + high + " low=" + low);
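
To see that shifts act on values rather than on memory layout, the snippet can be wrapped in a small program that also reports the platform's byte order. This is a sketch under my own naming; java.nio.ByteOrder.nativeOrder() is a standard call, but the article itself does not use it.

```java
import java.nio.ByteOrder;

// Hypothetical demo class; the name and helpers are not from the article.
final class NativeOrderDemo {

    // High-order 8 bits of a 16-bit value, independent of how the CPU stores it.
    static byte highByte(short x) {
        return (byte) (x >>> 8);
    }

    // Low-order 8 bits of a 16-bit value.
    static byte lowByte(short x) {
        return (byte) x;
    }

    public static void main(String[] args) {
        short x = (short) 0xABCD;
        // This line varies by machine (BIG_ENDIAN or LITTLE_ENDIAN)...
        System.out.println("native order: " + ByteOrder.nativeOrder());
        // ...but these values do not.
        System.out.printf("high=%02X low=%02X%n", highByte(x), lowByte(x));
    }
}
```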

3. Reading little-endian binary files
The most common problem is dealing with files stored in little-endian format.

I had to implement routines parallel to those in java.io.DataInputStream, which reads raw binary, in my LEDataInputStream and LEDataOutputStream classes. Don't confuse this with the io.DataInput human-readable character-based file-interchange format.

If you want to do it yourself, without the overhead of the full LEDataInputStream and LEDataOutputStream classes, here is the basic technique:

Presuming your integers are in 2's complement little-endian format, shorts are pretty easy to handle:


--------------------------------------------------------------------------------

// readByte() is assumed to behave like DataInput.readByte()
short readShortLittleEndian() throws IOException
{
    // 2 bytes
    int low = readByte() & 0xff;
    int high = readByte() & 0xff;
    return (short) (high << 8 | low);
}

Or, if you want to get clever and puzzle your readers, you can avoid one mask, since the high bits will later be shaved off by the conversion back to short.

short readShortLittleEndian() throws IOException
{
    // 2 bytes
    int low = readByte() & 0xff;
    int high = readByte();
    // avoid masking here
    return (short) (high << 8 | low);
}


--------------------------------------------------------------------------------
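
The mask on the high byte is optional, but the mask on the low byte is not: without it, a low byte of 0x80 or above is sign-extended and its 1 bits overwrite the high byte. The following self-contained sketch shows both cases against an in-memory stream; the class name and the decode helper are assumptions for illustration, not part of the article's classes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; only readShortLittleEndian mirrors the article's routine.
final class ShortLittleEndianDemo {

    static short readShortLittleEndian(DataInputStream in) throws IOException {
        int low = in.readByte() & 0xff; // mask is required here
        int high = in.readByte();       // mask is optional here
        return (short) (high << 8 | low);
    }

    // Deliberately wrong: the unmasked low byte is sign-extended.
    static short readShortUnmasked(DataInputStream in) throws IOException {
        int low = in.readByte();
        int high = in.readByte();
        return (short) (high << 8 | low);
    }

    // Decode two bytes held in memory, with or without the low-byte mask.
    static short decode(byte[] twoBytes, boolean maskLowByte) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(twoBytes))) {
            return maskLowByte ? readShortLittleEndian(in) : readShortUnmasked(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xCD, 0x12}; // little-endian encoding of 0x12CD
        System.out.println(decode(raw, true));  // correct value
        System.out.println(decode(raw, false)); // corrupted by sign extension
    }
}
```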

Longs are a little more complicated:


--------------------------------------------------------------------------------

long readLongLittleEndian() throws IOException
{
    // 8 bytes
    long accum = 0;
    for (int shiftBy = 0; shiftBy < 64; shiftBy += 8)
    {
        // must cast to long or the shift would be done modulo 32
        accum |= (long) (readByte() & 0xff) << shiftBy;
    }
    return accum;
}


--------------------------------------------------------------------------------
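
The comment about the shift being done modulo 32 deserves a demonstration: Java uses only the low five bits of the shift count when the left operand is an int. This sketch reads from a byte array instead of a stream to stay short; the class and method names are mine, not the article's.

```java
// Hypothetical demo class; names are not from the article.
final class LongShiftDemo {

    // Correct: each byte is widened to long before shifting.
    static long readLongLittleEndian(byte[] data) {
        long accum = 0;
        for (int i = 0; i < 8; i++) {
            accum |= (long) (data[i] & 0xff) << (8 * i);
        }
        return accum;
    }

    // Deliberately wrong: an int shift by 32..56 wraps around to 0..24,
    // so the upper four bytes land on top of the lower four.
    static long readLongBroken(byte[] data) {
        long accum = 0;
        for (int i = 0; i < 8; i++) {
            accum |= (data[i] & 0xff) << (8 * i);
        }
        return accum;
    }

    public static void main(String[] args) {
        byte[] raw = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(Long.toHexString(readLongLittleEndian(raw))); // prints 807060504030201
        System.out.println(Long.toHexString(readLongBroken(raw)));       // prints c070605
    }
}
```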

In a similar way we handle char and int.


--------------------------------------------------------------------------------

char readCharLittleEndian() throws IOException
{
    // 2 bytes
    int low = readByte() & 0xff;
    int high = readByte();
    return (char) (high << 8 | low);
}


--------------------------------------------------------------------------------

int readIntLittleEndian() throws IOException
{
    // 4 bytes
    int accum = 0;
    for (int shiftBy = 0; shiftBy < 32; shiftBy += 8)
    {
        accum |= (readByte() & 0xff) << shiftBy;
    }
    return accum;
}


--------------------------------------------------------------------------------
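
An alternative to assembling the value byte by byte is to let DataInputStream read it as big-endian and then swap the bytes with the standard reverseBytes methods on Integer and Character. This is not how the article's classes are described as working; it is a sketch of a different route to the same result, with hypothetical class and helper names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; names are not from the article.
final class ReverseBytesDemo {

    static int readIntLittleEndian(DataInputStream in) throws IOException {
        // readInt() assumes big-endian, so swap the four bytes afterwards.
        return Integer.reverseBytes(in.readInt());
    }

    static char readCharLittleEndian(DataInputStream in) throws IOException {
        return Character.reverseBytes(in.readChar());
    }

    // Convenience wrappers for data already held in memory.
    static int intFrom(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return readIntLittleEndian(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static char charFrom(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return readCharLittleEndian(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(intFrom(new byte[] {0x78, 0x56, 0x34, 0x12}))); // prints 12345678
        System.out.println(charFrom(new byte[] {0x41, 0x00})); // prints A
    }
}
```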

Floating point is a little trickier. Presuming your data is in IEEE little-endian format, you need something like this:


--------------------------------------------------------------------------------

double readDoubleLittleEndian() throws IOException
{
    // 8 bytes
    long accum = 0;
    for (int shiftBy = 0; shiftBy < 64; shiftBy += 8)
    {
        // must cast to long or the shift would be done modulo 32
        accum |= ((long) (readByte() & 0xff)) << shiftBy;
    }
    return Double.longBitsToDouble(accum);
}


--------------------------------------------------------------------------------

float readFloatLittleEndian() throws IOException
{
    // 4 bytes
    int accum = 0;
    for (int shiftBy = 0; shiftBy < 32; shiftBy += 8)
    {
        accum |= (readByte() & 0xff) << shiftBy;
    }
    return Float.intBitsToFloat(accum);
}


--------------------------------------------------------------------------------
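
A quick way to check the floating-point routines is to feed them a value whose IEEE bit pattern is well known. The value 1.0 is 0x3F800000 as a float and 0x3FF0000000000000 as a double, so its little-endian encodings end in 80 3F and F0 3F respectively. The sketch below works on byte arrays rather than a stream, and its names are assumptions for illustration.

```java
// Hypothetical demo class; names are not from the article.
final class FloatLittleEndianDemo {

    // Assemble four little-endian bytes into IEEE float bits, then reinterpret.
    static float floatFrom(byte[] data) {
        int accum = 0;
        for (int i = 0; i < 4; i++) {
            accum |= (data[i] & 0xff) << (8 * i);
        }
        return Float.intBitsToFloat(accum);
    }

    // Assemble eight little-endian bytes into IEEE double bits, then reinterpret.
    static double doubleFrom(byte[] data) {
        long accum = 0;
        for (int i = 0; i < 8; i++) {
            accum |= (long) (data[i] & 0xff) << (8 * i);
        }
        return Double.longBitsToDouble(accum);
    }

    public static void main(String[] args) {
        System.out.println(floatFrom(new byte[] {0, 0, (byte) 0x80, 0x3F}));               // prints 1.0
        System.out.println(doubleFrom(new byte[] {0, 0, 0, 0, 0, 0, (byte) 0xF0, 0x3F})); // prints 1.0
    }
}
```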

You don't need a readByteLittleEndian since the code would be identical to readByte, though you might create one just for consistency:


--------------------------------------------------------------------------------

byte readByteLittleEndian() throws IOException
{
    // 1 byte
    return readByte();
}


--------------------------------------------------------------------------------
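
The routines above cover reading only. Writing is the same idea run in reverse: shift each byte down and emit the least significant one first. The article's LEDataOutputStream is not reproduced here; this is a minimal sketch of the write direction with hypothetical names.

```java
// Hypothetical demo class; names are not from the article.
final class LittleEndianWriteDemo {

    // Encode an int as four bytes, least significant byte first.
    static byte[] intToLittleEndian(int value) {
        byte[] out = new byte[4];
        for (int i = 0; i < 4; i++) {
            out[i] = (byte) (value >>> (8 * i));
        }
        return out;
    }

    public static void main(String[] args) {
        for (byte b : intToLittleEndian(0x12345678)) {
            System.out.printf("%02X ", b);
        }
        System.out.println(); // least significant byte (78) comes first
    }
}
```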

4. History
In Gulliver's Travels the Lilliputians liked to break their eggs on the small end and the Blefuscudians on the big end. They fought wars over this. There is a computer analogy: should numbers be stored most or least significant byte first? This is sometimes referred to as byte sex.

Those in the big-endian camp (most significant byte stored first) include the Java VM virtual computer, the Java binary file format, the IBM 360 and follow-on mainframes such as the 390, the Motorola 68K and most mainframes. The Power PC is endian-agnostic.

Blefuscudians (Big-endians) assert this is the way God intended integers to be stored, most important part first. At the assembler level, fields of mixed positive integers and text can be sorted as if they were one big text field key. Real programmers read hex dumps, and big-endian is a lot easier to comprehend.

In the little-endian camp (least significant byte first) are the Intel 8080, 8086, 80286, Pentium and follow-ons, and the MOS Technology 6502 popularised by the Apple ][.

Lilliputians (Little-endians) assert that putting the low-order part first is more natural, because when you do arithmetic manually you start at the least significant part and work toward the most significant part. This ordering makes writing multi-precision arithmetic easier, since you work upward from the low-order end. It made implementing 8-bit microprocessors easier. At the assembler level (not in Java) it also lets you cheat and pass the address of a 32-bit positive int to a routine expecting only a 16-bit parameter and still have it work. Real programmers read hex dumps, and little-endian is more of a stimulating challenge.

If a machine is word-addressable, with no finer addressing supported, the concept of endianness means nothing, since words are fetched from RAM in parallel, both ends first.

5. What Sex is Your CPU?
Byte Sex Endianness of CPUs

CPU | Endianness | Notes
MOS Technology 6502; AMD Duron, Athlon, Thunderbird | Little | The 6502 is used in the Apple ][; the Duron, Athlon and Thunderbird run Windows 95/98/ME/NT/2000/XP.
Apple ][ 6502 | Little |
Apple Mac 68000 | Big | Uses Motorola 68000.
Apple Power PC | Big | CPU is bisexual but stays big-endian under the Mac OS.
Burroughs 1700, 1800, 1900 | ? | Bit-addressable. Used a different interpreter firmware instruction set for each language.
Burroughs 7800 | ? | Algol machine.
CDC LGP-30 | Word-addressable only, hence no endianness | 31½-bit words. The low-order bit must be 0 on the drum, but can be 1 in the accumulator.
CDC 3300, 6600 | Word-addressable | ?
DEC PDP, Vax | Little |
IBM 360, 370, 380, 390 | Big |
IBM 7044, 7090 | Word-addressable | Bits
IBM AS-400 | Big | ?
Power PC | Either | The endian-agnostic Power PCs have a foot in both camps. They are bisexual, but the OS usually imposes one convention or the other, e.g. Mac PowerPCs are big-endian.
Intel 8080, 8086, 80286, 80386, 80486, Pentium I, II, III, IV | Little | Chips used in PCs.
Intel 8051 | Big |
MIPS R4000, R5000, R10000 | Big | Used in Silicon Graphics IRIX.
Motorola 6800, 6809, 680x0, 68HC11 | Big | Early Macs and the Amiga used the 68000.
NCR 8500 | Big |
NCR Century | Big |
Sun Sparc and UltraSparc | Big | Sun's Solaris. Normally used as big-endian, but also supports operating in little-endian mode, including being able to switch endianness under program control for particular loads and stores.
Univac 1100 | Word-addressable | 36-bit words.
Univac 90/30 | Big | IBM 370 clone.
Zilog Z80 | Little | Used in CP/M machines.


If you know the endianness of other CPUs/OSes/platforms, please email me at roedy@mindprod.com.

In theory data can have two different byte sexes but CPUs can have four. Let us give thanks, in this world of mixed left-hand and right-hand drive, that there are no real CPUs with all four sexes to contend with.

The Four Possible Byte Sexes for CPUs

Which byte is stored in the lower-numbered address? | Which byte is addressed? | Used in
LSB | LSB | Intel, AMD, Power PC, DEC.
LSB | MSB | None that I know of.
MSB | LSB | Perhaps one of the old word mark architecture machines.
MSB | MSB | Mac, IBM 390, Power PC.


