How Java's big-endian byte order corresponds to C#

Source: Internet
Author: User

In modern memory, the byte is the smallest addressable unit. When a logical value is larger than one byte, it has to be split across several physical units, and the question arises of which byte is stored first. This is the endianness problem. The two storage orders are called big-endian and little-endian.

Byte order comes in two forms, big-endian and little-endian, defined as follows:

Big-endian: the lowest address holds the most significant byte.

Little-endian: the lowest address holds the least significant byte.
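To make the two layouts concrete, here is a minimal Java sketch that serializes the 32-bit value 0x12345678 in each order using `java.nio.ByteBuffer`. The class name `EndianLayout` and the helpers `toBytes` and `hex` are illustrative names, not part of the original article.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class EndianLayout {
    /** Serializes an int into 4 bytes using the given byte order. */
    static byte[] toBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    /** Formats bytes as space-separated hex, lowest address first. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int value = 0x12345678;
        // Most significant byte (0x12) at the lowest address
        System.out.println("big-endian:    " + hex(toBytes(value, ByteOrder.BIG_ENDIAN)));    // 12 34 56 78
        // Least significant byte (0x78) at the lowest address
        System.out.println("little-endian: " + hex(toBytes(value, ByteOrder.LITTLE_ENDIAN))); // 78 56 34 12
    }
}
```

Index 0 of the returned array plays the role of the lowest address, so the output can be read directly against the two definitions.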

Among today's mainstream CPUs, the Intel series stores data in little-endian format, Motorola CPUs use big-endian, and ARM supports both. In network programming, TCP/IP uniformly transmits data in big-endian order, which is why big-endian is sometimes also called network byte order.
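As a rough illustration, the sketch below prints the byte order of the machine it runs on and encodes a 16-bit value in network byte order. The class name `NetworkOrder`, the helper `encodePort`, and the port number 8080 are assumptions chosen for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteOrder;

final class NetworkOrder {
    /** Encodes a 16-bit value high byte first, i.e. in network byte order. */
    static byte[] encodePort(int port) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeShort(port); // DataOutputStream writes the high byte first
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Byte order of the underlying platform, e.g. LITTLE_ENDIAN on Intel CPUs
        System.out.println("native order: " + ByteOrder.nativeOrder());
        byte[] wire = encodePort(8080); // 8080 == 0x1F90
        System.out.printf("on the wire: %02X %02X%n", wire[0], wire[1]); // 1F 90
    }
}
```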

In particular, note that the byte order of data stored by a program written in C++ depends on the CPU of the platform it is compiled for, whereas a Java program presents multi-byte data in big-endian order regardless of platform (for example, `DataOutputStream` and the default order of `ByteBuffer` are both big-endian).
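When a Java program exchanges binary data with a little-endian peer (for instance a C++ or C# program running on an Intel CPU), the byte order has to be chosen explicitly on the Java side. The following is a minimal sketch of that conversion. The class name `LittleEndianInterop`, its helpers, and the sample payload are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class LittleEndianInterop {
    /** Reads a 32-bit int from 4 bytes laid out least significant byte first. */
    static int readIntLE(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** Writes a 32-bit int least significant byte first for a little-endian peer. */
    static byte[] writeIntLE(int value) {
        return ByteBuffer.allocate(Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(value)
                .array();
    }

    public static void main(String[] args) {
        byte[] fromPeer = {0x78, 0x56, 0x34, 0x12}; // hypothetical payload from a little-endian sender
        System.out.println(Integer.toHexString(readIntLE(fromPeer))); // 12345678
        // If the bytes were already read as big-endian, swapping them recovers the value
        System.out.println(Integer.toHexString(Integer.reverseBytes(0x78563412))); // 12345678
    }
}
```

Setting the order on the `ByteBuffer` keeps the conversion in one place. `Integer.reverseBytes` is the alternative when the value has already been decoded with the wrong order.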

## Why a UTF-8 string in Java has a two-byte header after conversion to bytes

-----------------------------

The method recommended by the Unicode specification for marking byte order is the BOM. Here BOM does not stand for "Bill of Materials" but for "Byte Order Mark".

(Unicode is a character encoding scheme designed by international organizations to accommodate all of the world's languages. The closely related ISO/IEC 10646 standard is formally named "Universal Multiple-Octet Coded Character Set", abbreviated UCS. Informally, UCS can also be read as "Unicode Character Set".)

UCS contains a character called "ZERO WIDTH NO-BREAK SPACE", whose code point is U+FEFF. U+FFFE, by contrast, is not a character in UCS, so it should never appear in actual transmission. The UCS specification recommends transmitting the character "ZERO WIDTH NO-BREAK SPACE" before the byte stream itself.

This means that if the recipient receives FE FF, the byte stream is big-endian, and if it receives FF FE, the byte stream is little-endian. For this reason the character "ZERO WIDTH NO-BREAK SPACE" is also called the BOM.

In Java, encoding a string with the `UTF-16` charset produces big-endian (UTF-16BE) bytes preceded by a BOM by default. The legacy charset name `Unicode` is an alias for `UTF-16` (not for UTF-8) and behaves the same way. Little-endian output has to be requested explicitly with `UTF-16LE`, and the explicit `UTF-16LE` and `UTF-16BE` charsets do not write a BOM.
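The BOM behavior of Java's standard UTF-16 charsets can be checked with a short sketch like the one below, which encodes the single character "A" (U+0041) three ways and inspects the leading bytes. The class name `BomDemo` and the helpers `hex` and `detectBom` are illustrative names introduced for this example.

```java
import java.nio.charset.StandardCharsets;

final class BomDemo {
    /** Formats bytes as space-separated hex. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    /** Returns "BE", "LE", or "none" based on a leading UTF-16 byte order mark. */
    static String detectBom(byte[] bytes) {
        if (bytes.length >= 2) {
            int b0 = bytes[0] & 0xFF;
            int b1 = bytes[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) return "BE";
            if (b0 == 0xFF && b1 == 0xFE) return "LE";
        }
        return "none";
    }

    public static void main(String[] args) {
        String s = "A"; // U+0041
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41 (BOM + big-endian)
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00 41 (no BOM)
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE))); // 41 00 (no BOM)
        System.out.println(detectBom(s.getBytes(StandardCharsets.UTF_16))); // BE
    }
}
```

A receiver that expects a BOM can call something like `detectBom` first and then decode the remaining bytes with the matching explicit charset.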

## References
-----------------------------
- [Byte storage order: big-endian and little-endian, detection and conversion](http://www.cnblogs.com/Romi/archive/2012/01/10/2318551.html)
- [How to tell in Java whether the CPU is big-endian or little-endian](http://blog.chinaunix.net/uid-1844931-id-3022904.html)
- [Java big-endian and little-endian conversion](http://blog.csdn.net/hhbgk/article/details/50673991)
