Representation and processing of data

Last Update: 2015-07-14 Source: Internet

Author: User


I. Basic concepts

1. Word

A word is the basic unit of data that a computer processes and transfers in one operation, and its length is called the word size. The word size also gives the size of pointer data, so it determines the range of addresses the computer can access: for example, a 32-bit machine has 4-byte pointers and an address range of $0$ to $2^{32} - 1$.

The word size is a property of the machine, that is, of the CPU. The word size of the operating system is not necessarily the same as that of the machine; for example, a 32-bit operating system can be installed on a 64-bit machine.
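A minimal C sketch of the relationship above, assuming (as the text does) that the word size equals the pointer width. The helper names `word_bits` and `max_address` are hypothetical; `UINTPTR_MAX` is the largest value of `uintptr_t`, an optional type that mainstream platforms provide.

```c
#include <limits.h>
#include <stdint.h>

/* Word size in bits, taken here as the pointer width
   (an assumption that holds on common platforms). */
unsigned word_bits(void) {
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Largest value a pointer-sized unsigned integer can hold: 2^w - 1. */
uintmax_t max_address(void) {
    return (uintmax_t)UINTPTR_MAX;
}
```

On a typical 64-bit build, `word_bits()` returns 64 and `max_address()` equals $2^{64} - 1$.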

2. Sizes of the basic C data types

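The compiler chooses the size of each data type according to the word size of the operating system. The table below gives typical sizes in bytes when the operating system word size matches the machine word size (these are the usual choices on Unix-like systems; 64-bit Windows, for example, keeps long int at 4 bytes):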

C data type	32-bit machine (bytes)	64-bit machine (bytes)
char	1	1
short int	2	2
int	4	4
long int	4	8
long long int	8	8
pointer	4	8
float	4	4
double	8	8
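The table shows typical values only. A small sketch such as the following reports the sizes the compiler actually uses (`sizeof` yields sizes in bytes); the array `type_sizes` and the function `print_type_sizes` are illustrative names.

```c
#include <stddef.h>
#include <stdio.h>

/* One row of the table above: a type name and its size in bytes on this build. */
struct type_size {
    const char *name;
    size_t bytes;
};

static const struct type_size type_sizes[] = {
    {"char", sizeof(char)},
    {"short int", sizeof(short int)},
    {"int", sizeof(int)},
    {"long int", sizeof(long int)},
    {"long long int", sizeof(long long int)},
    {"pointer", sizeof(void *)},
    {"float", sizeof(float)},
    {"double", sizeof(double)},
};

#define NUM_TYPES (sizeof type_sizes / sizeof type_sizes[0])

/* Prints each type with the size reported by the compiler in use. */
void print_type_sizes(void) {
    for (size_t i = 0; i < NUM_TYPES; i++)
        printf("%-14s %zu\n", type_sizes[i].name, type_sizes[i].bytes);
}
```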

3. Byte order

Different machines may store the bytes of a multi-byte value in memory in different orders; the two main conventions are little-endian and big-endian:

Little-endian stores the least significant byte first (the low-order byte at the lowest address and the high-order byte at the highest address);

Big-endian stores the most significant byte first (the high-order byte at the lowest address and the low-order byte at the highest address).
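One common way to check the byte order of the current machine is to look at the byte stored at the lowest address of a multi-byte integer. A sketch using `memcpy` to read the bytes; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
int is_little_endian(void) {
    uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);   /* the byte stored at the lowest address */
    return first == 1;
}

/* Copies the in-memory byte sequence of v (lowest address first) into out. */
void bytes_in_memory(uint32_t v, unsigned char out[4]) {
    memcpy(out, &v, 4);
}
```

For `0x01234567`, a little-endian host such as x86 stores the bytes 67 45 23 01 from low to high addresses, while a big-endian host stores 01 23 45 67.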

Problems arise when binary data is passed over a network between two computers with different byte orders: the receiver interprets the bytes in the reverse of the order the sender intended. TCP/IP therefore defines a uniform network byte order (big-endian): the sender converts data from host byte order to network byte order before sending it, and the receiver converts it from network byte order to host byte order before processing it. Unix systems provide the following functions, declared in <arpa/inet.h>, for converting between host byte order and network byte order:

Host byte order to network byte order:

uint16_t htons(uint16_t hostshort);

uint32_t htonl(uint32_t hostlong);

Network byte order to host byte order:

uint16_t ntohs(uint16_t netshort);

uint32_t ntohl(uint32_t netlong);

These functions are normally applied only to protocol fields such as port numbers and IP addresses, not to the payload data. The TCP/IP protocol stack has to parse those fields, so they must be in the common byte order; the byte order of the payload is not a concern of the network protocol.
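As an illustration, assuming a POSIX system where `htons` and `ntohs` are available from `<arpa/inet.h>`, a port number could be written into and read back from a 2-byte header field like this. `put_port` and `get_port` are hypothetical helpers, not part of any API.

```c
#include <arpa/inet.h>  /* htons, ntohs (POSIX) */
#include <stdint.h>
#include <string.h>

/* Writes a port number into a 2-byte header field in network (big-endian) order. */
void put_port(unsigned char field[2], uint16_t port) {
    uint16_t net = htons(port);
    memcpy(field, &net, sizeof net);
}

/* Reads a 2-byte network-order header field back into host order. */
uint16_t get_port(const unsigned char field[2]) {
    uint16_t net;
    memcpy(&net, field, sizeof net);
    return ntohs(net);
}
```

Port 8080 is 0x1F90, so the field holds the bytes 0x1F, 0x90 in that order on every host.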

4. A note on shift operations

When a value of a $w$-bit type is shifted by an amount $k \ge w$, many machines actually shift by $k \bmod w$, because the hardware uses only the low-order bits of the shift amount. The C standard does not guarantee this: shifting by an amount greater than or equal to the width of the (promoted) left operand is undefined behavior. Java, by contrast, explicitly requires the shift amount to be computed as $k \bmod w$ (the low 5 bits of the amount for int, the low 6 bits for long).
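Because an over-wide shift is undefined in C, portable code has to reduce the shift amount itself. A minimal sketch that mimics the Java rule for a 32-bit value (the name `shl_mod32` is hypothetical):

```c
#include <stdint.h>

/* Left shift with Java-style semantics for a 32-bit value: the shift amount
   is reduced modulo 32 first, so the actual C shift is always 0..31. */
uint32_t shl_mod32(uint32_t x, unsigned k) {
    return (uint32_t)(x << (k % 32u));
}
```

With this rule, shifting by 33 behaves like shifting by 1, and shifting by 32 leaves the value unchanged.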

5. Quickly converting decimal numbers to hexadecimal

For a decimal number $d$ that is a power of two, $d = 2^n$, write $n = k + 4m$ with $0 \le k < 4$. Then $d = 2^k \times (2^4)^m$, so $d$ can be written in hexadecimal as:

0xp00...0, where the digit $p = 2^k$ is followed by $m$ zeros

For example: $2^{11} = 2048$ and $11 = 3 + 4 \times 2$, so $2^{11} = 2^3 \times (2^4)^2$ and 2048 = 0x800.

More generally, for any decimal number $d = 2^n + q$ with $n = k + 4m$ ($0 \le k < 4$), convert the $2^n$ part to hexadecimal as above and then add $q$ in hexadecimal.

For example: 2067 = 2048 + 19 = 0x800 + 0x13 = 0x813.
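The power-of-two rule is easy to mechanize. The following sketch builds the hexadecimal string for $2^n$ from $k = n \bmod 4$ and $m = \lfloor n/4 \rfloor$; the function `pow2_to_hex` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes 2^n as a hex string using n = k + 4m (0 <= k < 4):
   the leading hex digit is 2^k and it is followed by m zeros.
   Returns the string length, or -1 if buf is too small. */
int pow2_to_hex(unsigned n, char *buf, size_t size) {
    unsigned k = n % 4;                             /* exponent of the leading digit */
    unsigned m = n / 4;                             /* number of trailing zeros */
    int len = snprintf(buf, size, "0x%u", 1u << k); /* 2^k is 1, 2, 4 or 8 */
    if (len < 0 || (size_t)len + m >= size)
        return -1;
    memset(buf + len, '0', m);
    buf[len + m] = '\0';
    return len + (int)m;
}
```

`pow2_to_hex(11, buf, sizeof buf)` produces "0x800"; adding $q = 19$ = 0x13 then gives 0x813 = 2067.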

II. Integer representation

1. Encoding of unsigned integers

$\mathrm{Uint} = x_{w-1}2^{w-1} + x_{w-2}2^{w-2} + \cdots + x_0 2^0$: an unsigned integer is encoded in $w$ bits, where $x_i$ is the value of bit $i$ (0 or 1) and $2^i$ is the weight of bit $i$.

The range of representable unsigned integers is $0$ to $2^w - 1$.
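The unsigned encoding can be evaluated directly from the formula. In this sketch the bit vector is passed as an array in which `bits[i]` holds $x_i$; the name `b2u` and the array convention are assumptions for illustration.

```c
#include <stdint.h>

/* Evaluates Uint = sum of x_i * 2^i for a w-bit vector (1 <= w <= 64),
   where bits[i] holds x_i (0 or 1). */
uint64_t b2u(const int bits[], int w) {
    uint64_t sum = 0;
    for (int i = 0; i < w; i++)
        if (bits[i])
            sum += (uint64_t)1 << i;   /* weight of bit i is 2^i */
    return sum;
}
```

For the bit pattern 1011 (`bits = {1, 1, 0, 1}`), `b2u` returns 11.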

2. Encoding of signed integers

$\mathrm{Tint} = -x_{w-1}2^{w-1} + x_{w-2}2^{w-2} + \cdots + x_0 2^0$: a signed integer is encoded in $w$ bits like an unsigned integer, except that bit $w-1$ (the most significant bit) has weight $-2^{w-1}$.

The range of representable signed integers is $-2^{w-1}$ to $2^{w-1} - 1$.
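Using the same convention (`bits[i]` holds $x_i$), a sketch of the signed formula differs only in the negative weight of the top bit. The name `b2t` is hypothetical, and $w$ is limited to 63 so the arithmetic fits in `int64_t`.

```c
#include <stdint.h>

/* Evaluates Tint = -x_{w-1} * 2^(w-1) + sum of x_i * 2^i for i < w-1,
   for 1 <= w <= 63, where bits[i] holds x_i (0 or 1). */
int64_t b2t(const int bits[], int w) {
    int64_t sum = 0;
    for (int i = 0; i < w - 1; i++)
        if (bits[i])
            sum += (int64_t)1 << i;        /* ordinary positive weights */
    if (bits[w - 1])
        sum -= (int64_t)1 << (w - 1);      /* top bit has weight -2^(w-1) */
    return sum;
}
```

For the pattern 1011, `b2t` returns $1 + 2 - 8 = -5$, while the unsigned interpretation of the same bits is 11.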

3. Maximum and minimum values

$\mathrm{Umin} = 0$, $\mathrm{Umax} = 2^w - 1$

$\mathrm{Tmin} = -2^{w-1}$, $\mathrm{Tmax} = 2^{w-1} - 1$

From these we observe:

$|\mathrm{Tmin}| = \mathrm{Tmax} + 1$: half of the bit patterns represent negative numbers (most significant bit 1) and the other half represent non-negative numbers (most significant bit 0). Because the non-negative half includes 0, there is one fewer positive number than negative numbers.

$\mathrm{Umax} = 2\mathrm{Tmax} + 1$: the bit patterns that represent negative numbers in the signed encoding all represent positive numbers in the unsigned encoding, so $\mathrm{Umax} = |\mathrm{Tmin}| + \mathrm{Tmax} = 2\mathrm{Tmax} + 1$.
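These extremes can be computed for any width $w$ and used to check both identities. A sketch with hypothetical helpers `umax`, `tmax` and `tmin`, valid for $1 \le w \le 63$:

```c
#include <stdint.h>

/* Extreme values of a w-bit encoding, 1 <= w <= 63. */
uint64_t umax(int w) { return ((uint64_t)1 << w) - 1; }      /* 2^w - 1     */
int64_t  tmax(int w) { return ((int64_t)1 << (w - 1)) - 1; } /* 2^(w-1) - 1 */
int64_t  tmin(int w) { return -((int64_t)1 << (w - 1)); }    /* -2^(w-1)    */
```

For $w = 8$ these give 255, 127 and -128, and indeed $128 = 127 + 1$ and $255 = 2 \times 127 + 1$.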

4. Conversion between unsigned and signed integers

When converting between unsigned and signed integers of the same width, the underlying bit pattern does not change; only the interpretation of those bits changes.

From the formulas in 1 and 2: $\mathrm{Uint} - \mathrm{Tint} = x_{w-1}2^{w-1} - (-x_{w-1}2^{w-1}) = x_{w-1}2^{w}$

The conversion functions between signed and unsigned integers are therefore:

(1) Signed integer to unsigned integer

$\mathrm{T2U}(t) = t + x_{w-1}2^{w}$, i.e.:

When $t \ge 0$, $x_{w-1} = 0$ and $\mathrm{T2U}(t) = t$, the original value;

When $t < 0$, $x_{w-1} = 1$ and $\mathrm{T2U}(t) = t + 2^{w}$;

(2) Unsigned integer to signed integer

$\mathrm{U2T}(u) = u - x_{w-1}2^{w}$, i.e.:

When $u \ge 2^{w-1}$, $x_{w-1} = 1$ and $\mathrm{U2T}(u) = u - 2^{w}$;

When $u < 2^{w-1}$, $x_{w-1} = 0$ and $\mathrm{U2T}(u) = u$.
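A sketch of the two conversion functions for $w = 16$, written directly from the formulas rather than with casts. The names `t2u16` and `u2t16` and the fixed width are assumptions for illustration.

```c
#include <stdint.h>

#define W 16  /* word size used in this sketch */

/* T2U(t) = t + x_{w-1} * 2^w, for -2^(w-1) <= t < 2^(w-1). */
uint32_t t2u16(int32_t t) {
    int32_t x_top = t < 0;                    /* x_{w-1} is 1 exactly when t < 0 */
    return (uint32_t)(t + x_top * (INT32_C(1) << W));
}

/* U2T(u) = u - x_{w-1} * 2^w, for 0 <= u < 2^w. */
int32_t u2t16(uint32_t u) {
    int32_t x_top = (int32_t)(u >> (W - 1));  /* x_{w-1} is the top bit of u */
    return (int32_t)u - x_top * (INT32_C(1) << W);
}
```

`t2u16(-12345)` gives $-12345 + 2^{16} = 53191$, and `u2t16(53191)` maps it back to -12345.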

5. Signed/unsigned conversion in C

Conversions between signed and unsigned integers in C follow the rules in 4. C allows explicit casts between signed and unsigned integers and, more notably, it also converts signed integers to unsigned implicitly.

When one operand of an operation is a signed integer and the other is an unsigned integer of the same width (for example int and unsigned int), the signed operand is implicitly converted to unsigned and the operation is carried out in unsigned arithmetic.
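A two-function illustration of the implicit conversion (the function names are hypothetical); compilers may warn about the second comparison.

```c
/* -1 < 0 compares two signed ints. */
int less_signed(void) { return -1 < 0; }

/* -1 < 0u: -1 is converted to unsigned (the largest unsigned value),
   so the comparison is false. */
int less_unsigned(void) { return -1 < 0u; }
```

`less_signed()` returns 1, but `less_unsigned()` returns 0.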

These rules can cause subtle programming errors that we need to watch for. For example:

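float add(float num[], unsigned int length)
{
    int i = 0;
    float sum = 0;

    /* Bug: length - 1 is computed in unsigned arithmetic (and i is converted
       to unsigned for the comparison). When length == 0 it wraps around to
       the largest unsigned value, so the loop runs past the end of num[]. */
    for (i = 0; i <= length - 1; i++)
        sum += num[i];

    return sum;
}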

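Because length is declared as unsigned int and the constant 1 is a signed int by default, 1 is converted to unsigned and length - 1 is computed with unsigned arithmetic. When length is 0, the result is therefore not -1 but, by the conversion rule for negative values in 4, $-1 + 2^w = 2^w - 1$, the largest unsigned value. The loop condition is then always true, and the subsequent accesses to the array go out of bounds. Declaring length as a signed int, or writing the condition as i < length, avoids the problem.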
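One way to write the loop safely, sketched here with the hypothetical name `add_fixed`, is to use `size_t` for the index and compare with `<`, so that `length - 1` is never computed.

```c
#include <stddef.h>

/* Safe version: i < length never computes length - 1,
   so length == 0 simply skips the loop. */
float add_fixed(const float num[], size_t length) {
    float sum = 0.0f;
    for (size_t i = 0; i < length; i++)
        sum += num[i];
    return sum;
}
```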

When a conversion changes both the size and the signedness, C first changes the size and then the signedness. For example, converting a short int to an unsigned int first extends the value to the larger size (sign extension) and then reinterprets the bits as unsigned.
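The difference between the two possible orders can be seen in a small sketch. The function names are hypothetical, and the expected values assume a 16-bit `short` and a 32-bit `unsigned int`.

```c
/* C's rule: widen first (sign-extend to 32 bits), then reinterpret as unsigned. */
unsigned int short_to_uint(short sx) {
    return (unsigned int)sx;
}

/* The opposite order (change signedness first, then widen with zeros),
   shown only for comparison. */
unsigned int sign_then_size(short sx) {
    return (unsigned int)(unsigned short)sx;
}
```

With `sx = -12345` (bit pattern 0xCFC7), C's rule gives 0xFFFFCFC7, whereas changing the signedness first would give 0x0000CFC7.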

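When a number x is converted to a type with a shorter word size of $k$ bits, the high-order bits are truncated and only the low $k$ bits are kept. For an unsigned value, truncation is equivalent to the remainder operation $x \bmod 2^k$; for a signed value, the remaining $k$-bit pattern is then interpreted as a signed integer.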
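A sketch of truncation from 32 to 8 bits (`trunc_to_8` is a hypothetical name): the cast keeps the low 8 bits, which is the same as computing $x \bmod 2^8$.

```c
#include <stdint.h>

/* Keeps only the low 8 bits of x, i.e. x mod 2^8. */
uint8_t trunc_to_8(uint32_t x) {
    return (uint8_t)x;
}
```

300 is 0x12C, so the result is 0x2C = 44 = 300 mod 256.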

III. Floating-point representation

1. Binary representation of fractions

A binary number $b = b_m b_{m-1} \cdots b_1 b_0 . b_{-1} b_{-2} \cdots b_{-n}$ has the value $\sum_{i=-n}^{m} b_i 2^i$: the weights of the digits before the binary point are $2^m, 2^{m-1}, \ldots, 2^1, 2^0$, and the weights of the digits after it are $2^{-1}, 2^{-2}, \ldots, 2^{-n}$.

Binary fractions can represent exactly only numbers that can be written as $x \times 2^y$ (with integers $x$ and $y$); other numbers, such as 0.1, can only be approximated.
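The positional formula can be evaluated in code. This sketch (the function `binary_fraction_value` is hypothetical and does no input validation) accumulates the integer digits by doubling and the fractional digits with weights $2^{-1}, 2^{-2}, \ldots$.

```c
/* Evaluates a string such as "101.11" as the sum of b_i * 2^i.
   Assumes the string contains only '0', '1' and at most one '.'. */
double binary_fraction_value(const char *s) {
    double value = 0.0;
    double weight = 0.5;           /* weight of the first digit after the point: 2^-1 */
    int after_point = 0;
    for (; *s != '\0'; s++) {
        if (*s == '.') {
            after_point = 1;
            continue;
        }
        int bit = (*s == '1');
        if (!after_point) {
            value = value * 2.0 + bit;   /* shift the integer part left, add b_i */
        } else {
            value += bit * weight;       /* add b_{-j} * 2^-j */
            weight /= 2.0;
        }
    }
    return value;
}
```

For "101.11" it returns $4 + 1 + 0.5 + 0.25 = 5.75$.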

2. IEEE 754 floating-point representation

IEEE 754 is the international standard for floating-point numbers. It represents a number as $V = (-1)^s \times M \times 2^E$, where:

s is the sign bit: 1 indicates a negative number and 0 a positive number;

M is the significand (mantissa), a binary fraction;

E is the exponent, which weights the value by a power of 2 and may be negative.

The bit layout of the 32-bit (single-precision) format is:

Sign bit s (1 bit)

Exponent field e (8 bits)

Fraction field f (23 bits)
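Assuming `float` is the IEEE 754 single-precision format (true on mainstream platforms but not required by C), the three fields can be extracted by copying the bits into a `uint32_t`. The function `float_fields` is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* Splits a float into its sign, exponent and fraction fields.
   Assumes float is the 32-bit IEEE 754 single-precision format. */
void float_fields(float x, unsigned *s, unsigned *e, uint32_t *f) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);     /* copy the bit pattern without aliasing issues */
    *s = (unsigned)(bits >> 31);        /* 1 bit   */
    *e = (unsigned)((bits >> 23) & 0xFFu); /* 8 bits  */
    *f = bits & 0x7FFFFFu;              /* 23 bits */
}
```

-2.5 is $-1.01_2 \times 2^1$, so it yields s = 1, e = 128 and f = 0x200000 (only the fraction bit of weight $2^{-2}$ is set).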

Depending on the bit pattern of the exponent field e, there are three cases:

(1) Normalized values (e is neither all 0s nor all 1s)

The exponent field e is interpreted as an unsigned integer with a value from 1 to 254, and $E = e - (2^7 - 1) = e - 127$, so E ranges from -126 to 127;

M has an implied leading 1: $M = 1 + f$, where f is the value of the fraction field read as the binary fraction $0.f_{22}f_{21}\cdots f_0$, so the most significant fraction bit has weight $2^{-1}$.

(2) Denormalized values (e is all 0s)

$E = 1 - (2^7 - 1) = -126$;

M no longer has an implied leading 1: $M = f$, the value of the fraction field alone.

The significand M of a normalized value is always at least 1, so normalized values cannot represent 0; denormalized values are used to represent 0 (and values very close to 0):

When s is 0 and e and f are all 0s, the value is +0.0;

When s is 1 and e and f are all 0s, the value is -0.0.

Defining E as $1 - (2^7 - 1)$ rather than $-(2^7 - 1)$ for denormalized values compensates for M no longer containing the implied 1, so that the largest denormalized value and the smallest normalized value are adjacent.
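The normalized and denormalized cases can be turned into a small decoder. In this sketch (`decode_finite` and `times_pow2` are hypothetical names) `f` is the raw 23-bit fraction field, so its value as a fraction is $f / 2^{23}$; repeated doubling and halving is used instead of a math-library call to keep the block dependency-free.

```c
#include <stdint.h>

/* Multiplies x by 2^n by repeated doubling or halving (exact for these ranges). */
static double times_pow2(double x, int n) {
    for (; n > 0; n--) x *= 2.0;
    for (; n < 0; n++) x /= 2.0;
    return x;
}

/* Value of a single-precision pattern with sign s, exponent field e and raw
   fraction field f, for the normalized (1 <= e <= 254) and denormalized
   (e == 0) cases only. */
double decode_finite(unsigned s, unsigned e, uint32_t f) {
    const int bias = 127;                        /* 2^7 - 1 */
    double frac = times_pow2((double)f, -23);    /* f / 2^23, in [0, 1) */
    double M = (e == 0) ? frac : 1.0 + frac;     /* implied leading 1 only if normalized */
    int E = (e == 0) ? 1 - bias : (int)e - bias; /* denormalized: E = 1 - bias */
    double v = times_pow2(M, E);
    return s ? -v : v;
}
```

`decode_finite(0, 0, 1)` is the smallest positive denormalized value $2^{-149}$, and the largest denormalized value plus $2^{-149}$ equals the smallest normalized value $2^{-126}$.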

(3) Special values (e is all 1s)

When f is all 0s, the value is infinity: negative infinity if s is 1 and positive infinity if s is 0;

When f is nonzero, the value is NaN (not a number).
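Putting the three cases together, a float can be classified purely from its exponent and fraction fields. A sketch, again assuming the 32-bit IEEE 754 format; the function name and the returned strings are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Classifies a float from its exponent field e and fraction field f.
   Assumes float is the 32-bit IEEE 754 single-precision format. */
const char *classify_float(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint32_t e = (bits >> 23) & 0xFFu;
    uint32_t f = bits & 0x7FFFFFu;
    if (e == 0xFFu)                       /* e all 1s: special values */
        return f == 0 ? "infinity" : "NaN";
    if (e == 0)                           /* e all 0s: zero or denormalized */
        return f == 0 ? "zero" : "denormalized";
    return "normalized";
}
```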

3. Rounding of floating-point numbers

Floating-point numbers have limited range and precision, so many values can only be approximated: they must be rounded before they can be represented. The default rounding mode is round-to-nearest-even. Taking decimal numbers rounded to one digit after the decimal point as an example:

When the value is not exactly halfway between the two candidates, it is rounded to the nearer one: for example, 1.43 rounds to 1.4 and 1.47 rounds to 1.5;

When the value is exactly halfway, it is rounded so that the least significant digit of the result is even: for example, 1.45 rounds to 1.4 rather than 1.5, and 1.55 rounds to 1.6.
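The same tie-breaking rule applies in binary. A `float` has a 24-bit significand, so above $2^{24}$ only even integers are representable, and 16777217 ($2^{24} + 1$) lies exactly halfway between 16777216 and 16777218. A sketch, assuming IEEE 754 arithmetic in the default rounding mode (`to_float` is a hypothetical name):

```c
#include <stdint.h>

/* Converts an integer to float; under the default IEEE 754 rounding mode,
   values that fall between two floats round to nearest, ties to even. */
float to_float(int32_t n) {
    return (float)n;   /* the cast forces rounding to single precision */
}
```

16777217 rounds down to 16777216 and 16777219 rounds up to 16777220; in both cases the result is the neighbor whose significand ends in 0 (is even).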

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.