Information is a broad concept: together with matter and energy, it is regarded as one of the three basic elements of the natural world. This article covers only information in computers.
As we all know, to the computer itself all information is in 0/1 binary form.
As a Java EE/Android programmer, I sometimes run into character encoding and base conversion during development. Experience or a quick web search usually gets the feature working, but each time I feel that I do not thoroughly understand the underlying concepts, so I am recording my current understanding here for future reference.
Information in a computer is classified into control information and data information.
Control information consists of the control commands used for the internal operation of the computer system, such as read/write commands, interrupt signals, chip-select signals, reset signals, and ready signals. These are, of course, also represented in binary.
Data information refers to the various kinds of data that a computer can compute, store, transmit, collect, and output. It can be divided into numerical data, text data (characters, strings), multimedia data (images, audio, video), and binary data (executable files, etc.).
Representing information, like processing, transmitting, storing, and inputting/outputting it, is one of the most basic functions of a computer system.
Information representation in a computer can be divided into what the hardware system can express directly and what is expressed at the software level. The latter includes data structures, database table structures, and XML/JSON structures.
Here, we discuss only information representation at the hardware level.
Bit: short for "binary digit", the smallest unit of information in a computer. It follows from the "electronic" nature of an electronic computer: each bit corresponds to a high or low voltage level.
Byte: 8 bits, enough to hold the ASCII code of one English character. This size reflects the language (English) used in the country where the computer was invented.
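As a small illustration of these two units, the following Python sketch prints the 8-bit pattern that stores the ASCII code of one English character. The helper name `byte_bits` is made up for this example.

```python
def byte_bits(ch: str) -> str:
    """Return the 8-bit binary string that stores one ASCII character."""
    code = ord(ch)                # ASCII code, e.g. 65 for 'A'
    if code > 0x7F:
        raise ValueError("not an ASCII character")
    return format(code, "08b")    # pad with leading zeros to one full byte

print(byte_bits("A"))  # 01000001
```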
Fixed-point number: the position of the radix point is fixed. In a fixed-point fraction the point sits immediately after the sign bit; in a fixed-point integer it sits after the last value bit.
Floating-point number: the position of the radix point floats. The number is made up of a mantissa, an exponent, and a base. The more mantissa bits there are, the denser the representable values and the higher the precision; the more exponent bits there are, the larger the representable range.
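To make the mantissa/exponent split concrete, here is a sketch that pulls the bit fields out of a 32-bit float. It assumes the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 mantissa bits, base 2), which the text above does not name; `float32_fields` is a hypothetical helper.

```python
import struct

def float32_fields(x: float):
    """Split x, stored as IEEE 754 single precision, into (sign, exponent, mantissa)."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF       # 23 stored fraction bits
    return sign, exponent, mantissa

print(float32_fields(1.0))    # (0, 127, 0)
print(float32_fields(-0.5))   # (1, 126, 0)
```

With 23 mantissa bits and 8 exponent bits, this format trades precision against range exactly as described: widening either field would improve one of the two.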
Unsigned number: all binary bits are value bits.
Signed number: the highest bit is used as the sign bit.
Machine number: the binary representation of a number inside the computer, with the highest bit representing the sign.
True value: the actual numeric value that a machine number stands for.
Sign-magnitude (original code): the sign bit followed by the absolute value of the true value.
Ones' complement (inverse code): for a positive number it is the same as the sign-magnitude form; for a negative number, every bit of the sign-magnitude form except the sign bit is inverted.
Two's complement (complement code): for a positive number it is the same as the sign-magnitude form; for a negative number, 1 is added to the lowest bit of the ones' complement.
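The three encodings can be compared side by side with a short sketch. It assumes an 8-bit word and values in the range -127..127; the function names are chosen for this example only.

```python
BITS = 8                    # word width assumed for this example
MASK = (1 << BITS) - 1      # 0xFF

def sign_magnitude(v: int) -> str:
    """Sign bit followed by the absolute value."""
    sign = 1 if v < 0 else 0
    return format((sign << (BITS - 1)) | abs(v), f"0{BITS}b")

def ones_complement(v: int) -> str:
    """Same as sign-magnitude for v >= 0; otherwise invert the value bits."""
    if v >= 0:
        return sign_magnitude(v)
    # Inverting all 8 bits of |v| sets the sign bit to 1 and flips the rest.
    return format(~abs(v) & MASK, f"0{BITS}b")

def twos_complement(v: int) -> str:
    """Masking a Python int to 8 bits yields its two's complement pattern."""
    return format(v & MASK, f"0{BITS}b")

for v in (5, -5):
    print(v, sign_magnitude(v), ones_complement(v), twos_complement(v))
# 5 00000101 00000101 00000101
# -5 10000101 11111010 11111011
```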
Signed integer values in the computer are stored in two's complement (complement code) form.
Purpose:
-Let the sign bit take part in the operation together with the value bits (any carry out of the highest bit is simply discarded), which simplifies the arithmetic rules;
-Turn subtraction into addition (by congruence modulo 2^n, A - B equals A plus the two's complement of B), which simplifies the design of the arithmetic circuits in the computer.
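Both points can be checked with a few lines of Python. This is only a sketch on 8-bit patterns, with hypothetical helper names; real hardware does the same thing with an adder whose carry out is dropped.

```python
BITS = 8
MASK = (1 << BITS) - 1  # 0xFF: keeps the low 8 bits, dropping any carry out

def negate(b: int) -> int:
    """Two's complement of an 8-bit pattern: invert the bits, then add 1."""
    return (~b + 1) & MASK

def subtract(a: int, b: int) -> int:
    """Compute a - b using only addition on 8-bit patterns."""
    return (a + negate(b)) & MASK

def to_signed(pattern: int) -> int:
    """Read an 8-bit pattern as a signed two's complement value."""
    return pattern - (1 << BITS) if pattern & (1 << (BITS - 1)) else pattern

print(to_signed(subtract(7, 5)))   # 2
print(to_signed(subtract(5, 7)))   # -2
```

Note that `subtract` never special-cases the sign bit: it is added like any other bit, which is the simplification the list describes.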
Overflow: the result of a calculation exceeds the range of values that the integer type can represent.
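A minimal sketch of overflow, assuming an 8-bit signed integer with the range -128..127 (the same range as Java's `byte`). Python integers never overflow on their own, so the wrap-around is simulated; `add_int8` is a made-up name.

```python
def add_int8(a: int, b: int):
    """Add two values as 8-bit two's complement; return (wrapped result, overflowed?)."""
    exact = a + b
    wrapped = (exact + 128) % 256 - 128   # fold the exact sum into -128..127
    return wrapped, wrapped != exact

print(add_int8(100, 27))   # (127, False)
print(add_int8(100, 28))   # (-128, True)
```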
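Number bases: the base (radix) and the positional weight are the two main elements of a positional number system. Binary, octal, and hexadecimal are the bases most commonly used in computers.
Base N to decimal: multiply each digit by its positional weight and sum the results.
Decimal to base N: for the integer part, repeatedly divide by N and take the remainders; for the fractional part, repeatedly multiply by N and take the integer parts.
In general, a conversion between two other bases can go through binary or decimal as an intermediate step.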
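The conversion rules can be sketched as three small functions. The names are hypothetical, the input is not validated against the base, and bases above 16 are not handled.

```python
DIGITS = "0123456789ABCDEF"

def to_decimal(text: str, base: int) -> int:
    """Base-N integer string to int: the sum of digit * weight."""
    value = 0
    for ch in text.upper():
        value = value * base + DIGITS.index(ch)   # Horner form of the weighted sum
    return value

def int_to_base(n: int, base: int) -> str:
    """Non-negative int to base N: divide by N repeatedly, collect the remainders."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, r = divmod(n, base)
        out.append(DIGITS[r])
    return "".join(reversed(out))   # remainders come out lowest digit first

def frac_to_base(f: float, base: int, places: int = 8) -> str:
    """Fraction in [0, 1) to base N: multiply by N repeatedly, collect the integer parts."""
    out = []
    for _ in range(places):
        if f == 0:
            break
        f *= base
        digit = int(f)
        out.append(DIGITS[digit])
        f -= digit
    return "".join(out)

print(to_decimal("FF", 16))      # 255
print(int_to_base(255, 2))       # 11111111
print(frac_to_base(0.625, 2))    # 101
```

Converting hexadecimal to octal, for example, can then be written as `int_to_base(to_decimal("FF", 16), 8)`, using decimal as the intermediate.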
The specific conversion principles, algorithms, and procedures are covered in another article.
(TODO: two's complement and floating-point arithmetic)
Character encodings and character sets:
ASCII: American Standard Code for Information Interchange. 1 bit + 7 bits; an encoding for English characters, created in the United States. With the first bit set to 0 it can represent 128 characters. In the extended set the first bit is 1, which brings the total to 256 characters.
This is far from enough to represent text in non-English-speaking countries, especially in Asia. As a result, each country began to extend the character set in its own way, and these separate extensions were incompatible with one another: the same binary string stands for different characters in different languages.
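The incompatibility is easy to reproduce. In this sketch a single byte is decoded with three regional 8-bit encodings; the three code pages (Latin-1, Windows-1251, ISO 8859-7) are illustrative choices of mine, referred to by Python's standard codec names.

```python
raw = bytes([0xE9])  # one byte with the high bit set, outside 7-bit ASCII

# The same byte decodes to a different character under each code page.
for codec in ("latin-1", "cp1251", "iso8859_7"):
    print(codec, raw.decode(codec))
```

The output shows a Latin letter, a Cyrillic letter, and a Greek letter for the very same byte value.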
This led to Unicode, which can represent the text of every country in the world in one unified character set. Unicode only assigns each character a numeric value (its code point); how those values are stored in a computer is defined by the UTF encodings (Unicode Transformation Formats).
Specifically: UTF-8, UTF-16, and UTF-32.
UTF-8 represents a Unicode value in 1 to 4 bytes (the original design allowed up to 6 bytes, but RFC 3629 limits it to 4), UTF-16 uses 2 or 4 bytes, and UTF-32 always uses 4 bytes.
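The byte counts can be observed directly with Python's built-in codecs. `encoded_lengths` is a hypothetical helper, and the sample characters are arbitrary picks from different code point ranges.

```python
def encoded_lengths(ch: str) -> dict:
    """Number of bytes one character occupies in each UTF encoding (no BOM)."""
    # The "-be" codec variants fix the byte order, so Python prepends no BOM.
    return {name: len(ch.encode(name)) for name in ("utf-8", "utf-16-be", "utf-32-be")}

for ch in ("A", "\u00e9", "\u4e2d", "\U0001F600"):
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```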
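Because these are multi-byte representations, the length and the order of the bytes must be unambiguous. The byte order is indicated by a BOM, which matters for UTF-16 and UTF-32; UTF-8 is a plain byte sequence and has no byte-order problem.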
UTF-8 encoding scheme:
For a character encoded in 1 byte: the first bit is 0 and the remaining 7 bits hold the character's Unicode value.
For a character encoded in N bytes (N > 1): the first N bits of the first byte are 1 and bit N + 1 is 0; the first 2 bits of each following byte are 10; all the remaining bits, taken together, hold the character's Unicode value.
Unicode code point range | UTF-8 byte sequence
(hexadecimal)            | (binary)
------------------------------------------------------------------
0000 0000-0000 007F      | 0xxxxxxx
0000 0080-0000 07FF      | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF      | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF      | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
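The table translates almost line by line into code. The following is a sketch of a hand-written encoder (`utf8_encode` is a made-up name); in practice `str.encode("utf-8")` does this, and unlike the built-in codec this sketch does not reject surrogate code points.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point by following the table row by row."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point out of range")
    if cp <= 0x7F:                               # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                              # 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (cp >> 6),
                      0b10000000 | (cp & 0x3F)])
    if cp <= 0xFFFF:                             # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (cp >> 12),
                      0b10000000 | ((cp >> 6) & 0x3F),
                      0b10000000 | (cp & 0x3F)])
    return bytes([0b11110000 | (cp >> 18),       # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0b10000000 | ((cp >> 12) & 0x3F),
                  0b10000000 | ((cp >> 6) & 0x3F),
                  0b10000000 | (cp & 0x3F)])

print(utf8_encode(0x4E2D).hex())   # e4b8ad
```

Here U+4E2D falls in the third row, so its 16 bits are split 4 + 6 + 6 across the three bytes.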
Other character encodings include:
UCS: Universal Character Set; UCS-2 (16 bits), UCS-4 (32 bits).
EBCDIC: Extended Binary Coded Decimal Interchange Code; 8 bits.
ISO 8859: 8 bits.
GB2312: 16 bits, a 94*94 grid of code positions, Simplified Chinese.
Big5: 16 bits, Traditional Chinese.
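A quick way to see that these encodings assign different bytes to the same character is to encode it with each codec. This sketch uses Python's bundled codecs; `cp037` is just one EBCDIC variant that Python ships, chosen here as an assumption, and `encode_hex` is a hypothetical helper.

```python
def encode_hex(ch: str, codec: str) -> str:
    """Hex string of the bytes that `codec` uses for one character."""
    return ch.encode(codec).hex()

# One Chinese character in the two 16-bit Chinese encodings.
print(encode_hex("\u4e2d", "gb2312"))   # d6d0
print(encode_hex("\u4e2d", "big5"))     # a4a4

# One English letter in an ISO 8859 encoding and in an EBCDIC code page.
print(encode_hex("A", "latin-1"))       # 41
print(encode_hex("A", "cp037"))         # c1
```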
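About big-endian (BOM bytes FE FF) and little-endian (BOM bytes FF FE):
Big-endian: the higher (most significant) byte is stored at the lower memory address.
Little-endian: the lower (least significant) byte is stored at the lower memory address.
BOM: byte order mark, the character ZERO WIDTH NO-BREAK SPACE (U+FEFF). Placed at the start of a text, it reads as FE FF in big-endian order and as FF FE in little-endian order.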
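Both ideas can be demonstrated with the standard library. The sketch below packs one 32-bit integer in each byte order, shows the two BOM byte sequences, and adds a hypothetical helper `detect_utf16_order` that guesses the byte order of UTF-16 data from its leading BOM.

```python
import codecs
import struct

value = 0x12345678
print(struct.pack(">I", value).hex())   # 12345678  big-endian: high byte first
print(struct.pack("<I", value).hex())   # 78563412  little-endian: low byte first

# U+FEFF serialized in each byte order gives the two BOM byte sequences.
print("\ufeff".encode("utf-16-be").hex())   # feff
print("\ufeff".encode("utf-16-le").hex())   # fffe

def detect_utf16_order(data: bytes) -> str:
    """Guess the UTF-16 byte order from a leading BOM, if there is one."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little"
    return "unknown"

print(detect_utf16_order("\ufeffhi".encode("utf-16-le")))   # little
```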
(TODO: representation of multimedia information: graphics, images, animation, audio, video, virtual reality)
Simple understanding of information representation in a computer