Binary and text files

Source: Internet
Author: User

In a word, all file formats (text, EXE, video, images) are physically binary files. Logically, they differ only in how they are encoded. Because text files have such a long history, every file that is not a text file came to be called a binary file.

Strictly speaking, "binary file" is a loose, conventional name. There are many different types of files (formats), such as text files, video files, image files, and database files, and the precise definition of each format depends on the application. One major difference between formats is how information is encoded, yet the encoding is always "binary" underneath. In that sense, every file is a "binary file". That is not what we usually mean by the term. "Binary file" is used in contrast to "text file": every non-text file (that is, every file not stored in a character encoding such as ASCII) is called a binary file. A natural question follows: text is just one file format, and all formats should be equal, so why is it singled out as the reference point? The reason is simple. In both length of use and breadth of use, ASCII-encoded files are unmatched by files in any other format.

The following is reprinted from: http://www.cnblogs.com/chio/archive/2008/09/17/1292631.html

I. Definitions of text files and binary files

As we all know, computers store everything physically in binary, so the difference between a text file and a binary file is not physical but logical. The two differ only at the encoding level.
In short, text files are character-encoded files. Common encodings include ASCII and the Unicode encodings (such as UTF-8 and UTF-16). Binary files are value-encoded files: the specific application decides what each value means (this can be thought of as a custom encoding).
From the above, text files are basically fixed-length encoded. They are based on characters, and within a given encoding each character has a fixed code. ASCII defines 7-bit codes that are normally stored one per 8-bit byte, and UTF-16 stores most characters in 16 bits (UTF-8 is an exception, using one to four bytes per character). A binary file can be regarded as variable-length encoding, because it encodes values and the format designer decides how many bits represent each value. Take the BMP format as an example. The file begins with a fixed-length file header: the first 2 bytes mark the file as BMP (the ASCII letters "BM"), the next 4 bytes record the file size, the next 4 bytes are reserved, and the next 4 bytes record the offset at which the pixel data begins... The encoding is based on values of differing lengths (here 2 and 4 bytes), so BMP is a binary file.
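The value-based layout of the BMP file header can be sketched in Python with the standard `struct` module. This is only an illustration: the function name `parse_bmp_file_header` and the sample values (a 70-byte file with pixel data at offset 54) are made up for the example, and the header is built in memory rather than read from a real image.

```python
import struct

# Layout of the 14-byte BMP file header (little-endian, no padding):
#   2 bytes  signature, the ASCII letters "BM"
#   4 bytes  total file size in bytes
#   2 + 2 bytes  reserved
#   4 bytes  offset from the start of the file to the pixel data
BMP_FILE_HEADER = struct.Struct("<2sIHHI")


def parse_bmp_file_header(data: bytes) -> dict:
    """Decode the first 14 bytes of a BMP file into named values."""
    signature, file_size, _reserved1, _reserved2, pixel_offset = (
        BMP_FILE_HEADER.unpack(data[:BMP_FILE_HEADER.size])
    )
    if signature != b"BM":
        raise ValueError("not a BMP file")
    return {
        "signature": signature,
        "file_size": file_size,
        "pixel_offset": pixel_offset,
    }


# Hypothetical header: a 70-byte file whose pixel data starts at byte 54.
sample_header = BMP_FILE_HEADER.pack(b"BM", 70, 0, 0, 54)
header = parse_bmp_file_header(sample_header)
print(header)
```

Each field is a value of a chosen width, and only a program that knows this layout can recover the meaning of the bytes.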
II. Access to text files and binary files
What happens when a text tool opens a file? Take Notepad as an example. It first reads the binary bit stream of the physical file (as mentioned earlier, storage is binary), then interprets the stream according to the decoding method you selected, and finally displays the result. Usually the decoding method is ASCII (one ASCII character occupies 8 bits), so Notepad interprets the stream 8 bits at a time. For example, take the stream "01000001_01000010_01000011_01000100" (the underscores are added by hand for readability). Decoded as ASCII, the first 8 bits, '01000001', correspond to the character 'A'. Likewise, the other three 8-bit groups decode to 'B', 'C', and 'D', so the whole stream is interpreted as "ABCD", and Notepad displays "ABCD" on the screen.
In fact, whenever two things communicate, they rely on an established protocol, an agreed code. People communicate through written words. A Chinese character meaning "mother" stands, by convention, for the person who gave birth to you. But the same character may mean "daughter" in Japanese, so when Chinese user A and Japanese user B communicate using that character, misunderstandings are to be expected.
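The decoding step can be sketched in a few lines of Python. The helper name `decode_bit_stream` is hypothetical, and the bit pattern used is the ASCII encoding of "ABCD"; the sketch simply cuts the stream into 8-bit groups and decodes each as one character.

```python
def decode_bit_stream(bits: str, encoding: str = "ascii") -> str:
    """Interpret a string of '0'/'1' characters 8 bits at a time."""
    bits = bits.replace("_", "")  # underscores are only for readability
    if len(bits) % 8 != 0:
        raise ValueError("bit stream length must be a multiple of 8")
    # Convert each 8-bit group to one byte, then decode the bytes as text.
    raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return raw.decode(encoding)


stream = "01000001_01000010_01000011_01000100"
text = decode_bit_stream(stream)
print(text)  # ABCD
```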

Opening a binary file in Notepad works the same way. Whatever file it opens, Notepad applies the chosen character encoding (such as ASCII). So when it opens a binary file, garbled characters are inevitable, because the decoding does not match the encoding. For example, the stream '00000000_00000000_00000000_00000001' may represent the four-byte integer 1 in a binary file (here with the most significant byte first), but Notepad interprets it as the four control characters NUL, NUL, NUL, SOH.
Storing a text file is essentially the inverse of reading it, so it is not described further. Access to binary files follows the same pattern as for text files; only the encoding and decoding methods differ.
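The mismatch is easy to reproduce. The following sketch, which assumes the integer is stored most significant byte first, decodes the same four bytes once as a value and once as characters:

```python
import struct

# The four bytes 00000000 00000000 00000000 00000001.
raw = bytes([0x00, 0x00, 0x00, 0x01])

# Value-based decoding: one big-endian 32-bit signed integer.
(as_int,) = struct.unpack(">i", raw)

# Character-based decoding: one ASCII character per byte.
as_text = raw.decode("ascii")

print(as_int)         # 1
print(repr(as_text))  # '\x00\x00\x00\x01'
```

The text result is three NUL characters followed by SOH, none of which is printable, which is why a text editor shows such a file as garbage.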

III. Advantages and disadvantages of text files and binary files
Because text files and binary files differ only in encoding, their advantages and disadvantages are those of their encodings. Any book on encoding makes this clear. It is generally held that text files use a fixed-length, character-based encoding, which makes decoding easier. Binary files use a variable-length encoding, which is more flexible and uses storage more efficiently but is harder to decode (each binary file format has its own decoding method). As for space utilization, consider that a binary file can use a single bit to carry a meaning (through bit operations), whereas anything expressed in a text file takes at least one character.
Many books also say that text files are more readable but cost conversion time on storage (reading and writing require encoding and decoding), while binary files are less readable but need no conversion on storage (values are written directly, with no encoding or decoding). Readability here is from the software user's point of view: almost any text file can be browsed with a general tool such as Notepad, so text files are quite readable, whereas reading or writing a particular binary file requires a decoder for that format, so binary files are less readable. For example, to view a BMP file you must use image-viewing software. Conversion time here is from the programmer's point of view: on some operating systems, such as Windows, text-mode I/O converts line endings (changing '\n' to '\r\n'), so every character read or written has to be checked for '\n' or '\r\n'. On Linux this conversion is not needed. Of course, when a file is shared between two different operating systems, the conversion may be needed again (for example, when sharing text files between Linux and Windows).
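The space argument can be made concrete with a small sketch. The eight flag values and the helper `unpack_flags` below are invented for illustration; the same eight yes/no values take eight bytes as ASCII text but only one byte when packed bit by bit.

```python
flags = [True, False, True, True, False, False, True, False]

# Text encoding: one character per flag, so 8 bytes in ASCII.
as_text = "".join("1" if f else "0" for f in flags).encode("ascii")

# Binary encoding: one bit per flag, packed into a single byte.
packed = 0
for f in flags:
    packed = (packed << 1) | int(f)
as_binary = bytes([packed])


def unpack_flags(byte_value: int) -> list:
    """Recover the eight flags from one byte, most significant bit first."""
    return [bool((byte_value >> shift) & 1) for shift in range(7, -1, -1)]


print(len(as_text), len(as_binary))  # 8 1
```

The packed form is compact, but it can only be read back by code that knows the bit order, which is the decoding cost described above.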
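Line-ending conversion can be observed from Python on any platform by requesting it explicitly through the `newline` parameter of the built-in `open`. This is a sketch only: the helper `write_and_read_raw` is a made-up name, and passing `newline="\r\n"` stands in for what text mode does by default on Windows.

```python
import os
import tempfile


def write_and_read_raw(text: str, newline: str) -> bytes:
    """Write text in text mode, then return the bytes actually stored."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text mode: each '\n' written is translated to `newline`.
        with open(path, "w", encoding="ascii", newline=newline) as f:
            f.write(text)
        # Binary mode: bytes come back exactly as stored, untranslated.
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


windows_style = write_and_read_raw("a\nb\n", newline="\r\n")
unix_style = write_and_read_raw("a\nb\n", newline="\n")
print(windows_style)  # b'a\r\nb\r\n'
print(unix_style)     # b'a\nb\n'
```

The program wrote the same four characters both times, but the Windows-style file is two bytes longer, and reading it in binary mode exposes the difference.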

From the programming point of view, the two kinds of files are treated the same way: both are sequences of 0s and 1s, and only their logical interpretation differs.
