Comparison Between Text Files and Binary Files

Source: Internet
Author: User
1. Definitions of text files and binary files

As is well known, everything a computer stores is physically stored in binary. The difference between a text file and a binary file is therefore not physical but logical: the two differ only at the encoding level.

In short, a text file is a character-encoded file; common encodings include ASCII and Unicode (for example UTF-8 and UTF-16). A binary file is a value-encoded file: the application decides what each value means, so the process can be regarded as a custom encoding.
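To make the distinction concrete, here is a small Python sketch (Python is our choice; the article itself contains no code) that stores the number 12345 both ways. The 4-byte little-endian integer layout is an assumption picked for illustration; a real binary format defines its own width and byte order.

```python
import struct

number = 12345

# Character encoding: each decimal digit becomes one ASCII character (one byte each).
as_text = str(number).encode("ascii")

# Value encoding: the number itself is stored as a 4-byte little-endian
# signed integer (width and byte order are our own assumption here).
as_value = struct.pack("<i", number)

print(as_text, len(as_text))    # b'12345' 5
print(as_value, len(as_value))  # b'90\x00\x00' 4

# Each representation needs its own decoder to get the number back.
print(int(as_text.decode("ascii")), struct.unpack("<i", as_value)[0])  # 12345 12345
```

The value-encoded bytes happen to contain the printable characters "9" and "0" (0x39 and 0x30), which is a coincidence of the number chosen, not a sign that the value is stored as text.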

From the above we can see that text files mostly use fixed-length encoding (variable-length encodings such as UTF-8 also exist): encoding is done character by character, and each character has a fixed width in a given encoding. ASCII is a 7-bit code, normally stored as one 8-bit byte per character, while UTF-16 uses 16-bit code units. A binary file can be regarded as using variable-length encoding, because it encodes values, and it is up to you to decide how many bits represent each value. BMP files are a good example if you are familiar with them. A BMP file starts with a fixed-length, 14-byte file header: the first 2 bytes hold the signature "BM" that marks the file as a BMP, the next 4 bytes record the file size, the following 4 bytes are reserved, and the last 4 bytes record the offset at which the pixel data begins. These fields are encoded as values of different lengths (2 and 4 bytes) rather than as characters, so BMP is a binary file.
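The BMP file header described above can be decoded with Python's standard struct module. The sketch assumes the standard 14-byte BITMAPFILEHEADER layout with little-endian fields; the sample header bytes are hand-made for illustration rather than taken from a real image, and parse_bmp_file_header is a hypothetical helper name.

```python
import struct

# BITMAPFILEHEADER: 2-byte signature, 4-byte file size, two 2-byte
# reserved fields, 4-byte offset to the pixel data; all little-endian.
BMP_FILE_HEADER = struct.Struct("<2sIHHI")   # 14 bytes in total, no padding

def parse_bmp_file_header(data: bytes) -> dict:
    """Decode the fixed-length 14-byte header at the start of a BMP file."""
    signature, file_size, _res1, _res2, pixel_offset = BMP_FILE_HEADER.unpack_from(data)
    if signature != b"BM":
        raise ValueError("not a BMP file")
    return {"file_size": file_size, "pixel_offset": pixel_offset}

# A hand-built header for illustration (values are made up, not from a real image).
sample = b"BM" + struct.pack("<IHHI", 70, 0, 0, 54)
print(BMP_FILE_HEADER.size)           # 14
print(parse_bmp_file_header(sample))  # {'file_size': 70, 'pixel_offset': 54}
```

The format string is the "decoder" here: without knowing that the header is 2 + 4 + 2 + 2 + 4 bytes, the same 14 bytes cannot be interpreted.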

2. Access to text files and binary files

What happens when a file is opened with a text tool? Taking Notepad as an example, it first reads the bit stream of the physical file (as mentioned earlier, storage is binary), then interprets the stream according to the decoding method you have selected, and finally displays the result. Typically the selected decoding is ASCII, which stores one character per 8-bit byte, so Notepad interprets the stream 8 bits at a time. Take the file stream "01000001_01000010_01000011_01000100" (the underscores "_" are added manually for readability). Decoded as ASCII, the first 8 bits "01000001" correspond to the character "A"; likewise, the other three groups of 8 bits decode to "B", "C" and "D". The stream is thus interpreted as "ABCD", and Notepad displays "ABCD" on the screen.
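The decoding step Notepad performs can be imitated in a few lines of Python. decode_ascii_bits is a hypothetical helper that assumes its input is a string of 0s and 1s, optionally separated by underscores as in the example above.

```python
def decode_ascii_bits(stream: str) -> str:
    """Split a bit string into 8-bit groups and decode each group as ASCII."""
    bits = stream.replace("_", "")  # underscores are only for readability
    if len(bits) % 8:
        raise ValueError("bit stream length must be a multiple of 8")
    # Each 8-bit group becomes one byte value (base-2 parse).
    raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return raw.decode("ascii")

print(decode_ascii_bits("01000001_01000010_01000011_01000100"))  # ABCD
```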

In fact, anything that communicates with something else relies on an established protocol, an established code. People communicate with each other through text: the Chinese character 娘 means "mother", the person who gave birth to you, and this is an established code. In Japanese, however, the same character means "daughter", the person you gave birth to. So when a Chinese user A and a Japanese user B communicate using this character, misunderstandings are to be expected. Opening a binary file in Notepad is similar. Whatever file it opens, Notepad decodes it with an established character encoding (such as ASCII), so garbled characters are inevitable when it opens a binary file: the encoding used to write the file does not match the decoding used to read it. For example, the file stream "00000000_00000000_00000000_00000001" may be the four-byte integer 1 (in big-endian byte order) in a binary file, but Notepad interprets it as the four control characters NUL, NUL, NUL, SOH.
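The following Python sketch reproduces the mismatch: it writes the integer 1 as four big-endian bytes and then reads those bytes back as ASCII characters, the way a text editor would. ascii_view and its "?" placeholder for non-ASCII bytes are illustrative assumptions, not how Notepad itself renders such bytes.

```python
import struct

# Names of the 32 ASCII control characters, indexed by code (0-31).
ASCII_CONTROL_NAMES = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"
).split()

def ascii_view(data: bytes) -> list:
    """Interpret each byte as an ASCII character, naming control characters."""
    view = []
    for b in data:
        if b < 32:
            view.append(ASCII_CONTROL_NAMES[b])
        elif b == 127:
            view.append("DEL")
        elif b < 128:
            view.append(chr(b))
        else:
            view.append("?")  # outside ASCII: no meaning under this decoding
    return view

value = struct.pack(">i", 1)  # the integer 1 as four big-endian bytes
print(value.hex("_"))         # 00_00_00_01
print(ascii_view(value))      # ['NUL', 'NUL', 'NUL', 'SOH']
```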

Writing a text file is essentially the inverse of reading it, so it is not described separately. Access to binary files works the same way; only the encoding and decoding rules differ, so it is not described either.
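As a sketch of both access patterns, the Python code below writes a string through a text file and an integer through a binary file, then reads each back with the inverse operation. The file names are hypothetical, the 4-byte little-endian integer layout is an assumption, and a temporary directory is used so nothing is left behind.

```python
import os
import struct
import tempfile

def roundtrip(directory: str) -> tuple:
    """Write a string as text and an int as binary, then read both back."""
    text_path = os.path.join(directory, "note.txt")   # hypothetical name
    bin_path = os.path.join(directory, "value.bin")   # hypothetical name

    # Text file: characters are encoded on write and decoded on read.
    with open(text_path, "w", encoding="utf-8") as f:
        f.write("ABCD")
    with open(text_path, encoding="utf-8") as f:
        text = f.read()

    # Binary file: the value is packed on write and unpacked on read.
    with open(bin_path, "wb") as f:
        f.write(struct.pack("<i", 12345))
    with open(bin_path, "rb") as f:
        (number,) = struct.unpack("<i", f.read(4))

    return text, number

with tempfile.TemporaryDirectory() as tmp:
    print(roundtrip(tmp))  # ('ABCD', 12345)
```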

3. Advantages and Disadvantages of text files and binary files

Because text files and binary files differ only in their encoding, their advantages and disadvantages are those of the encodings themselves; any book on encoding makes this clear. It is generally held that text files are encoded with a fixed width per character, which makes decoding easier, whereas binary files use variable-length encoding, which is more flexible and uses storage more efficiently but is harder to decode (different binary formats have different decoding methods). Regarding space utilization, consider that a binary file can use a single bit to carry a meaning (via bit operations), while any meaning in a text file takes at least one character.
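The space argument can be illustrated by storing eight yes/no flags. In this Python sketch the flag values and the bit layout (flag i in bit i) are assumptions made for the example; any program reading the packed byte must know the same layout, which is exactly the decoding burden mentioned above.

```python
# Eight yes/no settings (hypothetical flags chosen for illustration).
flags = [True, False, True, True, False, False, True, False]

# Text encoding: one character per flag, so 8 bytes.
as_text = "".join("1" if f else "0" for f in flags).encode("ascii")

# Bit packing: one bit per flag, so all eight fit in a single byte.
packed = 0
for i, f in enumerate(flags):
    if f:
        packed |= 1 << i  # flag i lives in bit i
as_binary = bytes([packed])

def unpack_flags(byte: int, count: int = 8) -> list:
    """Recover the flags by testing each bit (the reader must know the layout)."""
    return [bool((byte >> i) & 1) for i in range(count)]

print(as_text, len(as_text))                # b'10110010' 8
print(len(as_binary))                       # 1
print(unpack_flags(as_binary[0]) == flags)  # True
```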

On Windows, text files are not necessarily stored as ASCII, because ASCII can represent only 128 characters. When you open a TXT file in Notepad and choose "Save As", you can select the encoding in which to store it; UTF-8 generally offers the best compatibility. Binary files store values in the program's own raw representation, so this choice of character encoding does not arise for them (although byte order can still differ between platforms).
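The limits of ASCII and the relative sizes of UTF-8 and UTF-16 can be checked directly in Python. This sketch uses the word "café" as an arbitrary example containing one non-ASCII character.

```python
text = "café"

# ASCII has only 128 code points, so "é" (U+00E9) cannot be stored.
try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII failed:", err.reason)  # ASCII failed: ordinal not in range(128)

utf8 = text.encode("utf-8")       # variable length: 1 byte for ASCII, 2 for "é"
utf16 = text.encode("utf-16-le")  # 16-bit code units, no byte-order mark
print(utf8, len(utf8))            # b'caf\xc3\xa9' 5
print(len(utf16))                 # 8

# For pure-ASCII text, UTF-8 produces exactly the same bytes as ASCII.
print("abc".encode("utf-8") == "abc".encode("ascii"))  # True
```

That last property is much of why UTF-8 is the compatible default: existing ASCII files are already valid UTF-8.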

Many books also hold that text files are more readable but take time to convert on storage (reading and writing require encoding and decoding), while binary files are less readable but need no conversion (reading and writing involve no encoding or decoding; values are written directly). Readability here is from the software user's point of view: a general-purpose tool such as Notepad can display almost any text file, so text files are quite readable, whereas reading or writing a particular binary file requires a decoder for that specific format, so binary files are less readable. For example, to view a BMP file you need image-viewing software.

The conversion time, on the other hand, is a programming concern. On some operating systems, such as Windows, text-mode file I/O converts the newline character "\n" to "\r\n" on write and back again on read, so every character must be checked for "\n" or "\r\n" as the file is read or written. Linux performs no such conversion. Of course, when files are shared between two different operating systems (for example, text files shared between Linux and Windows), this conversion issue can reappear.
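Python's built-in open() exposes the same translation through its newline parameter, which makes it possible to reproduce the Windows behavior on any platform. In this sketch, newline="\r\n" forces Windows-style line endings on write; reading in binary mode ("rb") reveals the bytes actually on disk, and reading in text mode with the default universal-newline handling converts them back. The file name is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lines.txt")  # hypothetical file name

    # Text mode with newline="\r\n": each "\n" written becomes "\r\n" on disk,
    # mimicking Windows text-mode I/O.
    with open(path, "w", encoding="ascii", newline="\r\n") as f:
        f.write("a\nb\n")

    # Binary mode shows the raw bytes: no translation at all.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode with default universal newlines turns "\r\n" back into "\n".
    with open(path, "r", encoding="ascii") as f:
        text = f.read()

print(raw)         # b'a\r\nb\r\n'
print(repr(text))  # 'a\nb\n'
```

Binary mode skips this per-character check entirely, which is the "no conversion time" advantage the paragraph above attributes to binary access.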
