Text Files and Binary Files (Original Author: mjgforever)

I. Definitions of text files and binary files
As we all know, all data in a computer is physically stored in binary. The difference between a text file and a binary file is therefore not physical but logical: the two differ only at the encoding level.
In short, a text file is a file encoded by character; common encodings include ASCII and Unicode. A binary file is a file encoded by value: the application decides what each value means (this can be regarded as a custom encoding).
From the above we can see that text files mostly use fixed-length encodings (there are also variable-length ones such as UTF-8). The encoding is character-based, and each character has a fixed width in a given encoding: an ASCII code is 7 bits, normally stored in one 8-bit byte, and a Unicode character in UTF-16 generally occupies 16 bits. A binary file can be regarded as variable-length encoded because it encodes values, and it is up to the format designer to decide how many bits represent each value. Take the BMP format as an example. The file begins with a fixed-length file header: the first 2 bytes identify the file as BMP (the characters "BM"), the next 4 bytes record the file size, the next 4 bytes are reserved, and the next 4 bytes record the offset of the pixel data. The information header that follows starts with 4 bytes that record its own length... The encoding is based on values of different lengths (2 bytes, 4 bytes, and so on), so BMP is a binary file.
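The value-based layout of a BMP header can be sketched in C as follows. This is a minimal illustration, assuming the standard 14-byte BMP file header (a 2-byte "BM" signature, a 4-byte file size, 4 reserved bytes, and a 4-byte pixel data offset, all stored little-endian). The struct and function names are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct for this sketch: the fields of the 14-byte file header. */
struct bmp_file_header {
    char     signature[3];  /* "BM" plus a terminating NUL */
    uint32_t file_size;     /* total file size in bytes */
    uint32_t data_offset;   /* offset from file start to the pixel data */
};

/* Assemble a 32-bit value from 4 raw bytes stored least significant first. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the first 14 bytes of buf. Returns 0 on success, -1 if the
   buffer is too short or does not start with the "BM" signature. */
int parse_bmp_file_header(const unsigned char *buf, size_t len,
                          struct bmp_file_header *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 14 || buf[0] != 'B' || buf[1] != 'M')
        return -1;
    out->signature[0] = 'B';
    out->signature[1] = 'M';
    out->signature[2] = '\0';
    out->file_size = read_le32(buf + 2);     /* bytes 2..5 */
    /* bytes 6..9 are reserved and skipped */
    out->data_offset = read_le32(buf + 10);  /* bytes 10..13 */
    return 0;
}
```

Each field is recovered by position and width rather than by character, which is what makes the format a binary one.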

II. Access to text files and binary files
What happens when a text tool opens a file? Take Notepad as an example. It first reads the binary bit stream of the physical file (as mentioned earlier, storage is always binary), then interprets the stream according to the decoding method you selected, and finally displays the result. Usually the selected decoding is ASCII (one ASCII character occupies 8 bits), so Notepad interprets the stream 8 bits at a time. For example, take the stream "01000001_01000010_01000011_01000100" (the underscores are added by hand for readability). Decoded as ASCII, the first 8 bits "01000001" correspond to the character "A"; likewise, the other three groups decode to "BCD". The whole stream is therefore interpreted as "ABCD", and Notepad displays "ABCD" on the screen.
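The 8-bits-at-a-time decoding described above can be sketched as a small C function. This is only an illustration of the idea, not how Notepad is implemented; the function name is hypothetical, and the input is a string of '0' and '1' characters standing in for the raw bit stream.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode a string of '0'/'1' characters into ASCII text, 8 bits per
   character. Underscores are skipped as visual separators.
   Returns the number of characters decoded, or -1 on malformed input
   or if out is too small. */
int decode_bits_as_ascii(const char *bits, char *out, size_t out_size)
{
    assert(bits != NULL && out != NULL);
    size_t n = 0;        /* characters written so far */
    unsigned value = 0;  /* byte currently being assembled */
    int count = 0;       /* bits collected for the current byte */

    for (const char *p = bits; *p != '\0'; ++p) {
        if (*p == '_')
            continue;
        if (*p != '0' && *p != '1')
            return -1;
        value = (value << 1) | (unsigned)(*p - '0');
        if (++count == 8) {
            if (n + 1 >= out_size)
                return -1;  /* no room for this character plus the NUL */
            out[n++] = (char)value;
            value = 0;
            count = 0;
        }
    }
    if (count != 0 || out_size == 0)
        return -1;  /* leftover bits, or nowhere to put the NUL */
    out[n] = '\0';
    return (int)n;
}
```

Calling it on "01000001_01000010_01000011_01000100" yields the four characters "ABCD", because 01000001 is 65, the ASCII code of "A", and the following groups are 66, 67, and 68.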
In fact, whenever two things communicate there must be an agreed protocol, an agreed code. People communicate through words. The Chinese character for "mom" stands for the person who gave birth to you; that is an agreed code. But suppose the same character meant, in Japanese, the person you gave birth to. Then when Chinese user A talks to Japanese user B using that character, misunderstanding is only to be expected. Opening a binary file in Notepad is the same situation. Whatever file Notepad opens, it works according to a fixed character encoding (such as ASCII), so when it opens a binary file, garbled characters are inevitable: the decoding does not match the encoding. For example, the stream "00000000_00000000_00000000_00000001" may represent the four-byte integer 1 in a binary file (stored most significant byte first), but Notepad interprets it as the four control characters "NUL_NUL_NUL_SOH".
Storing a text file is basically the reverse of reading it and will not be described further. Access to binary files works in the same way as for text files, except that the encoding and decoding rules differ, so it will not be described further either.
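The mismatch can be made concrete with a short C sketch that applies two different decodings to the same four bytes. The function names are hypothetical, and the integer is assumed to be stored most significant byte first, as in the example above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Value decoding: treat 4 bytes as one 32-bit integer,
   most significant byte first. */
uint32_t bytes_as_be32(const unsigned char b[4])
{
    assert(b != NULL);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Character decoding: count how many bytes are printable ASCII.
   Control characters such as NUL (0x00) and SOH (0x01) are not. */
size_t count_printable(const unsigned char *b, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; ++i)
        if (b[i] < 0x80 && isprint(b[i]))
            ++n;
    return n;
}
```

For the bytes 00 00 00 01, the value decoding gives the integer 1, while the character decoding finds no printable character at all, which is why a text editor shows garbage for such data.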

III. Advantages and Disadvantages of text files and binary files
Because text files and binary files differ only in encoding, their advantages and disadvantages are those of their encodings; any book on encoding makes this clear. It is generally held that text files use fixed-length, character-based encoding, which makes decoding easier. Binary files use variable-length encoding, which is flexible and uses storage more efficiently but is harder to decode (each binary format has its own decoding method). As for space utilization, consider that a binary file can use a single bit to carry a meaning (through bit operations), whereas any meaning in a text file takes at least one character.
Many books also state that text files are more readable but need conversion time when stored (reading and writing involve encoding and decoding), while binary files are less readable but need no conversion time (values are written directly, with no encoding or decoding). Readability here is from the software user's point of view: a general-purpose tool such as Notepad can browse almost any text file, so text files are quite readable, whereas reading or writing a particular binary file requires a decoder for that specific format. For example, reading a BMP file requires image viewing software. The conversion time, on the other hand, should be understood from the programmer's point of view: on some operating systems, such as Windows, line breaks are converted ('\n' is replaced with '\r\n'), so when a file is read or written in text mode every character must be checked to see whether it is '\n' or part of '\r\n'. On Linux no such conversion is needed. Of course, when files are shared between two different operating systems, the conversion may come up again (for example, sharing text files between Linux and Windows). I will cover this in the next article, "Conversion between Linux text files and Windows text files".
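The point about space utilization can be sketched in C: eight yes/no values fit into one byte when stored as bits, but need eight characters when stored as text. The function names below are hypothetical and serve only as an illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Binary form: pack eight yes/no values into one byte, one bit each.
   Flag i is stored in bit i. */
uint8_t pack_flags(const int flags[8])
{
    uint8_t packed = 0;
    for (int i = 0; i < 8; ++i)
        if (flags[i])
            packed |= (uint8_t)(1u << i);
    return packed;
}

/* Read flag i back out of the packed byte. */
int get_flag(uint8_t packed, int i)
{
    assert(i >= 0 && i < 8);
    return (packed >> i) & 1;
}

/* Text form: one character ('0' or '1') per flag.
   out must hold 9 bytes. Returns the number of characters used. */
size_t flags_as_text(const int flags[8], char out[9])
{
    for (int i = 0; i < 8; ++i)
        out[i] = flags[i] ? '1' : '0';
    out[8] = '\0';
    return strlen(out);
}
```

The packed form occupies 1 byte and the text form 8 bytes for the same information, at the cost of needing bit operations to read a flag back.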

IV. C text read/write and binary read/write
Text-mode and binary-mode reading and writing in C are a programming-level matter tied to the specific operating system. The view that "a file read and written in text mode must be a text file, and a file read and written in binary mode must be a binary file" is therefore incorrect. Where the descriptions below do not name an operating system, Windows is implied.
The difference between text-mode and binary-mode reading and writing in C lies only in the handling of line breaks. In a text-mode write, every '\n' (0AH, line feed) is replaced with '\r\n' (0D0AH, carriage return plus line feed) before being written to the file. In a text-mode read, every '\r\n' is converted back to '\n' before being placed in the read buffer. Because of this conversion between '\n' and '\r\n', text mode takes additional time. In binary mode no conversion takes place: the data in the write buffer is written to the file as is.
In general, from the programming point of view, both text-mode and binary-mode reading and writing in C are exchanges between a buffer and the binary stream in the file; text mode merely adds the line-break conversion. Therefore, when the write buffer contains no '\n' (0AH), a text-mode write produces the same result as a binary-mode write. Likewise, when the file contains no '\r\n' (0D0AH), a text-mode read produces the same result as a binary-mode read.
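The line-break handling can be modeled with two small C functions. This is a simplified sketch of what text mode does on Windows, not the actual runtime code: it covers only the line-break conversion, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of a text-mode write: each '\n' becomes "\r\n".
   out must hold up to 2 * len + 1 bytes.
   Returns the number of bytes produced (not counting the NUL). */
size_t translate_on_write(const char *in, size_t len, char *out)
{
    assert(in != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < len; ++i) {
        if (in[i] == '\n')
            out[n++] = '\r';  /* insert CR before every LF */
        out[n++] = in[i];
    }
    out[n] = '\0';
    return n;
}

/* Model of a text-mode read: each "\r\n" becomes '\n'.
   out must hold len + 1 bytes.
   Returns the number of bytes produced (not counting the NUL). */
size_t translate_on_read(const char *in, size_t len, char *out)
{
    assert(in != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < len; ++i) {
        if (in[i] == '\r' && i + 1 < len && in[i + 1] == '\n')
            continue;  /* drop the CR; the LF is copied on the next pass */
        out[n++] = in[i];
    }
    out[n] = '\0';
    return n;
}
```

Writing the 4 bytes "a\nb\n" through the first function yields 6 bytes, and reading those 6 bytes through the second restores the original 4. Both loops inspect every byte, which is the conversion cost mentioned above.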
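To observe the effect with real file I/O, the following C sketch writes a string using a given fopen mode and then counts the bytes actually stored by reading the file back in binary mode. The function name and file name are hypothetical. The result for mode "w" with data containing '\n' depends on the platform: a larger count is expected on Windows, while on Linux the count should equal the string length.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write text to path using the given fopen mode ("w" or "wb"), then
   reopen the file in binary mode and count the bytes stored.
   Returns the byte count, or -1 on an I/O error. */
long stored_size(const char *path, const char *mode, const char *text)
{
    assert(path != NULL && mode != NULL && text != NULL);

    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    size_t len = strlen(text);
    size_t written = fwrite(text, 1, len, fp);
    if (fclose(fp) != 0 || written != len)
        return -1;

    fp = fopen(path, "rb");  /* binary read: no conversion applied */
    if (fp == NULL)
        return -1;
    long count = 0;
    while (fgetc(fp) != EOF)
        ++count;
    fclose(fp);
    return count;
}
```

With mode "wb" the count always equals the length of the string. With mode "w" and a string that contains no '\n', the count is also the same on any platform, which matches the observation above.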

V. Examples
5678 stored as ASCII text: 00110101 00110110 00110111 00111000 (four bytes)
5678 stored as a binary value: 00010110 00101110 (two bytes)
Seen as raw bytes, the only difference between a binary file and a text file is that the former contains some non-printable ASCII codes (for example, 0x01 is a non-printable ASCII code, whereas 0x61 is a printable one).
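The two storage formats of 5678 can be reproduced with a short C sketch. The function names are hypothetical, and the binary form is assumed to be stored most significant byte first, matching the bit patterns listed in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Text form: store the value as decimal ASCII digits.
   Returns the number of bytes used (not counting the NUL),
   or 0 if out is too small. */
size_t encode_as_text(uint16_t value, unsigned char *out, size_t out_size)
{
    assert(out != NULL);
    int n = snprintf((char *)out, out_size, "%u", (unsigned)value);
    if (n < 0 || (size_t)n >= out_size)
        return 0;
    return (size_t)n;
}

/* Binary form: store the value as 2 raw bytes,
   most significant byte first. Returns the number of bytes used. */
size_t encode_as_binary(uint16_t value, unsigned char out[2])
{
    assert(out != NULL);
    out[0] = (unsigned char)(value >> 8);
    out[1] = (unsigned char)(value & 0xFF);
    return 2;
}
```

For 5678 the text form produces the bytes 0x35 0x36 0x37 0x38 (the characters "5678"), and the binary form produces 0x16 0x2E, since 5678 equals 0x162E.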
