Differences between binary files and text files

Source: Internet
Author: User
Tags: control characters

1. Definition of a text file and a binary file

As you know, computer storage is physically binary, so the difference between a text file and a binary file is not physical but logical. The two differ only at the encoding level.

In short, a text file is a character-encoded file; common encodings include ASCII and Unicode. A binary file is a value-encoded file: depending on your application, you decide what each value means (a process that can be regarded as a custom encoding).

From the above it can be seen that text files mostly use fixed-length encodings (there are also variable-length ones such as UTF-8): within a given encoding, every character has the same width. ASCII is a 7-bit code normally stored in 8 bits, and Unicode text stored as UTF-16 generally takes 16 bits per character. Binary files can be thought of as variable-length encodings, because they encode values, and how many bits represent a value is entirely up to you. You may be familiar with BMP files. Take BMP as an example: it begins with a fixed-length file header, in which the first 2 bytes identify the file as BMP (the characters "BM"), the next 4 bytes record the length of the file, the next 4 bytes are reserved, and the following 4 bytes record the offset at which the pixel data starts. The information header that follows also mixes 2-byte and 4-byte fields. As you can see, the encoding is based on values of varying length, so BMP is a binary file.

2. Text file and binary file access

What happens when you open a file with a text tool? Take Notepad: it first reads the binary bitstream that the file physically consists of (as said earlier, storage is binary), then interprets that stream using the decoding you chose, and finally displays the result. Suppose the decoding you chose is ASCII (one ASCII character occupies 8 bits); Notepad then interprets the stream 8 bits at a time. For example, take the stream "01000001_01000010_01000011_01000100" (the underscores '_' were added by hand for readability). Decoded as ASCII, the first 8 bits '01000001' correspond to the character 'A', and the other three groups decode to "BCD" in the same way, so the stream is interpreted as "ABCD", and Notepad shows "ABCD" on the screen.

In fact, for anything in the world to communicate with anything else, there must be an established protocol, an established code. People communicate through writing: a Chinese character for "mother" (for example 娘) denotes the person who gave birth to you, and that is an established code. But note that in Japanese text the same character denotes the person you gave birth to (a daughter), so when a Chinese speaker A and a Japanese speaker B communicate using this character, a misunderstanding is quite normal. Opening a binary file with Notepad is a similar situation. Whatever file it opens, Notepad works according to an established character encoding (such as ASCII), so garbled output is inevitable when it opens a binary file: the encoding and the decoding do not match. For example, the stream '00000000_00000000_00000000_00000001' might represent the four-byte integer 1 in a binary file (assuming the most significant byte comes first), while Notepad interprets it as the four control characters "NUL_NUL_NUL_SOH".

Storing a text file is basically the reverse of reading it, so it is not described again. Binary file access works the same way as text file access; only the encoding and decoding differ, so it is not described again either.

3. Advantages and disadvantages of text files and binary files

Because text files and binary files differ only in their encoding, their advantages and disadvantages are those of the encodings themselves; a book on coding explains this more thoroughly. It is generally held that text file encodings are based on fixed-length characters, which makes decoding easier. Binary file encodings are variable-length, so they are more flexible and use storage more efficiently, though their portability depends on every reader agreeing on details such as field sizes and byte order, and they are harder to decode (each binary file format needs its own decoding method). As for space utilization, consider that a binary file can use even a single bit to carry a meaning (bit manipulation), whereas any meaning in a text file takes at least one character.

Under Windows, a text file is not necessarily stored as ASCII, because ASCII can represent only 128 characters. Open a TXT document and choose Save As: there is an Encoding option that lets you pick the storage format, and generally speaking UTF-8 has the best compatibility. A binary file, by contrast, uses the computer's native representation, so this kind of encoding compatibility question does not arise.

Many books also hold that text files are more readable but cost conversion time when stored (reading and writing involve encoding and decoding), while binary files are less readable but have no conversion time (values are written directly, with no encoding or decoding step). Readability here is from the software user's point of view: almost every text file can be browsed with an ordinary tool such as Notepad, so text files are highly readable, whereas reading or writing a particular binary file requires a decoder for that format, so binary files are poorly readable. Reading a BMP file, for example, requires image-viewing software.

The storage conversion time here should be understood from the programmer's point of view. Some operating systems, such as Windows, convert the newline character: '\n' is replaced with "\r\n" when writing, so during file reads and writes the system has to check, character by character, whether the current character is '\n' or "\r\n". No such conversion is needed on Linux. Of course, the conversion may still be needed when a text file is shared between two different operating systems (for example, between a Linux system and a Windows system).

4. Text-mode and binary-mode reading and writing in C

It should be said that text-mode and binary-mode reading and writing in C is a programming-level matter tied to the specific operating system, so the view that "a file read and written in text mode must be a text file, and a file read and written in binary mode must be a binary file" is wrong. Unless the operating system is stated explicitly, the discussion below refers to Windows.

In C, text-mode and binary-mode I/O differ only in how they handle carriage return and line feed. In a text-mode write, every '\n' (0AH, line feed) is replaced with "\r\n" (0D0AH, carriage return plus line feed) before being written to the file; in a text-mode read, every "\r\n" is converted back to '\n' before being placed in the read buffer. Text mode therefore pays for the '\n' to "\r\n" conversion. Binary-mode reads and writes perform no conversion at all: the data in the write buffer goes to the file unchanged.

In general, from a programming point of view, both text-mode and binary-mode I/O in C move a binary stream between a buffer and a file; text mode merely adds the newline conversion. So when the write buffer contains no '\n' (0AH), a text-mode write produces the same result as a binary-mode write, and likewise, when the file contains no "\r\n" (0DH 0AH), a text-mode read gives the same result as a binary-mode read.

Example

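For a file whose content is "Ab123\r\n" (ending in the bytes 0D 0A), read the first six bytes in either text mode or binary mode:

    #include <stdio.h>

    int main(void) {
        unsigned char a[6];
        /* "r" opens in text mode; use "rb" instead for binary mode. */
        FILE *pf1 = fopen("F:\\1.txt", "r");
        if (pf1 == NULL) return 1;
        for (int i = 0; i < 6; i++) {
            if (fread(&a[i], 1, 1, pf1) != 1) break;
            printf("%02x ", a[i]);
        }
        fclose(pf1); /* close the file */
        return 0;
    }

On Windows, the sixth byte printed is 0A in text mode and 0D in binary mode.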

In a text file, the number 5678 is stored as the ASCII codes of its four digits: 00110101 00110110 00110111 00111000 (four bytes).

In a binary file, 5678 can be stored as a 16-bit value: 00010110 00101110 (two bytes).

Seen at the byte level, what sets a binary file apart from a text file is that it usually contains byte values that are not printable ASCII characters. 0x01 (SOH) is a non-printable control code, while 0x61 ('a') is a printable character.
