Differences between text files and binary files

Source: Internet
Author: User

I. Definitions of text files and binary files

As we all know, computers physically store everything in binary, so the difference between a text file and a binary file is not physical but logical: the two differ only at the encoding level.

In short, a text file is a file encoded character by character; common encodings include ASCII and Unicode. A binary file is a file encoded value by value: the application decides what each value means, which can be seen as a custom encoding.

It follows that text files mostly use fixed-length encodings (though there are variable-length ones such as UTF-8): the encoding is character-based, and each character has a fixed width. An ASCII character is a 7-bit code normally stored in one 8-bit byte, and a Unicode character encoded as UTF-16 normally occupies 16 bits. A binary file can be viewed as using a variable-length encoding: because it encodes values, it is up to you how many bits represent each value. Take the familiar BMP format as an example. It begins with a fixed-length file header: the first 2 bytes identify the file as BMP, the next 4 bytes record the file length, the following 8 bytes hold reserved fields and the offset of the pixel data, and the next 4 bytes record the length of the BMP information header... We can see that the encoding is value-based, with values of different widths (2, 4, or 8 bytes), so BMP is a binary file.

II. Access to text files and binary files

What happens when a text tool opens a file? Taking Notepad as an example, it first reads the binary bit stream that physically makes up the file (as mentioned above, all storage is binary), then interprets that stream according to the decoding method you selected, and finally displays the result. Usually the decoding method is ASCII (one ASCII character per 8 bits), so the stream is interpreted 8 bits at a time. Take the file stream "01000001_01000010_01000011_01000100" (the underscores "_" were added by hand for readability). Decoded as ASCII, the first 8 bits "01000001" correspond to the character "A"; likewise, the other three 8-bit groups decode to "BCD". The whole stream is therefore interpreted as "ABCD", and Notepad displays "ABCD" on the screen.

In fact, anything in the world that communicates with something else needs an agreed protocol, an agreed encoding. People communicate through written words: the Chinese character for "mom" denotes the person who gave birth to you, and that is an agreed encoding. In Japanese, however, the same character may denote the person you gave birth to, so when Chinese student A uses the word "mom" with Japanese student B, a misunderstanding is only to be expected. Opening a binary file in Notepad is a similar situation. Whatever file Notepad opens, it works according to an established character encoding (such as ASCII), so when it opens a binary file, garbled characters are inevitable: the encoding and the decoding do not match. For example, the file stream "00000000_00000000_00000000_00000001" may represent the four-byte integer 1 (an int) in a binary file, but Notepad interprets it as the four control characters "NUL NUL NUL SOH".

Storing a text file is basically the reverse of reading it, so it is not described again. Access to binary files works in much the same way as access to text files; only the encoding and decoding methods differ.

III. Advantages and disadvantages of text files and binary files

Because text files and binary files differ only in encoding, their advantages and disadvantages are those of their encodings; look at the encodings and the trade-offs become clear. Generally speaking, text file encoding is character-based and fixed-length, so decoding is easy. Binary file encoding is variable-length, so it is flexible and uses storage more efficiently, but it is harder to decode (each binary file format needs its own decoding method). As for space utilization, consider that a binary file can use a single bit to carry a meaning (through bitwise operations), whereas any meaning in a text file costs at least one character.

On Windows, text files are not necessarily stored as ASCII, because ASCII can represent only 128 characters. Open a TXT document and choose Save As: there is an Encoding option that lets you choose the storage format. In general, UTF-8 offers better compatibility.

Many books also state that text files are more readable but need conversion time when stored (reading and writing require encoding and decoding), while binary files are less readable but need no conversion time (values are written directly, with no encoding or decoding). Readability here is from the software user's point of view: users can browse almost any text file with a general-purpose tool such as Notepad, so text files are readable, whereas reading or writing a particular binary file requires a decoder specific to that format, so binary files are poorly readable. To read a BMP file, for example, you need image-viewing software. The storage conversion time, by contrast, is from the programmer's point of view. Some operating systems, such as Windows, convert line endings (replacing "\n" with "\r\n"), so when reading or writing a file the system must check every character to see whether it is "\n" or "\r\n". Linux does not need this conversion. The conversion does become necessary when files are shared between two operating systems (for example, text files shared between Linux and Windows). How it is done is covered in the next article, on converting a Linux text file into a Windows text file.

IV. C text read/write and binary read/write

Strictly speaking, text-mode and binary-mode reading and writing in C is a programming-level matter that depends on the operating system. The view that a file read or written in text mode must be a text file, and a file read or written in binary mode must be a binary file, is incorrect. Unless stated otherwise, the operating system discussed below is Windows.

In C, text-mode and binary-mode reading and writing differ only in how they handle line breaks. In text mode, every "\n" (0AH, line feed) written is replaced with "\r\n" (0D0AH, carriage return plus line feed) before it reaches the file, and every "\r\n" read from the file is converted back to "\n" before it reaches the read buffer. Because text mode converts between "\n" and "\r\n", it costs conversion time. In binary mode there is no conversion: the data in the write buffer is written to the file as is.

In general, from a programming point of view, both text-mode and binary-mode I/O in C are an exchange between a buffer and the binary stream in the file; text mode merely adds the line-break conversion. So when the write buffer contains no "\n" (0AH), a text-mode write produces the same result as a binary-mode write, and when the file contains no "\r\n" (0DH0AH), a text-mode read produces the same result as a binary-mode read.

 

V. Examples


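Consider a file whose content is "Ab123\r\n" (bytes 41 62 31 32 33 0d 0a). The program below reads its first six bytes and prints them in hexadecimal; open the file with "r" for text mode or with "rb" for binary mode:

    #include <stdio.h>

    int main(void) {
        unsigned char a[6];
        FILE *pf1 = fopen("F:\\1.txt", "r");   /* text mode; use "rb" for binary mode */
        if (pf1 == NULL) return 1;             /* the file could not be opened */
        for (int i = 0; i < 6; i++) {
            if (fread(&a[i], 1, 1, pf1) == 1) printf("%02x ", a[i]);
        }
        fclose(pf1);                           /* close the file */
        return 0;
    }

The results are as follows:

41 62 31 32 33 0a (text mode) and 41 62 31 32 33 0d (binary mode)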

The number 5678 stored as text (ASCII): 00110101 00110110 00110111 00111000 (four bytes)

The number 5678 stored as a binary value: 00010110 00101110 (two bytes)

Seen at the byte level, the only difference between a binary file and a text file is that the former contains some byte values that are not printable ASCII characters. For example, 0x01 is a non-printable ASCII control code, while 0x61 is the printable ASCII character "a".
