More on Text Files and Binary Files

Source: Internet
Author: User
Tags: fread
There are many articles about text files and binary files on the Internet, but unfortunately most of them are rather superficial. Below I discuss text files and binary files from several perspectives, based on the information I have found.

I. Definitions of text files and binary files

As we all know, a computer physically stores everything in binary, so the difference between text files and binary files is not physical but logical. The two differ only at the encoding level.

In short, text files are files based on character encoding. Common encodings include ASCII and Unicode. Binary files are files based on value encoding: you decide what a value means according to the specific application (this can be regarded as a custom encoding).

From the above we can see that text files are basically fixed-length encodings: they are character-based, and each character is encoded with a fixed width. ASCII is stored as 8 bits per character, and Unicode (in its UTF-16 form) generally takes 16 bits. A binary file can be seen as a variable-length encoding, because it encodes values and it is up to you how many bits represent a value. You are probably familiar with BMP files, so take one as an example. It begins with a fixed-length file header: the first 2 bytes record that the file is in BMP format, the next 4 bytes record the file size, the next 4 bytes are reserved, and the next 4 bytes record the offset of the pixel data from the start of the file... As you can see, the encoding is value-based (values of differing lengths, such as 2 and 4 bytes), so BMP is a binary file.

II. Access to text files and binary files

What happens when a text tool opens a file? Take Notepad as an example. It first reads the binary bit stream that corresponds to the file physically (as mentioned above, all storage is binary), then decodes the stream according to the decoding method you select and displays the result. The decoding method is usually ASCII (one ASCII character occupies 8 bits), so Notepad interprets the stream 8 bits at a time. For example, take the file stream "01000001_01000010_01000011_01000100" (the underscores '_' were added by hand for readability). The first 8 bits, '01000001', decode to the character 'A' in ASCII. The other three groups of 8 bits decode to 'BCD' in the same way, so the stream is interpreted as "ABCD" and Notepad displays "ABCD" on the screen.

In fact, everything in the world needs an agreed protocol, an agreed encoding, in order to communicate with anything else. People communicate through written characters: the Chinese character for "mom" stands for the person who gave birth to you, and that is an agreed encoding. But note that the same character might mean the person you gave birth to in Japanese, so when Chinese user A and Japanese user B communicate using the word "mom", misunderstanding is only natural. Opening a binary file in Notepad is a similar situation. Whatever file Notepad opens, it works according to the selected character encoding (such as ASCII), so garbled output is all but inevitable when it opens a binary file: the decoding does not match the encoding. For example, the file stream '00000000_00000000_00000000_00000001' may correspond to the four-byte integer (int) 1 in a binary file, but in Notepad it is interpreted as the four control characters "NUL_NUL_NUL_SOH".

Storing a text file is basically the inverse of reading it, so it is not described here. Accessing a binary file is clearly similar to accessing a text file, except that the encoding/decoding method differs, so it is not described either.

  

III. Advantages and disadvantages of text files and binary files

Because text files and binary files differ only in encoding, their advantages and disadvantages are those of their encodings, which any textbook on encoding will spell out. It is generally held that text file encoding is character-based and fixed-length, so decoding is easier; binary file encoding is variable-length, so it is flexible and uses storage more efficiently, but is harder to decode (different binary file formats are decoded in different ways). On space utilization: a binary file can even use a single bit to carry a meaning (via bitwise operations), whereas in a text file the smallest unit of meaning is at least one character.

Many books also hold that text files are more readable but take conversion time when stored (reading and writing require encoding and decoding), whereas binary files are less readable but need no conversion time (no encoding or decoding on read/write; values are written directly). Readability here is from the software user's point of view: a general-purpose tool such as Notepad can browse almost all text files, so text files are considered readable, whereas reading or writing a particular binary file requires a decoder for that format, so binary files are considered less readable. For example, reading a BMP file requires image-viewing software. Storage conversion time, on the other hand, is from the programmer's point of view: some operating systems, such as Windows, convert line breaks (replacing '\n' with '\r\n'), so each character has to be checked on reading and writing to see whether it is '\n' or '\r\n'. Linux needs no such conversion. Of course, when two different operating systems share files (for example, Linux and Windows sharing text files), this conversion may come up again. How to perform it is covered in the next article, "Conversion between Linux text files and Windows text files". ^_^

IV. C text read/write and binary read/write

Strictly speaking, C's text-mode and binary-mode reading and writing is a programming-level matter that depends on the specific operating system. The view that "a file read and written in text mode must be a text file, and a file read and written in binary mode must be a binary file" is therefore incorrect. Unless the operating system is stated explicitly, everything below assumes Windows.

The difference between C's text-mode and binary-mode reading and writing shows up only in the handling of carriage returns and line feeds. In text-mode writing, each '\n' (0AH, line feed) is replaced with '\r\n' (0D0AH, carriage return plus line feed) before being written to the file; in text-mode reading, each '\r\n' is converted back to '\n' before being placed in the read buffer. Because of this '\n' <-> '\r\n' conversion, text mode incurs conversion time. Binary-mode reading and writing performs no conversion: data in the write buffer is written to the file as is.

In general, from the programming point of view, both text-mode and binary-mode reading and writing in C are interactions between a buffer and the binary stream in the file; text mode merely adds the carriage return/line feed conversion. So when the write buffer contains no line feed '\n' (0AH), text-mode writing produces the same result as binary-mode writing. Likewise, when the file contains no '\r\n' (0DH0AH), text-mode reading produces the same result as binary-mode reading.

Below is a small program to prove the previous point.

1. Write the following program. It writes the string "12\n3" to test1 in text mode and to test2 in binary mode, and then reads test1 back in text mode and test2 back in binary mode.

    #include <stdio.h>

    int main(void)
    {
        FILE *fp_text, *fp_binary;
        char write_buf[4] = {'1', '2', '\n', '3'};
        char read_buf_text[6], read_buf_binary[6];
        int read_count_text, read_count_binary;

        // the 't' (text mode) flag in the mode string is a Microsoft extension
        fp_text = fopen("test1", "wt+");
        fp_binary = fopen("test2", "wb+");
        if (fp_text == NULL || fp_binary == NULL) {
            perror("fopen");
            return 1;
        }

        fwrite(write_buf, 4, 1, fp_text);
        fwrite(write_buf, 4, 1, fp_binary);

        // fflush(fp_text);
        // fflush(fp_binary);
        fseek(fp_text, 0L, SEEK_SET);   // fseek also flushes the write buffer
        fseek(fp_binary, 0L, SEEK_SET);

        read_count_text = (int)fread(read_buf_text, sizeof(char), 5, fp_text);
        read_count_binary = (int)fread(read_buf_binary, sizeof(char), 5, fp_binary);

        // append '\0' so the buffers can be printed as strings
        read_buf_text[read_count_text] = '\0';
        read_buf_binary[read_count_binary] = '\0';

        printf("In Text Mode: read_count = %d, string = %s\n", read_count_text, read_buf_text);
        printf("In Binary Mode: read_count = %d, string = %s\n", read_count_binary, read_buf_binary);

        fclose(fp_text);
        fclose(fp_binary);
        return 0;
    }

2. The program was compiled and run in VC6.0. The result is as follows (note that "//" and the text to its right are annotations I added by hand):

In Text Mode: read_count = 4, string = 12
3 // test1 read in text mode: the characters read are the same as those originally written to test1

In Binary Mode: read_count = 4, string = 12
3 // test2 read in binary mode: the characters read are the same as those originally written to test2

3. Open test1 and test2 in Notepad. The result is as follows:

Test1 content:

12
3 // written in text mode, so the line break takes effect; see step 4 below

Test2 content:

123 // written in binary mode, so no line break appears (Notepad gives no display effect to control characters other than "\r\n"); see step 4 below

4. Open test1 and test2 in VC6.0 in Binary mode (other hex-viewing software works too). The result is as follows:

Test1 content:

31 32 0D 0A 33 // hexadecimal, 5 bytes, one more than the write buffer: a '\r' (0DH) was inserted before the '\n' (0AH)

Test2 content:

31 32 0A 33 // hexadecimal, 4 bytes, identical to the contents of the write buffer

 

5. Summary

From step 4 we can see that text-mode writing performs the '\n' -> '\r\n' conversion, while binary mode does not. From steps 2 and 4 together we can also deduce that text-mode reading performs the '\r\n' -> '\n' conversion, while binary mode does not. If you are interested, try reading test1 in binary mode or test2 in text mode.

6. Additional instructions

The preceding description applies only to Windows. On Linux, text-mode and binary-mode reading and writing are no different: there is no carriage return/line feed conversion. As a result, problems can arise when files are shared directly between Windows and Linux. The next article, "Conversion between Linux text files and Windows text files", will provide a C program for converting text files between Linux and Windows.

 
