What is a text file and what is a binary file?


If you aren't coming from a programming background, it might not yet be clear what a file really is. What's a binary file, and what makes something a text file?

In a word: files that consist exclusively of ASCII characters are known as text files, and all other files are known as binary files.

is a Microsoft Word document a text file or a binary file?


Is a spreadsheet a text file or a binary file?


Let's try to explain this.


A Little Disclaimer:


There is actually a lot more variation to this, but I'll focus on files on Unix/Linux systems, Windows, and Mac. Wikipedia has more to say about text files and binary files, so if this article does not satisfy your curiosity, check out those articles.


What is a file?

Basically, every file is just a series of bytes, one after the other; that is, numbers between 0 and 255. Depending on the storage device it is on, a file might be spread out over several areas of that device, but from our point of view each file is just a series of bytes. In general, every file is a binary file, but if the data in it contains only text (letters, numbers, and other symbols one would use in writing), and if it consists of lines, then we consider it a text file.
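If you'd like to see this for yourself, here is a minimal Python sketch that writes a tiny file and then reads it back in binary mode, so no text interpretation happens and you get the raw numbers. The helper name `read_bytes` is made up for this illustration.

```python
import os
import tempfile


def read_bytes(path):
    """Return the content of the file at `path` as a list of integers (0-255)."""
    with open(path, "rb") as f:  # "rb": read raw bytes, no text decoding
        return list(f.read())


# Write a small file, close it, then look at it as plain numbers.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Hi\n")
    path = tmp.name

numbers = read_bytes(path)
os.remove(path)
print(numbers)  # [72, 105, 10]
```

Three characters on screen ("H", "i", and an invisible newline) are just the three numbers 72, 105, and 10 in the file.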


What is a text file?

(I am going to simplify a bit for clarity and, for now, assume the files use only ASCII characters.)


When you open a text file with Notepad or some other simple text editor, you'll see several lines of text. The file itself, on the other hand, isn't broken up into such lines. It's a series of numbers, one after the other. When you open the file using Notepad, it translates each number to a visual representation. For example, if it encounters the number 65, it'll show the letter A. We can say that the character A is represented by the number 65 in ASCII.
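The translation an editor does can be sketched in a few lines of Python: `ord` gives the number behind a character, `chr` goes the other way, and decoding a series of bytes as ASCII turns each number into its character.

```python
# ASCII maps each character to a number and back.
code = ord("A")    # the number stored in the file for the letter A
letter = chr(65)   # what an editor shows when it reads the number 65

# A "file" of five numbers, translated character by character.
data = bytes([72, 101, 108, 108, 111])
text = data.decode("ascii")

print(code, letter, text)  # 65 A Hello
```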


The reason you see several lines in your editor is that some of the bytes in the file, called newlines, are actually instructions to the editor to go to the beginning of the next line. Thus the character that comes after the newline character in the file is displayed on the next line.
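Here is a rough Python sketch of what an editor does with those newline bytes. The function `split_into_lines` is hypothetical, and for simplicity it only treats the single byte 10 (line feed) as a newline; the operating-system differences are covered below.

```python
def split_into_lines(data: bytes) -> list[bytes]:
    """Split raw bytes into display lines at each newline byte (LF, 10)."""
    lines = [b""]
    for byte in data:          # iterating over bytes yields integers
        if byte == 10:         # newline: start a new display line
            lines.append(b"")
        else:                  # any other byte goes on the current line
            lines[-1] += bytes([byte])
    return lines


content = b"first line\nsecond line"
for line in split_into_lines(content):
    print(line.decode("ascii"))
```

The file holds one continuous run of bytes; the two lines only appear because the display logic starts a new row at byte 10.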


What is a newline?

Which number represents the newline?


Actually, none of the characters in the ASCII table is called a newline. When we say newline, we usually mean the "sign" that tells the computer to go to the beginning of the next row.


There are various sets of bytes that represent a newline, depending on the operating system. In the operating systems we care about in this article, a newline is always represented by one or both of the characters that the ASCII table calls LF, line feed (hex 0x0A or decimal 10), and CR, carriage return (hex 0x0D or decimal 13).


If you have ever seen a typewriter, you'll remember that in order to go to the next line, the user had to pull a handle towards the beginning of the line (usually to the left side of the paper). This movement first pushed the "carriage" to the beginning of the paper, and when it arrived at the beginning (and got stuck), further pulling of the handle turned the paper a bit so the carriage would point to the next line. Then the typewriter was ready for the next line.


So there were two operations: carriage return, pushing the "carriage" to the beginning of the paper, and line feed, going to the next line.


(Image: a typewriter, from Wikipedia)


Therefore, on MS Windows, a newline is represented by the two characters CR LF: a carriage return followed by a line feed.


On Unix/Linux systems and on Mac OS X, a newline is represented by a single LF (line feed).


Just for curiosity: Mac OS Classic (before OS X), the Commodore, the ZX Spectrum, and the TRS-80 all used a CR (carriage return) to represent a newline.
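To see the three conventions side by side, here is a small Python sketch. The helper `detect_newline` is hypothetical and only a heuristic: it reports the first convention it finds, so a file that mixes styles will be reported as CRLF if it contains even one CR LF pair.

```python
# The three newline conventions, as raw bytes.
NEWLINES = {
    "Windows (CR LF)": b"\r\n",
    "Unix/Linux, Mac OS X (LF)": b"\n",
    "Mac OS Classic (CR)": b"\r",
}


def detect_newline(data: bytes) -> str:
    """Guess which newline convention `data` uses (simple heuristic)."""
    if b"\r\n" in data:  # check CR LF first, since it contains both CR and LF
        return "CRLF"
    if b"\n" in data:
        return "LF"
    if b"\r" in data:
        return "CR"
    return "none"


for name, nl in NEWLINES.items():
    # Prints the convention, its byte values, and what the heuristic detects.
    print(name, list(nl), detect_newline(b"one" + nl + b"two"))

# Python's bytes.splitlines() understands all three conventions.
print(b"a\r\nb\nc\rd".splitlines())  # [b'a', b'b', b'c', b'd']
```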


(I learned programming on an HT-1080Z, which is a TRS-80 clone, and later switched to a ZX Spectrum.)


Wikipedia has even more to say about newline.


So if you have a file filled with printable ASCII characters, with a few "newlines" sprinkled in, then you have a text file.
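That definition can be turned into a check. The following is a minimal Python sketch under my own assumptions: the name `is_ascii_text` is hypothetical, printable ASCII is taken to be 32 (space) through 126 (~), and TAB (9) is allowed alongside CR and LF because many real text files contain tabs.

```python
def is_ascii_text(data: bytes) -> bool:
    """Return True if every byte is printable ASCII or a newline byte.

    Printable ASCII: 32 (space) to 126 (~). Newline bytes: LF (10), CR (13).
    Allowing TAB (9) is an extra assumption, since tabs are common in text.
    """
    allowed_controls = {9, 10, 13}  # TAB, LF, CR
    return all(32 <= b <= 126 or b in allowed_controls for b in data)


print(is_ascii_text(b"Hello\r\nWorld\n"))       # True
print(is_ascii_text(b"Hello\x00World"))          # False: contains a NUL byte
print(is_ascii_text("Hello ő".encode("utf-8")))  # False: bytes above 127
```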


Encoding

Of course, if you have looked at the ASCII table, you saw that only very few languages can be written with those letters, mostly the Latin-based languages. Even many of the languages that use those characters have a few extra letters. For example, Hungarian has a few more vowels: a á e é i í o ó ö ő u ú ü ű (the 5 from Latin and 9 extra, just for fun). You cannot represent them within the ASCII table.
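A quick Python sketch makes the count visible: ASCII only covers the numbers 0 to 127, so any vowel whose code point is 128 or above cannot be encoded in ASCII at all.

```python
vowels = "aáeéiíoóöőuúüű"  # the 14 Hungarian vowels

ascii_vowels = [v for v in vowels if ord(v) < 128]   # fit in ASCII
extra_vowels = [v for v in vowels if ord(v) >= 128]  # do not
print(len(ascii_vowels), len(extra_vowels))  # 5 9

first_bad = None
try:
    vowels.encode("ascii")
except UnicodeEncodeError as err:
    # err.start is the index of the first character ASCII cannot represent
    first_bad = err.object[err.start]
print("first non-ASCII vowel:", first_bad)  # á
```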


Therefore people have invented other encodings besides ASCII. Without going into the details, each encoding is a mapping between numbers that can be saved in a computer file and "drawings" that should be displayed on the screen.


Remember, even in ASCII you don't have a letter A in a file. You have the decimal number 65 saved, which your computer knows to display as the letter A. The computer would display the letter A if it thinks your file is in the ASCII encoding, or in any of the ASCII-based or ASCII-compatible encodings, such as Latin-1 or UTF-8.
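Here is a short Python sketch of what "ASCII-compatible" means in practice: the number 65 decodes to "A" in all three encodings, while a number above 127 only has a meaning in the encodings that extend ASCII.

```python
data = bytes([65])  # the single number 65 stored in a file
decoded = {enc: data.decode(enc) for enc in ("ascii", "latin-1", "utf-8")}
print(decoded)  # {'ascii': 'A', 'latin-1': 'A', 'utf-8': 'A'}

high = bytes([233])  # a number outside the ASCII range (0-127)
print(high.decode("latin-1"))  # é
try:
    high.decode("ascii")
except UnicodeDecodeError:
    print("233 is not an ASCII character")
```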


In ancient times, people used various encodings to represent their own languages, but these encodings overlapped. The same number was used to represent different characters (drawings) in the different languages. That did not allow mixing these languages in the same file, and if an application used the incorrect encoding to display a file, all you got was an unintelligible mix of characters from some other language.
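To illustrate the overlap, here is a Python sketch that reads one and the same byte (245) through three of those older single-byte encodings. The choice of encodings is mine; any pair of legacy code pages would show the same effect.

```python
data = bytes([0xF5])  # one byte, value 245
decoded = {enc: data.decode(enc) for enc in ("latin-1", "iso-8859-2", "cp1251")}
for enc, ch in decoded.items():
    print(enc, ch)
# latin-1    -> õ  (Western European)
# iso-8859-2 -> ő  (Central European, used for Hungarian)
# cp1251     -> х  (Cyrillic)
```

A Hungarian text saved in ISO-8859-2 and opened as Latin-1 would therefore show "õ" wherever the author typed "ő".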


You can still see this problem if a web page is written in one of these ancient-time encodings but the browser uses a different encoding to show it. One solution is to include a hint about the encoding in the HTML page, but at times people forget to do this.
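The sketch below shows, in Python, how a program might use such a hint. It is a deliberate simplification: the helper `sniff_meta_charset` is hypothetical, it only recognizes the `<meta charset="...">` form within the first 1024 bytes, and falling back to UTF-8 when no hint is found is my assumption. Real browsers follow a more elaborate detection algorithm.

```python
import re

# Matches <meta charset="..."> (quotes optional); other forms are ignored.
META_CHARSET = re.compile(rb"<meta\s+charset=[\"']?([\w-]+)", re.IGNORECASE)


def sniff_meta_charset(html: bytes, limit: int = 1024):
    """Return the declared charset in lowercase, or None if there is no hint."""
    match = META_CHARSET.search(html[:limit])
    return match.group(1).decode("ascii").lower() if match else None


page = b'<html><head><meta charset="ISO-8859-2"></head><body>\xf5</body></html>'
encoding = sniff_meta_charset(page) or "utf-8"  # fallback is an assumption
print(encoding, page.decode(encoding))  # the body byte shows up as ő
```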


The other good solution is to use the UTF-8 encoding, as it maps all the characters in the known universe. Unfortunately, Klingon is not yet included.


UTF-8 is one of the good ways to map Unicode characters to numbers. As Unicode currently includes more than 110,000 characters, they cannot be represented in one byte, which can hold only numbers between 0 and 255. So in UTF-8, every character is represented by 1 to 4 bytes. If you open a file written in the UTF-8 encoding with a tool that can only handle ASCII characters, you'll see lots of "garbage". That's because in UTF-8 some characters are represented by byte values (128 to 255) that are not ASCII characters at all, and a tool that assumes a single-byte encoding shows them as odd symbols or control characters.
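A small Python sketch shows both points: characters take 1 to 4 bytes in UTF-8, and reading those bytes with a single-byte encoding (Latin-1 here, my choice for the illustration) produces the kind of garbage described above.

```python
samples = ["A", "ő", "€", "😀"]
sizes = {ch: len(ch.encode("utf-8")) for ch in samples}
print(sizes)  # {'A': 1, 'ő': 2, '€': 3, '😀': 4}

# What a tool that assumes a single-byte encoding shows for UTF-8 "ő":
garbled = "ő".encode("utf-8").decode("latin-1")
print(repr(garbled))  # 'Å\x91': a letter followed by a control character
```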


So to the casual viewer, the file would be indistinguishable from a binary file.


Binary file

A binary file is basically any file that is not "line-oriented": any file where, besides the actual written characters and newlines, there are other symbols as well.
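In practice, programs often use a cheap heuristic instead of checking every byte. A common one, similar in spirit to what Git does, is to look for a NUL byte (0) near the start of the file, since plain text practically never contains one. The Python sketch below assumes that heuristic; the function name `looks_binary` and the 8000-byte sample size are assumptions for this illustration.

```python
def looks_binary(data: bytes, sample_size: int = 8000) -> bool:
    """Heuristic: data is 'binary' if its first `sample_size` bytes contain NUL."""
    return b"\x00" in data[:sample_size]


print(looks_binary(b"print('hello')\n"))        # False: ordinary text
print(looks_binary(b"\x7fELF\x02\x01\x01\x00"))  # True: start of a compiled program
```

Being a heuristic, it can be wrong: a NUL byte after the sample is missed, and text stored in UTF-16 (where "A" becomes the bytes 65, 0) is reported as binary.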


So a program written in the C programming language is a text file, but after you compile it, the compiled version is binary.


A Perl program is a text file, but if you pack it with PAR::Packer, it becomes a binary file.


A Microsoft Word file is a binary file because, besides the actual text, it also contains various characters representing font size and color.


An OpenOffice Writer file is binary, as it is a zipped set of XML files. The XML files inside are considered text files, even though they contain both the text and characters that represent font size and color.
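The "binary container, text inside" idea can be demonstrated with Python's standard `zipfile` module. This is a minimal sketch: the file name `content.xml` and its markup are made up, and a real OpenOffice document contains more files than this.

```python
import io
import zipfile

# Build a tiny zip archive in memory holding one made-up XML file.
xml_text = '<?xml version="1.0"?>\n<text style="bold">Hello</text>\n'
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("content.xml", xml_text)
archive_bytes = buffer.getvalue()

print(archive_bytes[:4])         # b'PK\x03\x04': the zip signature
print(b"\x00" in archive_bytes)  # True: the container holds NUL bytes, so it is binary

# Read the XML back out: it is ordinary text.
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    inner = archive.read("content.xml").decode("ascii")
print(inner == xml_text)         # True
```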


An HTML file is a text file too, even though it contains lots of characters that are invisible when viewed in a browser. It is considered a text file even though a newline, as described above, won't cause the next character to be displayed on the next line when viewed through a browser. It is considered a text file because all the "control characters" are themselves "printable characters" when viewed in a regular text editor.

