Differences between Windows and Linux line-ending conventions

Source: Internet
Author: User

Before computers appeared, there was a device called the Teletype Model 33, a teleprinter that could print 10 characters per second. It had one problem: after finishing a line, it needed 0.2 seconds to move the print head back to the start of the next line, which is exactly the time it takes to print two characters. If a new character arrived during those 0.2 seconds, it was lost.

So the developers found a way around the problem: append two characters to mark the end of each line. One is called "carriage return", which tells the machine to move the print head back to the left margin; the other is called "line feed", which tells the machine to advance the paper by one line.

This is the origin of "line feed" and "carriage return"; their English names hint at what each one does.

Later, when computers were invented, these two concepts carried over. At that time memory was expensive, and some scientists thought that adding two characters at the end of every line was too wasteful. This is where the conventions diverged.

On Unix systems, each line ends with only <line feed>, that is, "\n". On Windows, each line ends with <carriage return><line feed>, that is, "\r\n". On classic Mac OS (before Mac OS X), each line ends with <carriage return>, that is, "\r". A direct consequence is that a file from a Unix or Mac system, opened in Windows, may show all of its text on a single line, while a Windows file opened on Unix or Mac may show an extra ^M at the end of each line.
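As a rough illustration, the three conventions can be told apart in Python by counting bytes. This is only a sketch: the function name count_line_endings and the sample byte strings are assumptions made for the example, not part of the original experiment.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR terminators in raw bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF not preceded by CR
    cr = data.count(b"\r") - crlf   # CR not followed by LF
    return {"CRLF": crlf, "LF": lf, "CR": cr}


# Hypothetical two-line samples, one per convention.
samples = {
    "unix": b"one\ntwo\n",
    "windows": b"one\r\ntwo\r\n",
    "classic mac": b"one\rtwo\r",
}

for name, sample in samples.items():
    print(name, count_line_endings(sample))
```

To inspect a real file, read it in binary mode (open(path, "rb")) so that Python does not translate the line endings before they are counted.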

(Above content reproduced from Nanyi blog)

As a result, a text file created under Linux may appear as one long line on Windows, because Windows tools that expect CRLF do not treat a bare LF as a line break.
A text file created under Windows may show a ^M at the end of every line under Linux. This ^M is a single control character, CR (carriage return); on the command line it is typed as Ctrl+V followed by Ctrl+M.

At this point some people may ask: why do the text files I create on Windows look normal on Linux?

For example, I create a text file a.txt under Windows, copy it to my Linux machine, and open it with Vim:


You can see that the file displays normally, with no ^M at the end of the lines. This is because Vim detects the line endings when it opens a file. If every line ending in the text is ^M$ (CRLF, the Windows line ending), Vim automatically treats the file as DOS format and hides the ^M at the end of each line, so the text displays normally.

Note the two flags that the arrows in the screenshot point to at the bottom of the Vim window: [noeol] and [dos]. I will explain the second one first. [dos] means Vim recognized that every line of the text ends with ^M$, so it displays the file in DOS format. That is why the text looks normal.

Then why does a file created or edited under Windows sometimes show ^M under Linux, even though Vim detects line endings automatically? Because Vim checks the line ending of every line, and as soon as one line does not end in the Windows format, Vim displays the file in Unix format, where the line ending is just $. Any remaining ^M is then shown as part of the line.
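The rule described here can be modeled roughly in Python. This is a simplified sketch of the behavior, not Vim's actual implementation (which also depends on the 'fileformats' option); guess_fileformat and missing_final_newline are hypothetical names chosen for the example.

```python
def guess_fileformat(data: bytes) -> str:
    """Return "dos" if every LF-terminated line ends in CRLF, else "unix"."""
    pieces = data.split(b"\n")
    terminated = pieces[:-1]  # the last piece has no LF after it
    if terminated and all(line.endswith(b"\r") for line in terminated):
        return "dos"
    return "unix"


def missing_final_newline(data: bytes) -> bool:
    """True when the last line is not terminated by LF (Vim's [noeol] case)."""
    return bool(data) and not data.endswith(b"\n")


all_crlf = b"one\r\ntwo\r\nthree\r\nfour"   # hypothetical Windows file
mixed = b"one\r\ntwo\nthree\r\nfour"        # one line converted to LF

print(guess_fileformat(all_crlf), missing_final_newline(all_crlf))
print(guess_fileformat(mixed), missing_final_newline(mixed))
```

A single LF-only line is enough to flip the result from "dos" to "unix", which matches the behavior described above.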

Here I use cat -A to display the special characters in the file:

The file has four lines in total. You can see that the line endings are ^M$ (marked by the arrow), which is why Vim uses the [dos] file format to display this text.
You can also see that the last line of the file has no line ending. This is where the [noeol] flag in the first screenshot comes from: the text written under Windows has no line ending after its last line, whereas text created under Linux has a newline character at the end of every line, including the last. So Vim reports "noeol" (no end of line) to tell us that the last line of this file is not terminated by a newline.
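To make the notation concrete, here is a minimal Python imitation of what cat -A prints for the characters discussed in this article. It is a sketch that handles only CR, LF, and TAB in ASCII text (the real command covers more cases); show_all is a hypothetical name, and the four-line sample is made up to resemble a.txt.

```python
def show_all(data: bytes) -> str:
    """Render CR as ^M, TAB as ^I, and mark each LF with a trailing $."""
    out = []
    for byte in data:
        if byte == 0x0A:      # LF: show "$" and then break the line
            out.append("$\n")
        elif byte == 0x0D:    # CR
            out.append("^M")
        elif byte == 0x09:    # TAB
            out.append("^I")
        else:                 # assumes plain ASCII content
            out.append(chr(byte))
    return "".join(out)


# Hypothetical stand-in for a.txt: CRLF endings, no ending on the last line.
sample = b"Hello\r\nMy name is Liao\r\nNice to meet you\r\nBye"
print(show_all(sample))
```

The first three lines come out ending in ^M$, and the last line has neither ^M nor $, which is the pattern described above.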

Use wc -l to count the lines in this file:

The result is 3, one less than the actual number of lines, because the last line of the file does not end with a newline character.

Now I use Vim under Linux to create a new file with the same content as a.txt, and view it with cat -A:

You can see that in the text created under Linux every line ends with a newline character, including the last one. Counting the lines with wc -l:

This time the count is correct.
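The difference between the two counts follows from the fact that wc -l counts newline characters rather than lines of text. A small Python sketch of the same counting, with hypothetical sample contents standing in for the two files:

```python
def count_newlines(data: bytes) -> int:
    """Count LF bytes, which is what wc -l reports."""
    return data.count(b"\n")


# Four lines of text each; only the Linux-style file ends with a newline.
windows_style = b"line 1\r\nline 2\r\nline 3\r\nline 4"
linux_style = b"line 1\nline 2\nline 3\nline 4\n"

print(count_newlines(windows_style))  # 3
print(count_newlines(linux_style))    # 4
```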

Next, use sed on the a.txt created under Windows to remove the ^M from a ^M$ line ending, turning it into the Linux line ending $:

Here I replaced the line ending of the second line with the Linux format. Note that the ^M in the sed command is not typed as two literal characters on the command line; it is entered with Ctrl+V followed by Ctrl+M. Then open the file with Vim:


Because the line ending of the second line is no longer ^M$, Vim does not display the file in DOS format. The [dos] flag is gone from the bottom of the Vim window, which shows that Vim is displaying the file in Unix format, so the lines that still end in CRLF show an extra ^M.
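The same edit can be imitated in Python to see exactly which byte changes. This is a sketch under stated assumptions: strip_cr is a hypothetical helper, not an existing tool, and the four-line sample only stands in for a.txt.

```python
from typing import Optional


def strip_cr(data: bytes, line_number: Optional[int] = None) -> bytes:
    """Remove the CR before LF on one line (1-based), or on every line if None."""
    lines = data.split(b"\n")
    last = len(lines) - 1
    for index, line in enumerate(lines):
        is_terminated = index < last  # the final piece has no LF after it
        wanted = line_number is None or index == line_number - 1
        if is_terminated and wanted and line.endswith(b"\r"):
            lines[index] = line[:-1]
    return b"\n".join(lines)


original = b"one\r\ntwo\r\nthree\r\nfour"
print(strip_cr(original, 2))  # only line 2 becomes LF-terminated
print(strip_cr(original))     # every CRLF becomes LF
```

Converting only one line produces the mixed file from this experiment; calling the helper without a line number converts every line, which leaves the file with consistent Unix line endings.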

Addendum: how sed handles Windows line endings

From the above we know that when every line of a text file uses Windows line endings, Vim displays it in [dos] mode and automatically hides the ^M at the end of each line.

When I used sed to process some files, I found that a file that looked normal would, after being processed by sed, show annoying ^M characters when opened again. So how does sed handle text with Windows line endings?

First, create a text file under Windows and copy it to my Linux machine. Display the special characters with cat -A:

Here the last line has no line ending, and the line endings of the other lines are ^M$. Now process the file with sed, appending some content to the second line, and view it with cat -A again:

Here I use .* to match the entire content of the second line; & stands for everything that was matched, and I added some content after &. The cat -A output shows that when sed performs the substitution and the pattern matches the whole line, the match covers everything except the newline character $ (the Linux line ending), even when the line ending of the text is ^M$ (the Windows line ending).

So when sed processes the second line, the ^M is matched by my regular expression .* as ordinary text, while the $ is not matched and always stays at the end of the line as the line ending. The ^M and the $ are thereby split apart, and the line ending of this line effectively becomes the Linux-format $. Opening the file in Vim looks like this:

Because the file now mixes Linux and Windows line endings, Vim displays it in Unix format and shows the ^M characters. The ^M on the second line was captured by the sed match, so it is no longer at the end of the line. The [noeol] at the bottom of Vim appears because the last line, written under Windows, has no line ending, and therefore no ^M either.

The conclusion is that sed treats the ^M in a file as ordinary content. If you use sed to process text files created under Windows, annoying ^M characters are likely to appear afterwards. How other text-processing tools handle Windows line endings needs further investigation.
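The effect can be reproduced in Python with a sketch that treats LF as the only line separator, as described above. The exact sed command is not shown in the text, so this assumes a substitution that appends " Hello" to the whole match on line 2; append_to_line is a hypothetical name and the sample lines are made up.

```python
import re


def append_to_line(data: bytes, line_number: int, suffix: bytes) -> bytes:
    """Append suffix to whatever ".*" matches on one line (1-based)."""
    lines = data.split(b"\n")  # LF is the only separator; CR stays in the line
    index = line_number - 1
    # "." matches CR but not LF, so a trailing CR is part of the match.
    lines[index] = re.sub(
        rb".*", lambda m: m.group(0) + suffix, lines[index], count=1
    )
    return b"\n".join(lines)


before = b"Hello\r\nMy name is Liao\r\nBye"
after = append_to_line(before, 2, b" Hello")
print(after)
```

In the result, the CR on line 2 sits between "Liao" and " Hello" instead of directly before the LF, so that line no longer ends in CRLF.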

In addition: how cat displays text with Windows and Linux line endings

Taking the same file as above, after the sed substitution, view its contents with cat -A:

Now view the file with plain cat, without displaying special characters. The output is as follows:

The second line is not what I expected. I clearly appended the string "Hello" at the end of the line, so why did it jump to the beginning of the line and overwrite the original characters?

The actual content of the second line is this (the special characters were highlighted in red):

My name is Liao^M Hello$

As I said earlier, ^M is a special character (note that it is one special character, not two) that represents a carriage return. When displaying the second line, cat reads the characters from the start of the line and writes them to the screen unchanged. When the ^M arrives, the terminal interprets it in its original, typewriter-era sense: a carriage return moves the print head back to the beginning of the line, whereas a line feed moves to the next line. So the cursor goes back to the beginning of the line, and the characters that follow are printed from there, overwriting the characters that were already at the start of the line. When the Linux line ending at the end of the line is reached, the output moves to the next line and continues from there. This is what produces the strange display of the second line.

Conclusion: in normal output mode, cat writes the ^M character through unchanged and the terminal interprets it as a carriage return, that is, a return to the beginning of the current line. The $ (LF) is interpreted as carriage return plus line feed, that is, output continues at the start of the next line. So if a line of text contains a ^M character in the middle, the file will look unexpected when displayed with cat.

For example, I manually create a line of text with a carriage return character ^M in the middle:

Then display the file with cat:

You can see that after the ^M is output, the cursor returns to the beginning of the line, so the characters after the ^M are printed from the beginning of the line, overwriting the original content.
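The overwriting can be simulated with a few lines of Python. This is a simplified model of how a terminal paints a single line (it ignores tabs, line wrapping, and other control characters); render_line is a hypothetical name.

```python
def render_line(text: str) -> str:
    """Simulate how a terminal paints one line that contains carriage returns."""
    cells = []   # characters currently visible on the line
    column = 0   # cursor position
    for ch in text:
        if ch == "\r":
            column = 0          # carriage return: back to the start of the line
            continue
        if column < len(cells):
            cells[column] = ch  # overwrite what is already on screen
        else:
            cells.append(ch)
        column += 1
    return "".join(cells)


print(render_line("My name is Liao\r Hello"))  # " Helloe is Liao"
print(render_line("My name is Liao\r"))        # "My name is Liao"
```

The second call shows why a ^M at the very end of a line is harmless in plain cat output: the cursor returns to the start of the line, but nothing is printed afterwards.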

Summary:

      • The line ending of files created under Windows is ^M$, and in the files examined here the last line ends without a line ending
      • Files created under Linux end every line with a newline character, including the last line
      • When Vim opens a file, if all of its line endings are in DOS format (^M$), Vim automatically displays the file in DOS format; otherwise it displays the text in the default Unix format, in which case ^M may appear at the end of lines
      • wc -l counts newline characters, so for a file created under Windows whose last line has no line ending, wc -l reports one line fewer than the file actually has
      • A file created under Windows may display normally under Linux, but text-processing commands such as sed may change some of its line endings, causing it to display abnormally
      • When sed processes a file, the ^M of a Windows line ending is treated as file content; sed regards only $ as the line ending, so processing may leave the file with inconsistent line endings
      • When cat outputs a file without displaying special characters, ^M acts as a carriage return and $ as carriage return plus line feed. So if a file mixes Linux and Windows line endings, Vim shows ^M at the end of some lines while cat looks normal (because a ^M at the end of a line has no visible effect). However, if a ^M appears in the middle of a line, the cat output is garbled.
