Have you ever wondered why a carriage return is called a carriage return? What carriage is being returned? And do you know the difference between a carriage return and a line feed?
Foreword
Let's start with a story. The following code was written by a colleague to process a simple file and write the result to another file. Can you spot a problem?
string content;
using (StreamReader sr = new StreamReader("a.txt"))
{
    content = sr.ReadToEnd();
}
string[] rows = content.Split('\n');
string result = string.Empty;
using (StreamWriter sw = new StreamWriter("b.txt"))
{
    foreach (var row in rows)
    {
        // process row
        sw.WriteLine(row);
    }
}
In some editors, b.txt shows an empty line after every line of text, even though the code clearly writes each row only once. Where do the empty lines come from?
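The problem is content.Split('\n'). On Windows, each line of the file ends with two characters, \r\n, rather than just \n. Splitting on \n alone leaves a \r at the end of every row, and on Windows WriteLine then appends its own \r\n, so each output line ends with \r\r\n, which some editors display as an extra empty line. Open the file in Notepad++ with non-printing characters shown and you can see it at a glance.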
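One way to avoid the stray \r is to split on every line-break convention instead of on '\n' alone. The sketch below is only an illustration, not the fix the colleague actually applied; LineSplitter and SplitLines are hypothetical names introduced here, and the sample string stands in for the contents of a.txt.

```csharp
using System;

static class LineSplitter
{
    // Hypothetical helper: splits text on CR+LF, LF, or CR.
    // "\r\n" is listed first so that a CR+LF pair counts as one
    // line break rather than two.
    public static string[] SplitLines(string content)
    {
        return content.Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None);
    }

    static void Main()
    {
        string content = "first\r\nsecond\r\nthird"; // stands in for a.txt

        // The original approach: every row except the last keeps its '\r'.
        string[] naive = content.Split('\n');
        Console.WriteLine(naive[0].EndsWith("\r", StringComparison.Ordinal)); // True

        // Splitting on every convention leaves clean rows.
        string[] rows = SplitLines(content);
        Console.WriteLine(rows[0].EndsWith("\r", StringComparison.Ordinal)); // False
        Console.WriteLine(rows.Length); // 3
    }
}
```

Reading the file with StreamReader.ReadLine or File.ReadLines is another option, since both treat \r\n, \n, and \r as line terminators.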
Carriage return and line feed, past and present
Back to the opening question. "Line feed" is easy enough to understand, but "carriage return" is a strange name. And why does starting a new line need two special characters at all?
I remember reading an interview a few days ago with Zhang moment, a vice president at guangsu Anzhen Venture Capital and a former Google employee:
Google believes that computer science, unlike basic sciences such as physics and chemistry, is built entirely on rational design. The principles behind it are understandable to humans, and you have to understand them in order to innovate. For example, you need to know why TCP/IP is designed the way it is, not just what it is. In Google's view, people with a solid foundation can do well in any environment, whereas once the environment changes, previous experience may no longer be useful. Only two of Google's earliest employees came from computer science.
Designs in computing often exist for specific or historical reasons. So why does this strange carriage return exist?
Starting with the typewriter
On the earliest typewriters, starting a new line at the end of a line took two operations: first push the carriage, which holds the paper, back to the beginning of the line, then operate the line feed to advance the paper by one line. This design influenced the later teleprinter, and the teleprinter in turn indirectly influenced parts of the first computer systems, because early computers had to be compatible with teleprinters.
The part of the typewriter that holds the paper is called the carriage, so moving it back to the beginning of the line is called a carriage return. The Chinese term, 回车, literally means "return the car"; the "car" is simply that part of the typewriter. Later typewriters merged the two operations into a single control.
ASCII code design
Everyone knows that \r and \n are part of ASCII. ASCII was developed simultaneously by ISO and ASA (the predecessor of ANSI). The ISO draft standard supported either CR+LF or LF alone as the new-line marker, while the ASA draft standard supported only CR+LF.
CR+LF were used together for compatibility with the teleprinters of the time. Like the old typewriters, a teleprinter needed two commands to complete a line break. As a result, many later systems adopted CR+LF as the new-line marker.
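For reference, CR and LF are the ASCII control characters with codes 13 (0x0D) and 10 (0x0A), written '\r' and '\n' in C#. The short sketch below prints those codes and makes the characters visible inside a string; ControlChars and Reveal are hypothetical names used only for this illustration.

```csharp
using System;

static class ControlChars
{
    // Hypothetical helper: replaces CR and LF with visible markers
    // so the line-break convention of a string can be read directly.
    public static string Reveal(string text)
    {
        return text.Replace("\r", "<CR>").Replace("\n", "<LF>");
    }

    static void Main()
    {
        Console.WriteLine((int)'\r'); // 13
        Console.WriteLine((int)'\n'); // 10
        Console.WriteLine(Reveal("line one\r\nline two\n")); // line one<CR><LF>line two<LF>
    }
}
```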
Chaotic status quo
Although many systems adopted this convention, many others use different line breaks.
Windows: CR+LF
UNIX and Unix-like systems (Linux, OS X): LF
Older versions of Mac OS: CR
Most text-based Internet protocols (HTTP, FTP, IRC, SMTP) use the ASCII CR+LF sequence as the line break.
This causes a problem: when a file is copied directly from one system to another, its line breaks must be converted before it can be used correctly.
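The conversion can also be done in code. A minimal sketch, assuming the whole file fits in memory as a string: first collapse every convention to LF, then expand LF to the target convention. LineEndings and Normalize are hypothetical names, not part of any library.

```csharp
using System;

static class LineEndings
{
    // Hypothetical helper: rewrites every line break in text as target.
    public static string Normalize(string text, string target)
    {
        // Collapse CR+LF first so the pair is not counted twice,
        // then lone CR, leaving LF as the only line break.
        string unified = text.Replace("\r\n", "\n").Replace("\r", "\n");
        return target == "\n" ? unified : unified.Replace("\n", target);
    }

    static void Main()
    {
        string mixed = "windows\r\nunix\nold mac\rend";

        string unix = Normalize(mixed, "\n");
        Console.WriteLine(unix.Contains("\r")); // False

        string windows = Normalize(mixed, "\r\n");
        Console.WriteLine(windows.Split(new[] { "\r\n" }, StringSplitOptions.None).Length); // 4
    }
}
```

To convert to the convention of the machine running the code, Environment.NewLine can be passed as the target.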