At first I had only a vague idea of this concept and did not care much about it. After searching, I found that the topic is surprisingly interesting: it has a good story behind it, it has plenty of subtleties, and it brought back memories of having dealt with this very problem before.
1 Basic Concepts
| Name | Escape sequence | Key name | Control character | Original meaning |
| Line feed | \n | NewLine | LF (Line Feed) | The cursor moves straight down one line (not necessarily to the beginning of the line) |
| Carriage return | \r | Return | CR (Carriage Return) | The cursor moves back to the beginning of the current line |
The basic concepts are shown in the table above.
2 Origin
Why are there two of these characters? There is an interesting story behind it. Before computers appeared, there was a device called a teletypewriter (the Teletype Model 33) that could print 10 characters per second. It had one problem: after finishing a line, it needed 0.2 seconds to move to the next line, which is exactly the time it takes to print two characters. If a new character arrived during those 0.2 seconds, it was lost.
So the developers came up with a fix: append two end-of-line characters after each line. One is called "carriage return" and tells the machine to position the print head at the left margin; the other is called "line feed" and tells the machine to advance the paper by one line.
I have not verified the details of this story, but it is true that the line feed character was used to advance the paper by one line, while the carriage return was used to move the print head back to the left edge, the beginning of the line. If the current position is the 6th character of line 5, a carriage return changes it to the 1st character of line 5, and a line feed changes it to the 6th character of line 6. So when printing on a typewriter, a carriage return without a line feed lets you print over the same line again.
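The position arithmetic above can be sketched in a few lines of Python. This is only an illustration of the teletype-style rules as described here; the function name `move_cursor` and the 1-based (row, column) convention are assumptions made for the example.

```python
def move_cursor(row, col, text):
    """Track a 1-based (row, col) print position through text.

    Assumed teletype-style rules: CR resets the column to 1, LF advances
    the row and keeps the column, any other character advances the column.
    """
    for ch in text:
        if ch == "\r":
            col = 1
        elif ch == "\n":
            row += 1
        else:
            col += 1
    return row, col


print(move_cursor(5, 6, "\r"))    # (5, 1)
print(move_cursor(5, 6, "\n"))    # (6, 6)
print(move_cursor(5, 6, "\r\n"))  # (6, 1)
```

Only the combination of both characters lands at the first character of the next line, which is why the teletype needed the pair.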
3 The Current Situation
Later, when computers were invented, these two concepts were carried over to them. At that time memory was expensive, and some scientists thought that adding two characters at the end of every line was too wasteful. So a disagreement arose:
\n: end-of-line terminator on Unix and Mac OS X
\r: end-of-line terminator on classic Mac OS (before OS X)
\r\n: end-of-line terminator on Windows
That is, on a computer a single symbol is enough to indicate that the line has ended and that the cursor moves to the beginning of the next line. Most systems use the line feed \n directly, that is, the way the paper advances by one line; only Windows keeps the tradition and uses two symbols, \r\n, to indicate moving to the beginning of the next line.
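To find out which convention a given file follows, one can count the terminators in its raw bytes. The following is a minimal sketch; the function name `count_line_endings` and the dictionary keys are made up for this example.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone LF bytes and lone CR bytes in raw data."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair contains one CR and one LF, so subtract the pairs
    # to get the characters that stand alone.
    lone_cr = data.count(b"\r") - crlf
    lone_lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "LF": lone_lf, "CR": lone_cr}


print(count_line_endings(b"one\r\ntwo\r\n"))  # {'CRLF': 2, 'LF': 0, 'CR': 0}
print(count_line_endings(b"one\ntwo\n"))      # {'CRLF': 0, 'LF': 2, 'CR': 0}
```

The file has to be read in binary mode (`open(path, "rb")`) for the counts to reflect what is really stored on disk.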
The Enter key is defined by the operating system: on Unix it produces just \n, on Windows \r\n; either way, its job is to start a new line. In addition, the newline here is effectively the same thing as the paragraph mark ^p in Word.
4 Here's the Problem
So here is the problem: since \n alone fully represents "line feed plus carriage return" on Unix, what does \r mean there? I will not pursue that for now.
And since \r\n represents a line feed plus carriage return on Windows, do a standalone \n and a standalone \r still keep their original meanings of line feed and carriage return?
This is a question we can investigate.
How to investigate it is itself a problem, one I had not thought about before and that took me half a day to understand.
If I type \n directly into a .txt document on Windows, it is read as two literal characters, because there is no escape processing. So my plan is to write the characters to the document from a program. The written text contains no implicit line breaks; every line feed and carriage return is explicit. This lets us write ordinary characters together with a standalone \r, a standalone \n, and \n\r to a .txt document on Windows and see what each of them means there.
I then ran the experiment, thought hard about the results, and finally found where the problem was.
The experimental plan seems clear and workable, but it rests on an assumption: when a program writes characters to a document, we assume it takes the string 'This is the first line \n\r this is the second line \n This is a long third line \r This is line fourth' literally, first encoding it into binary (0s and 1s) and then writing those bits to the text document; and that when we open the document, the editor decodes the bits with GBK and displays them.
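One way to make that assumption actually hold is to write the file in binary mode, where no newline translation can take place. The sketch below is not the program used in the experiment; the helper names `write_exact` and `read_hex`, the file name, and the shortened sample string are all assumptions for illustration.

```python
import os
import tempfile


def write_exact(path, text, encoding="gbk"):
    """Encode text and write the bytes in binary mode (no translation)."""
    with open(path, "wb") as f:
        f.write(text.encode(encoding))


def read_hex(path):
    """Return the raw bytes of a file as space-separated hex pairs."""
    with open(path, "rb") as f:
        return f.read().hex(" ")


sample = "A\n\rB\nC\rD"
path = os.path.join(tempfile.mkdtemp(), "experiment.txt")
write_exact(path, sample)
print(read_hex(path))  # 41 0a 0d 42 0a 43 0d 44
```

The hex dump shows every 0a (\n) and 0d (\r) exactly where the string put them, so whatever an editor displays afterwards is the editor's interpretation and not a side effect of the write.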
5 Experimental Results and Analysis
But the experimental results tell us that this may not be the case. My results were:
1) \r: not recognized in a .txt file on Windows 7 (as if it were not there)
2) \n: recognized in a .txt file on Windows 7; the result is a line feed plus carriage return
3) \n\r (and \r\n): recognized in a .txt file on Windows 7; the result is a line feed plus carriage return (easy to understand: with \r effectively ignored, only \n is carried out)
This is not what I expected. I then found a possible explanation: in a Windows environment, when a program writes a text file, each '\n' in the program is translated into the two characters '\r', '\n'; conversely, when a program reads a text file, an adjacent '\r', '\n' pair in the file is merged into a single '\n' (this happens during the translation step).
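This translation can be reproduced on any platform with the `newline` parameter of Python's built-in `open`. The snippet below is a minimal sketch that forces Windows-style translation on write and then compares the bytes on disk with the text a program reads back; the file name is arbitrary.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# newline="\r\n" makes text mode translate every "\n" written into "\r\n",
# which mimics what text mode does on Windows.
with open(path, "w", newline="\r\n") as f:
    f.write("first\nsecond\n")

with open(path, "rb") as f:
    raw = f.read()   # the bytes actually stored on disk

with open(path, "r") as f:
    text = f.read()  # default newline=None merges "\r\n" back into "\n"

print(raw)                        # b'first\r\nsecond\r\n'
print(text == "first\nsecond\n")  # True
```

The program only ever sees '\n', while the file holds '\r\n'. This would explain why a standalone '\n' written from a program shows up as a complete line break in a Windows text file.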
Next I passed the characters directly from a program and watched the Python console output and a wxPython text box to see how each handles '\r', '\n', and '\n\r'. I must stress that these results, including the ones above, are part of an exploratory process; Section 6 gives more reliable reference points.
5.2 Python Console Output
1) \r: recognized. The cursor moves to the beginning of the current line. If an ordinary character comes next, the characters already on the line are cleared and display restarts from that character; if \n comes next, the text on the line is kept and the cursor moves to the beginning of the next line (the same result as \n on its own)
2) \n: recognized; the result is a line feed plus carriage return
3) \n\r: recognized; the result is a line feed plus carriage return (easy to understand: after \n the cursor is already at the beginning of the new line, so the following \r changes nothing)
4) \r\n: recognized; the result is a line feed plus carriage return. This is the case mentioned in 1) where \r is followed by \n: no characters are deleted, and then the line advances
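The console behaviour of \r is what makes in-place status lines possible. Here is a small sketch; the function name `show_progress` is made up, and how the redraw looks on screen depends on the console being used.

```python
import io
import sys


def show_progress(steps, out=sys.stdout):
    """Redraw one status line in place by starting each update with CR."""
    for i in range(1, steps + 1):
        out.write(f"\rstep {i}/{steps}")
        out.flush()
    out.write("\n")  # finish with a real line break


# Capture the characters that are sent, to see them without a console.
buf = io.StringIO()
show_progress(3, out=buf)
print(repr(buf.getvalue()))  # '\rstep 1/3\rstep 2/3\rstep 3/3\n'

show_progress(3)  # on a console, only the last status remains visible
```

Each update begins with \r, so the cursor returns to the start of the line and the new text replaces the old one; the final \n moves on to the next line.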
5.3 wxPython Text Box (Single-Line Text Box)
1) \r: recognized; an invisible space appears with the cursor after it, and it can be deleted with Backspace
2) \n: recognized; an invisible space appears with the cursor after it, and it can be deleted with Backspace
3) \n\r (and \r\n): recognized; two invisible spaces appear with the cursor after them, and they can be deleted with Backspace
5.4 wxPython Text Box (Multi-Line Text Box)
1) \r: recognized; the result is a line feed plus carriage return
2) \n: recognized; the result is a line feed plus carriage return
3) \n\r: recognized; the result is two line feeds plus carriage returns
4) \r\n: recognized; the result is a single line feed plus carriage return (this is a bit strange)
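One possible reading of the last two results: many text APIs treat the pair \r\n as a single line break, while \n followed by \r counts as two separate breaks. Python's `str.splitlines` follows this rule; whether the wxPython control applies the same rule internally is only an assumption here.

```python
# Each sample puts a different terminator between "a" and "b".
for sample in ("a\rb", "a\nb", "a\n\rb", "a\r\nb"):
    print(repr(sample), "->", sample.splitlines())

# 'a\rb' -> ['a', 'b']
# 'a\nb' -> ['a', 'b']
# 'a\n\rb' -> ['a', '', 'b']
# 'a\r\nb' -> ['a', 'b']
```

With \n\r an empty line appears between "a" and "b", which matches the two line breaks seen in the multi-line text box, while \r\n produces only one.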
6 Real Situation
0) In Windows:
'\r' returns to the beginning of the current line without moving to the next line;
'\n' moves to the next line at the current column without returning to the beginning of the line;
1) Files from Unix/Mac systems, when opened in Windows, show all their text on a single line
2) Files from Windows, when opened under Unix/Mac, may show a ^M symbol at the end of each line
3) Files saved on Linux show black dots when viewed in Notepad on Windows
4) Under Linux, the command unix2dos converts a file from Linux format to Windows format, and the command dos2unix converts a file from Windows format to Linux format.
5) When FTP software transfers files between platforms in ASCII (text) mode, some FTP clients automatically convert the line endings, so the byte count of a transferred file may change. If you do not want FTP to modify the original file, transfer it in bin (binary) mode.
6) The same program generates text files with CR+LF line endings when run on Windows, and text files with LF line endings when run on Linux.
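The core of the conversion in points 4) and 5) is a byte replacement. The functions below are a simplified sketch named after the command-line tools; they are not the tools' actual implementation, which handles more cases.

```python
def dos2unix(data: bytes) -> bytes:
    """Turn every CR LF pair into a single LF."""
    return data.replace(b"\r\n", b"\n")


def unix2dos(data: bytes) -> bytes:
    """Turn every line ending into CR LF.

    Normalizing to LF first keeps an existing CR LF from becoming CR CR LF.
    """
    return dos2unix(data).replace(b"\n", b"\r\n")


windows_text = b"one\r\ntwo\r\n"
unix_text = dos2unix(windows_text)
print(unix_text)                            # b'one\ntwo\n'
print(len(windows_text), len(unix_text))    # 10 8
print(unix2dos(unix_text) == windows_text)  # True
```

The difference in length, one byte per line, is the reason the size of a file can change after an ASCII-mode FTP transfer.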
7 Other
1) In the C language, carriage return and line feed are two separate concepts: a carriage return moves the cursor from anywhere in the line to the beginning of that line, while a line feed moves it to the next line.
The second and subsequent arguments of printf (a variable number of them) tell the computer which variable's value each placeholder, starting with the first %d, should output.
Let's do an experiment: printf("Hello");
Output: HelloPress any key to continue...
As you can see, there is no line break, so "Press any key to continue..." ends up on the same line, right after the output.
The carriage return has its own escape sequence, \r.
For example: printf("abcde\rf\n"); Its output is:
fbcde
Press any key to continue...
What actually happens is that the computer first outputs:
abcde_ (note: "_" marks the position of the cursor)
Then it encounters the \r escape sequence:
abcde (note: the cursor is now under the letter a)
Next it outputs f and \n (carriage return plus line feed):
fbcde (note: the letter f has overwritten the letter a)
_ (the cursor is now at the beginning of the next line)
Finally, the VC++ integrated environment outputs the string "Press any key to continue...".
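The overwrite in this walkthrough can be imitated with a tiny screen model. The sketch below is written in Python rather than C; the function name `render` is made up, and it assumes that the console treats \n as "carriage return plus line feed", as in the walkthrough.

```python
def render(text):
    """Return the screen lines produced by text, with overwrite semantics."""
    lines = [[]]
    row, col = 0, 0
    for ch in text:
        if ch == "\r":
            col = 0                      # back to the start of this line
        elif ch == "\n":
            row += 1                     # assumed: console newline = CR + LF
            col = 0
            if row == len(lines):
                lines.append([])
        else:
            line = lines[row]
            if col < len(line):
                line[col] = ch           # overwrite the existing character
            else:
                line.extend(" " * (col - len(line)))
                line.append(ch)
            col += 1
    return ["".join(line) for line in lines]


print(render("abcde\rf\n"))  # ['fbcde', '']
```

The second, empty entry is the new line on which the cursor ends up, where the next output would be printed.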
2) Soft and hard returns
A hard return is what we normally get by pressing Enter; besides breaking the line, it also separates paragraphs.
A soft return is produced with Shift + Enter (in Word it is shown as ↓, and in Find and Replace it is called the manual line break, ^l). It breaks the line but does not start a new paragraph: the text before and after it still belongs to the same "paragraph" in Word. You will notice this when you apply formatting.
The return we normally use is the hard return, the small curved arrow produced by pressing Enter in Word; it takes up two bytes. It marks paragraphs clearly: the text between two hard returns forms a paragraph whose formatting can be set on its own without affecting other paragraphs. That is why we are used to hard returns: they make typesetting convenient.
But the hard return also brings trouble. If you are a web designer or a forum regular, you have surely experienced this: you want to break a line, and the result looks awful because the spacing between lines is far too large. This is the hard return at work; Word and other text editors just do not show its true nature. Since this does make layout harder, we have to call on the hard return's sibling: the soft return.
A soft return takes up only one byte and is shown as a downward arrow in Word. If you have ever copied text from a complex web page into Word, it will be familiar to you. Entering one directly in Word is less obvious, though. Because a soft return is not a true paragraph mark, it only starts a new line, not a new paragraph. That makes it less useful for typesetting, since the text around it cannot be formatted as a separate paragraph. Even so, it plays a pivotal role in web design.
A soft return greatly reduces the spacing between the lines before and after it, because it is not a paragraph mark, unlike the hard return, which is the real paragraph mark. In HTML the hard return is <p>...</p>, with the paragraph content between the tags, while the soft return is much leaner: <br>. So to use a soft return in a web page, just switch to the code view and type the soft return code.
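The mapping between the two kinds of return and HTML can be sketched as a small converter. The convention used here, a blank line for a hard return and a single line break for a soft return, is an assumption chosen for the example, and the function name `text_to_html` is made up.

```python
import html


def text_to_html(text):
    """Blank lines separate paragraphs (<p>); single newlines become <br>."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    return "\n".join(
        "<p>" + html.escape(p).replace("\n", "<br>") + "</p>"
        for p in paragraphs
    )


print(text_to_html("line one\nline two\n\nnext paragraph"))
# <p>line one<br>line two</p>
# <p>next paragraph</p>
```

The two lines joined by <br> stay inside one paragraph and sit close together, while the second <p> starts a new paragraph with the larger spacing described above.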
Now let me describe how returns are converted when text is copied between different editors.
As everyone knows, when text from a web page is copied into Word, hard returns become curved arrows and soft returns become downward arrows. As a result, people who are used to editing text in Word may find the outcome awkward to work with.
The same holds when text from Word is copied into a web page. You could say Word is compatible with web pages; why else would it have a "Save as Web Page" option?
Notepad is another editor we use a lot. In recent years, though, as things have moved on and Notepad's shortcomings have shown, many people have abandoned it. I find this regrettable, because Notepad has strengths that other editors cannot replace. The next time you copy text from a web page, try pasting it into Notepad. No matter which kind of return the web designer used, they all become the same kind of return! Don't believe it? Look: a soft return becomes one ordinary return, and a hard return becomes two ordinary returns. Copy the text from Notepad into Word, and every return from Notepad becomes a hard return; copy it from Notepad into a web editor, and every return becomes a soft return.
3) Handling of carriage returns and line feeds in file operations
File operations when programming
Whether a file is opened in binary mode (wb) or text mode (wt) also makes a difference. I did an experiment (test environment: .NET 2003).
Type 12 in the file 01.txt and press Enter; in UltraEdit's hex view the line break shows up as 0d 0a.
The program is as follows:
#include <stdio.h>

int main(void)
{
    FILE *fp1, *fp2, *fp3, *fp4;
    unsigned char a[10];
    unsigned char b[10];
    size_t n;

    /* Text mode: line endings are translated on Windows. */
    fp1 = fopen("01.txt", "r");
    fp3 = fopen("02.txt", "w");
    if (fp1 == NULL || fp3 == NULL) return 1;
    n = fread(a, sizeof(unsigned char), 8, fp1);  /* the line break in a is 0a */
    fwrite(a, sizeof(unsigned char), n, fp3);     /* 02.txt has 0d 0a: on input CR LF became LF, on output LF became CR LF again */
    fclose(fp1);
    fclose(fp3);

    /* Binary mode: bytes pass through untouched. */
    fp2 = fopen("01.txt", "rb");
    fp4 = fopen("03.txt", "wb");
    if (fp2 == NULL || fp4 == NULL) return 1;
    n = fread(b, sizeof(unsigned char), 8, fp2);  /* the line break in b is 0d 0a */
    fwrite(b, sizeof(unsigned char), n, fp4);     /* 03.txt has 0d 0a: binary mode does no CR/LF translation */
    fclose(fp2);
    fclose(fp4);
    return 0;
}
The conclusion seems to be: when reading in text mode, Enter is seen as 0x0a; when reading in binary mode, Enter is seen as 0x0d, 0x0a.
MSDN says: "Also, in text mode, carriage return–linefeed combinations are translated to single linefeeds on input, and linefeed characters are translated to carriage return–linefeed combinations on output." (On input, a carriage return plus line feed is converted to a single line feed; on output, a line feed is converted back to a carriage return plus line feed.) "When a Unicode stream-I/O function operates in text mode (the default), the source or destination stream is assumed to be a sequence of multibyte characters. Therefore, the Unicode stream-input functions convert multibyte characters to wide characters. For the same reason, the Unicode stream-output functions convert wide characters to multibyte characters."
"Open in binary (untranslated) mode; translations involving carriage-return and linefeed characters are suppressed." (In binary mode, the carriage return and line feed translation does not take place.)