Line-break definitions on different systems, plus read(), readline(), readlines() and precautions

Source: Internet
Author: User
Tags: readline in python

Today, while learning to read files, I noticed that an opened file was displayed across two lines because of the default window size, which made me wonder what the Python readlines() function treats as the end of a line. I looked it up and learned that different systems use different characters as line breaks; Windows, for example, uses '\r\n' (carriage return plus line feed) to mark the end of a line. The following article is interesting and quite detailed, so I have attached it.

Source: http://www.cnblogs.com/utank/p/4347059.html

Difference between carriage return and line feed

First, the origins of and differences between the two concepts "carriage return" ('\r') and "line feed" ('\n'). Before computers appeared, there was a device called the Teletype Model 33, which could print 10 characters per second. It had one problem: moving to a new line after finishing a line took 0.2 seconds, exactly the time needed to print two characters. If new characters arrived during those 0.2 seconds, they were lost. So the developers came up with a solution: add two end-of-line characters after each line. One is called "carriage return", which tells the typewriter to position the print head at the left edge; the other is called "line feed", which tells the typewriter to move the paper down one line. This is the origin of "line feed" and "carriage return", and their English names hint at it as well.

Later, computers were invented, and the two concepts were carried over to them. At the time, memory was expensive, and some scientists thought it was wasteful to add two characters to the end of every line. As a result, practices diverged: on Unix systems, each line ends with only "<line feed>", that is, "\n"; on Windows systems, each line ends with "<carriage return><line feed>", that is, "\r\n"; on Mac systems, each line ends with "<carriage return>", that is, "\r".

A direct consequence is that when files created on Unix/Mac are opened in Windows, all of the text may appear on a single line, while Windows files opened on Unix/Mac may show a ^M symbol at the end of each line.

[Illustration: some common escape characters]

Note that on Windows the Enter key is treated as the combination \r\n: when we press Enter on the keyboard, Windows handles it as \r\n, whereas Unix treats it as just \n. On either system, \n can be used as the marker for the end of an input line, but when programming on Windows we need to remember that the \r character will also be read, and we have to separate the \r from the normal input characters.
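As a rough illustration of what "reading the \r" means in practice, here is a minimal sketch assuming Python 3, where open() translates line endings by default and the newline='' argument turns that translation off. The helper name read_both_ways is made up for this example.

```python
import os
import tempfile

def read_both_ways(raw):
    """Write raw bytes to a temporary file, then read it back in text mode twice."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        # Default (newline=None): '\r\n' and '\r' are translated to '\n' on reading.
        with open(path, "r", encoding="ascii") as f:
            translated = f.read()
        # newline='': line endings are returned exactly as they are stored.
        with open(path, "r", encoding="ascii", newline="") as f:
            untranslated = f.read()
    finally:
        os.remove(path)
    return translated, untranslated

translated, untranslated = read_both_ways(b"AA\r\nBB\r\n")
print(repr(translated))    # 'AA\nBB\n'
print(repr(untranslated))  # 'AA\r\nBB\r\n'
```

With the default setting the program never sees the \r; with newline='' (or in binary mode) it does, and has to strip it itself.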

The Windows and Unix file formats differ, and the problem usually comes down to \r\n. The carriage return (CR) and line feed (LF) characters are used to mark the start of the next line, but the standard does not specify which one to use. There are three different conventions:

Windows uses carriage return + line feed (CR+LF) to mark the next line (the so-called PC format).
Unix uses a line feed (LF) to mark the next line.
Mac uses a carriage return (CR) to mark the next line.
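The three conventions can be compared side by side with a small sketch (Python 3 assumed). The sample strings and the detect_line_ending helper are invented for illustration, and the detection is only a rough heuristic that ignores files with mixed endings; str.splitlines() is a standard method that accepts all three endings.

```python
# Hypothetical sample texts, one per convention.
samples = {
    "Windows (CR+LF)": "one\r\ntwo\r\nthree",
    "Unix (LF)": "one\ntwo\nthree",
    "Mac (CR)": "one\rtwo\rthree",
}

def detect_line_ending(text):
    """Return a label for the line ending used in text (rough heuristic)."""
    if "\r\n" in text:      # must be checked before the single characters
        return "CR+LF"
    if "\n" in text:
        return "LF"
    if "\r" in text:
        return "CR"
    return "none"

for name, text in samples.items():
    # splitlines() recognizes '\r\n', '\n' and '\r' and drops them from the result.
    print(name, detect_line_ending(text), text.splitlines())
```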

When files are transferred between different systems, the format has to be converted.

Converting between the two file formats:

1. Unix -> Windows: '\n' -> '\r\n'

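    /* in is a FILE * opened for reading; ch is an int so that it can hold EOF */
    int ch;
    while ((ch = fgetc(in)) != EOF)
    {
        if (ch == '\n')
            putchar('\r');   /* insert a CR before every LF */
        putchar(ch);
    }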

Just add a '\r' character before every '\n' that appears in the Unix file.
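The same idea as the C loop above can be sketched in Python 3 on a bytes object. The function name unix_to_windows is made up for this example; like the C loop, it assumes the input really is in Unix format and does not check for CRs that are already present.

```python
def unix_to_windows(data):
    """Insert a CR before every LF in a bytes object."""
    return data.replace(b"\n", b"\r\n")

print(unix_to_windows(b"AA\nBB\n"))  # b'AA\r\nBB\r\n'
```

Running it on data that is already in Windows format would produce \r\r\n, so it should be applied only once.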

2. Unix <- Windows: '\n' <- '\r\n'

Converting from Windows to Unix is more complicated: you cannot simply remove every '\r' from the file, because a lone carriage return is sometimes embedded in a line of text in a Windows file, for example for overprinting on a printer. So before converting, check whether '\r' and '\n' appear together. If they do, remove the '\r'; if they do not, keep the '\r'.

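    cr_flag = 0;   /* No CR encountered yet */
    while ((ch = fgetc(in)) != EOF)
    {
        if (cr_flag && ch != '\n') {
            /* The previous CR did not precede an LF, so output it */
            putchar('\r');
        }
        /* Remember a CR instead of printing it; print everything else */
        if (!(cr_flag = (ch == '\r')))
            putchar(ch);
    }
    if (cr_flag)
        putchar('\r');   /* a CR at the very end of the file is kept too */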
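The Windows-to-Unix rule described above (drop a CR only when an LF immediately follows it) can be sketched in Python 3 as a single replacement on bytes. The function name windows_to_unix is made up for this example.

```python
def windows_to_unix(data):
    """Turn every CR+LF pair into LF; CRs that stand alone are left untouched."""
    return data.replace(b"\r\n", b"\n")

print(windows_to_unix(b"AA\r\nB\rB\r\n"))  # b'AA\nB\rB\n'
```

Because only the two-byte sequence is replaced, a carriage return embedded in the middle of a line survives the conversion.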

-------------------------------------------------Split Line----------------------------------------------------


Also, pay attention to how the read(), readline() and readlines() functions differ in use.

The contents of the test file are:

[Image: contents of the test file]

To check the line-break behavior described above, the third line, GG, is not followed by a newline (what is displayed as a fourth row is still part of the third line).

read(): returns the whole text as a single string (only one element)


readline(): reads the first line of the document as a string (only one element)



readlines(): returns each line of the document as a string element in a list; there are as many elements as there are lines.



Note that my code reopens the file before each test; otherwise each read would continue from where the previous read stopped (this is probably because the file object behaves like a pointer into the file, and each read moves it forward).
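The behavior of the three functions, and the effect of not reopening the file, can be reproduced with a small sketch (Python 3 assumed). The file contents here are hypothetical: only the detail that the last line, GG, has no trailing newline is taken from the description above.

```python
import os
import tempfile

# Hypothetical test file: three lines, the last one without a trailing newline.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", newline="\n") as f:
    f.write("AA\nBB\nGG")

with open(path) as f:
    whole = f.read()        # one string holding the entire file
with open(path) as f:
    first = f.readline()    # only the first line, line break included
with open(path) as f:
    lines = f.readlines()   # a list with one string per line

# Without reopening, each call continues from the current position.
with open(path) as f:
    head = f.readline()     # line 1
    rest = f.readlines()    # starts at line 2
    leftover = f.read()     # already at end of file, so this is ''
    f.seek(0)               # rewind instead of reopening
    again = f.readline()    # line 1 again

os.remove(path)
print(repr(whole), repr(first), lines)
print(rest, repr(leftover), repr(again))
```

Calling seek(0) is an alternative to reopening the file when the same handle has to be read again from the start.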



Below is a repost on how to use these three functions: http://blog.csdn.net/werm520/article/details/6898473

When we talk about "text processing", we usually mean the content being processed. Python makes it easy to read the contents of a text file into a string variable that can be manipulated. File objects provide three read methods: .read(), .readline() and .readlines(). Each method accepts an argument that limits the amount of data read at a time, but they are usually called without one. .read() reads the entire file at once and is typically used to put the contents of the file into a string variable. .read() produces the most direct string representation of a file's contents, but that is unnecessary for sequential line-oriented processing, and it is not possible if the file is larger than the available memory.
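The optional size argument mentioned above can be illustrated with a short sketch (Python 3 assumed). io.StringIO stands in for an opened text file, and the helper name read_in_chunks and the sample text are made up for this example.

```python
import io

def read_in_chunks(stream, size=4):
    """Yield pieces of at most `size` characters until the stream is exhausted."""
    while True:
        chunk = stream.read(size)   # read(size) returns '' at end of file
        if not chunk:
            break
        yield chunk

chunks = list(read_in_chunks(io.StringIO("AA\nBB\nGG"), size=4))
partial = io.StringIO("AA\nBB\nGG").readline(1)   # at most 1 character of the line

print(chunks)         # ['AA\nB', 'B\nGG']
print(repr(partial))  # 'A'
```

Reading in fixed-size pieces is one way to handle a file that does not fit in memory when its line structure does not matter.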

.readline() and .readlines() are very similar. Both are used in structures like the following:

Python .readlines() example

    fh = open('c:\\autoexec.bat')
    for line in fh.readlines():
        print(line)

The difference between .readline() and .readlines() is that the latter reads the entire file at once, like .read(). .readlines() automatically parses the file contents into a list of lines, which can be processed with Python's for ... in ... construct. .readline(), on the other hand, reads only one line at a time and is usually much slower than .readlines(). You should use .readline() only if there is not enough memory to read the entire file at once.
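As a sketch of the low-memory case (Python 3 assumed), the two functions below process a file one line at a time. The first iterates over the file object directly, which is not mentioned in the text above but is a standard alternative to calling .readline() in a loop; the second uses .readline() as described. The function names and the sample text are made up, and io.StringIO stands in for an opened file.

```python
import io

def count_nonblank_lines(stream):
    """Count non-blank lines, holding only one line in memory at a time."""
    count = 0
    for line in stream:            # a file object yields one line per iteration
        if line.strip():
            count += 1
    return count

def count_nonblank_lines_readline(stream):
    """Same result using explicit readline() calls."""
    count = 0
    line = stream.readline()
    while line:                    # readline() returns '' at end of file
        if line.strip():
            count += 1
        line = stream.readline()
    return count

text = "AA\n\nBB\nGG"
print(count_nonblank_lines(io.StringIO(text)))            # 3
print(count_nonblank_lines_readline(io.StringIO(text)))   # 3
```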

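Writing:

A WriteLine()-style function (as found in some other languages) breaks the line after the output, so the next write starts on a new line. write() leaves the cursor at the end of the line without a line break, so the next write continues on the same line. Python file objects provide write() and writelines(), neither of which adds a line break; include '\n' yourself where one is needed.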
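The write behavior can be checked with a short sketch (Python 3 assumed). io.StringIO stands in for a file opened for writing, and the strings written are arbitrary examples.

```python
import io

out = io.StringIO()
out.write("first")
out.write(" still first\n")             # the line break has to be written explicitly
out.writelines(["second\n", "third"])   # writelines() adds no separators either
out.writelines(["-more"])               # so this continues the "third" line
result = out.getvalue()

print(repr(result))  # 'first still first\nsecond\nthird-more'
```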



Output using readline(); the memory footprint is relatively small for larger files.

    #coding: utf-8
    f = open('poem.txt', 'r')
    result = list()
    line = f.readline()            # read the first line
    while line:                    # readline() returns '' at end of file
        print(line)
        result.append(line)        # append() is a method call and needs parentheses
        line = f.readline()        # read the next line
    print(result)
    f.close()
    # each line still ends with its own line break, so join without adding another
    with open('result-readline.txt', 'w') as out:
        out.write(''.join(result))



    #coding:utf-8
    '''cdays-4-exercise-6.py file basic operations
        @note: file read/write, list sorting, string operations
        @see: for each string method, refer to help(str) or the Python online documentation http://docs.python.org/lib/string-methods.html
    '''

    f = open('cdays-4-test.txt', 'r')                  # open the file for reading
    result = list()
    for line in f.readlines():                         # read each line in turn
        line = line.strip()                            # strip whitespace from both ends of the line
        if not len(line) or line.startswith('#'):      # is it a blank line or a comment line?
            continue                                   # if so, skip it
        result.append(line)                            # save the line
    result.sort()                                      # sort the results
