In data mining, the format of the raw files is often maddening, and cleaning up that format is an important first step.
I recently took over a project where the supplied data file was simply unreadable: pandas could not open it and kept raising an IO error. A closer look showed that many rows in the file ended with a double quote ("), while other rows were missing it. So the requirement was obvious: check whether each line ends with a ", and if it does not, append one.
I will give the answer first. After all, what most people need is a quick solution, not the reasons behind it. The solution is as follows:
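Before fixing anything, it can help to confirm the diagnosis. The following is a minimal sketch that counts how many lines lack the closing quote. The function name count_unterminated and the sample text are hypothetical, made up for illustration, and the sketch assumes Python 3 text strings.

```python
import io

def count_unterminated(lines, terminator='"'):
    """Return (total, missing) over the non-empty lines of an iterable."""
    total = missing = 0
    for line in lines:
        stripped = line.rstrip("\r\n")  # drop only the line break
        if not stripped:
            continue  # ignore blank lines
        total += 1
        if not stripped.endswith(terminator):
            missing += 1
    return total, missing

# Hypothetical sample: lines 2 and 4 are missing the closing quote.
sample = io.StringIO('"a","b"\n"c","d\n"e","f"\n"g","h')
print(count_unterminated(sample))
```

An open file object can be passed in place of the StringIO sample, since both yield one line per iteration.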
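b = open('B_file.txt', 'w')
with open('A_file.txt', 'r') as lines:
    for line in lines:
        # Remove the line break (and surrounding whitespace) before checking
        line = line.strip()
        # Append the closing quote only when it is missing
        if not line.endswith(r'"'):
            line += r'"'
        # Put the line break back and write the repaired line
        line += '\n'
        b.write(line)

# The input file is closed by the with block; only the output needs closing
b.close()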
The key to the whole process is this line:
line = line.strip()
At first I was lazy and left that line out. As a result I tripped over the condition below: the program decided that no line ended with a ", because every line still had its line break attached.
if not line.endswith(r'"'):
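The effect is easy to reproduce on a single string. This is only an illustration with a made-up line, assuming the line was read by iterating over a file opened in text mode:

```python
raw = 'abc"\n'  # hypothetical line as read from a file, line break included

without_strip = raw.endswith('"')       # the last character is the line break
with_strip = raw.strip().endswith('"')  # strip() removes the line break first

print(without_strip, with_strip)
```

The first check fails even though the visible text ends with a quote, which is exactly the mistake described above.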
I gritted my teeth and tried again without strip(), rewriting the loop as:
for line in open(data_path + "heheda.txt", "r"):
With that loop, the condition became if not line[-2] == r'"', which gave the correct result for every line except the last. The reason is well known: on Windows a line break is stored as "\r\n", so when strip() is not used to remove it, the check has to step back from the end of the line by hand to reach the last real character. The last line of a file, however, usually has no line break after it, since nothing follows. There, line[-2] landed in the middle of the final Chinese character, so the two bytes \xx\xx were written back as \xx"\xx and the last character displayed as garbage.
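One way to avoid index arithmetic altogether is to remove only the line-break characters and then test the end of what remains. The sketch below assumes Python 3, where lines are text strings rather than bytes, so a Chinese character is a single element and cannot be split. The helper name ensure_terminator and the sample data are hypothetical.

```python
def ensure_terminator(line, terminator='"'):
    """Return the line without its line break, ending with the terminator."""
    # Handles "\n", "\r\n" and a final line that has no line break at all
    body = line.rstrip("\r\n")
    if not body.endswith(terminator):
        body += terminator
    return body

# Hypothetical sample: Windows break, Unix break with a missing quote, no break
lines = ['"数据"\r\n', '"数据\n', '"数据']
fixed = [ensure_terminator(l) for l in lines]
print(fixed)

# The index-based check misreads a final line that has no line break:
last = '"数据"'
print(last[-2] == '"')  # compares the character before the closing quote
```

Unlike strip(), rstrip("\r\n") leaves leading and trailing spaces in the data untouched, which may or may not be what a given file needs.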
Python file reading and writing: appending a specific character to the end of each line