Today want to write a program to merge files, have always felt that python encoding decoding is annoying, as long as the file merge and so on are written in C # , but the recent use of Linux, there is no vs, can only obediently use python to write, in the morning to see the next, there is no responsibility I think, can only say that before too that what .... ok, Gossip less, The following is a brief introduction to the file read Operation.
First of all, I use the python2.7,python read the file content mainly has the following several commonly used methods: first, a test, you can clearly understand how each method is exactly what it looks like.
The contents of the file are as follows
One: Fopen.read (size)
The parameter size refers to the number of reads, if omitted, that represents the reading of the entire file content
1 # Coding=utf-8 2 fopen=open ('train2.txt','r')3 text=fopen.read ()4print(text)# reads everything by default
Displays the entire text content, as Follows:
two kinds: Fopen. ReadLine () reads the contents of a file line
three kinds: Fopen.readlines () Read all the rows to List inside , [line1,line2,... linen] This is a common way to avoid loading all files into memory to improve operational efficiency
Fopen=open ('train2.txt','r') lines=[]lines =fopen.readlines () for in lines: print(line)
Read the file almost like this, below we write the file, This involves coding problem,python write file is fout.write (str), str is a string.
Writes to Chinese characters are mostly error-prone because of coding problems.
pythe file Defaults toASCIICoding , so when you display chinese, you will be prompted by T syntaxerror:non-ascii character and the Like., first add to the code front:
# Coding=utf-8
Unicode is a built-in function, the second parameter indicates the encoding format of the source String.
Decode is the method that any string has, converting the string to a Unicode format, The parameter indicates the encoding format of the source String.
< Span style= "font-family:arial" > encode The is also a method of any string that converts the string to the format specified by the Parameter.
with u ' kanji ' constructed with Unicode type, constructed as str type
from Unicode turn str , to use < Span>encode method
from Str Turn Unicode , to use Decode
So when writing in chinese, the code is as Follows:
#Coding=utf-8Fopen=open ('Train2.txt','R') Fout=open ('2.txt','W') Lines=[]lines=Fopen.readlines () forLineinchLines#read in GBK encoding (of course, read GBK encoded Text) and ignore the wrong encoding, converted to UTF-8 encoded outputLine.decode ('GBK','Ignore'). Encode ('Utf-8') fout.write (line) fout.write ('\ n') #这是换行
well, read and write files are almost enough, the following is a simple regular, this is placed in the next chapter
Read and simple regular use of files in Python (i)