I previously tried converting document encodings with PowerShell, but PowerShell's support was incomplete and it did not work. Here is another attempt, this time in Python.
Ideas
Detect the source file's encoding. If it is not UTF-8, convert it; otherwise skip the file.
Code
import chardet  # third-party package: pip install chardet
import sys
import codecs

def findencoding(s):
    # Read the raw bytes so chardet can guess the encoding.
    with open(s, mode='rb') as file:
        buf = file.read()
    result = chardet.detect(buf)
    return result['encoding']

def convertencoding(s):
    encoding = findencoding(s)
    # chardet may return None (for example, for an empty file), so guard against it.
    if encoding and encoding.lower() not in ('utf-8', 'ascii'):
        print("Convert %s from %s to utf-8" % (s, encoding))
        # Read everything with the detected encoding, then rewrite the file as UTF-8.
        with codecs.open(s, "r", encoding) as sourcefile:
            contents = sourcefile.read()
        with codecs.open(s, "w", "utf-8") as targetfile:
            targetfile.write(contents)
    else:
        print("%s encoding is %s, there is no need to convert" % (s, encoding))

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Error filename")
    else:
        convertencoding(sys.argv[1])
In actual testing, the conversion succeeded.
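One way to check the conversion step on its own is a small round trip. The sketch below uses only the standard library and assumes the source encoding is already known (GBK here, chosen only for illustration), so it leaves out the chardet detection. The names convert_known_encoding and roundtrip_demo are hypothetical helpers, not part of the script above.

```python
import codecs
import os
import tempfile

def convert_known_encoding(path, source_encoding):
    """Rewrite the file at `path` as UTF-8, assuming its current encoding is known."""
    with codecs.open(path, "r", source_encoding) as source_file:
        contents = source_file.read()
    with codecs.open(path, "w", "utf-8") as target_file:
        target_file.write(contents)

def roundtrip_demo():
    """Write a GBK file, convert it, and check that the bytes now decode as UTF-8."""
    text = u"\u7f16\u7801 test\n"  # two Chinese characters plus ASCII text
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with codecs.open(path, "w", "gbk") as f:
            f.write(text)
        convert_known_encoding(path, "gbk")
        with open(path, "rb") as f:
            raw = f.read()
        return raw.decode("utf-8") == text
    finally:
        os.remove(path)

if __name__ == "__main__":
    print(roundtrip_demo())
```

If the conversion works, the file's bytes decode cleanly as UTF-8 and match the original text.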
Knowledge points
- chardet: this module detects the encoding of the data it is given. It returns a dict when detection completes. The dict includes two keys, encoding and confidence, whose meanings are as their names suggest.
- with ... as: this syntax is very useful, especially when opening files. It closes the file automatically, so a forgotten close() or an exception no longer leaves the file held open.
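The behavior of with ... as can be seen in a small sketch. It uses only the standard library; read_and_fail and demo are hypothetical names for illustration. The exception is raised on purpose to show that the file is closed anyway.

```python
import os
import tempfile

def read_and_fail(path):
    """Open a file, fail on purpose inside the with block, and report whether it was closed."""
    handle = None
    try:
        with open(path, "rb") as f:
            handle = f
            f.read()
            raise ValueError("simulated failure while processing")
    except ValueError:
        pass  # the with block has already closed the file by this point
    return handle.closed

def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        return read_and_fail(path)
    finally:
        os.remove(path)

if __name__ == "__main__":
    print(demo())
```

The file object reports closed even though the block exited through an exception, which is why the script needs no explicit close() call.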
Batch Conversion
import chardet  # third-party package: pip install chardet
import sys
import codecs
import os

def findencoding(s):
    # Read the raw bytes so chardet can guess the encoding.
    with open(s, mode='rb') as file:
        buf = file.read()
    result = chardet.detect(buf)
    return result['encoding']

def convertencoding(s):
    # Skip files that cannot be written to.
    if os.access(s, os.W_OK):
        encoding = findencoding(s)
        # chardet may return None (for example, for an empty file), so guard against it.
        if encoding and encoding.lower() not in ('utf-8', 'ascii'):
            print("Convert %s from %s to utf-8" % (s, encoding))
            with codecs.open(s, "r", encoding) as sourcefile:
                contents = sourcefile.read()
            with codecs.open(s, "w", "utf-8") as targetfile:
                targetfile.write(contents)
        else:
            print("%s encoding is %s, there is no need to convert" % (s, encoding))
    else:
        print("%s is read only" % s)

def getallfile(path, suffix='.'):
    "Recursion is enabled"
    fpath = []
    for root, dirs, fnames in os.walk(path):
        for name in fnames:
            if name.endswith(suffix):
                fpath.append(os.path.join(root, name))
    return fpath

def convertall(path):
    # Collect the .c and .h files, then convert each one.
    flist = getallfile(path, ".c") + getallfile(path, ".h")
    for fname in flist:
        convertencoding(fname)

if __name__ == "__main__":
    if len(sys.argv) == 1:
        path = os.getcwd()
    elif len(sys.argv) == 2:
        path = sys.argv[1]
    else:
        print("Error parameter")
        sys.exit()
    convertall(path)
You can pass a directory as the argument, or run the script with no argument to use the current directory. Either way, the directory is traversed recursively.
Knowledge points
- os.walk: traverses all files under a directory, including its subdirectories
- os.access: checks file permissions (here, whether the file is writable)
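The two calls above can be tried together without touching any encodings. The following is a minimal sketch using only the standard library. writable_files and demo are hypothetical names, and the temporary directory layout is made up for the example.

```python
import os
import tempfile

def writable_files(root, suffixes=(".c", ".h")):
    """Return the writable files under `root` whose names end with one of `suffixes`."""
    found = []
    # os.walk yields (directory path, subdirectory names, file names) for every directory.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):  # str.endswith accepts a tuple of suffixes
                full = os.path.join(dirpath, name)
                if os.access(full, os.W_OK):
                    found.append(full)
    return sorted(found)

def demo():
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sub"))
        for rel in ("a.c", os.path.join("sub", "b.h"), "notes.txt"):
            with open(os.path.join(root, rel), "w") as f:
                f.write("x")
        return [os.path.relpath(p, root) for p in writable_files(root)]

if __name__ == "__main__":
    print(demo())
```

Passing a tuple to endswith collects both suffixes in a single walk, where the script above walks the tree once per suffix. Both approaches find the same files.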
Converting file encodings using Python