I just joined this company, and once I was familiar with the environment, my boss had me do some migration work that involved modifying code. Honestly, this kind of work is really boring: reading other people's code, changing a variable here and a file name there. There is no technical content, and it is very tedious. Still, migrating code does teach you the environment. Enough of that. Today's topic is changing the encoding of code files. For certain reasons, the code needs to be moved from computer room A to computer room B, and the two cannot access each other. For historical reasons, all the code in room A is UTF-8 encoded, while room B requires GBK. Let's see how to solve this.
Encoding problems
Let's start with why there are encoding problems. Taking the example above, the database in room B is entirely GBK encoded, so all data read from it is GBK. For that data to display without garbling, and without converting it after it is read, the header must set the encoding to GBK, and the output files (HTML, TPL, etc.) must also be GBK. The following diagram makes it clearer:
DB (GBK) => PHP (any encoding, but if the code file contains Chinese characters, the file must be GBK encoded, or the Chinese characters must be converted to GBK on output) => header (GBK) => HTML, TPL (GBK)
Alternatively, convert the data to UTF-8 in the code only at the point where it is read from the database. In general UTF-8 is more popular and causes fewer problems:
DB (GBK) => PHP (UTF-8, with data read from the database converted to UTF-8) => header (UTF-8) => HTML, TPL (UTF-8)
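The conversion step in the second flow can be sketched in Python, the language of the script in this post. This is only an illustration under assumptions: gbk_to_utf8 is a hypothetical helper name, and the byte string stands in for a value that a GBK database would return.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper (not from the original post): re-encode GBK bytes,
# as a GBK database would return them, into UTF-8 bytes for a UTF-8 page.

def gbk_to_utf8(raw):
    """Decode GBK bytes to text, then encode that text as UTF-8."""
    return raw.decode("gbk").encode("utf-8")

# The two characters "中文" as GBK bytes, standing in for a database value.
row_value = b"\xd6\xd0\xce\xc4"
page_bytes = gbk_to_utf8(row_value)
print(repr(page_bytes))
```

The same two characters take 4 bytes in GBK and 6 bytes in UTF-8, which is why every layer in the chain has to agree on one encoding.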
As long as the encodings are unified in one of these two ways, nothing will be garbled. At least, I tested the first way and it works, so I expect the second is fine too. OK, now for a small script that converts the encoding of files:
    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    # Filename: changeencode.py
    import os
    import sys

    def changeencode(file, fromencode, toencode):
        # Re-encode one file in place; return 0 on success, -1 on failure.
        try:
            # Read and write raw bytes so decode()/encode() work as expected.
            with open(file, "rb") as f:
                s = f.read()
            s = s.decode(fromencode).encode(toencode)
            with open(file, "wb") as f:
                f.write(s)
            return 0
        except (IOError, UnicodeError):
            return -1

    def do(dirname, fromencode, toencode):
        # Walk the directory tree and convert every file in it.
        for root, dirs, files in os.walk(dirname):
            for _file in files:
                _file = os.path.join(root, _file)
                if changeencode(_file, fromencode, toencode) != 0:
                    print("[conversion failed:] " + _file)
                else:
                    print("[Success:] " + _file)

    def checkparam(dirname, fromencode, toencode):
        # Return 0 if the arguments are valid, otherwise an error code.
        encode = ["UTF-8", "GBK", "gbk", "utf-8"]
        if fromencode not in encode or toencode not in encode:
            return 2
        if fromencode == toencode:
            return 3
        if not os.path.isdir(dirname):
            return 1
        return 0

    if __name__ == "__main__":
        error = {1: "First parameter is not a valid folder",
                 3: "Source encoding and target encoding are the same",
                 2: "The encoding you want to convert is not within the supported range: UTF-8, GBK"}
        dirname = sys.argv[1]
        fromencode = sys.argv[2]
        toencode = sys.argv[3]
        ret = checkparam(dirname, fromencode, toencode)
        if ret != 0:
            print(error[ret])
        else:
            do(dirname, fromencode, toencode)
The script is simple, and so is its usage:
    ./changeencode.py target_dir fromencode toencode
Here is a note on how several common encodings relate:
The US-ASCII encoding is a subset of UTF-8. This comes from StackOverflow, which puts it like this: "ASCII is a subset of UTF-8, so all ASCII files are already UTF-8 encoded."
I tried it, and it is true: while a file contains no Chinese characters, its encoding is reported as US-ASCII; after Chinese characters are added, it becomes UTF-8.
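The subset relationship can be checked directly in Python. This is a small sketch; the sample strings and the is_ascii helper are made up for illustration.

```python
# -*- coding: utf-8 -*-
ascii_only = u"hello, world"
with_chinese = u"hello, \u4e16\u754c"  # "hello, " followed by two Chinese characters

# Pure ASCII text produces identical bytes in ASCII, UTF-8 and GBK.
same_bytes = (ascii_only.encode("ascii")
              == ascii_only.encode("utf-8")
              == ascii_only.encode("gbk"))
print(same_bytes)

def is_ascii(data):
    """Return True if every byte in data is valid US-ASCII."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False

print(is_ascii(ascii_only.encode("utf-8")))
print(is_ascii(with_chinese.encode("utf-8")))
```

This also explains why a file with no Chinese characters needs no real conversion: its bytes are the same in both encodings.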
There is also the ANSI encoding, which simply means the local (system default) encoding. For example, on a Simplified Chinese operating system, ANSI means GBK. This also needs attention.
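To see what the local encoding is on a given machine, Python can report it. A minimal sketch: the printed value depends on the system it runs on, and the second lookup relies on Python registering cp936 (the Simplified Chinese Windows code page) as an alias of its gbk codec.

```python
import codecs
import locale

# The encoding this system treats as its default (what Windows calls "ANSI").
print(locale.getpreferredencoding(False))

# Python resolves the code page name cp936 to its gbk codec.
print(codecs.lookup("cp936").name)
```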
Another thing: a command to view file encodings under Linux is:
    file -i *
It shows the encoding of each file.
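A rough check of the same kind can be done from Python by trying candidate encodings in order. This is a sketch, not how the file command works internally; guess_encoding is a hypothetical helper, and it can only pick among the candidates it is given.

```python
def guess_encoding(data, candidates=("ascii", "utf-8", "gbk")):
    """Return the first candidate that decodes data without error, or None."""
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None

print(guess_encoding(b"plain text"))
print(guess_encoding(b"\xe4\xb8\xad\xe6\x96\x87"))  # UTF-8 bytes
print(guess_encoding(b"\xd6\xd0\xce\xc4"))          # GBK bytes
```

The order matters: many UTF-8 byte sequences also happen to decode as GBK (into the wrong characters), so UTF-8 is tried before GBK.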
Of course, if a file contains special characters, the conversion above may fail, but ordinary program files are no problem.
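When a conversion does fail, it helps to know where. The sketch below is an assumption about how one might investigate, not part of the script above; find_bad_byte is a hypothetical helper that reports the offset of the first byte that cannot be decoded.

```python
def find_bad_byte(data, encoding):
    """Return the offset of the first undecodable byte, or None if all decode."""
    try:
        data.decode(encoding)
        return None
    except UnicodeDecodeError as exc:
        return exc.start

sample = b"abc\xffdef"  # 0xff is never valid in UTF-8
print(find_bad_byte(sample, "utf-8"))

# Lossy fallback: undecodable bytes become U+FFFD instead of raising.
print(repr(sample.decode("utf-8", "replace")))
```

The lossy fallback changes the file's content, so it is only suitable when losing the odd character is acceptable.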