Articles about the historical evolution of character encodings, how UTF-8 developed, why Windows still keeps GBK, and so on are easy to find online. Most of them are reposts of the same content, and they still could not resolve my doubts.
Encoding is a real pain, but if you don't understand it, how can you get by in China?
After reading many documents and running in-depth experiments, I built up a basic mental model of encoding.
"Code One": Below, a few simple code snippets explain the "encoding" and "decoding" problem step by step. (Everything is run on Linux.)
    import sys, locale

    s = "小甲"
    print(s)
    print(type(s))
    print(sys.getdefaultencoding())
    print(locale.getdefaultlocale())

    # Save the same string three times, each with a different encoding
    with open("utf1", "w", encoding="utf-8") as f:
        f.write(s)
    with open("gbk1", "w", encoding="gbk") as f:
        f.write(s)
    with open("jis1", "w", encoding="shift-jis") as f:
        f.write(s)
The code is simple; anyone who has learned Python should understand what it does.
Let's look at the "Code One" run result:
    小甲
    <class 'str'>
    utf-8
    ('en_US', 'UTF-8')
As you can imagine, the code prints "小甲" as is and then saves "小甲" to three files. (Shift-JIS is a Japanese encoding.)
Nothing surprising happens in this simple code; the interesting part comes later.
Here is what the two "utf-8" values in the output mean:
The first one, utf-8, is the system default encoding.
The second one, UTF-8, is the locale default encoding.
Now let's look at the contents of the three files utf1, gbk1, and jis1:
    utf1: 小甲
    gbk1: С???
    jis1: ???b
Why is the content of utf1 readable, with no encoding problem, while the contents of gbk1 and jis1 are garbled?
Because those two files were stored in an encoding other than UTF-8, but when I read them back I used UTF-8, the default encoding of the Linux operating system.
Written to disk in something other than UTF-8 and then read back as UTF-8: of course the content cannot be read correctly.
(To follow the rest, you need to understand what an encoding really does.)
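The mismatch described above can be reproduced in memory, without touching the disk. The following is a minimal sketch using the sample string 小甲: it encodes the string with the three codecs and then decodes every result as UTF-8, which is roughly what a UTF-8 terminal or editor does when it shows the files. The `errors="replace"` argument is my own addition, so that undecodable bytes show up as replacement characters instead of raising an exception.

```python
s = "小甲"

# Encode the same text with three different codecs, as Code One does on disk.
encoded = {name: s.encode(name) for name in ("utf-8", "gbk", "shift-jis")}

# Decode every byte string as UTF-8, the way a UTF-8 viewer would read the files.
viewed = {
    name: data.decode("utf-8", errors="replace")
    for name, data in encoded.items()
}

for name in encoded:
    print(name, encoded[name], repr(viewed[name]))
```

Only the bytes that were produced by the UTF-8 codec survive the trip; the other two come back as a mix of unrelated characters and replacement marks.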
"Code Two":
    # coding=gbk
    import sys, locale

    s = "小甲"
    print(s)
    print(type(s))
    print(sys.getdefaultencoding())
    print(locale.getdefaultlocale())

    with open("utf2", "w", encoding="utf-8") as f:
        f.write(s)
    with open("gbk2", "w", encoding="gbk") as f:
        f.write(s)
    with open("jis2", "w", encoding="shift-jis") as f:
        f.write(s)
The structure of the code is just as simple.
But please note: I added an encoding declaration at the top of the file.
Before looking at the output, try to guess the result yourself.
"Code Two" run result:
    灏忕敳
    <class 'str'>
    utf-8
    ('en_US', 'UTF-8')
    Traceback (most recent call last):
      File "2", line 15, in <module>
        f.write(s)
    UnicodeEncodeError: 'shift_jis' codec can't encode character '\u704f' in position 0: illegal multibyte sequence
This raises several questions:
1. The code clearly says s = "小甲", so why did it become "灏忕敳"?
2. Why did the Shift-JIS encoding fail? (Before, the worst case was garbled text, never an error. What happened internally?)
3. What does "coding=gbk" mean?
4. I clearly wrote the "coding=gbk" declaration, so why did neither the system default encoding nor the locale default encoding change? (What was the point of writing it?)
1. The declaration (questions 1 and 3) tells the Python 3 interpreter which encoding to use to "decode" the .py file when reading it. It only affects reading, so once you know which encoding your editor saves the source code in, write that same encoding in the declaration at the top of the file.
(In this example I edited the code with the Linux default encoding, UTF-8, but then asked Python to decode it as GBK. The two do not match, so the literal in s = "小甲" naturally comes out garbled.)
(Remember that character encoding involves two steps, "encoding" and "decoding", and the two must match one-to-one for text to be decoded correctly! We usually just say "encoding format", which is somewhat misleading.
In fact the other half is the "decoding format". Consciously distinguish "encoding" from "decoding", and do not confuse the two the way some articles on the internet do!)
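To see that the declaration only controls how the source bytes are decoded, one can ask the standard library which encoding it detects for a given source. This is a small sketch, not a look inside the interpreter itself: `tokenize.detect_encoding` is a real standard-library function, while the helper `declared_encoding` and the sample source bytes are names I made up for illustration.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding detected for a Python source given as bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


text = 's = "小甲"\n'
# Saved as UTF-8 by the editor, but declared as GBK at the top.
with_declaration = b"# coding=gbk\n" + text.encode("utf-8")
without_declaration = text.encode("utf-8")

print(declared_encoding(with_declaration))
print(declared_encoding(without_declaration))

# Decoding UTF-8 bytes with the declared GBK codec is what garbles the literal.
decoded_source = with_declaration.decode(declared_encoding(with_declaration))
print(decoded_source)
```

The bytes on disk are identical in both cases; only the codec chosen for decoding them differs.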
2. Given the explanation above, it should be clear that writing the declaration does not change the locale default encoding or the system default encoding (question 4).
(The locale default encoding depends only on the operating system: it is typically UTF-8 on Linux and GBK on Simplified Chinese Windows.)
(The system default encoding is a difference between Python 3 and Python 2: it is UTF-8 in Python 3 and ASCII in Python 2.)
3. So what do these two default encodings actually do?
Pay attention, this is the key point:
The system default encoding means:
When the Python 3 interpreter reads a .py file that has no encoding declaration at the top, it decodes the file as UTF-8 by default. Likewise, encode() uses UTF-8 when no argument is passed. (Do not confuse this with the "encoding" parameter of open() described next; the two are very easy to mix up!)
The locale default encoding means:
When a Python 3 program calls open() without passing the "encoding" parameter, the locale default encoding is used automatically. Yes, on a Chinese Windows system that means GBK!
(This troubled me for a long time. On Linux the default had always been UTF-8, so I never thought about it, but after switching to Windows I kept getting errors. So please note: on Linux you can get away with omitting the "encoding" parameter, but on Windows you must not forget it.)
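The two defaults can be inspected side by side. The sketch below is an illustration under my own assumptions: the temporary file name `demo.txt` is made up, and `locale.getpreferredencoding(False)` is used here as the value that `open()` falls back to in text mode when no `encoding` is given (this can differ if Python's UTF-8 mode is enabled).

```python
import locale
import os
import sys
import tempfile

s = "小甲"

# System default encoding: used by str.encode() when no codec is given.
print(sys.getdefaultencoding())
default_bytes = s.encode()

# Locale default encoding: the fallback of open() without encoding=.
print(locale.getpreferredencoding(False))

# Passing encoding= explicitly gives the same file on Linux and Windows.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(s)
with open(path, "rb") as f:
    raw = f.read()
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(raw, round_trip)
```

Writing and reading with an explicit `encoding="utf-8"` removes the dependence on the operating system that caused the errors described above.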
4. Now back to the question of the error (question 2):
Because the interpreter had already decoded this .py file as GBK, the variable s that was read in became the "灏忕敳" we see now! So what gets saved to the disk files at this point is actually the garbled "灏忕敳". Shift-JIS cannot represent this text (the error is raised on the very first character, "灏"), so the codec reports that it failed to encode the character at position 0.
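The failure can be reproduced without any files. This is a minimal sketch, assuming the garbled string is built the way the text describes (UTF-8 bytes decoded as GBK); the variable names are mine.

```python
# What the variable s holds in Code Two: UTF-8 bytes decoded as GBK.
garbled = "小甲".encode("utf-8").decode("gbk")

failed = False
failed_at = None
try:
    garbled.encode("shift-jis")
except UnicodeEncodeError as exc:
    failed = True
    failed_at = exc.start  # index of the first character that cannot be encoded
    print(exc)

print(failed, failed_at, hex(ord(garbled[0])))
```

Garbled text from a wrong decode is still valid text, so the problem only surfaces later, when some other codec is asked to encode characters it does not contain.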
Now let's look at the contents of the three files utf2, gbk2, and jis2:
    utf2: 灏忕敳
    gbk2: 小甲
    jis2:
(Is that the result you imagined?)
This raises two more questions:
1. I encoded with "utf-8" when saving and then decoded with the Linux default "utf-8" when reading, so why is utf2 garbled?
2. I encoded with "gbk" when saving and then decoded with the Linux default "utf-8" when reading. The encoding and decoding formats obviously do not match, so why is gbk2 displayed normally?
Explanation:
1. These two questions are really the same question. Careful readers probably already know where the problem lies, since I explained it above: by this point the variable s has already become "灏忕敳", so the text file utf2 naturally shows "灏忕敳".
2. And where do the three characters "灏忕敳" come from?
    Step 1: 小甲 (Unicode) --- encode with "utf-8" ---> e5b0 8fe7 94b2 (the bytes after UTF-8 encoding)
    Step 2: e5b0 8fe7 94b2 --- decode with "gbk" ---> "灏忕敳" (Unicode, garbled)
    Step 3: "灏忕敳" --- encode with "gbk" ---> e5b0 8fe7 94b2 (the reverse of step 2)
    Step 4: e5b0 8fe7 94b2 --- decode with "utf-8" ---> 小甲
I think the steps above are clear enough.
Steps 3 and 4 simply run the process in reverse, which brings back the normal "小甲". That is why gbk2 displays correctly.
Once you understand this "encoding" and "decoding" process, more than half of your encoding problems are solved!
"Code Three":
    # coding=shift-jis
    import sys, locale

    s = "小甲"
    print(s)
    print(type(s))
    print(sys.getdefaultencoding())
    print(locale.getdefaultlocale(), "\n")

    # Encode the garbled string back with the codec that garbled it
    a = s.encode("shift-jis")
    print(a)
    print(type(a))

    # Decode the recovered bytes with the encoding the file was really saved in
    b = a.decode("utf-8")
    print(b)
    print(type(b))

    print(a.decode("gbk"))

    with open("utf3", "w", encoding="utf-8") as f:
        f.write(s)
    with open("gbk3", "w", encoding="gbk") as f:
        f.write(s)
    with open("jis3", "w", encoding="shift-jis") as f:
        f.write(s)
The overall structure of the code is still the same, with a little extra code in the middle to make the explanation easier.
"Code Three" run result:
    蟆冗抜
    <class 'str'>
    utf-8
    ('en_US', 'UTF-8')

    b'\xe5\xb0\x8f\xe7\x94\xb2'
    <class 'bytes'>
    小甲
    <class 'str'>
    灏忕敳
As you can see, the variable s has become "蟆冗抜" (another kind of garbling, this time caused by decoding the source as Shift-JIS).
So I encoded "蟆冗抜" back with "shift-jis" and assigned the resulting bytes to the variable a, then decoded a with "utf-8" into b. Printing b shows the normal "小甲", which proves that my reasoning above is correct!
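The same round trip works for both wrong codecs used in this article. The sketch below wraps it in two small helpers; `misread` and `repair` are hypothetical names of my own, not library functions.

```python
def misread(text, saved_as, read_as):
    """Encode text with one codec and decode the bytes with another."""
    return text.encode(saved_as).decode(read_as)


def repair(garbled, read_as, saved_as):
    """Undo misread(): re-encode with the wrong codec, decode with the right one."""
    return garbled.encode(read_as).decode(saved_as)


original = "小甲"
for wrong in ("gbk", "shift-jis"):
    garbled = misread(original, "utf-8", wrong)
    print(wrong, repr(garbled), "->", repr(repair(garbled, wrong, "utf-8")))
```

This kind of repair only works when the wrong decode happened to succeed without losing bytes, as it does for this particular string; text that was decoded with replacement characters cannot be recovered this way.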
Now let's look at the contents of the three files utf3, gbk3, and jis3:
    utf3: 蟆冗抜
    gbk3: ???I
    jis3: 小甲
(Oops~~ heck, it's such a mess again.)
(Thank you for the "King of Exile")
That concludes the code demonstrations. My hands are sore from all this typing.
1. The encoding of every file is determined by the editor you use to save it! Text edited on Windows is sometimes garbled and sometimes fine when a browser displays it, because many text editors on Windows default to the same encoding as the operating system.
So before saving text, be sure to figure out whether you are using UTF-8 or GBK!
When you use Python's open() function, the memory of the process is interacting with the disk, and unless you pass "encoding", that interaction uses the operating system's default encoding (UTF-8 on Linux, GBK on Chinese Windows).
2. Anyone learning Python has probably heard that the default encoding of Python 3 is UTF-8. Others say that the default encoding of Python 3 is Unicode. Is anyone else as confused about the relationship between the two as I was as a beginner?
In fact, the two statements do not conflict. Inside the process, strings in memory are represented as Unicode. When the Python 3 interpreter reads a .py file from disk, it decodes it as UTF-8 by default. When the program runs code such as open() and write() that has to exchange text with the disk, the operating system's default encoding is used unless you specify otherwise.
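The difference between "Unicode in memory" and "an encoding on disk" shows up when comparing code points with encoded bytes. This is a small illustrative sketch; the variable names are mine.

```python
s = "小甲"

# In memory, a str is a sequence of Unicode code points, independent of any codec.
code_points = [hex(ord(ch)) for ch in s]

# Bytes only appear when the text is encoded, and their count depends on the codec.
sizes = {name: len(s.encode(name)) for name in ("utf-8", "gbk", "shift-jis")}

print(len(s), code_points)
print(sizes)
```

The string always has two characters; only its byte representation changes with the codec chosen at the moment it is written out.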