Problem: During transmission, the receiving party may not know which encoding the data was sent in, or both sides may simply have forgotten which encoding was used.
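At the encoding level: in UTF-8 a common Chinese character takes 3 bytes, while in GBK it takes 2 bytes. Each character corresponds to a number, and that number is stored as a binary string. If a 10-byte piece of data consists entirely of Chinese characters, it cannot be UTF-8, because 10 is not a multiple of 3; if it still decodes to Chinese, then of these two encodings it can only be GBK. A third-party toolkit can automatically detect which encoding a piece of text uses.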
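The byte counts above can be checked with a short sketch that uses only the standard library. The sample strings are chosen purely for illustration and are not from the original text.

```python
# Compare the byte length of the same Chinese text under two encodings.
text = "中文"  # two Chinese characters (illustrative sample)

utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

print(len(utf8_bytes))  # three bytes per character in UTF-8
print(len(gbk_bytes))   # two bytes per character in GBK

# Five Chinese characters in GBK occupy exactly ten bytes.
ten_bytes = "中文编码表".encode("gbk")
print(len(ten_bytes))

# Those ten bytes are not valid UTF-8, so decoding them as UTF-8 fails.
try:
    ten_bytes.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False
print(is_valid_utf8)
```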
Question: What if you don't know the encoding of the file you are about to work with?
# chardet is a third-party toolkit and must be installed before use
import chardet

# Read the file as raw bytes; chardet works on bytes, not on str
with open('log', mode='rb') as f:
    data = f.read()

result = chardet.detect(data)
print(result)
2. Install chardet first, for example with: pip install chardet
The file used above ('log') is one you create and write yourself (you can write it in GBK or some other encoding, then use the code above to detect it).
The result of the call is a dictionary; run the code yourself to see the specific values.
3. If the file above is reported as GB2312, we can decode the bytes with that encoding to get a Unicode string:
data.decode("gb2312")
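A minimal sketch of the decoding step, assuming the bytes really are GB2312. The sample bytes here are built in memory as a stand-in for the contents of a file read in binary mode; they are hypothetical and not from the original.

```python
# Stand-in for bytes read from a file opened with mode='rb' (hypothetical sample).
data = "编码检测".encode("gb2312")

# Decode the raw bytes into a str (Unicode) using the detected encoding.
text = data.decode("gb2312")

# Re-encode as UTF-8, for example before writing the text back to disk.
utf8_data = text.encode("utf-8")

print(text)
print(len(data), len(utf8_data))
```

If the detected encoding turns out to be wrong, decode() may raise UnicodeDecodeError, so it is worth treating the detection result as a guess rather than a guarantee.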
4. The file here is relatively small. What should we do if the file is relatively large?
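One possible approach, sketched here rather than taken from the original, is to feed the file to chardet's UniversalDetector in chunks so that the whole file never has to be loaded at once. The function name detect_file_encoding and the chunk_size parameter are made up for this illustration.

```python
from chardet.universaldetector import UniversalDetector


def detect_file_encoding(path, chunk_size=64 * 1024):
    """Feed a file to chardet in chunks and return its result dictionary.

    detect_file_encoding and chunk_size are illustrative names, not chardet API.
    """
    detector = UniversalDetector()
    with open(path, mode="rb") as f:
        # Read fixed-size chunks so the whole file never sits in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            detector.feed(chunk)
            if detector.done:  # chardet has seen enough to decide; stop early
                break
    detector.close()  # finalize the result
    return detector.result
```

Calling detect_file_encoding('log') should return the same kind of dictionary as chardet.detect, and detection can stop before the end of the file once the detector reports that it is done.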
Python file operations, part two