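When processing text in Python, you often need to convert it from one character set to another.
Ideally, a single function would convert the text to the desired character set without the caller having to know what its original character set is. Each method below does this by detecting the source encoding with a library, then decoding the bytes and re-encoding them in the target character set.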
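If installing a detection library is not an option, a crude stdlib-only fallback is to try a short list of candidate encodings in order and keep the first one that decodes without error. This is a different and much weaker technique than the statistical detection used by the libraries below. The function name `convert_by_trial` and the default candidate list are hypothetical choices for illustration.

```python
def convert_by_trial(data, new_coding='UTF-8',
                     candidates=('utf-8', 'gb18030', 'latin-1')):
    """Decode `data` (bytes) with the first candidate encoding that succeeds,
    then re-encode the text as `new_coding`. Returns (new_bytes, encoding_used).

    Order matters: strict codecs such as UTF-8 come first, and latin-1 goes
    last because it accepts any byte sequence (so it never fails).
    """
    for enc in candidates:
        try:
            text = data.decode(enc)  # strict decoding raises on invalid bytes
        except UnicodeDecodeError:
            continue
        return text.encode(new_coding), enc
    raise ValueError('none of the candidate encodings could decode the data')
```

For example, `convert_by_trial('中文'.encode('gbk'))` fails the UTF-8 attempt and succeeds with `gb18030`, a superset of GBK.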
Method One:
import chardet

def convert_encoding(data, new_coding='UTF-8'):
    # Convert bytes from any (detected) character set to new_coding
    encoding = chardet.detect(data)['encoding']  # None if detection fails
    if encoding and new_coding.upper() != encoding.upper():
        data = data.decode(encoding).encode(new_coding)
    return data
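The `upper()` comparison treats aliases of the same codec, such as `utf8` and `UTF-8` or `latin-1` and `ISO-8859-1`, as different character sets, which triggers a needless decode/encode round trip. A minimal sketch that compares Python's canonical codec names via the standard library's `codecs.lookup` instead; the helper name `same_encoding` is hypothetical.

```python
import codecs

def same_encoding(a, b):
    """Return True if two encoding names refer to the same Python codec."""
    if a is None or b is None:
        return False
    try:
        # codecs.lookup() resolves aliases to one canonical CodecInfo.name
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        # Unknown to Python: fall back to a case-insensitive comparison
        return a.upper() == b.upper()
```

In the functions here, `if not same_encoding(encoding, new_coding):` could stand in for the `upper()` test.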
Method Two:
import icu

def convert_encoding2(data, new_coding='UTF-8'):
    # ICU (via PyICU) guesses the charset; getName() returns its ICU name, e.g. 'UTF-8'
    encoding = icu.CharsetDetector(data).detect().getName()
    if new_coding.upper() != encoding.upper():
        data = data.decode(encoding).encode(new_coding)
    return data
Method Three:
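import cchardet

def convert_encoding3(data, new_coding='UTF-8'):
    # cchardet offers the same detect() interface as chardet, backed by uchardet
    encoding = cchardet.detect(data)['encoding']  # None if detection fails
    if encoding and new_coding.upper() != encoding.upper():
        data = data.decode(encoding).encode(new_coding)
    return data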
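All three methods decode strictly, so a wrong guess or a single invalid byte raises `UnicodeDecodeError`. The second positional argument of `bytes.decode` (and of `str.encode`) is the name of an error handler such as `'strict'`, `'replace'` or `'ignore'`. A sketch of the shared decode/encode step with that handler exposed; `reencode` is a hypothetical helper, and the caller is assumed to pass in the detected encoding.

```python
def reencode(data, encoding, new_coding='UTF-8', errors='strict'):
    """Decode `data` (bytes) from `encoding` and re-encode it as `new_coding`.

    `errors` is passed to both decode() and encode(): 'replace' substitutes
    U+FFFD (or '?') for bad characters, 'ignore' drops them, and 'strict'
    raises UnicodeDecodeError / UnicodeEncodeError.
    """
    return data.decode(encoding, errors).encode(new_coding, errors)
```

For example, `reencode(b'abc\xff', 'utf-8', errors='replace')` keeps `abc` and replaces the invalid byte instead of raising.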
How to use:
Here we use Method One:
# Convert to UTF-8
utf8_data = convert_encoding(data, 'utf-8')
# Convert to GBK
gbk_data = convert_encoding(data, 'gbk')
# Convert to GB2312
gb2312_data = convert_encoding(data, 'gb2312')
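GBK is a superset of GB2312, so the two targets in the usage example are not interchangeable. Text that fits in GB2312 encodes to the same bytes under both codecs, but characters added in GBK, such as 镕 (U+9555), cannot be encoded as GB2312 at all. A small stdlib check of that difference; the helper `to_encoding` and the sample strings are illustrative assumptions.

```python
def to_encoding(text, coding):
    """Encode `text` (a str) as `coding`; return None if some character
    cannot be represented in that character set."""
    try:
        return text.encode(coding)
    except UnicodeEncodeError:
        return None

common = '中文'    # characters that are part of GB2312
rare = '朱镕基'    # 镕 (U+9555) is in GBK but not in GB2312

# For GB2312 text the two codecs produce identical bytes...
print(to_encoding(common, 'gb2312') == to_encoding(common, 'gbk'))
# ...but only GBK can encode the rarer character
print(to_encoding(rare, 'gb2312'), to_encoding(rare, 'gbk'))
```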