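# -*- coding: utf-8 -*-
# Python 2 example
str8 = "中国 和 韩国 的区别"  # "the difference between China and Korea"
# a = str8.find("Python")
# print a
b = str8.find("和")  # byte offset within the UTF-8 encoded str
print b
word = str8.split(" ")  # Python 3 and Spark can split the Chinese directly right here
print word
for i in word:  # Python 2.x needs this loop to print readable Chinese
    print i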
# This part is about encoding. Uncommented, it produces the output
# after the dashed line in the results.
# print "-" * 50
# data = str8.decode("utf-8").encode("gb2312")
# print type(data)
# data2 = data.decode("gb2312")
# print type(data2)
# print data2.split(u" ")
# data3 = data2.encode("utf-8").split(" ")
# print data3
# for i in data3:
#     print i
Results:
7
['\xe4\xb8\xad\xe5\x9b\xbd', '\xe5\x92\x8c', '\xe9\x9f\xa9\xe5\x9b\xbd', '\xe7\x9a\x84\xe5\x8c\xba\xe5\x88\xab']
中国
和
韩国
的区别
--------------------------------------------------
<type 'str'>
<type 'unicode'>
[u'\u4e2d\u56fd', u'\u548c', u'\u97e9\u56fd', u'\u7684\u533a\u522b']
['\xe4\xb8\xad\xe5\x9b\xbd', '\xe5\x92\x8c', '\xe9\x9f\xa9\xe5\x9b\xbd', '\xe7\x9a\x84\xe5\x8c\xba\xe5\x88\xab']
中国
和
韩国
的区别
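
For comparison, here is a minimal Python 3 sketch of the same steps. It is not part of the original script. It assumes the sample string is the one whose UTF-8 bytes appear in the results above, and the variable names (text, char_index, byte_index, gb_bytes, round_trip) are made up for illustration. In Python 3 a str is already Unicode, so split works on the Chinese text directly and find counts characters rather than bytes.

```python
# Python 3 sketch: str is Unicode, so no decode step is needed before split().
text = "中国 和 韩国 的区别"  # sample string whose bytes appear in the results above

# find() on a str counts characters; on the UTF-8 bytes it counts bytes.
char_index = text.find("和")
byte_index = text.encode("utf-8").find("和".encode("utf-8"))

# Splitting on a space yields readable Chinese tokens directly.
words = text.split(" ")

# Round trip through GB2312, mirroring the commented-out encoding section.
gb_bytes = text.encode("gb2312")           # bytes object
round_trip = gb_bytes.decode("gb2312").split(" ")

print(char_index, byte_index)
print(words)
for word in words:
    print(word)
print(type(gb_bytes).__name__, round_trip == words)
```

The byte offset matches the 7 printed by the Python 2 run, because a Python 2 str holds the raw UTF-8 bytes and each of the two characters before the first space takes three bytes.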