Unicode ranges and the languages they represent
Unicode is a universal character set. Its code space runs from U+0000 to U+10FFFF (1,114,112 code points); the Basic Multilingual Plane alone holds 65,536 of them. A computer stores Unicode text in an encoding such as UTF-8 or UTF-16, which matters for every character outside the ASCII table. Unifying the world's character sets took a lot of effort, and some incompatibilities between encodings remain today, but a short script is enough to understand the basics.
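To make the difference between a character's code point and its stored encoding concrete, here is a minimal Python 3 sketch. The sample character 中 and the variable names are only illustrative choices, not part of the original script.

```python
# A code point is an abstract number; an encoding turns it into bytes.
ch = '中'                               # sample character, chosen for illustration
code_point = ord(ch)                    # the character's code point as an integer
utf8_bytes = ch.encode('utf-8')         # three bytes in UTF-8
utf16_bytes = ch.encode('utf-16-be')    # two bytes in big-endian UTF-16

print(hex(code_point))      # 0x4e2d
print(utf8_bytes.hex())     # e4b8ad
print(utf16_bytes.hex())    # 4e2d

# ASCII characters keep their single-byte value in UTF-8.
print('A'.encode('utf-8'))  # b'A'
```

The same code point produces different byte sequences depending on the encoding, which is why text has to be decoded with the encoding it was written in.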
For the Unicode ranges that correspond to each language, see:
http://www.cnblogs.com/chenwenbiao/archive/2011/08/17/2142718.html
The range of Chinese characters (shared with Japanese and Korean):
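A range like this can be used to test whether a character is a Chinese ideograph. The sketch below assumes the CJK Unified Ideographs block, U+4E00 to U+9FFF; the script in this post stops slightly earlier, at 0x9FBF. The names is_cjk_ideograph and count_cjk are hypothetical helpers introduced for this example.

```python
# Assumed bounds: the CJK Unified Ideographs block, U+4E00..U+9FFF.
CJK_START = 0x4E00
CJK_END = 0x9FFF

def is_cjk_ideograph(ch):
    """Return True if the single character ch lies in the assumed CJK block."""
    return CJK_START <= ord(ch) <= CJK_END

def count_cjk(text):
    """Count the characters of text that fall inside the assumed CJK block."""
    return sum(1 for ch in text if is_cjk_ideograph(ch))

print(is_cjk_ideograph('中'))     # True
print(is_cjk_ideograph('A'))      # False
print(is_cjk_ideograph('あ'))     # False: hiragana lives in a separate block
print(count_cjk('Unicode 编码'))  # 2
```

Note that this block covers only the ideographs shared by Chinese, Japanese and Korean. Japanese kana and Korean Hangul have their own ranges, so a fuller check would need more than one interval.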
Generating all Unicode characters with Python
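    # -*- coding: utf-8 -*-
    # Python 3 version: chr() replaces the old '\uXXXX'.decode('unicode-escape').

    def print_unicode(start, end):
        """Write every code point from start to end (inclusive) to unicode_set.txt."""
        # errors='ignore' mirrors the original encode('utf-8', 'ignore');
        # newline='' keeps the explicit '\r\n' from being translated again.
        with open('unicode_set.txt', 'w', encoding='utf-8',
                  errors='ignore', newline='') as f:
            for code in range(start, end + 1):
                try:
                    hex_str = '%04x' % code    # left-pad with zeros to four hex digits
                    index = code - start + 1   # 1-based position within the range
                    f.write(str(index) + '\t' + '0x' + hex_str + '\t' + chr(code))
                    f.write('\r\n')
                except Exception as e:
                    # Report the failure and the code point, then carry on.
                    print(e)
                    print(code)

    # print_unicode(0x4e00, 0x9fbf)
    print_unicode(0x0, 0x9fbf)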
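Generated results
Chinese characters
As you can see, some of the characters cannot be displayed. This typically happens when a code point is a control character, is unassigned, or has no glyph in the font being used.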
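One possible way to investigate why a particular entry does not display is to look up its general category with the standard unicodedata module. This is a sketch only: the helper name describe and the sample code points are chosen for illustration, and a missing font glyph cannot be detected this way.

```python
import unicodedata

def describe(code):
    """Return (general category, name) for a code point.

    Code points without a character name get the placeholder '<no name>'.
    """
    ch = chr(code)
    return unicodedata.category(ch), unicodedata.name(ch, '<no name>')

# Cc = control character, Cn = unassigned or noncharacter,
# Lu = uppercase letter, Lo = other letter (includes CJK ideographs).
for code in (0x0000, 0x0041, 0x4E2D, 0xFFFF):
    category, name = describe(code)
    print('U+%04X' % code, category, name)
```

Entries whose category is Cc or Cn have nothing visible to draw, so filtering them out of the generated file would leave only characters that a suitable font can render.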