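This library provides access to the Unicode character tables defined in RFC 3454 ("Preparation of Internationalized Strings", or stringprep). When identifying things on the Internet, such as domain names and host names, applications often need to compare two identifiers for equality. Exactly how that comparison works depends on the application domain, for example whether it is case-insensitive. An application may also need to restrict which characters an identifier may contain, for example to printable characters only. RFC 3454 defines a set of tables that protocols combine into profiles; nameprep, used for internationalized domain names, is one such profile.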
This library only wraps the RFC 3454 tables. Representing them as dictionaries or lists would make them very large, so the module looks characters up in the Unicode character database instead, which keeps access simple. The module source itself was generated with the mkstringprep.py tool.
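As a result, the module exposes the tables as functions rather than as data structures. For each set-like table it provides a membership test that returns True if the character belongs to the table, and for each mapping table it returns the mapped value. In every case, code is a single character (a string of length 1). The functions are: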
stringprep.in_table_a1(code)
Check whether code is in table A.1 (unassigned code points in Unicode 3.2).
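Because the tables are pinned to Unicode 3.2, a character added in a later Unicode version still counts as unassigned here. A minimal sketch, assuming we want to list the unassigned code points in a string (find_unassigned is a hypothetical helper, not part of the module). U+0221 was added in Unicode 4.0, so the running Python's database knows it while stringprep does not:

```python
import stringprep
import unicodedata


def find_unassigned(s):
    """Hypothetical helper: return (index, char) pairs for every
    character that table A.1 (Unicode 3.2) considers unassigned."""
    return [(i, ch) for i, ch in enumerate(s) if stringprep.in_table_a1(ch)]


print(unicodedata.category("\u0221"))            # Ll (current Unicode database)
print(unicodedata.ucd_3_2_0.category("\u0221"))  # Cn (Unicode 3.2 database)
print(find_unassigned("a\u0221b"))               # [(1, 'ȡ')]
```

RFC 3454 lets each profile decide whether unassigned code points are allowed, so a helper like this only reports them and leaves the policy to the caller.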
stringprep.in_table_b1(code)
Check whether code is in table B.1 (commonly mapped to nothing).
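Table B.1 drives the "map to nothing" part of the stringprep mapping step: characters such as SOFT HYPHEN or ZERO WIDTH SPACE are simply dropped. A minimal sketch; map_to_nothing is a hypothetical helper name:

```python
import stringprep


def map_to_nothing(s):
    """Drop every character listed in table B.1 (soft hyphen,
    zero-width space, zero-width joiner, variation selectors, ...)."""
    return "".join(ch for ch in s if not stringprep.in_table_b1(ch))


# U+00AD is SOFT HYPHEN, U+200B is ZERO WIDTH SPACE
print(map_to_nothing("co\u00adop\u200b"))  # coop
```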
stringprep.map_table_b2(code)
Return the mapped value for code according to table B.2 (mapping for case-folding used with NFKC).
stringprep.map_table_b3(code)
Return the mapped value for code according to table B.3 (mapping for case-folding used with no normalization).
stringprep.in_table_c11(code)
Check whether code is in table C.1.1 (ASCII space characters).
stringprep.in_table_c12(code)
Check whether code is in table C.1.2 (non-ASCII space characters).
stringprep.in_table_c11_c12(code)
Check whether code is in table C.1 (space characters, union of C.1.1 and C.1.2).
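Profiles differ in which space tables they prohibit; nameprep, for example, prohibits only C.1.2. A sketch that classifies a character against the C.1 tables (space_kind is a hypothetical helper) and checks that in_table_c11_c12 behaves as the union of the other two:

```python
import stringprep


def space_kind(ch):
    """Return the C.1 sub-table containing ch, or None."""
    if stringprep.in_table_c11(ch):
        return "C.1.1"
    if stringprep.in_table_c12(ch):
        return "C.1.2"
    return None


# ASCII space, NO-BREAK SPACE, IDEOGRAPHIC SPACE, a letter
samples = [" ", "\u00a0", "\u3000", "x"]
print([space_kind(ch) for ch in samples])  # ['C.1.1', 'C.1.2', 'C.1.2', None]

# in_table_c11_c12 agrees with "C.1.1 or C.1.2" on these samples
print(all(stringprep.in_table_c11_c12(ch) == (space_kind(ch) is not None)
          for ch in samples))              # True
```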
stringprep.in_table_c21(code)
Check whether code is in table C.2.1 (ASCII control characters).
stringprep.in_table_c22(code)
Check whether code is in table C.2.2 (non-ASCII control characters).
stringprep.in_table_c21_c22(code)
Check whether code is in table C.2 (control characters, union of C.2.1 and C.2.2).
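Table C.2.2 is broader than the Unicode "Cc" category: besides C1 controls it also lists some format characters, such as U+200D ZERO WIDTH JOINER. A sketch with a hypothetical control_kind helper:

```python
import stringprep
import unicodedata


def control_kind(ch):
    """Return the C.2 sub-table containing ch, or None."""
    if stringprep.in_table_c21(ch):
        return "C.2.1"
    if stringprep.in_table_c22(ch):
        return "C.2.2"
    return None


# UNIT SEPARATOR, NEXT LINE, ZERO WIDTH JOINER, a letter
for ch in ["\x1f", "\x85", "\u200d", "a"]:
    print(hex(ord(ch)), unicodedata.category(ch), control_kind(ch))
# 0x1f Cc C.2.1
# 0x85 Cc C.2.2
# 0x200d Cf C.2.2
# 0x61 Ll None
```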
stringprep.in_table_c3(code)
Check whether code is in table C.3 (private use).
stringprep.in_table_c4(code)
Check whether code is in table C.4 (non-character code points).
stringprep.in_table_c5(code)
Check whether code is in table C.5 (surrogate codes).
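Table C.4 covers the 66 noncharacter code points: U+FDD0 to U+FDEF plus the last two code points of each of the 17 planes. Counting them with in_table_c4 is a quick sanity check on that reading; the brute-force scan over every code point is for illustration only and takes a noticeable fraction of a second:

```python
import stringprep

# Scan every code point (about 1.1 million) and keep those in table C.4.
nonchars = [cp for cp in range(0x110000) if stringprep.in_table_c4(chr(cp))]

print(len(nonchars))                        # 66
print(hex(nonchars[0]), hex(nonchars[-1]))  # 0xfdd0 0x10ffff

# Private use (C.3) and surrogates (C.5) follow the Unicode 3.2 categories.
print(stringprep.in_table_c3("\ue000"))     # True
print(stringprep.in_table_c5("\ud800"))     # True
```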
stringprep.in_table_c6(code)
Check whether code is in table C.6 (inappropriate for plain text).
stringprep.in_table_c7(code)
Check whether code is in table C.7 (inappropriate for canonical representation).
stringprep.in_table_c8(code)
Check whether code is in table C.8 (change display properties or are deprecated).
stringprep.in_table_c9(code)
Check whether code is in table C.9 (tagging characters).
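Because each table is exposed as a separate function, a profile's prohibition step is just a collection of these functions. A sketch of a lookup that reports which of the C.3 to C.9 tables a character falls into (PROHIBITION_TABLES and prohibited_tables are hypothetical names):

```python
import stringprep

# Hypothetical lookup table: RFC 3454 table name -> membership test.
PROHIBITION_TABLES = {
    "C.3": stringprep.in_table_c3,
    "C.4": stringprep.in_table_c4,
    "C.5": stringprep.in_table_c5,
    "C.6": stringprep.in_table_c6,
    "C.7": stringprep.in_table_c7,
    "C.8": stringprep.in_table_c8,
    "C.9": stringprep.in_table_c9,
}


def prohibited_tables(ch):
    """Return the names of the C.3-C.9 tables that contain ch."""
    return [name for name, test in PROHIBITION_TABLES.items() if test(ch)]


print(prohibited_tables("\ufffd"))      # ['C.6'] REPLACEMENT CHARACTER
print(prohibited_tables("\u2ff0"))      # ['C.7'] ideographic description character
print(prohibited_tables("\u200e"))      # ['C.8'] LEFT-TO-RIGHT MARK
print(prohibited_tables("\U000e0001"))  # ['C.9'] LANGUAGE TAG
print(prohibited_tables("a"))           # []
```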
stringprep.in_table_d1(code)
Check whether code is in table D.1 (characters with bidirectional property "R" or "AL").
stringprep.in_table_d2(code)
Check whether code is in table D.2 (characters with bidirectional property "L").
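The bidirectional checks in section 6 of RFC 3454 only need two groups: RandALCat characters (table D.1) and LCat characters (table D.2). A sketch with a hypothetical bidi_group helper:

```python
import stringprep


def bidi_group(ch):
    """Group a character the way RFC 3454 section 6 does."""
    if stringprep.in_table_d1(ch):
        return "RandAL"   # bidirectional property R or AL
    if stringprep.in_table_d2(ch):
        return "L"        # bidirectional property L
    return "neither"


# Latin "A" (L), Hebrew alef (R), Arabic alef (AL), digit one (EN)
print([bidi_group(ch) for ch in ["A", "\u05d0", "\u0627", "1"]])
# ['L', 'RandAL', 'RandAL', 'neither']
```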
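Example: an implementation of the nameprep profile (RFC 3491) for internationalized domain names, built from these functions: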
```python
import stringprep
# nameprep is defined against Unicode 3.2, so normalize with that version
from unicodedata import ucd_3_2_0 as unicodedata


def nameprep(label):
    # Map
    newlabel = []
    for c in label:
        if stringprep.in_table_b1(c):
            # Map to nothing
            continue
        newlabel.append(stringprep.map_table_b2(c))
    label = "".join(newlabel)

    # Normalize
    label = unicodedata.normalize("NFKC", label)

    # Prohibit
    for c in label:
        if (stringprep.in_table_c12(c) or stringprep.in_table_c22(c) or
                stringprep.in_table_c3(c) or stringprep.in_table_c4(c) or
                stringprep.in_table_c5(c) or stringprep.in_table_c6(c) or
                stringprep.in_table_c7(c) or stringprep.in_table_c8(c) or
                stringprep.in_table_c9(c)):
            raise UnicodeError("Invalid character %r" % c)

    # Check bidi
    RandAL = [stringprep.in_table_d1(c) for c in label]
    if any(RandAL):
        # There is a RandAL char in the string. Must perform further
        # tests:
        # 1) The characters in section 5.8 MUST be prohibited.
        # This is table C.8, which was already checked.
        # 2) If a string contains any RandALCat character, the string
        # MUST NOT contain any LCat character.
        if any(stringprep.in_table_d2(c) for c in label):
            raise UnicodeError("Violation of BIDI requirement 2")
        # 3) If a string contains any RandALCat character, a
        # RandALCat character MUST be the first character of the
        # string, and a RandALCat character MUST be the last
        # character of the string.
        if not RandAL[0] or not RandAL[-1]:
            raise UnicodeError("Violation of BIDI requirement 3")

    return label
```
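Python also ships nameprep as part of its built-in "idna" codec: the encodings.idna module implements RFC 3490 (IDNA 2003) on top of stringprep and exposes encodings.idna.nameprep. A short usage sketch; the domain names are made up for illustration:

```python
import encodings.idna

# nameprep on its own: case folding maps "ß" to "ss"
print(encodings.idna.nameprep("Straße"))        # strasse

# The "idna" codec applies nameprep and then Punycode to non-ASCII labels,
# so labels that differ only in case encode to the same bytes.
print("Bücher.example".encode("idna"))          # b'xn--bcher-kva.example'
print("BÜCHER.example".encode("idna"))          # b'xn--bcher-kva.example'
print(b"xn--bcher-kva.example".decode("idna"))  # bücher.example
```

This is the "compare for equality" use case from the introduction: two spellings that nameprep treats as equivalent produce the same encoded domain name.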
Copyright notice: This is an original article by the blog author and may not be reproduced without the author's permission.
3.6 stringprep: string preparation for Internet domain names