3.6 stringprep -- string preparation library for Internet domain names

Source: Internet
Author: User
Tags: control characters

This module provides access to the Unicode character tables defined by RFC 3454. When identifying things on the Internet, such as host names, it is often necessary to decide whether two identifiers are equal. Exactly how that comparison is carried out depends on the application domain, for example whether it is case-sensitive. It may also be necessary to restrict identifiers to those made up of printable characters.

The module only exposes the tables from RFC 3454. Representing all of them as dictionaries or lists would take a lot of space, so the module relies on Python's internal Unicode character database instead. The module's own source code is generated with the mkstringprep.py utility.

As a result, the tables are exposed as functions rather than as data structures. The functions are as follows:

stringprep.in_table_a1(code)

Determine whether code is in table A.1 (Unassigned code points in Unicode 3.2).

stringprep.in_table_b1(code)

Determine whether code is in table B.1 (Commonly mapped to nothing).

stringprep.map_table_b2(code)

Return the mapped value for code according to table B.2 (Mapping for case-folding used with NFKC).

stringprep.map_table_b3(code)

Return the mapped value for code according to table B.3 (Mapping for case-folding used with no normalization).
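The mapping functions take a single character and return a string, which may be longer than one character. A minimal sketch of the B.1 and B.2 steps applied to a whole string follows; the helper name fold_char is made up for this illustration and is not part of the module.

```python
import stringprep


def fold_char(ch):
    """Apply tables B.1 and B.2 to one character (hypothetical helper).

    Returns an empty string when the character is "mapped to nothing".
    """
    if stringprep.in_table_b1(ch):
        return ""
    return stringprep.map_table_b2(ch)


# "Straße" followed by a soft hyphen (U+00AD), which table B.1 removes.
text = "Stra\u00dfe\u00ad"
folded = "".join(fold_char(ch) for ch in text)
print(folded)  # strasse
```

Note that a single input character (here the German sharp s) can map to two output characters, so the result should always be joined as strings rather than treated as a character-for-character substitution.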

stringprep.in_table_c11(code)

Determine whether code is in table C.1.1 (ASCII space characters).

stringprep.in_table_c12(code)

Determine whether code is in table C.1.2 (Non-ASCII space characters).

stringprep.in_table_c11_c12(code)

Determine whether code is in table C.1 (Space characters, union of C.1.1 and C.1.2).

stringprep.in_table_c21(code)

Determine whether code is in table C.2.1 (ASCII control characters).

stringprep.in_table_c22(code)

Determine whether code is in table C.2.2 (Non-ASCII control characters).

stringprep.in_table_c21_c22(code)

Determine whether code is in table C.2 (Control characters, union of C.2.1 and C.2.2).
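The space and control tables can be combined to label a character. The sketch below is only an illustration; the function name classify and the label strings are assumptions, not anything the module defines.

```python
import stringprep


def classify(ch):
    """Label one character using tables C.1.x and C.2.x (hypothetical helper)."""
    if stringprep.in_table_c11(ch):
        return "ASCII space"
    if stringprep.in_table_c12(ch):
        return "non-ASCII space"
    if stringprep.in_table_c21(ch):
        return "ASCII control"
    if stringprep.in_table_c22(ch):
        return "non-ASCII control"
    return "other"


for ch in [" ", "\u00a0", "\n", "\u0085", "a"]:
    print(repr(ch), classify(ch))
```

The union functions in_table_c11_c12 and in_table_c21_c22 answer the same questions without distinguishing the ASCII and non-ASCII halves.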

stringprep.in_table_c3(code)

Determine whether code is in table C.3 (Private use).

stringprep.in_table_c4(code)

Determine whether code is in table C.4 (Non-character code points).

stringprep.in_table_c5(code)

Determine whether code is in table C.5 (Surrogate codes).

stringprep.in_table_c6(code)

Determine whether code is in table C.6 (Inappropriate for plain text).

stringprep.in_table_c7(code)

Determine whether code is in table C.7 (Inappropriate for canonical representation).

stringprep.in_table_c8(code)

Determine whether code is in table C.8 (Change display properties or are deprecated).

stringprep.in_table_c9(code)

Determine whether code is in table C.9 (Tagging characters).
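Because each table is a plain predicate, the checks can be collected in a dictionary to report why a character would be rejected. This is a rough sketch; the names PROHIBITED_CHECKS and prohibited_reasons are invented for the example.

```python
import stringprep

# Hypothetical lookup from a readable table name to its predicate.
PROHIBITED_CHECKS = {
    "C.3 private use": stringprep.in_table_c3,
    "C.4 non-character code points": stringprep.in_table_c4,
    "C.5 surrogate codes": stringprep.in_table_c5,
    "C.6 inappropriate for plain text": stringprep.in_table_c6,
    "C.7 inappropriate for canonical representation": stringprep.in_table_c7,
    "C.8 change display properties or deprecated": stringprep.in_table_c8,
    "C.9 tagging characters": stringprep.in_table_c9,
}


def prohibited_reasons(text):
    """Return (character, table name) pairs for every match in text."""
    found = []
    for ch in text:
        for name, check in PROHIBITED_CHECKS.items():
            if check(ch):
                found.append((ch, name))
    return found


# U+E000 is a private-use character; U+200E is the left-to-right mark.
print(prohibited_reasons("a\ue000b\u200e"))
```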

stringprep.in_table_d1(code)

Determine whether code is in table D.1 (Characters with bidirectional property "R" or "AL").

stringprep.in_table_d2(code)

Determine whether code is in table D.2 (Characters with bidirectional property "L").
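The two D tables are normally used together to enforce the bidirectional rules of RFC 3454. The following is a small sketch of that check on its own; the function name bidi_ok is an assumption made for illustration.

```python
import stringprep


def bidi_ok(label):
    """Return True if label passes the RFC 3454 bidi checks (hypothetical helper)."""
    randal = [stringprep.in_table_d1(ch) for ch in label]
    if not any(randal):
        # No right-to-left characters, so there is nothing to check.
        return True
    # A string with R/AL characters must not contain any L character.
    if any(stringprep.in_table_d2(ch) for ch in label):
        return False
    # The first and the last character must both be R/AL characters.
    return randal[0] and randal[-1]


print(bidi_ok("abc"))            # True: left-to-right only
print(bidi_ok("\u05d0\u05d1"))   # True: Hebrew letters only
print(bidi_ok("\u05d0a"))        # False: mixes R and L characters
```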

Example:

    import stringprep
    # The RFC 3454 tables are based on Unicode 3.2, so use that database version.
    from unicodedata import ucd_3_2_0 as unicodedata


    def nameprep(label):
        # Map
        newlabel = []
        for c in label:
            if stringprep.in_table_b1(c):
                # Map to nothing
                continue
            newlabel.append(stringprep.map_table_b2(c))
        label = "".join(newlabel)

        # Normalize
        label = unicodedata.normalize("NFKC", label)

        # Prohibit
        for c in label:
            if (stringprep.in_table_c12(c) or
                    stringprep.in_table_c22(c) or
                    stringprep.in_table_c3(c) or
                    stringprep.in_table_c4(c) or
                    stringprep.in_table_c5(c) or
                    stringprep.in_table_c6(c) or
                    stringprep.in_table_c7(c) or
                    stringprep.in_table_c8(c) or
                    stringprep.in_table_c9(c)):
                raise UnicodeError("Invalid character %r" % c)

        # Check bidi
        RandAL = [stringprep.in_table_d1(c) for c in label]
        if any(RandAL):
            # There is a RandAL char in the string. Must perform further
            # tests:
            # 1) The characters in section 5.8 MUST be prohibited.
            #    This is table C.8, which was already checked.
            # 2) If a string contains any RandALCat character, the string
            #    MUST NOT contain any LCat character.
            if any(stringprep.in_table_d2(c) for c in label):
                raise UnicodeError("Violation of BIDI requirement 2")
            # 3) If a string contains any RandALCat character, a
            #    RandALCat character MUST be the first character of the
            #    string, and a RandALCat character MUST be the last
            #    character of the string.
            if not RandAL[0] or not RandAL[-1]:
                raise UnicodeError("Violation of BIDI requirement 3")

        return label
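A function of the same name performing these steps is also available in the standard library as encodings.idna.nameprep, so in practice it does not need to be copied by hand. The sketch below uses it to compare two labels; the helper same_label is invented for this example.

```python
from encodings.idna import nameprep


def same_label(a, b):
    """Compare two labels after nameprep (hypothetical helper)."""
    return nameprep(a) == nameprep(b)


print(same_label("EXAMPLE", "example"))        # True
print(same_label("Stra\u00dfe", "strasse"))    # True

try:
    nameprep("a\ue000")  # U+E000 is a private-use character (table C.3)
except UnicodeError as exc:
    print("rejected:", exc)
```

Comparing the prepared forms, rather than the raw strings, is what makes the comparison independent of case and of characters that are mapped to nothing.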


Copyright notice: This is an original article by the blog author and may not be reproduced without the author's permission.
