Tags: milang, python, domain names, Internet, unicode
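This module provides access to the Unicode character tables defined in RFC 3454. When identifying things on the Internet, such as host names, we often need to compare two identifiers for equality. Exactly how that comparison is carried out can depend on the application domain, for example whether it should be case-insensitive. It may also be necessary to restrict identifiers so that they consist only of printable characters.
The module wraps access to the RFC 3454 character tables. Representing all of them as dictionaries or lists would take a lot of space, so the module stores them in a form similar to the Unicode database and relies on that database internally, which keeps lookups convenient. The module source itself is generated with the mkstringprep.py utility.
As a result, the module exposes the tables as functions rather than as data structures. RFC 3454 contains two kinds of tables, sets and mappings. For a set, the in_table_* function reports whether a character belongs to the table; for a mapping, the map_table_* function returns the value associated with the character. The functions are: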
stringprep.in_table_a1(code)
Determine whether code is in table A.1 (Unassigned code points in Unicode 3.2).
stringprep.in_table_b1(code)
Determine whether code is in table B.1 (Commonly mapped to nothing).
stringprep.map_table_b2(code)
Return the mapped value for code according to table B.2 (Mapping for case-folding used with NFKC).
stringprep.map_table_b3(code)
Return the mapped value for code according to table B.3 (Mapping for case-folding used with no normalization).
stringprep.in_table_c11(code)
Determine whether code is in table C.1.1 (ASCII space characters).
stringprep.in_table_c12(code)
Determine whether code is in table C.1.2 (Non-ASCII space characters).
stringprep.in_table_c11_c12(code)
Determine whether code is in table C.1 (Space characters, union of C.1.1 and C.1.2).
stringprep.in_table_c21(code)
Determine whether code is in table C.2.1 (ASCII control characters).
stringprep.in_table_c22(code)
Determine whether code is in table C.2.2 (Non-ASCII control characters).
stringprep.in_table_c21_c22(code)
Determine whether code is in table C.2 (Control characters, union of C.2.1 and C.2.2).
stringprep.in_table_c3(code)
Determine whether code is in table C.3 (Private use).
stringprep.in_table_c4(code)
Determine whether code is in table C.4 (Non-character code points).
stringprep.in_table_c5(code)
Determine whether code is in table C.5 (Surrogate codes).
stringprep.in_table_c6(code)
Determine whether code is in table C.6 (Inappropriate for plain text).
stringprep.in_table_c7(code)
Determine whether code is in table C.7 (Inappropriate for canonical representation).
stringprep.in_table_c8(code)
Determine whether code is in table C.8 (Change display properties or are deprecated).
stringprep.in_table_c9(code)
Determine whether code is in table C.9 (Tagging characters).
stringprep.in_table_d1(code)
Determine whether code is in table D.1 (Characters with bidirectional property “R” or “AL”).
stringprep.in_table_d2(code)
Determine whether code is in table D.2 (Characters with bidirectional property “L”).
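The difference between the set-style and mapping-style functions listed above can be seen in a short sketch. It assumes Python 3, where each function takes a one-character string; the sample characters and variable names are chosen only for illustration.

```python
import stringprep

# Set-style tables: the function answers "is this character in the table?"
soft_hyphen = "\u00ad"      # listed in table B.1 (commonly mapped to nothing)
no_break_space = "\u00a0"   # a non-ASCII space character (table C.1.2)
hebrew_alef = "\u05d0"      # a character with bidirectional property "R"

print(stringprep.in_table_b1(soft_hyphen))
print(stringprep.in_table_c11(" "))
print(stringprep.in_table_c12(no_break_space))
print(stringprep.in_table_d1(hebrew_alef))
print(stringprep.in_table_d2("a"))

# Mapping-style tables: the function returns the value the character maps to.
folded_a = stringprep.map_table_b2("A")             # case folding of "A"
folded_sharp_s = stringprep.map_table_b2("\u00df")  # sharp s folds to two letters
print(folded_a, folded_sharp_s)
```

The set functions return a boolean, while the mapping functions return a string that may be longer than one character.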
Example:
import stringprep
import unicodedata

def nameprep(label):
    # Map
    newlabel = []
    for c in label:
        if stringprep.in_table_b1(c):
            # Map to nothing
            continue
        newlabel.append(stringprep.map_table_b2(c))
    label = "".join(newlabel)

    # Normalize
    label = unicodedata.normalize("NFKC", label)

    # Prohibit
    for c in label:
        if (stringprep.in_table_c12(c) or
                stringprep.in_table_c22(c) or
                stringprep.in_table_c3(c) or
                stringprep.in_table_c4(c) or
                stringprep.in_table_c5(c) or
                stringprep.in_table_c6(c) or
                stringprep.in_table_c7(c) or
                stringprep.in_table_c8(c) or
                stringprep.in_table_c9(c)):
            raise UnicodeError("Invalid character %r" % c)

    # Check bidi
    # A list (not a lazy map object) so it can be indexed on Python 3.
    RandAL = [stringprep.in_table_d1(c) for c in label]
    if any(RandAL):
        # There is a RandAL char in the string. Must perform further
        # tests:
        # 1) The characters in section 5.8 MUST be prohibited.
        #    This is table C.8, which was already checked
        # 2) If a string contains any RandALCat character, the string
        #    MUST NOT contain any LCat character.
        if any(stringprep.in_table_d2(c) for c in label):
            raise UnicodeError("Violation of BIDI requirement 2")
        # 3) If a string contains any RandALCat character, a
        #    RandALCat character MUST be the first character of the
        #    string, and a RandALCat character MUST be the last
        #    character of the string.
        if not RandAL[0] or not RandAL[-1]:
            raise UnicodeError("Violation of BIDI requirement 3")

    return label
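To show how such a nameprep function might be used to compare labels, here is a minimal sketch. It imports the nameprep function that ships in the standard library's encodings.idna module, so the block runs on its own. The helpers same_label and is_valid_label are hypothetical names introduced only for this illustration.

```python
from encodings.idna import nameprep

def same_label(a, b):
    """Hypothetical helper: compare two labels after nameprep processing."""
    return nameprep(a) == nameprep(b)

def is_valid_label(label):
    """Hypothetical helper: True if nameprep accepts the label."""
    try:
        nameprep(label)
    except UnicodeError:
        return False
    return True

print(nameprep("Stra\u00dfe"))            # case-folded form of the label
print(same_label("EXAMPLE", "example"))   # comparison ignores case
print(is_valid_label("\ue000"))           # private-use character (table C.3)
print(is_valid_label("\u05d0a"))          # mixes a RandALCat and an LCat character
```

Comparing the prepared forms rather than the raw strings is what makes the equality check independent of case and of characters that are mapped to nothing.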
Copyright notice: this is an original article by the blog author and may not be reproduced without the author's permission.
3.6 stringprep -- character preparation library for Internet domain names