An idea for python to process invalid characters in XML

Source: Internet
Author: User
Tags base64 encode

the storage of illegal characters in XML has always been annoying. In fact, this illegal character is not only an invisible character, but also contains some special characters specified in XML, such as <&>.
A convenient processing method is to store invalid characters in hex or base64 format, the following two functions demonstrate how to use base64 encryption to properly handle invalid characters, which not only ensures data integrity, but also keeps data readable. After all, the generated XML is not only used for machine reading, but also very user-friendly. The train of thought is: for strings with invalid characters, base64 encryption is used in a unified manner. The base64 = true attribute is added to the generated xml tag. For strings without invalid characters, raw data is directly displayed, the base64 attribute is not added to the generated tag. In this way, both data integrity and XML readability can be ensured.

#-*-Encoding: UTF-8-*-"created on @ Summary: helper functions may be used in XML Process @ Author: jerrykwan" try: Import XML. sax. saxutilsexcept importerror: Raise importerror ("requires XML. sax. saxutils package, pleas check if XML. sax. saxutils is installed! ") Import base64import logginglogger = logging. getlogger (_ name _) _ all _ = ["escape", "Unescape"] def escape (data): "@ Summary: Escape '&', '<', and'> 'in a string of data. if the data is not ASCII, then encode in base64 @ Param data: the data to be processed @ return {"base64": True | false, "data ": data} "# Check if all of the data is in ASCII code is_base64 = false escaped_data =" "try: Data. Decode ("ASCII") is_base64 = false # Check if the data shocould be escaped to be stored in XML escaped_data = xml. sax. saxutils. escape (data) Does T unicodedecodeerror: logger. debug ("% s is not ascii-encoded string, so I will encode it in base64") # base64 encode escaped_data = base64.b64encode (data) is_base64 = true return {"base64 ": is_base64, "data": escaped_data} def Unescape (data, is_base64 = false ): "@ Summary: Unescape '&', '<', and '>' in a string of data. if base64 is true, then base64 decode will be processed first @ Param data: the data to be processed @ Param base64: Specify if the data is encoded by base64 @ result: unescaped data "" # Check if base64 unescaped_data = data if is_base64: Try: unescaped_data = base64.b64decode (data) failed t exception, EX: logger. debug ("Some excpetion o Ccured when invoke b64decode ") Logger. error (Ex) print ex else: # Unescape it unescaped_data = xml. sax. saxutils. unescape (data) return unescaped_dataif _ name _ = "_ main _": def test (data): Print "original data is :", data T1 = escape (data) print "escaped Result:", T1 print "unescaped result is:", Unescape (T1 ["data"], T1 ["base64"]) print "#" * 50 test ("123456") test ("test") test ("<&>") test ("'! @ # $ % ^ & *: '\ "-=") Print "just a test"

Note: The preceding method is simple, but it only processes ASCII characters and <&>. Non-ASCII data is encrypted using base64 in a unified manner. For better compatibility, you can use the chardet package, convert the string to UTF-8 storage, which is more suitable.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.