Handling control characters in text files with Python
Control characters
A control character, or non-printable character, is a character that appears in a text stream not as a visible symbol but to signal a control function. Examples include LF (line feed), CR (carriage return), FF (form feed), DEL (delete), BS (backspace), and BEL (bell), as well as the communication control characters SOH (start of heading), EOT (end of transmission), and ACK (acknowledge).
Two sets of control characters are commonly defined:
Seven-bit ASCII defines 33 codes as control characters: 0 to 31 and 127 (0x00-0x1F and 0x7F).
The ASCII-compatible eight-bit ISO/IEC 8859-1 layout adds 32 more control codes taken from ISO/IEC 6429: 128 to 159 (0x80-0x9F).
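The two ranges above can be checked from Python itself. The following is a minimal sketch, assuming that Unicode's general category Cc (as reported by the standard unicodedata module) is an acceptable stand-in for "control character". The helper name is_control is introduced here for illustration only.

```python
import unicodedata

def is_control(ch):
    """Return True if ch is in Unicode general category Cc (control)."""
    return unicodedata.category(ch) == "Cc"

# Collect every control code point among the first 256 code points.
control_codes = [cp for cp in range(256) if is_control(chr(cp))]

c0_and_del = [cp for cp in control_codes if cp < 0x80]  # ASCII set
c1 = [cp for cp in control_codes if cp >= 0x80]         # ISO/IEC 6429 C1 set

print(len(c0_and_del), len(c1))  # 33 32
```

The counts match the 33 ASCII codes and the 32 additional eight-bit codes listed above.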
Control character list: http://ascii-table.com/control-chars.php
Python solutions for removing control characters (not each individually verified):
Solution 1:
strip_control_characters = lambda s: "".join(i for i in s if 31 < ord(i) < 127)
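One caveat with Solution 1 is that the test 31 < ord(i) < 127 keeps only printable ASCII, so it also drops tab, newline, and every non-ASCII character. The sketch below shows that behavior. It also shows an assumed alternative (the name strip_keep_whitespace is hypothetical) that removes only the ASCII control codes while keeping tab, LF, and CR.

```python
strip_control_characters = lambda s: "".join(i for i in s if 31 < ord(i) < 127)

def strip_keep_whitespace(s, keep="\t\n\r"):
    """Drop codes 0-31 and 127, except the characters listed in keep."""
    return "".join(
        ch for ch in s
        if ch in keep or not (ord(ch) < 32 or ord(ch) == 127)
    )

sample = "caf\u00e9\tbar\x07\n"
print(repr(strip_control_characters(sample)))  # 'cafbar'
print(repr(strip_keep_whitespace(sample)))     # 'café\tbar\n'
```

Which variant fits depends on whether the input is expected to contain anything beyond plain ASCII.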
Solution 2:
import re

# Characters that are not allowed in XML: C0 controls other than tab, LF and CR,
# the noncharacters U+FFFE and U+FFFF, and unpaired surrogates. In Python 3 a
# str holds code points, so any surrogate found in it is unpaired.
RE_XML_ILLEGAL = re.compile(
    r"[\u0000-\u0008\u000b-\u000c\u000e-\u001f\ufffe-\uffff\ud800-\udfff]"
)
# ASCII control characters (this pass also removes tab, LF and CR).
RE_ASCII_CONTROL = re.compile(r"[\x01-\x1F\x7F]")

def strip_control_characters(str_input):
    if str_input:
        str_input = RE_XML_ILLEGAL.sub("", str_input)
        str_input = RE_ASCII_CONTROL.sub("", str_input)
    return str_input
Solution 3:
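import re
import simplejson  # third-party; the standard-library json module also provides loads()

# One character class covering codes 0-31 and 127-159.
control_chars = ''.join(map(chr, list(range(0, 32)) + list(range(127, 160))))
control_char_re = re.compile('[%s]' % re.escape(control_chars))

def remove_control_chars(s):
    return control_char_re.sub('', s)

# original_json is the raw JSON text to be cleaned before parsing.
cleaned_json = remove_control_chars(original_json)
obj = simplejson.loads(cleaned_json)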
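To apply this kind of cleanup to a whole text file, one option is to build a deletion table once and pass each line through str.translate. This is a sketch under assumptions rather than the article's own method: the function name clean_file, the UTF-8 encoding, and the choice to keep tab, LF, and CR are all illustrative.

```python
import os
import tempfile

# Map every code in 0-31 and 127-159 to None (delete), except tab, LF and CR.
KEEP = {0x09, 0x0A, 0x0D}
DELETE_TABLE = {
    cp: None
    for cp in list(range(0, 32)) + list(range(127, 160))
    if cp not in KEEP
}

def clean_file(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path, deleting control characters line by line."""
    # newline="" turns off newline translation so CR and LF pass through as-is.
    with open(src_path, encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            dst.write(line.translate(DELETE_TABLE))

# Usage with a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "raw.txt")
    dst = os.path.join(tmp, "clean.txt")
    with open(src, "w", encoding="utf-8", newline="") as f:
        f.write("na\x00me\tva\x1blue\r\n")
    clean_file(src, dst)
    with open(dst, encoding="utf-8", newline="") as f:
        result = f.read()

print(repr(result))  # 'name\tvalue\r\n'
```

Processing line by line keeps memory use flat for large files, and the translate table avoids recompiling or re-running a regular expression for each line.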
Summary
That is all for this article. I hope it helps you when learning or using Python. If you have any questions, please leave a comment.