How do you remove unwanted characters from a string in Python?
Problem:
Filter out unnecessary leading and trailing characters in user input:
'+++ Abc123 ---'
Filter out the '\r' in text edited in a Windows environment:
'Hello world\r\n'
Remove the Unicode combining characters (tone marks) from text:
"Zhào Qián Sūn Lǐ Zhōu Wú Zhèng Wáng"
How to solve the above problems?
Remove characters at both ends: strip(), rstrip(), and lstrip()
#!/usr/bin/python3
s = '  ----- abc123 ++  '
# Strip whitespace from both ends
print(s.strip())
# Strip whitespace from the right end
print(s.rstrip())
# Strip whitespace from the left end
print(s.lstrip())
# Strip '-', '+' and spaces from both ends
print(s.strip().strip('-+ '))
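One detail that is easy to miss: the argument to strip() is treated as a set of characters, not as a prefix or suffix, and only the two ends of the string are affected. The sketch below illustrates this with a helper named clean_ends, which is a hypothetical name chosen for this example and not part of the original article.

```python
import string


def clean_ends(text, extra='-+'):
    """Strip whitespace plus the characters in `extra` from both ends.

    `clean_ends` and `extra` are illustrative names only.
    """
    # strip() removes any mix of the listed characters from each end
    # and stops at the first character that is not in the set.
    return text.strip(string.whitespace + extra)


print(clean_ends('+++ Abc123 ---'))
# Characters in the middle of the string are left untouched.
print(clean_ends('--a-b--'))
```

Because the middle of the string is never touched, removing characters from inside a string needs one of the other techniques in this article.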
Delete a single character at a fixed position: slicing + concatenation
#!/usr/bin/python3
s = 'abc:8080'
# Remove the colon
new_s = s[:3] + s[4:]
print(new_s)
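When the position is not known in advance, the same slicing idea can be combined with str.find(). This is a minimal sketch: remove_at is a hypothetical helper name, and it assumes a non-negative index.

```python
def remove_at(text, index):
    """Return `text` without the character at `index` (assumed >= 0)."""
    return text[:index] + text[index + 1:]


s = 'abc:8080'
i = s.find(':')  # find() returns -1 when the character is absent
if i != -1:
    s = remove_at(s, i)
print(s)
```

Strings are immutable in Python, so every variant of this approach builds a new string rather than modifying the original.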
Delete arbitrary characters, or several different characters at once: replace(), re.sub()
#!/usr/bin/python3
import re

# Remove every occurrence of one character
s = '\tabc\t123\tisk'
print(s.replace('\t', ''))

# Remove the \r, \n and \t characters in one pass
s = '\r\nabc\t123\nxyz'
print(re.sub('[\r\n\t]', '', s))
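A few variations on the same two calls may be useful, for example for the Windows line-ending problem from the introduction. The variable names below are illustrative only.

```python
import re

# Drop the carriage return from a Windows-style line ending.
line = 'Hello world\r\n'
no_cr = line.replace('\r', '')
print(repr(no_cr))

# The optional third argument of replace() limits the number of replacements.
first_only = 'a-b-c'.replace('-', '', 1)
print(first_only)

messy = '\r\nabc\t123\nxyz'
# A character class removes any of the listed characters.
removed = re.sub(r'[\r\n\t]', '', messy)
print(removed)

# Replacing each run with one space keeps the words separated.
collapsed = re.sub(r'[\r\n\t]+', ' ', messy).strip()
print(collapsed)
```

replace() is the simpler choice for a single fixed substring; re.sub() becomes worthwhile once several different characters or a pattern are involved.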
Map or delete multiple characters at the same time: str.translate() with str.maketrans() (Python 3)
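#!/usr/bin/python3
s = 'abc123xyz'
# a -> x, b -> y, c -> z (and the reverse): a simple character substitution.
# str.maketrans() builds the mapping table; both arguments must have the same length.
print(str.maketrans('abcxyz', 'xyzabc'))
# Apply the table to the string
print(s.translate(str.maketrans('abcxyz', 'xyzabc')))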
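The example above only maps characters to other characters. To delete characters, str.maketrans() also accepts a third argument: every character in it is mapped to None, and translate() then drops it. A short sketch, with variable names chosen for illustration:

```python
# Deletion only: the first two arguments are empty strings.
delete_table = str.maketrans('', '', '\r\n\t')
cleaned = '\r\nabc\t123\nxyz'.translate(delete_table)
print(cleaned)

# Substitution and deletion combined in one table:
# a -> x, b -> y, c -> z, and the digits 1, 2, 3 are removed.
swap_and_delete = str.maketrans('abc', 'xyz', '123')
swapped = 'abc123xyz'.translate(swap_and_delete)
print(swapped)
```

The table can be built once and reused for many strings.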
Remove the tone marks from Unicode characters
#!/usr/bin/python3
import sys
import unicodedata

s = "Zhào Qián Sūn Lǐ Zhōu Wú Zhèng Wáng"

remap = {
    # ord() returns the Unicode code point of a character
    ord('\t'): '',
    ord('\f'): '',
    ord('\r'): None,
}
# Remove \t, \f and \r
a = s.translate(remap)

'''
dict.fromkeys() builds a dictionary whose keys are the code points of all
Unicode combining characters and whose values are all None.
unicodedata.normalize() converts the input into a decomposed form (NFD).
sys.maxunicode: the largest Unicode code point, 1114111 (hexadecimal 0x10FFFF).
unicodedata.combining(chr): returns the canonical combining class of the
character chr as an integer, or 0 if no combining class is defined.
'''
# This comprehension is easier to understand if you split it into steps
cmb_chrs = dict.fromkeys(
    c for c in range(sys.maxunicode) if unicodedata.combining(chr(c))
)
b = unicodedata.normalize('NFD', a)

'''
Call translate() to delete all the combining marks
'''
print(b.translate(cmb_chrs))
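Building a table over every code point is only worth the cost if it is reused. For one-off use, a lighter alternative is to decompose the text and filter out the combining characters directly. This is a sketch of that alternative, not the method used above, and strip_marks is a hypothetical helper name.

```python
import unicodedata


def strip_marks(text):
    """Return `text` with combining marks (such as tone marks) removed."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize('NFD', text)
    # combining() returns 0 for ordinary characters and non-zero for marks.
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_marks('Zhào Qián Sūn Lǐ'))
```

Both approaches rely on the same two steps: normalize to NFD first, then drop every character whose combining class is non-zero.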
That is all for this article. I hope it helps with your learning.