I could not find a ready-made Python module with this functionality, so yesterday I wrote a simple strip_tags. It still had some problems, and today, while using it in a scraping job, I improved two parts of it:
1. Handling of self-closing (void) tags
2. Filtering of tag attributes
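Both items depend on how html.parser reports tags and attributes. The following is a minimal sketch, not part of the original function: EventLogger and log_events are hypothetical names introduced here only to record the callbacks that HTMLParser fires.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical helper: records every callback the parser fires."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        self.events.append(('start', tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))

    def handle_data(self, data):
        self.events.append(('data', data))


def log_events(html):
    """Feed html to an EventLogger and return the recorded events."""
    parser = EventLogger()
    parser.feed(html)
    parser.close()
    return parser.events


if __name__ == '__main__':
    for event in log_events('<p class="x" hidden>a<br>b<br/>c</p>'):
        print(event)
```

In this input the bare `<br>` fires only a start event, while `<br/>` fires a start event followed by an end event, so a filter that tracks open tags has to treat void elements specially. The valueless attribute `hidden` arrives with the value None, which has to be handled before any string concatenation.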
    from html import escape
    from html.parser import HTMLParser


    def strip_tags(html, allow_tags=None, allow_attrs=None):
        result = []
        start = []  # stack of allowed tags that are still open
        data = []

        # Void elements: under HTML5 rules, tags such as <br> and <wbr>
        # are written without "/>" and never take a closing tag.
        special_end_tags = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
                            'input', 'keygen', 'link', 'meta', 'param',
                            'source', 'track', 'wbr']

        def starttag(tag, attrs):
            if tag not in allow_tags:
                return
            # Void elements are never closed, so keep them off the stack;
            # otherwise they would block the match for every later end tag.
            if tag not in special_end_tags:
                start.append(tag)
            my_attrs = []
            for name, value in attrs:
                if allow_attrs and name not in allow_attrs:
                    continue
                if value is None:
                    # Valueless attribute such as "disabled"
                    my_attrs.append(name)
                else:
                    # The parser unescapes values, so escape them again
                    my_attrs.append(name + '="' + escape(value) + '"')
            my_attrs = ' ' + ' '.join(my_attrs) if my_attrs else ''
            result.append('<' + tag + my_attrs + '>')

        def endtag(tag):
            # Emit a closing tag only when it matches the innermost open tag
            if start and tag == start[-1]:
                start.pop()
                result.append('</' + tag + '>')

        def text(chunk):
            # The parser decodes character references in text; re-escape
            # so that "&lt;b&gt;" cannot come back as a live tag
            result.append(escape(chunk, quote=False))

        parser = HTMLParser()
        parser.handle_data = text
        if allow_tags:
            parser.handle_starttag = starttag
            parser.handle_endtag = endtag
        parser.feed(html)
        parser.close()

        # Trim newlines around each chunk and drop chunks that become empty
        for chunk in result:
            tmp = chunk.strip('\n')
            if tmp:
                data.append(tmp)
        return ''.join(data)
Python: cleaning HTML tags, similar to PHP's strip_tags function (II)