In a project, we often need to filter HTML tags out of user-supplied data to improve security. In other words, we delete the parts of the data that could harm the application, either by removing tags or by deleting or encoding unnecessary characters. First, some common regular-expression replacements:
$str = preg_replace('/<\s*img\s+[^>]*?src\s*=\s*(\'|")(.*?)\1[^>]*?\/?\s*>/i', '', $str); // filter img tags
$str = preg_replace('/\s+/', ' ', $str); // collapse extra whitespace and line breaks
$str = preg_replace('/<[ ]+/si', '<', $str); // remove spaces directly after "<"
$str = preg_replace('/<!--.*?-->/si', '', $str); // filter comments
$str = preg_replace('/<(!.*?)>/si', '', $str); // filter DOCTYPE
$str = preg_replace('/<(\/?html.*?)>/si', '', $str); // filter html tags
$str = preg_replace('/<(\/?head.*?)>/si', '', $str); // filter head tags
$str = preg_replace('/<(\/?meta.*?)>/si', '', $str); // filter meta tags
$str = preg_replace('/<(\/?body.*?)>/si', '', $str); // filter body tags
$str = preg_replace('/<(\/?link.*?)>/si', '', $str); // filter link tags
$str = preg_replace('/<(\/?form.*?)>/si', '', $str); // filter form tags
$str = preg_replace('/cookie/si', 'COOKIE', $str); // uppercase "cookie" (JavaScript's document.cookie is case-sensitive)
$str = preg_replace('/<(applet.*?)>(.*?)<(\/applet.*?)>/si', '', $str); // filter applet elements with their content
$str = preg_replace('/<(\/?applet.*?)>/si', '', $str); // filter leftover applet tags
$str = preg_replace('/<(style.*?)>(.*?)<(\/style.*?)>/si', '', $str); // filter style elements with their content
$str = preg_replace('/<(\/?style.*?)>/si', '', $str); // filter leftover style tags
$str = preg_replace('/<(title.*?)>(.*?)<(\/title.*?)>/si', '', $str); // filter title elements with their content
$str = preg_replace('/<(\/?title.*?)>/si', '', $str); // filter leftover title tags
$str = preg_replace('/<(object.*?)>(.*?)<(\/object.*?)>/si', '', $str); // filter object elements with their content
$str = preg_replace('/<(\/?object.*?)>/si', '', $str); // filter leftover object tags
$str = preg_replace('/<(noframes.*?)>(.*?)<(\/noframes.*?)>/si', '', $str); // filter noframes elements with their content
$str = preg_replace('/<(\/?noframes.*?)>/si', '', $str); // filter leftover noframes tags
$str = preg_replace('/<(i?frame.*?)>(.*?)<(\/i?frame.*?)>/si', '', $str); // filter frame and iframe elements with their content
$str = preg_replace('/<(\/?i?frame.*?)>/si', '', $str); // filter leftover frame and iframe tags
$str = preg_replace('/<(script.*?)>(.*?)<(\/script.*?)>/si', '', $str); // filter script elements with their content
$str = preg_replace('/<(\/?script.*?)>/si', '', $str); // filter leftover script tags
$str = preg_replace('/javascript/si', 'Javascript', $str); // recapitalize "javascript"
$str = preg_replace('/vbscript/si', 'Vbscript', $str); // recapitalize "vbscript"
$str = preg_replace('/on([a-z]+)\s*=/si', 'On\1=', $str); // rewrite inline event handlers such as onclick=
$str = preg_replace('/&#/si', '&amp;#', $str); // escape numeric character references used to disguise scripts
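The tag-by-tag replacements above repeat the same pattern with a different element name each time. As a minimal sketch, assuming PHP 7.4 or later, the list could be driven by an array of names instead. The helper name `strip_listed_tags` and the tag list are hypothetical, not part of the original code. Unlike the patterns above, this version uses `\b`, so `head` does not also match `<header>`, and it removes only the tags, not the content between them.

```php
<?php
/**
 * Remove the opening and closing tags of every element named in $tags.
 * Hypothetical helper for illustration; the content between tags is kept.
 */
function strip_listed_tags(string $str, array $tags): string
{
    // Escape each name for use inside a '/'-delimited pattern,
    // then join the names into one alternation.
    $names = implode('|', array_map(
        fn($tag) => preg_quote($tag, '/'),
        $tags
    ));

    // Matches <name ...> and </name>, case-insensitively.
    $pattern = '/<\/?(?:' . $names . ')\b[^>]*>/i';

    // preg_replace() returns null on error; fall back to the input.
    return preg_replace($pattern, '', $str) ?? $str;
}

$html = '<html><head><meta charset="utf-8"></head><body><p>Hi</p></body></html>';
echo strip_listed_tags($html, ['html', 'head', 'meta', 'body', 'link', 'form']), "\n";
// prints: <p>Hi</p>
```

Adding or removing a tag then means editing the array rather than copying another `preg_replace` line.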
A simpler way to write it:
function delhtml($str) { // clear HTML tags
    $st = -1; // position of the most recent "<"
    $et = -1; // position of the most recent ">"
    $stmp = array(); // tags found in the string
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $ss = substr($str, $i, 1);
        if (ord($ss) == 60) { // ord("<") == 60
            $st = $i;
        }
        if (ord($ss) == 62) { // ord(">") == 62
            $et = $i;
            if ($st != -1) {
                $stmp[] = substr($str, $st, $et - $st + 1);
                $st = -1; // reset so a later stray ">" is not paired with this "<"
            }
        }
    }
    $str = str_replace($stmp, "", $str); // remove every collected tag
    return $str;
}
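The same idea, dropping everything between a "<" and the next ">", can also be done in one pass without collecting the tags first. The following is only a sketch of that variant. The name `delhtml_scan` is hypothetical, and its behaviour may differ from `delhtml` on malformed input such as nested "<" characters.

```php
<?php
/**
 * Copy the text outside of <...> pairs into a new string.
 * Hypothetical single-pass variant of delhtml(), for illustration only.
 */
function delhtml_scan(string $str): string
{
    $out = '';
    $pos = 0;
    $len = strlen($str);

    while ($pos < $len) {
        $open = strpos($str, '<', $pos);
        if ($open === false) {
            // No more tags: keep the rest of the string.
            $out .= substr($str, $pos);
            break;
        }

        $close = strpos($str, '>', $open);
        if ($close === false) {
            // A "<" that is never closed is treated as plain text.
            $out .= substr($str, $pos);
            break;
        }

        // Keep the text before the tag, then skip past the tag.
        $out .= substr($str, $pos, $open - $pos);
        $pos = $close + 1;
    }

    return $out;
}

echo delhtml_scan('<p>Hello <b>world</b></p> 1 < 2'), "\n";
// prints: Hello world 1 < 2
```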
Another approach:
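function clear_html_label($html) {
    $search = array(
        "'<script[^>]*?>.*?</script>'si", // strip script blocks with their content
        "'<[/!]*?[^<>]*?>'si",            // strip HTML tags
        "'([\r\n])[\s]+'",                // collapse whitespace after a line break
        "'&(quot|#34);'i",                // decode common HTML entities
        "'&(amp|#38);'i",
        "'&(lt|#60);'i",
        "'&(gt|#62);'i",
        "'&(nbsp|#160);'i",
        "'&(iexcl|#161);'i",
        "'&(cent|#162);'i",
        "'&(pound|#163);'i",
        "'&(copy|#169);'i"
    );
    // chr() yields single-byte (ISO-8859-1) characters
    $replace = array("", "", "\\1", "\"", "&", "<", ">", " ", chr(161), chr(162), chr(163), chr(169));
    $html = preg_replace($search, $replace, $html);
    // The /e modifier was removed in PHP 7, so numeric entities are decoded with a callback
    return preg_replace_callback("'&#(\d+);'", function ($m) {
        return chr((int) $m[1]);
    }, $html);
}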
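For comparison, PHP's built-in `strip_tags()` and `html_entity_decode()` cover much of what this function does by hand. The sketch below assumes UTF-8 input, and the name `clear_html_builtin` is hypothetical. It is not an exact equivalent: for example, `&nbsp;` is decoded to a non-breaking space (U+00A0) rather than a regular space.

```php
<?php
/**
 * Strip tags and decode entities using PHP built-ins.
 * Hypothetical helper for illustration; assumes UTF-8 input.
 */
function clear_html_builtin(string $html): string
{
    // strip_tags() keeps the text inside <script>, so remove
    // script blocks together with their content first.
    $html = preg_replace('/<script\b[^>]*>.*?<\/script>/is', '', $html) ?? $html;

    // Remove all remaining tags.
    $text = strip_tags($html);

    // Decode named and numeric entities into UTF-8 characters.
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo clear_html_builtin('<p>Tom &amp; Jerry &copy; 2020</p><script>alert(1)</script>'), "\n";
// prints: Tom & Jerry © 2020
```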
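All three methods work, but each has trade-offs, so choose one based on your project's requirements. The regular-expression list removes only the tags it names, delhtml removes anything between "<" and ">", and clear_html_label also decodes HTML entities, so its output can contain "<" and ">" again.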
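All three methods delete markup. The other option mentioned at the start, encoding, can be sketched with PHP's built-in `htmlspecialchars()`, shown here next to `strip_tags()` for contrast. The sample input is made up for illustration.

```php
<?php
// Untrusted input, for example a comment submitted through a form.
$input = '<b>Hello</b> <script>alert("x")</script>';

// Deleting: strip_tags() removes the tags themselves but keeps
// the text between them, including the body of the script.
$stripped = strip_tags($input);

// Encoding: htmlspecialchars() turns the markup into entities,
// so a browser displays it as literal text instead of running it.
$encoded = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $stripped, "\n";
// prints: Hello alert("x")
echo $encoded, "\n";
// prints: &lt;b&gt;Hello&lt;/b&gt; &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```

Encoding keeps everything the user typed, which may be preferable when the text should be shown as written rather than cleaned.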