This article presents PHP regular expressions for filtering HTML tags and attributes. It uses code examples to show each expression, and the comments in the code explain what each one removes. It is mainly useful to readers who use PHP to collect (scrape) data from web pages.
$str = preg_replace("/\s+/", " ", $str); // collapse runs of whitespace (excess line breaks)
$str = preg_replace("/<[ ]+/si", "<", $str); // remove spaces that follow "<"
$str = preg_replace("/<\!--.*?-->/si", "", $str); // filter comments
$str = preg_replace("/<(\!.*?)>/si", "", $str); // filter DOCTYPE
$str = preg_replace("/<(\/?html.*?)>/si", "", $str); // filter html tags
$str = preg_replace("/<(\/?head.*?)>/si", "", $str); // filter head tags
$str = preg_replace("/<(\/?meta.*?)>/si", "", $str); // filter meta tags
$str = preg_replace("/<(\/?body.*?)>/si", "", $str); // filter body tags
$str = preg_replace("/<(\/?link.*?)>/si", "", $str); // filter link tags
$str = preg_replace("/<(\/?form.*?)>/si", "", $str); // filter form tags
$str = preg_replace("/cookie/si", "COOKIE", $str); // rewrite "cookie" as "COOKIE"
$str = preg_replace("/<(applet.*?)>(.*?)<(\/applet.*?)>/si", "", $str); // filter applet blocks
$str = preg_replace("/<(\/?applet.*?)>/si", "", $str); // filter leftover applet tags
$str = preg_replace("/<(style.*?)>(.*?)<(\/style.*?)>/si", "", $str); // filter style blocks
$str = preg_replace("/<(\/?style.*?)>/si", "", $str); // filter leftover style tags
$str = preg_replace("/<(title.*?)>(.*?)<(\/title.*?)>/si", "", $str); // filter title blocks
$str = preg_replace("/<(\/?title.*?)>/si", "", $str); // filter leftover title tags
$str = preg_replace("/<(object.*?)>(.*?)<(\/object.*?)>/si", "", $str); // filter object blocks
$str = preg_replace("/<(\/?object.*?)>/si", "", $str); // filter leftover object tags
$str = preg_replace("/<(noframes.*?)>(.*?)<(\/noframes.*?)>/si", "", $str); // filter noframes blocks
$str = preg_replace("/<(\/?noframes.*?)>/si", "", $str); // filter leftover noframes tags
$str = preg_replace("/<(i?frame.*?)>(.*?)<(\/i?frame.*?)>/si", "", $str); // filter frame and iframe blocks
$str = preg_replace("/<(\/?i?frame.*?)>/si", "", $str); // filter leftover frame and iframe tags
$str = preg_replace("/<(script.*?)>(.*?)<(\/script.*?)>/si", "", $str); // filter script blocks
$str = preg_replace("/<(\/?script.*?)>/si", "", $str); // filter leftover script tags
$str = preg_replace("/javascript/si", "Javascript", $str); // rewrite "javascript"
$str = preg_replace("/vbscript/si", "Vbscript", $str); // rewrite "vbscript"
$str = preg_replace("/on([a-z]+)\s*=/si", "On\\1=", $str); // rewrite event-handler attributes such as onclick=
$str = preg_replace("/&#/si", "&＃", $str); // break numeric character references used to disguise scripts, such as javAsCript:alert(
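To see how a few of these filters behave together, here is a minimal sketch that applies a small subset of them to a sample string. The function name `strip_active_content` and the sample input are assumptions made for illustration; they are not part of the original code.

```php
<?php
// Hypothetical helper: applies a small subset of the filters listed above.
function strip_active_content(string $str): string
{
    $str = preg_replace("/\s+/", " ", $str);                                // collapse whitespace runs
    $str = preg_replace("/<!--.*?-->/s", "", $str);                         // drop HTML comments
    $str = preg_replace("/<(script.*?)>(.*?)<(\/script.*?)>/si", "", $str); // drop script blocks
    $str = preg_replace("/on([a-z]+)\s*=/si", "On\\1=", $str);              // rewrite event-handler attributes
    return trim($str);
}

// Illustrative input: a paragraph with a handler, a comment and a script block.
$sample = "<p onclick=\"x()\">Hi</p>\n<!-- note -->\n<script>alert(1)</script>";
echo strip_active_content($sample), "\n"; // prints <p Onclick="x()">Hi</p>
```

The comment and the script block disappear, while the event handler is only re-capitalised. HTML attribute names are case-insensitive, so for untrusted input it is safer to remove such attributes entirely than to rename them.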
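Clearing spaces and line breaks
function DeleteHtml($str)
{
    $str = trim($str);
    $str = strip_tags($str); // remove all HTML tags
    // ereg_replace() was removed in PHP 7; str_replace() handles these literal strings
    $str = str_replace("\t", "", $str);   // tabs
    $str = str_replace("\r\n", "", $str); // Windows line breaks
    $str = str_replace("\r", "", $str);   // carriage returns
    $str = str_replace("\n", "", $str);   // line feeds
    $str = str_replace(" ", " ", $str);   // as printed, this replaces a space with a space
    return trim($str);
}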
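The same cleanup can be sketched with a single regular expression pass instead of one replacement per character. The function name `delete_html_sketch` is hypothetical, and this is only one possible way to write it on current PHP versions, where `ereg_replace()` is no longer available.

```php
<?php
// Hypothetical variant of the cleanup function: one regex pass for tabs and line breaks.
function delete_html_sketch(string $str): string
{
    $str = strip_tags(trim($str));                // remove every HTML tag
    $str = preg_replace('/[\t\r\n]+/', '', $str); // remove tabs, carriage returns and line feeds
    return trim($str);
}

echo delete_html_sketch("  <p>Hello,\r\n\t<b>world</b></p>\n"), "\n"; // prints Hello,world
```

Because line breaks are deleted rather than replaced by a space, words separated only by a line break are joined together, which matches the behaviour of the original function.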
Filtering HTML tags and attributes
1. Regular expression that matches all HTML tags:
The code is as follows:
</?[^>]+>
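A minimal sketch of stripping every tag with `preg_replace()`. The pattern `<\/?[^>]+>` is assumed here as the usual form of an "all tags" expression, and the sample markup is made up for the example.

```php
<?php
// Illustrative input; any HTML fragment would do.
$html = '<div class="box"><a href="/x">link</a> and <b>bold</b></div>';

// Assumed pattern: "<", an optional "/", one or more characters other than ">", then ">".
$text = preg_replace('/<\/?[^>]+>/', '', $html);

echo $text, "\n"; // prints link and bold
```

For plain tag removal, PHP's built-in `strip_tags()` covers the same case without a hand-written pattern.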
2. Regular expression that strips the attributes of all HTML tags:
The code is as follows:
$html = preg_replace("/<([a-zA-Z]+)[^>]*>/", "<\\1>", $html);
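The following sketch runs the attribute-stripping expression on a sample fragment. The second call uses a slightly wider tag-name pattern, which is an assumption added here so that names containing digits (such as `h1`) survive intact.

```php
<?php
// Illustrative input with attributes on two tags.
$html = '<a href="http://example.com" class="x">link</a><img src="a.png" alt="A">';

// Keep only the tag name captured in group 1 and drop everything up to ">".
$stripped = preg_replace("/<([a-zA-Z]+)[^>]*>/", "<\\1>", $html);
echo $stripped, "\n"; // prints <a>link</a><img>

// Assumed refinement: allow digits after the first letter so <h1 ...> becomes <h1>, not <h>.
$heading = preg_replace('/<([a-zA-Z][a-zA-Z0-9]*)[^>]*>/', '<$1>', '<h1 class="t">T</h1>');
echo $heading, "\n"; // prints <h1>T</h1>
```

Closing tags such as `</a>` are left alone by both patterns, because they require a letter immediately after `<`.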
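3. Exclusion-style regular expression for filtering only some HTML tags (for example, excluding <p>, that is, leaving <p> unfiltered):
The code is as follows:
</?[^pP/>]+>
Because this is a character class rather than a tag name, it also skips any tag that contains p, P or / anywhere between the angle brackets.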
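An exclusion of this kind can also be written with a negative lookahead, which tests the tag name itself. This is a sketch of that alternative technique, not the article's expression; `<p>` is used as the tag to keep purely as an illustrative assumption.

```php
<?php
// Illustrative input: <p> should survive, every other tag should be removed.
$html = '<div><p class="a">text</p><span>x</span></div>';

// (?!p\b) rejects a tag whose name is exactly "p"; names that merely start with p (pre, param) still match.
$result = preg_replace('/<\/?(?!p\b)[a-zA-Z][^>]*>/i', '', $html);

echo $result, "\n"; // prints <p class="a">text</p>x
```

The word boundary `\b` is what separates `p` from longer names such as `pre`. Without it, every tag beginning with the letter p would be kept as well.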
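4. Enumeration-style regular expression for filtering only some HTML tags (for example, filtering <a>, <p> and <b>):
The code is as follows:
</?[aApPbB][^>]*>
This form matches on the first letter only, so it also matches any other tag whose name starts with one of those letters, such as <body>.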
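When only specific tags should be removed, listing the names in an alternation avoids matching on a single letter. The sketch below assumes `<a>`, `<p>` and `<b>` as the tags to filter; both the tag list and the sample markup are illustrative.

```php
<?php
// Illustrative input: remove <a>, <p> and <b>, keep <div> and <br>.
$html = '<div><p>See <a href="/x">this</a> <b>now</b></p><br></div>';

// (?:a|p|b)\b matches exactly those tag names; \b stops "b" from matching "br" or "body".
$result = preg_replace('/<\/?(?:a|p|b)\b[^>]*>/i', '', $html);

echo $result, "\n"; // prints <div>See this now<br></div>
```

Adding or removing a tag is then a matter of editing the alternation, for example `(?:a|p|b|span)`.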
5. Exclusion-style regular expression for filtering the attributes of HTML tags (for example, excluding the alt attribute, that is, leaving alt unfiltered):
The code is as follows:
\s(?!alt)[a-zA-Z]+=[^\s]*
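The expression above consumes everything up to the next whitespace, so a quoted value containing spaces, or a value directly followed by `>`, is not handled cleanly. Here is a hedged sketch of a quote-aware variant that only works inside tags; the function name `keep_only_alt` and the widened attribute pattern are assumptions, not the article's code.

```php
<?php
// Hypothetical helper: removes every attribute except alt, and only inside opening tags.
function keep_only_alt(string $html): string
{
    // Attribute name that is not "alt", followed by a double-quoted, single-quoted or bare value.
    $attr = '/\s+(?!alt\b)[a-zA-Z-]+\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)/i';

    return preg_replace_callback(
        '/<[a-zA-Z][^>]*>/', // each opening tag
        function (array $m) use ($attr): string {
            return preg_replace($attr, '', $m[0]);
        },
        $html
    );
}

echo keep_only_alt('<img src="a.png" alt="A photo" width="10"> x=1'), "\n";
// prints <img alt="A photo"> x=1
```

Restricting the replacement to tag matches keeps text such as `x=1` outside the tag untouched.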
6. Enumeration-style regular expression for filtering the attributes of HTML tags (for example, filtering the alt attribute):
The code is as follows:
(\s)alt=[^\s]*
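A short sketch of removing the alt attribute. The first call uses the expression `(\s)alt=[^\s]*` as given, on a value without spaces; the second uses a quote-aware variant, which is an assumption added here for values that do contain spaces.

```php
<?php
// The expression as given: the value ends at the first whitespace character.
$simple = preg_replace('/(\s)alt=[^\s]*/', '', '<img src="a.png" alt="A" width="1">');
echo $simple, "\n"; // prints <img src="a.png" width="1">

// Assumed variant: match a double-quoted, single-quoted or bare value as a whole.
$quoted = preg_replace(
    '/\salt\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)/i',
    '',
    '<img src="a.png" alt="A photo">'
);
echo $quoted, "\n"; // prints <img src="a.png">
```

With the original expression, the second input would keep the fragment `photo">` because matching stops at the space inside the quoted value.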