PHP Sensitive Word Filtering: Advanced Edition

Source: Internet
Author: User
An earlier article introduced a PHP program that filters out some special characters. Here we upgrade the sensitive word filter to make it more powerful: with this version, the filter still works when users insert spaces or punctuation marks in the middle of sensitive words.


Wherever users can post content, advertisements or other sensitive words may appear, so a sensitive word filter is needed to keep the site "pure".

Filter mechanism: keyword matching with PHP regular expressions

// $str is the user data; returns TRUE if it contains a sensitive word
function wordFilter($str)
{
    /*
     Obtain the sensitive word list.
     Ways to store the sensitive words:
     1: in a txt file (common method)
     2: in a cache (better method)
     I store them in memcached.
    */
    $words = getSensitiveWords();
    $flag = FALSE;

    foreach ($words as $word)
    {
        $preg_letter = '/^[A-Za-z]+$/';
        if (preg_match($preg_letter, $word))
        {   // English word: match it as a whole word, case-insensitively
            $str = strtolower($str);
            $quoted = preg_quote(strtolower($word), '/'); // escape regex metacharacters
            $pattern_1 = '/([^A-Za-z]+' . $quoted . '[^A-Za-z]+)|([^A-Za-z]+' . $quoted . '\s+)|(\s+' . $quoted . '[^A-Za-z]+)|(^' . $quoted . '[^A-Za-z]+)|([^A-Za-z]+' . $quoted . '$)/';
            // The sensitive word has non-letter characters next to it.
            if (preg_match($pattern_1, $str))
            {
                $flag = TRUE;
            }
            $pattern_2 = '/(^' . $quoted . '\s+)|(\s+' . $quoted . '\s+)|(\s+' . $quoted . '$)|(^' . $quoted . '$)/';
            // The sensitive word has spaces on both sides, or is the whole string.
            if (preg_match($pattern_2, $str))
            {
                $flag = TRUE;
            }
        }
        else
        {   // Chinese (non-letter) word: match it anywhere in the string
            $pattern = '/\s*' . preg_quote($word, '/') . '\s*/';
            if (preg_match($pattern, $str))
            {
                $flag = TRUE;
            }
        }
    }
    return $flag;
}
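The function above calls getSensitiveWords() without showing it. As a minimal sketch of the first storage option mentioned in its comment (a txt file), assuming one word per line, something like the following could supply the list. The helper name loadSensitiveWords() and the file layout are assumptions made for illustration, not part of the original program.

```php
<?php
// Hypothetical helper (not from the article): load sensitive words from a
// plain-text file with one word per line. The result is kept in a static
// array so each file is read at most once per request.
function loadSensitiveWords(string $path): array
{
    static $cache = [];
    if (isset($cache[$path])) {
        return $cache[$path];
    }
    if (!is_readable($path)) {
        // Missing or unreadable file: treat it as an empty word list.
        return $cache[$path] = [];
    }
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return $cache[$path] = [];
    }
    $words = [];
    foreach ($lines as $line) {
        $word = trim($line);
        if ($word !== '') {
            $words[] = $word;
        }
    }
    // Drop duplicates and reindex from 0.
    return $cache[$path] = array_values(array_unique($words));
}
```

A getSensitiveWords() built on this could simply return loadSensitiveWords() for a configured path; swapping in a cache such as memcached would only change where the list comes from, not how wordFilter() uses it.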
Problems:

With keyword matching alone, users have many ways to evade the filter, such as inserting spaces or punctuation marks in the middle of a word.
Example:
Sensitive word: Buckle

After the user alters it:
Buckle
Buckle
Button
1 button
In these cases, the regular expression matching in the code above may fail.
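To make the failure concrete, here is a small sketch using a placeholder English word, "spam", rather than a real word list. The function naiveContains() is a hypothetical stand-in for plain keyword matching and is not the article's wordFilter().

```php
<?php
// Plain keyword matching: TRUE if $word occurs verbatim in $input
// (case-insensitive). Hypothetical helper for illustration only.
function naiveContains(string $word, string $input): bool
{
    $pattern = '/' . preg_quote($word, '/') . '/i';
    return preg_match($pattern, $input) === 1;
}

$word = 'spam'; // placeholder sensitive word
foreach (['buy spam now', 's p a m', 's-p-a-m', 's.p.a.m'] as $input) {
    // Print each input next to the verdict of the naive matcher.
    echo str_pad($input, 14), naiveContains($word, $input) ? 'caught' : 'missed', PHP_EOL;
}
```

Only the first input is caught; the three disguised variants slip through because the inserted characters break up the literal keyword.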

Solution:

Remove all punctuation marks and special characters from the user data before checking it for sensitive words.

Code:

// Punctuation and entity remnants to strip explicitly, in addition to the [[:punct:]] class used below
$flag_arr = array('?', '!', '¥', '(', ')', ':', '\'', '"', '…', '.', 'nbsp', ']', '【', '~');

// Strip the listed marks, decode HTML entities, remove tags, then remove punctuation and whitespace
$content_filter = preg_replace('/\s/', '', preg_replace('/[[:punct:]]/', '', strip_tags(html_entity_decode(str_replace($flag_arr, '', $content), ENT_QUOTES, 'UTF-8'))));
$content_filter is the processed user data; pass it to wordFilter($content_filter) to run the filter.
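As an alternative sketch of the same idea, the hand-maintained punctuation list can be replaced by Unicode character properties, which also cover full-width marks such as "【" or "…". This is a swapped-in technique, not the article's code; normalizeContent() and containsSensitiveWord() are hypothetical names, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: decode entities, strip tags, then remove punctuation,
// symbols, separators and whitespace so disguised words collapse back to
// their plain form. Assumes UTF-8 input.
function normalizeContent(string $content): string
{
    $decoded = html_entity_decode($content, ENT_QUOTES, 'UTF-8');
    $text = strip_tags($decoded);
    // \p{P} = punctuation, \p{S} = symbols, \p{Z} = separators (incl. no-break space)
    $clean = preg_replace('/[\p{P}\p{S}\p{Z}\s]+/u', '', $text);
    // preg_replace returns null on error, e.g. when the input is not valid UTF-8.
    return $clean === null ? '' : $clean;
}

// Hypothetical check: TRUE if any word occurs in the normalized content.
// stripos() ignores ASCII case; non-ASCII words must match exactly.
function containsSensitiveWord(string $content, array $words): bool
{
    $normalized = normalizeContent($content);
    foreach ($words as $word) {
        if ($word !== '' && stripos($normalized, $word) !== false) {
            return true;
        }
    }
    return false;
}
```

Because all separators are removed, this check matches substrings across former word boundaries, so it trades some false positives for resistance to inserted spaces and punctuation. Whether that trade is acceptable depends on the word list.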
