We previously introduced a PHP program that filters out some special characters. Next, we will upgrade the sensitive-word filter to make it more robust, so that inserting spaces or other punctuation marks in the middle of a sensitive word no longer defeats it.
As long as users can post content, advertisements or other sensitive words may appear. A sensitive-word filter is therefore needed to keep the site "clean".
Filter mechanism: PHP keyword matching with regular expressions
// $str is the user data
function wordFilter($str)
{
    /*
    Obtain the sensitive word list.
    Ways to store sensitive words:
    1: in a txt file (common method)
    2: in a cache (better method)
    Here they are stored in memcached.
    */
    $words = getSensitiveWords();
    $flag = FALSE; // set to TRUE as soon as any sensitive word matches
    foreach ($words as $word)
    {
        $quoted = preg_quote($word, '/'); // escape regex metacharacters in the word
        $preg_letter = '/^[A-Za-z]+$/';
        if (preg_match($preg_letter, $word))
        { // English word: match as a whole word, case-insensitive
            $str = strtolower($str);
            $quoted = strtolower($quoted);
            $pattern_1 = '/([^A-Za-z]+' . $quoted . '[^A-Za-z]+)|([^A-Za-z]+' . $quoted . '\s+)|(\s+' . $quoted . '[^A-Za-z]+)|(^' . $quoted . '[^A-Za-z]+)|([^A-Za-z]+' . $quoted . '$)/';
            // The sensitive word is bordered by non-letter characters
            if (preg_match($pattern_1, $str))
            {
                $flag = TRUE;
            }
            $pattern_2 = '/(^' . $quoted . '\s+)|(\s+' . $quoted . '\s+)|(\s+' . $quoted . '$)|(^' . $quoted . '$)/';
            // The sensitive word is bordered by whitespace or by the start/end of the string
            if (preg_match($pattern_2, $str))
            {
                $flag = TRUE;
            }
        }
        else
        { // Non-English word (e.g. Chinese): match it anywhere in the string
            $pattern = '/\s*' . $quoted . '\s*/';
            if (preg_match($pattern, $str))
            {
                $flag = TRUE;
            }
        }
    }
    return $flag;
}
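Taken together, the alternatives in $pattern_1 and $pattern_2 express one condition: an English word counts as a match only when no letter sits directly before or after it. As a minimal sketch (not the original author's code), the same condition can be written with lookarounds. The helper name containsEnglishWord and the word "spam" are hypothetical.

```php
<?php
// Hypothetical helper: TRUE if $word occurs in $str with no letter
// directly before or after it (case-insensitive).
function containsEnglishWord($str, $word)
{
    $quoted = preg_quote($word, '/'); // escape any regex metacharacters
    // (?<![A-Za-z]) : no letter immediately before the word
    // (?![A-Za-z])  : no letter immediately after the word
    $pattern = '/(?<![A-Za-z])' . $quoted . '(?![A-Za-z])/i';
    return preg_match($pattern, $str) === 1;
}

var_dump(containsEnglishWord('buy SPAM now', 'spam')); // bool(true)
var_dump(containsEnglishWord('spammer', 'spam'));      // bool(false)
```

The /i flag makes the match case-insensitive, so neither the input nor the word list has to be lowercased first.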
Problems:
If only keyword matching is used, there are many ways to evade the filter, such as inserting spaces or other punctuation marks in the middle of a sensitive word.
Example:
Sensitive word: Buckle
After obfuscation (spaces, punctuation, or other characters inserted into the word):
Buckle
Buckle
Button
1 button
In these cases, the regular expression matching in the code above may fail.
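To make the failure concrete, here is a small sketch using a hypothetical sensitive word, "spam" (not taken from the original word list). The helper plainMatch is also hypothetical. It performs a plain keyword match with no preprocessing, so it finds the word itself but misses variants with inserted spaces or punctuation.

```php
<?php
// Hypothetical helper: plain keyword match with no preprocessing of the input.
function plainMatch($word, $input)
{
    return preg_match('/' . preg_quote($word, '/') . '/', $input) === 1;
}

// Illustrative inputs: the word itself, then obfuscated variants.
$inputs = array('spam', 's pam', 'sp.am', 's-p-a-m');
foreach ($inputs as $input) {
    echo $input . ' => ' . (plainMatch('spam', $input) ? 'caught' : 'missed') . "\n";
}
```

Only the first input is reported as caught. The other three pass through the filter unnoticed.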
Solution:
Remove all punctuation marks and special characters from the user data before checking it for sensitive words.
Code:
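// Extra characters to strip before the regex passes
$flag_arr = array('?', '!', '¥', '(', ')', ':', '\'', '"', '…', '.', 'nbsp', ']', '【', '~');
// Strip those characters, decode HTML entities, remove tags, then drop punctuation and whitespace
$content_filter = preg_replace('/\s/', '', preg_replace("/[[:punct:]]/", '', strip_tags(html_entity_decode(str_replace($flag_arr, '', $content), ENT_QUOTES, 'UTF-8'))));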
$content_filter is the processed user data; pass it to wordFilter($content_filter) to perform the sensitive-word check.
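Putting the two steps together might look like the sketch below. It assumes UTF-8 input. The names normalizeContent and containsSensitiveWord, the short $extra list, and the word "spam" are hypothetical stand-ins, not the original author's code.

```php
<?php
// Hypothetical helper: strip markup, punctuation and whitespace from user data.
function normalizeContent($content)
{
    // Extra non-ASCII punctuation to drop; this short list is illustrative only.
    $extra = array('¥', '…', '【', '】');
    $text = str_replace($extra, '', $content);
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8'); // &amp; becomes &, etc.
    $text = strip_tags($text);                               // remove HTML tags
    $text = preg_replace('/[[:punct:]]/', '', $text);        // ASCII punctuation
    return preg_replace('/\s/', '', $text);                  // spaces, tabs, newlines
}

// Hypothetical check: case-insensitive substring test on the normalized text.
function containsSensitiveWord($content, array $words)
{
    $normalized = normalizeContent($content);
    foreach ($words as $word) {
        if (stripos($normalized, $word) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(containsSensitiveWord('<b>s p.a-m</b>', array('spam'))); // bool(true)
var_dump(containsSensitiveWord('hello, world', array('spam')));   // bool(false)
```

Because normalization also removes the spaces between words, this sketch uses a plain substring test rather than whole-word patterns. After stripping, an English word is usually surrounded by other letters, so a whole-word pattern would no longer match it.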