Java implementation of sensitive word filtering (DFA algorithm)

Before getting started, Alan recommends the blog post at http://cmsblogs.com/?p=1031, some sections of which this post draws on.

Sensitive word filtering hardly needs much explanation. Put simply, a project must be able to detect certain words (for example, when someone enters xxoo-related words). Many projects include a sensitive word management module where you can add sensitive words. Input content is then filtered against those words and handled accordingly: by showing a prompt, by highlighting the words, or by replacing them directly with other text or symbols.

There are many ways to filter sensitive words. Here is a brief description of the ones I know:

① Load the sensitive words from the database, loop through each one, and search the input text from start to end to see whether it contains that word. If it does, handle it accordingly. This approach does work.

Advantage: so easy. It is basically trivial to write in Java.

Disadvantages: the efficiency is enough to drive you mad, and the matching is painful. With English text it gets absurd: if the letter "a" is a sensitive word and the input is an English document, how many matches does the program have to handle? Who can tell me?
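For comparison, approach ① fits in a few lines. This is a minimal sketch assuming the sensitive words have already been loaded from the database into a List<String>; the class name NaiveFilter is hypothetical. Every word triggers a full scan of the text, so the cost grows roughly with (number of words) × (text length), and a one-letter word such as "a" matches almost everywhere in English text.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveFilter {
    /**
     * Approach ①: for each sensitive word, scan the whole text with indexOf
     * and record every occurrence.
     */
    public static List<String> findAll(String text, List<String> sensitiveWords) {
        List<String> hits = new ArrayList<>();
        for (String word : sensitiveWords) {
            if (word.isEmpty()) {
                continue; // an empty word would match at every index forever
            }
            int from = 0;
            int idx;
            // keep searching after each hit so every occurrence is found
            while ((idx = text.indexOf(word, from)) >= 0) {
                hits.add(word);
                from = idx + word.length();
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // With "a" as the only sensitive word, a short sentence already yields many hits
        System.out.println(findAll("a cat and a bat", List.of("a")).size()); // 5
    }
}
```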

② The legendary DFA algorithm (deterministic finite automaton) is exactly what I want to share with you, since it is fairly widely used. Please look up the background online yourself; I won't go into detail here.

Advantage: it is at least far more efficient than the naive approach above.

Disadvantages: for anyone who has studied algorithms it is not hard to learn, and even without that background it is not hard to use; it is just a bit painful to understand. The matching efficiency is also not the best possible, and it consumes memory: the more sensitive words there are, the larger the memory usage.
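To make the memory trade-off concrete, the sketch below builds the same kind of prefix tree (trie) that the DFA dictionary uses and counts its states. The class and method names are hypothetical. Words that share a prefix share states, but every new character still costs one more map, which is why memory grows with the size of the word list.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TrieSizeDemo {
    /** One state of the automaton: outgoing transitions plus an end-of-word flag. */
    static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean isEnd;
    }

    /** Builds the trie and returns how many states it created (the root is not counted). */
    public static int countNodes(List<String> words) {
        Node root = new Node();
        int created = 0;
        for (String word : words) {
            Node cur = root;
            for (char c : word.toCharArray()) {
                Node child = cur.next.get(c);
                if (child == null) {
                    // a character not yet seen after this prefix needs a new state
                    child = new Node();
                    cur.next.put(c, child);
                    created++;
                }
                cur = child;
            }
            cur.isEnd = true;
        }
        return created;
    }

    public static void main(String[] args) {
        // "abc" and "abd" share the prefix "ab": 4 states instead of 6
        System.out.println(countNodes(List.of("abc", "abd"))); // 4
        // no shared prefix: every character gets its own state
        System.out.println(countNodes(List.of("abc", "xyz"))); // 6
    }
}
```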

③ The third option deserves special mention: write an algorithm of your own, or optimize an existing one. This is one of the highest goals Alan pursues; if anyone has ideas of their own, don't forget Alan. You can add Alan on QQ (810104041) and teach him a trick or two.

So how is the legendary DFA algorithm implemented?

Step 1: Initialize the sensitive dictionary (following the DFA principle, build the sensitive words into a dictionary and store it in a HashMap). The code is as follows:

package com.cfwx.rox.web.sysmgr.util;

import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

import com.cfwx.rox.web.common.model.entity.SensitiveWord;

/**
 * Sensitive dictionary initialization
 *
 * @author AlanLee
 */
public class SensitiveWordInit {
    /** Sensitive dictionary */
    @SuppressWarnings("rawtypes")
    public HashMap sensitiveWordMap;

    /**
     * Initialize the sensitive words
     *
     * @return the sensitive dictionary
     */
    @SuppressWarnings("rawtypes")
    public Map initKeyWord(List<SensitiveWord> sensitiveWords) {
        try {
            // Extract the words from the SensitiveWord objects into a Set (removes duplicates)
            Set<String> keyWordSet = new HashSet<String>();
            for (SensitiveWord s : sensitiveWords) {
                keyWordSet.add(s.getContent().trim());
            }
            // Build the sensitive dictionary into the HashMap
            addSensitiveWordToHashMap(keyWordSet);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return sensitiveWordMap;
    }

    /**
     * Build the sensitive dictionary
     *
     * @param keyWordSet
     */
    @SuppressWarnings({ "rawtypes", "unchecked" })
    private void addSensitiveWordToHashMap(Set<String> keyWordSet) {
        // Initialize the HashMap with an initial capacity based on the number of words
        sensitiveWordMap = new HashMap(keyWordSet.size());
        // Current sensitive word
        String key = null;
        // Current level of the dictionary while a word is being added
        Map nowMap = null;
        // New level created for a character that is not in the dictionary yet
        Map<String, String> newWorMap = null;
        // Iterate over the sensitive word set
        Iterator<String> iterator = keyWordSet.iterator();
        while (iterator.hasNext()) {
            key = iterator.next();
            // nowMap references the same HashMap object as sensitiveWordMap,
            // so changes made through nowMap also appear in sensitiveWordMap
            nowMap = sensitiveWordMap;
            for (int i = 0; i < key.length(); i++) {
                // Each character of the word is a key of the HashMap at its level
                char keyChar = key.charAt(i);
                // Check whether the character already exists at this level
                Object wordMap = nowMap.get(keyChar);
                if (wordMap != null) {
                    nowMap = (Map) wordMap;
                } else {
                    newWorMap = new HashMap<String, String>();
                    newWorMap.put("isEnd", "0");
                    nowMap.put(keyChar, newWorMap);
                    nowMap = newWorMap;
                }
                // If this is the last character of the word, mark it as an end character
                if (i == key.length() - 1) {
                    nowMap.put("isEnd", "1");
                }
                // Debug output: prints the whole dictionary after every character
                System.out.println("Dictionary build progress: " + sensitiveWordMap);
            }
            System.out.println("Sensitive dictionary data: " + sensitiveWordMap);
        }
    }
}
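SensitiveWord and the DAO belong to the author's project, so here is a self-contained sketch of the same construction that takes plain strings instead (the class name DictionaryBuilder is hypothetical). It keeps the nested-map layout used in step 1: Character keys lead to the next level, and the String key "isEnd" is "1" where a complete word ends. Blank entries are skipped.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DictionaryBuilder {

    /**
     * Builds the nested-map dictionary from plain strings. Each level maps a
     * Character to the next level; "isEnd" is "1" if the path so far spells a
     * complete sensitive word, otherwise "0".
     */
    @SuppressWarnings("unchecked")
    public static Map<Object, Object> build(List<String> words) {
        Map<Object, Object> root = new HashMap<>();
        for (String raw : words) {
            if (raw == null || raw.trim().isEmpty()) {
                continue; // skip blank entries
            }
            String word = raw.trim();
            Map<Object, Object> now = root;
            for (int i = 0; i < word.length(); i++) {
                char c = word.charAt(i);
                Map<Object, Object> next = (Map<Object, Object>) now.get(c);
                if (next == null) {
                    // first time this character follows the current prefix
                    next = new HashMap<>();
                    next.put("isEnd", "0");
                    now.put(c, next);
                }
                now = next;
                if (i == word.length() - 1) {
                    now.put("isEnd", "1"); // mark the end of a complete word
                }
            }
        }
        return root;
    }

    public static void main(String[] args) {
        Map<Object, Object> dict = build(List.of("abc", "abd"));
        System.out.println(dict);
    }
}
```

Because "isEnd" is a String and every other key is a Character, the marker cannot collide with a real character of a word.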

Step 2: Write a utility class for filtering sensitive words; you can add whatever methods you need to it. The code is as follows:

package com.cfwx.rox.web.sysmgr.util;

import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/**
 * Sensitive word filtering utility class
 *
 * @author AlanLee
 */
public class SensitivewordEngine {
    /** Sensitive dictionary */
    @SuppressWarnings("rawtypes")
    public static Map sensitiveWordMap = null;

    /** Minimum matching: stop at the shortest sensitive word */
    public static int minMatchTYpe = 1;

    /** Maximum matching: match the longest sensitive word */
    public static int maxMatchType = 2;

    /** Number of top-level entries in the dictionary (distinct first characters, not the total number of words) */
    public static int getWordSize() {
        if (SensitivewordEngine.sensitiveWordMap == null) {
            return 0;
        }
        return SensitivewordEngine.sensitiveWordMap.size();
    }

    /** Whether the text contains any sensitive word */
    public static boolean isContaintSensitiveWord(String txt, int matchType) {
        for (int i = 0; i < txt.length(); i++) {
            int matchFlag = checkSensitiveWord(txt, i, matchType);
            if (matchFlag > 0) {
                // One hit is enough; no need to scan the rest of the text
                return true;
            }
        }
        return false;
    }

    /** Get the sensitive words contained in the text */
    public static Set<String> getSensitiveWord(String txt, int matchType) {
        Set<String> sensitiveWordList = new HashSet<String>();
        for (int i = 0; i < txt.length(); i++) {
            int length = checkSensitiveWord(txt, i, matchType);
            if (length > 0) {
                // Save the detected sensitive word to the set
                sensitiveWordList.add(txt.substring(i, i + length));
                // Skip past the matched word
                i = i + length - 1;
            }
        }
        return sensitiveWordList;
    }

    /** Replace every sensitive word with replaceChar repeated to the word's length */
    public static String replaceSensitiveWord(String txt, int matchType, String replaceChar) {
        String resultTxt = txt;
        Set<String> set = getSensitiveWord(txt, matchType);
        Iterator<String> iterator = set.iterator();
        String word = null;
        String replaceString = null;
        while (iterator.hasNext()) {
            word = iterator.next();
            replaceString = getReplaceChars(replaceChar, word.length());
            // replace() matches the word literally; replaceAll() would treat it as a regex
            resultTxt = resultTxt.replace(word, replaceString);
        }
        return resultTxt;
    }

    /** Build the replacement string: replaceChar repeated length times */
    private static String getReplaceChars(String replaceChar, int length) {
        String resultReplace = replaceChar;
        for (int i = 1; i < length; i++) {
            resultReplace += replaceChar;
        }
        return resultReplace;
    }

    /**
     * Check for a sensitive word starting at beginIndex
     *
     * @return length of the matched sensitive word, or 0 if there is none
     */
    @SuppressWarnings("rawtypes")
    public static int checkSensitiveWord(String txt, int beginIndex, int matchType) {
        // Length of the last complete sensitive word found; 0 means no match
        int matchFlag = 0;
        char word = 0;
        Map nowMap = SensitivewordEngine.sensitiveWordMap;
        for (int i = beginIndex; i < txt.length(); i++) {
            word = txt.charAt(i);
            // Look up the character in the current level of the dictionary
            nowMap = (Map) nowMap.get(word);
            if (nowMap == null) {
                // No transition for this character: stop walking
                break;
            }
            // Is this character the last character of a sensitive word?
            if ("1".equals(nowMap.get("isEnd"))) {
                // Record the length of the complete word ending here
                matchFlag = i - beginIndex + 1;
                // Minimum matching stops here; maximum matching keeps looking for a longer word
                if (SensitivewordEngine.minMatchTYpe == matchType) {
                    break;
                }
            }
        }
        return matchFlag;
    }
}
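The difference between the two match types is easiest to see on a small dictionary. The following self-contained sketch (class and constant names are hypothetical) inlines a dictionary builder and the matching loop. The loop records the length of the last complete word it has seen, so that with the dictionary {ab, abcd} the text "xabcx" yields "ab" under maximum matching rather than the partial string "abc".

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MatchTypeDemo {
    public static final int MIN_MATCH = 1; // stop at the first (shortest) complete word
    public static final int MAX_MATCH = 2; // keep walking to find the longest complete word

    /** Builds the nested-map dictionary: Character keys, "isEnd" = "1" marks a word end. */
    @SuppressWarnings("unchecked")
    public static Map<Object, Object> build(List<String> words) {
        Map<Object, Object> root = new HashMap<>();
        for (String word : words) {
            if (word.isEmpty()) {
                continue;
            }
            Map<Object, Object> now = root;
            for (char c : word.toCharArray()) {
                Map<Object, Object> next = (Map<Object, Object>) now.get(c);
                if (next == null) {
                    next = new HashMap<>();
                    next.put("isEnd", "0");
                    now.put(c, next);
                }
                now = next;
            }
            now.put("isEnd", "1");
        }
        return root;
    }

    /** Length of the sensitive word starting at begin, or 0 if there is none. */
    @SuppressWarnings("unchecked")
    public static int check(Map<Object, Object> dict, String txt, int begin, int matchType) {
        Map<Object, Object> now = dict;
        int matchedLength = 0; // length of the last complete word seen
        for (int i = begin; i < txt.length(); i++) {
            now = (Map<Object, Object>) now.get(txt.charAt(i));
            if (now == null) {
                break; // no transition: the automaton rejects further input
            }
            if ("1".equals(now.get("isEnd"))) {
                matchedLength = i - begin + 1;
                if (matchType == MIN_MATCH) {
                    break;
                }
            }
        }
        return matchedLength;
    }

    /** Collects the sensitive words in txt, skipping past each match. */
    public static Set<String> find(Map<Object, Object> dict, String txt, int matchType) {
        Set<String> found = new LinkedHashSet<>();
        for (int i = 0; i < txt.length(); i++) {
            int len = check(dict, txt, i, matchType);
            if (len > 0) {
                found.add(txt.substring(i, i + len));
                i += len - 1;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<Object, Object> dict = build(List.of("ab", "abcd"));
        System.out.println(find(dict, "xabcdx", MIN_MATCH)); // [ab]
        System.out.println(find(dict, "xabcdx", MAX_MATCH)); // [abcd]
        System.out.println(find(dict, "xabcx", MAX_MATCH));  // [ab]
    }
}
```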

Step 3: With everything in place, query the sensitive words from the database and filter the text. The code is as follows:

@SuppressWarnings("rawtypes")
@Override
public Set<String> sensitiveWordFiltering(String text) {
    // Create the sensitive dictionary initializer
    SensitiveWordInit sensitiveWordInit = new SensitiveWordInit();
    // Load all sensitive word objects from the database (this is the service-layer
    // implementation class; the method called comes from the DAO layer)
    List<SensitiveWord> sensitiveWords = sensitiveWordDao.getSensitiveWordListAll();
    // Build the sensitive dictionary
    Map sensitiveWordMap = sensitiveWordInit.initKeyWord(sensitiveWords);
    // Hand the dictionary to the SensitivewordEngine class
    SensitivewordEngine.sensitiveWordMap = sensitiveWordMap;
    // Get the sensitive words; 2 selects maximum matching (the longest word at each position)
    Set<String> set = SensitivewordEngine.getSensitiveWord(text, 2);
    return set;
}
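As written, this service method reloads every word from the database and rebuilds the dictionary on each call, then stores it in a static field shared by all requests. One possible alternative, sketched here under stated assumptions, is to build the dictionary once and rebuild it only when the word list changes. CachedSensitiveWordService is hypothetical, the Supplier stands in for the DAO call, and only maximum matching is shown.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

public final class CachedSensitiveWordService {
    private final Supplier<List<String>> wordLoader;  // hypothetical stand-in for the DAO call
    private volatile Map<Object, Object> dictionary;  // replaced as a whole on reload

    public CachedSensitiveWordService(Supplier<List<String>> wordLoader) {
        this.wordLoader = wordLoader;
        reload();
    }

    /** Rebuild the dictionary, e.g. after an administrator adds or removes a word. */
    @SuppressWarnings("unchecked")
    public void reload() {
        Map<Object, Object> fresh = new HashMap<>();
        for (String word : wordLoader.get()) {
            Map<Object, Object> now = fresh;
            for (char c : word.trim().toCharArray()) {
                Map<Object, Object> next = (Map<Object, Object>) now.get(c);
                if (next == null) {
                    next = new HashMap<>();
                    now.put(c, next);
                }
                now = next;
            }
            if (now != fresh) {
                now.put("isEnd", "1"); // only non-blank words get an end marker
            }
        }
        dictionary = fresh; // publish the finished dictionary with a single write
    }

    /** Maximum matching over the cached dictionary. */
    @SuppressWarnings("unchecked")
    public Set<String> filter(String text) {
        Map<Object, Object> dict = dictionary; // read the volatile field once per call
        Set<String> found = new HashSet<>();
        for (int begin = 0; begin < text.length(); begin++) {
            Map<Object, Object> now = dict;
            int matched = 0;
            for (int i = begin; i < text.length(); i++) {
                now = (Map<Object, Object>) now.get(text.charAt(i));
                if (now == null) {
                    break;
                }
                if ("1".equals(now.get("isEnd"))) {
                    matched = i - begin + 1; // remember the longest complete word so far
                }
            }
            if (matched > 0) {
                found.add(text.substring(begin, begin + matched));
                begin += matched - 1;
            }
        }
        return found;
    }
}
```

Publishing the finished map through a single volatile write means a request sees either the old dictionary or the new one, never a half-built one.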

Last step: Write a Controller method for the front end to call; the front end receives the data it needs and processes it accordingly. The code is as follows:

/**
 * Filter sensitive words
 *
 * @param text
 * @return
 */
@RequestMapping(value = "/word/filter")
@ResponseBody
public RespVo sensitiveWordFiltering(String text) {
    RespVo respVo = new RespVo();
    try {
        Set<String> set = sensitiveWordService.sensitiveWordFiltering(text);
        respVo.setResult(set);
    } catch (Exception e) {
        throw new RoxException("An error occurred while filtering sensitive words. Please contact the maintenance personnel");
    }
    return respVo;
}
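What the front end does with the returned set is up to the application: show a prompt, highlight the words, or mask them. As one illustration, written in Java to match the rest of the article even though this step often runs in the browser, the hypothetical helper below wraps each occurrence in caller-supplied markers. It tries longer words first at each position, so "abcd" is not split by a shorter "ab", and it works in a single pass so markers are never matched again.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class HighlightDemo {
    /** Wraps every occurrence of the given words in open/close markers, longest match first. */
    public static String highlight(String text, Set<String> words, String open, String close) {
        // Sort by length, longest first, so the longest word wins at each position
        List<String> sorted = words.stream()
                .filter(w -> !w.isEmpty())
                .sorted((a, b) -> Integer.compare(b.length(), a.length()))
                .collect(Collectors.toList());
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            String hit = null;
            for (String w : sorted) {
                if (text.startsWith(w, i)) {
                    hit = w;
                    break;
                }
            }
            if (hit != null) {
                out.append(open).append(hit).append(close);
                i += hit.length(); // continue after the highlighted word
            } else {
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("xabcdx ab", Set.of("ab", "abcd"), "[", "]")); // x[abcd]x [ab]
    }
}
```

If the markers are HTML tags, keep in mind that the unmarked text is copied through unescaped and still needs HTML escaping before it reaches a browser.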

Alan has written plenty of comments in the code; I hope you can work through it on your own.