This feature, which uses regular expressions to replace blocked words in articles, has been in use for a long time. Because it was never under any load pressure, its performance was never optimized.
At my manager's request, I tested and optimized it today, and the new version turned out to be more than 100 times faster. Filtering one article used to take more than 130 milliseconds; now it takes less than 1 millisecond.
The main differences are in how the regular expression is generated and in how many times the article content is scanned.
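The slow version is not shown in this post, so the following is only a sketch of the kind of change described. It assumes the earlier code built a regex for each blocked word and scanned the article once per word, and contrasts that with a single alternation that is compiled once and applied in one pass. The class name, method names, and word list are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FilterComparison
{
    // Hypothetical word list, for illustration only.
    private static readonly string[] Words = { "foo", "bar", "baz" };

    // Assumed "before": one regex per word, one scan of the text per word.
    public static string FilterPerWord(string text)
    {
        foreach (string w in Words)
        {
            text = Regex.Replace(text, Regex.Escape(w), "***", RegexOptions.IgnoreCase);
        }
        return text;
    }

    // "After": all words joined into one alternation, compiled once.
    private static readonly Regex Combined = new Regex(
        string.Join("|", Words.Select(w => Regex.Escape(w))),
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // A single scan of the text, however many words are in the list.
    public static string FilterCombined(string text)
    {
        return Combined.Replace(text, "***");
    }
}
```

Both methods give the same result on simple input; the second one just avoids rebuilding patterns and rescanning the text for every word, which matters more as the word list grows.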
The main code is provided below for reference.
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// The class name is illustrative; the original post shows only the members.
public static class BlockWordFilter
{
    // Matches every position inside a word (between two word characters).
    private static readonly Regex reg_b = new Regex(@"\B", RegexOptions.Compiled);
    // A word that contains English letters.
    private static readonly Regex reg_en = new Regex(@"[a-zA-Z]+", RegexOptions.Compiled);
    // A word made up only of digits, hyphens, dots, and whitespace.
    private static readonly Regex reg_num = new Regex(@"^[\-\.\s\d]+$", RegexOptions.Compiled);

    // Combined regex of all blocked words, built on first use.
    private static Regex reg_word = null;

    private static Regex GetRegex()
    {
        if (reg_word == null)
        {
            reg_word = new Regex(GetPattern(), RegexOptions.Compiled | RegexOptions.IgnoreCase);
        }
        return reg_word;
    }

    /// <summary>
    /// Checks whether the input contains blocked words (returns true if it does).
    /// </summary>
    public static bool HasBlockWords(string raw)
    {
        return GetRegex().Match(raw).Success;
    }

    /// <summary>
    /// Replaces each blocked word with ***.
    /// </summary>
    public static string WordsFilter(string raw)
    {
        return GetRegex().Replace(raw, "***");
    }

    /// <summary>
    /// Gets the blocked words contained in the content.
    /// </summary>
    public static IEnumerable<string> GetBlockWords(string raw)
    {
        // Use GetRegex() rather than reg_word, which is still null before the first call.
        foreach (Match mat in GetRegex().Matches(raw))
        {
            yield return mat.Value;
        }
    }

    private static string GetPattern()
    {
        StringBuilder patt = new StringBuilder();
        string s;
        foreach (string word in GetBlockWords())
        {
            if (word.Length == 0) continue;
            if (word.Length == 1)
            {
                // Single character: match it literally.
                patt.AppendFormat("|({0})", word);
            }
            else if (reg_num.IsMatch(word))
            {
                // Numeric word: match it literally, with no separators allowed.
                patt.AppendFormat("|({0})", word);
            }
            else if (reg_en.IsMatch(word))
            {
                // English word: allow up to 3 non-letters between letters.
                s = reg_b.Replace(word, @"(?:[^a-zA-Z]{0,3})");
                patt.AppendFormat("|({0})", s);
            }
            else
            {
                // Chinese word: allow up to 3 non-Chinese characters between characters.
                s = reg_b.Replace(word, @"(?:[^\u4e00-\u9fa5]{0,3})");
                patt.AppendFormat("|({0})", s);
            }
        }
        if (patt.Length > 0)
        {
            patt.Remove(0, 1); // drop the leading '|'
        }
        return patt.ToString();
    }

    /// <summary>
    /// Gets all blocked words.
    /// </summary>
    public static string[] GetBlockWords()
    {
        // 国民党 is "Kuomintang". In practice, load this list from the database.
        return new string[] { "国民党", "fuck", "110" };
    }
}
This code replaces all of the following:
Kuomintang
State-civilian-party
Guo o Mino party
Fuck
F. U. C. K
110 (obfuscated variants of 110 are not replaced)
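To see why the obfuscated spellings above are caught, it helps to look at the pattern generated for a single word. The sketch below repeats the `\B` substitution from the code for an English word; the class and method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // Inserts an optional run of up to 3 non-letters at every position
    // inside the word, e.g. "fuck" becomes
    // f(?:[^a-zA-Z]{0,3})u(?:[^a-zA-Z]{0,3})c(?:[^a-zA-Z]{0,3})k
    public static string LoosePattern(string word)
    {
        return Regex.Replace(word, @"\B", @"(?:[^a-zA-Z]{0,3})");
    }

    // Case-insensitive test of a text against the loosened pattern.
    public static bool Hits(string word, string text)
    {
        return Regex.IsMatch(text, LoosePattern(word), RegexOptions.IgnoreCase);
    }
}
```

With this pattern, `Hits("fuck", "F. U. C. K")` is true because each gap holds only two non-letters, while a spelling with more than three separator characters in one gap is not matched. Numeric words such as 110 go into the combined pattern unchanged, which is why their variants are left alone.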