First, using regular expressions to match: the core idea

Combine all keywords into a single regular expression:

`.*(keyword1|keyword2|keyword3).*`
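As a quick illustration of this pattern shape, here is a minimal sketch (the class name `RegexFilterDemo` and the sample keywords are made up for this example; `Pattern.quote` is added here to escape regex metacharacters in keywords, which the article's code does not do):

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexFilterDemo {

    // Build ".*(kw1|kw2|kw3).*" from a keyword list.
    public static Pattern build(List<String> keywords) {
        String body = keywords.stream()
                .map(Pattern::quote) // escape metacharacters inside keywords
                .collect(Collectors.joining("|"));
        return Pattern.compile(".*(" + body + ").*");
    }

    public static void main(String[] args) {
        Pattern p = build(List.of("foo", "bar"));
        System.out.println(p.matcher("hello foo world").matches()); // prints "true"
        System.out.println(p.matcher("hello world").matches());     // prints "false"
    }
}
```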
Simulated business code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(name = "PatternControl", urlPatterns = {"/p"})
public class PatternControl extends HttpServlet {

    private static final Pattern pattern = initPattern();

    private static Pattern initPattern() {
        List<String> stringList = null;
        try {
            stringList = Files.readAllLines(Paths.get("/users/hans/documents/word.txt"));
        } catch (IOException e) {
            e.printStackTrace();
        }
        if (stringList == null || stringList.isEmpty()) {
            return null; // doPost checks for this case
        }
        // Concatenate every keyword into ".*(kw1|kw2|...).*"
        StringBuilder stringBuilder = new StringBuilder(".*(");
        stringBuilder.append(stringList.get(0));
        for (int i = 1; i < stringList.size(); i++) {
            stringBuilder.append("|").append(stringList.get(i));
        }
        stringBuilder.append(").*");
        return Pattern.compile(stringBuilder.toString());
    }

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        if (pattern == null) {
            response.sendError(500, "pattern is null");
            return;
        }
        if (request.getParameter("word") == null) {
            response.sendError(400, "word is null");
            return;
        }
        boolean isMatched = pattern.matcher(request.getParameter("word")).matches();
        // Non-standard status codes used as match/no-match signals in this demo
        if (isMatched) {
            response.sendError(209);
        } else {
            response.sendError(409);
        }
    }

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        doPost(request, response);
    }
}
```
Setup for the time and space measurements

28,448 keywords in total, compiled into the regular expression above.

| Hardware | Spec |
|---|---|
| CPU | 2.2 GHz Intel i7, four cores |
| Memory | 16 GB DDR3 |
Time (average over multiple runs)

| Stage | Time (ms) |
|---|---|
| Initialization | Read sensitive words: 38; compile regular expression: 41 |
| Each match | 47 |
Space (average over multiple runs)

| Stage | Memory consumption (MB) |
|---|---|
| Initialization (compiling the regular expression) | 11 |
| Each match | negligible |
(Figure: CPU and heap usage at runtime)
Conclusion

Filtering sensitive words with a regular expression works reasonably well, though each match costs about 47 ms.
Second, using brute-force string matching: the core idea

Loop over all keywords and check whether the string to be matched contains the current keyword.
Simulated business code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(name = "PatternControl", urlPatterns = {"/p"})
public class PatternControl extends HttpServlet {

    private static final List<String> stringList = initStringList();

    private static List<String> initStringList() {
        List<String> stringList = null;
        try {
            stringList = Files.readAllLines(Paths.get("/users/hans/documents/word.txt"));
        } catch (IOException e) {
            e.printStackTrace();
        }
        return stringList;
    }

    // Check every keyword against the input text
    private boolean matchKeyword(String text) {
        for (String markWord : stringList) {
            if (text.contains(markWord)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        if (request.getParameter("word") == null) {
            response.sendError(400, "word is null");
            return;
        }
        boolean isMatched = matchKeyword(request.getParameter("word"));
        // Non-standard status codes used as match/no-match signals in this demo
        if (isMatched) {
            response.sendError(209);
        } else {
            response.sendError(409);
        }
    }

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        doPost(request, response);
    }
}
```
Time and space measurements

Time (average over multiple runs)

| Stage | Time (ms) |
|---|---|
| Initialization | Read sensitive words: 38 |
| Each match | 10 |
Space (average over multiple runs)

| Stage | Memory consumption (MB) |
|---|---|
| Initialization | 3 |
| Each match | negligible |
Conclusion

Brute-force matching performs better: it is faster per match and uses less memory than the regular expression.
Third, using a trie to match: the core idea

Build all the sensitive words into a trie in which each node holds a single character.
Simulated business code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TestAcWithoutFail {

    // Insert one keyword into the trie, one character per node
    public void insert(String str, Node root) {
        Node p = root;
        for (char c : str.toCharArray()) {
            if (p.childrenNodes.get(c) == null) {
                p.childrenNodes.put(c, new Node());
            }
            p = p.childrenNodes.get(c);
        }
        p.isEnd = true;
    }

    // Try to walk the trie starting from every position in the text
    public boolean isContainSensitiveWord(String text, Node root) {
        for (int i = 0; i < text.length(); i++) {
            Node nowNode = root;
            for (int j = i; j < text.length(); j++) {
                char word = text.charAt(j);
                nowNode = nowNode.childrenNodes.get(word);
                if (nowNode != null) {
                    if (nowNode.isEnd) {
                        return true;
                    }
                } else {
                    break;
                }
            }
        }
        return false;
    }

    // Same walk, but return the first sensitive word found (or "" if none)
    public String containSensitiveWord(String text, Node root) {
        for (int i = 0; i < text.length(); i++) {
            Node nowNode = root;
            for (int j = i; j < text.length(); j++) {
                char word = text.charAt(j);
                nowNode = nowNode.childrenNodes.get(word);
                if (nowNode != null) {
                    if (nowNode.isEnd) {
                        return text.substring(i, j + 1);
                    }
                } else {
                    break;
                }
            }
        }
        return "";
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> stringList = Files.readAllLines(Paths.get("/users/hans/documents/word.txt"));
        TestAcWithoutFail acNoFail = new TestAcWithoutFail();
        Node root = new Node();
        for (String text : stringList) {
            acNoFail.insert(text, root);
        }
        String string = "tiresdfsdffffffffffaaaaaaaaaa";
        Thread.sleep(10 * 1000); // pause so a profiler can attach before measuring
        for (int i = 0; i < 1; i++) { // raise the loop count to repeat the measurement
            System.out.println(acNoFail.isContainSensitiveWord(string, root));
        }
    }
}

class Node {
    Map<Character, Node> childrenNodes = new HashMap<>();
    boolean isEnd;
}
```
Time and space measurements

Time (average over multiple runs)

| Stage | Time (ms) |
|---|---|
| Initialization | Read sensitive words: 38; build the trie: 36 |
| Each match | on the order of 0.01 (1,000 runs: 34 ms; 10,000 runs: 130 ms) |
Space (average over multiple runs)

| Stage | Memory consumption (MB) |
|---|---|
| Initialization | Read strings: 3; build the trie: 24 |
| Each match | negligible |
Conclusion

For this business, whether the trie's speed advantage is worth its extra memory compared with brute-force matching is well worth discussing.
Fourth, service deployment in practice: a senior colleague's approach

- Deploy it on a separate server
- Store the keywords in a database table
- Run a scheduled task that re-initializes once per day
My plan
- Store the keywords in a database table
- Run a scheduled task that re-initializes once a day (in fact a daily refresh feels wasteful; it would be enough to expose an interface and call it whenever the table data changes)
- Have each of the multiple machines read the database once a day, which solves the synchronization problem (the data could also be cached with Tair, but that hardly seems necessary here)
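The plan above could be sketched roughly as follows. This is a minimal sketch under assumptions: `KeywordCache`, `refresh`, and the `loadFromDatabase` supplier are hypothetical names standing in for the real keyword-table query and the "table changed" notification interface, none of which appear in the article:

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class KeywordCache {

    // Latest keyword snapshot; readers always see a consistent list
    private final AtomicReference<List<String>> keywords = new AtomicReference<>(List.of());
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // `loadFromDatabase` stands in for the real keyword-table query
    public KeywordCache(Supplier<List<String>> loadFromDatabase) {
        keywords.set(loadFromDatabase.get()); // initial load on startup
        scheduler.scheduleAtFixedRate(
                () -> keywords.set(loadFromDatabase.get()), // daily refresh
                1, 1, TimeUnit.DAYS);
    }

    // Also callable directly from a "table changed" notification endpoint
    public void refresh(List<String> latest) {
        keywords.set(latest);
    }

    public List<String> current() {
        return keywords.get();
    }

    public void shutdown() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        KeywordCache cache = new KeywordCache(() -> List.of("badword"));
        System.out.println(cache.current()); // prints "[badword]"
        cache.shutdown();
    }
}
```

Swapping the `AtomicReference` for a rebuilt trie or keyword list keeps readers lock-free while a refresh replaces the whole snapshot in one step.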