Filter sensitive Word mode

Source: Internet
Author: User

The use of regular expression key regular expression

. * (keywords 1| keywords 2| keyword 3). *

Simulating business code
@WebServlet (name = "Patterncontrol", Urlpatterns = {"/P"}) public class Patterncontrol extends HttpServlet {private STA    Tic final Pattern pattern = Initpattern ();        private static Pattern Initpattern () {list<string> stringlist = null;        try {stringlist = Files.readalllines (Paths.get ("/users/hans/documents/word.txt"));        } catch (IOException e) {e.printstacktrace ();        } StringBuilder StringBuilder = new StringBuilder (". * (");        Stringbuilder.append (stringlist.get (0));        for (int i = 1; i < stringlist.size (); i++) {Stringbuilder.append ("|" + Stringlist.get (i));        } stringbuilder.append ("). *");        Pattern pattern = Pattern.compile (stringbuilder.tostring ());    return pattern; } protected void DoPost (HttpServletRequest request, httpservletresponse response) throws Servletexception, IOException {if (pattern = = null) {Response.senderror ("pattern is null");            Return            } if (Request.getparameter ("word") = = null) {Response.senderror ("word is null");        Return        } Boolean ismatcher = Pattern.matcher (Request.getparameter ("word")). Matches ();        if (ismatcher) {response.senderror (209);        } else {response.senderror (409); }} protected void Doget (HttpServletRequest request, httpservletresponse response) throws Servletexception, ioexcept    Ion {DoPost (request, response); }}
The premise of time space occupancy

A total of 28,448 keywords, compiled into the above regular expression

CPU 2.2GHz Intel i7 four core
Memory

16GB MHz DDR3

Time (average results of multiple experiments)
Stage Time-consuming (ms)
Initialization

Read sensitive words: 38

Compile regular expression: 41

Each match 47
Space conditions (multiple experimental average results)
Stage memory Consumption (MB)
Initializing compiled regular expressions 11
Each match Minimal

CPU and heap Runtime scenario diagram

Conclusion

The effect of filtering sensitive words with regular expressions is better

Second, using string violence to match the core idea

Loop A total number of keywords several times, determine whether the string to be matched to include the current cycle of the keyword

Simulating business code
@WebServlet (name = "Patterncontrol", Urlpatterns = {"/P"}) public class Patterncontrol extends HttpServlet {private STA    Tic final list<string> stringlist = Initstringlist ();        private static list<string> initstringlist () {list<string> stringlist = null;        try {stringlist = Files.readalllines (Paths.get ("/users/hans/documents/word.txt"));        } catch (IOException e) {e.printstacktrace ();    } return stringlist; The private Boolean Matchkeyword (string text) {for (string markword:stringlist) {if (Text.contains (            Markword)) {return true;    }} return false; } protected void DoPost (HttpServletRequest request, httpservletresponse response) throws Servletexception, IOException            {if (Request.getparameter ("word") = = null) {Response.senderror ("word is null");        Return } Boolean ismatcher = Matchkeyword (Request.GetParameter ("word"));        if (ismatcher) {response.senderror (209);        } else {response.senderror (409); }} protected void Doget (HttpServletRequest request, httpservletresponse response) throws Servletexception, ioexcept    Ion {DoPost (request, response); }}

Time Space occupancy time (multiple experimental average results)

Stage Time-consuming (ms)
Initialization

Read sensitive words: 38

Each match 10

Space conditions (multiple experimental average results)

Stage memory Consumption (MB)
Initialization 3
Each match Minimal
Conclusion

Better use of violent matching

Third, using tire tree to match the core idea

Make all the sensitive words a tree of trees with only one character per node

Simulating business code

public class Testacwithoutfail {public void Insert (String str, node root) {node p = root; for (char C:str.tochararray ()) {if (P.childrennodes.get (c) = = null) {P.childrennodes.put (c,            New Node ());        } p = P.childrennodes.get (c);    } P.isend = true;            public boolean Iscontainsensitiveword (String text, Node root) {for (int i = 0; i < text.length (); i++) {            Node Nownode = root;                for (int j = i; J < Text.length (); j + +) {char word = Text.charat (j);                Nownode = NowNode.childrenNodes.get (word);                    if (Nownode! = null) {if (nownode.isend) {return true;                }} else {break;    }}} return false; }  public string Containsensitiveword (string text, Node root) {for (int i = 0; i < text.length (); i++) {Node nownode = root;            for (int j = i; J < Text.length (); j + +) {char word = Text.charat (j);            Nownode = NowNode.childrenNodes.get (word);                if (Nownode! = null) {if (nownode.isend) {return text.substring (i, j + 1);           }} else {break;    }}} ' return ' ";} public static void Main (string[] args) throws IOException, interruptedexception {list<string> stringlist = F        Iles.readalllines (Paths.get ("/users/hans/documents/word.txt"));        Testacwithoutfail acnofail = new Testacwithoutfail ();        Node root = new node ();        for (String text:stringlist) {Acnofail.insert (text, root);        } string string = "Tiresdfsdffffffffffaaaaaaaaaa";        Thread.Sleep (10 * 1000);        for (int i = 0; i < 1; i++) {System.out.println (Acnofail.iscontainsensitiveword (string, root)); }}}class NOde {map<character, node> childrennodes = new hashmap<> (); Boolean isend;}
Time Space occupancy time (multiple experimental average results)
Stage Time-consuming (ms)
Initialization

Read sensitive words: 38

Build comparison tree: 36

Each match 0.01 magnitude (executes 1000 times 34ms, executes 10,000 times 130ms)
Space conditions (multiple experimental average results)

Stage memory Consumption (MB)
Initialization

Read string: 3

Build comparison tree: 24

Each match Minimal
Conclusion

In this business and the violent match effect which is well worth discussing

Iv. The practice of the Service Deployment mode master (Lawless that way)
    • Deploy one server separately
    • Put keywords in a table
    • Set the scheduled task to initialize once per day
My plan
    • Put keywords in a table
    • Set the scheduled task to initialize once a day (in fact, feel the daily update is quite wasteful, write an interface, the table after the data changed, call it can be)
    • Each machine on multiple machines reads the database once a day to solve the problem of synchronization (or to cache the data with tair, but I feel like it's not a problem)

Filter sensitive Word mode

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.