Dirty-word dictionary filtering: filtering dirty data with regular expressions

Source: Internet
Author: User

Method One: Use regular expressions

using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Web;

// Members of the ValidDirty class

// Path of the file that stores the dirty-word dictionary
private static string file_name = "Zang.txt";
// Dirty-word dictionary as one string, such as: dirty word one|dirty word two|dirty word three
public static string dirtyStr = "";

public ValidDirty()
{
    if (HttpRuntime.Cache["Regex"] == null)
    {
        dirtyStr = ReadDic();
        // Regular expression that matches only text containing no dictionary word
        Regex validateReg = new Regex("^((?!" + dirtyStr + ").(?<!" + dirtyStr + "))*$", RegexOptions.Compiled | RegexOptions.ExplicitCapture);
        HttpRuntime.Cache.Insert("Regex", validateReg, null, DateTime.Now.AddMinutes(/* minutes value missing in the source */), TimeSpan.Zero);
    }
}

private string ReadDic()
{
    // Build the full path in a local variable so the static field
    // does not grow again each time this method is called
    string path = Path.Combine(Environment.CurrentDirectory, file_name);

    if (!File.Exists(path))
    {
        Console.WriteLine("{0} does not exist.", path);
        return "";
    }
    StreamReader sr = File.OpenText(path);
    string input = "";
    while (sr.Peek() > -1)
    {
        // The dictionary is expected on a single line;
        // if the file has several lines, only the last one is kept
        input = sr.ReadLine();
    }

    sr.Close();
    return input;
}

public bool ValidByReg(string str)
{
    // Same cache key as in the constructor
    Regex reg = (Regex)HttpRuntime.Cache["Regex"];
    return reg.IsMatch(str);
}

This method does not seem very efficient. In a simple test on a 1,000-character article, with a dirty-word dictionary of more than 800 keywords,
the regular expression took 1.238 seconds. If anyone has a better approach, please share it!
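
One alternative worth testing, offered as a sketch under assumptions rather than a proven fix: instead of the lookaround pattern, which has to check the whole dictionary at every character position, join the escaped keywords into a single alternation and ask whether any keyword occurs at all. The class name DirtyWordRegex and its method names are hypothetical and do not come from the post, and no timing is claimed for this version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original post.
public static class DirtyWordRegex
{
    // Builds one regex of the form "w1|w2|w3" from the dictionary entries.
    // Regex.Escape keeps characters such as '.' or '*' inside a keyword literal.
    public static Regex Build(IEnumerable<string> words)
    {
        string[] escaped = words
            .Where(w => !string.IsNullOrEmpty(w))   // skip blank entries
            .Select(Regex.Escape)
            .ToArray();
        if (escaped.Length == 0)
            throw new ArgumentException("The dictionary is empty.", nameof(words));
        return new Regex(string.Join("|", escaped), RegexOptions.Compiled);
    }

    // Returns true when the text contains none of the dictionary words,
    // the same meaning as the validation method in the post.
    public static bool IsClean(Regex dirtyRegex, string text)
    {
        return !dirtyRegex.IsMatch(text);
    }
}
```

The regex would be built once (for example from the dictionary line split on '|') and reused for every article, just as the post caches its own regex.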

Method Two: Plain loop search

public bool ValidGeneral(string str)
{
    if (!File.Exists(file_name))
    {
        Console.WriteLine("The file path is wrong or the file does not exist.");
        return false;
    }
    else
    {
        // The dictionary file is read with the GB2312 encoding
        StreamReader objReader = new StreamReader(file_name, System.Text.Encoding.GetEncoding("gb2312"));
        string sLine = "";
        ArrayList arrText = new ArrayList();

        // Collect every line of the dictionary file
        while (sLine != null)
        {
            sLine = objReader.ReadLine();
            if (sLine != null)
                arrText.Add(sLine);
        }
        objReader.Close();

        foreach (string sOutput in arrText)
        {
            // Each line holds keywords separated by '|'
            string[] strArr = sOutput.Split('|');

            for (int i = 0; i < strArr.Length; i++)
            {
                // Skip empty entries: IndexOf("") returns 0 and would reject every text
                if (strArr[i].Length > 0 && str.IndexOf(strArr[i]) != -1)
                {
                    return false;
                }
            }
        }
        return true;
    }
}
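
Method Two opens and parses the dictionary file on every call. A minimal sketch of the same loop search with the dictionary parsed once and kept in memory is shown here; the class name DirtyWordList and its members are hypothetical, and the ordinal comparison is an assumption about the matching that is wanted (exact character-by-character matching, no culture rules).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original post.
public class DirtyWordList
{
    private readonly List<string> words = new List<string>();

    // Parses dictionary lines of the form "word1|word2|word3" once.
    public DirtyWordList(IEnumerable<string> lines)
    {
        foreach (string line in lines)
        {
            foreach (string word in line.Split('|'))
            {
                if (word.Length > 0)   // an empty entry would match every text
                    words.Add(word);
            }
        }
    }

    // Convenience loader; the caller supplies the path and the encoding.
    public static DirtyWordList FromFile(string path, System.Text.Encoding encoding)
    {
        return new DirtyWordList(File.ReadAllLines(path, encoding));
    }

    // Returns true when the text contains none of the words.
    public bool IsClean(string text)
    {
        foreach (string word in words)
        {
            if (text.IndexOf(word, StringComparison.Ordinal) != -1)
                return false;
        }
        return true;
    }
}
```

An instance created once can then be reused for each article, so the file cost is paid a single time rather than per check.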

The following is the test code. If you see any problems with it, please point them out!

DateTime t1 = DateTime.Now;
string str = "213";
// The original appended the same sample sentence 50 times, one statement per line
for (int i = 0; i < 50; i++)
{
    str += "Cherish crystal Love cherish crystal Love cherish crystal Love";
}
ValidDirty vd = new ValidDirty();
Console.WriteLine(vd.ValidByReg(str));
DateTime t2 = DateTime.Now;
TimeSpan ts = t2 - t1;
Console.WriteLine(ts.TotalMilliseconds);
Console.Read();

Algorithm             Length of text searched    Time taken (ms)

Regular expression    10 Chinese characters      980
Regular expression    100 Chinese characters     999
Regular expression    1000 Chinese characters    1234

Plain loop search     10 Chinese characters      234
Plain loop search     100 Chinese characters     234
Plain loop search     1000 Chinese characters    265
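
The timings hardly change between 10 and 1000 characters, which suggests (as a guess; the post does not confirm it) that most of the measured time is fixed setup cost. The test code starts its clock before the validator object is created, so reading the dictionary file and building the compiled regex are included in every figure. A sketch that times setup and matching separately with Stopwatch could look like this; TimingSketch, Measure, and its parameters are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical benchmark helper, not part of the original post.
public static class TimingSketch
{
    // Returns { setup milliseconds, average matching milliseconds per call }.
    public static double[] Measure(string pattern, string text, int repeats)
    {
        if (repeats <= 0)
            throw new ArgumentOutOfRangeException(nameof(repeats));

        Stopwatch setup = Stopwatch.StartNew();
        Regex regex = new Regex(pattern, RegexOptions.Compiled);
        regex.IsMatch("");   // warm-up call so any lazy one-time work lands in the setup figure
        setup.Stop();

        Stopwatch match = Stopwatch.StartNew();
        for (int i = 0; i < repeats; i++)
            regex.IsMatch(text);
        match.Stop();

        return new[]
        {
            setup.Elapsed.TotalMilliseconds,
            match.Elapsed.TotalMilliseconds / repeats
        };
    }
}
```

Running it with the real dictionary pattern and the 10, 100, and 1000 character inputs would show how much of each figure in the table is matching time and how much is setup.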


