The original article is at http://www.jb51.net/article/20575.htm
In my own tests, however, the regex was the faster approach, by a factor of about 1.6. I was still not satisfied: our site checks against a large list of dirty words, so filtering speed has a real impact. After some thought, I wrote my own algorithm. On my machine, using the dirty-word list from the original article, a target string of length 0x19c (412 characters), and 1000 iterations, plain text lookup took 1933.47 ms, the regex took 1216.719 ms, and my algorithm took only 244.125 ms.
Update: a BitArray is now used to check whether a char appears in any dirty word at all. This cut the total time from 244 ms to 34 ms.
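The timings above come from running each approach 1000 times over the same string. A comparison of that kind could be set up roughly as in the sketch below. The class name `FilterBenchmark`, the two-word list, and the sample string are hypothetical stand-ins (the article reads its word list from a file), and the numbers it prints will differ from the ones quoted here.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class FilterBenchmark
{
    // Hypothetical word list; the article loads its list from a file.
    static readonly string[] BadWords = { "foo", "bar" };

    // Regex approach: a single alternation of the escaped words.
    static readonly Regex Pattern = new Regex(
        string.Join("|", Array.ConvertAll(BadWords, Regex.Escape)),
        RegexOptions.Compiled);

    // Plain text lookup: one string.Replace pass per dirty word.
    public static string ReplaceByLookup(string text)
    {
        foreach (string word in BadWords)
        {
            text = text.Replace(word, "***");
        }
        return text;
    }

    public static string ReplaceByRegex(string text)
    {
        return Pattern.Replace(text, "***");
    }

    // Runs the filter 'iterations' times and returns the elapsed milliseconds.
    public static double Time(Func<string, string> filter, string text, int iterations)
    {
        filter(text); // warm-up call so JIT compilation is not measured
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            filter(text);
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    static void Main()
    {
        string sample = "some foo text with bar inside";
        Console.WriteLine("lookup: {0} ms", Time(ReplaceByLookup, sample, 1000));
        Console.WriteLine("regex:  {0} ms", Time(ReplaceByRegex, sample, 1000));
    }
}
```

Passing each filter as a `Func<string, string>` keeps the timing loop identical for every approach, so a third filter can be measured with the same `Time` call.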
The main algorithm is as follows:
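// Requires: using System; using System.Collections; using System.Collections.Generic;
// The type arguments were lost from the original listing; <string, object> is assumed here.
private static Dictionary<string, object> dic = new Dictionary<string, object>();
// One bit per possible char value (0 through char.MaxValue inclusive).
private static BitArray fastCheck = new BitArray(char.MaxValue + 1);
// Length of the longest dirty word.
private static int maxLength = 0;
static void Prepare()
{
    string[] badwords = // read from file
    foreach (string word in badwords)
    {
        if (word.Length > 0 && !dic.ContainsKey(word))
        {
            dic.Add(word, null);
            maxLength = Math.Max(maxLength, word.Length);
            // Mark the first char of the word for the fast pre-check.
            fastCheck[word[0]] = true;
        }
    }
}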
Usage:
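// Requires: using System.Text;
// target is the input string; sb starts as a copy of it and is edited in place.
StringBuilder sb = new StringBuilder(target);
int index = 0;
int offset = 0; // how far positions in sb have shifted relative to target
while (index < target.Length)
{
    if (!fastCheck[target[index]])
    {
        // Skip ahead to the next char that can start a dirty word.
        while (index < target.Length - 1 && !fastCheck[target[++index]]) ;
    }
    // j is the candidate length, from 1 up to the longest dirty word.
    for (int j = 1; j <= Math.Min(maxLength, target.Length - index); j++)
    {
        string sub = target.Substring(index, j);
        if (dic.ContainsKey(sub))
        {
            sb.Replace(sub, "***", index + offset, j);
            offset += 3 - j; // "***" may be shorter or longer than the match
            index += j - 1;  // the index++ below steps past the last matched char
            break;
        }
    }
    index++;
}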