AC state machine

Source: Internet
Author: User

1. Overview
The Aho-Corasick automaton algorithm (AC automaton) was developed at Bell Labs in 1975. It uses a finite automaton to replace character-by-character comparison with state transitions. The algorithm has two key properties: it scans the text without backtracking, and its scanning time is O(n) in the text length n (apart from the time spent reporting matches), independent of the number and length of the keywords.

Let's first look at the most basic multi-pattern matching algorithm:

Main string T, n = strlen(T).

Pattern strings P[k] for k = 0 ... K-1, m[k] = strlen(P[k]).

for (i = 0; i + min_m <= n; ++i)      /* min_m: length of the shortest pattern */
    for (k = 0; k < K; ++k)           /* try every pattern at position i */
        if (m[k] <= n - i && memcmp(&T[i], P[k], m[k]) == 0)
            printf("match\n");

Its time complexity is O(Mn), where M is the total length of all pattern strings.
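For reference, the naive approach can be written as a complete function. This is only a minimal sketch: the name naive_search and its signature are assumptions made for illustration and do not come from the original post.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Naive multi-pattern search: try every pattern at every text offset.
 * Prints each match and returns the number of matches found.
 * (naive_search is a hypothetical helper name.) */
int naive_search(const char *text, const char *const pats[], int k)
{
    assert(text != NULL && pats != NULL);

    size_t n = strlen(text);
    int matches = 0;

    for (size_t i = 0; i < n; ++i) {
        for (int j = 0; j < k; ++j) {
            size_t m = strlen(pats[j]);
            /* The pattern must fit in the remaining text before comparing. */
            if (m <= n - i && memcmp(text + i, pats[j], m) == 0) {
                printf("match: \"%s\" at offset %zu\n", pats[j], i);
                ++matches;
            }
        }
    }
    return matches;
}
```

Calling naive_search("ushers", pats, 4) with the patterns he, she, his, and hers reports three matches. Every text position is compared against every pattern, which is where the O(Mn) cost comes from.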

The algorithm above is naive. Let's see how the smarter AC algorithm works.

2. The idea behind the AC algorithm
The idea of the AC algorithm: build a deterministic, tree-shaped finite state machine from the pattern strings, then feed the main string to it as input to drive the state transitions. Reaching certain states indicates that a pattern has matched.

The deterministic finite state machine built from the patterns he/she/his/hers looks like this:

 

1. The state machine first follows the transitions drawn as solid lines. When no solid-line transition applies to the current input, it follows the dotted-line transition instead. For example, in state 0, input h leads to state 1 and input s leads to state 3; any other input leaves the machine in state 0.

2. Matching proceeds as follows: start from state 0 and feed the main string to the machine as input. For example, if the main string is ushers, the state transitions are as follows:

 

3. Reaching one of the red states (2, 5, 7, or 9) indicates that a pattern has matched.

For example, if the main string is ushers, matches occur at state 5, at state 2 (reached from state 5 through the dotted failure transition), and at state 9. The matched pattern strings are she, he, and hers.
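The walk over ushers can be replayed in code. The sketch below hard-codes the transitions of the he/she/his/hers machine using the state numbers mentioned in the text (0 to 9, with patterns ending at 2, 5, 7, and 9). The tables follow the standard textbook numbering for this example and are an assumption, since the original figure is not available; the names goto_fn, fail_fn, ends_here, and scan are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FAIL     (-1)
#define N_STATES 10

/* Solid-line transitions of the he/she/his/hers machine (assumed numbering). */
static int goto_fn(int state, char c)
{
    switch (state) {
    case 0: return c == 'h' ? 1 : c == 's' ? 3 : 0;  /* state 0 never fails */
    case 1: return c == 'e' ? 2 : c == 'i' ? 6 : FAIL;
    case 2: return c == 'r' ? 8 : FAIL;
    case 3: return c == 'h' ? 4 : FAIL;
    case 4: return c == 'e' ? 5 : FAIL;
    case 6: return c == 's' ? 7 : FAIL;
    case 8: return c == 's' ? 9 : FAIL;
    default: return FAIL;  /* states 5, 7, 9 have no solid-line exits */
    }
}

/* Dotted-line (failure) transitions, indexed by state. */
static const int fail_fn[N_STATES] = {0, 0, 0, 0, 1, 2, 0, 3, 0, 3};

/* Pattern that ends exactly at each state, or NULL if none does. */
static const char *const ends_here[N_STATES] = {
    NULL, NULL, "he", NULL, NULL, "she", NULL, "his", NULL, "hers"
};

/* Scans text, stores the state reached after each character in trace[],
 * prints every match, and returns the number of matches. */
int scan(const char *text, int trace[])
{
    assert(text != NULL && trace != NULL);

    int state = 0, matches = 0;
    for (size_t i = 0; text[i] != '\0'; ++i) {
        /* Follow dotted lines until a solid-line transition exists. */
        while (goto_fn(state, text[i]) == FAIL)
            state = fail_fn[state];
        state = goto_fn(state, text[i]);
        trace[i] = state;

        /* Report patterns ending here and at states reachable by failure links. */
        for (int s = state; s != 0; s = fail_fn[s]) {
            if (ends_here[s] != NULL) {
                printf("state %d: matched \"%s\"\n", s, ends_here[s]);
                ++matches;
            }
        }
    }
    return matches;
}
```

Under these assumed tables, scanning ushers visits states 0, 3, 4, 5, 8, 9 and reports she (state 5), he (state 2, via the failure link of state 5), and hers (state 9).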

Definition:

In the preprocessing stage, the AC algorithm builds three functions: the goto function, the failure function, and the output function. Together they define a tree-shaped finite automaton.

The goto function describes the transitions between states: g(pre, x) = next means that state pre moves to state next when character x is read. If no such transition exists in the pattern strings, next = failstate.

The failure function also describes transitions between states: f(pre) = next is the transition taken when a comparison mismatches. While the goto function is being built, nonexistent transitions are represented by failstate, but failstate is not an actual state, and a machine that lands in failstate has nowhere to go. We therefore need a meaningful state in the machine to stand in for failstate, so that whenever failstate arises the machine switches to that state automatically.

That state must have the following property: the string of input characters read going up from it to the root (state 0) matches the beginning of the string read going up from the state that produced failstate. Equivalently, the path from the root to that state spells a proper suffix of the path from the root to the failing state. Among all states with this property, it is the one of greatest depth. If no state qualifies, the failure function is 0.

For example, any input character in state 9 produces failstate, so the failure function must be consulted. The string read going up from state 3 to state 0 is s, and the string read going up from state 9 to state 0 is sreh. Both begin with s, and state 3 is the only state that meets the condition, so

f(9) = 3.
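The definition can be checked directly on strings, without building the automaton. The sketch below takes the path string of a state (for state 9 that is hers) and returns the depth of its failure state, which is the length of the longest proper suffix of the path that is also a prefix of some pattern. The function name failure_depth is hypothetical, and this brute-force form is only meant to illustrate the definition, not how real implementations compute it.

```c
#include <assert.h>
#include <string.h>

/* Returns the depth of the failure state for the state whose root path is
 * `path`: the length of the longest proper suffix of `path` that is a prefix
 * of at least one pattern. A result of 0 means the failure state is state 0.
 * (failure_depth is a hypothetical helper name.) */
size_t failure_depth(const char *path, const char *const pats[], int k)
{
    assert(path != NULL && pats != NULL);

    size_t len = strlen(path);
    /* Try longer suffixes first so that the deepest candidate wins. */
    for (size_t start = 1; start < len; ++start) {
        size_t slen = len - start;
        for (int j = 0; j < k; ++j) {
            /* strncmp stops at a pattern's terminator, so shorter patterns
             * simply compare unequal. */
            if (strncmp(pats[j], path + start, slen) == 0)
                return slen;
        }
    }
    return 0;
}
```

For the path hers the result is 1, because s is the only proper suffix that starts a pattern (she). The state at depth 1 spelling s is state 3, which agrees with f(9) = 3.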

Put simply, the failure function does the following:

 

That is, when pattern string 1 mismatches at position i, find a pattern string 2 such that P2[0...j-1] = P1[i-j+1...i], then continue the comparison with pattern string 2. Does this remind you of anything? Yes, it is the KMP algorithm. Some describe the AC algorithm as an extension of KMP to multi-pattern matching.

The output function describes the relationship between states and pattern strings. output(i) = {P} means that when the state machine reaches state i, every pattern string in the set {P} has just completed a match.

Example:

When the pattern strings are he/she/hers/his, the three functions are:

Goto function:

 

Failure function:

 

Output function:

 
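All three functions for this example can also be computed programmatically. The sketch below builds the goto function as a trie, derives the failure function with a breadth-first traversal, and merges output sets along failure links. It is an illustration under stated assumptions rather than the Snort code: the names g, f, out, n_states, and build are hypothetical, output sets are stored as bitmasks (bit j stands for pattern j), and the patterns are inserted in the order he, she, his, hers so that the state numbers agree with those used in the text.

```c
#include <assert.h>
#include <string.h>

#define MAX_STATES 32
#define ALPHABET   256
#define FAIL       (-1)

int g[MAX_STATES][ALPHABET];  /* goto function */
int f[MAX_STATES];            /* failure function */
unsigned out[MAX_STATES];     /* output function: bit j set = pattern j ends here */
int n_states;

/* Builds g, f, and out for k patterns (k <= 32). States are numbered in
 * the order they are created while inserting the patterns. */
void build(const char *const pats[], int k)
{
    int queue[MAX_STATES], head = 0, tail = 0;

    for (int s = 0; s < MAX_STATES; ++s) {
        for (int c = 0; c < ALPHABET; ++c)
            g[s][c] = FAIL;
        f[s] = 0;
        out[s] = 0;
    }
    n_states = 1;  /* state 0 is the root */

    /* Phase 1: goto function, built as a trie of the patterns. */
    for (int j = 0; j < k; ++j) {
        int s = 0;
        for (const unsigned char *p = (const unsigned char *)pats[j]; *p; ++p) {
            if (g[s][*p] == FAIL) {
                assert(n_states < MAX_STATES);
                g[s][*p] = n_states++;
            }
            s = g[s][*p];
        }
        out[s] |= 1u << j;
    }
    /* State 0 loops on every character that starts no pattern. */
    for (int c = 0; c < ALPHABET; ++c)
        if (g[0][c] == FAIL)
            g[0][c] = 0;

    /* Phase 2: failure function, computed breadth-first from the root. */
    for (int c = 0; c < ALPHABET; ++c)
        if (g[0][c] != 0)
            queue[tail++] = g[0][c];  /* depth-1 states fail to state 0 */
    while (head < tail) {
        int r = queue[head++];
        for (int c = 0; c < ALPHABET; ++c) {
            int s = g[r][c];
            if (s == FAIL)
                continue;
            int t = f[r];
            while (g[t][c] == FAIL)   /* ends at state 0 at the latest */
                t = f[t];
            f[s] = g[t][c];
            out[s] |= out[f[s]];      /* inherit matches from the failure state */
            queue[tail++] = s;
        }
    }
}
```

With this insertion order, build produces ten states. The failure values for states 1 through 9 come out as 0, 0, 0, 1, 2, 0, 3, 0, 3, and the output sets are {he} at state 2, {she, he} at state 5, {his} at state 7, and {hers} at state 9.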

3. AC code analysis
The following analysis refers to the acsmx.c file of Snort, the open-source intrusion detection system.

3.1 Data Structure Analysis

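All states are stored in an array of ACSM_STATETABLE entries.

typedef struct {
    int NextState[ALPHABET_SIZE];  /* goto function: next state for each input character */
    int FailState;                 /* failure function: state to fall back to on a mismatch */
    ACSM_PATTERN *MatchList;       /* output function: patterns that end at this state */
} ACSM_STATETABLE;

NextState corresponds to the goto function, FailState to the failure function, and MatchList to the output function.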

3.2 Code analysis

The code flow is as follows:

 

Reprinted from: http://blog.csdn.net/sealyao/archive/2009/09/16/4560427.aspx
