I. Finite automaton definition and basic terminology:
A finite automaton M is a 5-tuple (Q, q0, A, Σ, δ), where:
- Q is a finite set of states;
- q0 ∈ Q is the initial state;
- A ⊆ Q is the set of accepting states (possibly corresponding to multi-pattern matching?);
- Σ is a finite input alphabet;
- δ is a function from Q × Σ to Q, called the transition function of the finite automaton M.
Notation and terminology:
- Σ*: the set of all finite-length strings formed from the characters of the alphabet Σ.
- n: the length of the input text.
- m: the length of the pattern string. m is also the accepting state: when the automaton is in state m, the pattern of length m has been matched successfully.
- |x|: the length of the string x.
- w ⊏ x: the string w is a prefix of the string x.
- w ⊐ x: the string w is a suffix of the string x. (Note that the prefix and suffix relations are transitive.)
- ε: the empty string, which is both a prefix and a suffix of every string. (ε is read "epsilon".)
- a: in what follows, a stands for an arbitrary character (a ∈ Σ), not specifically the character 'a'.
II. Functions introduced:
Transition function δ: when M is in state q and reads the character a, it moves from state q to state δ(q, a).
Final-state function φ (read "phi"; the corresponding uppercase letter is Φ): a function from Σ* to Q, where φ(w) is the state in which the automaton M ends up after scanning the string w. M accepts the string w if and only if φ(w) ∈ A. The function φ is defined by the following recurrence:
φ(ε) = q0 (the final state for the empty string ε is q0)
φ(wa) = δ(φ(w), a) (where w ∈ Σ*, a ∈ Σ)
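The recurrence for φ amounts to folding δ over the characters of w. A minimal Python sketch follows; the names `final_state`, `accepts`, and the two-state example automaton are assumptions made for illustration and do not come from the text.

```python
def final_state(delta, q0, w):
    """phi(w): the state reached after scanning w from the initial state q0."""
    state = q0                      # phi(epsilon) = q0
    for a in w:
        state = delta(state, a)     # phi(wa) = delta(phi(w), a)
    return state


def accepts(delta, q0, accepting, w):
    """M accepts w if and only if phi(w) is in the accepting set A."""
    return final_state(delta, q0, w) in accepting


# Hypothetical automaton over {a, b}: Q = {0, 1}, q0 = 0, A = {1}.
# It ends in state 1 exactly when the last character read is 'a'.
def delta(q, a):
    return 1 if a == "a" else 0


print(accepts(delta, 0, {1}, "aba"))  # True
print(accepts(delta, 0, {1}, "ab"))   # False
```

Passing δ as a plain function keeps the sketch close to the mathematical definition; a lookup table would work equally well.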
Suffix function σ: a mapping from Σ* to {0, 1, ..., m}, where σ(x) is the length of the longest prefix of the pattern P that is also a suffix of the string x. Writing P_k for the prefix P[1...k] of P:
σ(x) = max{k : P_k ⊐ x}
P_0 = ε is a suffix of every string, so σ(x) is always defined.
Note: the main purpose of the suffix function is to determine, when the current match fails, how much of the string x read so far is still usable. The longest suffix of x that is a prefix of the pattern P is already matched, so matching can resume from length σ(x) instead of starting over. This is what implements the transition function. σ(x) also gives the state the automaton is in after accepting the input string x (the final state), so it is used to implement the final-state function as well.
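The definition of σ can be evaluated directly by trying candidate prefix lengths from longest to shortest. This is a brute-force sketch, not an efficient method; the function name `sigma` and its argument order are assumptions for illustration.

```python
def sigma(P, x):
    """Length of the longest prefix of P that is also a suffix of x."""
    for k in range(min(len(P), len(x)), -1, -1):
        if x.endswith(P[:k]):   # P[:k] is P_k; P_0 is the empty string
            return k
    return 0  # not reached: the empty prefix P_0 is a suffix of every x


# With the pattern P = "ab":
print(sigma("ab", ""))       # 0
print(sigma("ab", "ccaca"))  # 1
print(sigma("ab", "ccab"))   # 2
```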
III. String-matching automata
- Figure (a) is the state-transition diagram of the automaton that accepts all strings ending with the string "ababaca". State 0 is the initial state and state 7 is the only accepting state (single-pattern matching).
- Table (c) is the final-state table produced when the automaton processes the input text T = "abababacaba". After the input character T[i] is read, the final state φ(T[1...i]) of the prefix T[1...i] is given, one entry per i, by the last column of table (c). Since φ("abababaca") = P.length = 7 (the only accepting state), the pattern P is matched successfully in T with end position 9 and start position 9 - P.length + 1 = 3.
3. Definition of the string-matching finite automaton:
Given a pattern string P[1...m], its corresponding string-matching finite automaton is defined as follows:
- The state set is Q = {0, 1, ..., m}; the start state q0 is state 0, and state m is the only accepting state;
- The transition function δ is expressed through the suffix function (this is important: the transition function is an abstract concept, whereas the suffix function can be computed directly in code):
δ(q, a) = σ(P_q a) <equation one>
Let T_i denote the first i characters of the text T. For a substring of T ending at T[i] to match the pattern prefix P_j, P_j must be a suffix of T_i. Suppose q = φ(T_i), that is, after reading T_i the automaton M is in state q. By the transition function <equation one>, q is the length of the longest prefix of the pattern that is also a suffix of T_i, so in state q we have P_q ⊐ T_i and q = σ(T_i). (When q equals m, the whole pattern P is a suffix of T_i, which means the search has found a match.) The automaton therefore also satisfies the following equation (the final-state function is abstract as well; rewritten in terms of the suffix function, it can be expressed in code):
φ(T_i) = σ(T_i) (i = 0, 1, ..., n) <equation two>
There are also two lemmas (for the proofs, see Introduction to Algorithms):
Lemma 1, suffix-function inequality:
σ(xa) ≤ σ(x) + 1 (for any string x and character a)
Lemma 2, suffix-function recursion lemma:
For any string x and character a, if q = σ(x), then:
σ(xa) = σ(P_q a)
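Both lemmas can be sanity-checked by brute force on short strings. The sketch below enumerates every string up to a given length over a small alphabet and tests the two statements; it is an illustration under assumed names (`sigma`, `check_lemmas`) and is no substitute for the proofs.

```python
from itertools import product


def sigma(P, x):
    """Length of the longest prefix of P that is also a suffix of x."""
    for k in range(min(len(P), len(x)), -1, -1):
        if x.endswith(P[:k]):
            return k
    return 0  # not reached: the empty prefix always matches


def check_lemmas(P, alphabet, max_len):
    """Return True if Lemma 1 and Lemma 2 hold for every x with |x| <= max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            x = "".join(chars)
            q = sigma(P, x)
            for a in alphabet:
                if sigma(P, x + a) > q + 1:                  # Lemma 1
                    return False
                if sigma(P, x + a) != sigma(P, P[:q] + a):   # Lemma 2
                    return False
    return True


print(check_lemmas("ababaca", "abc", 6))  # True
```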
<equation two> can be proved by mathematical induction on i, as follows:
1. When i = 0, T_0 = ε, so φ(T_0) = 0 = σ(T_0).
2. Assume φ(T_i) = σ(T_i) and prove φ(T_{i+1}) = σ(T_{i+1}). Let q denote φ(T_i) and a denote T[i+1]; then:
φ(T_{i+1}) = φ(T_i a) (since T_{i+1} = T_i a)
= δ(φ(T_i), a) (by the definition of the final-state function)
= δ(q, a) (by the definition of q)
= σ(P_q a) (by equation one)
= σ(T_i a) (by Lemma 2 and the induction hypothesis q = σ(T_i))
= σ(T_{i+1}) (since T_{i+1} = T_i a)
From the above, when the final state after reading T_i (that is, the state reached by the transition function after reading T[i]) equals the pattern length m, the match succeeds. The following is the pseudo-code of the finite automaton matching algorithm:
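A minimal Python sketch of such a matcher is given here. The function names, the 1-based reporting of start positions, and the brute-force `sigma` used to supply δ are assumptions for illustration, not the original pseudo-code.

```python
def sigma(P, x):
    """Length of the longest prefix of P that is also a suffix of x."""
    for k in range(min(len(P), len(x)), -1, -1):
        if x.endswith(P[:k]):
            return k
    return 0  # not reached: the empty prefix always matches


def finite_automaton_matcher(T, delta, m):
    """Scan T once and return the 1-based start position of every match."""
    q = 0                                  # start state q0 = 0
    starts = []
    for i, a in enumerate(T, start=1):     # i is the 1-based index of T[i]
        q = delta(q, a)                    # q = phi(T_i)
        if q == m:                         # accepting state: P is a suffix of T_i
            starts.append(i - m + 1)
    return starts


P = "ababaca"
T = "abababacaba"
# delta(q, a) = sigma(P_q a), evaluated on demand (equation one).
delta = lambda q, a: sigma(P, P[:q] + a)
print(finite_automaton_matcher(T, delta, len(P)))  # [3]
```

Each text character is examined exactly once, so the scan itself takes time proportional to n once δ is available as a table.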
Here is the pseudo-code that implements the transition function according to <equation one>:
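A straightforward way to tabulate δ is sketched below in Python. For each state q and character a, it starts from the largest possible k and decreases it until P_k is a suffix of P_q a. The function name and the dictionary representation of the table are assumptions for illustration.

```python
def compute_transition_function(P, alphabet):
    """Build delta as a dict mapping (state, character) to the next state."""
    m = len(P)
    delta = {}
    for q in range(m + 1):
        for a in alphabet:
            k = min(m, q + 1)                 # sigma(P_q a) cannot exceed q + 1 or m
            while not (P[:q] + a).endswith(P[:k]):
                k -= 1                        # stops at k = 0 at the latest
            delta[(q, a)] = k                 # delta(q, a) = sigma(P_q a)
    return delta


delta = compute_transition_function("ababaca", "abc")
print(delta[(5, "b")])  # 4
print(delta[(5, "c")])  # 6
print(delta[(7, "b")])  # 2
```

This direct construction takes O(m^3 |Σ|) time in the worst case; it favors clarity over speed.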