Pattern matching with finite automata

Source: Internet
Author: User

I. Finite automata definitions and basic terminology:

A finite automaton M is a 5-tuple (Q, q0, A, Σ, δ), where:

    • Q is a finite set of states;
    • q0 ∈ Q is the initial state;
    • A ⊆ Q is the set of accepting states (there may be more than one accepting state);
    • Σ is a finite input alphabet;
    • δ is a function from Q × Σ to Q, called the transition function of the finite automaton M.


Notation and terminology:

    • Σ*: the set of all finite-length strings formed from the characters of the alphabet Σ.
    • n: the length of the input text T.
    • m: the length of the pattern string P, and also the number of the accepting state. When the automaton is in state m, the pattern of length m has been matched.
    • |x|: the length of the string x.
    • w ⊏ x: the string w is a prefix of the string x.
    • w ⊐ x: the string w is a suffix of the string x. (Both the prefix and the suffix relation are transitive.)
    • ε: the empty string, which is a prefix and a suffix of every string. (ε is read "epsilon".)
    • a: in what follows, a stands for an arbitrary character (a ∈ Σ), not specifically the character 'a'.

II. Functions introduced:

When M is in state q and reads the input character a, it moves from state q to state δ(q, a).

The final-state function φ (read "phi"; the corresponding uppercase letter is Φ) is a function from Σ* to Q: φ(w) is the state M ends up in after scanning the string w. M accepts the string w if and only if φ(w) ∈ A. The function φ is defined by the following recurrence:
φ(ε) = q0 (the final state for the empty string ε is q0)

φ(wa) = δ(φ(w), a) (where w ∈ Σ*, a ∈ Σ)
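
The recurrence for φ amounts to a left-to-right loop over the input. The following is a minimal Python sketch, assuming the transition function is stored as a dictionary keyed by (state, character). The names final_state and accepts and the two-state example automaton are illustrative and do not come from the original text.

```python
def final_state(delta, q0, w):
    """Return phi(w): the state reached after scanning w, starting from q0."""
    q = q0                     # phi(epsilon) = q0
    for a in w:
        q = delta[(q, a)]      # phi(wa) = delta(phi(w), a)
    return q


def accepts(delta, q0, accepting, w):
    """M accepts w if and only if phi(w) is one of the accepting states."""
    return final_state(delta, q0, w) in accepting


# Hypothetical example automaton over the alphabet {a, b}:
# Q = {0, 1}, q0 = 0, A = {1}.
EXAMPLE_DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 0, (1, "b"): 0,
}

if __name__ == "__main__":
    print(final_state(EXAMPLE_DELTA, 0, "abaaa"))    # 1
    print(accepts(EXAMPLE_DELTA, 0, {1}, "abbaa"))   # False
```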

The suffix function σ of a pattern P is a mapping from Σ* to {0, 1, ..., m}: σ(x) is the length of the longest prefix of P that is also a suffix of the string x:
σ(x) = max{k : P_k ⊐ x}
Here P_k denotes the prefix P[1..k] of the pattern. Since P_0 = ε is a suffix of every string, σ(x) is always defined.
Note: the suffix function tells us how much of the text read so far is still useful when the current match fails. σ(x) is the length of the longest prefix of the pattern P that ends the string x, so matching can continue from that length instead of starting over. It can therefore be used to implement the transition function. It also gives the state reached after reading the input string x (the final state), so it is used to implement the final-state function as well.
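
The definition of σ can be evaluated directly by trying prefix lengths from longest to shortest. This is a brute-force Python sketch meant to mirror the definition, not an efficient implementation; the name suffix_function and the sample pattern "ab" are assumptions chosen for illustration.

```python
def suffix_function(pattern, x):
    """Return sigma(x): the largest k such that pattern[:k] is a suffix of x."""
    # Try the longest candidate first. k = 0 always succeeds, because the
    # empty prefix P_0 is a suffix of every string.
    for k in range(min(len(pattern), len(x)), -1, -1):
        if x.endswith(pattern[:k]):
            return k
    return 0  # not reached; kept so that every path returns a value


if __name__ == "__main__":
    # For the pattern "ab": sigma("") = 0, sigma("ccaca") = 1, sigma("ccab") = 2
    for x in ["", "ccaca", "ccab"]:
        print(repr(x), suffix_function("ab", x))
```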

III. String-matching automata


Figure (a) is the state-transition diagram of an automaton that accepts all strings ending with the string "ababaca". State 0 is the initial state and state 7 is the only accepting state (single-pattern matching).

    • Table (c) lists the final states as the automaton processes the input text T = "abababacaba". After the character T[i] is read, the final state φ(T[1..i]) of the prefix T[1..i] is the corresponding entry of table (c). Since φ("abababaca") = 7 = P.length (the only accepting state), the pattern P has been found in T: the match ends at position 9 and starts at position 9 - P.length + 1 = 3.

3. Definition of the string-matching finite automaton:

Given a pattern string P[1..m], the corresponding string-matching automaton is defined as follows:

    1. The state set is Q = {0, 1, ..., m}; the start state q0 is state 0, and state m is the only accepting state.
    2. The transition function δ is expressed through the suffix function (this is important because the transition function is an abstract concept, whereas the suffix function can be computed directly in code):

δ(q, a) = σ(P_q a) <equation one>
Let T_i = T[1..i] be the text read so far. For a substring of T ending at T[i] to match the prefix P_j of the pattern, P_j must be a suffix of T_i. Let q = φ(T_i), that is, the automaton M is in state q after reading T_i. The transition function <equation one> is chosen so that q is the length of the longest prefix of the pattern that is also a suffix of T_i. In state q we therefore have P_q ⊐ T_i and q = σ(T_i). (When q equals m, the whole pattern P is a suffix of T_i, which means a match has been found.) The automaton thus maintains the following invariant (the final-state function is also abstract; rewritten in terms of the suffix function, it can be computed in code):
φ(T_i) = σ(T_i) (i = 0, 1, ..., n) <equation two>
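
Equation one can be turned into code almost literally: form the string P_q a and apply the suffix function to it. The Python sketch below assumes the brute-force suffix_function shown here; the name delta and the choice of sample transitions for the pattern "ababaca" are illustrative.

```python
def suffix_function(pattern, x):
    """Return sigma(x): the largest k such that pattern[:k] is a suffix of x."""
    for k in range(min(len(pattern), len(x)), -1, -1):
        if x.endswith(pattern[:k]):
            return k
    return 0  # not reached: k = 0 always matches


def delta(pattern, q, a):
    """Equation one: delta(q, a) = sigma(P_q a)."""
    return suffix_function(pattern, pattern[:q] + a)


if __name__ == "__main__":
    P = "ababaca"
    print(delta(P, 5, "c"))  # 6: the character extends the current match
    print(delta(P, 5, "b"))  # 4: mismatch, fall back to the prefix "abab"
    print(delta(P, 7, "b"))  # 2: after a full match, "ab" is still reusable
```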

There are also two lemmas (for the proofs, see Introduction to Algorithms):

Lemma 1 (suffix-function inequality):

σ(xa) ≤ σ(x) + 1 (for any string x and character a)

Lemma 2 (suffix-function recursion lemma):

For any string x and character a, if q = σ(x), then:
σ(xa) = σ(P_q a)
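
Both lemmas can be spot-checked by brute force on short strings. The Python sketch below is an empirical check only and is no substitute for the proofs; the function name lemmas_hold, the alphabet "abc" and the length bound 5 are assumptions made for the example.

```python
from itertools import product


def suffix_function(pattern, x):
    """Return sigma(x): the largest k such that pattern[:k] is a suffix of x."""
    for k in range(min(len(pattern), len(x)), -1, -1):
        if x.endswith(pattern[:k]):
            return k
    return 0  # not reached: k = 0 always matches


def lemmas_hold(pattern, alphabet, max_len):
    """Check Lemma 1 and Lemma 2 for every string x with |x| <= max_len."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            x = "".join(chars)
            q = suffix_function(pattern, x)
            for a in alphabet:
                s = suffix_function(pattern, x + a)
                if s > q + 1:                     # Lemma 1 violated
                    return False
                if s != suffix_function(pattern, pattern[:q] + a):
                    return False                  # Lemma 2 violated
    return True


if __name__ == "__main__":
    print(lemmas_hold("ababaca", "abc", 5))  # True
```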


<equation two> can be proved by induction on i:
1. When i = 0, T_0 = ε, so φ(T_0) = 0 = σ(T_0).
2. Assume φ(T_i) = σ(T_i); we prove φ(T_{i+1}) = σ(T_{i+1}). Let q denote φ(T_i) and let a denote T[i+1]. Then:
φ(T_{i+1}) = φ(T_i a) (since T_{i+1} = T_i a)
= δ(φ(T_i), a) (by the definition of the final-state function)
= δ(q, a) (by the definition of q)
= σ(P_q a) (by equation one)
= σ(T_i a) (by Lemma 2, using the induction hypothesis q = σ(T_i))
= σ(T_{i+1}) (since T_{i+1} = T_i a)
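
The invariant φ(T_i) = σ(T_i) can also be observed on the example text used earlier. The Python sketch below computes the state sequence with δ(q, a) = σ(P_q a) and compares each state with σ applied to the whole prefix T_i; the name trace_states is illustrative.

```python
def suffix_function(pattern, x):
    """Return sigma(x): the largest k such that pattern[:k] is a suffix of x."""
    for k in range(min(len(pattern), len(x)), -1, -1):
        if x.endswith(pattern[:k]):
            return k
    return 0  # not reached: k = 0 always matches


def trace_states(pattern, text):
    """Return [phi(T_0), ..., phi(T_n)], stepping with delta(q, a) = sigma(P_q a)."""
    states = [0]                                   # phi(T_0) = 0
    for a in text:
        q = states[-1]
        states.append(suffix_function(pattern, pattern[:q] + a))
    return states


if __name__ == "__main__":
    P, T = "ababaca", "abababacaba"
    states = trace_states(P, T)
    print(states)  # [0, 1, 2, 3, 4, 5, 4, 5, 6, 7, 2, 3]
    # Equation two: each state equals sigma of the prefix read so far.
    print(all(states[i] == suffix_function(P, T[:i]) for i in range(len(T) + 1)))
```

State 7 appears at index 9 of the list, which agrees with the match ending at position 9 of T.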


From the above, when the final state after reading T_i (that is, the state the transition function yields once T[i] has been read) equals the pattern length m, a match has been found. This is the basis of the finite-automaton matching algorithm, and <equation one> is the basis for computing the transition function.
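
The two procedures can be sketched in Python as follows. This is a sketch under stated assumptions, not the original pseudo-code: the function names, the table layout (a list of dictionaries, one per state) and the handling of characters outside the alphabet are choices made here. Positions are reported 1-based to agree with the P[1..m] convention used above.

```python
def compute_transition_function(pattern, alphabet):
    """Build table[q][a] = sigma(P_q a) for q = 0..m (equation one)."""
    m = len(pattern)
    table = []
    for q in range(m + 1):
        row = {}
        for a in alphabet:
            candidate = pattern[:q] + a
            k = min(m, q + 1)          # sigma(P_q a) <= q + 1 by Lemma 1
            while not candidate.endswith(pattern[:k]):
                k -= 1                 # stops at k = 0 at the latest
            row[a] = k
        table.append(row)
    return table


def finite_automaton_matcher(text, table, m):
    """Return the 1-based start positions of all occurrences of the pattern."""
    q = 0
    starts = []
    for i, a in enumerate(text, start=1):
        # Assumption: the alphabet contains every pattern character, so a
        # character missing from the table cannot extend any prefix.
        q = table[q].get(a, 0)
        if q == m:                     # accepting state reached at position i
            starts.append(i - m + 1)
    return starts


if __name__ == "__main__":
    P, T = "ababaca", "abababacaba"
    table = compute_transition_function(P, "abc")
    print(finite_automaton_matcher(T, table, len(P)))  # [3]
```

Building the table this way is simple but slow for long patterns, since every entry is found by repeated suffix tests. Once the table exists, matching takes one table lookup per text character.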
