[Algorithm] Using a Finite Automaton for String Matching

Timus 1102, Strange Dialog, asks you to determine whether each given input string is a valid dialogue.
1102. Strange Dialog

Time Limit: 1.0 second
Memory limit: 16 MB

One entity named "one" talks with his friend "puton" and their conversation is interesting. "One" can say the words "out" and "output"; besides, he calls his friend by name. "Puton" can say the words "in", "input" and "one". They understand each other perfectly and even write their dialogue as strings without spaces.

You have N strings. Find which of them are dialogues.

Input

The first line of input contains one non-negative integer N ≤ 1000. The next N lines contain non-empty strings. Each string consists of lowercase Latin letters. The total length of all strings is no more than 10^7 characters.

Output

The output consists of N lines. Each line contains the word "YES" if the corresponding string is a dialogue of "one" and "puton", and "NO" otherwise.

Sample
Input

6
puton
inonputin
oneputonininputoutoutput
oneinputwooutoutput
outpu
utput

Output

YES
NO
YES
NO
NO
NO

Problem Author: Katya Ovechkina
Problem Source: Tetrahedron Team Contest May 2001

Problem summary

One and puton are talking. One can say only three words: out, output, and puton. Puton can say only the words in, input, and one. Their conversation is written as words joined directly together, with no spaces between them.

Your task is to determine whether a given input string (which contains only lowercase Latin letters) is a valid dialogue.

Mathematical background

A finite automaton (deterministic finite automaton, DFA) M is a 5-tuple (Q, q0, A, Σ, δ), where:

    • Q is a finite set of states
    • q0 ∈ Q is the start state
    • A ⊆ Q is the set of accepting states
    • Σ is a finite input alphabet
    • δ is a function from Q × Σ to Q, called the transition function of M

A finite automaton starts in state q0 and reads the characters of the input string one at a time. If the automaton is in state q and reads the input character a, it moves from state q to state δ(q, a) (it performs a transition). Whenever the current state q belongs to A, the automaton M is said to have accepted the string read so far. An input that is not accepted is said to be rejected.

Many string matching algorithms build a finite automaton that scans the text T to find occurrences of the pattern P. Automata used for string matching are very efficient: they examine each text character exactly once and spend constant time on each character. Therefore, once the automaton has been built, the matching time is Θ(n).
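The idea of a string-matching automaton can be sketched in C for a single pattern. This is only a minimal illustration, not the solution to problem 1102: the names build_automaton and find_first, the limit MAX_M, and the restriction to lowercase letters are assumptions made for the example, and the table is built with the naive construction, which is slow to build but keeps the scan at constant time per text character.

```c
#include <string.h>

#define SIGMA 26   /* assumption: lowercase Latin letters only */
#define MAX_M 64   /* assumption: longest pattern this sketch supports */

/* delta[q][c]: state reached from state q (q pattern characters matched)
   after reading the letter 'a' + c. */
static int delta[MAX_M + 1][SIGMA];

/* Builds the automaton for pattern p with the naive O(m^3 * SIGMA) method.
   Returns the pattern length m, or -1 if p is too long. */
int build_automaton(const char *p)
{
    int m = (int)strlen(p);
    if (m > MAX_M) return -1;
    for (int q = 0; q <= m; q++) {
        for (int c = 0; c < SIGMA; c++) {
            int k = (q + 1 < m) ? q + 1 : m;
            /* Shrink k until p[0..k) is a suffix of p[0..q) followed by c. */
            while (k > 0) {
                if (p[k - 1] == 'a' + c &&
                    strncmp(p, p + q - (k - 1), (size_t)(k - 1)) == 0)
                    break;
                k--;
            }
            delta[q][c] = k;
        }
    }
    return m;
}

/* Scans t once and returns the index of the first occurrence of the
   pattern of length m, or -1 if there is none. */
int find_first(const char *t, int m)
{
    int q = 0;
    if (m == 0) return 0;
    for (int i = 0; t[i] != '\0'; i++) {
        if (t[i] < 'a' || t[i] > 'z') { q = 0; continue; } /* outside the alphabet */
        q = delta[q][t[i] - 'a'];
        if (q == m) return i - m + 1;   /* accepting state reached */
    }
    return -1;
}
```

A call such as build_automaton("put") followed by find_first("outputon", 3) reports the position where "put" starts in the text. The scan loop does one table lookup per character, which is where the Θ(n) matching time comes from.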

Solution

Our task is to determine whether the input consists only of the six words given in the problem. This is a multi-pattern string matching problem with six patterns in total.

We now construct the corresponding string-matching automaton from the given patterns, as follows:

    • State set Q = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 99}
    • Start state q0 = 0
    • Set of accepting states A = {0, 1, 2, 3}
    • Input alphabet Σ = {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z}
    • The transition function δ is given by a state transition diagram (the figure is not reproduced here); the same transitions are encoded in the delta array of the C# program.

This string-matching automaton has the following features:

    • It is a multi-pattern string-matching automaton.
    • It has a special state 99. Whenever the automaton reads an input character a in state q and no corresponding directed edge exists in the state transition diagram, it moves to this special state. The automaton can then stop immediately and report a matching failure.
    • It has several accepting states, numbered consecutively starting from the start state 0. This makes it easy for the program to test whether the input matched.

The construction of this state transition diagram starts from the following six patterns (an asterisk marks an accepting state):

one     0 -> 4 -> 5 -> 0*
puton   0 -> 11 -> 12 -> 13 -> 14 -> 0*
in      0 -> 7 -> 1*
out     0 -> 4 -> 6 -> 1*
input   0 -> 7 -> 1* -> 8 -> 9 -> 2*
output  0 -> 4 -> 6 -> 1* -> 8 -> 9 -> 2*

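Then the transitions between these paths must be worked out carefully. For example, after "in" or "out" (state 1), the letters "put" may either complete "input" or "output" or begin "puton". State 2 therefore also has to accept a following "on", which is what the extra states 10 and 3 are for.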
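Hand-derived transitions are easy to get wrong, so an independent check can be useful. The following is a sketch of a brute-force reference in C. It is not part of the original solution and does not use an automaton; it is a plain dynamic-programming word split, and the name is_dialogue_dp is made up for this example. It needs O(n) extra memory, so it is meant for testing on short strings rather than for submission.

```c
#include <stdlib.h>
#include <string.h>

/* Brute-force reference: ok[i] is 1 when the first i characters of s
   can be split into the six words. Returns 1 (valid), 0 (invalid),
   or -1 if memory could not be allocated. */
int is_dialogue_dp(const char *s)
{
    static const char *const words[6] = {
        "one", "puton", "in", "input", "out", "output"
    };
    size_t n = strlen(s);
    char *ok = calloc(n + 1, 1);
    if (ok == NULL) return -1;
    ok[0] = 1;                       /* the empty prefix is trivially valid */
    for (size_t i = 0; i < n; i++) {
        if (!ok[i]) continue;        /* no valid split ends at position i */
        for (int w = 0; w < 6; w++) {
            size_t len = strlen(words[w]);
            if (len <= n - i && strncmp(s + i, words[w], len) == 0)
                ok[i + len] = 1;     /* words[w] fits at position i */
        }
    }
    int result = ok[n];
    free(ok);
    return result;
}
```

Comparing this function with the automaton on many short strings over the letters e, i, n, o, p, t, u is one way to gain confidence in the transition table.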

Finally, the corresponding C# program is as follows:

    using System;

    namespace Skyiv.Ben.Timus
    {
      // http://acm.timus.ru/problem.aspx?space=1&num=1102
      sealed class T1102
      {
        // Maps 'a'..'z' to a column of delta: 1..7 for e, i, n, o, p, t, u; 0 for any other letter.
        static readonly int[] a = {
          0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 3, 4, 5, 0, 0, 0, 6, 7, 0, 0, 0, 0, 0 };

        // Transition function of the finite automaton; 99 is the failure state.
        static readonly int[,] delta = {
          //  *   e   i   n   o   p   t   u
          { 99, 99,  7, 99,  4, 11, 99, 99 }, //  0: accepting, start state
          { 99, 99,  7, 99,  4,  8, 99, 99 }, //  1: accepting
          { 99, 99,  7, 99, 10, 11, 99, 99 }, //  2: accepting
          { 99,  0,  7, 99,  4, 11, 99, 99 }, //  3: accepting
          { 99, 99, 99,  5, 99, 99, 99,  6 }, //  4
          { 99,  0, 99, 99, 99, 99, 99, 99 }, //  5
          { 99, 99, 99, 99, 99, 99,  1, 99 }, //  6
          { 99, 99, 99,  1, 99, 99, 99, 99 }, //  7
          { 99, 99, 99, 99, 99, 99, 99,  9 }, //  8
          { 99, 99, 99, 99, 99, 99,  2, 99 }, //  9
          { 99, 99, 99,  3, 99, 99, 99,  6 }, // 10
          { 99, 99, 99, 99, 99, 99, 99, 12 }, // 11
          { 99, 99, 99, 99, 99, 99, 13, 99 }, // 12
          { 99, 99, 99, 99, 14, 99, 99, 99 }, // 13
          { 99, 99, 99,  0, 99, 99, 99, 99 }, // 14
        };

        static void Main()
        {
          // One iteration per input string; q is reset to the start state each time.
          for (int c, q = 0, n = int.Parse(Console.ReadLine()); n > 0; n--, q = 0)
          {
            while ((c = Console.Read()) != '\n')
              if (q < 99 && c != '\r') q = delta[q, a[c - 'a']];
            Console.WriteLine((q < 4) ? "YES" : "NO");
          }
        }
      }
    }

Because the patterns contain only the seven letters e, i, n, o, p, t, and u, the array a (26 elements, one for each lowercase Latin letter in order) maps the input alphabet to an integer from 0 to 7: the values 1 to 7 correspond to those seven letters in order, and 0 stands for every other letter. The two-dimensional array delta represents the transition function δ and is read directly off the state transition diagram. The rest is simple: the Main method reads each input line in the for loop, runs the automaton over it in the while loop, and then prints the result based on the final state of the automaton.

Further discussion

The input for this problem is at most 10^7 characters, the time limit is 1.0 second, and the memory limit is 16 MB. The following table lists the running time and memory usage of several submitted programs:

ID       Date                   Author    Problem  Language  Judgement result  Execution time  Memory used
2612947  19:52:41 20 May 2009   Skyivben  1102     C++       Accepted          0.062           121 KB
2612930  19:44:58 20 May 2009   Skyivben  1102     C#        Accepted          0.125           10 561 KB
2612807  17:17:31 20 May 2009   Skyivben  1102     C#        Accepted          0.718           857 KB

The third row of the table is the result for the C# program above. Its running time is 0.718 seconds, which is close to the problem's time limit.

What if the time limit for this problem were changed to 0.2 seconds? Should we look for a more efficient string matching algorithm?

In fact, the string matching algorithm used in the C# program above is already very efficient and leaves almost no room for improvement. The bottleneck of the program is not the string matching algorithm but I/O: the Console.Read method is not efficient enough, and it has to be called about 10^7 times in the inner loop. Replace the Main method with the following fragment:

    static void Main()
    {
      var s = new byte[10000000 + 100];
      int i = 0, n = Console.OpenStandardInput().Read(s, 0, s.Length);
      while (s[i++] != '\n') ; // skip the first line (the count N)
      // The loop header is truncated after "i" in the source; "i < n; q = 0" is assumed here.
      for (int c, q = 0; i < n; q = 0)
      {
        while ((c = s[i++]) != '\n')
          if (q < 99 && c != '\r') q = delta[q, a[c - 'a']];
        Console.WriteLine((q < 4) ? "YES" : "NO");
      }
    }

This shortens the running time to 0.125 seconds, as shown in the second row of the table. In this version, we call the Read method of the Stream class to read all the input into the byte array s, which avoids the many calls to Console.Read. Because the input contains only lowercase Latin letters, and no Chinese characters or other characters that need a two-byte encoding, a byte array (byte[]) can be used instead of a character array (char[]). However, memory usage rises from 857 KB to 10,561 KB.

What if the time limit is 0.2 seconds and the memory limit is 1 MB?

Simply translate the first C# program into C or C++, using getchar from C/C++ in place of the C# Console.Read method. The running time drops to 0.062 seconds and the memory usage to 121 KB, as shown in the first row of the table. Clearly getchar in C/C++ is very efficient. In fact, in the vast majority of C/C++ implementations, getchar is a macro rather than a function.
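The submitted C/C++ program is not shown in the article, so the following is only a sketch of what such a translation might look like in C. The tables are copied from the C# program; the helper names step, is_dialogue, and solve are made up for this example. It reads through getc(in) so that the logic can be exercised on any stream; getchar() is equivalent to getc(stdin). Unlike the C# version, this sketch sends any character outside 'a'..'z' to the failure state instead of indexing the table with it.

```c
#include <stdio.h>

/* Column of DELTA for each letter 'a'..'z': 1..7 = e,i,n,o,p,t,u; 0 = other. */
static const unsigned char A[26] = {
    0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 3, 4, 5, 0, 0, 0, 6, 7, 0, 0, 0, 0, 0
};

/* Transition table copied from the C# program; 99 is the failure state. */
static const unsigned char DELTA[15][8] = {
    { 99, 99,  7, 99,  4, 11, 99, 99 }, { 99, 99,  7, 99,  4,  8, 99, 99 },
    { 99, 99,  7, 99, 10, 11, 99, 99 }, { 99,  0,  7, 99,  4, 11, 99, 99 },
    { 99, 99, 99,  5, 99, 99, 99,  6 }, { 99,  0, 99, 99, 99, 99, 99, 99 },
    { 99, 99, 99, 99, 99, 99,  1, 99 }, { 99, 99, 99,  1, 99, 99, 99, 99 },
    { 99, 99, 99, 99, 99, 99, 99,  9 }, { 99, 99, 99, 99, 99, 99,  2, 99 },
    { 99, 99, 99,  3, 99, 99, 99,  6 }, { 99, 99, 99, 99, 99, 99, 99, 12 },
    { 99, 99, 99, 99, 99, 99, 13, 99 }, { 99, 99, 99, 99, 14, 99, 99, 99 },
    { 99, 99, 99,  0, 99, 99, 99, 99 }
};

/* Advances the automaton by one character; once in state 99 it stays there. */
static int step(int q, int c)
{
    if (q >= 99 || c < 'a' || c > 'z') return 99;
    return DELTA[q][A[c - 'a']];
}

/* Returns 1 if s is a concatenation of the six words, 0 otherwise. */
int is_dialogue(const char *s)
{
    int q = 0;
    for (; *s != '\0'; s++) q = step(q, (unsigned char)*s);
    return q < 4;   /* states 0..3 are accepting */
}

/* Reads the judge input format from `in` and writes one answer per line. */
void solve(FILE *in, FILE *out)
{
    int n, c, q;
    if (fscanf(in, "%d", &n) != 1) return;
    while ((c = getc(in)) != '\n' && c != EOF) { }   /* rest of the first line */
    while (n-- > 0) {
        q = 0;
        while ((c = getc(in)) != '\n' && c != EOF)
            if (c != '\r') q = step(q, c);
        fputs(q < 4 ? "YES\n" : "NO\n", out);
    }
}
```

A complete program would only need a main function that calls solve(stdin, stdout). Because the input is consumed one character at a time, no buffer proportional to the input size is needed, which is consistent with the low memory usage reported for the C++ submission.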

For more discussion of this problem, see the post on the CSDN forum titled "Super depressed: all four methods get Memory Limit Exceeded".

When it comes to string matching, most people think of the classic Knuth-Morris-Pratt (KMP) algorithm. This algorithm, co-designed by Donald Knuth (author of the classic The Art of Computer Programming and of the well-known TeX typesetting system), is a single-pattern matching algorithm based on finite automata. It does not need to compute the transition function δ explicitly: its matching time is Θ(n), and it uses only an auxiliary array π[1..m], which is precomputed from the pattern in Θ(m) time. The array π allows the function δ to be computed efficiently (in an amortized sense) on the fly, as needed.
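As an illustration of how the array π stands in for the transition function, here is a minimal C sketch of the Knuth-Morris-Pratt matcher. It uses 0-based indexing, so pi[q] plays the role of π[q + 1] in the 1-based notation; the function names compute_prefix and kmp_find are chosen for this example only.

```c
#include <stdlib.h>
#include <string.h>

/* pi[q] = length of the longest proper prefix of p[0..q] that is also
   a suffix of p[0..q]. Runs in Theta(m) time. */
void compute_prefix(const char *p, int m, int *pi)
{
    int k = 0;
    if (m > 0) pi[0] = 0;
    for (int q = 1; q < m; q++) {
        while (k > 0 && p[k] != p[q]) k = pi[k - 1];
        if (p[k] == p[q]) k++;
        pi[q] = k;
    }
}

/* Returns the index of the first occurrence of p in t, or -1 if there is
   none (also -1 if memory for pi cannot be allocated). */
int kmp_find(const char *t, const char *p)
{
    int n = (int)strlen(t), m = (int)strlen(p);
    if (m == 0) return 0;
    int *pi = malloc((size_t)m * sizeof *pi);
    if (pi == NULL) return -1;
    compute_prefix(p, m, pi);

    int q = 0, found = -1;               /* q = number of characters matched */
    for (int i = 0; i < n; i++) {
        /* Fall back through pi instead of looking up a precomputed delta. */
        while (q > 0 && p[q] != t[i]) q = pi[q - 1];
        if (p[q] == t[i]) q++;
        if (q == m) { found = i - m + 1; break; }
    }
    free(pi);
    return found;
}
```

The inner while loop may run several times for one text character, but q can only decrease as often as it has previously increased, which is why the total matching time stays linear in an amortized sense.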

Question for the reader

There is a bug in the second C# program above, but it does not show up in most cases, so the program is still accepted.

Dear reader, can you find this bug?

Note: the bug is unrelated to the string matching algorithm and is not present in the first C# program.
