POJ2525-Text Formalization

Source: Internet
Author: User

Please credit the source when reprinting, thank you: http://blog.csdn.net/lyy289065406/article/details/6746954

 

Problem summary:

First, a caveat: the summary below does not follow the problem statement to the letter. A solution written strictly to the statement cannot get AC, because the judge's test data differs substantially from what the statement describes. If you are new to POJ, I recommend skipping this problem: without the test data you will be left guessing, and even with it the problem is easy to get lost in.

 

There are two kinds of abbreviated strings, contractions and acronyms, and each has an expanded form (its expansion). The input first gives C contractions and A acronyms together with their expansions, one per line, in this format:

"contraction or acronym" -> "expansion"

Between the two inner quotation marks, on either side of the arrow, there may or may not be spaces.
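Because the spacing around the arrow is not reliable, one option is to ignore the arrow and locate the four quotation marks instead. The sketch below illustrates that idea; `parseEntry` is a hypothetical helper name, and it assumes that neither the short form nor the expansion contains a quotation mark.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not from the original post): split one dictionary line
// of the form "short"->"expansion" into its two quoted parts. Everything
// between the second and third quotation mark (spaces, arrow) is skipped.
std::pair<std::string, std::string> parseEntry(const std::string& line) {
    const std::size_t npos = std::string::npos;
    std::size_t q1 = line.find('"');
    assert(q1 != npos);
    std::size_t q2 = line.find('"', q1 + 1);
    assert(q2 != npos);
    std::size_t q3 = line.find('"', q2 + 1);
    assert(q3 != npos);
    std::size_t q4 = line.find('"', q3 + 1);
    assert(q4 != npos);  // assumption: the line holds exactly four quotes
    return std::make_pair(line.substr(q1 + 1, q2 - q1 - 1),
                          line.substr(q3 + 1, q4 - q3 - 1));
}
```

Calling `parseEntry` on a line with no spaces around the arrow and on a line with several spaces yields the same pair of strings.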

 

Several texts follow. Texts are separated by a line containing only a single # (a line of several consecutive # characters is not a separator). Each text consists of several lines of at most 80 characters each.

The task is to replace every contraction and acronym in each text with its expansion, and then output the expanded text.

 

Expansion rules:

1. Contraction

A contraction can appear in a text in three forms: "as listed" (exactly as given in the dictionary), "uppercased" (every letter in upper case) and "capitalized" (first letter in upper case).

Example: given the contraction isn't with the expansion is not:

If isn't is found in the text, it is replaced with is not

If ISN'T is found, it is replaced with IS NOT

If Isn't is found, it is replaced with Is not

Any other variant, such as iSn'T, matches none of the three forms and is output unchanged.

Note:

(A) The three forms are checked in priority order: "as listed" first, then "uppercased", then "capitalized". A string that matches a higher-priority form must not be replaced using a lower-priority one.

(B) The expansion takes the same form as the matched contraction (as listed, uppercased or capitalized).
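The priority rule can be illustrated with a small sketch. The function names below are hypothetical and the code handles a single dictionary entry only; it is meant to show the order of the checks, not how the solution further down stores the forms.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Uppercase every letter; other characters are left alone.
std::string uppercased(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}

// Uppercase the first character only.
std::string capitalized(std::string s) {
    if (!s.empty())
        s[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[0])));
    return s;
}

// If `word` is one of the three forms of `contraction`, store the matching
// form of `expansion` in `out` and return true. The checks run in priority
// order: as listed, then uppercased, then capitalized.
bool expandContraction(const std::string& contraction,
                       const std::string& expansion,
                       const std::string& word, std::string& out) {
    assert(!contraction.empty());  // assumption: entries are never empty
    if (word == contraction)              { out = expansion;              return true; }
    if (word == uppercased(contraction))  { out = uppercased(expansion);  return true; }
    if (word == capitalized(contraction)) { out = capitalized(expansion); return true; }
    return false;
}
```

For a contraction without letters all three forms coincide, so the first check always wins and the expansion is emitted exactly as listed.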

2. Acronym

An acronym can appear in a text in only one form, "as listed", but its replacement has two special requirements:

(A) Within one text, an acronym is replaced only once, at its first occurrence. Later occurrences are left unchanged.

(B) When an acronym is replaced, a space followed by the acronym as listed, in parentheses, is appended to its expansion.

For example, given the acronym CS with the expansion Computing Science,

the first CS in a text is replaced with Computing Science (CS)
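A minimal sketch of the once-per-text rule follows. `AcronymState` and its members are hypothetical names; the solution further down keeps a Boolean array indexed by ID instead of a set of strings.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical per-text state: the acronyms already expanded in this text.
struct AcronymState {
    std::set<std::string> seen;

    // Returns what to output for one occurrence of `acronym`.
    std::string emit(const std::string& acronym, const std::string& expansion) {
        assert(!acronym.empty());  // assumption: entries are never empty
        if (!seen.insert(acronym).second)
            return acronym;                        // later occurrence: unchanged
        return expansion + " (" + acronym + ")";   // first occurrence
    }

    // Call when a separator line (#) starts the next text.
    void startNewText() { seen.clear(); }
};
```

The first call to `emit` for an acronym returns the expansion with the parenthesised acronym appended, later calls return the acronym itself, and `startNewText` resets that behaviour.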

 

3. Non-letter contractions and acronyms

The rules above apply to contractions and acronyms made of letters.

When a contraction or acronym consists of non-letter characters, it has only the "as listed" form. After an acronym is replaced, a space and the acronym as listed, in parentheses, must still be appended.

For example:

(A) Given the contraction <!=> with the expansion a-C+D

when <!=> is found in the text, it is replaced with a-C+D exactly as listed; no case variants apply.

(B) Given the acronym &&& with the expansion **rr!!$

the first &&& in a text is replaced with **rr!!$ (&&&)

 

4. A contraction or acronym may appear as a prefix or suffix of any string, and must still be replaced.

5. Replacement is not iterative: only strings that belong to the original text are replaced, and text produced by an expansion is never expanded again.
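Rules 4 and 5 together amount to a single left-to-right pass over the original text. The sketch below shows such a pass with a plain `std::map` as the dictionary. `expandOnce` is a hypothetical name, and picking the shortest key that matches at a position is an assumption made to mirror the trie walk in the solution code, which stops at the first node carrying an ID.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: one left-to-right pass over `text`. Matches may start
// anywhere (rule 4) and emitted expansions are never rescanned (rule 5).
std::string expandOnce(const std::string& text,
                       const std::map<std::string, std::string>& dict) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        const std::pair<const std::string, std::string>* best = nullptr;
        for (const auto& entry : dict) {
            assert(!entry.first.empty());  // assumption: no empty keys
            bool matches = text.compare(i, entry.first.size(), entry.first) == 0;
            if (matches && (!best || entry.first.size() < best->first.size()))
                best = &entry;             // keep the shortest matching key
        }
        if (best) {
            out += best->second;           // the expansion goes straight to the output
            i += best->first.size();       // resume after the matched key
        } else {
            out += text[i++];              // no key starts here: copy the character
        }
    }
    return out;
}
```

With the entries a -> b and b -> c, the text ab becomes bc rather than cc, because the b produced by the first replacement is never looked at again.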

 

Solution:

In fact, if the test data matched the problem statement, this would be a simple simulation problem: plain string replacement, which does not belong in the "advanced problems" category.

 

The misleading information given by the problem statement is as follows (note: the following is wrong):

1. Contractions and acronyms are never a prefix or suffix of another word.

2. A text consists of standard space-separated words plus punctuation.

3. In "contraction or acronym" -> "expansion", the part between the two inner quotation marks is fixed, with exactly two spaces.

 

What the test data actually shows is the following (note: the following is correct):

1. Contractions and acronyms can be a prefix or suffix of any character or string.

2. A text is simply an arbitrary sequence of characters.

3. In "contraction or acronym" -> "expansion", the number of spaces between the two inner quotation marks is not fixed.

 

 

Once this is understood, the problem is straightforward. Because many strings have to be matched, a trie is the most efficient choice here; hashing is likely to time out.

 

1. Insert the contractions and acronyms into a trie. The trie is slightly modified: the Boolean flag that normally marks the end of a word is replaced by an integer ID. ID = 0 means the string from the root to the current node is not a contraction or acronym; ID > 0 means it is one, and the ID is the index of its expansion.

2. Create a two-dimensional character array of expansions whose row index is the ID; row ID holds the expansion of the corresponding contraction or acronym.

3. The two steps above make up the "dictionary input". For each contraction, all three forms are inserted at once, and IDs are assigned in priority order. This can insert the same string more than once. To preserve priority, if the node already has a non-zero ID when a contraction or acronym is inserted, the new entry is skipped, so that a lower-priority expansion never overwrites a higher-priority one.

4. When an acronym is entered, append the space and the parenthesised acronym as listed to its expansion string right away.

5. While scanning a text, keep one flag per acronym ID recording whether it has already been expanded; an acronym that has been expanded once is not expanded again. When the next text begins (a # line is read), clear the flags.

6. Read the text line by line and replace line by line. Since a contraction or acronym can be a prefix or suffix of any string, each line must be scanned character by character: for a character c, if the root has no branch for c, output c directly; otherwise walk down the trie from c to check whether the string starting at c is a contraction or acronym. If it is, output its expansion and skip past the matched string; if not, output c directly. Then continue with the next character.
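The steps above can be sketched in a more compact form. The `Node` and `Trie` types below are illustrative names rather than the author's code: `insert` refuses to overwrite an ID that is already set (step 3), and `match` stops at the first node with a non-zero ID while walking from a given position (step 6).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

struct Node {
    int id = 0;                       // 0: not a dictionary word; >0: expansion index
    std::unique_ptr<Node> next[128];  // one branch per 7-bit ASCII character
};

struct Trie {
    Node root;

    // Register `word` under `id`. Returns false, and changes nothing, when an
    // earlier (higher-priority) entry already owns the node.
    bool insert(const std::string& word, int id) {
        assert(id > 0 && !word.empty());
        Node* p = &root;
        for (unsigned char c : word) {
            c &= 127;                 // keep the index inside the branch array
            if (!p->next[c]) p->next[c].reset(new Node);
            p = p->next[c].get();
        }
        if (p->id != 0) return false;
        p->id = id;
        return true;
    }

    // Walk from text[pos]. Returns the ID of the first dictionary word found
    // and stores its length in `len`, or returns 0 if there is none.
    int match(const std::string& text, std::size_t pos, std::size_t& len) const {
        const Node* p = &root;
        for (std::size_t j = pos; j < text.size(); ++j) {
            const Node* q = p->next[static_cast<unsigned char>(text[j]) & 127].get();
            if (!q) return 0;
            p = q;
            if (p->id) { len = j - pos + 1; return p->id; }
        }
        return 0;
    }
};
```

A scanner built on this would call `match` at every position, print the expansion stored under the returned ID and advance by `len` on a hit, and print the single character otherwise.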

 

 

Source (corrected):

Alberta Collegiate Programming Contest 2003.10.18

http://ugweb.cs.ualberta.ca/~acpc/2003/

 

// Memory  Time   (as reported for the author's original submission)
// 692K    32MS

#include <cstdio>
#include <cstring>
using namespace std;

struct Trie_Node            // trie node
{
    int id;                 // 0: the string from the root to this node is not a dictionary word
                            // >0: order in which the word was entered; row index into expand
    int len;                // length of the string from the root to this node
    Trie_Node* next[128];   // branches, one per non-extended ASCII character
};

// gets() no longer exists in standard C++; read one line and strip its line ending instead.
bool ReadLine(char* buf, int size)
{
    if (!fgets(buf, size, stdin)) return false;
    buf[strcspn(buf, "\r\n")] = '\0';
    return true;
}

// Index into Trie_Node::next, kept inside [0, 127].
int Idx(char c) { return (unsigned char)c & 127; }

class solve
{
public:
    solve(int c, int a) : C(c), A(a)
    {
        ID = 0;
        Trie_Node* root = new Trie_Node;     // build the trie root
        Initial(root);

        expand = new char*[C * 3 + A + 1];   // up to 3 forms per contraction, 1 per acronym
        for (int i = 1; i <= C * 3 + A; i++)
            expand[i] = new char[StrSize()];

        EntryWord(root);                     // read the dictionary
        ReadText(root);                      // expand the texts
    }
    ~solve()
    {
        for (int i = 1; i <= C * 3 + A; i++)
            delete[] expand[i];
        delete[] expand;
    }

    int StrSize(void) const { return 81; }   // capacity of one expansion string
    char UppALP(char c);                     // lowercase letter -> uppercase; anything else unchanged
    void Initial(Trie_Node* p);              // initialise a trie node
    Trie_Node* Branch(Trie_Node* p, char c); // follow branch c of p, creating it if needed
    void EntryWord(Trie_Node* root);         // read the dictionary and fill expand, indexed by ID
    void ReadText(Trie_Node* root);          // read the texts line by line and print them expanded

protected:
    int C;           // number of contractions
    int A;           // number of acronyms
    int ID;          // order number of the last word entered into the trie
    int KeyID;       // IDs <= KeyID are contractions, IDs > KeyID are acronyms
    char** expand;   // expansion strings, indexed by ID
};

void solve::Initial(Trie_Node* p)
{
    p->id = 0;
    p->len = 0;
    memset(p->next, 0, sizeof(p->next));
}

Trie_Node* solve::Branch(Trie_Node* p, char c)
{
    Trie_Node*& q = p->next[Idx(c)];
    if (!q)
    {
        q = new Trie_Node;
        Initial(q);
    }
    q->len = p->len + 1;
    return q;
}

void solve::EntryWord(Trie_Node* root)
{
    int i, j, k;
    char tmps[200];

    /* Enter each contraction into the trie in all three forms. */
    for (i = 1; i <= C; i++)
    {
        Trie_Node* p1 = root;   // as listed
        Trie_Node* p2 = root;   // uppercased
        Trie_Node* p3 = root;   // capitalized

        ReadLine(tmps, sizeof(tmps));
        for (j = 1; tmps[j] != '\"'; j++)
        {
            p1 = Branch(p1, tmps[j]);
            p2 = Branch(p2, UppALP(tmps[j]));
            p3 = Branch(p3, j == 1 ? UppALP(tmps[j]) : tmps[j]);
        }

        // Assign IDs in priority order. A form that is already registered keeps
        // its earlier ID and expansion; flagN marks that form N must be skipped.
        bool flag1 = (p1->id != 0);
        if (!flag1) p1->id = ++ID;
        bool flag2 = (p2->id != 0);
        if (!flag2) p2->id = ++ID;
        bool flag3 = (p3->id != 0);
        if (!flag3) p3->id = ++ID;

        /* Store the expansion in the matching form. */
        while (tmps[++j] != '\"') {}   // skip the -> part, whatever its spacing
        k = j + 1;
        for (j = 0; tmps[k] != '\"'; k++, j++)
        {
            if (!flag1) expand[p1->id][j] = tmps[k];
            if (!flag2) expand[p2->id][j] = UppALP(tmps[k]);
            if (!flag3) expand[p3->id][j] = (j == 0 ? UppALP(tmps[k]) : tmps[k]);
        }
        if (!flag1) expand[p1->id][j] = '\0';
        if (!flag2) expand[p2->id][j] = '\0';
        if (!flag3) expand[p3->id][j] = '\0';
    }

    /* Enter each acronym into the trie. */
    KeyID = ID;   // boundary between contraction IDs and acronym IDs
    for (i = 1; i <= A; i++)
    {
        Trie_Node* p = root;
        ReadLine(tmps, sizeof(tmps));
        for (j = 1; tmps[j] != '\"'; j++)
            p = Branch(p, tmps[j]);

        if (p->id) continue;   // already registered: keep the earlier entry
        p->id = ++ID;

        while (tmps[++j] != '\"') {}
        k = j + 1;
        for (j = 0; tmps[k] != '\"'; k++, j++)
            expand[ID][j] = tmps[k];

        // The expansion of an acronym is followed by " (acronym)".
        expand[ID][j++] = ' ';
        expand[ID][j++] = '(';
        for (k = 1; tmps[k] != '\"'; k++)
            expand[ID][j++] = tmps[k];
        expand[ID][j++] = ')';
        expand[ID][j] = '\0';
    }
}

void solve::ReadText(Trie_Node* root)
{
    int i, j, f;
    bool* FirstApp = new bool[ID + 1];   // FirstApp[id]: acronym id already expanded in this text
    for (f = KeyID + 1; f <= ID; f++)
        FirstApp[f] = false;

    char line[256];
    while (ReadLine(line, sizeof(line)))
    {
        for (i = 0; line[i]; i++)
        {
            /* Text terminator: a # that is not followed by another #. */
            if (line[i] == '#' && line[i + 1] != '#')
            {
                printf("#");
                for (f = KeyID + 1; f <= ID; f++)   // clear the flags for the next text
                    FirstApp[f] = false;
                break;
            }

            Trie_Node* p = root;
            if (!p->next[Idx(line[i])])   // no dictionary word starts with line[i]
            {
                printf("%c", line[i]);
                continue;
            }

            // A dictionary word may start at line[i]; walk down until the first
            // node with an ID or until the walk cannot continue.
            for (j = i; !p->id; j++)
            {
                if (p->next[Idx(line[j])])
                    p = p->next[Idx(line[j])];
                else
                    break;
            }

            if (p->id != 0)
            {
                if (p->id <= KeyID)               // contraction
                {
                    printf("%s", expand[p->id]);
                    i += p->len - 1;
                }
                else if (!FirstApp[p->id])        // acronym, first occurrence in this text
                {
                    FirstApp[p->id] = true;
                    printf("%s", expand[p->id]);
                    i += p->len - 1;
                }
                else                              // acronym already expanded: output line[i] as is
                {
                    printf("%c", line[i]);
                }
            }
            else                                  // no dictionary word starts here
            {
                printf("%c", line[i]);
            }
        }
        printf("\n");   // one output line per input line
    }

    delete[] FirstApp;
}

char solve::UppALP(char c)
{
    if (c < 'a' || c > 'z')
        return c;
    return c - 32;
}

int main(void)
{
    int c, a;
    scanf("%d %d", &c, &a);
    getchar();   // consume the newline after the two numbers before reading lines
    solve poj2525(c, a);
    return 0;
}

 

Sample input-1

3 2

"doesn't"-> "does not"

"isn't"-> "is not"

"can't"-> "cannot"

"ACM"-> "Association for Computing Machinery"

"CS"-> "Computing Science"

#

ACM

#

The ACM can't solve

All the problems in CS. Though large and having

Using resources at its disposal, the ACM doesn't use magic. Magic isn't

Part of science, and hence not part of CS. Thank you for your

Suggestions.

Signed, ACM

#

Doesn'tisn'tcan'tACMCS

#

The ACM doesn't like magic.

#

It's not that the ACM won't use it, it's

#

Just that the ACM doesn't understand magic.

#

Sample output-1

#

Association for Computing Machinery (ACM)

#

The Association for Computing Machinery (ACM) cannot solve

All the problems in Computing Science (CS). Though large and having

Specified resources at its disposal, the ACM does not use magic. Magic is not

Part of science, and hence not part of CS. Thank you for your

Suggestions.

Signed, ACM

#

Does notis notcannotAssociation for Computing Machinery (ACM)Computing Science (CS)

#

The Association for Computing Machinery (ACM) does not like magic.

#

It's not that the Association for Computing Machinery (ACM) won't use it, it's

#

Just that the Association for Computing Machinery (ACM) does not understand magic.

#

Sample input-2

4 0

"Doesn't"-> "does not"

"Doesn' t"-> "&&&&&&&&"

"Doesn't"-> "%"

"Doesn' t"-> "########"

Doesn' t-> does not

Doesn' t-> &&&&&&&&

Doesn't-> %

Doesn' t-> ########

#

Sample output-2

Does not-> does not

Does not-> &&&&&&&&

Does not-> %

########

#

 
