A few days ago I wrote up a TopCoder practice problem, with code and comments. This is the second article in that series.
Note: The series has three articles covering a complete SRM set I solved earlier: the 250-, 500-, and 1000-point problems. This article covers the 500-point problem. Afterwards, I will occasionally take part in the latest SRMs and post the problems and my answers :)
Source: SRM 250, Div 2
Level: 500 points
Question:
Problem Statement
For computers it can be hard to determine in which language a given text is written. A simple way to try to determine the language is the following: for the given text and for some sample texts, for which we know the languages, we determine the letter frequencies and compare these.
The frequency of a letter is the total number of occurrences of that letter divided by the total number of letters in the text. To determine this, we ignore case and non-letter characters.
Once the letter frequencies of the text and of a language are known, we can calculate the difference between the two. This difference we define by the sum of the squared differences of the frequencies:
$$
d(\text{text}, \text{language}) = \sum_{c = \text{a}}^{\text{z}} \bigl( f_{\text{text}}(c) - f_{\text{language}}(c) \bigr)^2
$$
Here $f_s(c)$ is the frequency of letter $c$ in string $s$.
The lesser this value, the closer text resembles that language. Compare text with each element of languages and return the (0-based) index of the language that has the smallest difference with text. In case of a tie, return the smallest index.
Definition
Class:
LanguageRecognition
Method:
whichLanguage
Parameters:
vector <string>, string
Returns:
int
Method signature:
int whichLanguage(vector <string> languages, string text)
(Be sure your method is public)
Constraints
- languages contains between 1 and 50 elements, inclusive.
- Each element of languages has length between 1 and 50, inclusive.
- text has length between 1 and 50, inclusive.
- Each element of languages and text consists only of characters with ASCII value between 32 and 127, inclusive.
- Each element of languages and text contains at least one letter ('a'-'z' and 'A'-'Z').
Examples
0)
{"This is an English sentence.",
"Dieser ist ein deutscher Satz.",
"C'est une phrase francaise.",
"Dit is een Nederlandse Zin."
}
"In Welke Taal is deze Zin geschreven?"
Returns: 3
The differences are 0.0385, 0.0377, 0.0430 and 0.0276, so the sentence is written in language 3, Dutch. Note that Dutch is somewhat similar to German, somewhat less similar to English and not similar to French.
1)
{"AAAAA", "BBBB", "CCC", "DD", "E"}
"XXX"
Returns: 0
In case of a tie, return the language with the smallest index.
2)
{"AABB", "AABB", "? B! ", " AB! @ # $ % "}
"AB"
Returns: 0
Ignore case and the non-letter characters.
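The tie in example 1 can be checked by hand. Here is a small worked sketch, writing $f_s(c)$ for the frequency of letter $c$ in string $s$ and $d$ for the difference (this notation is only for illustration). For the first candidate, only the letters a and x have a nonzero frequency in either string:

$$
d(\text{XXX}, \text{AAAAA}) = \bigl(f_{\text{XXX}}(a) - f_{\text{AAAAA}}(a)\bigr)^2 + \bigl(f_{\text{XXX}}(x) - f_{\text{AAAAA}}(x)\bigr)^2 = (0 - 1)^2 + (1 - 0)^2 = 2
$$

Every other candidate in example 1 also consists of a single letter that does not occur in the text, so all five differences equal 2 and index 0 wins the tie. In example 2, "AABB" and "AB" both give a frequency of 1/2 for a and 1/2 for b, so the difference for index 0 is 0, which cannot be beaten.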
/*
Analysis: The most important thing in TopCoder or ACM contests is understanding the problem. This is when a programmer really feels the importance of English: you not only have to understand the statement, you have to understand it quickly, because the contest is timed.
In essence, this problem asks for language recognition, although what you actually have to do is far less intimidating than the name suggests. You are given several candidate sentences and one sentence to classify, and you must pick the candidate that the sentence most resembles. The judgment is based on letter frequencies: for each candidate, sum the squared differences between its letter frequencies and those of the text. The candidate with the smallest sum is the closest, and its index is the answer.
The problem is not really difficult. The key is to get the idea straight and split the work into helper functions, after which it is easy to finish. My solution is as follows:
*/
#include <iostream>
#include <string>
#include <stdio.h>
#include <vector>
#include <set>
#include <map>
#include <algorithm>
using namespace std;
typedef vector<float> vec_flt;
class LanguageRecognition
{
public:
    int lanNum;
    vec_flt difs;
public:
    int whichLanguage(vector<string> languages, string text)
    {
        int mostOne = 0;
        float smallest = 100; // larger than any possible difference (the maximum is 2)
        float tmp;
        vec_flt textFren;
        vec_flt curFren;
        textFren = getVecOfString(text);
        for (int i = 0; i < (int)languages.size(); i++)
        {
            curFren = getVecOfString(languages[i]);
            tmp = getDifferences(curFren, textFren);
            cout << tmp << endl; // debug output
            if (tmp < smallest) // strictly smaller, so a tie keeps the lower index
            {
                smallest = tmp;
                mostOne = i;
            }
        }
        return mostOne;
    }
    // Sum of the squared differences between two frequency vectors.
    float getDifferences(vec_flt f1, vec_flt f2)
    {
        float returnValue = 0;
        for (int i = 0; i < 26; i++)
        {
            returnValue += (f1[i] - f2[i]) * (f1[i] - f2[i]);
        }
        return returnValue;
    }
    // Letter frequencies of str, ignoring case and non-letter characters.
    vec_flt getVecOfString(string str)
    {
        vec_flt returnFlt(26, 0);
        vector<int> letters(26, 0);
        int totalLetters = 0;
        for (int i = 0; i < (int)str.size(); i++) // first, count each letter and the total number of letters
        {
            if (str[i] >= 'a' && str[i] <= 'z') // lowercase letter
            {
                letters[str[i] - 'a']++;
                totalLetters++;
            }
            else if (str[i] >= 'A' && str[i] <= 'Z') // uppercase letter
            {
                letters[str[i] - 'A']++;
                totalLetters++;
            }
        }
        if (totalLetters != 0)
        {
            for (int i = 0; i < 26; i++) // compute the frequencies
            {
                returnFlt[i] = letters[i] / (float)totalLetters;
            }
        }
        return returnFlt;
    }
};
See, that wasn't hard, was it? High-point problems are not necessarily difficult. The key is to get the idea straight so that you can solve them quickly.
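For readers who want to try the idea outside the TopCoder arena, here is a compact sketch of the same approach as free functions. It is not the submitted solution: the helper names `letterFrequencies` and `squaredDifference` are made up for this illustration, and it uses `double` and `std::isalpha`/`std::tolower` instead of `float` and explicit character ranges. It also assumes the default "C" locale, which matches the problem's ASCII-only input.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: frequency of each letter 'a'..'z' in s,
// ignoring case and every non-letter character.
std::array<double, 26> letterFrequencies(const std::string& s) {
    std::array<int, 26> counts{};  // zero-initialized
    int total = 0;
    for (unsigned char ch : s) {   // unsigned char keeps the <cctype> calls well defined
        if (std::isalpha(ch)) {
            ++counts[std::tolower(ch) - 'a'];
            ++total;
        }
    }
    std::array<double, 26> freq{};
    if (total > 0) {
        for (int i = 0; i < 26; ++i) freq[i] = static_cast<double>(counts[i]) / total;
    }
    return freq;
}

// Hypothetical helper: sum of the squared frequency differences.
double squaredDifference(const std::array<double, 26>& a,
                         const std::array<double, 26>& b) {
    double sum = 0.0;
    for (int i = 0; i < 26; ++i) sum += (a[i] - b[i]) * (a[i] - b[i]);
    return sum;
}

// Returns the 0-based index of the closest language.
// The strict '<' means a tie keeps the smallest index.
int whichLanguage(const std::vector<std::string>& languages, const std::string& text) {
    assert(!languages.empty());  // guaranteed by the problem constraints
    const std::array<double, 26> textFreq = letterFrequencies(text);
    int best = 0;
    double smallest = squaredDifference(letterFrequencies(languages[0]), textFreq);
    for (std::size_t i = 1; i < languages.size(); ++i) {
        const double d = squaredDifference(letterFrequencies(languages[i]), textFreq);
        if (d < smallest) {
            smallest = d;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Calling `whichLanguage` on the three examples from the problem statement should give 3, 0 and 0, the expected results listed there. Starting `smallest` from the first candidate rather than from a sentinel such as 100 is only a matter of taste here, since the difference can never exceed 2.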