Chapter 2 of "Programming Pearls" raises the anagram problem. An anagram is a word formed by rearranging the letters of another word; such words are also called brother words, for example army -> mary. Several algorithm problems can be derived from anagrams, including the string containment problem, deciding whether two strings are anagrams, and finding all anagram sets in a dictionary.
One, the string containment problem
(1) Problem description: Given string 1 and string 2, where string 2 is the shorter one, how can we quickly determine whether every character of string 2 appears in string 1 (assuming the strings contain only letters)?
(2) Example: if string 1 is ABCDEFGHIJK and string 2 is ABCDE, then string 1 contains string 2, because every letter in string 2 also appears in string 1.
(3) Solution:
Idea One
The most straightforward idea is to take each character of string 2 in turn and scan string 1 to see whether it occurs there. With string 1 of length n and string 2 of length m, the time complexity is clearly O(n*m).
/*************************************************************************
    > File Name: test.cpp
    > Author: songlee
 ************************************************************************/
#include <iostream>
#include <string>
using namespace std;

void Compare(string long_str, string short_str)
{
    size_t i, j;
    for (i = 0; i < short_str.size(); ++i)
    {
        // Scan the long string for the i-th character of the short string
        for (j = 0; j < long_str.size(); ++j)
        {
            if (long_str[j] == short_str[i])
            {
                break;
            }
        }
        // The inner loop ran to the end: the character was not found
        if (j == long_str.size())
        {
            cout << "false" << endl;
            return;
        }
    }
    cout << "true" << endl;
    return;
}

int main()
{
    string l = "ABCDEFGHIJK";
    string s = "ABCDEF";
    Compare(l, s);
    return 0;
}
Idea Two
Because the strings are assumed to contain only letters, we can use an extra array flag[26] as markers for the 26 letters. First traverse the long string and set the marker of each letter it contains to 1; then traverse the short string. If the marker of every letter in the short string is 1, the long string contains it; otherwise it does not. The time complexity of this method is O(n+m). To improve space efficiency, 26 bits are used instead of an array (the bitset container).
/*************************************************************************
    > File Name: test1.cpp
    > Author: songlee
 ************************************************************************/
#include <iostream>
#include <bitset>
#include <string>
using namespace std;

bool Compare(string long_str, string short_str)
{
    bitset<26> flag;
    for (size_t i = 0; i < long_str.size(); ++i)
    {
        // flag.set(n) sets the nth bit to 1
        flag.set(long_str[i] - 'A');
    }
    for (size_t i = 0; i < short_str.size(); ++i)
    {
        // flag.test(n) checks whether the nth bit is 1
        if (!flag.test(short_str[i] - 'A'))
            return false;
    }
    return true;
}

int main()
{
    string l = "ABCDEFGHIJK";
    string s = "ABCDEZ";
    if (Compare(l, s))
        cout << "true" << endl;
    else
        cout << "false" << endl;
    return 0;
}
This method can be optimized further. For example, if the short string is a prefix of the long string, we do not need n+m steps; about 2m steps are enough. The concrete implementation is left for the reader to think about.
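One possible reading of this hint, sketched under assumptions: mark the letters of the short string first (m steps), then scan the long string only until every marked letter has been seen. When the short string is a prefix of the long one, the scan stops after at most m characters. The name `ContainsEarlyExit` is hypothetical and not from the original article, and the sketch assumes uppercase letters only.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not the author's code.
// Assumes both strings contain only uppercase letters 'A'..'Z'.
bool ContainsEarlyExit(const std::string& long_str, const std::string& short_str)
{
    // Mark every letter of the short string as "still needed".
    std::bitset<26> needed;
    for (char c : short_str)
    {
        assert(c >= 'A' && c <= 'Z');
        needed.set(static_cast<std::size_t>(c - 'A'));
    }
    std::size_t remaining = needed.count();

    // Scan the long string only until no needed letter is left.
    for (std::size_t i = 0; i < long_str.size() && remaining > 0; ++i)
    {
        assert(long_str[i] >= 'A' && long_str[i] <= 'Z');
        const std::size_t idx = static_cast<std::size_t>(long_str[i] - 'A');
        if (needed.test(idx))
        {
            needed.reset(idx);
            --remaining;
        }
    }
    return remaining == 0;
}
```

In the worst case (a needed letter is missing or appears only at the end) this still takes n+m steps, so it only improves the favourable cases.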
Idea Three
Assign a prime number to each letter, starting from 2, then 3, 5, 7, and so on. Traverse the long string and multiply together the primes corresponding to its characters. Then traverse the short string and check whether the product is divisible by the prime corresponding to each of its characters. If any division leaves a remainder, there is a letter with no match; if no division leaves a remainder, every letter of the short string is in the long string. The time complexity of this method is also O(n+m), and it needs extra space for the 26 primes. Its drawback is that it has to deal with large integers, because the product can become very large. (Here we use int64_t and uint64_t, which are defined in the <cstdint> header.)
/*************************************************************************
    > File Name: test2.cpp
    > Author: songlee
 ************************************************************************/
#include <iostream>
#include <string>
#include <stdint.h>
//#include <cstdint>  // C++11
using namespace std;

bool Compare(string long_str, string short_str)
{
    unsigned int primenum[26] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
                                 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101};
    /* int64_t and uint64_t are the 64-bit signed and unsigned integer types;
     * they are 64 bits wide regardless of the platform's word size.
     */
    uint64_t ch = 1;
    for (size_t i = 0; i < long_str.size(); ++i)
    {
        ch = ch * primenum[long_str[i] - 'A'];
    }
    for (size_t i = 0; i < short_str.size(); ++i)
    {
        if (ch % primenum[short_str[i] - 'A'] != 0)
            return false;
    }
    return true;
}

int main()
{
    string l = "ABCDEFGHIJK";
    string s = "ABCDEK";
    if (Compare(l, s))
        cout << "true" << endl;
    else
        cout << "false" << endl;
    return 0;
}
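To make the large-integer drawback concrete, the following sketch computes the prime product with an explicit overflow check instead of letting uint64_t wrap around silently. The names `kPrimes` and `PrimeProduct` are hypothetical, and the sketch assumes uppercase letters only; it illustrates the limitation and is not a replacement for a real big-integer type.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical table: the first 26 primes, one per letter 'A'..'Z'.
static const std::uint64_t kPrimes[26] = {
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
    43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101};

// Computes the product of the primes of all letters in s.
// Returns false (leaving a partial product) if the next multiplication
// would not fit into 64 bits. Assumes uppercase letters only.
bool PrimeProduct(const std::string& s, std::uint64_t& product)
{
    const std::uint64_t max = std::numeric_limits<std::uint64_t>::max();
    product = 1;
    for (char c : s)
    {
        assert(c >= 'A' && c <= 'Z');
        const std::uint64_t p = kPrimes[c - 'A'];
        if (product > max / p)
            return false;  // product * p would overflow uint64_t
        product *= p;
    }
    return true;
}
```

With this table, a string containing all 26 letters already overflows, as does a run of ten 'Z' characters, so the 64-bit version is only safe for short inputs.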
Two, determining whether two strings are anagrams
(1) Problem description: If two strings consist of the same characters and differ only in their order, they are considered brother strings. How can we quickly decide whether two strings are brother strings (for example, bad and adb are brother strings)?
(2) Note: Section One discussed the string containment problem, but do not assume that two strings which contain each other are anagrams. For example, AABCDE and EDCBA contain each other, but they are not anagrams.
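The note can be checked with a small sketch: two strings contain each other exactly when their letter sets are equal, which says nothing about how often each letter occurs. The helper names `LetterSet` and `ContainEachOther` are hypothetical, and uppercase letters are assumed.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: the set of letters occurring in s ('A'..'Z' assumed).
std::bitset<26> LetterSet(const std::string& s)
{
    std::bitset<26> letters;
    for (char c : s)
    {
        assert(c >= 'A' && c <= 'Z');
        letters.set(static_cast<std::size_t>(c - 'A'));
    }
    return letters;
}

// True when every letter of a occurs in b and every letter of b occurs in a.
bool ContainEachOther(const std::string& a, const std::string& b)
{
    return LetterSet(a) == LetterSet(b);
}
```

For AABCDE and EDCBA this returns true even though the strings have different lengths (6 and 5), so mutual containment alone cannot decide whether they are brother strings.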
(3) Solution:
Idea One
Assign a prime number to each letter, then check whether the products of the primes of the two strings are equal. As with the prime method above, the time complexity is O(n+m), and large integers have to be dealt with.
/*************************************************************************
    > File Name: test3.cpp
    > Author: songlee
 ************************************************************************/
#include <iostream>
#include <string>
#include <stdint.h>
//#include <cstdint>  // C++11
using namespace std;

bool Compare(string s1, string s2)
{
    unsigned int primenum[26] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
                                 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101};
    // Compute one product per string and compare them. Dividing the first
    // product by the primes of s2 is unsafe: integer division truncates
    // (3 / 2 == 1), so "b" and "a" would wrongly compare as equal.
    uint64_t ch1 = 1;
    uint64_t ch2 = 1;
    for (size_t i = 0; i < s1.size(); ++i)
    {
        ch1 = ch1 * primenum[s1[i] - 'a'];
    }
    for (size_t i = 0; i < s2.size(); ++i)
    {
        ch2 = ch2 * primenum[s2[i] - 'a'];
    }
    return ch1 == ch2;
}

int main()
{
    string s1 = "abandon";
    string s2 = "banadon";
    if (Compare(s1, s2))
        cout << "They are brother words!" << endl;
    else
        cout << "They aren't brother words!" << endl;
    return 0;
}
Idea Two
Sort the characters of both strings alphabetically and check whether the sorted strings are equal; if they are, the two strings are brother strings (anagrams). The time efficiency of this method depends on the sorting algorithm used. You can of course write your own sorting algorithm; here we use the sort() function of the C++ STL to sort the strings.
/*************************************************************************
    > File Name: test4.cpp
    > Author: songlee
 ************************************************************************/
#include <iostream>
#include <algorithm>
#include <string>
using namespace std;

// Custom ordering function (binary predicate)
bool myfunction(char i, char j)
{
    return i > j;
}

bool Compare(string s1, string s2)
{
    // Sort s1 and s2 with the generic algorithm sort(),
    // which is typically implemented as a quicksort variant
    sort(s1.begin(), s1.end(), myfunction);
    sort(s2.begin(), s2.end(), myfunction);
    if (!s1.compare(s2))  // compare() returns 0 when the strings are equal
        return true;
    else
        return false;
}

int main()
{
    string s1 = "abandon";
    string s2 = "banadon";
    if (Compare(s1, s2))
        cout << "They are brother words!" << endl;
    else
        cout << "They aren't brother words!" << endl;
    return 0;
}
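Since the cost of Idea Two depends on the sort, one alternative worth noting (not from the original article) is to count letters instead of sorting, which takes O(n+m) time and avoids both sorting and large integers. A minimal sketch, assuming lowercase letters only and using the hypothetical name `IsAnagramByCount`:

```cpp
#include <cassert>
#include <string>

// Hypothetical alternative to the sorting approach, not the author's code.
// Assumes both strings contain only lowercase letters 'a'..'z'.
bool IsAnagramByCount(const std::string& s1, const std::string& s2)
{
    if (s1.size() != s2.size())
        return false;  // different lengths can never be anagrams

    int count[26] = {0};
    for (char c : s1)
    {
        assert(c >= 'a' && c <= 'z');
        ++count[c - 'a'];  // one more occurrence seen in s1
    }
    for (char c : s2)
    {
        assert(c >= 'a' && c <= 'z');
        if (--count[c - 'a'] < 0)
            return false;  // s2 uses this letter more often than s1
    }
    // Equal lengths and no negative count imply all counts are zero.
    return true;
}
```

Unlike mutual containment, this check takes the number of occurrences into account, so strings such as aabcde and edcba are rejected.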
Three, finding all anagram sets in a dictionary (key topic)
(1) Problem description: Given an English dictionary, find all sets of anagrams in it.
(2) Solution:
Idea One
The first approach that comes to mind is to compare each word with every other word in the dictionary. However, assuming one comparison takes at least 1 microsecond, a dictionary of 200,000 words would cost: 200,000 words x 200,000 comparisons/word x 1 microsecond/comparison = 40,000 x 10^6 microseconds = 40,000 seconds ≈ 11.1 hours. The number of comparisons is far too high, so we need a more efficient method.
Idea Two
Give every word in the dictionary a signature such that all words in the same anagram class share the same signature, then gather the words that have the same signature. The signature of a word is obtained by sorting its letters alphabetically. The solution then has three steps: first, read the dictionary file and compute the signature of each word by sorting its letters; second, sort all words by signature; third, put the words of each anagram class on the same line.
We now have signature-word (key-value) pairs, which naturally suggests the associative container map in C++. The benefits of using map are:
memory is managed dynamically and the container grows as needed;
each word is paired with its signature, and words with the same signature (key) are simply appended to the value;
no separate sort by signature is needed, because map keeps its elements ordered by key (ascending by default).
So once every word and its signature are stored in the map, we can traverse it and output directly; each map element is one set of anagrams.
The C++ implementation is as follows:
/*************************************************************************
    > File Name: test5.cpp
    > Author: songlee
 ************************************************************************/
#include <iostream>
#include <fstream>    // file I/O
#include <map>        // map
#include <string>     // string
#include <algorithm>  // sort
using namespace std;

/*
 * map is an associative container in C++;
 * its keys cannot be repeated.
 */
map<string, string> word;

/* Custom comparison function (for sorting) */
bool myfunction(char i, char j)
{
    return i < j;
}

/*
 * Sort the letters of each word; store the sorted string
 * as the key and the original word as the value in the map.
 */
void sign_sort(const char* dic)
{
    // file stream
    ifstream in(dic);
    if (!in)
    {
        cout << "Couldn't open file: " + string(dic) << endl;
        return;
    }
    string aword;
    string asign;
    while (in >> aword)
    {
        asign = aword;
        sort(asign.begin(), asign.end(), myfunction);
        // If the signature does not exist yet, a new map element is created;
        // if it exists, the word is appended to its value.
        word[asign] += aword + " ";
    }
    in.close();
}

/*
 * Write the output file
 */
void write_file(const char* file)
{
    ofstream out(file);
    if (!out)
    {
        cout << "Couldn't create file: " + string(file) << endl;
        return;
    }
    map<string, string>::iterator begin = word.begin();
    map<string, string>::iterator end = word.end();
    while (begin != end)
    {
        out << begin->second << "\n";
        ++begin;
    }
    out.close();
}

int main()
{
    string dic;
    string outfile;
    cout << "Please input dictionary name: ";
    cin >> dic;
    cout << "Please input output filename: ";
    cin >> outfile;
    sign_sort(dic.c_str());
    write_file(outfile.c_str());
    return 0;
}
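The program above writes one line per signature, including words that have no brother word at all. As a variation under stated assumptions, the sketch below stores each class in a vector so that single-word classes can be skipped on output. The names `AnagramGroups`, `GroupAnagrams`, `GroupAnagramsFromText` and `WriteAnagramSets` are hypothetical, and reading from a generic stream rather than a named file is a choice made here for easy testing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type: signature -> all words sharing that signature.
typedef std::map<std::string, std::vector<std::string> > AnagramGroups;

// Reads whitespace-separated words and groups them by sorted-letter signature.
AnagramGroups GroupAnagrams(std::istream& in)
{
    AnagramGroups groups;
    std::string word;
    while (in >> word)
    {
        std::string sign = word;
        std::sort(sign.begin(), sign.end());
        groups[sign].push_back(word);
    }
    return groups;
}

// Convenience wrapper for text held in memory.
AnagramGroups GroupAnagramsFromText(const std::string& text)
{
    std::istringstream in(text);
    return GroupAnagrams(in);
}

// Writes one anagram set per line, skipping words that have no brother word.
void WriteAnagramSets(const AnagramGroups& groups, std::ostream& out)
{
    for (AnagramGroups::const_iterator it = groups.begin(); it != groups.end(); ++it)
    {
        const std::vector<std::string>& words = it->second;
        if (words.size() < 2)
            continue;
        for (std::size_t i = 0; i < words.size(); ++i)
            out << (i ? " " : "") << words[i];
        out << '\n';
    }
}
```

Passing an `std::ifstream` opened on the dictionary file to `GroupAnagrams` and an `std::ofstream` to `WriteAnagramSets` would reproduce the file-based workflow of the original program.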
Appendix: (Baidu internship written test, 2012.5.6) If exchanging the positions of the letters of one word yields another word, for example army -> mary, the two are called brother words. Given a word, find its brother words in a dictionary. Describe the data structure and the query process.
Solution: If preprocessing is not allowed, we can only traverse the whole dictionary, computing the signature of each word and comparing it with the signature of the given word. If preprocessing is allowed, we can add all words to a map as described above and then output the value that corresponds to the key (the signature of the given word); that value contains all brother words of the given word.
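Both query strategies can be sketched as follows. This is an illustration under assumptions rather than a reference answer: the names `Signature`, `FindBrothersByScan`, `BrotherIndex`, `BuildIndex` and `FindBrothers` are hypothetical, the dictionary is held in a vector, and the queried word itself is left out of the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Signature of a word: its letters in sorted order.
std::string Signature(std::string w)
{
    std::sort(w.begin(), w.end());
    return w;
}

// Without preprocessing: scan the whole dictionary for every query.
std::vector<std::string> FindBrothersByScan(const std::vector<std::string>& dict,
                                            const std::string& query)
{
    const std::string sign = Signature(query);
    std::vector<std::string> result;
    for (std::size_t i = 0; i < dict.size(); ++i)
    {
        const std::string& w = dict[i];
        if (w != query && w.size() == query.size() && Signature(w) == sign)
            result.push_back(w);
    }
    return result;
}

// With preprocessing: signature -> words, built once for the dictionary.
typedef std::map<std::string, std::vector<std::string> > BrotherIndex;

BrotherIndex BuildIndex(const std::vector<std::string>& dict)
{
    BrotherIndex index;
    for (std::size_t i = 0; i < dict.size(); ++i)
        index[Signature(dict[i])].push_back(dict[i]);
    return index;
}

// One map lookup per query, then copy everything except the query itself.
std::vector<std::string> FindBrothers(const BrotherIndex& index,
                                      const std::string& query)
{
    std::vector<std::string> result;
    BrotherIndex::const_iterator it = index.find(Signature(query));
    if (it == index.end())
        return result;
    for (std::size_t i = 0; i < it->second.size(); ++i)
        if (it->second[i] != query)
            result.push_back(it->second[i]);
    return result;
}
```

The scan costs one pass over the dictionary per query, while the index pays that cost once up front and answers each later query with a single lookup.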
Hopefully the examples in this article help readers gain a better grasp of these data structure and algorithm techniques in C++.