Topic:
There are 10 million text messages, some of them duplicates, saved in a text file with one message per line. Find the 10 most frequently repeated messages within 5 minutes.
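#include <windows.h>  // BYTE, DWORD

struct TNode
{
    // Pointer directly into the memory-mapped file, where this message's text starts
    const BYTE* pText;
    // Counter: the number of identical messages that end at this node
    DWORD dwCount;
    // Child nodes, indexed by byte value; a byte has only 256 possible values (0 to 255),
    // so a node can never have more than 256 children
    TNode* childNodes[256];
    TNode() : pText(NULL), dwCount(0)
    {
        // Initialize members: no text, zero count, no children
        for (int i = 0; i < 256; ++i)
            childNodes[i] = NULL;
    }
    ~TNode()
    {
        // Free resources: delete every child subtree
        for (int i = 0; i < 256; ++i)
            delete childNodes[i];
    }
};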
// nIndex is the index of the current character within the message
void CreateChildNode(TNode* pNode, const BYTE* pText, int nIndex)
{
    if (pNode->childNodes[pText[nIndex]] == NULL)
    {
        // This child node does not exist yet, so create it; the TNode constructor initializes it.
        // For easier processing later, the node can also be added to an array as it is created.
        pNode->childNodes[pText[nIndex]] = new TNode;
    }
    if (pText[nIndex + 1] == '\0')
    {
        // This message is complete (the terminator is assumed to be '\0'):
        // increment the counter and save a pointer to the message text
        pNode->childNodes[pText[nIndex]]->dwCount++;
        pNode->childNodes[pText[nIndex]]->pText = pText;
    }
    else
    {
        // pText[nIndex + 1] != '\0': the message is not finished, so create the next-level node
        CreateChildNode(pNode->childNodes[pText[nIndex]], pText, nIndex + 1);
    }
}
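
To make the insertion step easier to try outside Windows, here is a minimal sketch of the same idea written as a loop instead of recursion. It is not the code above: Node, insert and insertLiteral are hypothetical names, unsigned char and unsigned long stand in for BYTE and DWORD, and each message is assumed to be terminated by '\0'.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for TNode, using only standard C++ types.
struct Node {
    const unsigned char* text = nullptr;   // start of one message that ends here
    unsigned long count = 0;               // number of messages ending at this node
    std::unique_ptr<Node> children[256];   // one slot per possible byte value
};

// Insert one '\0'-terminated message, descending one level per byte.
// Returns the node at which the message ends (the root for an empty message).
Node* insert(Node& root, const unsigned char* text) {
    assert(text != nullptr);
    Node* node = &root;
    for (std::size_t i = 0; text[i] != '\0'; ++i) {
        std::unique_ptr<Node>& child = node->children[text[i]];
        if (!child) child.reset(new Node);   // create the child on first use
        node = child.get();
    }
    if (node != &root) {                     // empty messages are not counted
        ++node->count;
        node->text = text;
    }
    return node;
}

// Convenience wrapper so string literals can be inserted directly.
Node* insertLiteral(Node& root, const char* s) {
    return insert(root, reinterpret_cast<const unsigned char*>(s));
}
```

Inserting "hello" twice and "help" once leaves a count of 2 on the node reached by "hello" and a count of 1 on the node reached by "help"; the two messages share the nodes for their common prefix "hel". The loop form also avoids one function call per character.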
// Build the tree from the root node; pTexts is the array of messages, dwCount is the number of messages (10 million)
void CreateRootNode(const BYTE** pTexts, DWORD dwCount)
{
    TNode rootNode;
    for (DWORD dwIndex = 0; dwIndex < dwCount; dwIndex++)
    {
        CreateChildNode(&rootNode, pTexts[dwIndex], 0);
    }
    // Sort all nodes by their dwCount value
    // Take the first 10 nodes and display the results
}
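
The last two steps (sort by count, take the first 10) are only described in comments. One possible way to do them is sketched below, assuming the counted messages have already been collected into an array as suggested earlier. Entry and topK are hypothetical names, and std::partial_sort is a substitution for the full sort mentioned in the comments, since only the first 10 positions need to be ordered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical flat record: one distinct message and its occurrence count.
struct Entry {
    std::string text;
    unsigned long count;
};

// Return the k entries with the highest counts, highest first.
// std::partial_sort orders only the first k positions, so the whole
// array of distinct messages does not have to be fully sorted.
std::vector<Entry> topK(std::vector<Entry> entries, std::size_t k) {
    k = std::min(k, entries.size());
    std::partial_sort(entries.begin(), entries.begin() + k, entries.end(),
                      [](const Entry& a, const Entry& b) { return a.count > b.count; });
    entries.erase(entries.begin() + k, entries.end());
    return entries;
}
```

Calling topK(entries, 10) would give the 10 most repeated messages. If fewer than 10 distinct messages exist, all of them are returned.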
Find the top 10 repetitions from 10 million SMS