Find the 10 most frequently repeated messages among 10 million SMS

Source: Internet
Author: User

Problem:

There are 10 million text messages, some of them duplicates, saved in a text file with one message per line. Take 5 minutes to find the 10 messages that are repeated most often.


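```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Portable stand-ins for the Windows BYTE and DWORD types used in the original.
typedef unsigned char BYTE;
typedef std::uint32_t DWORD;

struct TNode
{
    const BYTE* pText;       // points directly into the file mapping (start of this message)
    DWORD dwCount;           // counter: number of identical messages that end at this node
    TNode* childNodes[256];  // children: a byte value never exceeds 255, so there are at most 256

    TNode() : pText(nullptr), dwCount(0)
    {
        std::fill(childNodes, childNodes + 256, nullptr);  // initialize members
    }
    ~TNode()
    {
        for (TNode* child : childNodes)  // free resources: delete each child subtree
            delete child;
    }
    TNode(const TNode&) = delete;  // a node owns its children, so copying is not allowed
    TNode& operator=(const TNode&) = delete;
};

// nIndex is the index of the current character in pText.
// Assumes each message is non-empty and NUL-terminated (e.g. every '\n' set to '\0').
// ends collects the nodes at which at least one message ends.
void CreateChildNode(TNode* pNode, const BYTE* pText, int nIndex, std::vector<TNode*>& ends)
{
    BYTE c = pText[nIndex];
    if (pNode->childNodes[c] == nullptr)
    {
        // The child node does not exist yet, so create it (the constructor initializes it).
        // Nodes can also be added to an array as they are created; here only nodes where
        // a message ends are recorded, because only they have a nonzero dwCount.
        pNode->childNodes[c] = new TNode;
    }
    TNode* pChild = pNode->childNodes[c];
    if (pText[nIndex + 1] == '\0')
    {
        // This message is complete: increment the counter and keep a pointer to its text.
        if (pChild->dwCount++ == 0)
            ends.push_back(pChild);
        pChild->pText = pText;
    }
    else
    {
        // The message is not finished yet: create the next-level node.
        CreateChildNode(pChild, pText, nIndex + 1, ends);
    }
}

// Build the trie. pTexts is the SMS array and dwCount the number of messages (10 million).
void CreateRootNode(const BYTE** pTexts, DWORD dwCount)
{
    TNode rootNode;
    std::vector<TNode*> ends;
    for (DWORD dwIndex = 0; dwIndex < dwCount; dwIndex++)
    {
        if (pTexts[dwIndex][0] != '\0')  // skip empty lines
            CreateChildNode(&rootNode, pTexts[dwIndex], 0, ends);
    }
    // Sort the nodes by dwCount (descending); only the first 10 positions need ordering.
    std::size_t top = std::min<std::size_t>(10, ends.size());
    std::partial_sort(ends.begin(), ends.begin() + top, ends.end(),
                      [](const TNode* a, const TNode* b) { return a->dwCount > b->dwCount; });
    // Take the first 10 nodes and display the results (count, then message text).
    for (std::size_t i = 0; i < top; ++i)
        std::printf("%u\t%s\n", static_cast<unsigned>(ends[i]->dwCount),
                    reinterpret_cast<const char*>(ends[i]->pText));
}
```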
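The final step sorts the nodes by count and keeps the first 10. Only 10 results are needed, so a common alternative is a bounded min-heap that never holds more than 10 entries. Candidates can then be streamed, for example during a traversal of the trie, without first being collected into one array. The cost is O(n log k) for n candidates and k = 10. This is a swapped-in technique, not the author's method. `Entry` and `TopK` are hypothetical names, and the sketch assumes (count, message) pairs have already been read from the trie's terminal nodes.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical (count, message) pair, e.g. read from a trie node's dwCount and pText.
typedef std::pair<unsigned, std::string> Entry;

// Returns the k entries with the largest counts, highest count first.
// A min-heap keeps at most k entries; its top is the smallest count kept so far.
std::vector<Entry> TopK(const std::vector<Entry>& entries, std::size_t k)
{
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const Entry& e : entries)
    {
        if (heap.size() < k)
        {
            heap.push(e);
        }
        else if (k > 0 && e.first > heap.top().first)
        {
            heap.pop();  // evict the smallest of the k entries kept so far
            heap.push(e);
        }
    }
    std::vector<Entry> result;
    while (!heap.empty())  // pops in ascending order of count
    {
        result.push_back(heap.top());
        heap.pop();
    }
    std::reverse(result.begin(), result.end());  // highest count first
    return result;
}
```

For example, `TopK({{3u, "hi"}, {7u, "ok"}, {1u, "bye"}, {5u, "yes"}}, 2)` returns `(7, "ok")` followed by `(5, "yes")`. When counts tie at the cutoff, the entry seen first is kept.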


Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
