Analysis of a Tencent interview question: repeated text messages

Source: Internet
Author: User
Tags: hash, min

There are 10 million text messages, some of them duplicates, saved in a text file with one message per line.

Find the 10 most frequently repeated messages within 5 minutes.

Analysis:

The straightforward approach is to sort the messages first and then iterate through them to find the top 10 repetitions. However, a comparison sort costs at least O(n log n).
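For comparison, the sort-based baseline might look like the sketch below. The helper name top_by_sorting and the choice to hold all lines in a std::vector are assumptions made for illustration; they do not come from the original solution.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original article): returns up to k
// (message, count) pairs ordered by descending count.
// The initial sort dominates the cost at O(n log n).
std::vector<std::pair<std::string, int>> top_by_sorting(
        std::vector<std::string> lines, std::size_t k) {
    std::sort(lines.begin(), lines.end());  // equal messages become adjacent

    // Collapse each run of equal messages into one (message, count) pair.
    std::vector<std::pair<std::string, int>> counts;
    for (std::size_t i = 0; i < lines.size();) {
        std::size_t j = i;
        while (j < lines.size() && lines[j] == lines[i]) ++j;
        counts.emplace_back(lines[i], static_cast<int>(j - i));
        i = j;
    }

    // Order the distinct messages by count, highest first.
    std::stable_sort(counts.begin(), counts.end(),
                     [](const std::pair<std::string, int>& x,
                        const std::pair<std::string, int>& y) {
                         return x.second > y.second;
                     });
    if (counts.size() > k) counts.resize(k);
    return counts;
}
```

Calling top_by_sorting(lines, 10) would answer the question, but every message has to pass through the sort before any counting happens.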

Instead, you can use a hash table, hash_map<string, int>: read the 10 million messages sequentially, insert each one into the table while counting its repetitions, and maintain a table of the 10 most repeated messages.

One traversal of the hash table is then enough to find the top 10, so the overall complexity of the algorithm is O(n).
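One possible way to do that single traversal is sketched below. It swaps in a min-heap (std::priority_queue) for the fixed 10-entry table, which is a substitution and not the article's method; the names Entry and top_k are hypothetical. With k fixed at 10, each heap operation costs a constant amount of work, so the pass stays linear in the number of distinct messages.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>;  // (count, message)

// Hypothetical helper (not from the original article): returns the k
// entries with the largest counts, highest count first.
std::vector<Entry> top_k(const std::unordered_map<std::string, int>& counts,
                         std::size_t k) {
    // Min-heap: the smallest of the kept entries is always on top.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& kv : counts) {
        if (heap.size() < k) {
            heap.emplace(kv.second, kv.first);
        } else if (k > 0 && kv.second > heap.top().first) {
            heap.pop();  // evict the current minimum
            heap.emplace(kv.second, kv.first);
        }
    }
    // Popping yields ascending counts, so fill the result from the back.
    std::vector<Entry> result(heap.size());
    for (std::size_t i = result.size(); i-- > 0;) {
        result[i] = heap.top();
        heap.pop();
    }
    return result;
}
```

The heap avoids rescanning all kept entries after each replacement, although for k = 10 a linear rescan is also cheap.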

Implemented as follows:

#include <stdint.h>
#include <iostream>
#include <string>
// __gnu_cxx::hash_map is a GCC extension; std::unordered_map is the
// standard equivalent.
#include <ext/hash_map>
using namespace std;

#define HASH __gnu_cxx

// String hash functor used by the hash_map.
struct StrHash {
    uint64_t operator()(const std::string& str) const {
        uint32_t b = 378551;
        uint32_t a = 63689;
        uint64_t hash = 0;
        for (size_t i = 0; i < str.size(); i++) {
            hash = hash * a + str[i];
            a = a * b;
        }
        return hash;
    }

    // Variant that mixes an extra field into the hash (not used by main).
    uint64_t operator()(const std::string& str, uint32_t field) const {
        uint32_t b = 378551;
        uint32_t a = 63689;
        uint64_t hash = 0;
        for (size_t i = 0; i < str.size(); i++) {
            hash = hash * a + str[i];
            a = a * b;
        }
        hash = (hash << 8) + field;
        return hash;
    }
};

// One slot of the top-10 table: a message and its repetition count.
struct NameNum {
    string name;
    int num;
    NameNum() : name(""), num(0) {}
};

int main() {
    HASH::hash_map<string, int, StrHash> names;
    HASH::hash_map<string, int, StrHash>::iterator it;
    NameNum namenum[10];
    string l = "";

    // Count how many times each message (one per line) occurs.
    while (getline(cin, l)) {
        it = names.find(l);
        if (it != names.end()) {
            names[l]++;
        } else {
            names[l] = 1;
        }
    }

    int i = 0;       // number of distinct messages visited so far
    int min = 0;     // smallest count currently held in namenum
    int minpos = 0;  // index of that smallest entry
    for (it = names.begin(); it != names.end(); ++it, ++i) {
        if (i < 10) {
            // Fill the table with the first 10 distinct messages,
            // tracking the smallest count seen so far.
            namenum[i].name = it->first;
            namenum[i].num = it->second;
            if (i == 0 || it->second < min) {
                min = it->second;
                minpos = i;
            }
        } else if (it->second > min) {
            // Replace the current minimum, then rescan for the new minimum.
            namenum[minpos].name = it->first;
            namenum[minpos].num = it->second;
            min = namenum[0].num;
            minpos = 0;
            for (int k = 1; k < 10; k++) {
                if (namenum[k].num < min) {
                    min = namenum[k].num;
                    minpos = k;
                }
            }
        }
    }

    // Print the table (in table order, not sorted by count).
    int shown = i < 10 ? i : 10;
    cout << "MaxLength (string,num):" << endl;
    for (i = 0; i < shown; i++) {
        cout << "(" << namenum[i].name.c_str() << "," << namenum[i].num << ")" << endl;
    }
    return 0;
}

Compile with g++ as follows:

g++ main.cpp -o main

The SMS text file is msg.txt.

Run: ./main < msg.txt

The output is:

MaxLength (string,num):
(Little Mother's Square, 4)
(Agricultural machinery Parts maintenance, 5)
(Red-Sheng Supermarket, 6)
(Dragon Creek Hotel, 8)
(Zhang Kee Dumpling Hall, 3)
(Friendship Inn, 3)
(Pearl Communication, 3)
(Jinyuan Hotel, 3)
(Dongting Natural Spring, 2)
(Qingyuan Supermarket, 3)
