C++ implementation: simple processing of Chinese words

Source: Internet
Author: User

Now that we can read the input, let's look at some details of how to store the words.

First, the comparison function cannot work on a char* (it would only be handed an address), but it can accept a std::string object.
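As a small illustration of that point, the sketch below contrasts comparing two char* values with comparing two std::string objects. The function names same_pointer and same_text are made up for this example and do not appear in the article's code.

```cpp
#include <string>

// Comparing two char* values with == compares the addresses they hold,
// not the characters they point to.
bool same_pointer(const char* a, const char* b) {
    return a == b;
}

// std::string overloads == (and <) to compare the characters themselves.
bool same_text(const std::string& a, const std::string& b) {
    return a == b;
}
```

Two separate buffers that hold the same word are different pointers but equal strings, which is why a string object is the safer thing to hand to a comparison.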

However, comparing two long strings takes O(length) time, which is wasteful. So we use double hashing: two hashes, h1 and h2, represent each string, and the collision probability becomes negligible. Once the hashes are computed, comparing two double-hashed words takes O(1) time (at most two integer comparisons), which greatly reduces the time complexity.
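A minimal sketch of the double-hash idea follows. It assumes a polynomial hash with base 3001 and the moduli 19997 and 30001 used in the listing further down; the names poly_hash and double_hash are hypothetical, and bytes are read as unsigned char here instead of being shifted by an offset as the listing does.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: polynomial hash of s modulo mod, with base 3001.
int poly_hash(const std::string& s, int mod) {
    const int weight = 3001;
    long long h = 0;
    // unsigned char keeps the bytes of multi-byte (e.g. Chinese) text non-negative.
    for (unsigned char c : s)
        h = (h * weight + c) % mod;
    return static_cast<int>(h);
}

// Represent a string by the pair (h1, h2). Comparing two such pairs costs
// at most two integer comparisons, regardless of the string length.
std::pair<int, int> double_hash(const std::string& s) {
    return std::make_pair(poly_hash(s, 19997), poly_hash(s, 30001));
}
```

Computing the pair still reads every character once, so the saving comes from the many later comparisons, each of which no longer touches the string.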

Next, which container should store the words? There are generally two options (p1 and p2 are the primes that may be used as the hash moduli):

The first is a two-dimensional array in which the first dimension is indexed by h1 and the second by h2. To save space, the second dimension is stored as a vector, so insertion and lookup take O(log(p2)) time.
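One way to read this first option is sketched below: one row per h1 value, each row a vector of (h2, count) pairs kept sorted by h2 and searched with binary search. The sorted-row layout and the names P1, table, add_word and count_word are assumptions for illustration; the article does not show this variant.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

const int P1 = 19997;  // assumed range of h1 (the first modulus)

// table[h1] holds (h2, count) pairs, kept sorted by h2.
std::vector<std::vector<std::pair<int, int> > > table(P1);

// Record one occurrence of the word with hashes (h1, h2); returns its new count.
int add_word(int h1, int h2) {
    std::vector<std::pair<int, int> >& row = table[h1];
    // Counts are always >= 1, so (h2, 0) sorts just before any entry for h2.
    std::vector<std::pair<int, int> >::iterator it =
        std::lower_bound(row.begin(), row.end(), std::make_pair(h2, 0));
    if (it == row.end() || it->first != h2)
        it = row.insert(it, std::make_pair(h2, 0));
    return ++it->second;
}

// Look up the count for (h1, h2); 0 if the word has not been seen.
int count_word(int h1, int h2) {
    const std::vector<std::pair<int, int> >& row = table[h1];
    std::vector<std::pair<int, int> >::const_iterator it =
        std::lower_bound(row.begin(), row.end(), std::make_pair(h2, 0));
    return (it != row.end() && it->first == h2) ? it->second : 0;
}
```

The binary search is logarithmic in the row size, but inserting into the middle of a vector also shifts the elements after it, which is part of the constant-factor cost of this layout.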

The second is to put everything straight into a map, where insertion and lookup take O(log(cnt)) time (cnt is the number of distinct words).

So I went with the second option, because it is simple to implement and the complexity is basically the same (the vector version has a large constant factor).
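To show how the map option and the double hash can fit together, here is a hedged sketch in which the map key is ordered by (h1, h2) instead of by the string itself. HashedWord and count_words are hypothetical names; the base and moduli are taken from the article's listing, and bytes are read as unsigned char for simplicity.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <tuple>

// Hypothetical key type: the word plus its two hashes.
struct HashedWord {
    std::string text;
    int h1, h2;

    explicit HashedWord(const std::string& s) : text(s), h1(0), h2(0) {
        for (unsigned char c : s) {
            h1 = (h1 * 3001 + c) % 19997;
            h2 = (h2 * 3001 + c) % 30001;
        }
    }

    // Order by (h1, h2) only, so each comparison inside the map is O(1).
    bool operator<(const HashedWord& other) const {
        return std::tie(h1, h2) < std::tie(other.h1, other.h2);
    }
};

// Count whitespace-separated words in input.
std::map<HashedWord, int> count_words(const std::string& input) {
    std::map<HashedWord, int> counts;
    std::istringstream in(input);
    std::string w;
    while (in >> w)
        ++counts[HashedWord(w)];
    return counts;
}
```

Because the key keeps the original text, the words can still be printed afterwards, although they come out in hash order and two different words that collide on both hashes would be counted as one.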

#include <cstdio>
#include <iostream>
#include <string>
#include <map>

using namespace std;

const int mod1 = 19997;   // modulus for the first hash
const int mod2 = 30001;   // modulus for the second hash
const int bin = 1 << 9;   // offset added to negative char values
const int weight = 3001;  // base of the polynomial hash

struct Word {
    string st;
    int h1, h2;

    // Order words by (h1, h2), so one comparison costs O(1).
    bool operator<(const Word &x) const {
        return h1 == x.h1 ? h2 < x.h2 : h1 < x.h1;
    }

    // Compute both hashes of st.
    void calc_hash() {
        int len = st.length(), tmp, i;
        for (i = tmp = 0; i < len; ++i) {
            int c = (int) st[i];
            tmp = (tmp * weight + (c < 0 ? c + bin : c)) % mod1;
        }
        h1 = tmp;
        for (i = tmp = 0; i < len; ++i) {
            int c = (int) st[i];
            tmp = (tmp * weight + (c < 0 ? c + bin : c)) % mod2;
        }
        h2 = tmp;
    }
};

typedef map<string, int> map_for_words;
typedef map_for_words::iterator iter_for_words;

map_for_words word_count;  // word -> number of occurrences
Word w;

int main() {
    freopen("test.in", "r", stdin);
    ios::sync_with_stdio(false);
    while (cin >> w.st) {
        w.calc_hash();
        // The map is keyed on the string itself; the hashes are computed
        // but are not used as the key here.
        word_count[w.st] += 1;
    }
    iter_for_words it;
    for (it = word_count.begin(); it != word_count.end(); ++it)
        cout << it->first << ' ' << it->second << endl;
    return 0;
}

Result:

Input:

Output:

(Don't ask why the interface looks so odd; that is just how the terminal displays it.)
