Releasing my inverted index - C/C++ - chinaunix.net

Redor

#1, posted at 16:01:15

http://libibase.googlecode.com/

Main features:
Parse HTML
Chinese word segmentation (reverse maximum matching, implemented with a trie)
Generate a forward document (in a format I defined myself)
Generate inverted indexes (block storage, byte-code compression algorithm; the body and snapshot are compressed with zlib)
Submit a query string to search (only the vector space model is implemented; dynamic summaries are not finished yet)
Currently there is only one command-line test tool, hibase.
The package comes with a 10 million Chinese dictionary (gzip format, in the doc directory; it must be decompressed before use)
For more information, see the README.
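
To illustrate the segmentation feature listed above, here is a minimal Python sketch of reverse maximum matching over a trie. It is not taken from libibase: the names `TrieNode`, `build_reversed_trie` and `segment_reverse_max` and the four-word dictionary are made up for this example. The trie stores each word reversed, so the scan can walk the text from right to left and keep the longest dictionary word that ends at the current position.

```python
class TrieNode:
    """One node of the trie; children are keyed by a single character."""
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}
        self.is_word = False


def build_reversed_trie(words):
    """Insert every word back to front so matching can scan right to left."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in reversed(word):
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def segment_reverse_max(text, root):
    """Split text by taking, from the end, the longest dictionary word each time."""
    tokens = []
    end = len(text)
    while end > 0:
        node = root
        best_start = end - 1          # fallback: emit one character
        i = end - 1
        while i >= 0 and text[i] in node.children:
            node = node.children[text[i]]
            if node.is_word:
                best_start = i        # longest match seen so far
            i -= 1
        tokens.append(text[best_start:end])
        end = best_start
    tokens.reverse()                  # tokens were collected right to left
    return tokens


if __name__ == "__main__":
    trie = build_reversed_trie(["研究", "研究生", "生命", "起源"])
    print(segment_reverse_max("研究生命起源", trie))
    # → ['研究', '生命', '起源']
```

Scanning from the right picks 研究 / 生命 / 起源 here, whereas a left-to-right maximum match would start with 研究生 and leave 命 stranded; that kind of ambiguity is the usual argument for matching in reverse on Chinese text.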

The next step is testing and optimization. Because I used a lot of macros while writing it, compilation is still a little slow...
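
The post does not say which byte-code compression the inverted index uses. As an assumption, the sketch below shows one common byte-aligned scheme, variable-byte coding of doc-ID gaps; the function names and the choice of continuation bit are illustrative and may differ from what libibase actually does.

```python
def vbyte_encode(numbers):
    """Encode non-negative ints, 7 bits per byte, high bit set = more bytes follow."""
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("variable-byte coding needs non-negative integers")
        while n >= 0x80:
            out.append((n & 0x7F) | 0x80)   # low 7 bits plus continuation flag
            n >>= 7
        out.append(n)                       # final byte, high bit clear
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers = []
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7                      # more bytes belong to this number
        else:
            numbers.append(value)
            value = 0
            shift = 0
    return numbers


def encode_postings(doc_ids):
    """Store a sorted doc-ID list as gaps, which are small and compress well."""
    gaps = []
    prev = 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)          # unsorted input yields a negative gap
        prev = doc_id
    return vbyte_encode(gaps)


def decode_postings(data):
    """Rebuild the doc-ID list by summing the decoded gaps."""
    doc_ids = []
    total = 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids


if __name__ == "__main__":
    blob = encode_postings([3, 7, 300, 100000])
    print(len(blob), decode_postings(blob))
    # → 7 [3, 7, 300, 100000]
```

Four doc IDs fit in 7 bytes instead of 16 as fixed 32-bit integers; the saving grows as posting lists get denser and the gaps get smaller.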

If you would like to study this together, add me on MSN/Gtalk: sounos@gmail.com

By the way, here is an example run:
I used wget to download the chinaunix homepage into the /data/html directory. /data/dict is my dictionary directory.

    ./hibase --basedir=/tmp --dict=/data/dict/dict.txt --add --doc=/data/html/index.html --url=http://www.chinaunix.net/ --date="Thu, 03 Jul 2008 10:12:18 GMT" --charset="GBK" --query --request="chinaunix" --topn=1000
    parsing document [http://www.chinaunix.net/] Time used: 16825 microseconds
    adding document [http://www.chinaunix.net/] Time used: 47955 microseconds
    parse query time used: 36
    Read hits [1] posting time used: 1897
    caculated 1 Documents time used: 22
    read 1 Documents content time used: 1404
    (0) title [chinaunix.net = the world's largest Linux/Unix application and developer community = IT people's online home]
    summary [(null)]
    URL [http://www.chinaunix.net/]
    size [84892] date [Thu, 03 Jul 2008 10:12:18 GMT]
    Search [chinaunix] Time used: 3502
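
The post only says that queries are ranked with a vector space model; hibase's actual weighting is not described. As a rough illustration of what such a scoring step can look like, here is a small Python sketch that uses TF-IDF weights and cosine similarity. The weighting formula and the names `build_index` and `search` are assumptions made for this example, not the library's API.

```python
import math
from collections import Counter


def build_index(docs):
    """docs maps doc_id -> list of tokens. Returns (postings, idf, norms)."""
    postings = {}                                   # term -> {doc_id: term frequency}
    for doc_id, tokens in docs.items():
        for term, tf in Counter(tokens).items():
            postings.setdefault(term, {})[doc_id] = tf
    n_docs = len(docs)
    # Assumed IDF variant; log(1 + N/df) is always positive.
    idf = {t: math.log(1 + n_docs / len(p)) for t, p in postings.items()}
    squared = {}
    for term, plist in postings.items():
        for doc_id, tf in plist.items():
            weight = tf * idf[term]
            squared[doc_id] = squared.get(doc_id, 0.0) + weight * weight
    norms = {doc_id: math.sqrt(v) for doc_id, v in squared.items()}
    return postings, idf, norms


def search(query_tokens, postings, idf, norms, topn=10):
    """Rank documents by cosine similarity between query and document vectors."""
    scores = {}
    query_sq = 0.0
    for term, qtf in Counter(query_tokens).items():
        if term not in postings:
            continue                                # unknown terms contribute nothing
        q_weight = qtf * idf[term]
        query_sq += q_weight * q_weight
        for doc_id, tf in postings[term].items():
            scores[doc_id] = scores.get(doc_id, 0.0) + q_weight * tf * idf[term]
    if not scores:
        return []
    q_norm = math.sqrt(query_sq)
    ranked = [(doc_id, s / (q_norm * norms[doc_id])) for doc_id, s in scores.items()]
    ranked.sort(key=lambda item: (-item[1], item[0]))  # best score first
    return ranked[:topn]


if __name__ == "__main__":
    docs = {
        1: ["linux", "unix", "community"],
        2: ["linux", "kernel"],
        3: ["windows"],
    }
    postings, idf, norms = build_index(docs)
    for doc_id, score in search(["linux", "kernel"], postings, idf, norms, topn=2):
        print(doc_id, round(score, 3))
```

Only documents that appear in the posting list of at least one query term are scored, which mirrors the read-postings-then-calculate order visible in the sample output; `topn` plays the same role as the `--topn` option in the command line.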
