Releasing my inverted index - C/C++ - chinaunix.net
Author: Redor
Posted on 16:01:15
http://libibase.googlecode.com/

Main functions:
1. Parse HTML
2. Chinese word segmentation (reverse maximum matching, implemented with a trie)
3. Generate the forward index (in a format I defined myself)
4. Generate the inverted index (block storage, byte-code compression algorithm; the body and snapshot are compressed with zlib)
5. Search by query string (only the vector space model is implemented so far; dynamic summaries are not finished yet)

Currently there is only one command-line test tool, hibase. The package comes with a 10 million Chinese dictionary (gzip format, in the doc directory; decompress it before use). For more information, see the readme.

The next step is testing and optimization. Because I used a lot of macros when writing it, compilation is still a bit slow...

If you would like to study this together, add me on MSN/Gtalk: sounos@gmail.com

By the way, here is an example run. I used wget to download the chinaunix homepage into the /data/html directory; /data/dict is my dictionary directory.
./hibase --basedir=/tmp --dict=/data/dict/dict.txt --add --doc=/data/html/index.html --url=http://www.chinaunix.net/ --date="Thu, 03 Jul 2008 10:12:18 GMT" --charset="GBK" --query --request="chinaunix" --topn=1000
parsing document [http://www.chinaunix.net/] Time used: 16825 microseconds
adding document [http://www.chinaunix.net/] Time used: 47955 microseconds
parse query time used: 36
Read hits [1] posting time used: 1897
caculated 1 Documents time used: 22
read 1 Documents content time used: 1404
(0) title [chinaunix.net = the world's largest Linux/Unix application and developer community = IT people's online home]
summary [(null)]
URL [http://www.chinaunix.net/]
size [84892] date [Thu, 03 Jul 2008 10:12:18 GMT]

Search [chinaunix] Time used: 3502
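
For readers who have not seen reverse maximum matching on a trie, here is a minimal sketch of the idea. It is not taken from libibase: the names `TrieNode`, `trie_add_reversed` and `segment_rmm` are made up for illustration, and it matches byte by byte, so it only behaves sensibly on single-byte text. A real GBK segmenter would have to step over whole characters. The trick is to store every dictionary word back to front, so the trie can be walked while scanning the text from right to left.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One node per byte. Words are stored reversed (last byte first) so the
 * trie can be followed while scanning the text right to left. */
typedef struct TrieNode {
    struct TrieNode *next[256];
    int is_word;                 /* 1 if a dictionary word ends at this node */
} TrieNode;

/* Sketch only: allocation failure is not handled and nodes are never freed. */
static TrieNode *trie_new(void)
{
    return calloc(1, sizeof(TrieNode));
}

/* Insert `word` back to front. */
static void trie_add_reversed(TrieNode *root, const char *word)
{
    TrieNode *n = root;
    size_t i;
    assert(root != NULL && word != NULL);
    i = strlen(word);
    while (i > 0) {
        unsigned char c = (unsigned char)word[--i];
        if (!n->next[c])
            n->next[c] = trie_new();
        n = n->next[c];
    }
    n->is_word = 1;
}

/* Length of the longest dictionary word that ends at text[end - 1],
 * or 0 if no dictionary word ends there. */
static size_t longest_suffix_word(const TrieNode *root, const char *text,
                                  size_t end)
{
    const TrieNode *n = root;
    size_t best = 0, len = 0;
    while (len < end) {
        n = n->next[(unsigned char)text[end - 1 - len]];
        if (!n)
            break;
        len++;
        if (n->is_word)
            best = len;
    }
    return best;
}

/* Segment `text` from right to left, always taking the longest match.
 * Stores the start offset of each token in starts[] in reading order and
 * returns the token count; starts[] needs room for strlen(text) entries. */
static size_t segment_rmm(const TrieNode *root, const char *text,
                          size_t *starts)
{
    size_t end = strlen(text), count = 0, i;
    while (end > 0) {
        size_t len = longest_suffix_word(root, text, end);
        if (len == 0)
            len = 1;             /* unknown byte becomes a one-byte token */
        end -= len;
        starts[count++] = end;
    }
    /* Tokens were collected back to front; flip them into reading order. */
    for (i = 0; i < count / 2; i++) {
        size_t tmp = starts[i];
        starts[i] = starts[count - 1 - i];
        starts[count - 1 - i] = tmp;
    }
    return count;
}
```

With the toy dictionary "ab", "abc", "cd", the text "abcd" comes out as "ab" + "cd" (offsets 0 and 2), whereas forward maximum matching would take "abc" first and leave "d" on its own. That difference is the usual argument for scanning backwards.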
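
The post does not say which byte-code compression scheme the inverted index uses. A common choice for posting lists is variable-byte coding of document-ID gaps, so the sketch below shows that technique purely as an assumption; the function names are hypothetical and none of this is the libibase on-disk format. Each value is written 7 bits at a time, low bits first, with the high bit of a byte meaning "more bytes follow".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write v as 7-bit groups, low bits first; the high bit of a byte means
 * another byte follows. Returns the number of bytes written (1 to 5). */
static size_t vbyte_put(uint32_t v, uint8_t *out)
{
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (uint8_t)(v | 0x80);
        v >>= 7;
    }
    out[n++] = (uint8_t)v;
    return n;
}

/* Read one value written by vbyte_put. Returns the bytes consumed.
 * Assumes well-formed input; there is no bounds or overflow checking. */
static size_t vbyte_get(const uint8_t *in, uint32_t *v)
{
    uint32_t result = 0;
    size_t n = 0;
    int shift = 0;
    while (in[n] & 0x80) {
        result |= (uint32_t)(in[n++] & 0x7F) << shift;
        shift += 7;
    }
    result |= (uint32_t)in[n++] << shift;
    *v = result;
    return n;
}

/* Encode strictly increasing document IDs as gaps from the previous ID.
 * out must have room for 5 bytes per ID. Returns the bytes written. */
static size_t postings_encode(const uint32_t *ids, size_t count, uint8_t *out)
{
    size_t i, n = 0;
    uint32_t prev = 0;
    for (i = 0; i < count; i++) {
        assert(i == 0 || ids[i] > prev);
        n += vbyte_put(ids[i] - prev, out + n);
        prev = ids[i];
    }
    return n;
}

/* Decode `count` IDs by summing the gaps back up. Returns bytes consumed. */
static size_t postings_decode(const uint8_t *in, size_t count, uint32_t *ids)
{
    size_t i, n = 0;
    uint32_t prev = 0, gap;
    for (i = 0; i < count; i++) {
        n += vbyte_get(in + n, &gap);
        prev += gap;
        ids[i] = prev;
    }
    return n;
}
```

For the IDs 3, 130, 131, 70000 the gaps are 3, 127, 1, 69869, which take 1 + 1 + 1 + 3 = 6 bytes instead of 16 bytes as raw 32-bit integers. Dense posting lists have small gaps, which is where this kind of coding pays off.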