Open Source: Yunxunmi (Cloud Search) Chinese Word Segmentation, Segmenting 4.4 Million and 0.1 Million Words and Counting Word Frequencies in Under 1 Second

Source: Internet
Author: User

Chinese word segmentation means splitting a sequence of Chinese characters into individual words, that is, recombining a continuous character sequence into a word sequence according to certain rules. Chinese word segmentation is also a core technology of search engines, underlying full-text indexing, retrieval, ranking-weight calculation, text classification, and clustering. The accuracy and efficiency of segmentation therefore directly determine the quality of a search engine. Currently, many search engines, including Google, cannot recognize dialect words from different parts of China, which leads directly to mixed and inaccurate search results. For these reasons, I worked hard to develop Yunxunmi Chinese Word Segmentation and release it as open source, in the hope that more like-minded people will benefit from it.
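
The splitting described above can be illustrated with forward maximum matching, a common dictionary-based technique. This is a minimal sketch only: the article does not say which algorithm Yunxunmi uses, and the function name `fmm_segment` and the tiny example lexicon are hypothetical.

```python
def fmm_segment(text: str, lexicon: set[str], max_len: int) -> list[str]:
    """Split text into words by forward maximum matching (hypothetical sketch).

    At each position, take the longest substring (up to max_len characters)
    found in the lexicon; if none matches, emit a single character.
    """
    max_len = max(max_len, 1)  # guard against a zero/negative window
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Tiny illustrative lexicon; the real lexicons hold millions of entries.
example_lexicon = {"中文", "分词", "中文分词", "技术", "搜索引擎"}
print(fmm_segment("中文分词技术", example_lexicon, max_len=4))
```

Greedy longest matching is fast, but it can mis-split ambiguous strings, so practical segmenters usually add some form of ambiguity resolution on top of it.
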
Yunxunmi Chinese Word Segmentation comes with five lexicons:
ciku860000.dat contains about 0.86 million common Chinese words
ciku1248500.dat contains about 1.2 million common Chinese words, including popular Internet terms
ciku1408964.dat contains about 1.4 million common Chinese words, including popular Internet terms, names of famous Chinese and foreign people from ancient and modern times, and common place names
ciku2786019.dat contains about 2.7 million common Chinese words, including popular Internet terms, names of famous Chinese and foreign people from ancient and modern times, place names from all over China, and dialect words
ciku4000000.dat contains about 4 million common Chinese words, including popular Internet terms, names of famous Chinese and foreign people from ancient and modern times, place names from all over China, dialect words, music, films, specialized terms from various industries, Tang poems, Song ci, Yuan qu, the Four Books, the Three Character Classic, astronomy, geography, military affairs, women's topics, classical Chinese, and more

When the program starts, it loads the lexicon, which takes a dozen or so seconds; segmentation is available once loading completes. By default the program uses ciku2786019.dat, which sits in the same directory as the Yunxunmi Chinese Word Segmentation executable. To try the segmentation results of another lexicon, replace the ciku .dat lexicon file in the executable's directory and restart the program.
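
The startup step above (load a lexicon file from the program's own directory, with ciku2786019.dat as the default) might look like the following minimal sketch. It assumes the .dat file is UTF-8 text with one word per line; the article does not document the real Yunxunmi file format, and the function name `load_lexicon` is hypothetical.

```python
from pathlib import Path

# Default lexicon file name, as stated in the article.
DEFAULT_LEXICON = "ciku2786019.dat"


def load_lexicon(program_dir: Path, name: str = DEFAULT_LEXICON) -> tuple[set[str], int]:
    """Load a lexicon stored next to the program (hypothetical sketch).

    ASSUMPTION: the .dat file is UTF-8 text with one word per line;
    the real Yunxunmi .dat format is not documented in the article.
    Returns the word set and the length of the longest word, which can
    bound a longest-match search.
    """
    words: set[str] = set()
    max_len = 0
    with (program_dir / name).open(encoding="utf-8") as f:
        for line in f:
            word = line.strip()
            if word:  # skip blank lines
                words.add(word)
                max_len = max(max_len, len(word))
    return words, max_len
```

A program could call `load_lexicon(Path(__file__).resolve().parent)` once at startup. Because the whole lexicon then lives in memory, switching to another lexicon means replacing the file and restarting, as described above.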

Developer of Yunxunmi Chinese Word Segmentation: Tom Baoxing, Hangzhou, April 9, 2013. For questions or suggestions, please contact QQ group: 204725117
Yunxunmi Chinese Word Segmentation is fully open source and may be used without any restrictions!

Segmenting the text of this article produces the following word-frequency list (each word is followed by its count):
Chinese word segmentation, 11, vocabulary, 10, cloud search, 8, Chinese, 7, dat, 7, commonly used, 6, including, 6, total, 6, about, 5, 10 thousand, 5, Dictionary, 4, popular, 4, Internet, 4, dialect, 3, ancient and modern Chinese and foreign, 3, celebrity, 3, And, 2, open source, 2, 2, all over the country, 2, ciku2786019, 2, sequence, 2, search engine, 2, program, 2, re, 2, run, 2, classical Chinese, 1, replaceable, 1, word, 1, separate, 1, ciku4000000, 1, 120, 1, word order, 1, so, 1, directory, 1, execution program, 1, reason, 1, one by one, 1, will direct, 1, text, 1, executable program, 1, in the same, 1, there is any, 1, according to a certain, 1, music, 1, used, 1, accuracy, 1, other, 1, astronomy, 1, myself, 1, and, 1, ciku1248500, 1, Tang poetry, 1, current, 1, Chinese character, 1, also, 1, qqgroup, 1, good or bad, 1, Will, 1, load, 1, decision, 1, refers to, 1, female, 1, more, 1, several seconds, 1, various industries, 1, column, 1, combined into, 1, that is, 1, region, 1, 140, 1, process, 1, Rescue star, 1, unidentifiable, 1, almost, 1, waiting, 1, respectively, 1, location, 1, word, 1, word, 1, word, 1, core:, 1, search result, 1, Four Books, 1, many, 1, or suggestion, 1, engine, 1, Professional, 1, positive, 1, year, 1, full-text index, 1, clustering, 1, complete, 1, can, 1, above, 1, developed, 1, ciku1408964, 1, and, 1, word, 1, question, 1, in the directory, 1, yunxunmi, 1, developer, 1, and, 1, yes, 1, weight, 1, contact, 1, one, 1, about required, 1, Yuanqu, 1, segmentation, 1, 400, 1, ciku860000, 1, Xiaobai, 1, later, 1, Google, 1, Hope, 1, file, 1, ciku, 1, word Segmentation, 1, start, 1, each, 1, Chinese, 1, at, 1, geography, 1, experience, 1, if needed, 1,, 1, effect, 1, on, 1, movie, 1, benefit from, 1, search, 1, cut into, 1, use, 1, full-blown, 1, default, 1, 270, 1, sort, 1, Search, 1, can be arbitrary, 1, without any restrictions, 1, 10, 1, insight, 1, Hangzhou, 1, specification, 1, etc., 1, start the program, 1, Military, 1, Three Character Classic, 1, is based on, 1, for China, 1, hard work, 1, efficiency, 1, computing, 1, classification, 1, directly resulting in, 1, technology, 1, can, 1, In, 1, number, 1, continuous, 1, 5, 1, will be one, 1
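
A list in the "word, count" form shown above can be produced by counting the segmented words and sorting them by descending frequency. This minimal sketch uses Python's `collections.Counter` and assumes segmentation has already produced a list of words; the function name `format_frequencies` is hypothetical and not part of Yunxunmi.

```python
from collections import Counter


def format_frequencies(words: list[str]) -> str:
    """Count words and render them as 'word, count, word, count, ...'.

    Words are ordered by descending count; ties keep first-seen order
    because Counter preserves insertion order and most_common() sorts stably.
    Whitespace-only tokens are skipped.
    """
    counts = Counter(w for w in words if w.strip())
    return ", ".join(f"{word}, {count}" for word, count in counts.most_common())


print(format_frequencies(["中文分词", "云寻觅", "中文分词", " ", "词库"]))
```

Counting is a single linear pass over the tokens, so on this side of the pipeline the cost is dominated by segmentation itself rather than by frequency computation.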

[Screenshot: Yunxunmi Chinese Word Segmentation in operation]

Downloads:
Yunxunmi Chinese word lexicons (.rar)
Yunxunmi Chinese Word Segmentation source code (.rar)
Yunxunmi Chinese Word Segmentation executable (.rar)
