Original: http://xiaoxia.org/2011/12/18/map-reduce-program-of-rmm-word-count-on-hadoop/
Running a MapReduce program based on the RMM Chinese word segmentation algorithm on Hadoop
I know the title of this article sounds very "academic" and pretentious, as if it came from an expert or a heavyweight paper. In fact, it is just an ordinary experiment report, and this article does not study the
<?php
// RMM word segmentation algorithm
class SplitWord {
    var $TagDic = Array();
    var $RankDic = Array();
    var $SourceStr = '';
    var $ResultStr = '';
    var $SplitChar = ''; // separator
    var $SplitLen = 4; // reserved word length
    var $MaxLen = 7; // maximum text length in the dictionary; the value here is in bytes
<?php
// RMM word segmentation algorithm
class SplitWord {
    var $TagDic = Array();
    var $RankDic = Array();
    var $SourceStr = '';
    var $ResultStr = '';
    var $SplitChar = ''; // delim
This program is a simple Chinese word segmenter based on the RMM Chinese word segmentation approach. It still has many flaws, and pointers from experts are welcome. This version also tidies up the garbled-character problem.
/**
* Based on RMM Chinese word segmentation (inverse matching method)
* @author Tangpan
* @date 2013-10-12
* @version 1.0.0
**/
Class Splitwor
RMM Segmentation Algorithm Class
RMM segmentation algorithm
class SplitWord {
    var $TagDic = Array();
    var $RankDic = Array();
    var $SourceStr = '';
    var $ResultStr = '';
    var $SplitChar = ''; // separator
    var $SplitLen = 4; // reserved word length
    var $MaxLen = 7; // maximum length of Chinese text in the dictionary; the value is the largest index of a byte array
    var $MinLen = 3;
:
[root@xuexi ~]# which mv
alias mv='mv -i'
        /bin/mv
If an alias has the same name as the original command (for example, alias ls='ls -l') and you want to explicitly run the original command, you can delete the alias, use the absolute path, or use an escape character to restore the command.
The alias command only defines a temporary alias. To define an alias that persists, write the alias definition statement to /etc/profile, ~/.bash_profile, or ~/.bashrc.
that the definition of link rels is the registry of link relations. When I wrote this article, it was also limited by the current progress of Atom, and Atom is currently the leader in Level 3 REST services.
The Meaning of the Levels
I should emphasize that the Richardson Maturity Model (RMM) is a good way to think about what elements are in REST, but it does not directly define levels in REST. Roy Fielding also clarified this point:
the font file to the compilation options
Modify the frameworks/base/data/fonts/Android.mk file:
copy_from := \
DroidSansMono.ttf \
my.ttf \
......
3.3 Modify the external/skia/src/ports/SkFontHost_android.cpp file
For example, add my.ttf to gFBNames:
4. Add framework string
Add a new values-my-rMM folder under frameworks/base/core/res/, create a strings.xml file, and put the translated framework strings in this file:
5. Add app string
Tra
.
If the match fails, the last character of the matching field is removed, and the remaining string is used as the new matching field for another matching attempt. This process repeats until the whole text has been split into words.
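The forward matching loop described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `fmm_segment`, the `max_len` parameter, and the tiny sample dictionary are hypothetical and do not come from the original article, and lengths are counted in Python characters rather than bytes.

```python
def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching sketch: scan left to right, longest match first.

    `dictionary` is any container supporting `in` (a set is assumed here);
    `max_len` is an assumed upper bound on word length, in characters.
    """
    words = []
    i = 0
    while i < len(text):
        # Take up to max_len characters starting at i as the matching field.
        end = min(i + max_len, len(text))
        # On a failed match, drop the LAST character and retry, until the
        # field is in the dictionary or only one character is left.
        while end - i > 1 and text[i:end] not in dictionary:
            end -= 1
        words.append(text[i:end])
        i = end
    return words


# Hypothetical sample dictionary, for illustration only.
sample_dict = {"研究", "研究生", "生命", "起源"}
print(fmm_segment("研究生命起源", sample_dict))
```

With this sample dictionary the forward scan greedily takes 研究生 first, which leaves 命 stranded as a single character. Cases like this are the usual motivation for also trying the reverse direction.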
1.2 Reverse maximum matching algorithm (RMM)
This algorithm reverses the idea of forward maximum matching. If the match fails, the first character of the matching field is removed. The experiment shows that the reverse maximum matching algorithm is
RMM. The basic principle of the RMM method is the same as that of the MM method. The difference is that the direction of segmentation is opposite to that of the MM method, and the word segmentation dictionary used is also different. The reverse maximum matching method starts scanning from the end of the document being processed. Each time, the 2i characters at the end (a string of i Chinese characters) are taken as the matching field
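A minimal Python sketch of reverse maximum matching, under the same caveats: `rmm_segment`, `max_len`, and the sample dictionary are hypothetical names chosen for illustration, and the window is measured in Python characters rather than in bytes as in the descriptions above.

```python
def rmm_segment(text, dictionary, max_len=4):
    """Reverse maximum matching sketch: scan from the end of the text.

    `dictionary` is assumed to be a set of known words; `max_len` is an
    assumed maximum word length in characters.
    """
    words = []
    end = len(text)
    while end > 0:
        # Take up to max_len characters ending at `end` as the matching field.
        start = max(end - max_len, 0)
        # On a failed match, drop the FIRST character and retry, until the
        # field is in the dictionary or only one character is left.
        while end - start > 1 and text[start:end] not in dictionary:
            start += 1
        words.append(text[start:end])
        end = start
    # Words were collected from the end, so restore reading order.
    words.reverse()
    return words


# Hypothetical sample dictionary, for illustration only.
sample_dict = {"研究", "研究生", "生命", "起源"}
print(rmm_segment("研究生命起源", sample_dict))
```

For this input the reverse scan yields 研究 / 生命 / 起源, because matching from the end fixes 起源 and 生命 before the scan reaches the ambiguous prefix.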
(RMM). The segmentation process is the same as in the MM method; the difference is that processing starts from the end of the sentence (or article), and each time a match fails, the first character is removed. Statistical results show that the error rate of this method is 1/245.
Word-wise traversal method. The words in the dictionary are searched for, one by one, throughout the material to be processed, in order from longest to shortest, un
The thesaurus contains, for example:
Zhang San
Li Si
Wang Wu
The article is: Zhang San stole Wang Wu's hammer and smashed Li Si's head with a hammer.
The result to extract is then: Zhang San, Wang Wu, Li Si
There could be more than 1000 words in the thesaurus.
Does anyone know which third-party tools could make the performance acceptable?
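For a thesaurus of around a thousand entries, one possible approach that needs no third-party tool is a single compiled regular expression, sketched below in Python. This is an assumption-laden illustration rather than a recommended solution: the function name `extract_terms` is hypothetical, and the sample data simply reuses the names Zhang San, Li Si, and Wang Wu. For much larger word lists, a multi-pattern matcher such as an Aho-Corasick automaton is the usual alternative.

```python
import re


def extract_terms(article, terms):
    """Return the thesaurus terms found in `article`, in order of first appearance."""
    if not terms:
        return []
    # Try longer terms first so a long term wins over a shorter one
    # that happens to be its prefix.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    # dict.fromkeys keeps insertion order and drops repeated hits.
    return list(dict.fromkeys(m.group(0) for m in pattern.finditer(article)))


thesaurus = ["Zhang San", "Li Si", "Wang Wu"]
article = "Zhang San stole Wang Wu's hammer and smashed Li Si's head with a hammer."
print(extract_terms(article, thesaurus))
```

The pattern is compiled once per thesaurus, so scanning each article is a single pass over the text.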
Reply content:
The word library contains, for example: Zhang San, Li Si, Wang Wu. The article is: Zhang San stole Wang Wu's hammer and smashed Li Si's head with a hammer. The result to extract is then: Zhang San, Wang Wu, Li Si. There may be more than one thousand words in the word library. I don't know which third-party tools could make the performance acceptable. The word library contains, for example:
Zhang San
Li Si
Wang Wu
The article is: Zhang San stole Wang Wu's hammer and hit Li Si's head with a hammer.
Then the extracted