Original: http://xiaoxia.org/2011/12/18/map-reduce-program-of-rmm-word-count-on-hadoop/
Running a MapReduce Program Based on the RMM Chinese Word Segmentation Algorithm on Hadoop
I know the title of this article sounds very "academic" and rather clichéd, as if it came from a hotshot or were a pretentious paper! In fact, it is just an ordinary experiment report, and this article does not study the
<?php
// RMM word segmentation algorithm
class SplitWord {
    var $TagDic = array();
    var $RankDic = array();
    var $SourceStr = '';
    var $ResultStr = '';
    var $SplitChar = ' ';  // separator
    var $SplitLen = 4;     // reserved word length
    var $MaxLen = 7;       // maximum length of a dictionary word; the value is in bytes
This program performs simple Chinese word segmentation based on the RMM idea. It still has many flaws, and I welcome pointers from experts. The garbled-character problem has since been fixed.
/**
 * Chinese word segmentation based on RMM (reverse maximum matching)
 * @author Tangpan
 * @date 2013-10-12
 * @version 1.0.0
 **/
// RMM segmentation algorithm class
class SplitWord {
    var $TagDic = array();
    var $RankDic = array();
    var $SourceStr = '';
    var $ResultStr = '';
    var $SplitChar = ' ';  // separator
    var $SplitLen = 4;     // reserved word length
    var $MaxLen = 7;       // maximum Chinese word length in the dictionary, given as the largest index of the byte array
    var $MinLen = 3;
the original command (for example, alias ls='ls -l'). If you want to explicitly run the original command, you can delete the alias, call the command by its absolute path, or prefix it with an escape character (a backslash, as in \ls) to bypass the alias.
The alias command only defines a temporary alias. To define an alias that persists, write the alias definition to /etc/profile, ~/.bash_profile, or ~/.bashrc. The first applies to all users; the last two apply only to the corresponding user. After modi
also limited by the current progress of Atom, and Atom is currently the leader among Level 3 REST services.
The Meaning of the Levels
I should emphasize that the Richardson Maturity Model (RMM), while a good way to think about the elements of REST, does not itself define levels of REST. Roy Fielding has also made it clear that Level 3 of the RMM is a precondition of REST. Like many of
(encoding)
index = len(sent)
words = []
while index > 0:
    for i in range(dictmaxlength, 0, -1):
        j = index - i
        if j < 0:
            continue  # candidate would run past the start of the sentence
        sub = sent[j:index]
        if len(sub) > 1:
            # Multi-character candidate: accept it only if it is in the dictionary.
            if sub.encode(encoding) in dctdict:
                words.append(sub.encode(encoding))
                index = index - i
                break
        else:
            # Single character: always accept it, so the scan keeps moving left.
            words.append(sub.encode(encoding))
            index = index - i
            break
words.reverse()
return "".join(words)
'''The fewer non-dictionary words, the fewer single-character words, and the fewer total words, the better'''
def segmenter(sent):
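The comment above states a selection heuristic for bidirectional matching: when forward and reverse results differ, prefer the one with fewer non-dictionary words, fewer single-character words, and fewer words overall. Below is a minimal Python 3 sketch, assuming both segmentations are already lists of Unicode strings and the dictionary is a set. `score` and `choose_segmentation` are hypothetical names, and two points are assumptions rather than anything the fragment specifies: the three criteria are ranked in the order listed, and a full tie goes to the reverse (RMM) result.

```python
def score(words, dictionary):
    """Lower is better: (non-dictionary words, single-character words, total words)."""
    non_dict = sum(1 for w in words if w not in dictionary)
    singles = sum(1 for w in words if len(w) == 1)
    return (non_dict, singles, len(words))


def choose_segmentation(forward, reverse, dictionary):
    """Pick the better of two segmentations; a full tie goes to the reverse (RMM) result."""
    if score(forward, dictionary) < score(reverse, dictionary):
        return forward
    return reverse
```

Comparing the score tuples lexicographically applies the criteria one after another, so the total word count only matters when the first two counts are equal.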
-android.app File
For example, add my.ttf to gfbnames:
4. Add the framework strings
Add a new values-my-rMM folder under frameworks/base/core/res/, create a strings.xml file in it, and put the translated framework strings in this file.
5. Add the app strings
Translate each app: create a values-my-rMM folder under each app's res directory and put the translated strings.xml in it.
6. Rebuild the entire pro
used as the new matching field and matched again. Repeat this process until the whole string has been segmented.
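The forward maximum matching procedure described above can be sketched in Python 3 as follows. This is a minimal sketch, assuming the dictionary is held in a set and the maximum word length `max_len` is counted in characters rather than bytes; the function name and the sample dictionary are hypothetical.

```python
def forward_max_match(text, dictionary, max_len):
    """Forward maximum matching (MM): scan left to right, longest dictionary word first."""
    words = []
    pos = 0
    while pos < len(text):
        # Start with the longest possible field and shrink it from the right.
        size = min(max_len, len(text) - pos)
        while size > 1 and text[pos:pos + size] not in dictionary:
            size -= 1
        # A single character is always accepted, so the scan always advances.
        words.append(text[pos:pos + size])
        pos += size
    return words
```

For example, `forward_max_match("研究生命起源", {"研究", "研究生", "生命", "起源"}, 3)` returns `["研究生", "命", "起源"]`: the greedy left-to-right match grabs 研究生 and leaves 命 stranded as a single character.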
1.2 Reverse maximum matching algorithm (RMM)
This algorithm is the reverse counterpart of forward maximum matching: it scans the text from the end, and if a match fails, it removes the first character of the matching field. Experiments show that reverse maximum matching generally outperforms forward maximum matching.
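A minimal Python 3 sketch of RMM, assuming the dictionary is held in a set and `max_len` is counted in characters (not bytes, unlike the PHP class); the function name and sample dictionary are hypothetical. It scans from the end of the text and, on a failed lookup, drops the first character of the matching field.

```python
def reverse_max_match(text, dictionary, max_len):
    """Reverse maximum matching (RMM): scan right to left, longest dictionary word first."""
    words = []
    end = len(text)
    while end > 0:
        size = min(max_len, end)
        # On a failed lookup, drop the FIRST character of the field (the RMM rule).
        while size > 1 and text[end - size:end] not in dictionary:
            size -= 1
        words.append(text[end - size:end])
        end -= size
    words.reverse()  # words were collected from the end of the text
    return words
```

With the dictionary `{"研究", "研究生", "生命", "起源"}` and `max_len = 3`, `reverse_max_match("研究生命起源", ...)` returns `["研究", "生命", "起源"]`, whereas a left-to-right greedy match would split off 研究生 and leave 命 as a single character.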
1.3 Bidirectional maximum matching (bi-direc
last character of the matching field is a Chinese character,
    then
        ① remove the last character of the matching field;
        ② reduce the length of the matching field by 2;
    otherwise
        ① remove the last byte of the matching field;
        ② reduce the length of the matching field by 1;
b) Go to step 3.
Otherwise
a) If the last character of the matching field is a Chinese character,
    then increase the current position counter by 2;
    otherwise, the value of the counter at the current position
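The rule above counts the matching field in bytes, as in a double-byte encoding where a Chinese character takes 2 bytes and an ASCII character takes 1. Below is a minimal Python 3 sketch of forward matching under that assumption. GBK as the encoding, the set-based dictionary, the `max_bytes` limit, and both function names are illustrative assumptions, not the original program. Popping one character from the field removes 2 bytes for a Chinese character and 1 byte otherwise, which matches the two shrink cases.

```python
def byte_len(ch, encoding="gbk"):
    """Bytes one character occupies: 2 for a GBK Chinese character, 1 for ASCII."""
    return len(ch.encode(encoding))


def forward_max_match_bytes(text, dictionary, max_bytes, encoding="gbk"):
    """Forward maximum matching with the field length counted in bytes (a sketch)."""
    words = []
    pos = 0  # current position, as a character index into text
    while pos < len(text):
        # Take the longest field starting at pos whose encoded size fits in max_bytes.
        field, length = [], 0
        for ch in text[pos:]:
            size = byte_len(ch, encoding)
            if length + size > max_bytes:
                break
            field.append(ch)
            length += size
        if not field:
            field = [text[pos]]  # always consume at least one character
        # On a failed lookup, drop the last character: 2 bytes if Chinese, 1 otherwise.
        while len(field) > 1 and "".join(field) not in dictionary:
            field.pop()
        words.append("".join(field))
        pos += len(field)  # the byte offset would advance by the field's byte length
    return words
```

A `max_bytes` of 8 roughly corresponds to a four-character Chinese word, which is how a largest byte index of 7 can be read.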
, reverse scanning, and bidirectional scanning. By matching principle, they mainly include maximum matching, minimum matching, word-by-word matching, and best matching.
Maximum matching method (MM). The basic idea is as follows: suppose the longest entry in the segmentation dictionary contains I Chinese characters. Take the first I characters of the current string of the text being processed as the matching field and look it up in the dictionary. If there is such a wo
segmentation algorithms, such as forward maximum matching (MM), reverse maximum matching (RMM), word-by-word traversal matching, the segmentation-mark method, forward best matching, and reverse best matching. In recent years, many new methods have been proposed to improve the accuracy and speed of word segmentation, such as the generate-and-test method, which works through the interaction between lexic
related to the recognized event pattern to Business Events. We will introduce this selection mechanism later. Alternatively, you can use Business Events eXtreme Scale to filter MQ LLM messages before they are sent to the event patterns.
By combining the high-capacity, low-latency messaging of MQ LLM with the event pattern recognition of Business Events, you can build a compelling solution for client environments that require these capabilities.
Big picture
Let's take a look at the arc