Original: http://xiaoxia.org/2011/12/18/map-reduce-program-of-rmm-word-count-on-hadoop/
Running a MapReduce program based on the RMM Chinese word segmentation algorithm on Hadoop
I know the title of this article sounds very "academic" and pretentious, as if it were some impressive paper. In fact, it is just an ordinary experiment report, and this article does not study the
<?php
// RMM word segmentation algorithm
class SplitWord {
    var $TagDic = Array();
    var $RankDic = Array();
    var $SourceStr = '';
    var $ResultStr = '';
    var $SplitChar = ''; // separator
    var $SplitLen = 4; // reserved word length
    var $MaxLen = 7; // maximum length of an entry in the dictionary; the value here is in bytes
<?php
// RMM word segmentation algorithm
class SplitWord {
    var $TagDic = Array();
    var $RankDic = Array();
    var $SourceStr = '';
    var $ResultStr = '';
    var $SplitChar = ''; // delim
This program is a simple Chinese word segmenter based on the RMM approach. It still has many flaws, and I hope the experts can offer some pointers. This version also fixes the garbled-character problem.
/**
* Based on RMM Chinese word segmentation (inverse matching method)
* @author Tangpan
* @date 2013-10-12
* @version 1.0.0
**/
Class Splitwor
RMM Segmentation Algorithm Class
// RMM segmentation algorithm
class splitword {
    var $TagDic = array();
    var $RankDic = array();
    var $SourceStr = '';
    var $ResultStr = '';
    var $SplitChar = ''; // separator
    var $SplitLen = 4; // reserved word length
    var $MaxLen = 7; // maximum length of a Chinese entry in the dictionary; the value is the largest index of a byte array
    var $MinLen = 3;
the original command (for example, alias ls='ls -l'). If you want to explicitly use the original command, you can delete the alias, use the absolute path, or use an escape character to restore the command.
The alias command defines a temporary alias. To define an alias that persists, write the alias definition to /etc/profile, ~/.bash_profile, or ~/.bashrc. The first is valid for all users, and the last two are valid for the corresponding user. After modi
also limited by the current progress of Atom, and Atom is currently the leader in Level 3 REST services.
The Meaning of the Levels
I should emphasize that the Richardson Maturity Model (RMM) is a good way to think about what elements are in REST, but it does not directly define levels in REST. Roy Fielding also clarified this point: Level 3 RMM is a precondition for REST. Like many of
1. Zenoss
Zenoss is an enterprise-level open source server and network monitoring tool. It is most notable for its virtualization and cloud computing monitoring capabilities, which are hard to find in older monitoring tools.
2. OSSIM
OSSIM is short for Open Source Security Information Management. It provides complete SIEM functionality, along with a package of open source detection tools and a correlation engine, designed
(encoding)
    index = len(sent)
    j = 0
    list = []
    while index >= 0:
        # Try the longest window first, then shrink it one character at a time.
        for i in range(dictmaxlength, 0, -1):
            j = index - i
            if j < 0:
                j = 0  # assumed: clamp to the start of the string (this condition is garbled in the source)
            sub = sent[j:index]
            if len(sub) > 1:
                if sub.encode(encoding) in dctdict:
                    list.append(sub.encode(encoding))
                    index = index - i
                    break
            else:
                # Single character (or empty string at the start): emit it if non-empty.
                if not sub.encode(encoding) == "":
                    list.append(sub.encode(encoding))
                index = index - i
                break
    list.reverse()
    return "".join(list)

'''The fewer dictionary words, single-character dictionary words, and total words, the better'''
def segmenter(sent):
-android.app File
For example, add my.ttf to gfbnames:
4. Add framework strings
Add a new values-my-rMM folder under frameworks/base/core/res/, create a strings.xml file, and put the translated framework strings in this file:
5. Add app strings
Translate each app: create a values-my-rMM folder under the res directory of each app, and put the translated strings.xml in it;
6. re-build the entire pro
developer, when variables are named after their 97 cats or their functions are called "dosomething()," they look like they do not know what they are doing, and it makes their code harder to maintain in the process.
10: Have fun
Want to be stuck on the career treadmill? Hate your job. What it takes to move up in this business is not merely dogged determination to bring home an ever-growing paycheck but an actual enjoyment of your work. If you do not like your work and you are a junior developer,
We can use agent services for the SNMP protocol. SNMP agents have many advantages and also many disadvantages, which we explain below. First, let's cover the basics. Many agents on the network serve us. As long as we open UDP ports 161 and 162, these agents will use the Management Information Base (MIB). If you are interested, you can find a bunch of network management software with the SNMP (Simple Network Management Protocol) Agent list
to go out. The solution is to go to work early, which avoids wasting time on the road and leaves more time to get things done. For others, the bottleneck may be a computer with an outdated operating system that is slow and often crashes. The solution is to upgrade the operating system, since the future gains outweigh the upfront cost.
4 Tips for doing more
The following tips will help you identify and eliminate potential bottlenecks to increase productivity in your daily work life:
Arm yourself with the right tools.
It'
Technology and application
OSI and TCP/IP model, subnetting, VLAN, VTP, STP, ACL, OSPF and other network technologies
240
World top 500 Enterprise Desktop Engineer Course
Enterprise Common computer hardware and software failure, Enterprise Client security, Outlook mail client, Lotus notes mail client, print scanning machine enterprise application, projector enterprise application, service desk, event management, problem management, etc. ITIL Foundation,
It sounds insane. But a study of IT professionals released Wednesday suggests that Microsoft should be able to hit that target with ease. 40% plan to start rolling out Windows 10 this year. That's in stark contrast to the largely failed Windows 8 release from 2012. If Spiceworks' survey is correct, and 73% of businesses adopt Windows 10, that would make Windows 10 the most quickly deployed version
used as the new matching field for re-matching. Repeat the above process until all words are split.
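The forward maximum matching loop described above can be sketched in Python as follows. This is a minimal illustration, not the original author's code: the function name `forward_max_match`, the `max_len` parameter, and the toy dictionary are assumptions made for the example, and the text is handled as Unicode characters rather than bytes.

```python
def forward_max_match(text, dictionary, max_len):
    """Segment text left to right, preferring the longest dictionary entry."""
    words = []
    pos = 0
    while pos < len(text):
        # Start with the longest window that fits, then shrink it from the right.
        size = min(max_len, len(text) - pos)
        while size > 1 and text[pos:pos + size] not in dictionary:
            size -= 1  # remove the last character of the matching field
        # size == 1 means no multi-character entry matched: emit one character.
        words.append(text[pos:pos + size])
        pos += size
    return words


# Hypothetical toy dictionary, for illustration only.
toy_dict = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", toy_dict, 3))
```

With this toy dictionary the greedy left-to-right scan takes "研究生" first and leaves "命" stranded as a single character, which is the kind of error the reverse variant is meant to reduce.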
1.2 Reverse maximum matching algorithm (RMM)
This algorithm reverses the idea of forward maximum matching: the text is scanned from the end, and if the matching fails, the first character of the matching field is removed. Experiments show that the reverse maximum matching algorithm performs better than the forward maximum matching algorithm.
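A minimal Python sketch of reverse maximum matching is given here to make the description concrete. It is an illustration under stated assumptions rather than any particular author's implementation: the name `reverse_max_match`, the `max_len` parameter, and the toy dictionary are hypothetical, and the text is treated as Unicode characters rather than bytes.

```python
def reverse_max_match(text, dictionary, max_len):
    """Segment text right to left, preferring the longest dictionary entry."""
    words = []
    end = len(text)
    while end > 0:
        # Start with the longest window ending at `end`, then shrink it from the left.
        size = min(max_len, end)
        while size > 1 and text[end - size:end] not in dictionary:
            size -= 1  # remove the first character of the matching field
        # size == 1 means no multi-character entry matched: emit one character.
        words.append(text[end - size:end])
        end -= size
    # Words were collected from the end of the text, so restore reading order.
    words.reverse()
    return words


# Hypothetical toy dictionary, for illustration only.
toy_dict = {"研究", "研究生", "生命", "起源"}
print(reverse_max_match("研究生命起源", toy_dict, 3))
```

On this input the right-to-left scan yields "研究", "生命", "起源", with no stranded single character.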
1.3 bidirectional maximum matching (bi-direc
last character of the matching field is a Chinese character,
Then
① Remove the last Chinese character of the matching field;
② The length of the matching field is reduced by 2;
Otherwise
① Remove the last byte of the matching field;
② The length of the matching field is reduced by 1;
B) Jump to step 3 );
Otherwise
A) if the last character of the matching field is a Chinese character,
Then the value of the current position counter is increased by 2;
Otherwise, the value of the counter at the current position
, reverse scan, and bidirectional scan. The matching principles mainly include maximum matching, minimum matching, word-by-word matching, and best matching.
Maximum matching method (MM). The basic idea is: assume that the longest entry in the automatic word segmentation dictionary contains i Chinese characters; then take the first i characters of the current string in the text being processed as the matching field and look it up in the word segmentation dictionary. If there is such a wo