Many of the data sets used in text processing have reached terabyte or petabyte scale, or larger, and traditional single-machine methods struggle to handle them effectively. In recent years, the MapReduce computing framework has been widely adopted in both academia and industry because it expresses the parallel processing of large-scale data concisely and executes it in a distributed fashion. MapReduce is now used in natural language processing, machine learning, large-scale graph processing, and other fields. This paper first gives a brief introduction to MapReduce and analyzes its characteristics, advantages, and disadvantages. It then classifies and summarizes recent applications of MapReduce to various aspects of text processing. Finally, it reviews research on MapReduce systems and their performance and discusses future directions.
Keywords: text processing; MapReduce; distributed computing; overview; Hadoop