MapReduce will be a new friend to the server

Source: Internet
Author: User

In the future, an administrator building a server may well install MapReduce alongside the usual cluster software, database and middleware.

Dionysios Logothetis, a researcher at the University of California, San Diego, said at a recent Usenix annual meeting that MapReduce can be used to analyze log data directly on the servers that produce it, without first moving the data to a separate cluster, which could dramatically shorten the time needed for analysis.

MapReduce structure

With this approach, "data analysis can be moved from a dedicated cluster to the log servers themselves, avoiding expensive data migration," Logothetis said. MapReduce was first introduced by Google and is increasingly used to analyze large-scale data across servers and nodes. At present, it mainly serves as an integral part of the Hadoop data processing platform.

Although MapReduce is mostly run on dedicated clusters, the researchers say a version of the analysis framework could also be part of a Web server. The detailed logs that today's commercial Web sites keep about user activity can support ad targeting, site security monitoring and debugging.

According to the researchers' figures, a single server serving a busy e-commerce site can generate 1MB to 10MB of valuable log data per second. Within a day, that adds up to tens of gigabytes of data. At that rate, 1,000 such servers generate 86TB of data in a single day. The well-known social networking site Facebook, for example, produces 100TB of log data a day.
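The daily totals above follow from simple arithmetic on the per-second rates. The short sketch below reproduces them, assuming a steady rate over 24 hours and decimal units (1GB = 1,000MB, 1TB = 1,000,000MB); the function name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume_mb(rate_mb_per_s, servers=1):
    """Log volume in MB produced per day at a steady per-server rate."""
    return rate_mb_per_s * SECONDS_PER_DAY * servers


one_server_low_gb = daily_volume_mb(1) / 1e3      # 1MB/s on one server, in GB
one_server_high_gb = daily_volume_mb(10) / 1e3    # 10MB/s on one server, in GB
fleet_low_tb = daily_volume_mb(1, servers=1000) / 1e6  # 1,000 servers at 1MB/s, in TB

print(f"one server at 1MB/s:     {one_server_low_gb:.1f} GB/day")
print(f"one server at 10MB/s:    {one_server_high_gb:.1f} GB/day")
print(f"1,000 servers at 1MB/s:  {fleet_low_tb:.1f} TB/day")
```

The 86TB figure therefore corresponds to the low end of the range, 1MB per second per server.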

Typically, a site such as Facebook collects log data from its many servers, loads it into a Hadoop cluster, and then uses MapReduce to analyze it.
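For readers unfamiliar with the model, the following is a minimal single-process sketch of a MapReduce job over log lines. It is not Hadoop code and not the researchers' implementation; the three-field log format (client, path, status) and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_line(line):
    """Map step: emit (status_code, 1) for one access-log line."""
    fields = line.split()
    if len(fields) >= 3:          # skip malformed lines
        yield fields[2], 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts for one key."""
    return key, sum(values)


def run_job(lines):
    pairs = (pair for line in lines for pair in map_line(line))
    return dict(reduce_counts(k, vs) for k, vs in shuffle(pairs).items())


# Hypothetical log lines: "<client> <path> <status>"
logs = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /cart 500",
    "10.0.0.1 /cart 200",
]
result = run_job(logs)
print(result)
```

In a real cluster the map and reduce steps run on many nodes and the shuffle moves data between them; here everything runs in one process to show only the data flow.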

MapReduce instance

The existing "store first, query later" approach has many drawbacks. Moving data between servers consumes a great deal of bandwidth and puts heavy pressure on the network. Facebook discards 80% of its log data before analysis. With the new approach, the raw data no longer needs to be shipped across the network, which also avoids this kind of serious data loss.

MapReduce may become standard equipment on future servers, which would analyze their own data and send the results to a central collection point. The researchers call this method "In-situ MapReduce" (IMR).

IMR is designed to complement, rather than replace, the traditional cluster architecture, which can still carry out later analysis of log data and other data held in distributed storage systems. IMR replicates the MapReduce API and performs the same kinds of functions as MapReduce, namely filtering data and aggregating analysis results. The difference is that it can analyze the most recent data continuously.
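The in-situ idea can be sketched as follows: each Web server filters and aggregates its own log lines, and only the small partial results travel to the central collection point, where they are merged. This is an illustration under assumptions, not the IMR prototype's actual interface; the function names, the log format (client, path, status) and the choice of counting server errors per path are all hypothetical.

```python
from collections import Counter


def local_imr(lines, keep=lambda status: status.startswith("5")):
    """Runs on each web server: filter lines, then aggregate locally.

    Returns a small Counter (errors per URL path) instead of raw logs.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and keep(fields[2]):
            counts[fields[1]] += 1
    return counts


def collect(partials):
    """Runs at the central collection point: merge per-server results."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical logs held on two different servers.
server_a = ["10.0.0.1 /cart 500", "10.0.0.2 /index.html 200"]
server_b = ["10.0.0.3 /cart 503", "10.0.0.4 /search 500"]

merged = collect([local_imr(server_a), local_imr(server_b)])
print(merged)
```

Only the per-server counters cross the network, which is where the bandwidth saving described above would come from.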

The researchers have built an IMR prototype that lets users specify the range of data to be analyzed, such as all the information collected over the last 60 seconds. Users can also set how often results are computed and sent, for example every 15 seconds.
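The two settings described above behave like a sliding window: a range (how far back to look) and a slide (how often to emit a result). The sketch below shows one possible reading of those semantics on timestamped events; the function name, its parameters and the event format are assumptions, not the prototype's API.

```python
from collections import Counter


def windowed_counts(events, range_s=60, slide_s=15):
    """Yield (window_end, Counter) once per slide_s seconds.

    events: iterable of (timestamp_seconds, key) pairs.
    Each result covers events with window_end - range_s <= ts < window_end.
    """
    events = sorted(events)
    if not events:
        return
    last_ts = events[-1][0]
    end = slide_s
    while end - slide_s <= last_ts:
        start = end - range_s
        yield end, Counter(key for ts, key in events if start <= ts < end)
        end += slide_s


# Hypothetical events: (seconds since start, HTTP status)
events = [(5, "200"), (20, "500"), (70, "200")]
results = dict(windowed_counts(events))
for window_end in sorted(results):
    print(window_end, dict(results[window_end]))
```

With a 60-second range and a 15-second slide, consecutive windows overlap, so a single event contributes to up to four consecutive results. A production version would keep incremental state rather than rescanning all events for every window.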

Logothetis says that Web servers should spend most of their resources on their main job, namely serving users. IMR, however, can use the spare cycles to process the log data.

The researchers also designed a way to balance processing speed against the completeness of the results. If results are needed faster, each server can skip some of the data that is slow to process, yielding results that are less complete but still meaningful. A fully comprehensive analysis, by contrast, takes longer and consumes more server resources.
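One way to picture this trade-off is a processing budget: chunks of log data that would cost more than the remaining budget are skipped, and the result is reported together with the fraction of data it covers. The sketch below is a hypothetical illustration of that idea only; the source does not describe the researchers' actual mechanism, and the cost values, names and log format (client, path, status) are assumptions.

```python
from collections import Counter


def analyze_with_budget(chunks, budget):
    """Count status codes, skipping chunks that exceed the remaining budget.

    chunks: list of (cost, lines) pairs, where cost is an abstract estimate
            of how expensive the chunk is to process.
    Returns (counts, completeness), with completeness in [0, 1].
    """
    counts = Counter()
    seen = total = 0
    for cost, lines in chunks:
        total += len(lines)
        if cost > budget:
            continue                      # too expensive for now: skip it
        budget -= cost
        counts.update(line.split()[2] for line in lines)
        seen += len(lines)
    completeness = seen / total if total else 1.0
    return counts, completeness


# Hypothetical chunks: (cost, log lines "<client> <path> <status>")
chunks = [
    (1, ["10.0.0.1 /x 200", "10.0.0.2 /y 500"]),
    (5, ["10.0.0.3 /z 200"]),
    (1, ["10.0.0.4 /w 200"]),
]

fast_counts, fast_done = analyze_with_budget(chunks, budget=2)
full_counts, full_done = analyze_with_budget(chunks, budget=10)
print(dict(fast_counts), fast_done)
print(dict(full_counts), full_done)
```

A tight budget returns sooner with a partial answer and an explicit completeness figure; a generous budget processes everything at a higher resource cost.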

Of course, an organization that runs only a small number of servers may not benefit from IMR. But big operators, such as search engines, social networks and e-commerce sites, will experience the value of IMR.

 

(Responsible editor: admin)
