Breaking Yahoo's Record: Microsoft Sorts 1401GB of Data in 60 Seconds
Source: Internet
Author: User
Microsoft Research recently broke the data-sorting speed record previously held by Yahoo. A nine-person team at Microsoft Research sorted 1401GB of data in just 60 seconds. The test is based on the MinuteSort benchmark, which measures how much data can be sorted in one minute. Microsoft used a new distributed storage system, Flat Datacenter Storage, to speed up data processing.
Notably, Microsoft's system used 250 hosts (1033 disks), while Yahoo's record-setting system used 1406 hosts (5624 disks).
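To put the hardware figures in perspective, the short sketch below computes how much larger Yahoo's cluster was and roughly how much data each Microsoft host and disk handled in the 60-second run. The `Cluster` class and variable names are illustrative only, and the per-host and per-disk figures assume the 1401GB was spread evenly across the hardware, which the article does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hardware counts as reported in the article (illustrative container)."""
    name: str
    hosts: int
    disks: int


microsoft = Cluster("Microsoft", hosts=250, disks=1033)
yahoo = Cluster("Yahoo", hosts=1406, disks=5624)

# How many times more hardware Yahoo's record-setting system used.
host_ratio = yahoo.hosts / microsoft.hosts
disk_ratio = yahoo.disks / microsoft.disks

# Data sorted by Microsoft in the one-minute run, in GB.
SORTED_GB = 1401

# Assumption: the load was spread evenly over hosts and disks.
gb_per_host = SORTED_GB / microsoft.hosts
gb_per_disk = SORTED_GB / microsoft.disks

print(f"Yahoo used {host_ratio:.2f}x the hosts and {disk_ratio:.2f}x the disks")
print(f"Microsoft: {gb_per_host:.2f} GB per host, {gb_per_disk:.2f} GB per disk")
```

In other words, Microsoft set the record with less than a fifth of the hosts and disks.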
Microsoft believes that the technical advantages of Flat Datacenter Storage can help Bing improve performance, and expects the system to make a difference in machine learning in the future. Hadoop and MapReduce are currently the most popular technologies for big data processing, but Microsoft's Flat Datacenter Storage now appears to have the advantage. (Compiled by terminator)
[Figure: detailed test results]
Further Reading
MinuteSort compares the amount of data that can be sorted in one minute; GraySort measures the sort rate (TB/minute) when sorting a large data set (at least 100TB). The benchmark rules are as follows:
The input data must exactly match the data generated by the data generator
When the task starts, the input data cannot be in the operating system's file cache
Input and output data are not compressed
The output cannot overwrite the input
The output files must be stored on disk
You must compute the CRC32 of each key/value pair in the input and output data and sum them into a 128-bit checksum; the input and output checksums must be equal
If the output is split into multiple files, the files must be totally ordered, that is, concatenating the output files must yield a fully sorted output
The time taken to start the program and distribute it to the cluster must be included in the measured time
Any sampling must also be included in the measured time
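The checksum and ordering rules above can be sketched as a small validator. This is a minimal illustration, not the official benchmark validation tool: records are modeled as in-memory `(key, value)` byte tuples, each output file is a list of such records, and the function names are hypothetical.

```python
import zlib
from itertools import chain

# The checksum is a sum of 32-bit CRCs kept within 128 bits.
MASK_128 = (1 << 128) - 1


def checksum(records):
    """Sum the CRC32 of every key/value pair into a 128-bit value."""
    total = 0
    for key, value in records:
        total = (total + zlib.crc32(key + value)) & MASK_128
    return total


def is_totally_ordered(output_files):
    """Check that concatenating the output files yields sorted keys."""
    previous = None
    for key, _value in chain.from_iterable(output_files):
        if previous is not None and key < previous:
            return False
        previous = key
    return True


def validate(input_records, output_files):
    """Output is valid if the checksums match and the files are in order."""
    output_records = chain.from_iterable(output_files)
    return (
        checksum(input_records) == checksum(output_records)
        and is_totally_ordered(output_files)
    )


unsorted_input = [(b"b", b"2"), (b"a", b"1"), (b"c", b"3")]
sorted_output = [[(b"a", b"1")], [(b"b", b"2"), (b"c", b"3")]]
print(validate(unsorted_input, sorted_output))
```

Because the checksum is a plain sum, it does not depend on record order, so it can confirm that no record was lost or altered while the separate ordering check confirms the sort itself.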
Yahoo researchers used Hadoop to sort 1TB of data in 62 seconds, and 1PB of data in 16.25 hours.
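For a rough comparison, the sort rate in TB/minute can be computed from the figures quoted in this article. The sketch below assumes decimal units (1TB = 1000GB, 1PB = 1000TB), and the function name is illustrative. The runs used different cluster sizes and data volumes, so the numbers are indicative rather than a like-for-like ranking.

```python
def sort_rate_tb_per_min(terabytes, seconds):
    """Return the sort rate in TB per minute."""
    return terabytes / (seconds / 60.0)


# Figures as quoted in the article; decimal units are an assumption.
rates = {
    "Microsoft, 1401GB in 60 s": sort_rate_tb_per_min(1.401, 60),
    "Yahoo, 1TB in 62 s": sort_rate_tb_per_min(1.0, 62),
    "Yahoo, 1PB in 16.25 h": sort_rate_tb_per_min(1000.0, 16.25 * 3600),
}

for label, rate in rates.items():
    print(f"{label}: {rate:.3f} TB/minute")
```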