Just as Strong on Disk: Spark Breaks a Big Data Benchmark Record

Source: Internet
Author: User
Keywords: disk, sturdy, big data

Apache Spark is the most popular big data processing framework today. It is considerably faster than MapReduce and easier to use, and it already has a large community of users and contributors. This makes Spark a better fit for the low-latency, real-time, and iterative computing needs of next-generation big data applications, and it is on track to replace MapReduce.

Many people, however, believe that Spark outperforms MapReduce only when computing in memory. To set the record straight, Databricks, the company commercializing Spark, recently ran the GraySort benchmark with Spark in a disk-based environment (below).

Databricks' Daytona GraySort run used 206 servers in the Amazon cloud with a total of 6,600 cores. It sorted 100 TB of test data in just 23 minutes, breaking the record previously held by Yahoo. Yahoo had used a 2,100-node Hadoop cluster with more than 50,000 cores to complete the same 100 TB test in 72 minutes.
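To put the two results side by side, the short Python sketch below works out the throughput implied by the figures quoted above. It is only a back-of-the-envelope calculation: it assumes 1 TB means 10**12 bytes, it treats Yahoo's "more than 50,000" cores as exactly 50,000 (so the per-core figure for Hadoop is an upper bound), and the names `runs` and `throughput` are made up for this illustration.

```python
# Figures as quoted in the article. Assumption: 1 TB = 10**12 bytes.
TB = 10**12

runs = {
    "Spark (Databricks)": {"data_tb": 100, "minutes": 23, "nodes": 206, "cores": 6600},
    # "more than 50,000" cores is taken as exactly 50,000 here (a lower bound).
    "Hadoop (Yahoo)": {"data_tb": 100, "minutes": 72, "nodes": 2100, "cores": 50000},
}


def throughput(run):
    """Return (aggregate GB/s, MB/s per node, MB/s per core) for one run."""
    bytes_per_sec = run["data_tb"] * TB / (run["minutes"] * 60)
    return (
        bytes_per_sec / 1e9,
        bytes_per_sec / run["nodes"] / 1e6,
        bytes_per_sec / run["cores"] / 1e6,
    )


for name, run in runs.items():
    total, per_node, per_core = throughput(run)
    print(
        f"{name}: {total:.1f} GB/s total, "
        f"{per_node:.0f} MB/s per node, {per_core:.2f} MB/s per core"
    )

# How much faster the Spark run finished, and how many fewer machines it used.
speedup = runs["Hadoop (Yahoo)"]["minutes"] / runs["Spark (Databricks)"]["minutes"]
node_ratio = runs["Hadoop (Yahoo)"]["nodes"] / runs["Spark (Databricks)"]["nodes"]
print(f"Spark finished {speedup:.1f}x faster on {node_ratio:.1f}x fewer nodes")
```

On these numbers, the Spark run finished about three times faster while using roughly one tenth of the machines.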

To demonstrate that Spark can reliably handle large datasets, Databricks also ran an unofficial test (pictured above) in which 190 servers processed 1 PB of data in 4 hours. Arsalan Tavakoli, Databricks' customer marketing director, said that many companies process data at scales far beyond 1 PB, and that anyone skeptical of Spark's scalability in production should look at Alibaba's Spark cluster, which has scaled to hundreds of petabytes.

Databricks' GraySort benchmark used HDFS as its storage layer. The test data came from Databricks Cloud and was stored on Amazon S3 or on HDFS running on AWS instances. Databricks published more details of the test on its official website on Friday, including the test methodology and the credibility of the results.
