We are proud to announce that Apache Spark won the 2016 CloudSort Benchmark in both the Daytona and Indy categories. A joint team from Nanjing University, Alibaba Group, and Databricks Inc. entered the competition with NADSort, a distributed sorting program built on top of Spark, and set a new world record for the most cost-efficient way to sort 100TB of data.
They sorted 100TB of data using only $144 USD worth of public cloud resources, beating the previous record of $451 USD set by the University of California, San Diego.
This adds to the GraySort record Spark won in 2014, and further validates Spark as a highly efficient data processing engine.
For more information, see the Databricks blog article (in English) written by Spark committer Reynold Xin, or the Nanjing University press release (in Chinese).