Abstract:
MapReduce is a widely used distributed parallel computing framework for large-scale data processing. Hadoop MapReduce is the most widely employed open-source implementation of the MapReduce framework because it is flexible to customize and simple to use. To keep a relatively slow task, called a straggler, from delaying the whole job, MapReduce speculatively launches a backup copy of the straggler on another node, aiming to reduce the job's finish time. Although many speculative execution strategies have been proposed for heterogeneous environments, none of them considers the impact of dynamic system load on the running time of tasks, so they may misidentify stragglers. In this paper, we propose ERUL, a novel speculative execution strategy for heterogeneous environments that improves the estimation of tasks' remaining time. ERUL also overcomes some drawbacks of LATE that mislead speculative execution in some cases. The experimental results indicate that our Hadoop-ERUL strategy not only estimates the remaining execution time of running tasks more accurately, but also reduces job running time by 26% compared with Hadoop-LATE.
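For context, the remaining-time estimate that LATE-style strategies rely on can be sketched as follows. This is a minimal illustration of the well-known LATE heuristic (remaining time = (1 - progress) / progress rate), not of ERUL, whose estimator is not described in this abstract. The names `TaskStatus`, `estimated_time_left`, and `pick_straggler` are hypothetical, and the sketch omits LATE's slow-task threshold and its cap on concurrent speculative tasks.

```python
from dataclasses import dataclass


@dataclass
class TaskStatus:
    """Hypothetical snapshot of a running task (not a Hadoop type)."""
    task_id: str
    progress: float   # progress score in [0, 1]
    elapsed_s: float  # seconds since the task started


def estimated_time_left(task: TaskStatus) -> float:
    """LATE-style estimate: (1 - progress) / progress_rate.

    Assumes the task keeps its average progress rate, which is the
    assumption that a changing system load can violate.
    """
    if task.progress <= 0.0 or task.elapsed_s <= 0.0:
        return float("inf")  # no progress yet, so no finite estimate
    progress_rate = task.progress / task.elapsed_s
    return (1.0 - task.progress) / progress_rate


def pick_straggler(tasks):
    """Return the unfinished task with the longest estimated time left."""
    running = [t for t in tasks if t.progress < 1.0]
    if not running:
        return None
    return max(running, key=estimated_time_left)


tasks = [
    TaskStatus("a", 0.5, 10.0),
    TaskStatus("b", 0.2, 10.0),
    TaskStatus("c", 1.0, 8.0),
]
candidate = pick_straggler(tasks)
print(candidate.task_id, estimated_time_left(candidate))
```

Because the estimate extrapolates from the average progress rate so far, a task that was slowed only temporarily by load on its node can look like a straggler, which is the kind of misjudgment the abstract attributes to load-unaware strategies.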
Published in: 2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming
Date of Conference: 13-15 July 2014
Date Added to IEEE Xplore: 07 October 2014