Handling Partitioning Skew in MapReduce using LEEN - Université de Rennes
Journal article, Peer-to-Peer Networking and Applications, Year: 2013

Handling Partitioning Skew in MapReduce using LEEN

Abstract

MapReduce is emerging as a prominent tool for big data processing. Data locality is a key feature in MapReduce that is extensively leveraged in data-intensive cloud systems: it avoids network saturation when processing large amounts of data by co-allocating computation and data storage, particularly for the map phase. However, our studies with Hadoop, a widely used MapReduce implementation, demonstrate that the presence of partitioning skew (a variation in the intermediate keys' frequencies, in their distribution among different data nodes, or in both) causes a huge amount of data transfer during the shuffle phase and leads to significant unfairness in the reduce input among different data nodes. As a result, applications experience severe performance degradation due to the long data transfer during the shuffle phase along with the computation skew, particularly in the reduce phase. In this paper, we develop a novel algorithm named LEEN for locality-aware and fairness-aware key partitioning in MapReduce. LEEN embraces an asynchronous map and reduce scheme. All buffered intermediate keys are partitioned according to their frequencies and the fairness of the expected data distribution after the shuffle phase. We have integrated LEEN into Hadoop. Our experiments demonstrate that LEEN can efficiently achieve higher locality and reduce the amount of shuffled data. More importantly, LEEN guarantees fair distribution of the reduce inputs. As a result, LEEN achieves a performance improvement of up to 45% on different workloads.
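To make the trade-off in the abstract concrete, the following is a minimal sketch of partitioning keys by their per-node frequencies while keeping reduce inputs balanced. It is not the LEEN algorithm as published: the abstract does not specify the scoring rule, so the greedy cost (remote records plus a weighted standard deviation of node loads), the `alpha` weight, the round-robin baseline standing in for hash partitioning, and the toy frequency table are all assumptions made for illustration.

```python
from statistics import pstdev


def shuffle_and_loads(freq, assignment):
    """Return (records moved across nodes, reduce input per node).

    freq[key][node] is the number of records of `key` produced on `node`;
    assignment[key] is the node that will reduce `key`.
    """
    n_nodes = len(next(iter(freq.values())))
    loads = [0] * n_nodes
    shuffled = 0
    for key, per_node in freq.items():
        dest = assignment[key]
        total = sum(per_node)
        loads[dest] += total
        shuffled += total - per_node[dest]  # records not already on dest
    return shuffled, loads


def round_robin_partition(freq):
    """Hypothetical baseline: ignores frequencies, like a blind hash."""
    n_nodes = len(next(iter(freq.values())))
    return {key: i % n_nodes for i, key in enumerate(sorted(freq))}


def greedy_partition(freq, alpha=1.0):
    """Assumed heuristic: place each key where remote traffic plus
    alpha * (std. dev. of resulting node loads) is smallest."""
    n_nodes = len(next(iter(freq.values())))
    loads = [0] * n_nodes
    assignment = {}
    # Heaviest keys first, so large keys are placed while loads are flexible.
    for key in sorted(freq, key=lambda k: -sum(freq[k])):
        per_node = freq[key]
        total = sum(per_node)

        def cost(node):
            trial = loads.copy()
            trial[node] += total
            return (total - per_node[node]) + alpha * pstdev(trial)

        best = min(range(n_nodes), key=cost)
        assignment[key] = best
        loads[best] += total
    return assignment


# Toy data (made up): 4 intermediate keys spread over 3 data nodes.
freq = {
    "k1": [1, 9, 0],
    "k2": [0, 1, 9],
    "k3": [9, 0, 1],
    "k4": [3, 3, 4],
}
baseline = round_robin_partition(freq)
greedy = greedy_partition(freq)
print("baseline:", shuffle_and_loads(freq, baseline))
print("greedy:  ", shuffle_and_loads(freq, greedy))
```

On this toy table the frequency-blind baseline moves 34 of the 40 records across nodes, while the greedy assignment moves 9. Raising `alpha` makes the sketch favor balanced reduce inputs over locality.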
Main file
PPNA.pdf (1.57 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-00822973, version 1 (28-06-2016)

Identifiers

  • HAL Id: hal-00822973, version 1

Cite

Shadi Ibrahim, Hai Jin, Lu Lu, Bingsheng He, Gabriel Antoniu, et al. Handling Partitioning Skew in MapReduce using LEEN. Peer-to-Peer Networking and Applications, 2013. ⟨hal-00822973⟩