A Survey of Data-Intensive Scientific Workflow Management

Abstract: A growing number of computer-based scientific experiments must handle massive amounts of data. Their data processing consists of multiple computational steps and the dependencies among them. A data-intensive scientific workflow is useful for modeling such a process. Because sequential execution of data-intensive scientific workflows can take a long time, Scientific Workflow Management Systems (SWfMSs) should enable their parallel execution and exploit resources distributed across different infrastructures, such as grid and cloud. This paper surveys data-intensive scientific workflow management in SWfMSs and their parallelization techniques. Based on an SWfMS functional architecture, we give a comparative analysis of the existing solutions. Finally, we identify research issues for improving the execution of data-intensive scientific workflows in a multisite cloud.

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01144760
Contributor: Patrick Valduriez
Submitted on: Sunday, July 21, 2019 - 4:52:32 PM
Last modification on: Monday, July 22, 2019 - 9:37:57 AM


Citation

Ji Liu, Esther Pacitti, Patrick Valduriez, Marta Mattoso. A Survey of Data-Intensive Scientific Workflow Management. Journal of Grid Computing, Springer Verlag, 2015, 13 (4), pp.457-493. ⟨10.1007/s10723-015-9329-8⟩. ⟨lirmm-01144760⟩
