
A Distributed Framework for Low-Latency OpenVX over the RDMA NoC of a Clustered Manycore

Abstract : OpenVX is a standard proposed by the Khronos Group for cross-platform acceleration of computer vision and deep learning applications. OpenVX abstracts the complexity of the target processor architecture and automates the implementation of processing pipelines through high-level optimizations. While highly efficient OpenVX implementations exist for shared-memory multi-core processors, targeting OpenVX to clustered manycore processors is challenging, since such processors comprise multiple compute units, or clusters, each fitted with an on-chip local memory shared by several cores. This paper describes an efficient implementation of OpenVX that targets clustered manycore processors. We propose a framework that includes computation graph analysis, kernel fusion techniques, RDMA-based tiling into local memories, optimization passes, and a distributed execution runtime. This framework is implemented and evaluated on the 2nd-generation Kalray MPPA® clustered manycore processor. Experimental results show that super-linear speed-ups are obtained for multi-cluster execution by leveraging the bandwidth of on-chip memories and the capabilities of asynchronous RDMA engines.
Document type : Conference papers

Cited literature : 24 references
Contributor : Laurent Jonchère
Submitted on : Monday, April 8, 2019 - 1:54:14 PM
Last modification on : Thursday, January 20, 2022 - 12:54:08 PM
Long-term archiving on : Wednesday, July 10, 2019 - 12:59:06 PM


Files produced by the author(s)



Julien Hascoet, Benoît Dupont de Dinechin, Karol Desnos, Jean-Francois Nezan. A Distributed Framework for Low-Latency OpenVX over the RDMA NoC of a Clustered Manycore. IEEE High Performance Extreme Computing Conference (HPEC 2018), Sep 2018, Waltham, MA, United States. ⟨10.1109/hpec.2018.8547736⟩. ⟨hal-02049414⟩


