Tuesday, October 3 • 16:45 - 17:45
Taro: Task graph-based Asynchronous Programming Using C++ Coroutine

Task graph computing systems (TGCSs) play an essential role in high-performance computing. Unlike loop-based models, TGCSs encapsulate function calls and their dependencies in a top-down task graph to implement irregular parallel decomposition strategies that scale to large numbers of processors, including manycore central processing units (CPUs) and graphics processing units (GPUs). As a result, recent years have seen a great deal of TGCS research, including Taskflow, oneTBB, Kokkos-DAG, and HPX, to name a few. However, one common challenge faced by TGCSs is synchronization within each task. For instance, when a task executes GPU operations, a CPU thread typically has to wait until the GPU completes them before proceeding. This synchronization overhead can hinder performance and limit the overall scalability of TGCSs.
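To make the task-graph idea concrete, here is a minimal sketch of tasks and dependencies, run in dependency order on a single thread. The `TaskGraph` class and its `emplace`, `precede`, and `run` members are hypothetical names chosen for illustration; they are not the API of Taskflow, oneTBB, Kokkos-DAG, HPX, or Taro.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal task graph, for illustration only.
class TaskGraph {
public:
  // Adds a task and returns its id.
  std::size_t emplace(std::function<void()> work) {
    works_.push_back(std::move(work));
    successors_.emplace_back();
    num_predecessors_.push_back(0);
    return works_.size() - 1;
  }

  // Declares that task `before` must finish before task `after` starts.
  void precede(std::size_t before, std::size_t after) {
    assert(before < works_.size() && after < works_.size());
    successors_[before].push_back(after);
    ++num_predecessors_[after];
  }

  // Runs every task once on the calling thread, in dependency order
  // (Kahn's algorithm). Returns the number of tasks executed; a value
  // smaller than the number of tasks means the graph contains a cycle.
  std::size_t run() const {
    std::vector<std::size_t> pending = num_predecessors_;
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < pending.size(); ++i) {
      if (pending[i] == 0) ready.push(i);
    }
    std::size_t executed = 0;
    while (!ready.empty()) {
      const std::size_t task = ready.front();
      ready.pop();
      works_[task]();  // a blocking wait in here would stall this thread
      ++executed;
      for (std::size_t next : successors_[task]) {
        if (--pending[next] == 0) ready.push(next);
      }
    }
    return executed;
  }

private:
  std::vector<std::function<void()>> works_;
  std::vector<std::vector<std::size_t>> successors_;
  std::vector<std::size_t> num_predecessors_;
};

// Builds the diamond graph A -> {B, C} -> D and returns the execution order.
std::string run_diamond() {
  std::string order;
  TaskGraph graph;
  const std::size_t a = graph.emplace([&order] { order += 'A'; });
  const std::size_t b = graph.emplace([&order] { order += 'B'; });
  const std::size_t c = graph.emplace([&order] { order += 'C'; });
  const std::size_t d = graph.emplace([&order] { order += 'D'; });
  graph.precede(a, b);
  graph.precede(a, c);
  graph.precede(b, d);
  graph.precede(c, d);
  const std::size_t executed = graph.run();
  assert(executed == 4);
  return order;
}
```

A real TGCS would hand ready tasks to a pool of worker threads instead of running them inline. The synchronization problem described above then shows up inside the task body: if a task blocks while waiting for a GPU, the worker thread running it can do nothing else in the meantime.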

The introduction of C++ coroutines in C++20 has revolutionized asynchronous programming, offering improved concurrency and expressiveness. However, integrating TGCS with C++ coroutines presents several challenges. Firstly, existing TGCS solutions are not compatible with C++ coroutines, as the coroutine paradigm deviates from traditional C++ programming. This incompatibility makes it difficult to seamlessly incorporate coroutines into existing TGCS frameworks. Secondly, C++ coroutine programming is extremely difficult and requires a solid understanding of the underlying concepts and mechanisms. The introduction of a new paradigm adds complexity and a steep learning curve for developers. Lastly, while C++ coroutines offer a powerful mechanism for managing asynchronous operations, designing and implementing an efficient scheduler to leverage their capabilities remains challenging. To fully exploit the benefits of C++ coroutines, there is a need for a specialized scheduler that can handle large numbers of coroutines and make optimal use of hardware resources.

To address these challenges, we present Taro: Task-Graph-Based Asynchronous Programming using C++ Coroutine. Taro offers a task-graph-based programming model for C++ coroutines, simplifying the expression of complex control flows and reducing development complexity. Additionally, Taro incorporates an efficient work-stealing scheduling algorithm tailored for C++ coroutines, minimizing unnecessary context switches, CPU migrations, and cache misses.
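As background for the scheduling part, the general work-stealing idea can be sketched as follows. This is a deliberately simple, mutex-guarded illustration and not Taro's algorithm, whose details are not given here. `WorkStealingQueue` and `next_task` are hypothetical names. Each worker owns one queue, takes its own most recent work first, and steals the oldest work from another worker only when its own queue is empty.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical per-worker queue, for illustration only. Production
// schedulers typically use lock-free deques; a mutex keeps the sketch short.
template <typename T>
class WorkStealingQueue {
public:
  // Called by the owning worker to add new work.
  void push(T item) {
    std::lock_guard<std::mutex> lock(mutex_);
    items_.push_back(std::move(item));
  }

  // Called by the owning worker: takes the most recently pushed item,
  // which is the one most likely to still be warm in its cache.
  std::optional<T> pop() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (items_.empty()) return std::nullopt;
    T item = std::move(items_.back());
    items_.pop_back();
    return item;
  }

  // Called by other workers: takes the oldest item from the opposite end.
  std::optional<T> steal() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (items_.empty()) return std::nullopt;
    T item = std::move(items_.front());
    items_.pop_front();
    return item;
  }

private:
  std::mutex mutex_;
  std::deque<T> items_;
};

// A worker prefers its own queue and steals from a victim only when idle.
template <typename T>
std::optional<T> next_task(WorkStealingQueue<T>& own,
                           WorkStealingQueue<T>& victim) {
  if (std::optional<T> item = own.pop()) return item;
  return victim.steal();
}
```

In a coroutine-based scheduler the queued items could be coroutine handles rather than plain values. Preferring local work is one common way to limit migrations between CPU cores, though the specific policy Taro uses may differ.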

In this session, I will introduce Taro's programming model and demonstrate how Taro enables efficient multitasking between CPU and GPU tasks, so that CPU threads do not block while waiting for GPU tasks to finish. I will walk through example code that uses Taro. Finally, I will demonstrate how our solution improves the performance of a real-world RTL simulation workload and of microbenchmarks. Taro will be open-source and available on GitHub.

Dian-Lun Lin

PhD candidate, University of Wisconsin-Madison
I’m a fourth-year Ph.D. student at the Department of Electrical and Computer Engineering at the University of Wisconsin-Madison. My research interests focus on parallel computing and GPU computing using C++ and CUDA. During my recent three-year Ph.D. studies, I have published three...

Tuesday October 3, 2023 16:45 - 17:45 MDT
Cottonwood 2/3