Wednesday
Room 1
15:00 - 16:00
(UTC+02)
Talk (60 min)
Block-Based Parallel Programming
Parallel programming can be intimidating, but doesn’t need to be! There's a new paradigm for parallel programming that's newcomer-friendly, highly productive, and performant: block-based programming.
Block-based programming models divide inputs into local arrays (tiles) that are processed concurrently by groups of threads (blocks). Users write sequential, array-centric code, and the framework handles parallelization, synchronization, and data movement behind the scenes. Block-based models have been around for a long time, but in recent years they've grown in popularity for GPU programming in languages such as [Triton](https://openai.com/index/triton/), [JAX/Pallas](https://docs.jax.dev/en/latest/pallas/index.html), and [Warp](https://nvidia.github.io/warp/modules/tiles.html), which aim to make parallelism more accessible and increase portability.
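To make the idea concrete, here is a minimal pure-Python sketch of the division of labor described above. It is not cuTile, Triton, Pallas, or Warp code: `saxpy_tile`, `launch`, and `TILE` are hypothetical names invented for illustration, and a thread pool stands in for the GPU blocks a real framework would schedule. The user-written part is the sequential per-tile function; the toy "framework" does the tiling and the concurrent dispatch.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 4  # hypothetical tile size, chosen arbitrarily for illustration


def saxpy_tile(a, x_tile, y_tile):
    """User code: a sequential, array-centric computation on one tile."""
    return [a * x + y for x, y in zip(x_tile, y_tile)]


def launch(kernel, a, x, y, tile=TILE):
    """Toy stand-in for the framework: split the inputs into tiles and
    run one kernel instance per tile concurrently."""
    starts = range(0, len(x), tile)
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in tile order, so the output stays aligned
        # with the input even though tiles may finish out of order.
        tiles = pool.map(
            lambda s: kernel(a, x[s:s + tile], y[s:s + tile]), starts
        )
    out = []
    for t in tiles:
        out.extend(t)
    return out


x = list(range(10))  # length 10 is not a multiple of TILE: last tile is partial
y = [1.0] * 10
result = launch(saxpy_tile, 2.0, x, y)
```

Note that `saxpy_tile` contains no thread indices, locks, or explicit copies; in this sketch, everything parallel lives in `launch`. Real block-based frameworks additionally manage memory placement and synchronization on the device, which this toy version does not attempt to model.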
In this example-driven talk, we'll cover the basics of block-based programming in both Python and C++. We'll present cuTile, NVIDIA's new block-based programming model for Python, C++, and other languages, and Tile IR, the new compiler stack it is built on. This talk will reveal details about the technology for the first time. We'll also compare and contrast block-based models with traditional parallel programming models.
We'll look at a variety of examples, including a new tile-based large language model demo built on [LLAMA3](https://github.com/meta-llama/llama3), a stencil code, and an FFT solver.
In this session, you'll:
- Learn the best practices for writing block-based parallel applications for CPUs and GPUs.
- Gain insight into the performance of block-based code and how it actually gets executed.
- Discover how to reason about and debug block-based applications.
- Understand the differences between block-based and traditional parallel programming and when each paradigm should be used.
By the end of the session, you'll understand how block-based programming enables more intuitive, portable, and efficient development of high-performance, data-parallel applications.