Wednesday
Room 2
11:40 - 12:40
(UTC+02)
Talk (60 min)
Are You Smarter Than A Branch Predictor?
A single unpredictable branch can quietly dominate your execution time. In this (very) high-energy, interactive session (part performance talk, part game show), the audience becomes the branch predictor. Participants will see small C++ snippets drawn from real-world production code and vote on which version is faster. Stress balls included for correct answers :)
Each example, whether illustrative pseudocode or a recurring pattern, focuses on branch prediction and branch elimination. Yes, they're microbenchmarks, but they expose patterns that show up in real systems, especially when profiling points to control flow as your bottleneck.
Examples will use animated visuals, annotated assembly, and performance counter data to explore how modern CPUs handle control flow. We'll look at different architectures, from out-of-order x86 cores to in-order ARM and embedded processors, to see how each behaves under pressure. We will also look at how popular compilers, from GCC to MSVC, transform the same source code in surprisingly different ways.
Through these examples, we'll dig into branch prediction and the cost of misprediction, indirect calls and virtual dispatch, conditional moves versus branches (including when and why compilers tend to emit CMOV and when they don't), and more! Each example will be connected to real-world trade-offs: what -O2 and -O3 already do for you, determinism and worst-case latency concerns, code size constraints in certain systems, and why the same branch can behave very differently across architectures.
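As a rough illustration of the "conditional move versus branch" question, here is a minimal sketch of the kind of snippet pair the audience might vote on. The function names (`sum_above_branchy`, `sum_above_branchless`, `check_equivalent`) are hypothetical and not taken from the talk, and which version wins depends on the data, the compiler, the flags, and the target CPU.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Branchy version: written with an if, so the compiler MAY emit a
// conditional jump whose cost depends on how predictable the data is.
std::int64_t sum_above_branchy(const std::vector<int>& values, int threshold) {
    std::int64_t sum = 0;
    for (int v : values) {
        if (v >= threshold) {
            sum += v;
        }
    }
    return sum;
}

// Branch-free version: turn the comparison into an all-ones or all-zeros
// mask and AND it with the value, so the source has no data-dependent jump.
std::int64_t sum_above_branchless(const std::vector<int>& values, int threshold) {
    std::int64_t sum = 0;
    for (int v : values) {
        const std::int64_t mask = -static_cast<std::int64_t>(v >= threshold);  // 0 or -1
        sum += static_cast<std::int64_t>(v) & mask;
    }
    return sum;
}

// Sanity helper: both versions must agree before any timing is meaningful.
void check_equivalent(const std::vector<int>& values, int threshold) {
    assert(sum_above_branchy(values, threshold) ==
           sum_above_branchless(values, threshold));
}
```

An optimizing compiler is free to turn the first version into a conditional move or vectorized code, or to rewrite the second, so the generated assembly is the only reliable way to know which form actually ran.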
We'll measure on real hardware, inspect branch performance counters, and discuss how to design trustworthy microbenchmarks without fooling ourselves.
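One possible shape for such a microbenchmark is sketched below, assuming a simple counting workload and `std::chrono::steady_clock` for timing. All names here (`count_above`, `make_random_values`, `Timing`, `time_count_above`) are hypothetical helpers for illustration, not the harness used in the talk. The sketch fixes the random seed, repeats the measurement, keeps the fastest run, and returns the computed result so the work is harder for the optimizer to discard.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical workload: count elements at or above a threshold.
std::size_t count_above(const std::vector<int>& values, int threshold) {
    std::size_t count = 0;
    for (int v : values) {
        if (v >= threshold) ++count;
    }
    return count;
}

// Fixed seed so every run sees exactly the same input.
std::vector<int> make_random_values(std::size_t n, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<int> values(n);
    for (int& v : values) v = dist(rng);
    return values;
}

struct Timing {
    std::chrono::nanoseconds best;  // fastest of all repeats
    std::size_t result;             // returned so the work is observable
};

// Time the workload several times and keep the fastest run, which is
// less sensitive to interrupts and scheduling noise than a single run.
Timing time_count_above(const std::vector<int>& values, int threshold, int repeats) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;
    Timing t{std::chrono::nanoseconds::max(), 0};
    for (int i = 0; i < repeats; ++i) {
        const auto start = clock::now();
        const std::size_t r = count_above(values, threshold);
        const auto stop = clock::now();
        t.best = std::min(
            t.best, std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start));
        t.result = r;
    }
    return t;
}
```

Running `time_count_above` on the output of `make_random_values` and again on a sorted copy gives the same `result` with potentially different `best` times. Wall-clock time alone cannot attribute any gap to branch prediction, so it would still need to be checked against hardware branch-miss counters and the generated assembly.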
These examples will come from real-world production scenarios that have surprised even experienced engineers. The talk has been refined through multiple internal runs and fits comfortably within the allotted time. While examples are shown in C++, the principles apply to any language targeting modern CPUs.
Attendees will leave with:
- A concrete mental model of how branches behave on real processors
- Practical patterns for reducing mispredictions and improving worst-case latency
- A clearer understanding of when not to micro-optimize
- The confidence to validate changes using proper profiling tools
