Performance engineering - being friendly to your hardware

Practical software does not run in an abstract vacuum; it runs on underlying hardware platforms. Practical software engineering does not exist in an abstract vacuum either. The software layer sits between the domain-specific requirements above and the underlying runtime platforms below.

Many interesting developments have happened on all three of those layers over the years. Contemporary hardware has come a long way, yet it often suffers from an attention deficit caused by an overshadowing flood of advancements and “advancements” in the software part of the universe. This shiny new programming language is safe, performant, and solves a backlog of problems that have been dragging on for a long time. That shiny new programming paradigm automagically relieves you of dealing with low-level details, and the toolchain is plain amazing. The hardware side brings to this fistfight a set of new architectures, ISAs, and hardware abstractions, just to stay on par with the software side. Looks perfect? What else could an engineer dream of?

Not really. Let’s take a look at today’s commodity hardware platforms and at today’s trendy software engineering waves, and try to sense how and why the latter could (and frequently do) cancel out the potential benefits of hardware advancements, what could be done to actually be friendly to your underlying hardware, and at what cost.

This is a set of somewhat separate topics, bound together under the common theme of performance engineering.

How language constructs such as references, lambdas, inheritance, object representation, runtime checks, and selected STL examples map to the actual runnable platform-level code, and at what cost.

The notion of out-of-order execution, and the claims that OoO is superior and that there is no need to look at the level of instruction selection. What specifically is out of order in the context of contemporary x86 platforms (surprisingly, it is not instructions), and what impact it has on overall performance. A brief look into where the complexity lies inside contemporary high-performance execution cores and sockets, and why aspects such as variable-length instruction encoding are trivial to resolve.

Memory hierarchy operation; logical, physical, and geometrical address spaces, their relationships and translations; memory performance in virtual address spaces; and clever ways to hide the capacitor array’s latency.

Vectorization: why it is still not universally available, and why it will not be.

Data dependencies and what could be done about them.

How one could help the compiler do the right thing and, more importantly, how one could avoid making it harder for the compiler to do the right thing.

Branching control.

ABI aspects, parameter passing, compilation unit scope and its impact.

The claims of the imminent obsolescence of x86, and of how the new wave of ARM and RISC-V upstarts will overtake it this time, after previous not-so-successful attempts.

This is a reasonably simplified talk on a set of interrelated topics for a generalist software engineering audience. It is motivated by the trend of handwaving about how Rust, Go, Zig, D, and species yet unseen will take over from C with classes and solve all the outstanding problems in performance, security, time to market, and global warming too. The viewpoint is that of someone dealing with performance optimization in the domains of network engineering, electronic trading, and high-performance computing. No specialty background is required or expected from the audience, although the depth of coverage is targeted at senior engineering staff who are sufficiently proficient in the contemporary software engineering ecosystem. The content is a lightened-up version of specialty workshops organized for hardware engineers on how software expects to see the platform, and for software engineers on what the underlying hardware actually does; it has not been presented publicly before. The content could be tweaked to match the target audience profile if you see a need for that; however, it is not expected to be an entry-level talk at all. Opinionated technical argument, up to and including controllable flaming, is also possible, either in a workshop-style presentation or in a panel discussion.

NDC { TechTown }

Ignas Bagdonas