2024 EuroLLVM Developers' Meeting

About

The EuroLLVM Developers’ Meeting is a bi-annual gathering of the entire LLVM Project community. The conference is organized by the LLVM Foundation and many volunteers within the LLVM community. Developers and users of LLVM, Clang, and related subprojects will enjoy attending interesting talks, impromptu discussions, and networking with the many members of our community. Whether you are new to the LLVM project or a long-time member, there is something for each attendee.

To see the agenda, speakers, and register, please visit the Event Site here: https://llvm.swoogo.com/2024eurollvm

What can you expect at an LLVM Developers’ Meeting?

What types of people attend?

The LLVM Developers’ Meeting strives to be the best conference to meet other LLVM developers and users.

For future announcements or questions, please visit the LLVM Discourse forums. Most posts are in the Announcements or Community categories and tagged with usllvmdevmtg.

Program

Keynote

Does LLVM implement security hardenings correctly? A BOLT-based static analyzer to the rescue?

Kristof Beyls [Slides] [Video]
In the past decade, security has become one of the 3 key areas in compiler design and implementation, next to correctly translating to assembly and optimization. In comparison to general correctness and optimization, we're lacking tools to test correct code generation of security hardening features. This presentation shows the results of an experiment to build a prototype binary static analyzer for 2 security hardening features (pac-ret, stack clash) using BOLT. The results are promising and I propose to integrate this into the upstream BOLT project to enable us to implement higher-quality security mitigations in LLVM and other compilers.

How Slow is MLIR

Mehdi Amini, Jeff Niu [Slides] [Video]
This talk will dig into the performance aspects involved in implementing a compiler with MLIR. We're interested here in compile-time performance (the efficiency of the compiler implementation) rather than the performance of the generated code. We will go through implementation details of MLIR and quantify the cost of common operations (traversing or mutating the IR). We will then expose some anti-patterns that we unfortunately see all too often in MLIR-based compilers. Finally, we will go through a few elements that affect the performance of the IR: for example, the threading model of MLIR, how to use resources for zero-overhead management of large constants, how to take advantage of the Properties custom storage on operations, and the costs intrinsic to Type/Attribute storage in the MLIRContext.

Tutorials

Zero to Hero: Programming Nvidia Hopper Tensor Core with MLIR's NVGPU Dialect

Guray Ozen [Video]
NVIDIA Hopper Tensor Core brings groundbreaking performance, requiring the utilization of new hardware features like TMA, Warpgroup level MMA, asynchronous barriers (mbarriers), Thread Block Cluster, and more. Despite having a compiler with these features, crafting a fast GEMM kernel remains challenging. In this talk, we will initially discuss the NVGPU and NVVM dialects, where the Hopper features have been implemented. Following that, we will delve into the implementation of multistage GEMM and warp-specialized GEMM, as used by libraries like Cutlass. Here, we will leverage MLIR's Python bindings to meta-program the IR.

Technical Talks

Revamping Sampling-Based PGO with Context-Sensitivity and Pseudo-Instrumentation

Wenlei He [Slides] [Video]
This talk describes CSSPGO, a context-sensitive sampling-based PGO framework with pseudo-instrumentation. It leverages pseudo-instrumentation to improve profile quality without incurring the overhead of traditional instrumentation. It also enriches the profile with context-sensitivity to enable more effective optimizations through a novel profiling methodology using synchronized LBR and stack sampling. We will also share how CSSPGO is used to lift the performance of Meta's server workloads.

Deep Dive on MLIR Interfaces

Mehdi Amini [Slides] [Video]
This talk will walk through the implementation details of interfaces in MLIR. The interfaces (OpInterfaces, DialectInterfaces, TypeInterfaces, and AttributeInterfaces) are key components of MLIR extensibility. They are composed of a convenient user API through ODS (TableGen) as well as C++ wrappers. However, there are many layers of indirection underlying their implementation, which are quite difficult to grasp. It is a common complaint that it is impossible to debug or trace the code and understand how everything fits together.

Temporal Profiling and Orderfile Optimization for Mobile Apps

Ellis Hoag [Slides] [Video]
Traditional PGO can improve CPU-bound applications, but it doesn't work well for some mobile applications, which are more concerned with startup time and binary size. We recently extended LLVM's IRPGO framework to support Temporal Profiling to measure an app's startup behavior. We've also created a new algorithm to generate orderfiles, called Balanced Partitioning, which uses temporal profiles to reduce .text section page faults during startup and can even reduce compressed binary size. Finally, we have a tool to measure an iOS app's page faults on a device to showcase our results. This talk will be useful to anyone interested in understanding how IRPGO can order functions to improve startup performance and compressed size.
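To make the link between function order and startup page faults concrete, here is a minimal Python sketch. It is not the Balanced Partitioning algorithm from the talk; it only counts how many distinct pages a startup trace touches under a given layout. The function names, sizes, and the 4096-byte page size are hypothetical values chosen for illustration.

```python
def pages_touched(order, sizes, trace, page_size=4096):
    """Count distinct pages touched by the functions in `trace` when
    functions are laid out contiguously in the given `order`."""
    # Assign each function a [start, end) byte range in layout order.
    offsets = {}
    cursor = 0
    for name in order:
        offsets[name] = (cursor, cursor + sizes[name])
        cursor += sizes[name]

    touched = set()
    for name in trace:
        start, end = offsets[name]
        # A function may straddle several pages.
        touched.update(range(start // page_size, (end - 1) // page_size + 1))
    return len(touched)


# Hypothetical functions: two run at startup, two do not.
sizes = {"init": 3000, "cold_a": 4096, "cold_b": 4096, "setup": 1000}
startup_trace = ["init", "setup"]

interleaved = ["init", "cold_a", "cold_b", "setup"]
startup_first = ["init", "setup", "cold_a", "cold_b"]

print(pages_touched(interleaved, sizes, startup_trace))    # startup code spread out
print(pages_touched(startup_first, sizes, startup_trace))  # startup code grouped
```

Grouping the functions that run at startup lets them share a page, so fewer pages need to be faulted in; an orderfile expresses such a layout to the linker.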

Enable Hardware PGO for both Windows and Linux

Wei Xiao [Slides] [Video]
In this talk, we will discuss how to enable hardware PGO by extending sampling-based PGO with enriched profiles. We will examine some real cases to demonstrate that hardware PGO can expose more optimization opportunities than instrumentation-based PGO and thus provide better performance. Moreover, we will discuss how to enable hardware PGO on Windows based on the latest Intel VTune SEP.

Swift/C++ Interoperability

Egor Zhdan [Slides] [Video]
Swift/C++ interoperability enables incrementally incorporating Swift - a memory-safe language - into existing C++ codebases and has been used to gradually adopt Swift in large C++ projects, including the Swift compiler itself.

Leveraging LLVM Optimizations to Speed up Constraint Solving

Benjamin Mikek [Slides] [Video]
SLOT is a new tool which uses existing LLVM optimization passes to speed up SMT constraint solvers like Z3. Our strategy is to translate SMT constraints into LLVM IR, apply the optimizer, and then translate back. We find that SLOT speeds up average solving times by up to 2x for floating-point and bitvector constraints, and increases the number of constraints solved at fixed timeouts by up to 80%.

Structured Code Generation From the Ground Up

Alex Zinenko [Slides] [Video]
Native high-level code generation support in MLIR is largely based on structured code generation. This talk demystifies structured code generation in MLIR by introducing relevant concepts bottom-up from individual arithmetic operations on scalars, to SIMD operations on vectors, to manipulations on multi-dimensional tensors. Examples and illustrations show that this approach boils down to concepts present in modern hardware, though with slightly different terminology.

Contextual Instrumented-Based Profiling for Datacenter Applications

Mircea Trofin [Slides] [Video]
We present an Instrumentation-Based Profile Guided Optimization (PGO) technique that produces contextual profiles. This technique proves competitive with tip-of-tree instrumented PGO in binary size, runtime, memory overhead, and profile size when applied to a real production binary. We conclude with challenges and approaches to incorporating contextual profiles into LLVM.

C++ Modules: Build 'Em All with CMake and Clang

Alexy Pellegrini [Slides] [Video]
CMake now supports building C++ modules with Clang. This talk will cover what C++ modules are, how to build and integrate them in your projects, and the main challenges and limitations encountered.

Mojo debugging: extending MLIR and LLDB

Walter Erquinigo, Billy Zhu [Slides] [Video]
Modular has made strides in bringing debugging support for Mojo in LLDB. This talk covers the challenges faced in extending MLIR and LLDB for proper language debugging using DWARF, as well as Modular's open-source contributions. We will explore the approach to creating a user-centric debugging experience, focusing on VS Code.

Faster Compilation with GlobalISel: Skipping LLVM-IR

Tobias Stadler [Slides] [Video]
In a GlobalISel-based backend, LLVM-IR is translated to generic Machine IR (gMIR), which is then selected into target instructions. This talk covers emitting gMIR directly, skipping the LLVM-IR generation and improving compile-times by ~20%. The presentation will explore working with gMIR, how common IR constructs are lowered for GlobalISel, and discuss LLVM's instruction selectors' performance.

Experiences building a JVM using LLVM ORC JIT

Markus Böck [Slides] [Video]
JLLVM is a Java virtual machine built with LLVM, featuring a multi-tier system with an interpreter, JIT compiler, relocating garbage collector, and On-Stack replacement. This talk covers the system architecture, ORC JIT, JITLink, and using LLVM for implementing features like garbage collection and deoptimization with On-Stack replacement.

Teaching MLIR concepts to undergraduate students

Mathieu Fehr, Sasha Lopoukhine [Slides] [Video]
We present a compiler from a simple programming language to RISC-V, implemented entirely in MLIR. This course is used to teach undergraduate students modern compilation concepts and tools at the University of Edinburgh. It guides students through the whole compilation pipeline, from parsing to assembly generation, with all intermediate representations in MLIR IR.

Simplifying, Consolidating & Documenting LLDB's Scripting Functionalities

Mohamed Ismail Bennani [Slides] [Video]
This presentation explores current challenges in LLDB's scripting capabilities, emphasizing opportunities for improvement such as enhanced discoverability, updated documentation, and minimized maintenance costs. It delves into advancements in the LLDB Python module, as well as in LLDB Scripted Interface Dispatch method, ensuring a seamless conversion from private types to their scripting counterpart.

Incremental Symbolic Execution for the Clang Static Analyzer

Balázs Benics [Slides] [Video]
This talk presents a technique to speed up subsequent Clang Static Analyzer (CSA) runs on mostly unchanged code. CSA takes more time to complete than simply compiling the source code, which poses challenges for quick developer feedback. We propose a method to reuse bulk analysis for incremental and localized changes, only re-analyzing changed parts that matter.

Accurate Coverage Metrics for Compiler-Generated Debugging Information

J. Ryan Stinnett [Slides] [Video]
This talk proposes new metrics for measuring coverage of local variables in debugging information produced by compilers, aimed at improving the quality of debugging information for optimized programs. These metrics could encourage language implementations to generate better debugging data.

Optimizing RISC-V code size: Zcmt and Zcmi extensions

Gábor Márton [Video]
This presentation explores how linker relaxations optimize executable binaries on the RISC-V architecture, focusing on reducing code size and enhancing efficiency. We delve into the Zcmt and Zcmi extensions and how they complement broader optimization strategies such as LTO and post-link optimizations.

Computing Bounds of SSA Values in MLIR

Matthias Springer [Slides] [Video]
This talk presents the MLIR `ValueBoundsConstraintSet` infrastructure, which computes lower/upper bounds of SSA values or dynamic dimensions in terms of other SSA values. We showcase its use for vectorizing tensor IR and hoisting memory allocations from loops.

MLIR Vector Distribution

Kunwar Grover, Harsh Menon [Video]
We present a vector distribution framework for MLIR, which lowers computation over n-D vector types to target hardware like tensor cores or virtual ISAs. This talk covers experiences from the IREE compiler and discusses moving parts of the work upstream into MLIR.

Lifting CFGs to structured control flow in MLIR

Markus Böck [Slides] [Video]
This talk explores how MLIR models higher-level control flow operations by lifting control flow graphs (CFGs) to structured control flow. We detail recent upstream implementations and their use cases, covering input constraints and guarantees offered by the algorithm.

MLIR Linalg Op Fusion - Theory & Practice

Javed Absar [Slides] [Video]
This talk covers essential concepts of the Linalg dialect in MLIR and focuses on Linalg Op Fusion, providing insights into Linalg ops, transformations, and how fusion can optimize performance.

Efficient Data-Flow Analysis on Region-Based Control Flow in MLIR

Weiwei Chen [Slides] [Video]
This talk presents an efficient Sparse Conditional Constant Propagation (SCCP) algorithm using a structured region-based control flow model for MLIR. This model is applicable for various data-flow analyses, making it easy to debug and efficient.

LLVM-IR-Dataset-Utils - Scalable Tooling for IR Datasets

Aiden Grossman, Ludger Paehler [Slides] [Video]
This talk introduces LLVM-IR-Dataset-Utils, a tool that builds LLVM IR-based datasets for developing machine-learned heuristics and validating optimization strategies. We explore its applications for heuristics validation, correctness testing, and compile-time performance tracking.

Panels

Carbon: An experiment in different tradeoffs

Chandler Carruth, Jon Ross-Perkins, Richard Smith [Video]
This panel is an opportunity to ask the team working on Carbon about any and all of the tradeoffs and experiments that they're undertaking, how the project and experiment are progressing, and more. A group of active members of the Carbon project will share what we've learned so far, including both things we're excited about and would recommend LLVM and other projects to look at, as well as things that haven't gone so well. We'll also be able to talk about what we have left to do, how we plan to approach it, and places where we likely need help.

Student Technical Talks

Better Performance Models for MLGO Training

Viraj Shah [Slides] [Video]
This talk presents the development of a performance model capable of accurately modeling longest latency cache misses and including the resulting overhead in throughput and reward signal calculation. The work also explores different ways to balance model accuracy and feasibility in training and usage.

Transform-dialect schedules: writing MLIR-lowering pipelines in MLIR

Rolf Morel [Slides] [Video]
This talk introduces the Transform dialect, which exposes MLIR transformations as ops. It shows how transform operations can be composed into reusable schedules and entire MLIR-lowering pipelines can be declaratively specified using this feature.

How expensive is it? Big data for ML cost modeling

Aiden Grossman [Slides] [Video]
This presentation describes tooling and processes to create accurate learned cost models by using a large dataset, ComPile, and benchmarking infrastructure like llvm-exegesis. The approach enables training on a more representative set of basic blocks for improved model accuracy.

Sign Extension Optimizations inside LLVM

Panagiotis Karouzakis [Slides] [Video]
This talk explores optimizing sign extensions in LLVM IR, particularly for programs that use 32-bit values on 64-bit architectures. It discusses how sign extensions can be eliminated by analyzing which upper bits are needed, and how a dynamic programming optimization can be carried over from abstract syntax trees to LLVM IR.
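The "which upper bits are needed" idea can be illustrated with a small Python sketch. This is a generic demanded-bits example, not the algorithm presented in the talk; the helper names are hypothetical. When only the low 32 bits of a 64-bit addition are used, sign-extending the operands first has no effect on the result.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def sext32_to_64(x):
    """Sign-extend a 32-bit value (given as an unsigned int) to 64 bits."""
    x &= MASK32
    return ((x | ~MASK32) & MASK64) if x & (1 << 31) else x


def add_trunc_with_sext(a, b):
    # Extend both operands, add in 64 bits, keep only the low 32 bits.
    return ((sext32_to_64(a) + sext32_to_64(b)) & MASK64) & MASK32


def add_trunc_without_sext(a, b):
    # The upper 32 bits are never demanded, so the extensions are dropped.
    return (a + b) & MASK32


samples = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
same = all(
    add_trunc_with_sext(a, b) == add_trunc_without_sext(a, b)
    for a in samples
    for b in samples
)
print(same)
```

The extension only becomes necessary when a later use reads the upper bits, for example a 64-bit comparison or address computation.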

High Performance FFT Code Generation through MLIR Linalg Dialect and Micro-kernel

Yifei He [Video]
This talk covers a compilation framework that can automatically generate high-performance Fast Fourier Transform (FFT) code using MLIR Linalg Dialect and micro-kernels. FFT libraries are a critical component in high-performance computing (HPC) software.

Quick Talks

Implementing MIR Pattern Matching & Rewriting for GlobalISel Combiners

Pierre van Houtryve [Slides] [Video]
GlobalISel combiners relied on ad-hoc C++ code for combiner rules despite using TableGen. Pierre worked on adding MIR patterns with PatFrag-like systems and type inference to the GlobalISel combiners, enabling combiner rules to be written directly in TableGen.

Enhancing clang-linker-wrapper to support SYCL/DPC++

Alexey Sachkov [Slides] [Video]
Alexey Sachkov discusses changes Intel made to the clang-linker-wrapper to support SYCL device code linking and wrapping. The talk highlights device code handling, metadata propagation, and other features introduced to the tool.

Parallelizing applications with indirect memory writes in MLIR

Pablo Antonio Martinez, Hugo Trachino [Slides] [Video]
This work introduces a method to automatically parallelize loops with indirect memory writes in MLIR. The approach shows up to 4.9x speedup across benchmarks and addresses the challenge of data races in AI and HPC applications.
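Why indirect writes make parallelization hard can be shown with a short Python sketch. It is not the method from the talk: it only demonstrates that a loop of the form `out[idx[i]] = vals[i]` is order-independent when the index array has no repeated entries, and order-dependent otherwise. All names are hypothetical.

```python
def scatter_store(out, idx, vals, order):
    """Run out[idx[i]] = vals[i] for i in the given iteration order."""
    out = list(out)
    for i in order:
        out[idx[i]] = vals[i]
    return out


def writes_are_disjoint(idx):
    """True when no two iterations write the same element, so there is
    no write-write conflict between iterations."""
    return len(set(idx)) == len(idx)


zeros = [0, 0, 0]
vals = [10, 20, 30]

unique_idx = [2, 0, 1]
forward = scatter_store(zeros, unique_idx, vals, [0, 1, 2])
backward = scatter_store(zeros, unique_idx, vals, [2, 1, 0])
print(writes_are_disjoint(unique_idx), forward == backward)

dup_idx = [0, 0, 1]
forward_dup = scatter_store(zeros, dup_idx, vals, [0, 1, 2])
backward_dup = scatter_store(zeros, dup_idx, vals, [2, 1, 0])
print(writes_are_disjoint(dup_idx), forward_dup == backward_dup)
```

Because the index values are usually known only at run time, a compiler cannot in general prove disjointness statically, which is what makes this class of loops challenging.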

Arcilator for ages five and up: flexible self-contained hardware simulation made easy

Théo Degioanni [Slides] [Video]
Théo Degioanni introduces Arcilator, a simulator for hardware modeled in CIRCT dialects. This talk explains the new dialect-based interface for Arcilator, eliminating the need for heavy C++ wrappers, and showcases its use cases.

3 years of experience with the LLVM security group -- successes and remaining challenges

Kristof Beyls [Slides] [Video]
Kristof Beyls presents a summary of the LLVM security group's achievements over the past three years, discussing successes and remaining challenges, such as threat modeling and supply chain attacks, while also proposing areas of improvement.

LLDB: What's in a Register?

David Spickett [Slides] [Video]
David Spickett introduces a new feature in LLDB 18 that leverages Clang's Abstract Syntax Tree to disassemble register contents and help developers avoid manually interpreting register values, simplifying debugging tasks.

Practical fuzzing for C/C++ compilers

Oliver Stannard [Slides] [Video]
Oliver Stannard shares an overview of fuzzing techniques used to test Clang and GCC compilers. He covers open-source fuzzers like csmith and custom code generators, along with techniques to generate useful bug reports from fuzzer failures.

Repurposing LLVM analyses in MLIR: Also there and back again across the tower of IRs

Henrich Lauko [Slides] [Video]
Henrich Lauko explains how legacy LLVM analyses can be reused in MLIR using the concept of a "tower of IRs." This allows analysis outcomes from LLVM IR to be seamlessly applied to MLIR dialects, streamlining the integration process.

Life with Opaque Pointers from a Frontend Perspective

Sebastian Neubauer [Slides] [Video]
Sebastian Neubauer discusses challenges frontends face due to the opaque pointer transition, sharing experience moving SPIR-V and DXIL frontends to opaque pointers. The talk also highlights solution patterns to ease the transition.

Debug information for macros

Adrian Prantl [Slides] [Video]
Adrian Prantl explains how Swift and C preprocessor macros are represented in debug information and showcases how Swift’s macro expansions are handled in LLDB and other debuggers using LLVM DWARF extensions.

From C++ ranges to shorter template names: A C++ Debugging journey

Michael Buch [Slides] [Video]
Michael Buch outlines the recent improvements made to LLDB's expression evaluator to better support C++ debugging, including the addition of default template arguments and Clang's preferred_name attribute for enhanced variable view presentation.

Target-aware vectorization for irregular loops or instruction patterns

Wei Wei, Mindong Chen [Slides] [Video]
Wei Wei and Mindong Chen introduce a target-aware vectorization approach for irregular loops and instruction patterns, focusing on generating irregular vector instructions and discussing trade-offs in implementation strategies.

Mitigating lifetime issues for C++20 coroutines

Utkarsh Saxena [Slides] [Video]
Utkarsh Saxena explores lifetime issues in C++20 coroutines, particularly those related to reference parameters, and introduces the `[[clang::coro_lifetimebound]]` attribute to extend lifetime bound analysis and improve coroutine code safety.

Loop Iteration Space Splitting

Ashutosh Nema [Slides] [Video]
Ashutosh Nema presents loop iteration space splitting as a framework to enable various optimizations. Beyond the elimination of induction range checks, the talk discusses additional scenarios where loop splitting could facilitate further optimizations and performance improvements.
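As a rough illustration of the basic transformation, the Python sketch below splits a loop whose body tests the induction variable against a loop-invariant bound. It shows the general idea only and makes no claim about the framework described in the talk; the function names and the loop body are hypothetical.

```python
def original(a, n, k):
    out = []
    for i in range(n):
        if i < k:  # range check on the induction variable, every iteration
            out.append(a[i] * 2)
        else:
            out.append(a[i] + 1)
    return out


def split(a, n, k):
    out = []
    m = max(0, min(n, k))  # split point, clamped to the iteration space
    for i in range(m):     # i < k always holds in this loop
        out.append(a[i] * 2)
    for i in range(m, n):  # i >= k always holds in this loop
        out.append(a[i] + 1)
    return out


data = list(range(6))
print(all(original(data, 6, k) == split(data, 6, k) for k in (-1, 0, 3, 6, 10)))
```

After splitting, each loop has a branch-free body, which makes follow-on optimizations such as vectorization easier to apply.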

A Wishlist for Faster LLVM Back-ends

Alexis Engelke [Slides] [Video]
LLVM's back-end is often associated with high performance but long compilation times, even for unoptimized builds. This talk shows where compile-time within the LLVM back-end is spent and outlines some ideas for future improvements.

Lightning Talks

The Road to GitHub Actions: Migrating LLVM’s CI

Aiden Grossman [Slides] [Video]
This talk covers LLVM’s migration to GitHub Actions for precommit CI, highlighting challenges, future directions, and community involvement in improving the CI infrastructure.

Multilib Configuration Files

Peter Smith [Slides] [Video]
Peter Smith describes the configuration file-based multilib implementation in Clang and shares experience using it in an embedded toolchain.

Carbon's high-level semantic IR

Richard Smith [Slides] [Video]
Richard Smith introduces Carbon’s Semantics IR, discussing challenges and benefits of using a linear execution-based model for program representation during type-checking.

Enabling Loop Vectorization for Compressing Store Pattern

Tejas Joshi [Slides] [Video]
Tejas Joshi presents LLVM’s newly enabled vectorization for compressing store patterns, showing performance improvements across applications.
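For readers unfamiliar with the term, a compressing store is a loop that writes selected elements to consecutive output positions, so the store index advances only when a condition holds. The Python sketch below models the scalar pattern and a blocked, mask-based equivalent; it is an illustration of the pattern, not LLVM's implementation, and the names and block width are hypothetical.

```python
def compress_scalar(src, threshold):
    out = [0] * len(src)
    j = 0
    for x in src:
        if x > threshold:
            out[j] = x  # the store index advances only when the condition holds
            j += 1
    return out[:j]


def compress_blocked(src, threshold, width=4):
    """Process `width` elements at a time: compute a mask for the block,
    then pack the selected lanes contiguously, modelling a compress-store."""
    out = []
    for base in range(0, len(src), width):
        lanes = src[base:base + width]
        mask = [x > threshold for x in lanes]
        out.extend(x for x, m in zip(lanes, mask) if m)
    return out


values = [5, 1, 9, 3, 7, 2, 8, 0, 6]
print(compress_scalar(values, 4) == compress_blocked(values, 4))
```

The dependence of the output index on earlier iterations is what makes this pattern hard to vectorize without hardware support for packing selected lanes.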

Automatic Proxy App Generation through Input Capture and Generation

Johannes Doerfert, Ivan R. Ivanov [Slides] [Video]
Johannes Doerfert and Ivan R. Ivanov introduce a framework for capturing LLVM IR function inputs or generating synthetic input, facilitating relocatable and reproducible benchmark runs.

How we use MLIR to test ReRAM cells

Maximilian Bartel [Slides] [Video]
Maximilian Bartel explains how the linalg and transform dialects in MLIR, along with the Python execution engine, are used to test ReRAM cells for neuromorphic computing devices.

Automatic Retuning of Floating-Point Precision

Ivan R. Ivanov, William S. Moses [Video]
This talk presents a new pass in the Enzyme framework to automatically change floating-point precision in applications, optimizing for performance without sacrificing accuracy.

OpenSSF Scorecard - Do we need to improve our security practices?

Marius Brehler [Slides] [Video]
Marius Brehler discusses the OpenSSF Scorecard tool and its integration into the LLVM project, highlighting actions to improve the project’s security score and best practices.

Posters

Developing an LLVM Backend for VLIW RISC-V Vector Extension Architectures

Hao-Chun Chang [Poster]
An experimental VLIW RISC-V target with Vector extension is presented. The poster summarizes the LLVM compiler implementation process for the target, with Swing Modulo Scheduling enabled to enhance performance. LMUL issues are discussed, along with the approach to handle them, and experimental performance results are shown.

Hybrid Execution: Combining Ahead-of-Time and Just-in-Time Compilation of LLVM Bitcode

Christoph Pichler [Poster]
This poster presents an approach combining Ahead-of-Time (AOT) and Just-in-Time (JIT) compilation for LLVM bitcode using GraalVM. The goal is to improve warm-up performance by identifying suitable code for native execution, avoiding JIT overhead.

Dynamic Evolution of Instruction Set Simulators: A Practical Approach with 'ALPACA'

Nicholas Fry [Poster]
ALPACA is a CIRCT MLIR approach to generating Instruction Set Simulators (ISS) from RTL/HLS descriptions of accelerator architectures. This poster covers automatic generation of state update functions and how the ISS dynamically evolves with hardware implementations.

PoTATo: Points-to Analysis via Domain-Specific MLIR Dialect

Robert Konicar [Poster]
PoTATo is a unifying framework designed for points-to analysis through a domain-specific MLIR dialect. It simplifies memory effects representation and optimizes the analysis process using general MLIR tooling.

VAST: MLIR Compiler for C/C++

Henrich Lauko [Poster]
VAST is an MLIR-based compiler designed for program analysis of C/C++. This poster introduces its architecture and the stack of intermediate representations (IRs), focusing on its applications in static analysis, language transpilation, and decompilation.

IR Around the World: Statistical Analysis of a Massive Multi-Language Corpus of IR

Khoi Nguyen, Andrew Kallai [Poster]
This poster presents statistical analyses of the generated IR and optimization pipelines across multiple languages. The results aim to inform further investigations into pass pipeline optimization and compile-time performance.

Solving Phase Ordering with Off-Policy Deep Reinforcement Learning Algorithms

Oliver Chang [Poster]
This work addresses the phase ordering problem in LLVM compilers using off-policy deep reinforcement learning (DRL). The use of Double Deep Q-learning within the Compiler Gym framework is demonstrated to reduce IR instruction count while using a lightweight neural network and memory buffer.

Code of Conduct

The LLVM Foundation is dedicated to providing an inclusive and safe experience for everyone. We do not tolerate harassment of participants in any form. We expect everyone who registers for this event to have read and agreed to the LLVM Code of Conduct.

Contact

To contact the organizer, email events@llvm.org.