2022 LLVM Developers' Meeting

About

The LLVM Developers’ Meeting is a bi-annual gathering of the entire LLVM Project community. The conference is organized by the LLVM Foundation and many volunteers within the LLVM community. Developers and users of LLVM, Clang, and related subprojects will enjoy attending interesting talks, impromptu discussions, and networking with the many members of our community. Whether you are new to the LLVM project or a long-time member, there is something for each attendee.

Program

Keynote

Paths towards unifying LLVM and MLIR

Nicolai Hähnle [Slides] [Video]
Where do you see the LLVM project 10 years from now? Intermediate representation (IR) plays a central role in this question. LLVM IR can be represented on top of MLIR's data structures, but in practice it uses its own data structures. That creates a barrier in compilation pipelines and has other downsides. Is there hope for unification on a single set of data structures? How can we move towards such a goal? Let me show you a framework for thinking about these questions and some concrete ideas for how we can move in the right direction.

Implementing Language Support for ABI-Stable Software Evolution in Swift and LLVM

Doug Gregor [Slides] [Video]
Unlike its peer languages, Swift has made the deliberate decision to embrace a stable Application Binary Interface (ABI) along with native code compilation, such that separately-compiled software modules can evolve independently without breaking binary compatibility. Come learn about the impact that a stable ABI has on the design of a programming language and its implementation in LLVM.

Technical Talks

Implementing the Unimplementable: Bringing HLSL's Standard Library into Clang

Chris Bieneman [Slides] [Video]
The HLSL programming language has a rich library of built in types that model semantics which can't be written in HLSL. Clang's implementation of HLSL leverages existing extensions and abstractions with a few tweaks here and there to implement the unimplementable datatypes in valid Clang ASTs.

Heterogeneous Debug Metadata in LLVM

Scott Linder [Slides] [Video]
An alternative debug information representation for LLVM is proposed, which removes classes of redundant representations of semantically equivalent expressions and makes expression evaluation context-free. These changes open the possibility of general support for heterogeneous architectures, as well as more aggressive optimizations.

Clang, Clang: Who's there? WebAssembly!

Paulo Matos [Slides] [Video]
An introduction to the Reference Types and Garbage Collection proposals, along with what has already been upstreamed and how we propose to integrate the trickier bits into Clang/LLVM.

MC/DC: Enabling easy-to-use safety-critical code coverage analysis with LLVM

Alan Phipps [Slides] [Video]
Modified Condition/Decision Coverage (MC/DC) is a comprehensive code coverage criterion that is extremely useful in weeding out hidden bugs and guaranteeing robustness. MC/DC is very handy for average developers as well as those in the safety-critical embedded Industrial, Automotive, and Aviation markets where it is required. In this talk, I will show how we extended LLVM’s Source-based Code Coverage infrastructure to support MC/DC by tracking test vectors, which represent the sequential true/false evaluation of conditions through a boolean expression.
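To make the notion of a test vector concrete, here is a minimal Python sketch. It is not LLVM's implementation; the decision `(a and b) or c` and the helper names `evaluate`, `shows_independence`, and `mcdc_covered` are hypothetical and chosen only for illustration. Each run records the true/false outcome of every condition that was evaluated (and `None` for conditions skipped by short-circuiting), and a condition counts as covered when two runs differ only in that condition and produce different decision outcomes.

```python
from itertools import combinations


def evaluate(a, b, c):
    """Evaluate (a and b) or c with short-circuiting.

    Returns (test_vector, decision). The test vector holds True/False for each
    condition that was evaluated and None for each one that was skipped.
    """
    vector = [a, None, None]
    if a:
        vector[1] = b
        if b:
            return tuple(vector), True  # c is never evaluated
    vector[2] = c
    return tuple(vector), c


def shows_independence(first, second, index):
    """True if the two runs show that condition `index` alone flips the decision."""
    (v1, r1), (v2, r2) = first, second
    if r1 == r2 or v1[index] is None or v2[index] is None or v1[index] == v2[index]:
        return False
    # Every other condition must match, or be unevaluated in at least one run.
    return all(
        x is None or y is None or x == y
        for i, (x, y) in enumerate(zip(v1, v2))
        if i != index
    )


def mcdc_covered(executed):
    """For each of the three conditions, report whether some pair of runs covers it."""
    return [
        any(shows_independence(p, q, i) for p, q in combinations(executed, 2))
        for i in range(3)
    ]


# Four runs are enough to cover all three conditions of this decision.
tests = [
    evaluate(True, True, False),
    evaluate(True, False, False),
    evaluate(False, True, False),
    evaluate(True, False, True),
]
print(mcdc_covered(tests))
```

With only the first two runs, just condition `b` is covered, which is the kind of gap an MC/DC report is meant to expose.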

What does it take to run LLVM Buildbots?

David Spickett [Slides] [Video]
Many of us have broken a Buildbot at least once, but do you know what goes into running them? Why are there so many configurations and who are the people behind it all? Attend this talk to see behind the scenes of one of the largest providers of LLVM Buildbots.

llvm-gitbom: Building Software Artifact Dependency Graphs for Vulnerability Detection

Bharathi Seshadri, Ed Warnicke [Slides] [Video]
GitBOM is an open-source initiative to construct a verifiable Artifact Dependency Graph (ADG) and enable automatic, verifiable artifact resolution. In this talk, we will explain GitBOM and demonstrate a CVE detection use case using llvm-gitbom. Given a version of OpenSSL, we will show how we detect whether this version has any unfixed vulnerabilities and which vulnerabilities, if any, have been fixed in that version.

CuPBoP: CUDA for Parallelized and Broad-range Processors

Ruobing Han [Slides] [Video]
We propose and build a framework that executes CUDA programs on non-NVIDIA devices without relying on any other programming languages. In particular, compared with existing CUDA-on-CPU frameworks, our framework achieves the highest coverage and performance on X86, AArch64, and RISC-V.

Uniformity Analysis for Irreducible CFGs

Sameer Sahasrabuddhe [Slides] [Video]
We present a definition of thread convergence that is reasonable for targets that execute threads in groups (e.g., GPUs). This is accompanied by a definition of uniformity (i.e., when do different threads compute the same value), and a *uniformity analysis* that extends the existing divergence analysis to cover irreducible control-flow.

Using Content-Addressable Storage in Clang for Caching Computations and Eliminating Redundancy

Steven Wu, Ben Langmuir [Slides] [Video]
In this presentation, we introduce a Content-Addressable Storage (CAS) library for LLVM and use it to create a compilation caching system for Clang. We increase cache hits between related compiler invocations by caching fine-grained actions/requests that prune and canonicalize their inputs.
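The general idea of content-addressed caching can be sketched in a few lines of Python. This is a toy model, not the LLVM CAS library or its API: the classes `CAS` and `ActionCache`, the `canonicalize` helper, and the whitespace-stripping rule are all assumptions made for illustration. Objects are identified by the hash of their bytes, and an action's cache key is derived from the action name plus the IDs of its (canonicalized) inputs, so inputs that canonicalize to the same bytes share a cache entry.

```python
import hashlib


class CAS:
    """Toy in-memory content-addressable store: the ID of an object is its SHA-256."""

    def __init__(self):
        self._objects = {}

    def store(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._objects.setdefault(digest, data)
        return digest

    def load(self, digest: str) -> bytes:
        return self._objects[digest]


class ActionCache:
    """Maps (action name, input IDs) to the ID of a previously computed result."""

    def __init__(self, cas: CAS):
        self.cas = cas
        self._results = {}
        self.misses = 0

    def run(self, action: str, input_ids, compute) -> str:
        key = self.cas.store(repr((action, tuple(input_ids))).encode())
        if key not in self._results:
            self.misses += 1
            inputs = [self.cas.load(i) for i in input_ids]
            self._results[key] = self.cas.store(compute(inputs))
        return self._results[key]


def canonicalize(source: bytes) -> bytes:
    """Hypothetical canonicalization: drop blank lines and trailing whitespace."""
    return b"\n".join(line.rstrip() for line in source.splitlines() if line.strip())


cas = CAS()
cache = ActionCache(cas)
first = cas.store(canonicalize(b"int x;  \n\n"))
second = cas.store(canonicalize(b"int x;\n"))
result_a = cache.run("upper", [first], lambda ins: ins[0].upper())
result_b = cache.run("upper", [second], lambda ins: ins[0].upper())
print(first == second, cache.misses)
```

The two inputs differ only in whitespace, so after canonicalization they map to the same ID and the second request is served from the cache.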

Direct GPU Compilation and Execution for Host Applications with OpenMP Parallelism

Shilei Tian, Joseph Huber [Slides] [Video]
In this talk, we present a direct GPU compilation scheme that leverages the portable target offloading interface provided by LLVM/OpenMP. The prototype allows users to compile for, and test on, the GPU without explicitly handling kernel launches, data mapping, or host-device synchronization.

Linker Code Size Optimization for Native Mobile Applications

Gai Liu [Slides] [Video]
Modern mobile applications have grown rapidly in binary size, which restricts user growth and updates for existing users. The proposed optimizations are generic and could be incorporated into popular linkers as optimization passes.

Minotaur: A SIMD Oriented Superoptimizer

Zhengyang Liu [Slides] [Video]
Minotaur is a synthesis-based superoptimizer for the LLVM intermediate representation, focusing on optimizing LLVM’s portable vector operations and intrinsics specific to Intel AVX extensions. Speedups of up to 1.4x were observed in micro-benchmarks.

ML-based Hardware Cost Model for High-Level MLIR

Dibyendu Das [Slides] [Video]
We develop a machine learning-based cost model for high-level MLIR, predicting target variables such as CPU/GPU/xPU utilization and enabling better optimization during compilation.

VAST: MLIR for program analysis of C/C++

Henrich Lauko [Slides] [Video]
Program analysis has specific requirements for compiler toolchains that are usually unsatisfied. Ideally, an analysis tool would pick the best-fit representation that preserves interesting semantic features. Such a representation would know the precise relationships between low-level constructs in IR and the analyzed source code. LLVM IR is rarely the best fit representation for program analysis. In this talk, we will look at how we can improve the situation using an MLIR infrastructure called VAST. VAST is an MLIR library for multi-level C/C++ representation. With VAST, an analysis does not need to commit to a single best fit. Instead, an analysis can have simultaneous visibility into multiple progressions of the code, from very high-level down to very low-level.

MLIR for Functional Programming

Siddharth Bhat [Slides] [Video]
In this talk, we discuss the implementation, upstreaming, and community concerns of adopting LLVM and MLIR within the Lean4 proof assistant, and more broadly, discuss takeaways for MLIR to have strong support for functional programming languages. We walk through the process of creating a new MLIR-based backend for Lean4, a dependently typed programming language. We demonstrate our MLIR dialect which encodes core functional programming concepts within the SSA style. We also address worries around MLIR adoption in the Lean4 community and discuss how the MLIR community could help with adoption for functional programming languages.

SPIR-V Backend in LLVM: Upstream and Beyond

Michal Paszkowski, Alex Bezzubikov [Slides] [Video]
SPIR-V is a binary intermediate language commonly used for GPU computations and targeted by many projects (including OpenCL, OpenMP, and SYCL). This talk covers the process of upstreaming the SPIR-V GlobalISel-based backend, addressing issues stemming from the high-level design of SPIR-V, and discussing steps required to maintain it in-tree. We also touch on extensibility, support for other APIs/SPIR-V flavors, and the ongoing effort to unify methods of lowering builtins across GPU targets.

IRDL: A Dialect for dialects

Mathieu Fehr, Théo Degioanni [Slides] [Video]
IRDL is a dialect for representing IR definitions. It allows users to define dialects in a declarative style, enabling dynamic registration of dialects using dynamic dialects recently introduced in MLIR. This talk also presents two lower-level dialects, IRDL-SSA and IRDL-Eval, along with their lowerings. These enable optimizations on operation verifiers not currently handled by ODS, simplifying the generation of dialects through metaprogramming or external languages like Python.

Automated translation validation for an LLVM backend

Nader Boushehrinejad Moradi [Slides] [Video]
This talk introduces ARM-TV, an automated bug-finding tool for LLVM’s AArch64 backend. ARM-TV builds on Alive2, a bounded translation validator for LLVM’s optimization passes. The tool has uncovered 17 new miscompilation bugs in the SelectionDAG and GlobalISel backends, many of which have been fixed. The presentation outlines the current state of the prototype and plans for its enhancement.

llvm-dialects: bringing dialects to the LLVM IR substrate

Nicolai Hähnle [Slides] [Video]
llvm-dialects is an add-on to LLVM that allows defining dialects and transitioning gradually to their use within a compiler stack built on LLVM IR. This talk is aimed at those interested in leveraging MLIR-like features without rewriting existing compiler stacks.

YARPGen: A Compiler Fuzzer for Loop Optimizations and Data-Parallel Languages

Vsevolod Livinskii [Slides] [Video]
YARPGen is a generative compiler fuzzer designed to stress-test loop optimizations. Its tests include optimization prerequisites and data access patterns required to trigger optimizations. This talk highlights YARPGen's ability to find over 120 bugs in compilers like Clang, GCC, ISPC, and DPC++, as well as proprietary compilers.

RISC-V Sign Extension Optimizations

Craig Topper [Slides] [Video]
The 64-bit RISC-V target differs from other 64-bit targets as it lacks 32-bit sub-registers or i32 as a legal type. This talk explores the challenges in generating optimal assembly for C code with prevalent 32-bit integers, along with optimizations and custom passes added to improve RISC-V code generation.

Execution Domain Transition: Binary and LLVM IR can run in conjunction

Jaeyong Ko, Sangrok Lee [Slides] [Video]
This talk addresses challenges in analyzing multi-CPU architectural IoT malware through static and dynamic analysis. It showcases cross-architectural malware analysis using the LLVM interpreter by lifting code to LLVM IR and resolving slow execution issues with execution domain transition. The talk concludes with a demo.

Tutorials

Using LLVM's libc

Sivachandra Reddy, Michael Jones, Tue Ly [Slides] [Video]
LLVM's libc is a sanitizer friendly green field libc which will eventually serve as a full drop-in-replacement for the system libc. While it is not yet ready to be a drop-in-replacement, it has enough functionality that one can start using it in their projects and avail themselves of its benefits in production contexts. In this tutorial, we will talk about how we have used modern C++ to implement a sanitizer instrumentable libc which can be easily decomposed and custom tuned. We will also talk about how it is being used in production contexts at Google. There has been a lot of interest in the LLVM community in putting together an LLVM only toolchain. We will demonstrate how one can build and package the libc in order to put together such a toolchain and use it in their projects.

Sunho Kim [Slides] [Video]
JITLink is a new JIT linker in LLVM developed to eliminate limitations in LLVM's JIT implementation. With JITLink, special compilation flags or workarounds are not required to load code into the JIT, since most object file features, including the small code model and thread-local storage, are fully implemented. This tutorial will explain how to use JITLink by working on a Windows JIT application that just-in-time links to third-party static libraries. The tutorial will also dig into the internals of JITLink by working on a JITLink plugin that manages SEH exception tables.

Panels

Machine Learning Guided Optimizations (MLGO) in LLVM

Johannes Doerfert (moderator), Petr Hosek, Chris Cummins, Aiden Grossman, Mircea Trofin, Zoom: Yundi Qian, Ondrej Sykora, Dibyendu Das, Amir Ashouri, Mostafa Elhoushi, S. VenkataKeerthy [Slides] [Video]
The panel brings together: compiler engineers working on ML-guided optimizations in LLVM, product engineers applying such optimizations in production, and researchers exploring the area.

Panel discussion on Best practices with toolchain release and maintenance

Aditya Kumar, Petr Hosek, Jeremy Stenglein, Han Zhu [Video]
With the proliferation of vendors shipping custom LLVM toolchains, it would be great to bring toolchain distributors together to share their experience. We’ll focus the discussion on:
- Integration testing
- Keeping compatibility with the GNU toolchain
- Challenges of keeping up with upstream
- Changes in upstream llvm-project that will help

Static Analysis in Clang

Gabor Horvath, Bruno Cardoso Lopes, Artem Dergachev, Yitzhak Mandelbaum, Dmitri Gribenko [Video]
The Clang ecosystem has multiple static analysis tools. The compiler can produce easy to understand error and warning messages. The Clang Static Analyzer (CSA) is capable of finding bugs that span across multiple function calls using symbolic execution. Clang Tidy can help modernize large code bases using automatic code rewrites. While there are some out of tree Clang-based static analysis tools, CSA and Clang Tidy were the go-to solutions for the static analysis needs of the community. However, during the last year, a couple of RFCs surfaced on the mailing list to add a dataflow analysis framework to Clang and introduce a new MLIR-based IR. Come and join this panel discussion to learn how to get involved in the ongoing static analysis projects, what the new proposals mean for our loved and proven tools, and what the future holds for static analysis in Clang. You will have the opportunity to ask questions of some of the code owners of these tools and the authors of the new proposals.

High-level IRs for a C/C++ Optimizing Compiler

Bruno Lopes, Alex Zinenko, Ivan Baev, Johannes Doerfert, Chris Lattner, Mehdi Amini [Video]
Most C/C++ optimizing compilers employ multiple intermediate representations (IRs). LLVM IR has been the cornerstone of C/C++ LLVM-based compilers for many years. However, optimizations involving loop nests, data layout, or multidimensional arrays, for example, challenge the existing LLVM infrastructure. The panelists will discuss higher-level (HL) IRs for optimizing compilers, primarily from a C/C++ and optimization/analysis perspective. We will ask our expert panel to share their experience and insights on:
- What optimizations are easier to implement and maintain with HL IR?
- Must-have and good-to-have features in HL IR for optimizing compilers
- Agreement on MLIR as HL IR for C/C++ optimizing compilers?
- Other motivations for HL IR (in addition to run-time performance), e.g. security, debuggability?
- Promising HL IR initiatives for C/C++ compilers
Both experts and newcomers are welcome to attend. Send questions to the organizers prior to the conference to allow consideration.

Quick Talks

LLVM Education Initiative

Chris Bieneman, Mike Edwards, Kit Barton [Slides] [Video]
Interested in expanding the LLVM community through education? Interested in better documentation, tutorials, and examples? Interested in sharing your knowledge to help other engineers grow? Come learn about the proposal for a new LLVM Education working group!

Enabling AArch64 Instrumentation Support In BOLT

Elvina Yakubova [Slides] [Video]
BOLT is a post-link optimizer built on top of LLVM. It improves performance by optimizing an application's code layout based on an execution profile gathered by a sampling profiler, such as the Linux perf tool. When the advanced hardware counters needed for precise profiling are not available on a target platform, one may instead collect a profile by instrumenting the binary. In this talk, we will cover the changes essential for enabling instrumentation support in BOLT for a new target platform, using AArch64 as an example.

Approximating at Scale: How strto<float> in LLVM’s libc is faster

Michael Jones [Slides] [Video]
The string to float conversion functions are deceptively simple. You pass them a string of digits, and they return the floating point value closest to that string. The process of finding that value as quickly as possible is very complex, and in this talk I will describe how the implementation in LLVM’s libc works. The focus will be mainly on the three conversion algorithms used, specifically W. D. Clinger’s Fast Path, the Eisel-Lemire fast_float algorithm, and Nigel Tao’s Simple Decimal Conversion. I will give an overview of how they work and how they fit together to create a complete strto<float> implementation. Finally, I’ll demonstrate how this makes it faster than existing libc implementations, specifically about 15% faster than glibc.
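As a rough illustration of the first of these algorithms, the sketch below shows the commonly described form of Clinger's fast path for doubles, written in Python (whose `float` is an IEEE 754 double). It is not taken from LLVM's libc; the function name `clinger_fast_path` and its interface (a non-negative decimal mantissa plus a base-10 exponent) are assumptions for this example. The idea is that when the mantissa fits in 53 bits and the power of ten is exactly representable, a single correctly rounded multiply or divide yields the nearest double; otherwise a slower algorithm has to take over.

```python
# 10**0 through 10**22 are exactly representable as IEEE 754 doubles.
POWERS_OF_TEN = [float(10 ** i) for i in range(23)]


def clinger_fast_path(mantissa: int, exp10: int):
    """Return the double nearest to mantissa * 10**exp10, or None if the
    fast path does not apply and a fallback algorithm is required.

    Assumes a non-negative integer mantissa.
    """
    if mantissa >= 2 ** 53 or abs(exp10) > 22:
        return None
    value = float(mantissa)  # exact, because mantissa < 2**53
    if exp10 >= 0:
        return value * POWERS_OF_TEN[exp10]  # one correctly rounded multiply
    return value / POWERS_OF_TEN[-exp10]  # one correctly rounded divide


print(clinger_fast_path(12345, -2))
print(clinger_fast_path(1, 23))
```

Both operands are exact, so the single floating point operation rounds only once; that is why the result matches what a full-precision parser would produce within this restricted range.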

MIR support in llvm-reduce

Matthew Arsenault [Slides] [Video]
Bugpoint has long existed to assist in reducing LLVM IR test cases, but LLVM lacked an equivalent tool for reducing code generation passes. Recently, llvm-reduce gained support for reducing MIR. This talk will discuss the current status and future improvements, the difficulties MIR presents compared to the higher-level IR, and my experience using it to reduce register allocation failures in large test cases.

Interactive Crashlogs in LLDB

Med Ismail Bennani [Slides] [Video]
While we'd all prefer if programs never crashed, the logs captured from those crashes can help troubleshoot bugs and get your program up and running again. At Apple, diagnostic data gets captured into a crash report: a detailed textual representation of the program's state when it crashed. Thanks to the addition of interactive crashlogs, developers can now load crash reports into LLDB and interact with them like a regular LLDB session, using all the techniques they're already familiar with to debug the issue.

clang-extract-api: Clang support for API information generation in JSON

Zixu Wang [Slides] [Video]
This talk introduces clang-extract-api, a new tool that collects and serializes API information from header files, enabling downstream tooling, such as documentation generation, to inspect API symbols without having to understand the Clang AST.

Using modern CPU instructions to improve LLVM's libc math library

Tue Ly [Slides] [Video]
LLVM libc's math routines aim to be both performant and correctly rounded according to the IEEE 754 standard. Modern CPU instruction sets include many useful instructions for mathematical computations. Effectively utilizing these instructions could boost the performance of your math functions' implementations significantly. In this talk, we will discuss how two families of such instructions, fused-multiply-add (FMA) and floating point rounding, are used in LLVM's libc for x86-64 and ARMv8 architectures, allowing us to have comparable performance to glibc while achieving accuracy for all rounding modes.

Challenges Of Enabling Golang Binaries Optimization By BOLT

Vasily Leonenko, Vladislav Khmelevskyi [Slides] [Video]
Golang is a very specific language, which compiles to an architecture-specific binary, but also uses its own runtime library, which in turn uses version-specific data structures to support internal mechanisms like garbage collection, scheduling, reflection and others. BOLT is a post-link optimizer: it rearranges code and data locations in the output binary, so Golang-specific tables should also be updated according to the performed modifications. In this talk, we will cover the status of the current implementation of Golang support in BOLT, the achieved optimization effect, and the challenges of enabling Golang binaries optimization by BOLT.

Inlining for Size

Kyungwoo Lee [Slides] [Video]
Inlining for size is critical in mobile apps as app size continues to grow. While a link-time optimization (LTO) largely minimizes the app size at minimum size optimization (-Oz), a scalable link-time optimization (ThinLTO) misses many inline opportunities because each module inliner works independently without modeling the size cost globally. We first show how to use the ModuleInliner with LTO. Then, we describe how to improve inlining with ThinLTO by extending the bitcode summary, followed by a global inline analysis. We also explain how to overcome import restrictions, often appearing in Objective-C or Swift, by pre-merging bitcode modules. We reduced the code size by 2.8% for SocialApp, 4.0% for ChatApp, and 3.0% for Clang, compared to -Oz with ThinLTO.

Automatic indirect memory access instructions generation for pointer chasing patterns

Przemysław Ossowski [Slides] [Video]
This short talk shows by example how a feature newly introduced in real HW can be adopted in Clang and LLVM and thereby made easily available to the user. Indirect Memory Access Instructions (IMAI) can provide significant performance improvements, but their usability is limited by particular HW restrictions. This talk will present how we tried to reconcile HW limitations, the complexity of IMAI, and ease of use by handling a dedicated pragma in Clang and applying Complex Patterns in the DAG in the LLVM backend.

Todd Snider [Slides] [Video]
Embedded-application systems have limited memory, so user control over placement of functions and variables is important. The programmer uses a linker script to define a memory configuration and specify placement constraints on input sections that contain function and variable definitions. With LTO enabled, it is critical that the compiler incorporate link-time placement information into the LTO recompile (Edler von Koch - LLVM 2017). This talk discusses a compiler and linker implementation that roughly follows the ideas presented in Edler von Koch, highlighting differences in our implementation that offer significant advantages.

Expecting the expected: Honoring user branch hints for code placement optimizations

Stan Kvasov, Vince Del Vecchio [Slides] [Video]
LLVM's __builtin_expect, and a variant we recently added, __builtin_expect_with_probability, allow source code control over branch weights and can boost performance with or without PGO via hot/cold splitting. But in LLVM optimization, it's not always intuitive how to update branch weight metadata with control flow changes. We talk about recent issues with losing branch weights in SimplifyCFG and possible improvements to the infrastructure for maintaining branch weights.

CUDA-OMP — Or, Breaking the Vendor Lock

Joseph Huber, Johannes Doerfert [Slides] [Video]
In this talk we show that performance portability and interoperability are achievable goals even for existing (HPC) software. Through compiler and runtime augmentation, we can run off-the-shelf CUDA programs efficiently on AMD GPUs and further debug them on the host, all without modifications of the original source code. As a side-effect, a modern LLVM/Clang will provide a compilation environment in which CUDA and OpenMP offload are fully interoperable, allowing the use of both in the same project, even the same kernel, without intrinsic overheads.

Thoughts on GPUs as First-Class Citizens

Johannes Doerfert [Slides] [Video]
In this short talk we will ramble about some of the discrepancies between GPU and CPU targets as well as the accompanying infrastructure. While we briefly mention ongoing efforts to rectify some of the problems, we'll mainly focus on the areas where solutions are sparse and efforts are required.

Building an End-to-End Toolchain for Fully Homomorphic Encryption with MLIR

Alexander Viand [Slides] [Video]
Fully Homomorphic Encryption (FHE) allows a third party to perform arbitrary computations on encrypted data, learning neither the inputs nor the computation results. However, the complexity of developing an efficient FHE application currently limits deploying FHE in practice. In this talk, we will first present the underlying challenges of FHE development that motivate the development of tools and compilers. We then discuss how MLIR has been used by three different efforts, including one led by us, to significantly advance the state of the art in FHE tooling. While MLIR has brought great benefits to the FHE community, we also want to highlight some of the challenges experienced when introducing the framework to a new domain. Finally, we conclude by discussing how the ongoing efforts could be combined and unified before potentially being up-streamed.

Lightning Talks

LLVM Office Hours: addressing LLVM engagement and contribution barriers

Kristof Beyls [Slides] [Video]
As part of registering for the 2021 LLVM dev meeting, participants were asked to answer a few questions about how the LLVM community could increase engagement and contributions. Out of the 450 people replying, the top 3 issues mentioned were "sometimes people aren't receiving detailed enough feedback on their proposals"; "people are worried to come across as an idiot when asking a question on the mailing list/on record"; "People cannot find where to start; where to find documentation; etc." These were discussed in the community workshop at the 2021 LLVM dev meeting, and a summary of that discussion was presented by Adelina Chalmers as a keynote session, see 2021 LLVM Dev Mtg "Deconstructing the Myth: Only real coders contribute to LLVM!? - Takeaways." One of the solutions suggested to help address those top identified barriers from the majority of participants is introducing the concept of "office hours." We have taken steps since then to make "office hours" a reality. In this lightning talk, I will talk about what issues "office hours" is aiming to address; how both newbies and experienced contributors can get a lot of value out of them; and where we are in implementing this concept and how you can help make them as effective as possible.

Improved Fuzzing of Backend Code Generation in LLVM

Peter Rong [Slides] [Video]
Fuzzing has been an effective method to test software. However, even with libFuzzer, the LLVM backend is not sufficiently fuzzed nowadays. The difficulties are twofold. First, we lack a better way to monitor program behavior; edge coverage is not effective when the backend heavily relies on the target descriptor, where data flow is more important than control flow. Second, the mutation method is naive and ineffective. We designed a new tool to better fuzz the LLVM backend and found numerous missing features inside AMD. We also found many bugs in LLVM upstream, eight of which have been confirmed, and two of which are fixed.

Interactive Programming for LLVM TableGen

David Spickett [Slides] [Video]
Interactive programming with Jupyter is a game changer for learning. The ability to have your code and documentation in one place, always up-to-date and extendable. See how this is being applied to a core part of LLVM, TableGen, and why we should embrace the concept.

Efficient JIT-based remote execution

Anubhab Ghosh [Slides] [Video]
In this talk, we demonstrate a shared memory implementation and its performance improvements for most use cases of JITLink. We demonstrate the benefits of a separate executor process on top of the same underlying physical memory. We elaborate on how this work will be useful to larger projects such as clang-repl and Cling.

FFTc: An MLIR Dialect for Developing HPC Fast Fourier Transform Libraries

Yifei He [Slides] [Video]
Discrete Fourier Transform (DFT) libraries are one of the most critical software components for scientific computing. Inspired by FFTW, a widely used library for DFT HPC calculations, we apply compiler technologies for the development of HPC Fourier transform libraries. In this work, we introduce FFTc, a domain-specific language, based on Multi-Level Intermediate Representation (MLIR), for expressing Fourier Transform algorithms. FFTc is composed of: A domain-specific abstraction level (FFT MLIR dialect), a domain-specific compilation pipeline, and a domain-specific runtime (work in progress). We present the initial design, implementation, and preliminary results of FFTc.

Recovering from Errors in Clang-Repl and Code Undo

Purva Chaudhari, Jun Zhang [Slides] [Video]
In this talk, we outline the PTU-based error recovery capability implemented in Clang and available in Clang-Repl. We explain the challenges in error recovery of templated code. We demonstrate how to extend the error recovery facility to implement restoring the Clang infrastructure to a previous state. We demonstrate the `undo` command available in Clang-Repl and the changes required for its reliability.

10 commits towards GlobalISel for PowerPC

Kai Nacke, Amy Kwan, Nemanja Ivanovic [Slides] [Video]
We share our experiences with the first steps to implement GlobalISel for the PowerPC target.

Nonstandard reductions with SPRAY

Jan Hueckelheim [Slides] [Video]
We present a framework that allows non-standard floating point reductions in OpenMP, for example to ensure reproducibility, compute roundoff estimates, or exploit sparsity in array reductions.

Type Resugaring in Clang for Better Diagnostics and Beyond

Matheus Izvekov [Slides] [Video]
In this presentation, we talk about the effort to implement type resugaring in Clang. This is an economical way to solve, for the majority of cases, diagnostic issues related to the canonicalization of template arguments during instantiation. The infamous 'std::basic_string' appearing on the diagnostics when the user wrote 'std::string' is the classic example.

Swift Bindings for LLVM

Egor Zhdan [Slides] [Video]
Using LLVM APIs from a different language than C++ has often been necessary to develop compilers and program analysis tools. However, LLVM headers rely on many C++ features, and most languages do not provide interoperability with C++. As part of the ongoing Swift/C++ interoperability effort, we have been creating Swift bindings for LLVM APIs that feel convenient and natural in Swift, with the purpose of using the bindings to implement parts of the Swift compiler in Swift. In this talk, I will present our current status and what we were able to accomplish so far.

Min-sized Function Coverage with IRPGO

Ellis Hoag, Kyungwoo Lee [Slides] [Video]
IRPGO has a mode to collect function entry coverage, which can be used for dead code detection. When combined with Lightweight Instrumentation, the binary size and performance overhead should be small enough to be used in a production setting. Unfortunately, when building an instrumented binary with -Oz, the ".text" size overhead is much larger than what we’d expect from the injected instrumentation instructions alone. In fact, even if we block instrumentation for all functions we still get a 15% ".text" size overhead from extra passes added by IRPGO. This talk explores the flags we can use to create a function entry coverage instrumented binary with a ".text" size overhead of 4% or smaller.

High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs in Polygeist/MLIR

Ivan R. Ivanov [Slides] [Video]
We extend Polygeist/MLIR to succinctly represent, optimize, and transpile CPU and GPU parallel programs. Through the use of our new operations (e.g. a memory effects-based barrier) and transformations, we can successfully transpile GPU Rodinia and PyTorch benchmarks so that they run on the CPU faster than their existing CPU parallel versions.

Tools for checking and writing non-trivial DWARF programs

Chris Jackson [Slides] [Video]
DWARF expressions describe how to recover the location or value of a variable which has been optimized away. They are expressed in terms of postfix operations that operate on a stack machine. A DWARF program is encoded as a stream of operations, each consisting of an opcode followed by a variable number of literal operands. Some DWARF programs are difficult to interpret and check for correctness in their assembly-language format. Currently, checking a DWARF expression requires building an executable with debug info and running it in a debugger, such as LLDB. We propose and have begun a fun project to construct a small suite of tools to aid in the construction and checking of non-trivial DWARF programs.
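To make the stack-machine model in this abstract concrete, the following Python sketch evaluates a small subset of DWARF expression opcodes. It is not one of the tools from the talk. The opcode values are those assigned by the DWARF standard, while the helper names `read_uleb128` and `evaluate`, the restriction to eight operations, and the 64-bit wraparound are assumptions made for illustration.

```python
# Opcode values from the DWARF standard; only this small subset is handled.
DW_OP_constu = 0x10       # push an unsigned LEB128 operand
DW_OP_dup = 0x12          # duplicate the top of the stack
DW_OP_mul = 0x1E          # pop two entries, push their product
DW_OP_plus = 0x22         # pop two entries, push their sum
DW_OP_plus_uconst = 0x23  # add an unsigned LEB128 operand to the top entry
DW_OP_lit0 = 0x30         # DW_OP_lit0 .. DW_OP_lit31 push the literals 0..31
DW_OP_lit31 = 0x4F
DW_OP_stack_value = 0x9F  # the result is the value on top of the stack

MASK = (1 << 64) - 1      # assumption: 64-bit target, arithmetic wraps


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 number at data[pos]; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def evaluate(expr):
    """Run a DWARF expression (bytes) on an empty stack and return the top entry."""
    stack = []
    pos = 0
    while pos < len(expr):
        op = expr[pos]
        pos += 1
        if DW_OP_lit0 <= op <= DW_OP_lit31:
            stack.append(op - DW_OP_lit0)
        elif op == DW_OP_constu:
            value, pos = read_uleb128(expr, pos)
            stack.append(value & MASK)
        elif op == DW_OP_plus_uconst:
            value, pos = read_uleb128(expr, pos)
            stack.append((stack.pop() + value) & MASK)
        elif op == DW_OP_dup:
            stack.append(stack[-1])
        elif op == DW_OP_plus:
            rhs, lhs = stack.pop(), stack.pop()
            stack.append((lhs + rhs) & MASK)
        elif op == DW_OP_mul:
            rhs, lhs = stack.pop(), stack.pop()
            stack.append((lhs * rhs) & MASK)
        elif op == DW_OP_stack_value:
            break
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    return stack[-1]


if __name__ == "__main__":
    # DW_OP_lit3, DW_OP_dup, DW_OP_mul, DW_OP_constu 300, DW_OP_plus, DW_OP_stack_value
    program = bytes([0x33, 0x12, 0x1E, 0x10, 0xAC, 0x02, 0x22, 0x9F])
    print(evaluate(program))
```

Operations that read registers or memory, such as DW_OP_breg0 or DW_OP_deref, need a process context and are deliberately left out of this sketch.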

Analysis of RISC-V Vector Performance Using MCA Tools

Michael Maitland [Slides] [Video]
The llvm-mca tool performs static performance analysis on basic blocks and the llvm-mcad tool performs dynamic performance analysis on program traces. These tools allow us to gain insights on how sequences of instructions run on different subtargets. In this talk, I will discuss the shortcomings of these tools when they are tasked to report on RISC-V programs containing vector instructions, how we have extended these tools to generate more accurate reports for RISC-V vector programs, and how these improved reports can be used to make meaningful improvements to scheduler models and assist performance analysis.

Optimizing Clang with BOLT using CMake

Amir Ayupov [Slides] [Video]
Advanced build configuration with BOLT for faster Clang.

Exploring OpenMP target offloading for the GraphCore architecture

Jose M Monsalve Diaz [Slides] [Video]
GraphCore is a mature and well-documented architecture that features a MIMD execution model. Unlike the systems of other market players, GraphCore systems are available, their compiler infrastructure is based on LLVM, and they allow direct compilation to the device. Furthermore, the Poplar SDK is a C++ library that can be directly used with the current OpenMP Offloading Runtime (i.e. libomptarget). In this short presentation, we describe the strategy we are currently using to explore compilation of OpenMP Offloading support for the GraphCore architecture.

Student Technical Talks

Merging Similar Control-Flow Regions in LLVM for Performance and Code Size Benefits

Charitha Saumya [Slides] [Video]
In this talk, we will discuss Control-flow Melding (CFM) and its implementation in LLVM. CFM is a new compiler transformation that exploits both instruction and control-flow similarity to improve performance and reduce code size. CFM uses a hierarchical region and instruction alignment approach to merge common code fragments. CFM is implemented as an LLVM-IR transformation pass and our evaluation suggests its utility in multiple applications.

Alive-mutate: a fuzzer that cooperates with Alive2 to find LLVM bugs

Yuyou Fan [Slides] [Video]
We developed a new fuzzer, Alive-mutate, that randomly alters an LLVM module and then invokes the Alive2 translation validation tool to see if the mutated module is optimized correctly. Alive-mutate achieves high throughput by avoiding the creation of invalid IR and also by running in the same address space as Alive2, keeping OS-related overhead out of our fuzzing loop. We support 9 different kinds of mutation and have used Alive-mutate to find 23 LLVM bugs including 10 miscompilation bugs in the AArch64 backend and 5 crashes in the instruction combiner.

Enabling Transformers to Understand Low-Level Programs

William S. Moses, Zifan Guo [Slides] [Video]
This talk explores the application of Transformers to learning LLVM IR, which can open up new possibilities in optimization. Low-level programs such as LLVM IR tend to be more verbose than programs in high-level languages, because they specify program behavior precisely and provide more details about the microarchitecture, all of which makes them difficult for machine learning. We apply Transformer models to translate from C to both unoptimized (-O0) and optimized (-O1) LLVM IR and discuss various techniques that can boost model effectiveness. On the AnghaBench dataset, our model achieves a 49.57% verbatim match and BLEU score of 87.68 against Clang -O0 and 38.73% verbatim match and BLEU score of 77.03 against Clang -O1.

LAGrad: Leveraging the MLIR Ecosystem for Efficient Differentiable Programming

Mai Jacob Peng [Slides] [Video]
Automatic differentiation (AD) is a central algorithm in machine learning and optimization. This talk introduces LAGrad, a reverse-mode source-to-source AD system that differentiates tensor operations in the linalg, scf, and tensor dialects of MLIR. LAGrad leverages the value semantics of linalg-on-tensors in MLIR to simplify the analyses required to generate adjoint code that is efficient in terms of both run time and memory consumption. LAGrad also combines AD with MLIR’s type system to exploit structured sparsity patterns such as lower triangular tensors. We compare performance results to Enzyme, a state-of-the-art AD system, on Microsoft’s ADBench suite. Our results show speedups of up to 2x relative to Enzyme and, in some cases, 30x lower memory use.
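For readers unfamiliar with reverse-mode AD, the idea can be sketched on scalars in a few lines of Python. This is not LAGrad: it records a graph through operator overloading rather than transforming source code, it knows nothing about tensors or MLIR, and the names `Var` and `backward` are hypothetical. It only shows how adjoints flow from an output back to its inputs.

```python
class Var:
    """A scalar value that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        # Pairs of (input Var, partial derivative of this node w.r.t. that input).
        self.parents = parents
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node reachable from output."""
    order, seen = [], set()

    def visit(node):
        # Depth-first traversal that lists each node after all of its inputs.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    # Walk from the output towards the inputs, applying the chain rule.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


if __name__ == "__main__":
    x, y = Var(3.0), Var(4.0)
    z = x * y + x * x
    backward(z)
    print(z.value, x.grad, y.grad)
```

A single backward sweep yields the derivative with respect to every input, which is why the reverse mode suits functions with many inputs and few outputs.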

Posters

Coming Soon.