2024 LLVM Developers' Meeting

Conference Dates: October 22-24, 2024
Location: Santa Clara, California
Event Site: https://llvm.swoogo.com/2024devmtg

About

The LLVM Developers’ Meeting is a bi-annual gathering of the entire LLVM Project community. The conference is organized by the LLVM Foundation and many volunteers within the LLVM community. Developers and users of LLVM, Clang, and related subprojects will enjoy attending interesting talks, impromptu discussions, and networking with the many members of our community. Whether you are new to the LLVM project or a long-time member, there is something for each attendee.

To see the agenda and speakers, and to register, please visit the Event Site: https://llvm.swoogo.com/2024devmtg

What can you expect at an LLVM Developers’ Meeting?

Technical Talks
These 20-30 minute talks cover all topics, from core infrastructure to projects using LLVM’s infrastructure. Attendees will take away technical information that could be pertinent to their project or of general interest.

Tutorials
Tutorials are 50-60 minute sessions that dive deep into a technical topic. Expect in-depth examples and explanations.

Lightning Talks
These are fast 5 minute talks that give you a taste of a project or topic.
Attendees will hear a wide range of topics and probably leave wanting to learn more.

Quick Talks
Quick 10 minute talks that dive a bit deeper into a topic, but not as deep as a Technical Talk.

Student Technical Talks
Graduate or Undergraduate students present their work using LLVM.

Panels
Panel sessions are guided discussions about a specific topic. The panel consists of ~3 developers who discuss a topic through prepared questions from a moderator. The audience is also given the opportunity to ask questions of the panel.

What types of people attend?

Active developers of projects in the LLVM Umbrella (LLVM core, Clang, LLDB, libc++, compiler-rt, flang, lld, MLIR, etc.).
Anyone interested in using these as part of another project.
Students and Researchers.
Compiler, programming language, and runtime enthusiasts.
Those interested in using compiler and toolchain technology in novel and interesting ways.

The LLVM Developers’ Meeting strives to be the best conference to meet other LLVM developers and users.

For future announcements or questions: Please visit the LLVM Discourse forums. Most posts are in the Announcements or Community categories and tagged with usllvmdevmtg.

Program

Keynote

Rust ❤️ LLVM
Nikita Popov [Slides] [Video]
This talk is about how Rust uses LLVM, how the two projects interact, and the challenges and opportunities that arise.

State of Clang as a C and C++ Compiler
Aaron Ballman [Slides] [Video]
Come along with Clang's lead maintainer on a whirlwind tour of what new standard C and C++ language features have been added to Clang in the past 1-2 years, an overview of what standards-related work the community is actively implementing for Clang 20 and beyond, and discussion of what challenges the community is facing and could use help with.

Tutorials

Using MLIR from C and Python
Alex Zinenko [Slides] [Video]
MLIR, like the rest of LLVM, is primarily written in C++. However, the C++ API is known to be complex and unstable.
Moreover, both quick prototyping and deep integration with client frameworks call for using different languages to work with MLIR, most often Python for its simplicity and C for its ubiquity. This talk will present the MLIR C API and demonstrate how it is used to construct Python bindings. Attendees of this talk will learn how to expose custom dialects in both C and Python as well as how to leverage the C API to interact with MLIR from different languages.

LLVM Supply Chain Security: From developer’s desk to user’s device.
Tom Stellard [Slides] [Video]
Compilers are often the last line of defense in securing against dangerous exploits. Compiler features like "Stack Protector", "Safe Stack", and "Stack Clash Protection" can help protect code from common programmer mistakes and thwart malicious actors trying to take advantage of buggy code. However, even the most sophisticated protection can be rendered useless if the compiler itself can’t be delivered safely to its users. In this talk we take a look at LLVM’s approach to supply chain security: how we get code from a developer’s desk safely into the hands of users. We’ll look at our release process, our access policies, and our project infrastructure to see how we are approaching supply chain security and what we can do to make it better.

A Beginners’ Guide to SelectionDAG
Alex MacLean, Justin Fargnoli [Slides] [Video]
SelectionDAG is a crucial framework within LLVM for lowering LLVM IR into efficient machine code. However, little beginner-friendly documentation exists for it. This talk fills that void by covering the framework's architecture, target-specific optimizations, and integration with LLVM IR and Machine IR.
By the end of the session, participants will be well-prepared to make their first contributions to backends using the SelectionDAG framework.

Support a new Calling Convention in LLVM
Brandon Wu [Slides] [Video]
Vector processors play a big role in high-performance computing applications such as image processing, machine learning, and gaming, so handling vector registers efficiently is important for performance. The calling convention (CC) is one of the most significant aspects that can affect execution speed, and spills can be avoided if vector registers are controlled and assigned efficiently across procedure calls. In this talk, we will show the design and implementation of the RISC-V Vector Calling Convention for C/C++.

Technical Talks

Modern Embedded Development with LLVM
Petr Hosek [Slides] [Video]
Modern day embedded development is anything but modern. I believe that we can change the way embedded development is done by using a modern toolchain based on LLVM and adopting practices that have become commonplace for C/C++ user-space software development. In this talk, I will cover our experience from migrating several internal and external baremetal projects to the Clang toolchain, the issues we encountered and the opportunities we discovered.

What we learned from building Mojo’s optimization pipeline
Weiwei Chen [Slides] [Video]
Mojo is a programming language for heterogeneous compute built on top of MLIR and LLVM. As with many other programming languages and compiler systems built this way, the LLVM pipeline is often the bottleneck for compilation time. In this talk, we will share our strategies for wrestling with LLVM and leveraging MLIR passes in our pipeline design to significantly reduce compilation time without sacrificing generated code performance.
As a result, we cut time spent in LLVM from 80% to 20% of overall Mojo compilation time.

Floating Point in LLVM: the Good, the Bad, and the Absent
Joshua Cranmer [Slides] [Video]
This talk will cover the current state of floating-point semantics in LLVM: the things that work, the things that don't work, and the things that are just plain missing entirely.

Higher-Level Linker Scripts for Embedded Systems
Daniel Thornburgh [Slides] [Video]
Embedded systems intrinsically place idiosyncratic constraints on the memory addresses of code and data. These constraints are typically met by explicitly placing sections using linker scripts. This talk explores these constraints and introduces section classes, a new LLD feature that provides a higher-level way to control placement. This largely removes the toil of updating linker scripts in response to changes to code/data sizes.

Towards Useful Fast-Math
Andy Kaylor [Slides] [Video]
Fast-math semantic modes provide a way to selectively relax the default rules for numeric consistency used by the compiler. Relaxing these rules can improve performance, but it can also introduce accuracy errors. This talk will describe a technique to track down the cause of such errors and introduce a proposal for new LLVM optimizer infrastructure to make the debugging process easier.

LLVM libc math library - Current status and future directions
Tue Ly [Slides] [Video]
The LLVM libc math library aims to be a correctly rounded, performant, and complete C23 math library that supports various targets and use-cases. In this talk, we will go over some of the recent milestones and production usages of the LLVM libc math library, and our plans and directions for the near future.

Exploiting MLIR Abstractions for Hardware Verification
Bea Healy, Luisa Cicolini [Slides] [Video]
Hardware verification is a fundamental, yet often painful, step of hardware design.
This talk will discuss how MLIR can accelerate this process through the CIRCT infrastructure, an MLIR hardware compiler containing dialects that describe hardware at both high and low levels of abstraction. We will describe how to generate models for verification from such high-level abstractions - specifically from the finite state machine (FSM) dialect - to check properties at higher levels and optimize the overall verification procedure.

Advancing SPIR-V Backend Stability: Navigating GlobalISel Compromises
Michal Paszkowski, Vyacheslav Levytskyy [Slides] [Video]
This presentation outlines the recent advancements and ongoing challenges in the development of the SPIR-V backend, which has become a crucial component for supporting OpenCL, SYCL/DPC++, and soon Vulkan inside LLVM. The talk highlights the inherent complexities of generating SPIR-V, a higher-level representation compared to LLVM IR, through the conventional GlobalISel translation schema. Key issues such as the translation of opaque pointers, pointer and builtin type inference, and the integration of new SPIR-V extensions are discussed. The session will cover strategies for ensuring backward compatibility with older LLVM IR, maintaining [g]MIR correctness across passes, and verifying SPIR-V binaries through GitHub Actions using existing LIT tests and external tools. Plans to further integrate SPIR-V into LLVM with a new frontend are also presented.

New llvm-exegesis Support for RISC-V Vector Extension
Min Hsu [Slides] [Video]
llvm-exegesis has been instrumental in calibrating LLVM's scheduling models using hardware-collected metrics, such as instruction latency. In this talk, we'll unveil the first-ever llvm-exegesis support for RISC-V vector (RVV) instructions. We'll explore the challenges of scaling llvm-exegesis to accommodate the extensive range of RVV opcodes and configurations, and how we've significantly enhanced its efficiency for use in pre-silicon hardware development environments such as FPGAs.
Our work not only advances RISC-V but also benefits the broader LLVM community by improving the quality of scheduling models with llvm-exegesis.

Loop Vectorisation: a quantitative approach to identify/evaluate opportunities
Sjoerd Meijer [Slides] [Video]
In previous LLVM developer conferences, there have been several presentations and discussions on loop vectorisation, focusing on the progress of the VPlan infrastructure and vectorisation for specific back-ends. In this talk, we aim to take a different approach by identifying patterns and types of loops that the loop vectoriser cannot vectorise. Specifically, we want to: i) identify the deficiencies and missing features of the loop vectoriser, ii) group these deficiencies and find common root causes for missed vectorisation opportunities, and iii) develop a vectorisation plan to enhance code-generation quality based on these insights. Therefore, the contributions of this work and talk include: 1. A quantitative approach to find loop vectorisation opportunities and evaluate deficiencies. 2. A presentation of benchmark numbers for two loop-based benchmarks, TSVC-2 and RAJAPerf. 3. A first analysis of loop vectoriser deficiencies and opportunities. 4. Thoughts on measuring and evaluating compiler changes with the LLVM test-suite. Although we concentrated on AArch64 platforms for our results, most of our findings are broadly applicable.

Enhance SYCL offloading support to use the new offloading model
Ravi Narayanaswamy [Slides] [Video]
Driven by Intel, the SYCL compiler is an LLVM-based project that implements support for the SYCL language. In our downstream implementation, we (Intel) have made several changes to the clang-linker-wrapper tool to support SYCL device code linking and wrapping.
This talk includes discussion of key changes to the clang-linker-wrapper tool to enable the JIT/AOT compilation flow for SYCL offloading, the addition of a new sycl-post-link library, SYCL-specific options passed to clang-linker-wrapper, and the use of the existing mechanism for propagating SYCL-specific metadata from the compiler to the SYCL runtime library.

The State of Pattern-Based IR Rewriting in MLIR
Matthias Springer [Slides] [Video]
Pattern-based IR rewriting through the greedy pattern rewriter and the dialect conversion framework is widely used and one of the core mechanisms of MLIR. This talk will touch upon the parts of their APIs that have evolved/changed over the last years. The main part of the talk is a set of best practices that programmers can follow when designing pattern-based rewrites. Finally, this talk will briefly touch upon a new, simpler dialect conversion driver (without pattern rollback and automatic materializations) that is currently in development.

Vectorization in MLIR: Towards Scalable Vectors and Matrices (Part 2)
Andrzej Warzyński [Slides] [Video]
This presentation builds on the "Vectorization in MLIR" talk delivered at LLVM Dev Meeting 2023, delving deeper into the Linalg Vectorizer's capabilities within MLIR. The Linalg Vectorizer combines a simple tiling and basic-block vectorization approach with advanced concepts such as vector masking or support for scalable vectors. In this follow-up session, we will explore the implementation details of the Linalg Vectorizer, focusing on how it handles Linalg operations beyond linalg.matmul, which was covered in the previous talk. We'll also compare various vectorization pre-processing strategies (such as masking, peeling, and padding) and demonstrate how to effectively apply these strategies. Additionally, we will address the unique challenges posed by scalable vectors, including our approach to extending value-bounds analysis to accommodate these complexities.
Specifically, we'll provide an update on the ongoing support for the Scalable Matrix Extension (SME), a CPU extension that enables 2D scalable vectors.

Efficient Coroutine Implementation in MLIR
Steffi Stumpos [Slides] [Video]
Because of the growing need to offload compute to GPUs and other types of customized hardware, asynchronous programming has become a necessary feature of modern programming languages. In this talk we will share our experience in designing and implementing the asynchronous programming feature in Mojo, an MLIR-based language. We will reflect on our poor experience trying to use LLVM’s coroutines and walk through how we mitigated the same deficiencies when rewriting the passes in MLIR.

LLVM Premerge Testing: Current State and Next Steps
Lucile Rose Nihlen [Slides] [Video]
This talk presents a detailed technical overview of the current LLVM Premerge Infrastructure, the challenges it is facing and their proposed solutions, and a roadmap for the future development of the system.

Fine-grained compilation caching using llvm-cas
Shubham Rastogi [Slides] [Video]
Last year, we demonstrated how debug information can be efficiently represented in fine-grained caching. Since then, we have used this technology to build a drop-in replacement for ccache. It is built into clang, and therefore supports advanced features necessary for real-world use, such as caching of Clang modules. Debug information can make up as much as 90% of an object file; handling it efficiently is therefore paramount for the size of a build cache and its replay speed. This year, we further improved the size of debug information by redesigning the CAS schema and applying compression.
We also made improvements to the debug info decoder to maximize replay performance.

Release Engineering Strategies: How LLVM and GCC Navigate Development and Maintenance
Tom Stellard, David Edelsohn [Slides] [Video]
LLVM and GCC have evolved very different policies for release engineering (development cycle, scheduling, type of maintenance, duration of maintenance) that impact the manner in which the compilers are consumed and deployed by individual users, packagers (Linux distributions and product-versions of the compilers), and downstream projects. This session explores the differing approaches and whether either compiler should consider changes in its policies to better address the needs of its user community or to make the development and release engineering process more efficient. The session will describe the release engineering process of the two compiler ecosystems, followed by a conversation among the attendees about potential paths forward.

Two Compilers, One Language, No Specification
Chris Bieneman [Slides] [Video]
The High Level Shader Language (HLSL) is a popular cross-platform programming language for GPUs focused on realtime graphics applications. Throughout its more than 20-year life, HLSL has had two reference compilers, but no specification. As HLSL support in Clang progresses, the implementers are working through challenges caused by the lack of a specification and the language inconsistencies between the reference implementations.

Making upstream MLIR more friendly to programming languages: current upstream limitations, the ptr dialect, and the road ahead
Mehdi Amini, Fabian Mora Cordero [Slides] [Video]
In this talk, we discuss the existing limitations of upstream MLIR for representing generic programming languages and propose solutions to address some of these issues. As part of our proposed solutions, we will discuss the potential modularization of the LLVM dialect.
We also present the status of the recently upstreamed ptr dialect as a first step in the modularization of LLVM and the lessons learned from this ongoing upstreaming process. Finally, we discuss the road ahead for making MLIR more attractive to compiler developers for programming languages, such as finally standardizing the C/C++ calling convention in MLIR.

Hand-In-Hand: LLVM-libc and libc++ code sharing.
Michael Jones, Christopher Di Bella [Slides] [Video]
Have you ever wondered how libc++ implements functions that are similar to libc? For example, consider std::from_chars(float) and strtof. They both take a string and output a float, but from_chars also takes an end pointer and a format argument. They operate similarly behind the scenes, but their interfaces mean that from_chars can’t easily be implemented in terms of strtof. This talk will explain Project Hand-In-Hand, an LLVM-libc and libc++ collaboration to share the internal code for very similar functions that have incompatible interfaces.

Challenges in Using LLVM as a Quantum Intermediate Representation
Andrew Litteken [Slides] [Video]
The most efficient programming and compilation paradigms for quantum computing are still being explored as new applications and architectures are developed. This talk lays out several challenges of performing quantum compilation, and how the Intel Quantum Compiler meets the requirements of quantum computation by using the LLVM infrastructure. Additionally, we will discuss how the Intel Quantum Compiler makes the quantum intermediate representation more approachable to users who are not familiar with compiler infrastructure, including the Function Language Extension for Quantum (FLEQ) and Quantum Circuit Object framework.

(Offload) ASAN via Software Managed Virtual Memory
Johannes Doerfert [Slides] [Video]
We present LLVM/Offload Sanitizer, an address sanitizer (ASAN) designed for GPUs.
In contrast to the classic CPU ASAN, the offload sanitizer avoids high memory overhead, and even memory traffic, via software managed virtual memory. Allocations result in "virtual pointers" that are checked and translated prior to accesses but without buffer zones and, in the case of shared and local GPU memory, even without additional memory traffic. Our performance results show huge advantages over the classic CPU design ported to the GPU; however, we will detail various pitfalls that make efficient sanitization of GPU code especially hard. Finally, we will show some initial comparison of this design against classic ASAN on the CPU.

A C++ Toolchain for Your GPU
Joseph Huber [Slides] [Video]
This project seeks to treat the GPU as a standard hosted target by porting the LLVM C library, compiler runtime, and C++ runtime all to run on the GPU. We show how LLVM/Clang can be used to compile regular, freestanding C++ to target the GPU as well as show how to use this to create GPU libraries that can be stacked upon each other.

When unsafe code is slow - Automatic Differentiation in Rust
Manuel Drehwald [Slides] [Video]
Automatic Differentiation was accepted as an experimental Rust feature for HPC, ML, and Scientific Computing applications. We present Rust-Enzyme, an LLVM-based autodiff tool, and show that differentiating idiomatic Rust can lead to significantly better performance than differentiating similar C++ code.
We will discuss rustc, LLVM, JAX, Enzyme and C++ limitations to explain benchmark differences and prove that old performance assumptions for libraries and languages won't hold when compiler-based automatic differentiation is applied.

A new constant expression interpreter for Clang
Timm Baeder [Slides] [Video]
In this talk, we will look at the development status of the new bytecode interpreter for constant expressions in Clang, implementation challenges as well as future development plans.

Generic implementation strategies in Carbon and Clang
Richard Smith [Slides] [Video]
A dive into the generics implementation in the Carbon toolchain and the templates implementation in Clang. This talk will contrast the approaches taken and discuss some benefits of each direction.

JSIR - Adversarial JavaScript Detection With MLIR
Zhixun Tan [Slides] [Video]
Adversarial JavaScript is everywhere - web pages, mobile apps, browser extensions… you name it! To better support the detection of adversarial JavaScript, Google is using MLIR to develop JSIR, a JavaScript intermediate representation. JSIR needs to support dataflow analysis to extract suspicious signals from the code, source-to-source transformations to simplify obfuscated code, and even binary-to-source transformations for bytecode decompilation. In this talk, we describe several IR design choices we made due to the unique requirements of adversarial code analysis.

Clang Modules at Scale
Michael Spencer, Ian Anderson [Slides] [Video]
This presentation shares some of the things we've learned in the more than 10 years of deploying Clang modules in the SDKs.
We explain what a modular header actually is, what it takes to build thousands of modules, and the issues you may encounter when using Clang modules at scale.

Shardy: An MLIR-based Tensor Partitioning System for All Dialects
Bart Chrzaszcz, Zixuan Jiang [Slides] [Video]
Generative AI models are so large that the tensor programs representing them must be chunked (partitioned) into programs that run on thousands of hardware accelerators. Within Google DeepMind these models are being partitioned across TPU super clusters of over 4096 devices. In this presentation, we present a new MLIR tensor propagation system we have been developing and deploying to train these large AI models. We’ve defined our own dialect that expresses tensor shardings and compiler transformation rules as MLIR attributes. It is MLIR dialect agnostic, and has improved debugging capabilities and a more configurable propagation algorithm than past systems.

Swift Explicitly-Built Modules
Artem Chikin [Slides] [Video]
Swift relies on modules exclusively for units of code distribution and library interface. Swift’s interoperability with C, Objective-C, and C++ leads to a heavy use of the concept of modules in the C family of languages. Building on top of Clang & LLVM infrastructure for dependency scanning, Swift is undergoing a transition to an Explicitly-Built Modules compilation model where all dependencies are discovered upfront, pre-built, and specified as explicit inputs to compilation.
This talk will describe this approach, its benefits compared to the prior Implicit Module Loading model, lessons learned along the way, and the extensive use of Clang infrastructure to support Swift’s interoperability with modules in the C family of languages.

Simplifying GPU Programming with Parametric Tile-Level Tensors In Mojo
Ahmed Taei [Slides] [Video]
Today’s AI GPU workloads are dominated by operations such as matrix multiplication (matmul) and flash-attention, with state-of-the-art implementations designed to leverage the compute and memory hierarchy of modern GPUs at a tile-level granularity. Expressing these algorithms at this level, rather than using the low-level SIMT (Single Instruction, Multiple Threads) model, presents a significant challenge for kernel developers. In this talk, we will demonstrate how Mojo, a systems programming language built on MLIR, addresses this challenge through its powerful metaprogramming capabilities. Mojo enables the creation of simple yet powerful composable abstractions for parametric Tensor types, which can be tiled, distributed across the compute hierarchy, and vectorized. Additionally, the language provides GPU library authors with direct access to MLIR, making it easier for library authors to specialize high-level library operations for specific hardware targets, which facilitates the efficient development of state-of-the-art GPU kernels that outperform vendor libraries like cuBLAS.

lean-mlir: A Workbench for formally verifying Peephole Optimizations for MLIR
Alex Keizer, Siddharth Bhat [Slides] [Video]
We aim to combine the convenience of automation with the versatility of ITPs for verifying peephole rewrites across domain-specific IRs. Our tool (lean-mlir) built in the Lean proof assistant provides: (a) a user-friendly frontend, (b) scaffolding for defining and verifying peephole rewrites, and (c) proof automation for semi-automatically verifying common compiler IR patterns.
In this talk, we will showcase our work in bringing an Alive-style workflow for peephole optimizations over an LLVM-style IR in Lean. We will sketch out our future vision, with the goal of making formal verification a core part of the day-to-day compiler development workflow. We hope to engage the community in providing formal semantics for many of the more complex IRs in the MLIR ecosystem.

Improving optimized code line table quality
Orlando Cazalet-Hyams [Slides] [Video]
Debug line tables are a key part of many development processes, including SPGO, debugging, and crash dumps. When optimising code, LLVM struggles to maintain attribution of source lines to instructions, or to make debugger-stepping behaviours similar to those seen when debugging unoptimised code. We propose techniques to solve these issues, and present our evaluation of how they perform.

Implementing Linear / Non-destructible Types in Vale and Mojo
Evan Ovadia [Slides] [Video]
Linear types are the secret ingredient to ensuring "liveness": the guarantee that desired future operations will happen. With them, you can solve caching problems, guarantee the completion of futures, ensure messages to other threads are actually handled, and gain a lot of other unexpected benefits. We'll talk about what they are, how they're implemented in Vale, and how we can add them to the Mojo compiler to bring them into the mainstream.

Adding Pointer Authentication ABI support for your ELF platform
Anton Korobeynikov [Slides] [Video]
Recently the majority of the patches required to support the Pointer Authentication C/C++ ABI were ported from the downstream implementation for the arm64e platform and were submitted & integrated into LLVM mainline (and are included in the LLVM 19 release). We have complemented them with the required changes to enable pointer authentication on ELF platforms.
In this talk we will present the current status of Pointer Authentication ABI for ELF platforms, its components, their specifics and the different choices that a platform should make to deploy the said ABI. We will also discuss the required changes that platforms must undertake beyond the compiler toolchain and present some proof-of-concept implementations based on the Musl library.

Manifesto for faster build times
Alexandre Ganea, Francisco Cabrita [Video]
Build times and developer iteration are important to you? Wait no more! This talk will discuss a user's point of view and then sketch out a plan for reducing the compilation times of large C++ projects. We will discuss how the LLVM foundations could be incrementally changed to achieve this goal, and how collaboration could be shaped.

Mitigating use-after-free security vulnerabilities in C and C++ with language support for type-isolating allocators
Oliver Hunt [Slides] [Video]
Type-based segregation of heap allocations has long been acknowledged as an effective mechanism to mitigate memory safety vulnerabilities in real-world C and C++. A core problem in the general deployment of segregating allocators is the lack of language-level support, such that all adoption must be manual, and existing code must be manually updated to adopt new allocator APIs. In this talk we will be presenting our work to address this problem through our proposed typed memory operations extension for Clang, and our proposal for typed allocation support in the C++ language specification.

Panels

Is MLIR feature complete? Production ready?
Alex Zinenko, Stella Laurenzo, Renato Golin, Tobias Grosser, Mehdi Amini, Chris Lattner [Video]
Once the most fast-paced part of the LLVM source tree, the MLIR project is slowing down significantly both in the amount and complexity of changes committed.
The project had a few open meetings since the start of this year as opposed to more than a dozen the year before, 1183 commits tagged with "[mlir]" were made to the tree in the first seven months of 2024 as opposed to 2045 during the same period of 2023, etc. At the same time, an increasing amount of work is focused on downstream projects using MLIR, ranging from in-tree CIR and Flang, to incubated CIRCT and Polygeist, to out-of-tree OSS projects like IREE and XLA, to the many proprietary stacks. Are these the signs of MLIR reaching a certain maturity level? Or are these the warning signs of a worrying community disengagement? Should we declare MLIR feature-complete and redirect larger changes to client projects or, on the contrary, actively lift the common parts from downstreams? What is preventing individuals and teams from collaborating more actively in the open? This panel brings together leaders from academia, start-ups and established industry players to discuss their takes on these and other hot questions about MLIR strategy.

Student Technical Talks

Half-precision in LLVM libc
Nicolas Celik [Slides] [Video]
C23 defines new floating-point types, such as _Float16, which corresponds to the IEEE 754 standard's binary16 interchange format, also known as half-precision floating-point or FP16. C23 also defines variants of the C standard library's math functions for these new types. This talk will present the implementation of _Float16 math functions in LLVM libc, their performance, and the challenges encountered while implementing them.

DynamicAPInt: Infinite-Precision Arithmetic for LLVM
Arjun Pitchanathan [Slides] [Video]
We announce a new class, DynamicAPInt, that can perform infinite-precision integer arithmetic. Unlike APInt, the user does not have to specify a particular maximum size. We also provide a more friendly user interface with overloaded operators.
Finally, the class implements a small-value optimization making it significantly faster than APInt when operating on small values. In particular, we see a 2.8x speedup on an addition microbenchmark where the values always stay small. We describe the performance optimizations that we applied to achieve this level of performance.

FPOpt: Balancing Cost and Accuracy of Floating-Point Computations in LLVM IR
Siyuan Brant Qian [Slides] [Video]
This talk introduces FPOpt, an optimization pass integrated into the LLVM-based Enzyme automatic differentiation framework, which automatically discovers improvements for floating-point programs using Herbie and tries to maximize overall accuracy of programs subject to a customizable computation cost budget. To make optimization decisions that respect realistic program behaviors, FPOpt leverages numeric profile data generated by a specialized logging functionality in Enzyme.

A data-driven approach to debug info quality
Emil Pedersen [Slides] [Video]
Debugging optimized code is frustrating: often, variables are missing. But some of these variables could be salvaged. In this talk, we will present a new analysis that detects when variables are lost in the compiler. This has two advantages: it allows work to be focused on fixing the optimizer passes that lose the most debug variables, and, by running it on real-world code, it also makes it easy to find concrete test cases where variable debug info is lost. We will use the work done on passes in the Swift frontend as an example, where we were able to increase the number of variables available in LLDB using this approach.

GISel for Scalable Vectors: Expanding the Horizon
Jiahan Xie [Slides] [Video]
Discover the groundbreaking implementation of GISel for scalable vectors, targeting the RISC-V vector extension.
This talk delves into the challenges and solutions of supporting scalable vector ALU and load/store instructions, offering insights and best practices for LLVM developers working on GISel for other targets.

The syntax dialect: creating a parser generator with MLIR
Fabian Mora-Cordero [Slides] [Video]
This talk presents the syntax dialect, an MLIR dialect for formal language analysis and parser generation, and its associated tools. The syntax dialect is an MLIR dialect that can represent regular and context-free languages and parsing expression grammars (PEG). We found that using MLIR simplified the introduction of complex concepts like macros over formal languages, as we can reuse passes like function inlining to handle their intricacies. Finally, we discuss future work of creating lowerings to other MLIR dialects to be able to dynamically create and JIT lexers and parsers by using the MLIR execution engine.

Quick Talks

Instrumenting MLIR Based ML Compilers for GPU Performance Analysis and Optimization
Corbin Robeck [Slides] [Video]
Correlating GPU kernel performance bottleneck analysis information back to program source within modern machine learning frameworks that use MLIR and JIT style kernels remains a challenge, as it can often be difficult to attribute the performance issue to specific points within the compiler tool chain and the various lowering passes (Python/C++, GPU kernel source, multiple MLIR IRs, LLVM IR, and architecture specific ISA). In this talk we give an overview of a developed set of open source and extendible GPU kernel instrumentation passes to address this issue and how they can be integrated within popular MLIR based machine learning compilers.

PyDSL: A MLIR DSL for Python developers
Kai Ting Wang [Slides] [Video]
This talk introduces new improvements to PyDSL, a compiler research project that transforms a subset of Python down to MLIR, which was originally introduced in an MLIR ODM in December 2023.
While the existing MLIR infrastructure is essential to our optimization stack, it does not yet provide a language that can describe MLIR program behaviors while also benefiting end-developer productivity. As such, PyDSL aims to bridge this gap by providing a faithful Python-based syntax and programming style for writing MLIR programs. The presentation will review aspects of PyDSL and introduce new ways we manage typing, translate Python syntax into MLIR, and improve the modularity and usability of the language.

Embedding Domain-Specific Languages in C++ with Polygeist
Lorenzo Chelini [Slides] [Video]
Domain-specific languages (DSLs) and compilers allow high-level abstraction and optimal performance by directly mapping abstractions to hardware. DSLs are becoming more prevalent, spanning fields from linear algebra to quantum computing, yet they often remain isolated, complicating multi-domain application optimization and integration with C++ codebases. In this talk, we propose embedding DSLs and their optimizations into general-purpose code (C or C++) using Polygeist. Our approach leverages modern compiler technology and facilitates domain-specific compilation, bridging the gap between specialized and general-purpose programming.

Vector-DDG (Vector Data Dependence Graph): For Better Visualization and Verification of Vectorized LLVM-IR
Sumukh Bharadwaj, Raghesh Aloor [Slides] [Video]
We propose Vector-DDG (Vector Data Dependence Graph), a tool to visualize and verify the complicated data flow in vectorized LLVM-IR. The visualization helps to understand the vectorized IR better and to further improve the quality of the same.
The automatic verification helps improve developer productivity by catching vectorization errors early.

From Fallbacks to Performance: Towards Better GlobalISel Performance on AArch64 Advanced SIMD Platforms
Madhur Amilkanthwar [Slides] [Video]
In this talk, we will present our work on enhancing the Global Instruction Selection (GISel) framework for AArch64 Advanced SIMD platforms. We addressed its fallback to the traditional SelectionDAG due to incomplete support for certain instructions and patterns. We will present our experience with using GISel on TSVC-2, RajaPerf, and LLVM Test Suite benchmarks, which identify fallbacks across GISel due to a lack of support for various SVE instructions and AAPCS ABI. Our contributions include eliminating fallbacks in GISel, particularly for the TSVC-2 benchmark, by introducing patches across the phases of GISel. We also present our work on optimizations of GISel’s generated code, which has significantly closed the performance gap between GISel and SelectionDAG on Advanced SIMD-based AArch64 platforms, especially for the TSVC-2 benchmark. These advancements mark an important step forward in improving the GISel framework, bringing us one step closer to making it the default. However, we also acknowledge that further effort is required for full SVE support and tuning for other workloads.

Extending MLIR Dialects for Deep Learning Compilers
Charitha Saumya, Jianhui Li [Slides] [Video]
This talk discusses the design of XeTile, a dialect developed for expressing and compiling deep learning kernels. XeTile demonstrates that with a few critical extensions, MLIR dialects can be used as building blocks to support deep learning compiler development for high-performance code generation. With the "Tile" data type and a few operations, the XeTile dialect greatly simplifies the lowering of dense operations.
Any tile-based GEMM-like algorithm can easily be expressed in a few lines of code, including advanced optimizations like cooperative load/prefetch, K-slicing, and software pipelining.

Unlocking High Performance in Mojo through User-Defined Dialects
Mathieu Fehr, Jeff Niu [Slides] [Video]
Traditionally, a clear separation exists between language libraries and compiler intermediate representations (IRs): libraries are typically limited to API calls that the compiler cannot reason about, while IRs consist of instructions that only the compiler can analyze and transform. Embedded DSLs typically blur this line, as they often use the host language introspection mechanism, or macro system, to include their own compiler. In this talk, we will present how we merge the concept of libraries and embedded DSLs by providing in Mojo first-class support for extending its MLIR-based compiler.

Speeding up Intel Gaudi deep-learning accelerators using an MLIR-based compiler
Jayaram Bobba [Slides] [Video]
Middle-end optimizations play a critical role in generating high-performance code for deep learning accelerators. In this talk, we will present an MLIR-based fusing compiler that generates optimized LLVM IR from high-level graph IR, which is then compiled by an LLVM backend for execution on tensor processing cores in Intel Gaudi deep learning (DL) accelerators. This compiler has been in use for the past three generations of Gaudi products and provides around 54% average performance improvement at the model level.
The talk will cover the lowering pipeline, how we leverage upstream MLIR dialects and some key optimizations and learnings for compiling deep learning workloads to Gaudi.

Quidditch: An End-to-End Deep Learning Compiler for Occamy using IREE & xDSL
Markus Böck, Sasha Lopoukhine [Slides] [Video]
We present Quidditch, a neural network compiler and runtime that provides an end-to-end workflow from a high-level network description to high-performance code running on ETH Occamy, one of the first chiplet-based AI research hardware accelerators. Quidditch builds on IREE, an AI compiler and runtime focused on GPUs, and a micro-kernel compiler for RISC-V-based accelerators in xDSL.

Atomic Reduction Operations
Gonzalo Brito Gadeschi [Slides] [Video]
Atomic reductions are atomic read-modify-write operations that do not return a value, enabling them to leverage hardware support in architectures like Arm, X86, and GPUs like PTX. Despite the significant performance improvements they offer, these operations are not currently exposed in LLVM IR. This talk introduces atomic reduction operations, explores their performance benefits, explains why optimizing atomicrmw into atomic reductions is - in general - unsound, and discusses how to provide first-class exposure for them in LLVM IR.

LLVM Governance Update
Chris Bieneman [Slides] [Video]
Come hear the latest updates about the LLVM Governance Proposal presented at the 2023 US LLVM Developer Meeting. This talk will give a brief overview of the current state of the proposal and discuss the next steps as the proposal continues to move forward.

Why You Should Use Scudo
Chia-Hung Duan, Christopher Ferris [Slides] [Video]
This session will introduce Scudo, a modern memory allocator that provides additional security features. Scudo strikes a balance between allocation speed, memory footprint, and security.
We will show how Scudo can help find memory bugs on Android and explain how to build your own Scudo configuration to fit your project's requirements.

Building glibc with LLVM
Carlos Seo [Slides] [Video]
The GNU C Library (glibc) is a known missing link for any Linux distribution that aims to use clang as the default compiler. This talk walks through the required changes to make it buildable using the LLVM toolchain.

RISC-V Support into LLVM’s libc: Challenges and Solutions for 32-bit and 64-bit
Mikhail R. Gadelha [Slides] [Video]
This talk covers the integration of RISC-V support into LLVM's libc, focusing on the unique challenges posed by RISC-V's syscall interface, the 32-bit architecture complexities, and testing without hardware. Attendees will gain insights into the process of adding support for new architectures in LLVM's libc.

Benchmarking Clang on Windows on Arm: Building and Running SPEC 2017
[Slides] [Video]
In this session we go through the process of building and running the SPEC 2017 benchmark suite to evaluate the performance of Clang on the Windows on Arm platform. We aim to provide a preliminary overview of the current state of Clang performance on Windows, particularly on the Arm platform. We will discuss which benchmarks build and run successfully and identify those that fail, providing insights into the strengths and limitations of Clang on the quickly evolving Windows on Arm platform. We will present initial performance numbers, comparing Clang's results with MSVC, and highlighting key differences in their performance across various benchmarks.
Additionally, we will briefly touch on how tools like Windows Perf can improve our understanding of these results, setting the stage for future optimization efforts and deeper analysis.

Lightning Talks

Using llvm-libc in LLVM Embedded Toolchain for Arm
Peter Smith [Slides] [Video]
Arm have recently added support for LLVM's libc to the LLVM Embedded Toolchain for Arm as an overlay package. This presentation will cover:
* How to build the toolchain with llvm-libc libraries.
* How to use the llvm-libc libraries with the toolchain.
* What works with llvm-libc and what doesn't.
* A comparison of llvm-libc with the embedded toolchains' picolibc.
The LLVM Embedded Toolchain for Arm is one of the easiest ways to try out llvm-libc for embedded projects. We would like to encourage people to try out llvm-libc to gather feedback for its future development.

Hey, do you want a RISC-V debugger? - Enabling RISC-V support in LLDB
Ted Woodward [Slides] [Video]
"Hey, do you want a RISC-V debugger?" That question started my odyssey that led to a working upstream LLDB for RISC-V. This talk will discuss that journey.

MD5 Checksums in LLDB
Jonas Devlieghere [Slides] [Video]
Support for DWARF MD5 checksums in LLDB.

Experiments with two-phase expression evaluation for a better debugging experience
Ilya Kuklin [Slides] [Video]
LLDB can spend a substantial amount of time on evaluating expressions during debugging. This is an issue with debugging large real-world applications. We experimented with the idea of having a limited but fast way of evaluating expressions with the ability to fall back to the current LLDB. For this purpose, we revamped a project called `lldb-eval` and integrated it into LLDB.
Our experiments with this approach on large real-world applications showed that most expressions are simple enough and could be evaluated much faster, making the debugging experience noticeably smoother.

Flang Update
Steve Scalpone [Slides] [Video]
Flang is an LLVM subproject which is a ground-up implementation of a Fortran front end written in modern C++. Flang uses MLIR as an intermediate language and implements OpenMP for CPUs and GPUs. This lightning talk touches on current development efforts, testing coverage, feature status, and performance.

Posters

Fuzzlang: Generating Compilation Errors to Teach ML Code Fixes
Baodi Shan [Poster]
In the realm of code repair, the diversity and accuracy of error datasets are critical for enhancing model performance. Fuzzlang, a newly developed Clang Python wrapper, addresses this need by generating a wide range of compilation errors through modifications to compilation commands or source code. It systematically collects error messages, corresponding correct and erroneous code, and AST information to build a comprehensive dataset. Fuzzlang’s dataset offers significantly greater error diversity than existing resources like Deepfix and C-Pack-IPAs, as measured against the different error kinds in Clang’s diagnostic files. In a small study we applied Fuzzlang to the llvm-project and identified 417 unique compilation errors. We fine-tuned both the Llama3-8b model and the GPT-4o-mini model, and the code correction accuracy for the observed error categories improved from 37.22% to 93.97% for Llama3-8b and from 72.29% to 96.70% for GPT-4o-mini.

The XLG framework: an MLIR replacement for ASTs
Fabian Mora-Cordero
In this talk, we present the XLG framework, a novel intermediate representation capable of replacing ASTs with MLIR. As part of the talk, we will also examine how to perform semantic analysis, code generation, and constant evaluation on XLG.
Furthermore, we will demonstrate how these tasks can be performed in an extensible manner, allowing the introduction of new semantic rules or constructs as plugins. Finally, we present how to interoperate XLG with existing dialects and leverage existing MLIR passes to handle often tricky programming notions like meta-programming.

accfg: Eliminating Setup Overhead for Accelerator Dispatch
Anton Lydike, Josse Van Delm
Modern computing is moving toward heterogeneous architectures with general compute cores and specialized accelerators. However, these accelerators require increasing cycles for configuration, creating a new bottleneck that limits peak performance. Fortunately, modern compiler techniques can address this issue. We introduce a general optimization dialect designed to eliminate setup overhead and demonstrate significant speed-ups on three accelerator platforms.

MLIR and Pytorch: A Compilation Pipeline targeting Huawei's Ascend Backend
Amy Wang [Poster]
We present our work on compiling PyTorch code through MLIR to target Ascend AI Processors. The approach starts from PyTorch to Torch-MLIR followed by an MLIR Pipeline, converting down to a custom AscendC Dialect, where C-like AscendC code is produced with enhanced EmitC utilities. This method not only benefits Ascend users but also opens up more optimization opportunities from Ascend back to MLIR. We aim to enhance the MLIR ecosystem by sharing our experiences and we welcome any discussion about potential improvements to our pipeline, to better target AI processors.

Developing an HLSL intrinsic for the SPIR-V and DirectX backends
Farzon Lotfi
The tutorial will cover the basics of writing an HLSL intrinsic. From frontend to backend development to writing code gen, sema, and backend test cases.
Examples included will cover how to handle cases where an intrinsic maps directly to a DXIL or SPIRV op and cases where an intrinsic needs to be replaced with an instruction expansion.

New Headergen
Rose Zhang, Aaryan Shukla [Poster]
LLVM-libc’s headers just got a major upgrade! We ditched the old, complex Tablegen system for a sleek new YAML-based generator. This means easier cross-compiling, faster builds, and a smoother path to use LLVM-libc. Come and see how we transformed header creation and why it’s a game-changer.

xdsl-gui: A Playground for the Compiler Optimization Game
Dalia Shaaban [Poster]
Optimizing compilers built on MLIR use customizable pipelines of passes and transformations to implement various optimization strategies. While MLIR provides tools like mlir-opt for controlling compilation flows, the complexity of selecting and sequencing passes can be overwhelming due to the large number of available passes and the manual, time-intensive experimentation required. This talk introduces xdsl-gui, an interactive environment that enhances control and transparency during the compilation process. Users input source code or IR, select and apply passes, and display the updated IR. xdsl-gui also filters relevant passes and offers real-time feedback on pass selection, helping developers optimize strategies effectively.

Autostack: a novel approach to implementing shared stack for image size savings
Sundeep Kushwaha
We propose a new technique called Autostack to share stack memory across multiple software threads which results in significant image size savings. Additionally, Autostack can also be used to improve performance by transitioning the stack from slower memory to faster memory.

MLIR Interfaces for Generic High-Level Program Representations
Henrich Lauko
Discover how the VAST MLIR-based compiler for C/C++ extends MLIR's capabilities beyond low-level IRs to support high-level features like custom symbols and AST-like operations.
This poster unveils advanced symbol tables that enable shadowing, diverse symbol types, and customizable lookups. Learn about our MLIR interfaces that integrate seamlessly with the Clang ecosystem, allowing tools such as AST queries and the Clang Static Analyzer to operate on MLIR. We will demonstrate how MLIR can replicate Clang AST behavior and represent Clang CFG primitives, enabling interoperability and analysis using Clang's high-level tools.

Code of Conduct
The LLVM Foundation is dedicated to providing an inclusive and safe experience for everyone. We do not tolerate harassment of participants in any form. By registering for this event, we expect you to have read and agree to the LLVM Code of Conduct.

Contact
To contact the organizer, email events@llvm.org.