2016 European LLVM Developers' Meeting

Table of Contents

About

The LLVM Developers’ Meeting is a bi-annual gathering of the entire LLVM Project community. The conference is organized by the LLVM Foundation and many volunteers within the LLVM community. Developers and users of LLVM, Clang, and related subprojects will enjoy attending interesting talks, impromptu discussions, and networking with the many members of our community. Whether you are new to the LLVM project or a long-time member, there is something for every attendee.

To see the agenda, speakers, and register, please visit the Event Site here: https://llvm.org/devmtg/2016-03/#schedule

What can you expect at an LLVM Developers’ Meeting?

What types of people attend?

The LLVM Developers’ Meeting strives to be the best conference to meet other LLVM developers and users.

Please visit the event site for all the information, call for papers, and more: https://llvm.org/devmtg/2016-03/#schedule

For future announcements or questions: Please visit the LLVM Discourse forums. Most posts are in the Announcements or Community categories and tagged with usllvmdevmtg.

Program

Presentations Abstracts

Clang, libc++ and the C++ standard

Marshall Clow (Qualcomm), Richard Smith (Google) [Slides] [Video]
The C++ standard is evolving at a fairly rapid pace. After almost 15 years of little change (1998-2010), we've had major changes in 2011, 2014, and soon (probably) 2017. There are many parallel efforts to add new functionality to the language and the standard library.

Codelet Extractor and REplayer

Chadi Akel (Exascale Computing Research), Pablo De Oliveira Castro (University of Versailles), Michel Popov (University of Versailles), Eric Petit (University of Versailles), William Jalby (University of Versailles) [Slides] [Video]
Codelet Extractor and REplayer (CERE) is an LLVM-based framework that finds and extracts hotspots from an application as isolated fragments of code. Codelets can be modified, compiled, run, and measured independently from the original application. Through performance signature clustering, CERE extracts a minimal but representative codelet set from applications, which can significantly reduce the cost of benchmarking and iterative optimization. Codelets have proved successful for auto-tuning the target architecture, compiler optimizations, or the amount of parallelism. To do so, CERE goes through multiple LLVM passes. It first outlines the loop to capture into a function at the IR level using the CodeExtractor pass. Then, depending on the mode, CERE inserts the necessary instructions to either capture or replay the loop. Probes can also be inserted at the IR level around loops to enable instrumentation through external libraries. Finally, CERE provides a Python interface to make the tool easy to use.

New LLD linker for ELF

Rui Ueyama (Google) [Slides] [Video]
Since last year, we have been working to rewrite the ELF support in LLD, the LLVM linker, to create a high-performance linker that works as a drop-in replacement for the GNU linker. It is now able to bootstrap LLVM, Clang, and itself and pass all tests on x86-64 Linux and FreeBSD. The new ELF linker is small and fast; it is currently fewer than 10k lines of code and about 2x faster than the GNU gold linker.

Improving LLVM Generated Code Size for X86 Processors

David Kreitzer (Intel), Zia Ansari (Intel), Andrey Turetskiy (Intel), Anton Nadolsky (Intel) [Slides] [Video]
Minimizing the size of compiler-generated code often takes a back seat to other optimization objectives such as maximizing runtime performance. For some applications, however, code size is of paramount importance, and this is an area where LLVM has lagged behind gcc when targeting x86 processors. Code size is of particular concern in the microcontroller segment, where programs are often constrained by a relatively small and fixed amount of memory. In this presentation, we will detail the work we did to improve the generated code size for the SPEC CPU2000 C/C++ benchmarks by 10%, bringing clang/LLVM to within 2% of gcc. While the quoted numbers were measured targeting the Intel® Quark™ microcontroller D2000, most of the individual improvements apply to all x86 targets. The code size improvement was achieved via new optimizations, tuning of existing optimizations, and fixing existing inefficiencies. We will describe our analysis methodology, explain the impact and LLVM compiler fix for each improvement opportunity, and describe some opportunities for future code size improvements, with an eye toward pushing LLVM ahead of gcc on code size.

Towards ameliorating measurement bias in evaluating performance of generated code

Kristof Beyls (ARM) [Slides] [Video]
To make sure LLVM continues to optimize code well, we use both post-commit performance tracking and pre-commit evaluation of new optimization patches. As compiler writers, we wish that the performance of generated code could be characterized by a single number, making it straightforward to decide from an experiment whether code generation is better or worse. Unfortunately, the performance of generated code needs to be characterized as a distribution, since effects not completely under the control of the compiler, such as heap, stack, and code layout or the initial state of the processor's prediction tables, have a potentially large influence on performance. For example, it's not uncommon, when benchmarking a new optimization pass that clearly makes code better, for the performance results to show some regressions. But are these regressions due to a problem with the patch, or due to noise effects not under the control of the compiler? Often, the noise levels in performance results are much larger than the expected improvement a patch will make. How can we properly conclude what the true effect of a patch is when the noise is larger than the signal we're looking for?
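One standard way to approach the question raised above is to treat each configuration's run times as samples from a distribution and compute a confidence interval for the difference. The sketch below is purely illustrative of that general statistical technique, not the method presented in the talk; the timing numbers and the `bootstrap_diff_ci` helper are invented for the example.

```python
import random
import statistics

def bootstrap_diff_ci(before, after, n_boot=2000, alpha=0.05, seed=0):
    """Bootstrap a confidence interval for the difference of median
    run times (after - before). If the interval excludes 0, the change
    is unlikely to be explained by noise alone."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        b = [rng.choice(before) for _ in before]  # resample with replacement
        a = [rng.choice(after) for _ in after]
        diffs.append(statistics.median(a) - statistics.median(b))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical noisy timings (seconds): a ~2% real improvement buried
# in per-run fluctuations larger than the improvement itself.
rng = random.Random(42)
before = [1.00 + rng.gauss(0, 0.03) for _ in range(30)]
after = [0.98 + rng.gauss(0, 0.03) for _ in range(30)]

lo, hi = bootstrap_diff_ci(before, after)
print(f"95% CI for median delta: [{lo:.4f}, {hi:.4f}]")
```

If the resulting interval straddles zero, more runs (or a less noisy measurement setup) are needed before drawing a conclusion about the patch.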

A journey of OpenCL 2.0 development in Clang

Anastasia Stulova (ARM) [Slides] [Video]
In this talk we would like to highlight some of the recent collaborative work among several institutions (namely ARM, Intel, Tampere University of Technology, and others) for supporting OpenCL 2.0 compilation in Clang. This work is represented by several patches to Clang upstream that enable compilation of the new standard. While the majority of this work is already committed, some parts are still a work in progress that should be finished in the upcoming months.

Building a binary optimizer with LLVM

Maksim Panchenko (Facebook) [Slides] [Video]
Large-scale applications in data centers are built with the highest level of compiler optimizations and typically use a carefully tuned set of compiler options, as every single percent of performance could result in vast savings of power and CPU time. However, code and code-layout optimizations don't stop at the compiler level, as further improvements are possible at link time and beyond.

SVF: Static Value-Flow Analysis in LLVM

Yulei Sui (University of New South Wales), Peng Di (University of New South Wales), Ding Ye (University of New South Wales), Hua Yan (University of New South Wales), Jingling Xue (University of New South Wales) [Slides] [Video]
This talk presents SVF, a research tool that enables scalable and precise interprocedural Static Value-Flow analysis for sequential and multithreaded C programs by leveraging recent advances in sparse analysis. SVF, which is fully implemented in LLVM (version 3.7.0) with over 50 KLOC of core C++ code, allows value-flow construction and pointer analysis to be performed in an iterative manner, thereby providing increasingly improved precision for both. SVF accepts points-to information generated by any pointer analysis (e.g., Andersen's analysis) and constructs an interprocedural memory SSA form, in which the def-use chains of both top-level and address-taken variables are captured. Such value-flows can be subsequently exploited to support various forms of program analysis or enable more precise pointer analysis (e.g., flow-sensitive analysis) to be performed sparsely. SVF provides an extensible interface for users to write their own analyses easily. SVF is publicly available at http://unsw-corg.github.io/SVF.

Run-time type checking with clang, using libcrunch

Chris Diamand (University of Cambridge), Stephen Kell (Computer Laboratory, University of Cambridge), David Chisnall (Computer Laboratory, University of Cambridge) [Slides] [Video]
Existing sanitizers such as ASan and MSan add run-time checking for memory errors, both spatial and temporal. However, there is currently no analogous way to check for type errors. This talk describes a system for adding run-time type checks, largely checking pointer casts, at the Clang AST level.

Molly: Parallelizing for Distributed Memory using LLVM

Michael Kruse (INRIA/ENS) [Slides] [Video]
Motivated by modern-day physics, which in addition to experiments also tries to verify and deduce the laws of nature by simulating state-of-the-art physical models on large computers, we explore means of accelerating such simulations by improving the simulation programs they run. The primary focus is Lattice Quantum Chromodynamics (QCD), a branch of quantum field theory, running on IBM's newest supercomputer, the Blue Gene/Q.

How Polyhedral Modeling enables compilation to Heterogeneous Hardware

Tobias Grosser (ETH) [Slides] [Video]
Polly, as a polyhedral loop optimizer for LLVM, is not only a sophisticated tool for data locality optimizations, but also has precise information about loop behavior that can be used to automatically generate accelerator code.

Bringing RenderScript to LLDB

Luke Drummond (Codeplay), Ewan Crawford (Codeplay) [Slides] [Video]
RenderScript is Android's compute framework for parallel computation via heterogeneous acceleration. It supports multiple target architectures and uses a two-stage compilation process, with both off-line and on-line stages, using LLVM bitcode as its intermediate representation. This split allows code to be written and compiled once, before execution on multiple architectures transparently from the perspective of the programmer.

C++ on Accelerators: Supporting Single-Source SYCL and HSA Programming Models Using Clang

Victor Lomuller (Codeplay), Ralph Potter (Codeplay), Uwe Dolinsky (Codeplay) [Slides] [Video]
Heterogeneous systems have been massively adopted across a wide range of devices. Multiple initiatives, such as OpenCL and HSA, have emerged to program these types of devices efficiently.

A closer look at ARM code size

Tilmann Scheller (Samsung Electronics) [Slides] [Video]
The ARM LLVM backend has been around for many years and generates high quality code which executes very efficiently. However, LLVM is also increasingly used for resource-constrained embedded systems where code size is more of an issue. Historically, very few code size optimizations have been implemented in LLVM. When optimizing for code size, GCC typically outperforms LLVM significantly.

Scalarization across threads

Alexander Timofeev (Luxoft), Boris Ivanovsky (Luxoft) [Slides] [Video]
Some modern highly parallel architectures include separate vector arithmetic units to achieve better performance on parallel algorithms. On the other hand, real-world applications never operate on vector data only, yet in most cases the whole data flow is intended to be processed by the vector units. In fact, vector operations on some platforms (for instance, those with massive data parallelism) may be expensive, especially parallel memory operations. Instructions operating on vectors of identical values can sometimes be transformed into a corresponding scalar form.

Tutorials Abstracts

LLDB Tutorial: Adding debugger support for your target

Deepak Panickal (Codeplay), Andrzej Warzynski (Codeplay) [Slides] [Video]
This tutorial explains how to get started with adding a new architecture to LLDB. It walks through all the major steps required and how LLDB's various plugins work together in making this a maintainable and easily approachable task. It will cover: basic definition of the architecture, implementing register read/write through adding a RegisterContext, manipulating breakpoints, single-stepping, adding an ABI for stack walking, adding support for disassembly of the architecture, memory read/write through modifying Process plugins, and everything else that is needed in order to provide a usable debugging experience. The required steps will be demonstrated for a RISC architecture not yet supported in LLDB, but simple enough so that no expert knowledge of the underlying target is required. Practical debugging tips, as well as solutions to common issues, will be given.

Analyzing and Optimizing your Loops with Polly

Tobias Grosser (ETH), Johannes Doerfert (Saarland University), Zino Benaissa (Quic Inc). [Slides] [Video]
The Polly Loop Optimizer is a framework for the analysis and optimization of (possibly imperfectly) nested loops. It provides various transformations such as loop fusion, loop distribution, and loop tiling, as well as outer-loop vectorization. In this tutorial we introduce the audience to the Polly loop optimizer and show how Polly can be used to analyze and improve the performance of their code. We start off with basic questions such as "Did Polly understand my loops?", "What information did Polly gather?", "What does the optimized loop nest look like?", "Can I provide more information to enable better optimizations?", and "How can I utilize Polly's analysis for other purposes?". Starting from these foundations, we continue with a deeper look at more advanced uses of Polly: the analysis and optimization of some larger benchmarks, the programming interfaces to Polly, and the connection between Polly and other LLVM-IR passes. At the end of this tutorial we expect the audience not only to be able to optimize their code with Polly, but also to have a first understanding of how to use it as a framework to implement their own loop transformations.

Building, Testing and Debugging a Simple out-of-tree LLVM Pass

Serge Guelton (Quarkslab), Adrien Guinet (Quarkslab) [Slides] [Video]
This tutorial aims to provide a solid foundation for developing out-of-tree LLVM passes. It presents all the required building blocks, starting from scratch: CMake integration, LLVM pass management, and opt/clang integration. It presents the core IR concepts through two simple obfuscating passes: the SSA form, the CFG, PHI nodes, the IRBuilder, etc. We also take a quick tour of analysis integration through dominators. Finally, it showcases how to use cl and lit to parametrize and test the toy passes developed in the tutorial.

Lightning Talks Abstracts

Random Testing of the LLVM Code Generator

Bevin Hansson (SICS Swedish ICT) [Slides]
LLVM is a large, complex piece of software with many interlocking components. Testing a system of this magnitude is an arduous task. Random testing is an increasingly popular technique used to test complex systems. A successful example of this is Csmith, a tool which generates random, semantically valid C programs. We present a generic method to generate random but structured intermediate representation code. Our method is implemented in LLVM to generate random Machine IR code for testing the post-instruction selection stages of code generation.
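The core idea of generating random but structurally valid code can be illustrated with a toy generator. The sketch below emits a hypothetical three-address pseudo-IR in which every operand is defined before use; it is an analogy to this style of structured random generation, not the Machine IR generator from the talk.

```python
import random

OPS = ["add", "sub", "mul"]

def gen_expr(rng, depth, next_id):
    """Recursively generate a random but well-formed expression tree,
    returned as (result name, list of pseudo-IR instructions) where
    every operand is defined by an earlier instruction."""
    if depth == 0 or rng.random() < 0.3:
        name = f"%{next_id[0]}"
        next_id[0] += 1
        return name, [f"{name} = const {rng.randint(0, 9)}"]
    lhs, lhs_instrs = gen_expr(rng, depth - 1, next_id)
    rhs, rhs_instrs = gen_expr(rng, depth - 1, next_id)
    name = f"%{next_id[0]}"
    next_id[0] += 1
    op = rng.choice(OPS)
    return name, lhs_instrs + rhs_instrs + [f"{name} = {op} {lhs}, {rhs}"]

rng = random.Random(1)
result, instrs = gen_expr(rng, depth=3, next_id=[0])
for ins in instrs:
    print(ins)
```

Because each instruction is appended only after the instructions computing its operands, every generated program satisfies an SSA-like "define before use" invariant by construction, which is what makes such random inputs usable for testing a code generator.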

ARCHER: Effectively Spotting Data Races in Large OpenMP Applications

Simone Atzeni (University of Utah), Ganesh Gopalakrishnan (University of Utah), Zvonimir Rakamaric (University of Utah), Dong H. Ahn (Lawrence Livermore National Laboratory), Ignacio Laguna (Lawrence Livermore National Laboratory), Martin Schulz (Lawrence Livermore National Laboratory), Gregory L. Lee (Lawrence Livermore National Laboratory) [Slides]
Although the importance of OpenMP as a parallel programming model and its adoption in Clang/LLVM are increasing, existing data-race checkers have high overheads and generate many false positives. We propose ARCHER, a data race checker that achieves high accuracy and low overheads on large OpenMP applications. Built on top of LLVM/Clang and TSan, it uses scalable happens-before tracking and modular interfaces with OpenMP runtimes. ARCHER significantly outperforms TSan and Intel Inspector XE. It helped detect critical data races in the Hypre library.

Hierarchical Graph Coloring Register Allocation in LLVM

Aaron Smith (Microsoft Research) [Slides]
This talk presents a new register allocator for LLVM based on a hierarchical graph coloring approach. In this allocator, a program's control structure is represented as a tree of tiles, and a two-phase algorithm colors the tiles based on both local and global information. This talk describes the LLVM implementation and compares it to LLVM's existing greedy allocator.

Retargeting LLVM to an Explicit Data Graph Execution (EDGE) Architecture

Aaron Smith (Microsoft Research) [Slides]
This talk describes work on retargeting LLVM to an Explicit Data Graph Execution (EDGE) architecture, which combines von Neumann and dataflow execution models to provide out-of-order execution with in-order power efficiency. It explains the challenges of targeting an EDGE ISA with LLVM and compares the LLVM-based EDGE compiler to a Visual Studio-based toolchain.

Optimal Register Allocation and Instruction Scheduling for LLVM

Roberto Castañeda Lozano (SICS & Royal Institute of Technology (KTH)), Gabriel Hjort Blindell (Royal Institute of Technology (KTH)), Mats Carlsson (SICS), Christian Schulte (SICS & Royal Institute of Technology (KTH)) [Slides]
This talk presents Unison, a tool that solves register allocation and instruction scheduling simultaneously. Experiments using MediaBench and Hexagon show Unison can speed up LLVM-generated code by up to 30%. Fully integrated with LLVM's code generator, Unison allows for optimal code generation and evaluation of heuristics, trading compile time for code quality.

Towards fully open source GPU accelerated molecular dynamics simulation

Vedran Miletić (Heidelberg Institute for Theoretical Studies), Szilárd Páll (Royal Institute of Technology (KTH)), Frauke Gräter (Heidelberg Institute for Theoretical Studies) [Slides]
This talk discusses enabling GPU-accelerated molecular dynamics simulations using a fully open source OpenCL stack in GROMACS. The project improves AMDGPU LLVM backend and radeonsi Gallium compute stack to support required OpenCL features. It covers challenges encountered and collaboration with AMD developers working on LLVM.

CSiBE in the LLVM ecosystem

Gabor Ballabas (Department of Software Engineering, University of Szeged), Gabor Loki (Department of Software Engineering, University of Szeged) [Slides]
CSiBE is a code size benchmarking environment originally created for GCC and now gaining attention in IoT. This talk discusses modernizing CSiBE with a modular testbed, user interface, and LLVM support. The updated tool helps benchmark and test compilers such as Clang and Rust, offering insights into its community potential.

Posters Abstracts

ARCHER: Effectively Spotting Data Races in Large OpenMP Applications

Simone Atzeni (University of Utah), Ganesh Gopalakrishnan (University of Utah), Zvonimir Rakamaric (University of Utah), Dong H. Ahn (Lawrence Livermore National Laboratory), Ignacio Laguna (Lawrence Livermore National Laboratory), Martin Schulz (Lawrence Livermore National Laboratory), Gregory L. Lee (Lawrence Livermore National Laboratory)
Although the importance of OpenMP as a parallel programming model and its adoption in Clang/LLVM are increasing (OpenMP 3.1 is now fully supported by Clang/LLVM 3.7), existing data-race checkers for OpenMP have high overheads and generate many false positives. In this work, we propose the first OpenMP data race checker, ARCHER, that achieves high accuracy and low overheads on large OpenMP applications. Built on top of LLVM/Clang and the ThreadSanitizer (TSan) dynamic race checker, ARCHER incorporates scalable happens-before tracking, exploits structured parallelism via combined static and dynamic analysis, and modularly interfaces with OpenMP runtimes. ARCHER significantly outperforms TSan and Intel Inspector XE, while providing the same or better precision. It has helped detect critical data races in the Hypre library, which is central to many projects at Lawrence Livermore National Laboratory (LLNL) and elsewhere.

Design-space exploration of LLVM pass order with simulated annealing

Nicholas Timmons (Cambridge University), David Chisnall (Cambridge University)
We undertook an automated design-space exploration of the optimisation pass order and inliner thresholds in Clang using simulated annealing. It was performed separately on multiple input programs so that the results could be validated against each other. Configurations superior to the preset optimisation levels were found, such as those which produce similar run times to the presets whilst reducing build times, and those which offer better run-time performance than the '-O3' optimisation level. Contrary to our expectation, we also found that the preset optimisation levels did not provide a uniform distribution in the tradeoff space between run-time and build-time performance.
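Simulated annealing over an ordering can be sketched generically: propose a random swap, always accept improvements, and accept regressions with a probability that shrinks as the "temperature" cools. The sketch below illustrates only the general technique, not the authors' setup; the pass names and the toy `cost` function (which a real exploration would replace with a compile-and-measure step) are invented for the example.

```python
import math
import random

def anneal(passes, cost, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Simulated annealing over orderings of `passes`, minimizing `cost`.
    Proposes random swaps; worse candidates are accepted with
    probability exp(-delta / temperature), which decays over time."""
    rng = random.Random(seed)
    cur = list(passes)
    cur_cost = cost(cur)
    best, best_cost = list(cur), cur_cost
    t = t0
    for _ in range(steps):
        cand = list(cur)
        i, j = rng.randrange(len(cand)), rng.randrange(len(cand))
        cand[i], cand[j] = cand[j], cand[i]
        c = cost(cand)
        if c < cur_cost or rng.random() < math.exp((cur_cost - c) / t):
            cur, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = list(cand), c
        t *= cooling  # geometric cooling schedule
    return best

# Toy stand-in cost: count out-of-order adjacent pairs, pretending the
# ideal order is alphabetical. Real usage would measure run/build time.
passes = ["simplifycfg", "gvn", "inline", "licm", "sroa"]
cost = lambda order: sum(1 for a, b in zip(order, order[1:]) if a > b)
print(anneal(passes, cost))
```

Tracking the best-seen ordering separately from the current one guarantees the search never returns something worse than its starting point, even though intermediate steps may accept regressions to escape local minima.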

ConSerner: Compiler Driven Context Switches between Accelerators and CPUs

Ramy Gad (Johannes Gutenberg University), Tim Suess (University of Mainz), Andre Brinkmann (Johannes Gutenberg-Universität Mainz)
Computer systems provide different heterogeneous resources (e.g., GPUs, DSPs, and FPGAs) that can accelerate applications and reduce their energy consumption. Usually, these resources have isolated memory and require target-specific code to be written. There exist tools that can automatically generate target-specific code for program parts, so-called kernels. The data objects required for a target kernel execution need to be moved to the target resource's memory. It is the programmer's responsibility to serialize the data objects used in the kernel and to copy them to or from the resource's memory. Typically, the programmer writes their own serialization function or uses existing serialization libraries. Unfortunately, both approaches require code modifications, and the programmer needs knowledge of the data structure format used. There is a need for a tool that can automatically extract the original kernel data objects, serialize them, and migrate them to a target resource without requiring intervention from the programmer.

Evaluation of State-of-the-art Static Checkers for Detecting Objective-C Bugs in iOS Applications

Thai San Phan (University of New South Wales), Yulei Sui (University of New South Wales)
The pervasive use of mobile phone applications is changing the way people use traditional software. Smartphone apps generated an impressive USD 35 billion in full-year 2014, and in total 138 billion apps were downloaded that year. The last few years have seen an unprecedented number of people rushing to develop mobile apps. Apple iOS has played a major role in the smart-devices industry ever since its introduction, and around 45,000 newly developed apps were submitted for release to the iTunes App Store in 2014. Like desktop software, mobile applications are prone to bugs, and it is difficult to make them completely bug-free. As a fundamental tool to help programmers locate program defects at compile time, static analysis approximates the runtime behaviour of a program without actually executing it. It is extremely helpful for catching bugs early in the software development cycle, before the product is shipped, in order to avoid high maintenance costs. This poster evaluates state-of-the-art static checkers for detecting Objective-C bugs, systematically investigating the advantages and disadvantages of using different checkers on a wide variety of bug patterns in iOS applications.

Stack Size Estimation on Machine-Independent Intermediate Code for OpenCL Kernels

Stefano Cherubin (Politecnico di Milano), Michele Scandale (Politecnico di Milano), Giovanni Agosta (Politecnico di Milano)
Stack size is an important factor in the mapping decision when dealing with embedded heterogeneous architectures, where fast memory is a scarce resource. Trying to map a kernel onto a device with insufficient memory may lead to reduced performance or even failure to run the kernel. OpenCL kernels are often compiled just-in-time, starting from the source code or an intermediate machine-independent representation. Precise stack size information, however, is only available in machine-dependent code. We provide a method for computing the stack size with sufficient accuracy on machine-independent code, given knowledge of the target ABI and register file architecture. This method can be applied to make mapping decisions early, thus avoiding compiling the code multiple times, once for each possible accelerator in a complex embedded heterogeneous system.
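The kind of ABI-driven arithmetic such an estimate involves can be illustrated with a toy frame-size computation: lay stack objects out in order, padding each to its alignment, then round the total up to the ABI stack alignment. The object sizes, alignments, and 16-byte stack alignment below are assumptions made for the example, not the paper's actual model.

```python
def frame_size(objects, stack_align=16):
    """Estimate a stack frame size from (size, align) pairs of stack
    objects: pad each object's offset to its alignment, then round the
    total up to the ABI stack alignment."""
    offset = 0
    for size, align in objects:
        offset = (offset + align - 1) // align * align  # pad to alignment
        offset += size
    return (offset + stack_align - 1) // stack_align * stack_align

# Hypothetical kernel locals: a float4 (16 B, align 16), an int
# (4 B, align 4), and a 13-byte char buffer (align 1).
print(frame_size([(16, 16), (4, 4), (13, 1)]))  # -> 48
```

A machine-independent estimate of this kind can overshoot (registers may hold some values), which is acceptable for a conservative mapping decision: a kernel that fits under the estimate certainly fits at run time.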

AAP: The Compiler Writer's Architecture from hell

Simon Cook (Embecosm), Edward Jones (Embecosm), Jeremy Bennett (Embecosm)
Contending with the blistering pace of LLVM advancement is a challenge for out-of-tree targets. Many out-of-tree targets, often for widely used embedded processors, have hardware features which are not well represented in the mainstream LLVM project.

Automatic Identification of Accelerators for Hybrid HW-SW Execution

Georgios Zacharopoulos (University of Lugano), Giovanni Ansaloni (University of Lugano), Laura Pozzi (University of Lugano)
While the number of transistors that can be put on a chip keeps increasing significantly, as suggested by Moore's law, the dark silicon problem arises. This is because power consumption does not drop at a corresponding rate, which generates overheating issues. Accelerator-enhanced architectures can provide an efficient solution to this and lead us to hybrid HW-SW execution, where computationally intensive parts are performed by custom hardware. An automation of this process is needed, so that applications in high-level languages can be mapped to hardware and software directly. The process needs, first, an automatic technique for identifying the parts of the computation that should be accelerated, and second, an automated way of synthesising these parts onto hardware. Within the scope of this work, we focus on the first part of this process and present the automatic identification of the most computationally demanding parts, also known as custom instructions. State-of-the-art identification approaches have certain limitations, as custom instruction selection is mostly performed within the scope of single basic blocks. We introduce a novel selection strategy, implemented within the LLVM framework, that carries out identification beyond the scope of a single basic block and identifies regions within the control flow graph, as subgraphs of it. Specific I/O constraints and area occupation metrics are taken into consideration, in order to obtain regions that would provide maximum speedup, under architectural constraints, when transferred to hardware. In our experimentation and evaluation phase, kernels from the signal and image processing domain are evaluated, and promising initial results show that the proposed identification technique is often capable of mimicking manual designer decisions.

Static Analysis for Automated Partitioning of Single-GPU Kernels

Alexander Matz (Ruprecht-Karls University of Heidelberg), Christoph Klein (Ruprecht-Karls University of Heidelberg), Holger Fröning (Ruprecht-Karls University of Heidelberg)
GPUs have established themselves in the computing landscape, convincing users and designers by their excellent performance and energy efficiency. They differ in many aspects from general-purpose CPUs, for instance their highly parallel architecture, their thread-collective bulk-synchronous execution model, and their programming model. Their use has been pushed by the introduction of data-parallel languages like CUDA or OpenCL.

BoFs Abstracts

LLVM Foundation

LLVM Foundation board of directors [Slides]
This BoF will give the EuroLLVM attendees a chance to talk with some of the board members of the LLVM Foundation. We will discuss the Code of Conduct and Apache2 license proposal and answer any questions about the LLVM Foundation.

Compilers in Education

Roel Jordans (Eindhoven University of Technology), Henk Corporaal (Eindhoven University of Technology) [Slides]
While computer architecture and hardware optimization are generally well covered in education, compilers are still often a poorly represented subject. Classical compiler lecture series seem mostly to cover the front-end parts of the compiler but usually lack an in-depth discussion of newer optimization and code-generation techniques. Important aspects such as auto-vectorization, complex instruction support for DSP architectures, and instruction scheduling for highly parallel VLIW architectures are often touched on only lightly. However, creating new processor designs requires a properly optimizing compiler in order for them to be usable by your customers. As such, there is strong demand for well-trained compiler engineers, which the classical style of teaching compilers does not meet.

Surviving Downstream

Paul Robinson (Sony Computer Entertainment America) [Slides]
We presented "Living Downstream Without Drowning" as a tutorial/BOF session at the US LLVM meeting in October. After the session, Paul had people coming to talk to him for most of the evening social event and half of the next day (and so missed several other talks!). Clearly a lot of people are in this situation and there are many good ideas to share.

Polly - Loop Optimization Infrastructure

Tobias Grosser (ETH), Johannes Doerfert (Saarland University), Zino Benaissa (Quic Inc.) [Slides]
The Polly Loop Optimization infrastructure has seen active development throughout 2015, with contributions from a larger group of developers located at various places around the globe. With three successful Polly sessions at the US developers' meeting and strong interest at the recent HiPEAC conference in Prague, we expect various Polly developers to be able to attend EuroLLVM. To facilitate in-person collaboration between the core developers and to reach out to the wider loop optimization community, we propose a BoF session on Polly and the LLVM loop optimization infrastructure. Current hot topics are the usability of Polly in an '-O3' compiler pass sequence, profile-driven optimizations, as well as the definition of future development milestones. The Polly developer community will present ideas on these topics, but very much invites input from interested attendees.

LLVM on PowerPC and SystemZ

Ulrich Weigand (IBM) [Slides]
This Birds of a Feather session is intended to bring together developers and users interested in LLVM on the two IBM platforms PowerPC and SystemZ.

How to make LLVM more friendly to out-of-tree consumers?

David Chisnall (Computer Laboratory, University of Cambridge) [Slides]
LLVM has always had the goal of a library-oriented design. This implicitly assumes that the libraries that are parts of LLVM can be used by consumers that are not part of the LLVM umbrella. In this BoF, we will discuss how well LLVM has achieved this objective and what it could do better. Do you use LLVM in an external project? Do you track trunk, or move between releases? What has worked well for you, what has caused problems? Come along and share your experiences.

Code of Conduct

The LLVM Foundation is dedicated to providing an inclusive and safe experience for everyone. We do not tolerate harassment of participants in any form. By registering for this event, we expect you to have read and agreed to the LLVM Code of Conduct.

Contact

To contact the organizer, email Vladimir Subotic.