Google Summer of Code Projects

GSoC 2024 Projects

Welcome, prospective Google Summer of Code 2024 students! This document is your starting point for finding interesting and important projects for LLVM, Clang, and other related sub-projects. This list was not developed only for Google Summer of Code: these are open projects that need developers to work on them and that would benefit the LLVM community.

We encourage you to look through this list and see which projects excite you and match your skill set. We also invite proposals that are not on this list. More information and discussion about GSoC can be found on Discourse. If you have questions about a particular project, please find the relevant entry on Discourse, check the previous discussion, and ask. If there is no such entry, or you would like to propose an idea, please create a new entry. Feedback from the community is a requirement for your proposal to be considered and, hopefully, accepted.

The LLVM project has participated in Google Summer of Code for several years and has had some very successful projects. We hope that this year is no different and look forward to hearing your proposals. For information on how to submit a proposal, please visit the Google Summer of Code main website.

Remove undefined behavior from tests

Description:

Many of LLVM’s unit tests have been reduced automatically from larger tests. Previous-generation reduction tools used undef and poison as placeholders everywhere and also introduced undefined behavior (UB). Tests with UB are undesirable because 1) they are fragile, since a future compiler may optimize more aggressively and break the test, and 2) they break translation validation tools such as Alive2 (since it is correct to translate a function that is always UB into anything).

The major steps include:

  1. Replace known patterns, such as branch on undef/poison and memory accesses with invalid pointers, with non-UB patterns.
  2. Use Alive2 to detect further patterns (by searching for tests that are always UB).
  3. Report any LLVM bug found by Alive2 that is exposed when removing UB.
Expected Result: The majority of LLVM’s unit tests will be free of UB.
Skills: Experience with scripting (Python or PHP) is required. Experience with regular expressions is encouraged.
Project Size: Either medium or large.
Difficulty: Medium
Mentors:
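
As a rough illustration of step 1 for this project, the sketch below scans LLVM IR text for one known pattern, a conditional branch on undef or poison. It is a minimal sketch, not an existing LLVM tool: the regular expression, the function name find_branch_on_undef, and the sample IR are all assumptions made for this example, and it only detects the pattern rather than rewriting it.

```python
import re

# Matches a conditional branch whose condition is undef or poison,
# e.g. "br i1 undef, label %a, label %b". (Pattern chosen for this sketch.)
BRANCH_ON_UNDEF = re.compile(r"\bbr\s+i1\s+(undef|poison)\s*,")

# Hypothetical test input, written for this example only.
SAMPLE_IR = """\
define void @f(i1 %c) {
entry:
  br i1 undef, label %a, label %b   ; UB: condition is undef
a:
  br i1 %c, label %b, label %a
b:
  ret void
}
; CHECK: br i1 poison, label %x, label %y
"""


def find_branch_on_undef(ir_text):
    """Return (line_number, code) pairs for branches on undef/poison."""
    hits = []
    for number, line in enumerate(ir_text.splitlines(), start=1):
        # Drop IR comments so that CHECK lines and remarks are ignored.
        code = line.split(";", 1)[0].strip()
        if BRANCH_ON_UNDEF.search(code):
            hits.append((number, code))
    return hits


if __name__ == "__main__":
    for number, code in find_branch_on_undef(SAMPLE_IR):
        print(f"line {number}: {code}")
```

A real cleanup would also have to replace the condition with a well-defined value (for example a new function argument) and regenerate the test's CHECK lines, which this sketch does not attempt.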

Automatically generate TableGen file for SPIR-V instruction set

Description:

The existing file that describes the SPIR-V instruction set in LLVM was manually created and is not always complete or up to date. Whenever new instructions need to be added to the SPIR-V backend, the file must be amended. In addition, since it is not created in a systematic way, there are often slight discrepancies between how an instruction is described in the SPIR-V spec and how it is declared in the TableGen file.

This project proposes creating a script capable of generating a complete TableGen file that describes the SPIR-V instruction set given the JSON grammar available in the KhronosGroup/SPIRV-Headers repository, and updating SPIR-V backend code to use the new definitions.

Expected Result:
  1. The SPIR-V instruction set’s definition in TableGen is replaced with autogenerated content.
  2. A script and documentation are provided to regenerate definitions from the JSON grammar.
  3. SPIR-V backend is updated to use new autogenerated definitions.
Skills: Experience with scripting and intermediate knowledge of C++. Previous experience with LLVM/TableGen is a bonus but not required.
Project Size: Medium (175 hours)
Difficulty:
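
The generation step for this project could look roughly like the sketch below, which reads a JSON grammar and prints one TableGen record per instruction. The sample input imitates the shape of the SPIRV-Headers grammar ("instructions" entries with "opname", "opcode", and optional "operands"), but the record class SPIRVInst and its argument layout are hypothetical and do not reflect the definitions the SPIR-V backend currently uses.

```python
import json

# A tiny excerpt in the shape of the SPIRV-Headers JSON grammar.
# The real grammar file is much larger and carries more fields.
SAMPLE_GRAMMAR = """
{
  "instructions": [
    {"opname": "OpNop", "opcode": 0},
    {"opname": "OpUndef", "opcode": 1,
     "operands": [{"kind": "IdResultType"}, {"kind": "IdResult"}]}
  ]
}
"""


def emit_tablegen(grammar_text):
    """Emit one TableGen record per instruction.

    'SPIRVInst' is a made-up class name used only for illustration.
    """
    grammar = json.loads(grammar_text)
    lines = []
    for inst in grammar["instructions"]:
        kinds = [op["kind"] for op in inst.get("operands", [])]
        operands = ", ".join(f'"{kind}"' for kind in kinds)
        lines.append(
            f'def {inst["opname"]} : SPIRVInst<"{inst["opname"]}", '
            f'{inst["opcode"]}, [{operands}]>;'
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(emit_tablegen(SAMPLE_GRAMMAR))
```

Generating from the grammar keeps names and operand order identical to the specification, which is the discrepancy the description calls out.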

LLVM bitstream integration with CAS (content-addressable storage)

Description: The LLVM bitstream file format is used for serialization of intermediate compiler artifacts, such as LLVM IR or Clang modules. This project aims to integrate the LLVM CAS library into the LLVM bitstream file format by factoring out frequently duplicated parts of bitstream files into separate CAS objects, reducing storage requirements.
Expected Result: There’s a way to configure the LLVM bitstream writer/reader to use CAS as the backing storage.
Skills: Intermediate knowledge of C++, familiarity with data serialization, and self-motivation.
Project Size: Medium or large
Difficulty:
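
To show why content-addressable storage reduces storage for duplicated parts, here is a toy store keyed by SHA-256 digest. It is a sketch of the general idea only: the class ContentStore and the helpers write_file and read_file are invented for this example and are unrelated to the actual LLVM CAS library API or to the bitstream format.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: objects are keyed by their digest."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self._objects.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._objects[digest]

    def stored_bytes(self) -> int:
        return sum(len(data) for data in self._objects.values())


def write_file(store, blocks):
    """Represent a file as the list of digests of its blocks."""
    return [store.put(block) for block in blocks]


def read_file(store, digests):
    """Reassemble a file from the digests of its blocks."""
    return b"".join(store.get(digest) for digest in digests)


if __name__ == "__main__":
    store = ContentStore()
    shared = b"common-header" * 10  # stand-in for a frequently duplicated part
    first = write_file(store, [shared, b"module A"])
    second = write_file(store, [shared, b"module B"])
    print(store.stored_bytes(), "bytes stored for two files")
```

Two files that share a block reference the same digest, so the shared bytes are kept once while each file can still be reconstructed in full.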

Add 3-way comparison intrinsics

Description: 3-way comparisons return -1, 0, or 1 depending on whether the first value is less than, equal to, or greater than the second. The goal of this project is to implement new 3-way comparison intrinsics, add legalization/expansion support for them in LLVM’s backend, and integrate them into the clang and rustc frontends so that such comparisons optimize better.
Expected Result: Full support for intrinsics in backend and optimization passes, ideally with frontend integration.
Skills: Intermediate knowledge of C++.
Project Size: Medium or large
Difficulty: Medium
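
The semantics described for this project can be written down in a few lines. The sketch below is only a reference model in Python; the function names are made up, and the 8-bit unsigned reinterpretation is an assumption added to show that the same bit pattern can compare differently as signed and unsigned.

```python
def three_way_compare(a, b):
    """Return -1, 0, or 1 if a is less than, equal to, or greater than b."""
    return (a > b) - (a < b)


def as_unsigned(value, bits=8):
    """Reinterpret an integer as an unsigned value of the given bit width."""
    return value & ((1 << bits) - 1)


if __name__ == "__main__":
    print(three_way_compare(-1, 1))                             # signed view
    print(three_way_compare(as_unsigned(-1), as_unsigned(1)))   # unsigned view
```

Viewed as signed, -1 is less than 1; viewed as an 8-bit unsigned value, the same bits are 255, which is greater than 1.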

Improve the LLVM.org Website Look and Feel

Description: The llvm.org website serves as the central hub for information about the LLVM project. Over time, the website has evolved organically, prompting the need for a redesign to enhance its modernity, structure, and ease of maintenance. This project aims to create a modern static website that improves navigation, taxonomy, and usability, reflecting the essence of LLVM.org.
Expected Result: A modern, coherent website that improves navigation, content discoverability, mobile support, and accessibility. The project will involve community engagement to gather feedback and ensure a successful implementation.
Skills: Knowledge in web development with static site generators, HTML, CSS, Bootstrap, and Markdown.
Project Size: Large
Difficulty: Hard

Out-of-process execution for clang-repl

Description: The Clang compiler supports various languages such as C, C++, ObjC, and ObjC++. Clang-Repl is an interactive C++ interpreter that uses the ORCv2 JIT infrastructure to compile and execute code within the same process. However, this design has two significant drawbacks: it can’t be used on devices with limited resources, and crashes in user code crash the entire process. This project aims to move Clang-Repl to an out-of-process execution model to address these issues.
Expected Result: Implement out-of-process execution of statements with Clang-Repl. Demonstrate support for some ez-clang use cases. Research restart/continue approaches upon crashes. Stretch goal: Design versatile reliability approach for crash recovery.
Skills: Intermediate knowledge of C++, understanding of LLVM and the LLVM JIT in particular.
Project Size: Either medium or large.
Difficulty: Medium
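
The crash-isolation benefit of out-of-process execution can be illustrated without any Clang machinery. The sketch below uses a child Python interpreter as a stand-in for the executor process; it is an analogy only, and run_out_of_process is a hypothetical helper that has nothing to do with the Clang-Repl or ORC APIs.

```python
import subprocess
import sys


def run_out_of_process(code, timeout=10):
    """Run a Python snippet in a child interpreter.

    Returns (exit_code, stdout). A crash in the child shows up as a
    non-zero exit code instead of taking down the calling process.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    print(run_out_of_process("print(1 + 1)"))
    # os.abort() terminates the child abnormally; the host keeps running.
    print(run_out_of_process("import os; os.abort()")[0])
```

The same separation is what lets the executor live on a different, possibly resource-constrained, device while the compiler stays on the host.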

Support clang plugins on Windows

Description: Clang supports extending the compiler with plugins, enabling extra user-defined actions during compilation. While plugins work on Unix and Darwin, they are not supported on Windows due to platform differences. This project aims to expose the participant to a broad cross-section of the LLVM codebase, classifying APIs and implementing changes to support plugins on Windows.
Expected Result: Implement clang plugin support on Windows. Extend the working prototype and the annotation tool.
Skills: Intermediate knowledge of C++, experience with Windows compilation and linking model.
Project Size: Either medium or large.
Difficulty: Medium

On Demand Parsing in Clang

Description: Clang currently parses a translation unit linearly, processing characters in the order they appear. However, most end-user code uses only a small portion of the entire translation unit. This project proposes an on-demand parsing approach in which C++ entities that are expensive to compile are processed only when required. This approach aims to reduce peak memory usage and improve compile times for sparse translation units.
Expected Result: Design and implement on-demand compilation for non-templated functions and classes. Run performance benchmarks on relevant codebases and prepare a report. Prepare a community RFC document. Stretch goal: Support templates.
Skills: Knowledge of C++, deeper understanding of Clang, Clang AST and Preprocessor.
Project Size: Large
Difficulty: Hard
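
The on-demand idea can be sketched independently of Clang: keep each entity's source text and parse it only on first use. The example below uses Python's own ast module as a stand-in parser, and the classes LazyFunction and TranslationUnit are hypothetical names for this illustration, not Clang data structures.

```python
import ast


class LazyFunction:
    """Holds a function's source text and parses it only on first use."""

    def __init__(self, name, source):
        self.name = name
        self.source = source
        self._tree = None

    @property
    def parsed(self):
        return self._tree is not None

    def tree(self):
        if self._tree is None:
            # The expensive step is deferred until somebody asks for it.
            self._tree = ast.parse(self.source)
        return self._tree


class TranslationUnit:
    """Maps entity names to lazily parsed bodies."""

    def __init__(self, functions):
        self._functions = {
            name: LazyFunction(name, source) for name, source in functions.items()
        }

    def use(self, name):
        return self._functions[name].tree()

    def parsed_names(self):
        return sorted(n for n, f in self._functions.items() if f.parsed)


if __name__ == "__main__":
    unit = TranslationUnit({
        "used": "def used():\n    return 1\n",
        "unused": "def unused():\n    return 2\n",
    })
    unit.use("used")
    print(unit.parsed_names())
```

Only the entity that is actually referenced pays the parsing cost, which is where the memory and compile-time savings for sparse translation units would come from.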

Improve Clang-Doc Usability

Description: Clang-Doc is a C/C++ documentation generation tool created as an alternative to Doxygen. While it can generate documentation in Markdown and HTML, it has usability issues, lacks support for some constructs, and doesn’t scale well for large codebases. This project aims to improve Clang-Doc to the point where it can be used to generate documentation for large projects like LLVM.
Expected Result: Improve usability of Clang-Doc and resolve existing limitations. Enable Clang-Doc to generate documentation for large codebases like LLVM.
Skills: Experience with web technologies (HTML, CSS, JS), intermediate knowledge of C++. Experience with Clang/LibTooling is a bonus.
Project Size: Either medium or large.
Difficulty: Medium

Rich Disassembler for LLDB

Description: This project aims to annotate LLDB’s disassembler output with the location and lifetime of source variables using variable location information from debug info. This rich disassembler output should be exposed as structured data and made available through LLDB’s scripting API. In a terminal, LLDB should render the annotations as text.
Expected Result: Produce annotated disassembly showing variable lifetime and location. Expose rich disassembler output through LLDB’s scripting API.
Skills:

Required:

  • Good understanding of C++
  • Familiarity with using a debugger on the terminal
  • Familiarity with assembler dialects for machine code (x86_64 or AArch64)

Desired:

  • Compiler knowledge, including data flow and control flow analysis
  • Experience navigating debug information (DWARF)
Project Size: Medium (~175h)
Difficulty: Hard
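
One possible shape for the structured output of this project is sketched below: each instruction is paired with the variables whose location ranges cover its address, and a separate function renders that data as text. The LocationRange type, the field names, and the sample addresses and registers are all assumptions for illustration; they do not correspond to LLDB's scripting API or to real DWARF parsing.

```python
from dataclasses import dataclass


@dataclass
class LocationRange:
    variable: str
    start: int      # first address where the location is valid (inclusive)
    end: int        # first address where it is no longer valid (exclusive)
    location: str   # for example, a register name


def annotate(instructions, ranges):
    """Pair each (address, text) instruction with the variables live there."""
    annotated = []
    for address, text in instructions:
        live = [
            f"{r.variable} in {r.location}"
            for r in ranges
            if r.start <= address < r.end
        ]
        annotated.append({"address": address, "text": text, "variables": live})
    return annotated


def render(annotated):
    """Render the structured annotations as plain text for a terminal."""
    lines = []
    for row in annotated:
        suffix = f"  ; {', '.join(row['variables'])}" if row["variables"] else ""
        lines.append(f"{row['address']:#06x}: {row['text']}{suffix}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical disassembly and variable locations.
    instructions = [(0x1000, "mov eax, edi"), (0x1002, "add eax, 1"), (0x1005, "ret")]
    ranges = [
        LocationRange("x", 0x1000, 0x1002, "rdi"),
        LocationRange("result", 0x1002, 0x1006, "rax"),
    ]
    print(render(annotate(instructions, ranges)))
```

Keeping annotate and render separate mirrors the description: the structured data is the primary product, and the text view is one consumer of it.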

GPU Delta Debugging

Description: llvm-reduce and similar tools perform delta debugging, but they are less useful when many implicit constraints exist. This project aims to develop a GPU-aware version, especially for execution-time bugs, that can be used together with LLVM/OpenMP GPU record-and-replay or a GPU loader script to minimize GPU test cases more efficiently and effectively.
Expected Result: A tool to reduce GPU errors without losing the original error. Optionally, other properties could also be the focus of the reduction.
Skills: Good understanding of C++, familiarity with GPUs and LLVM-IR.
Project Size: Medium
Difficulty: Medium
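
For readers unfamiliar with delta debugging, the sketch below shows a simplified variant of the classic ddmin loop that repeatedly tries to drop chunks of the input while the failure persists. It is not how llvm-reduce is implemented; the function ddmin, the still_fails callback, and the sample "test case" are assumptions for this illustration, and a GPU-aware tool would replace the callback with an actual kernel replay.

```python
def ddmin(items, still_fails):
    """Shrink 'items' while still_fails(items) stays True.

    Simplified delta debugging: split the input into n chunks, try removing
    each chunk, and refine the granularity when no chunk can be removed.
    On return, no single remaining item can be removed.
    """
    if not still_fails(items):
        raise ValueError("the initial input must reproduce the failure")
    n = 2
    while len(items) >= 2:
        chunk = max(1, len(items) // n)
        subsets = [items[i:i + chunk] for i in range(0, len(items), chunk)]
        reduced = False
        for i in range(len(subsets)):
            # Keep everything except subset i.
            complement = [x for j, s in enumerate(subsets) if j != i for x in s]
            if still_fails(complement):
                items = complement
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n >= len(items):
                break  # already at single-item granularity
            n = min(n * 2, len(items))
    return items


if __name__ == "__main__":
    # Hypothetical test case: the bug needs both marked lines to be present.
    lines = ["setup", "alloc", "bad_store", "copy", "sync", "launch", "free"]

    def crashes(candidate):
        return "bad_store" in candidate and "launch" in candidate

    print(ddmin(lines, crashes))
```

The implicit constraints mentioned in the description would show up here as candidates that fail for an unrelated reason, which is why the predicate must check for the original error specifically.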

Offloading libcxx

Description: Modern C++ defines parallel algorithms as part of the standard library. This project aims to extend the implementation of these algorithms using OpenMP, including GPU offload where reasonable. The goal is to explore different algorithms and options for executing them on both the host and accelerator devices, especially GPUs, automatically via OpenMP.
Expected Result: Improvements to the prototype support of offloading in libcxx. Evaluations against other offloading approaches and documentation on missing parts and shortcomings.
Skills: Good understanding of C++ and C++ standard algorithms, familiarity with GPUs and (OpenMP) offloading.
Project Size: Large
Difficulty: Medium

The 1001 thresholds in LLVM

Description: LLVM has many thresholds and flags to avoid costly cases, but it is unclear whether these thresholds are useful or what impact they have. This project aims to explore these thresholds, identify when they are hit, and assess how we should select their values and whether different profiles are needed.
Expected Result: Statistical evidence on the impact of various thresholds inside LLVM’s codebase, including compile time changes, impact on transformations, and performance measurements.
Skills: Profiling skills and knowledge of statistical reasoning.
Project Size: Medium
Difficulty: Easy
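
One simple statistic for "when a threshold is hit" is the fraction of observed cases that exceed it, computed for several candidate values. The sketch below assumes such observations have already been collected somehow; the function names and the sample numbers are hypothetical and do not come from any LLVM measurement.

```python
def hit_rate(observed_costs, threshold):
    """Fraction of observations that exceed the threshold (would be cut off)."""
    if not observed_costs:
        return 0.0
    hits = sum(1 for cost in observed_costs if cost > threshold)
    return hits / len(observed_costs)


def sweep(observed_costs, candidates):
    """Compute the hit rate for each candidate threshold value."""
    return {t: hit_rate(observed_costs, t) for t in candidates}


if __name__ == "__main__":
    # Made-up cost observations for illustration only.
    costs = [10, 40, 90, 250, 300]
    for threshold, rate in sweep(costs, [50, 225, 500]).items():
        print(f"threshold {threshold}: hit in {rate:.0%} of cases")
```

A real study would pair each hit rate with compile-time and run-time measurements, since a threshold that is rarely hit may still matter a great deal in the cases where it is.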

Performance tuning the GPU libc

Description: Work has begun on a libc library targeting GPUs, allowing users to call functions like malloc or memcpy while executing on the GPU. The goal is to benchmark the implementations of certain libc functions on the GPU and write more optimal implementations.
Expected Result: In-depth performance analysis of libc functions. Measurements of the overhead of GPU-to-CPU remote procedure calls. More optimal implementations of libc functions.
Skills: Profiling skills and understanding of GPU architecture.
Project Size: Small
Difficulty: Easy

Improve GPU First Framework

Description: GPU First is a methodology and framework that enables existing host code to execute on a GPU without user modifications. The project aims to port host code to handle RPC and explore support for MPI among multiple thread blocks on a single GPU or multiple GPUs.
Expected Result: A more efficient GPU First framework that supports both NVIDIA and AMD GPUs. Optionally, upstream the framework.
Skills: Good understanding of C++ and GPU architecture, familiarity with LLVM IR.
Project Size: Medium
Difficulty: Medium

Compile GPU kernels using ClangIR

Description: The ClangIR project aims to establish a new intermediate representation (IR) for Clang built on top of MLIR. This project focuses on identifying and implementing missing features in ClangIR to compile GPU kernels in OpenCL C language to LLVM-IR for the SPIR-V target.
Expected Result: Polybench-GPU’s 2DCONV, GEMM, and CORR OpenCL kernels can be compiled with ClangIR to LLVM-IR for SPIR-V.
Skills: Intermediate C++ programming skills and familiarity with basic compiler design concepts are required. Prior experience with LLVM IR, MLIR, Clang, or GPU programming is a plus.
Project Size: Large
Difficulty: Medium

Half precision in LLVM libc

Description: Half precision is an IEEE 754 floating-point format standardized as _Float16 in C23. The goal of this project is to implement C23 half precision math functions in the LLVM libc library.
Expected Result: Set up generated headers for various compilers and architectures. Implement basic math operations for half precision data types. Implement optimizations using compiler builtins or hardware instructions. Investigate higher math functions for half precision if time permits.
Skills: Intermediate C++ programming skills and familiarity with basic compiler design concepts are required. Prior experience with LLVM IR, MLIR, Clang, or GPU programming is a plus.
Project Size: Large
Difficulty: Easy/Medium
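
For orientation, the IEEE 754 half precision (binary16) layout is 1 sign bit, 5 exponent bits with a bias of 15, and 10 fraction bits. The sketch below encodes and decodes that layout using Python's struct module, whose "e" format code is binary16; the helper names are made up for this example, and nothing here reflects how LLVM libc implements these types.

```python
import struct


def to_half_bits(value):
    """Round a Python float to binary16 and return its 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def decode_half(bits):
    """Decode a binary16 bit pattern: 1 sign, 5 exponent, 10 fraction bits."""
    sign = -1.0 if bits >> 15 else 1.0
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF
    if exponent == 0x1F:
        # All-ones exponent encodes infinity (fraction 0) or NaN.
        return sign * float("inf") if fraction == 0 else float("nan")
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed scale of 2**-24.
        return sign * fraction * 2.0 ** -24
    return sign * (1 + fraction / 1024) * 2.0 ** (exponent - 15)


if __name__ == "__main__":
    print(hex(to_half_bits(1.0)))
    print(decode_half(0x7BFF))             # largest finite half value
    print(decode_half(to_half_bits(0.1)))  # 0.1 is not exactly representable
```

With only 10 fraction bits, most decimal values are rounded on conversion, which is one reason the math functions need implementations tuned for this format rather than reusing float results unchanged.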