This repository contains an up-to-date list of thesis topics I (Peter Thoman)
am currently interested in supervising as part of the CS bachelor and master programs at UIBK.
Contact me if you are interested in any of these.
Note that, while these topics are grouped into Bachelor and Master levels, it's certainly possible for sufficiently motivated bachelor students to attempt theses at the Master level. The opposite may also be possible if you have ideas on how to substantially extend a given topic.
- Bachelor Thesis Topics
  - Reverse-engineering Thread Communication Patterns in Windows Binaries/Games
  - Adding Dependency Tracking and Visualization to the Tracy Profiler
  - Intel Level-0 Backend for Celerity
  - AMD ROCm Backend for Celerity
  - A CUDA Interoperability API for Celerity
  - SimSYCL: A SYCL Implementation for Development, Testing and Simulation
- Master Thesis Topics
On the Windows operating system, applications are frequently distributed as binaries, with no access to source code provided. With modern CPUs frequently featuring multiple groups of hardware cores with distinct interconnect latencies, understanding the communication patterns between threads in such binaries can be crucial for optimizing their performance via automated thread placement.
The goal of this thesis is to develop a tool which can analyze the communication patterns between threads in a Windows binary, and ideally provide a visualization -- or at least textual representation -- of these patterns. The tool should be evaluated on a set of common Windows applications, and in particular games, which are known to be sensitive to thread placement.
The basic idea is to intercept various Windows API calls related to the creation of threads and synchronization primitives, provide wrapper structures for these, and then track their use in order to infer communication patterns. While very different in approach, this earlier thesis on causal game profiling can serve as a reference for some injection aspects.
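As a rough illustration of the tracking step only, the following sketch assumes that interception already delivers a thread ID and an identifier for the synchronization primitive on every use. All names (`communication_tracker`, `record_use`, `infer_edges`) are hypothetical, and the edge weight heuristic is just one possible choice, not a prescribed method.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <utility>

// Hypothetical identifiers: in a real tool these would come from intercepted
// Windows API calls (thread IDs, and handles or addresses of primitives).
using thread_id = std::uint32_t;
using sync_object_id = std::uint64_t;
using thread_pair = std::pair<thread_id, thread_id>;

class communication_tracker {
  public:
    // Record that `thread` used the synchronization object `object`,
    // e.g. acquired a lock or waited on / signalled an event.
    void record_use(thread_id thread, sync_object_id object) {
        ++uses_[object][thread];
    }

    // Infer an undirected communication graph. Two threads are connected if
    // they used the same object. The weight (an assumed heuristic) is the sum
    // over all shared objects of the smaller of the two use counts.
    std::map<thread_pair, std::uint64_t> infer_edges() const {
        std::map<thread_pair, std::uint64_t> edges;
        for (const auto& object_entry : uses_) {
            const auto& per_thread = object_entry.second;
            for (auto a = per_thread.begin(); a != per_thread.end(); ++a) {
                for (auto b = std::next(a); b != per_thread.end(); ++b) {
                    // std::map iterates in key order, so each pair is
                    // stored as (smaller ID, larger ID).
                    assert(a->first < b->first);
                    edges[thread_pair(a->first, b->first)] +=
                        std::min(a->second, b->second);
                }
            }
        }
        return edges;
    }

  private:
    // Per object: how often each thread used it.
    std::map<sync_object_id, std::map<thread_id, std::uint64_t>> uses_;
};
```

A wrapper around an intercepted call would invoke `record_use` before forwarding to the original function; the resulting edge map could then feed a textual report or a graph visualization.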
Prerequisites:
- Solid C and C++ knowledge
- Some experience with Windows API programming is advantageous
- Some experience with reverse-engineering is advantageous
Tracy is a modern hybrid tracing/sampling real-time profiler. It captures a timeline of zones, events and additional information which can be inspected in detail in a high-performance UI easily capable of handling millions of events.
For our use case in the Celerity project, it would be very useful to have a way to track dependencies between different zones and events, and allow their visualization and analysis in the Tracy UI.
The goals of this thesis are to design an API to provide dependency information to Tracy from the application, add it to the trace data, and implement a visualization for it in the Tracy UI. Special care should be taken to ensure that the generally high performance of Tracy is not compromised by the new features, and they should be pay-for-what-you-use as much as possible. Ideally, this new feature should be integrated into the upstream Tracy project.
Prerequisites:
- Solid C and C++ knowledge
- An understanding of the basics of application profiling
- Some experience with UI development is advantageous
Celerity is a C++ API and runtime system for programming multi-GPU applications, primarily for GPU clusters. It is based closely on the industry-standard Khronos SYCL API.
It features a basic, general SYCL execution backend, and a more optimized (but vendor-specific) CUDA backend. The goal of this thesis is to implement, test and evaluate a new Intel Level-0 backend for Celerity. Primarily, this will require work on platform-specific features such as strided data transfers.
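To make "strided data transfer" concrete, here is a minimal host-side reference sketch of a 2D strided copy. The function name and parameter order are assumptions chosen for illustration (loosely following the shape of CUDA's `cudaMemcpy2D`); a backend would replace the loop with the platform's native copy primitive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies `rows` rows of `row_bytes` bytes each. Consecutive rows start
// `src_stride` / `dst_stride` bytes apart, so either side may be a
// sub-block of a larger 2D allocation.
inline void strided_copy_2d(void* dst, std::size_t dst_stride,
                            const void* src, std::size_t src_stride,
                            std::size_t row_bytes, std::size_t rows) {
    // A stride smaller than the row length would make rows overlap.
    assert(dst_stride >= row_bytes && src_stride >= row_bytes);
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t r = 0; r < rows; ++r) {
        std::memcpy(d + r * dst_stride, s + r * src_stride, row_bytes);
    }
}
```

Such a reference version can also serve as an oracle when testing a device implementation against known inputs.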
Development and testing infrastructure will be made available by the DPS group.
Prerequisites:
- Solid C++ knowledge
- Some prior understanding of GPU/accelerator computing is advantageous, but not strictly necessary
Celerity is a C++ API and runtime system for programming multi-GPU applications, primarily for GPU clusters. It is based closely on the industry-standard Khronos SYCL API.
It currently features a basic, general SYCL execution backend, and a more optimized (but vendor-specific) CUDA backend. The goal of this thesis is to implement, test and evaluate a new AMD ROCm backend for Celerity. Primarily, this will require work on platform-specific features such as strided data transfers.
Development and testing infrastructure will be made available by the DPS group.
Prerequisites:
- Solid C++ knowledge
- Some prior understanding of GPU/accelerator computing is advantageous, but not strictly necessary
Celerity is a C++ API and runtime system for programming multi-GPU applications, primarily for GPU clusters. It is based closely on the industry-standard Khronos SYCL API.
SYCL features an interoperability API which enables calling an underlying, usually more hardware- or vendor-specific API (such as CUDA) from a SYCL program. Celerity does not currently feature a similar API.
The goal of this thesis is to identify the additional requirements for an effective interoperability API in the Celerity case (where the runtime system controls data placement), define this API, and provide an implementation for CUDA. The project should be evaluated on a set of test codes which call various common third-party APIs (e.g. cuBLAS) from within a Celerity program.
Prerequisites:
- Solid C++ knowledge
- Some prior understanding of GPU/accelerator computing or CUDA is advantageous, but not strictly necessary
The Khronos SYCL API has several implementations, but they are primarily designed for high performance on specific hardware. They often require heavy compilation infrastructure, which increases development times, and their performance focus can make debugging harder. SimSYCL is focused on fast compilation, development and testing. It functions as a library-only, single-threaded implementation of the SYCL standard with minimal dependencies.
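The idea of a library-only, single-threaded execution model can be sketched in a few lines. This is a toy illustration, not SimSYCL's actual code: the types `range1` and `id1` and both functions are hypothetical one-dimensional stand-ins, and the grouped variant does not attempt to support barriers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for SYCL's range/id types, reduced to one dimension.
struct range1 { std::size_t size; };
struct id1 { std::size_t value; };

// "Parallel" for that runs every work item in order on the calling thread.
template <typename Kernel>
void parallel_for(range1 r, Kernel kernel) {
    for (std::size_t i = 0; i < r.size; ++i) {
        kernel(id1{i});
    }
}

// Grouped variant: work groups run one after another, and the items of
// each group run sequentially. Barriers are not modelled in this sketch.
template <typename Kernel>
void parallel_for_groups(std::size_t global_size, std::size_t group_size,
                         Kernel kernel) {
    assert(group_size > 0 && global_size % group_size == 0);
    for (std::size_t group = 0; group < global_size / group_size; ++group) {
        for (std::size_t local = 0; local < group_size; ++local) {
            kernel(group, local);
        }
    }
}
```

Because everything is an ordinary function template, a kernel can be stepped through in a regular debugger and compiled by any standard C++ compiler, which is the development convenience described above.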
The goal of this thesis is to extend the SYCL standard coverage of SimSYCL, and implement additional features to support development use cases. The specific features can be discussed in person.
Prerequisites:
- Solid C++ knowledge
- Some prior understanding of GPU/accelerator computing is advantageous, but not strictly necessary
Tracy is a modern hybrid tracing/sampling real-time profiler. It captures a timeline of zones, events and additional information which can be inspected in detail in a high-performance UI easily capable of handling millions of events.
Currently, Tracy is primarily designed for profiling single processes running on one machine. The goal of this thesis is to design and implement improved support for distributed memory applications in Tracy. This could be achieved either by a full redesign which handles data from multiple nodes in a single UI instance, or by providing a better integration for inspecting several communicating processes in individual instances.
Prerequisites:
- Solid C and C++ knowledge
- An understanding of the basics of application profiling
- Some experience with distributed memory programming (i.e. HPC)
Celerity is a C++ API and runtime system for programming multi-GPU applications, primarily for GPU clusters. It is based closely on the industry-standard Khronos SYCL API.
The goal of this thesis is to design, develop and evaluate a plugin for the Clang/LLVM compiler infrastructure which identifies common error patterns. Some of these patterns can be recognized at the library level, which Celerity attempts (e.g. warnings about specific types of capturing); others, however, require deeper static analysis.
Examples:
- Mismatches between the specified access mode of buffers and the actual accesses performed
- Invalid reference captures in host command groups
Prerequisites:
- Solid C and C++ knowledge
- Some experience with compiler development is advantageous
Sylkan is a prototype implementation of the Khronos SYCL API targeting any device which supports Vulkan compute.
The goal of this thesis is to update Sylkan towards newer versions of the SYCL standard, while also developing support for a variety of missing features. The most important of these is local storage, which should be mapped to the equivalent Vulkan features.
A comprehensive evaluation across several different target hardware platforms should also be performed, with a particular focus on comparing the performance to existing SYCL implementations on platforms which support both Vulkan and other SYCL backends.
Development and testing infrastructure will be made available by the DPS group.
Prerequisites:
- Solid C++ knowledge
- Some prior understanding of GPU/accelerator computing is advantageous
- Some experience with compiler development is advantageous
In the Celerity system, programmers are required to specify the data dependencies of compute kernels via a mechanism known as Range Mappers. While some higher-level abstractions exist to simplify this task, it still, in principle, represents a duplication of information which already exists within the computation itself -- and like any such duplication, it introduces a productivity overhead in development and maintenance.
The goal of this thesis is to research, design and implement an algorithm which statically analyzes kernels, and implicitly derives the required range mappers from this analysis. Note that the expectation is that this will only be possible for a subset of kernels fulfilling certain regularity criteria.
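For a regular kernel, the derivation can be pictured as follows. The sketch assumes a hypothetical analysis result in which a 1D kernel reads `buffer[i + o]` for a fixed set of constant offsets `o`; from that, the buffer region needed by any chunk of the index space follows directly. The names `interval` and `derive_region` are illustrative and not part of Celerity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open 1D interval [begin, end), used for kernel chunks and buffer regions.
struct interval {
    std::ptrdiff_t begin;
    std::ptrdiff_t end;
};

// Given the constant access offsets found by a (hypothetical) static analysis,
// compute the buffer region a chunk of the kernel index space requires.
// The result is clamped to the buffer bounds [0, buffer_size).
inline interval derive_region(interval chunk,
                              const std::vector<std::ptrdiff_t>& offsets,
                              std::ptrdiff_t buffer_size) {
    assert(!offsets.empty() && chunk.begin < chunk.end);
    const auto mm = std::minmax_element(offsets.begin(), offsets.end());
    const std::ptrdiff_t lo = chunk.begin + *mm.first;  // lowest index touched
    const std::ptrdiff_t hi = chunk.end + *mm.second;   // one past the highest
    return {std::max<std::ptrdiff_t>(lo, 0), std::min(hi, buffer_size)};
}
```

With offsets `{-1, 0, 1}` this yields a neighborhood-style mapping, and with the single offset `{0}` it yields a one-to-one mapping. Kernels with data-dependent indices have no such constant offset set, which is one reason only a subset of kernels is expected to be analyzable.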
Prerequisites:
- Solid C and C++ knowledge
- Some experience with compiler development is advantageous