These projects are merely suggestions that illustrate the broad themes of the research group. Prospective research students are encouraged and welcome to propose their own projects in these or other research areas in the field. All applicants are invited to contact the academic associated with a project when making an application.
A well-founded tool for well-behaved Web APIs
Contact: Laura Bocchi
In this project you will develop a well-founded tool to support the design and development of Web APIs, and REST APIs in particular. Common specifications for Web APIs (e.g., using the OpenAPI standard) describe which resources a server exposes, but not the order in which calls to them must be made. In many real-world scenarios, a server may need to establish and enforce an ordering for calls to the APIs it provides. For example, one may want to enforce that “payment” happens after “authentication” and before “shipment”. These causalities are, at present, expressed in natural language in the documentation of an API, making their understanding and implementation error-prone. This project has a theoretical component and a practical component (the balance is negotiable depending on the applicant’s skills and interests). On the theoretical side, you will extend existing API specifications with behavioural specifications. One open problem is ensuring the correct composition of behavioural specifications pertaining to different aspects (e.g., security, application logic). On the practical side, you will develop a tool that uses behavioural specifications to automatically generate stubs and test suites, ensuring and verifying that the desired causalities are met.
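As a rough illustration, a behavioural specification of the ordering constraints above could be a small protocol term layered over the OpenAPI description. The Haskell sketch below is hypothetical (not an existing standard or tool) and only shows the flavour of such a specification:

```haskell
-- Hypothetical sketch: a behavioural layer over an OpenAPI-style spec,
-- recording the order in which endpoints may be called.
data Protocol
  = Call String Protocol   -- invoke the named endpoint, then continue
  | Choice [Protocol]      -- the client may follow any one branch
  | Done                   -- the interaction ends

-- "payment" happens after "authentication" and before "shipment":
checkout :: Protocol
checkout =
  Call "POST /authenticate"
    (Call "POST /payment"
      (Call "POST /shipment" Done))
```

A stub generator could then reject calls that arrive out of order, while a test-suite generator could enumerate exactly the call sequences the protocol permits.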
Modular Omniscient Debugging of Haskell Programs
Contact: Olaf Chitil
An omniscient debugger records a trace during program execution and provides a viewing tool to navigate through the execution backwards and forwards. This project aims to build a truly omniscient debugger for Haskell programs that provides free navigation through the execution along different dependencies, in particular to follow the origin of any (incorrect) part of a data structure. The debugger shall be modular in two senses: 1) The debugger will be separate from the compiler and virtual machine, and thus independent of their evolution, e.g. that of the Glasgow Haskell Compiler. 2) Debugged programs will consist of both traced and untraced modules, for several reasons: it limits the size of the trace, the untraced (majority of the) code executes at normal speed, and it allows the inclusion of modules written in unsupported languages. The latter requires that untraced modules are used unchanged, without any instrumentation. The new debugger will combine methods from earlier debuggers: Hat, Hoed and HatLight. In particular, substantial parts of Hat’s implementation can be reused. Hat’s (https://archives.haskell.org/projects.haskell.org/hat/) instrumentation of all program modules, including untraced ones, is quite complex; hence it was never extended beyond a subset of Haskell 98. In contrast, Hoed’s instrumentation is much simpler and untraced modules are unchanged, but the trace is limited to the algorithmic debugging method. Finally, HatLight is a small prototype recording Hat’s trace using Hoed’s instrumentation method.
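To give a flavour of what such a trace enables, the sketch below (illustrative names only, not the Hat or Hoed APIs) records which earlier reduction steps produced the inputs of each step, so an incorrect output value can be followed backwards to its origin:

```haskell
-- A minimal sketch of the kind of trace an omniscient debugger records:
-- every reduction step notes which earlier steps produced its arguments.
data Event = Event
  { eventId :: Int
  , redex   :: String   -- textual form of the expression that was reduced
  , value   :: String   -- textual form of the result
  , parents :: [Int]    -- identifiers of events whose results this one consumed
  }

-- Follow the provenance of one (e.g. incorrect) value backwards through the trace.
provenance :: [Event] -> Int -> [Event]
provenance trace i =
  case filter ((== i) . eventId) trace of
    []      -> []
    (e : _) -> e : concatMap (provenance trace) (parents e)
```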
Delta Debugging of Static Errors in Rust
Contact: Olaf Chitil
Delta debugging is a method for determining the parts of a program (or program input) that are most relevant for a given observed bug; it is thus a method that helps to localise the cause of a bug. Delta debugging systematically checks variants of a program (or program input) for the presence of the bug. It has mostly been used to localise in a program the cause of a runtime error; my former PhD student Jo Sharrad applied delta debugging to localise the cause of a type error in a Haskell (or OCaml) program. The systems programming language Rust has a rich type system that eliminates many classes of memory-safety and thread-safety bugs at compile time. Consequently, its constraint-based type system is complex and still evolving, and compiler error messages can be difficult to understand. The hypothesis of this research project is that delta debugging can help the Rust programmer to understand why the compiler rejects a specific program.
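A heavily simplified sketch of the idea is shown below in Haskell. Real delta debugging (e.g. Zeller's ddmin) also tries complements and increases granularity; here the predicate `failing` is a placeholder for running the Rust compiler on a candidate program and checking whether the error of interest is still reported:

```haskell
-- Simplified delta debugging over a program split into chunks
-- (e.g. top-level items of a Rust source file).
minimise :: ([chunk] -> Bool) -> [chunk] -> [chunk]
minimise failing chunks
  | length chunks <= 1 = chunks
  | failing left       = minimise failing left    -- error survives in first half
  | failing right      = minimise failing right   -- error survives in second half
  | otherwise          = chunks                   -- neither half alone reproduces it
  where
    (left, right) = splitAt (length chunks `div` 2) chunks
```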
Characteristics, Performance, and Benchmarking of Huge Code Bases
Contact: Stefan Marr
A growing number of internet companies develop applications that are millions of lines of code in size. Many are too large to be compiled and optimized with traditional techniques. Their sheer size, and their rate of change of multiple times per hour, bring whole new classes of problems when it comes to executing such applications efficiently. Thus, research is needed to better understand how such applications work, how we could optimize their execution with interpreters or compilers, and how to create benchmarks that can be freely shared with other researchers.
Towards Automatic Mitigation of Concurrency Bugs and Better Debugging Tools
Contact: Stefan Marr
Modern programming languages offer concurrency abstractions that avoid the low-level issues of basic threads. Still, certain types of concurrency bugs will always be possible, but there are techniques for detecting them at run time. With such techniques we can prevent bugs from causing harm. Building on execution tracing, execution replay, and heap access detection, we could automatically prevent a whole class of bugs. Of course, one would eventually still want to fix them in the application as well. Thus, we want to research how to make detection techniques fast enough for use in production applications, and effective enough to guide developers to the root cause and possible bug fixes.
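As a rough sketch of what heap-access detection involves, the Haskell fragment below flags pairs of accesses that conflict (same location, different threads, at least one write). A real detector would additionally consult a happens-before relation reconstructed from tracing and replay; the types and names here are illustrative only:

```haskell
import Data.List (tails)

-- One recorded heap access, as a detector might log it.
data Access = Access { threadOf :: Int, location :: Int, isWrite :: Bool }

-- Two accesses conflict if they touch the same location from different
-- threads and at least one of them is a write.
conflicting :: Access -> Access -> Bool
conflicting a b =
  threadOf a /= threadOf b
    && location a == location b
    && (isWrite a || isWrite b)

-- All conflicting pairs in a recorded trace (ignoring synchronisation).
potentialRaces :: [Access] -> [(Access, Access)]
potentialRaces trace =
  [ (a, b) | (a : rest) <- tails trace, b <- rest, conflicting a b ]
```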
Enhancing Expressivity of Type Theories for Productive Coprogramming
Contact: Marco Paviotti
Are you a mathematician or a computer scientist with a particular inclination towards Functional Programming and/or Category Theory? Then this Ph.D. topic is for you!
The Computational Trinity is a deep insight, dating back to the ’70s, that computer science, logic and category theory are three views of the same phenomenon. In particular, the notation f : A → B describes at the same time a program that takes as input something of type A and returns a result of type B; a proof that A implies B; and a function from a certain “set” A to a “set” B. This is known as the Curry-Howard correspondence, under the slogan proofs-as-programs and formulas-as-types. The advantage of this approach is that it allows programming languages to be formal systems in which one can develop programs and prove properties about them at the same time. However, one restriction posed by this setup is that the programs under consideration must terminate or, in mathematical terms, that self-referential definitions must be well-defined. In other words, the type system has to be consistent when viewed as a logic. In general, termination is an undecidable property to check for a given program, but it is still possible to enforce it for a restricted class of programs. This project focuses on the study of type systems that ensure termination and, in particular, on combining two techniques for doing so, namely structured recursion schemes and synthetic guarded domain theory. The former are structured recursion operators used in functional programming and inspired by the categorical concepts of algebras and coalgebras. The latter is the internal language of the topos of trees, the category of presheaves over the first infinite ordinal.
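For a concrete flavour of structured recursion schemes, the Haskell sketch below defines the list fold (a catamorphism). Because recursion follows the structure of the input, every function written with it terminates on finite inputs, without the programmer writing explicit (and potentially non-terminating) recursion:

```haskell
-- The catamorphism (fold) for lists: recursion is driven by the shape of
-- the input, so termination on finite lists is guaranteed by construction.
cata :: (a -> b -> b) -> b -> [a] -> b
cata _ nil []       = nil
cata f nil (x : xs) = f x (cata f nil xs)

-- Example uses, written without any explicit recursion:
sumList :: [Int] -> Int
sumList = cata (+) 0

lengthList :: [a] -> Int
lengthList = cata (\_ n -> 1 + n) 0
```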
Cost analysis of higher-order probabilistic programs
Contact: Vineet Rajani
Randomness plays an important role in a variety of approaches used in the theory of machine learning. For example, algorithms like stochastic gradient descent and prediction with expert advice make crucial use of randomness to make decisions that minimize the expected loss of the algorithm (either absolute or relative to the best decision in hindsight). This is often done by giving upper bounds on such an expected loss. However, computing and verifying such bounds can be both tedious and error-prone, often requiring principles of cost amortization along with probabilistic reasoning. As a result, there is a need for a logical framework that can be used to formally verify such bounds. The goal of this project is to develop both the theory and practice of such a logical framework. In particular, we will build a higher-order probabilistic language with a type system such that well-typed programs in this language always adhere to their expected cost bounds.
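To illustrate the quantity being bounded, the small Haskell sketch below (illustrative types, not the project's language) represents a finite distribution and computes the expected cost of a randomized choice:

```haskell
-- A finite probability distribution: outcomes paired with probabilities.
newtype Dist a = Dist [(a, Double)]

-- Expected cost: weight each outcome's cost by its probability.
expectedCost :: Dist a -> (a -> Double) -> Double
expectedCost (Dist outcomes) cost =
  sum [ p * cost x | (x, p) <- outcomes ]

-- Example: a fair coin whose heads branch costs 1 step and whose tails
-- branch costs 3 steps has expected cost 0.5 * 1 + 0.5 * 3 = 2.
coinCost :: Double
coinCost = expectedCost (Dist [(True, 0.5), (False, 0.5)])
                        (\heads -> if heads then 1 else 3)
```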
Type-based verification for differential privacy
Contact: Vineet Rajani
Enormous amounts of data are collected every day, and this data could be extremely useful in making many automated decisions. However, often this data cannot be released or used because of privacy concerns. Prior work has already shown that simple aggregation and anonymization are not sufficient to prevent privacy loss for individuals. Differential privacy is a promising approach to this problem, offering a statistical guarantee of an individual’s privacy. Intuitively, a mechanism is differentially private if the probability of obtaining a given result with (or without) an individual’s data is almost the same. This limits the amount of information that can be learned about that individual. The aim of this project is to develop a type-theoretic framework for analysing differential privacy. We will use ideas from cost analysis and information flow control, two well-studied but very different domains in formal verification.
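As a concrete (if simplified) example of a differentially private mechanism, the Haskell sketch below implements the classic Laplace mechanism for a counting query, whose sensitivity is 1: adding Laplace noise with scale 1/ε yields ε-differential privacy. The code is purely illustrative and not part of any proposed framework:

```haskell
import System.Random (randomRIO)

-- Sample from Laplace(0, scale) by inverse transform sampling.
laplace :: Double -> IO Double
laplace scale = do
  u <- randomRIO (-0.5, 0.5)
  pure (negate (scale * signum u * log (1 - 2 * abs u)))

-- An epsilon-differentially private count over a database of records;
-- the sensitivity of a counting query is 1, hence noise scale 1/epsilon.
privateCount :: Double -> (a -> Bool) -> [a] -> IO Double
privateCount epsilon predicate db = do
  noise <- laplace (1 / epsilon)
  pure (fromIntegral (length (filter predicate db)) + noise)
```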
Fine-grained parallelism in compilation
Contact: Michael Vollmer
Often software engineers do not simply build lone applications; they build entire software ecosystems. Companies like Google and Facebook rely on a single, massive code repository (a monorepo) for the company’s source code, with the implication that a single change may trigger all downstream code to rebuild and retest. Similarly, open source software package collections like NixOS rely on build farms that frequently rebuild the full set of packages. To make this practical, compilation must be parallel. The most common way to make software build jobs parallel is to process different source code files in parallel whenever possible (this is what “make -j” does), but this coarse-grained parallelism is limited, as big parallel build jobs end up spending a lot of their time bottlenecked on sequential compilation tasks. Ultimately, we need to go further: we need both coarse-grained (between files) and fine-grained (within a file) parallelism. To achieve the latter, the compiler passes themselves must be made parallel, and that is what this project is about. Taking on this project will require some background in parallel or concurrent programming, and ideally some experience with compilers or programming language implementation.
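A first approximation of within-file parallelism, assuming a pure, per-definition compiler pass (the function `optimiseDef` below is a placeholder), can be sketched with GHC's evaluation strategies from the `parallel` package:

```haskell
import Control.Parallel.Strategies (parMap, rdeepseq)
import Control.DeepSeq (NFData)

-- Run a pure per-definition pass over all top-level definitions of a
-- module in parallel, fully evaluating each result.
compileModule :: NFData def => (def -> def) -> [def] -> [def]
compileModule optimiseDef defs = parMap rdeepseq optimiseDef defs
```

Real compiler passes are rarely this embarrassingly parallel, which is precisely where the research lies: dependencies between definitions, shared symbol tables, and effects in the pass pipeline all get in the way.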
Using types to specify precise, byte-level data representation of objects in programming languages
Contact: Michael Vollmer
It is common for high-level programming languages to have some mechanisms for allocating objects in memory, and then for connecting and organising these objects into larger structures. For example, languages like C++ and Java support *classes*, and languages like Haskell and ML support *algebraic data types*. Of those four languages, only C++ offers any significant control over the actual memory layout of objects or how and when they are allocated and de-allocated. Unlike C++, the other languages promise *type safety*, and so, the conventional wisdom goes, they must disallow fine-grained control over the details of memory representation. However, safe systems programming languages like Rust have shown that it is practical to guarantee type safety while still exposing some low-level details of memory usage. This project aims to continue along those lines and explore how a high-level programming language might give programmers control over the *exact, byte-level representation* of data in their programs while using types to ensure that the resulting program is *memory-safe*. Taking on this project will require some exposure to low-level systems programming in either C/C++ or Rust, as well as an interest in either programming language design or compiler implementation.
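For contrast, the Haskell sketch below shows what precise byte-level layout currently requires: an unchecked, hand-written Storable instance with explicit offsets. The aim of the project is for types to express such layouts so that the compiler can generate and verify this kind of code while keeping the program memory-safe:

```haskell
import Foreign.Storable (Storable (..))
import Data.Word (Word8)

-- A 4-byte RGBA pixel whose exact layout we spell out by hand.
data Pixel = Pixel { red :: Word8, green :: Word8, blue :: Word8, alpha :: Word8 }

instance Storable Pixel where
  sizeOf _    = 4
  alignment _ = 1
  peek p = Pixel <$> peekByteOff p 0 <*> peekByteOff p 1
                 <*> peekByteOff p 2 <*> peekByteOff p 3
  poke p (Pixel r g b a) = do
    pokeByteOff p 0 r
    pokeByteOff p 1 g
    pokeByteOff p 2 b
    pokeByteOff p 3 a
```

Nothing checks that these offsets match the intended layout, or even that they stay within the 4 bytes claimed by `sizeOf`; a type-level layout description could rule out both kinds of mistake.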
The semantics of concurrent systems
Contact: Mark Batty
The team here at Kent is working to understand the behaviour of modern concurrent systems, and we have shown, perhaps surprisingly, that humanity does not yet understand the concurrent behaviour of mainstream computer systems very well at all. We have a broad goal of verifying code written above a typical stack comprising a mainstream CPU (ARM, x86), a mainstream compiler (GCC, LLVM) and a mainstream operating system (Linux). Our work has found and fixed bugs in this stack, in particular making changes to the ISO C and C++ standards. We have found major problems in optimised concurrent language design (C, C++, Java) that have remained open for a decade, but we are well on our way to a solution, albeit with many things left to do. We use various methods in this work (typically one person has one or two of the abilities listed here): (1) We write code to simulate the concurrency designs expressed in specifications. (2) We write mathematical expressions of these designs. (3) We use hand proof and mechanised proof (Coq/HOL/Isabelle) to establish valuable properties of the designs. (4) We reverse engineer parts of the system to understand aspects that are described incompletely by specifications. (5) We build logics for reasoning above the systems we have, rather than above idealised models. (6) We dig into the internals of compilers. (7) We work with prototype hardware designs from project partners like ARM, to help understand their upcoming designs. We are looking for new PhD students who would like to contribute to this project. We have the capacity to build skills that you don’t yet have, but we hope that you will be driven to understand the true behaviour of modern computer systems.
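As an example of the subtlety involved, the classic “store buffering” litmus test is sketched below in Haskell purely for illustration. Under a naive interleaving of the two threads at least one of the reads must see 1, yet for the analogous machine-level program both x86 and ARM permit the outcome in which both reads return 0:

```haskell
import Data.IORef
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

-- Store buffering: each thread writes one variable and reads the other.
storeBuffering :: IO (Int, Int)
storeBuffering = do
  x <- newIORef 0
  y <- newIORef 0
  done1 <- newEmptyMVar
  done2 <- newEmptyMVar
  _ <- forkIO $ do writeIORef x 1; readIORef y >>= putMVar done1
  _ <- forkIO $ do writeIORef y 1; readIORef x >>= putMVar done2
  (,) <$> takeMVar done1 <*> takeMVar done2
```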
Cost-Aware Behavioural Types for Predictable Communicating Systems
Contact: David Castro-Perez
In this project, you will develop theories and tools for statically guaranteeing that concurrent and distributed systems can run within specified resource bounds. This work will be based on behavioural types, a typing discipline that classifies programs not just in terms of the results of their computations, but also in terms of their allowed sequences of interactions. Behavioural types have been successfully used to verify functional properties (e.g. correctness) of concurrent and distributed systems. This project aims to augment theories and tools based on behavioural types so that they also guarantee extra-functional properties of communicating systems (e.g. the cost, or resources, needed to run a system).
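To illustrate the idea (the syntax below is a hypothetical Haskell sketch, not an existing tool), a behavioural type could annotate each communication action with a resource bound, from which an overall budget follows:

```haskell
type Cost = Int   -- e.g. time units or bytes, depending on the resource

-- A behavioural type where every action carries an upper bound on its cost.
data CostedType
  = Send String Cost CostedType   -- send a message of the named sort
  | Recv String Cost CostedType   -- receive a message of the named sort
  | End

-- A client that sends a request and waits for a reply.
requestReply :: CostedType
requestReply = Send "request" 2 (Recv "reply" 5 End)

-- The total budget is the sum of the per-action bounds.
totalCost :: CostedType -> Cost
totalCost (Send _ c k) = c + totalCost k
totalCost (Recv _ c k) = c + totalCost k
totalCost End          = 0
```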
Certified Domain Specific Languages for Concurrent/Distributed Systems
Contact: David Castro-Perez
This project is about mechanising domain-specific languages for implementing and/or verifying communicating systems in proof assistants. There are currently many theories and tools that can be used to guarantee that communicating systems behave according to some specification. But can we trust those theories and tools? There are examples in the literature of subtle changes to a theory that have introduced inconsistencies. The possibility of such errors threatens to invalidate large bodies of work (e.g. extensions of the inconsistent theory, or tools based on it). The aim of this project is to build certified languages and tools for implementing and/or verifying concurrent and distributed systems. This work will be incorporated into our existing framework in the Coq proof assistant.
Multi-stage semantics and typing via graded modalities
Contact: Dominic Orchard
Staged computation (otherwise known as meta-programming) is a powerful technique for writing programs that generate programs. This might be in a simple macro language, a more involved compile-time code generator, or even a multi-stage pipeline of multiple languages or sublanguages, including JIT compilation. It turns out that intuitionistic modal S4 logic generates a useful type system for staged computation (through a Curry-Howard-like isomorphism), as detailed in the work of Davies and Pfenning (A modal analysis of staged computation, 2001). The aim of this project is to generalise their approach to graded modalities, capturing truly multi-stage programs: not just two stages but n stages, with their interactions controlled and described by a graded modal type system. This approach can then be combined with other kinds of graded modality to explain the role of data across multiple levels of a staged compilation pipeline.
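For a flavour of two-stage programming, here is the classic staged power function sketched with GHC's typed Template Haskell (the exact quotation API varies between GHC versions): the exponent is fixed at code-generation time, while the base is only supplied at the later stage.

```haskell
{-# LANGUAGE TemplateHaskell #-}
module StagedPower where

import Language.Haskell.TH.Syntax (Code, Q)

-- The Int argument lives at the earlier stage; the quoted argument and
-- result are code to be run at the later stage.
power :: Int -> Code Q Int -> Code Q Int
power 0 _ = [|| 1 ||]
power n x = [|| $$x * $$(power (n - 1) x) ||]

-- In another module, $$(power 5 [|| y ||]) would splice in the fully
-- unrolled expression y * y * y * y * y * 1.
```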
Grade-directed compilation and verifying optimisations
Contact: Dominic Orchard
Graded types provide an opportunity to capture program properties at the type level, which can be used to expose static analysis information to the user. This aids the programmer in reasoning about their programs, but can also benefit the compiler: the information in graded types can be used to direct compilation. This project would develop a compiler for Granule (or a similar language with graded types) that uses the graded type information to inform optimisations, e.g., leveraging linear types and bounded-reuse modalities to statically de-allocate memory and use mutation where possible, or using security modalities to change how binaries are compiled to obscure data and data structures. There is already an initial LLVM compiler and a Granule-to-Haskell compiler, both of which could provide a starting point. An additional, or alternative, project is to push grades into a semantics in order to verify compiler optimisations at the semantic level. Initial work suggests that graded structures in semantics (e.g. graded monads and graded comonads) can allow common optimisations to be proved correct in a unified way at the denotational level, enabling a new compiler verification technique. There is a lot of work to be done here.
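As a small illustration of grade-directed optimisation, the sketch below uses GHC's LinearTypes as a stand-in for richer graded modalities; the compilation strategy described in the comments is hypothetical, not something GHC currently does.

```haskell
{-# LANGUAGE LinearTypes #-}
module GradeDirected where

-- `swap` uses its argument exactly once, so a grade-directed compiler
-- could (hypothetically) reuse the pair's memory in place instead of
-- allocating a fresh pair and leaving the old one to the garbage collector.
swap :: (a, b) %1 -> (b, a)
swap (x, y) = (y, x)

-- `dup` uses its argument twice, so it cannot be given a linear arrow;
-- a graded type recording the usage 2 would describe this precisely
-- rather than merely forbidding it.
dup :: a -> (a, a)
dup x = (x, x)
```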