These projects are merely suggestions that illustrate the broad themes of the research group. Prospective research students are encouraged and welcome to propose their own projects in these or other research areas in the field. All applicants are invited to contact the academic associated with a project when making an application.
A well-founded tool for well-behaved Web APIs
Contact: Laura Bocchi
In this project you will develop a well-founded tool to support the design and development of Web APIs, and REST APIs in particular. Common specifications for Web APIs (e.g., using the OpenAPI standard) describe which resources a server exposes, but not the order in which calls to them must be made. In many real-world scenarios, a server may need to establish and enforce an ordering for calls to the APIs it provides. For example, one may want to enforce that “payment” happens after “authentication” and before “shipment”. These causalities are, at present, expressed in natural language in the documentation of an API, making their understanding and implementation error-prone. This project has a theoretical component and a practical component (the balance is negotiable depending on the applicant’s skills and interests). On the theoretical side, you will extend existing API specifications with behavioural specifications. One open problem is ensuring the correct composition of behavioural specifications pertaining to different aspects (e.g., security, application logic). On the practical side, you will develop a tool that uses behavioural specifications to automatically generate stubs and test suites, ensuring and verifying that the desired causalities are met.
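As a rough illustration, a behavioural specification of the ordering constraints above could be a small protocol term layered over the OpenAPI description. The Haskell sketch below is hypothetical (not an existing standard or tool) and only shows the flavour of such a specification:

```haskell
-- Hypothetical sketch: a behavioural layer over an OpenAPI-style spec,
-- recording the order in which endpoints may be called.
data Protocol
  = Call String Protocol   -- invoke the named endpoint, then continue
  | Choice [Protocol]      -- the client may follow any one branch
  | Done                   -- the interaction ends

-- "payment" happens after "authentication" and before "shipment":
checkout :: Protocol
checkout =
  Call "POST /authenticate"
    (Call "POST /payment"
      (Call "POST /shipment" Done))
```

A stub generator could then reject calls that arrive out of order, while a test-suite generator could enumerate exactly the call sequences the protocol permits.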
Modular Omniscient Debugging of Haskell Programs
Contact: Olaf Chitil
An omniscient debugger records a trace during program execution and provides a viewing tool to navigate through the execution backwards and forwards. This project aims to build a truly omniscient debugger for Haskell programs that provides free navigation through the execution along different dependencies, in particular to follow the origin of any (incorrect) part of a data structure. The debugger shall be modular in two senses: 1) The debugger will be separate from the compiler and virtual machine, and thus independent of their evolution, e.g. that of the Glasgow Haskell Compiler. 2) Debugged programs will consist of both traced and untraced modules, for several reasons: it limits the size of the trace, the untraced (majority of the) code executes at normal speed, and it allows the inclusion of modules written in unsupported languages. The latter requires that untraced modules are used unchanged, without any instrumentation. The new debugger will combine methods from earlier debuggers: Hat, Hoed and HatLight. In particular, substantial parts of Hat’s implementation can be reused. Hat’s (https://archives.haskell.org/projects.haskell.org/hat/) instrumentation of all program modules, including untraced ones, is quite complex; hence it was never extended beyond a subset of Haskell 98. In contrast, Hoed’s instrumentation is much simpler and untraced modules are unchanged, but the trace is limited to the algorithmic debugging method. Finally, HatLight is a small prototype recording Hat’s trace using Hoed’s instrumentation method.
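To give a flavour of what such a trace enables, the sketch below (illustrative names only, not the Hat or Hoed APIs) records which earlier reduction steps produced the inputs of each step, so an incorrect output value can be followed backwards to its origin:

```haskell
-- A minimal sketch of the kind of trace an omniscient debugger records:
-- every reduction step notes which earlier steps produced its arguments.
data Event = Event
  { eventId :: Int
  , redex   :: String   -- textual form of the expression that was reduced
  , value   :: String   -- textual form of the result
  , parents :: [Int]    -- identifiers of events whose results this one consumed
  }

-- Follow the provenance of one (e.g. incorrect) value backwards through the trace.
provenance :: [Event] -> Int -> [Event]
provenance trace i =
  case filter ((== i) . eventId) trace of
    []      -> []
    (e : _) -> e : concatMap (provenance trace) (parents e)
```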
Delta Debugging of Static Errors in Rust
Contact: Olaf Chitil
Delta debugging is a method for determining the parts of a program (or program input) that are most relevant for a given observed bug; it is thus a method that helps to localise the cause of a bug. Delta debugging systematically checks variants of a program (or program input) for the presence of the bug. It has mostly been used to localise in a program the cause of a runtime error; my former PhD student Jo Sharrad applied delta debugging to localise the cause of a type error in a Haskell (or OCaml) program. The systems programming language Rust has a rich type system that eliminates many classes of memory-safety and thread-safety bugs at compile time. Consequently, its constraint-based type system is complex and still evolving, and compiler error messages can be difficult to understand. The hypothesis of this research project is that delta debugging can help the Rust programmer to understand why the compiler rejects a specific program.
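A heavily simplified sketch of the idea is shown below in Haskell. Real delta debugging (e.g. Zeller's ddmin) also tries complements and increases granularity; here the predicate `failing` is a placeholder for running the Rust compiler on a candidate program and checking whether the error of interest is still reported:

```haskell
-- Simplified delta debugging over a program split into chunks
-- (e.g. top-level items of a Rust source file).
minimise :: ([chunk] -> Bool) -> [chunk] -> [chunk]
minimise failing chunks
  | length chunks <= 1 = chunks
  | failing left       = minimise failing left    -- error survives in first half
  | failing right      = minimise failing right   -- error survives in second half
  | otherwise          = chunks                   -- neither half alone reproduces it
  where
    (left, right) = splitAt (length chunks `div` 2) chunks
```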
Characteristics, Performance, and Benchmarking of Huge Code Bases
Contact: Stefan Marr
A growing number of internet companies develop applications that are millions of lines of code in size. Many are too large to be compiled and optimized with traditional techniques. Their sheer size, and their rate of change of multiple times per hour, bring whole new classes of problems when it comes to executing such applications efficiently. Thus, research is needed to better understand how such applications work, how we could optimize their execution with interpreters or compilers, and how to create benchmarks that can be freely shared with other researchers.
Towards Automatic Mitigation of Concurrency Bugs and Better Debugging Tools
Contact: Stefan Marr
Modern programming languages offer concurrency abstractions that avoid the low-level issues of basic threads. Still, certain types of concurrency bugs will always be possible, but there are techniques for detecting them at run time. With such techniques we can prevent bugs from causing harm. Building on execution tracing, execution replay, and heap access detection, we could automatically prevent a whole class of bugs. Of course, one would eventually still want to fix them in the application as well. Thus, we want to research how to make detection techniques fast enough for use in production applications, and effective enough to guide developers to the root cause and possible bug fixes.
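As a rough sketch of what heap-access detection involves, the Haskell fragment below flags pairs of accesses that conflict (same location, different threads, at least one write). A real detector would additionally consult a happens-before relation reconstructed from tracing and replay; the types and names here are illustrative only:

```haskell
import Data.List (tails)

-- One recorded heap access, as a detector might log it.
data Access = Access { threadOf :: Int, location :: Int, isWrite :: Bool }

-- Two accesses conflict if they touch the same location from different
-- threads and at least one of them is a write.
conflicting :: Access -> Access -> Bool
conflicting a b =
  threadOf a /= threadOf b
    && location a == location b
    && (isWrite a || isWrite b)

-- All conflicting pairs in a recorded trace (ignoring synchronisation).
potentialRaces :: [Access] -> [(Access, Access)]
potentialRaces trace =
  [ (a, b) | (a : rest) <- tails trace, b <- rest, conflicting a b ]
```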
Enhancing Expressivity of Type Theories for Productive Coprogramming
Contact: Marco Paviotti
Are you a mathematician or a computer scientist with a particular inclination towards Functional Programming and/or Category Theory? Then this Ph.D. topic is for you!
The Computational Trinity is a deep insight, dating back to the ’70s, that computer science, logic and category theory are three views of the same phenomenon. In particular, the notation f : A → B describes at the same time a program that takes as input something of type A and returns a result of type B; a proof that A implies B; and a function from a certain “set” A to a “set” B. This is known as the Curry-Howard correspondence, under the slogan proofs-as-programs and formulas-as-types. The advantage of this approach is that it allows programming languages to be formal systems in which one can develop programs and prove properties about them at the same time. However, one restriction posed by this setup is that the programs under consideration must terminate or, in mathematical terms, that self-referential definitions must be well-defined. In other words, the type system has to be consistent when viewed as a logic. In general, termination is an undecidable property to check for a given program, but it is still possible to enforce it for a restricted class of programs. This project focuses on the study of type systems that ensure termination and, in particular, on combining two techniques for doing so, namely structured recursion schemes and synthetic guarded domain theory. The former are structured recursion operators used in functional programming and inspired by the categorical concepts of algebras and coalgebras. The latter is the internal language of the topos of trees, the category of presheaves over the first infinite ordinal.
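For a concrete flavour of structured recursion schemes, the Haskell sketch below defines the list fold (a catamorphism). Because recursion follows the structure of the input, every function written with it terminates on finite inputs, without the programmer writing explicit (and potentially non-terminating) recursion:

```haskell
-- The catamorphism (fold) for lists: recursion is driven by the shape of
-- the input, so termination on finite lists is guaranteed by construction.
cata :: (a -> b -> b) -> b -> [a] -> b
cata _ nil []       = nil
cata f nil (x : xs) = f x (cata f nil xs)

-- Example uses, written without any explicit recursion:
sumList :: [Int] -> Int
sumList = cata (+) 0

lengthList :: [a] -> Int
lengthList = cata (\_ n -> 1 + n) 0
```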
Cost analysis of higher-order probabilistic programs
Contact: Vineet Rajani
Randomness plays an important role in a variety of approaches used in the theory of machine learning. For example, algorithms like stochastic gradient descent and prediction with expert advice make crucial use of randomness to make decisions that minimize the expected loss of the algorithm (either absolute or relative to the best decision in hindsight). This is often done by giving upper bounds on such an expected loss. However, computing and verifying such bounds can be both tedious and error-prone, often requiring principles of cost amortization along with probabilistic reasoning. As a result, there is a need for a logical framework that can be used to formally verify such bounds. The goal of this project is to develop both the theory and practice of such a logical framework. In particular, we will build a higher-order probabilistic language with a type system such that well-typed programs in this language always adhere to their expected cost bounds.
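To illustrate the quantity being bounded, the small Haskell sketch below (illustrative types, not the project's language) represents a finite distribution and computes the expected cost of a randomized choice:

```haskell
-- A finite probability distribution: outcomes paired with probabilities.
newtype Dist a = Dist [(a, Double)]

-- Expected cost: weight each outcome's cost by its probability.
expectedCost :: Dist a -> (a -> Double) -> Double
expectedCost (Dist outcomes) cost =
  sum [ p * cost x | (x, p) <- outcomes ]

-- Example: a fair coin whose heads branch costs 1 step and whose tails
-- branch costs 3 steps has expected cost 0.5 * 1 + 0.5 * 3 = 2.
coinCost :: Double
coinCost = expectedCost (Dist [(True, 0.5), (False, 0.5)])
                        (\heads -> if heads then 1 else 3)
```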
Type-based verification for differential privacy
Contact: Vineet Rajani
Enormous amounts of data are collected every day, and this data could be extremely useful in making many automated decisions. However, often this data cannot be released or used because of privacy concerns. Prior work has already shown that simple aggregation and anonymization are not sufficient to prevent privacy loss for individuals. Differential privacy is a promising approach to this problem, offering a statistical guarantee of an individual’s privacy. Intuitively, a mechanism is differentially private if the probability of obtaining a given result with (or without) an individual’s data is almost the same. This limits the amount of information that can be learned about that individual. The aim of this project is to develop a type-theoretic framework for analysing differential privacy. We will use ideas from cost analysis and information flow control, two well-studied but very different domains in formal verification.
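As a concrete (if simplified) example of a differentially private mechanism, the Haskell sketch below implements the classic Laplace mechanism for a counting query, whose sensitivity is 1: adding Laplace noise with scale 1/ε yields ε-differential privacy. The code is purely illustrative and not part of any proposed framework:

```haskell
import System.Random (randomRIO)

-- Sample from Laplace(0, scale) by inverse transform sampling.
laplace :: Double -> IO Double
laplace scale = do
  u <- randomRIO (-0.5, 0.5)
  pure (negate (scale * signum u * log (1 - 2 * abs u)))

-- An epsilon-differentially private count over a database of records;
-- the sensitivity of a counting query is 1, hence noise scale 1/epsilon.
privateCount :: Double -> (a -> Bool) -> [a] -> IO Double
privateCount epsilon predicate db = do
  noise <- laplace (1 / epsilon)
  pure (fromIntegral (length (filter predicate db)) + noise)
```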
Fine-grained parallelism in compilation
Contact: Michael Vollmer
Often software engineers do not simply build lone applications; they build entire software ecosystems. Companies like Google and Facebook rely on a single, massive code repository (a monorepo) for the company’s source code, with the implication that a single change may trigger all downstream code to rebuild and retest. Similarly, open source software package collections like NixOS rely on build farms that frequently rebuild the full set of packages. To make this practical, compilation must be parallel. The most common way to make software build jobs parallel is to process different source code files in parallel whenever possible (this is what “make -j” does), but this coarse-grained parallelism is limited, as big parallel build jobs end up spending a lot of their time bottlenecked on sequential compilation tasks. Ultimately, we need to go further: we need both coarse-grained (between files) and fine-grained (within a file) parallelism. To achieve the latter, the compiler passes themselves must be made parallel, and that is what this project is about. Taking on this project will require some background in parallel or concurrent programming, and ideally some experience with compilers or programming language implementation.
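A first approximation of within-file parallelism, assuming a pure, per-definition compiler pass (the function `optimiseDef` below is a placeholder), can be sketched with GHC's evaluation strategies from the `parallel` package:

```haskell
import Control.Parallel.Strategies (parMap, rdeepseq)
import Control.DeepSeq (NFData)

-- Run a pure per-definition pass over all top-level definitions of a
-- module in parallel, fully evaluating each result.
compileModule :: NFData def => (def -> def) -> [def] -> [def]
compileModule optimiseDef defs = parMap rdeepseq optimiseDef defs
```

Real compiler passes are rarely this embarrassingly parallel, which is precisely where the research lies: dependencies between definitions, shared symbol tables, and effects in the pass pipeline all get in the way.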
Using types to specify precise, byte-level data representation of objects in programming languages
Contact: Michael Vollmer
It is common for high-level programming languages to have some mechanisms for allocating objects in memory, and then for connecting and organising these objects into larger structures. For example, languages like C++ and Java support *classes*, and languages like Haskell and ML support *algebraic data types*. Of those four languages, only C++ offers any significant control over the actual memory layout of objects or how and when they are allocated and de-allocated. Unlike C++, the other languages promise *type safety*, and so, the conventional wisdom goes, they must disallow fine-grained control over the details of memory representation. However, safe systems programming languages like Rust have shown that it is practical to guarantee type safety while still exposing some low-level details of memory usage. This project aims to continue along those lines and explore how a high-level programming language might give programmers control over the *exact, byte-level representation* of data in their programs while using types to ensure that the resulting program is *memory-safe*. Taking on this project will require some exposure to low-level systems programming in either C/C++ or Rust, as well as an interest in either programming language design or compiler implementation.
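For contrast, the Haskell sketch below shows what precise byte-level layout currently requires: an unchecked, hand-written Storable instance with explicit offsets. The aim of the project is for types to express such layouts so that the compiler can generate and verify this kind of code while keeping the program memory-safe:

```haskell
import Foreign.Storable (Storable (..))
import Data.Word (Word8)

-- A 4-byte RGBA pixel whose exact layout we spell out by hand.
data Pixel = Pixel { red :: Word8, green :: Word8, blue :: Word8, alpha :: Word8 }

instance Storable Pixel where
  sizeOf _    = 4
  alignment _ = 1
  peek p = Pixel <$> peekByteOff p 0 <*> peekByteOff p 1
                 <*> peekByteOff p 2 <*> peekByteOff p 3
  poke p (Pixel r g b a) = do
    pokeByteOff p 0 r
    pokeByteOff p 1 g
    pokeByteOff p 2 b
    pokeByteOff p 3 a
```

Nothing checks that these offsets match the intended layout, or even that they stay within the 4 bytes claimed by `sizeOf`; a type-level layout description could rule out both kinds of mistake.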
The semantics of concurrent systems
Contact: Mark Batty
The team here at Kent is working to understand the behaviour of modern concurrent systems, and we have shown, perhaps surprisingly, that humanity does not yet understand the concurrent behaviour of mainstream computer systems very well at all. We have a broad goal of verifying code written above a typical stack comprising a mainstream CPU (ARM, x86), a mainstream compiler (GCC, LLVM) and a mainstream operating system (Linux). Our work has found and fixed bugs in this stack, in particular making changes to the ISO C and C++ standards. We have found major problems in optimised concurrent language design (C, C++, Java) that have remained open for a decade, but we are well on our way to a solution, albeit with many things left to do. We use various methods in this work (typically one person has one or two of the abilities listed here): (1) We write code to simulate the concurrency designs expressed in specifications. (2) We write mathematical expressions of these designs. (3) We use hand proof and mechanised proof (Coq/HOL/Isabelle) to establish valuable properties of the designs. (4) We reverse engineer parts of the system to understand aspects that are described incompletely by specifications. (5) We build logics for reasoning above the systems we have, rather than above idealised models. (6) We dig into the internals of compilers. (7) We work with prototype hardware designs from project partners like ARM, to help understand their upcoming designs. We are looking for new PhD students who would like to contribute to this project. We have the capacity to build skills that you don’t yet have, but we hope that you will be driven to understand the true behaviour of modern computer systems.
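As an example of the subtlety involved, the classic “store buffering” litmus test is sketched below in Haskell purely for illustration. Under a naive interleaving of the two threads at least one of the reads must see 1, yet for the analogous machine-level program both x86 and ARM permit the outcome in which both reads return 0:

```haskell
import Data.IORef
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

-- Store buffering: each thread writes one variable and reads the other.
storeBuffering :: IO (Int, Int)
storeBuffering = do
  x <- newIORef 0
  y <- newIORef 0
  done1 <- newEmptyMVar
  done2 <- newEmptyMVar
  _ <- forkIO $ do writeIORef x 1; readIORef y >>= putMVar done1
  _ <- forkIO $ do writeIORef y 1; readIORef x >>= putMVar done2
  (,) <$> takeMVar done1 <*> takeMVar done2
```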
Cost-Aware Behavioural Types for Predictable Communicating Systems
Contact: David Castro-Perez
In this project, you will develop theories and tools for statically guaranteeing that concurrent and distributed systems can run within specified resource bounds. This work will be based on behavioural types, a typing discipline that classifies programs not just in terms of the results of their computations, but also in terms of their allowed sequences of interactions. Behavioural types have been successfully used to verify functional properties (e.g. correctness) of concurrent and distributed systems. This project aims to augment theories and tools based on behavioural types so that they also guarantee extra-functional properties of communicating systems (e.g. the cost, or resources, needed to run a system).
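To illustrate the idea (the syntax below is a hypothetical Haskell sketch, not an existing tool), a behavioural type could annotate each communication action with a resource bound, from which an overall budget follows:

```haskell
type Cost = Int   -- e.g. time units or bytes, depending on the resource

-- A behavioural type where every action carries an upper bound on its cost.
data CostedType
  = Send String Cost CostedType   -- send a message of the named sort
  | Recv String Cost CostedType   -- receive a message of the named sort
  | End

-- A client that sends a request and waits for a reply.
requestReply :: CostedType
requestReply = Send "request" 2 (Recv "reply" 5 End)

-- The total budget is the sum of the per-action bounds.
totalCost :: CostedType -> Cost
totalCost (Send _ c k) = c + totalCost k
totalCost (Recv _ c k) = c + totalCost k
totalCost End          = 0
```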
Certified Domain Specific Languages for Concurrent/Distributed Systems
Contact: David Castro-Perez
This project is about mechanising domain-specific languages for implementing and/or verifying communicating systems in proof assistants. There are currently many theories and tools that can be used to guarantee that communicating systems behave according to some specification. But can we trust those theories and tools? There are examples in the literature of subtle changes to a theory that have introduced inconsistencies. The possibility of such errors threatens to invalidate large bodies of work (e.g. extensions of the inconsistent theory, or tools based on it). The aim of this project is to build certified languages and tools for implementing and/or verifying concurrent and distributed systems. This work will be incorporated into our existing framework in the Coq proof assistant.
Multi-stage semantics and typing via graded modalities
Contact: Dominic Orchard
Staged computation (otherwise known as meta-programming) is a powerful technique for writing programs that generate programs. This might be in a simple macro language, a more involved compile-time code generator, or even a multi-stage pipeline of multiple languages or sublanguages, including JIT compilation. It turns out that intuitionistic modal S4 logic generates a useful type system for staged computation (through a Curry-Howard-like isomorphism), as detailed in the work of Davies and Pfenning (A modal analysis of staged computation, 2001). The aim of this project is to generalise their approach to graded modalities, capturing truly multi-stage programs: not just two stages but n stages, with their interactions controlled and described by a graded modal type system. This approach can then be combined with other kinds of graded modality to explain the role of data across multiple levels of a staged compilation pipeline.
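For a flavour of two-stage programming, here is the classic staged power function sketched with GHC's typed Template Haskell (the exact quotation API varies between GHC versions): the exponent is fixed at code-generation time, while the base is only supplied at the later stage.

```haskell
{-# LANGUAGE TemplateHaskell #-}
module StagedPower where

import Language.Haskell.TH.Syntax (Code, Q)

-- The Int argument lives at the earlier stage; the quoted argument and
-- result are code to be run at the later stage.
power :: Int -> Code Q Int -> Code Q Int
power 0 _ = [|| 1 ||]
power n x = [|| $$x * $$(power (n - 1) x) ||]

-- In another module, $$(power 5 [|| y ||]) would splice in the fully
-- unrolled expression y * y * y * y * y * 1.
```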
Grade-directed compilation and verifying optimisations
Contact: Dominic Orchard
Graded types provide an opportunity to capture program properties at the type level, which can be used to expose static analysis information to the user. This aids the programmer in reasoning about their programs, but can also benefit the compiler: the information in graded types can be used to direct compilation. This project would develop a compiler for Granule (or a similar language with graded types) that uses the graded type information to inform optimisations, e.g., leveraging linear types and bounded-reuse modalities to statically de-allocate memory and use mutation where possible, or using security modalities to change how binaries are compiled to obscure data and data structures. There is already an initial LLVM compiler and a Granule-to-Haskell compiler, both of which could provide a starting point. An additional, or alternative, project is to push grades into a semantics in order to verify compiler optimisations at the semantic level. Initial work suggests that graded structures in semantics (e.g. graded monads and graded comonads) can allow common optimisations to be proved correct in a unified way at the denotational level, enabling a new compiler verification technique. There is a lot of work to be done here.
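As a small illustration of grade-directed optimisation, the sketch below uses GHC's LinearTypes as a stand-in for richer graded modalities; the compilation strategy described in the comments is hypothetical, not something GHC currently does.

```haskell
{-# LANGUAGE LinearTypes #-}
module GradeDirected where

-- `swap` uses its argument exactly once, so a grade-directed compiler
-- could (hypothetically) reuse the pair's memory in place instead of
-- allocating a fresh pair and leaving the old one to the garbage collector.
swap :: (a, b) %1 -> (b, a)
swap (x, y) = (y, x)

-- `dup` uses its argument twice, so it cannot be given a linear arrow;
-- a graded type recording the usage 2 would describe this precisely
-- rather than merely forbidding it.
dup :: a -> (a, a)
dup x = (x, x)
```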