These projects are merely suggestions and illustrate the broad themes of the research group. Potential research students are encouraged and welcome to produce their own suggestions in these research areas or other research areas in the field. All applicants are invited to contact the academic associated with the project when making an application.
The semantics of concurrent systems
Contact: Mark Batty
The team here at Kent is working to understand the behaviour of modern concurrent systems, and we have shown, perhaps surprisingly, that humanity does not yet understand the concurrent behaviour of mainstream computer systems very well at all. We have a broad goal of verifying code written above a typical stack comprising a mainstream CPU (ARM, x86), a mainstream compiler (GCC, LLVM) and a mainstream operating system (Linux). Our work has found and fixed bugs in this stack, in particular making changes to the ISO C and C++ standards. We have found major problems in optimised concurrent language design (C, C++, Java) that had remained open for a decade, but we are well on our way to a solution, albeit with many things left to do. We use various methods in this work (typically one person has one or two of the abilities listed here): (1) We write code to simulate the concurrency designs expressed in specifications. (2) We write mathematical formulations of these designs. (3) We use hand proof and mechanised proof (Coq/HOL/Isabelle) to establish valuable properties of the designs. (4) We reverse engineer parts of the system to understand aspects that are described incompletely by specifications. (5) We build logics for reasoning above the systems we have, rather than above idealised models. (6) We dig into the internals of compilers. (7) We work with prototype hardware designs from project partners like ARM, to help understand their upcoming designs. We are looking for new PhD students who would like to contribute to this project. We have the capacity to build skills that you don’t yet have, but we hope that you will be driven to understand the true behaviour of modern computer systems.
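As a flavour of method (1), the toy sketch below simulates the classic store-buffering (SB) litmus test by enumerating all interleavings of two threads under sequential consistency. All names are illustrative and the model is deliberately naive; real memory-model simulators are far more sophisticated.

```haskell
-- Enumerate sequentially consistent executions of the SB litmus test.
import Data.List (nub, sort)

data Instr = Store Char Int   -- write a value to a shared variable
           | Load  Char Char  -- read a shared variable into a register

-- Shared variables and registers live in one association list; all start at 0.
get :: Char -> [(Char, Int)] -> Int
get k env = maybe 0 id (lookup k env)

set :: Char -> Int -> [(Char, Int)] -> [(Char, Int)]
set k n env = (k, n) : filter ((/= k) . fst) env

step :: [(Char, Int)] -> Instr -> [(Char, Int)]
step env (Store v n) = set v n env
step env (Load  v r) = set r (get v env) env

-- All interleavings of two threads' instruction streams.
interleavings :: [a] -> [a] -> [[a]]
interleavings [] ys = [ys]
interleavings xs [] = [xs]
interleavings (x:xs) (y:ys) =
  map (x:) (interleavings xs (y:ys)) ++ map (y:) (interleavings (x:xs) ys)

-- Store buffering: thread A: x := 1; a := y   thread B: y := 1; b := x
threadA, threadB :: [Instr]
threadA = [Store 'x' 1, Load 'y' 'a']
threadB = [Store 'y' 1, Load 'x' 'b']

-- Final (a, b) outcomes reachable under sequential consistency.
scOutcomes :: [(Int, Int)]
scOutcomes = sort (nub [ (get 'a' s, get 'b' s)
                       | tr <- interleavings threadA threadB
                       , let s = foldl step [] tr ])
```

Under sequential consistency the outcome (0, 0) never appears, yet on real x86 hardware it is observable because of store buffering; that gap between idealised and actual behaviour is exactly what this line of work studies.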
Process mining for failure recovery in distributed systems
Contact: Laura Bocchi
In this project you will use data mining to analyse logs of interactions between components of distributed systems. The aim is twofold: (1) to understand the process underlying these interactions (process mining) and apply known session types techniques to verify and enforce safety properties on these systems; (2) to understand the failure patterns in the system and generate appropriate recovery strategies. Session types are types that describe interaction patterns, and are used to guarantee properties like absence of deadlocks and communication mismatches via static typing, monitoring, or code generation.
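To make the idea concrete, here is a minimal sketch of a session type and a conformance check on a recorded interaction log, seen from one endpoint. The protocol and message labels are hypothetical; real session type systems also handle branching, recursion and typing of the code itself.

```haskell
-- A session type describes the interaction pattern one endpoint must follow.
data SType = Send String SType   -- we must send a message with this label next
           | Recv String SType   -- we must receive this label next
           | End                 -- the session is over

data Event = Out String | In String  -- one entry of a mined interaction log

-- Does a completed log follow the protocol exactly?
conforms :: SType -> [Event] -> Bool
conforms (Send l k) (Out l' : es) = l == l' && conforms k es
conforms (Recv l k) (In  l' : es) = l == l' && conforms k es
conforms End        []            = True
conforms _          _             = False

-- A hypothetical protocol mined from logs of a request/response service.
protocol :: SType
protocol = Send "request" (Recv "ack" (Recv "result" End))
```

A log that skips the "ack" message fails the check, which is the kind of deviation that a mined model plus session typing would flag for verification or recovery.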
Behavioural API for secure-by-design IoT systems
Contact: Laura Bocchi, Budi Arief
In this project you will develop a theory and tool for the design and implementation of secure IoT systems. The project will centre on Behavioural APIs. Behavioural APIs are abstract specifications of application-level protocols; they go beyond traditional APIs in that they specify conversation patterns among components/services (and not just sets of supported operations). Behavioural APIs have been successfully applied to guarantee safety (e.g., absence of deadlocks) via code generation, verification and monitoring. Security has been explored only to a limited extent in this context.
Cost-Aware Behavioural Types for Predictable Communicating Systems
Contact: David Castro-Perez
In this project, you will develop theories and tools for statically guaranteeing that concurrent and distributed systems can run within some specified resource bounds. This work will be based on Behavioural Types, a typing discipline that classifies programs not just in terms of the result of their computations, but also in terms of their allowed sequences of interactions. Behavioural types have been successfully used for verifying functional (e.g. correctness) properties of concurrent and distributed systems. This project will aim at augmenting theories and tools based on behavioural types, to also guarantee extra-functional properties (e.g. cost, or resources needed to run a system) of communicating systems.
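The sketch below illustrates the core idea in miniature: a behavioural type whose actions carry cost annotations, with a worst-case bound computed compositionally (sequencing adds, branching takes the maximum). The protocol, labels and costs are invented for illustration; a real cost-aware type system would derive and check such bounds, not just compute them.

```haskell
-- A behavioural type annotated with an upper bound on each action's cost.
data CType = CSend String Int CType  -- send this label, costing at most this much
           | CRecv String Int CType  -- receive this label
           | CChoice [CType]         -- the other party picks a branch
           | CEnd

-- Worst-case resource bound of a protocol.
bound :: CType -> Int
bound (CSend _ c k) = c + bound k
bound (CRecv _ c k) = c + bound k
bound (CChoice ks)  = maximum (0 : map bound ks)  -- worst branch dominates
bound CEnd          = 0

-- Example: a query is answered from a cache (cheap) or in full (expensive).
query :: CType
query = CSend "query" 2 (CChoice [ CRecv "cached" 1 CEnd
                                 , CRecv "full" 5 (CSend "ack" 1 CEnd) ])
```

Here the bound is 2 + max(1, 5 + 1) = 8, a static guarantee that the exchange never costs more than 8 units whichever branch the peer takes.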
Certified Domain Specific Languages for Concurrent/Distributed Systems
Contact: David Castro-Perez
This project is about mechanising domain specific languages for implementing and/or verifying communicating systems in proof assistants. There are currently many theories and tools that can be used to guarantee that communicating systems behave according to some specification. But can we trust those theories and tools? There are examples in the literature of subtle changes to a theory that have introduced inconsistencies. The possibility of such errors threatens to invalidate large bodies of work (e.g. extensions of the inconsistent theory, or tools based on it). The aim of this project is to build certified languages and tools for implementing and/or verifying concurrent and distributed systems. This work will be incorporated into our existing framework in the Coq proof assistant.
Modular Omniscient Debugging of Haskell Programs
Contact: Olaf Chitil
An omniscient debugger records a trace during program execution and provides a viewing tool to navigate through the execution backwards and forwards. This project aims to build a truly omniscient debugger for Haskell programs that provides free navigation through the execution along different dependencies, in particular to follow the origin of any (incorrect) part of a data structure. The debugger shall be modular in two senses: 1) The debugger will be separate from the compiler and virtual machine. Thus it is independent of the evolution of these, e.g. the Glasgow Haskell Compiler. 2) Debugged programs will consist of both traced and untraced modules, for several reasons: it limits the size of the trace, the untraced (majority) part of the code executes at normal speed, and it allows inclusion of modules in unsupported languages. The latter requires that untraced modules are used unchanged, without any instrumentation. The new debugger will combine methods from earlier debuggers: Hat, Hoed and HatLight. In particular, substantial parts of Hat’s implementation can be reused. Hat’s (https://archives.haskell.org/projects.haskell.org/hat/) instrumentation of all program modules, including untraced ones, is quite complex; hence it was never extended beyond a subset of Haskell 98. In contrast, Hoed’s instrumentation is much simpler and untraced modules are unchanged, but the trace is limited to the algorithmic debugging method. Finally, HatLight is a small prototype that records Hat’s trace using Hoed’s instrumentation method.
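The recording half of such a debugger can be sketched in a few lines: an observation combinator logs a (name, argument, result) event for each observed call into a trace that a viewer could later navigate in either direction. This is only a caricature of the idea; Hoed's real observation machinery handles laziness, higher-order functions and partial values.

```haskell
-- A toy trace recorder in the spirit of Hoed-style observation.
import Data.IORef

-- Wrap a pure function so each call appends an event to the trace.
observe :: (Show a, Show b) => IORef [String] -> String -> (a -> b) -> a -> IO b
observe trace name f x = do
  let r = f x
  modifyIORef trace (++ [name ++ " " ++ show x ++ " => " ++ show r])
  return r

-- Run a toy computation under observation and return the recorded trace.
demo :: IO [String]
demo = do
  trace <- newIORef []
  let faultyDouble n = n + n + 1 :: Int   -- deliberately wrong
  _ <- observe trace "faultyDouble" faultyDouble 21
  _ <- observe trace "negate" (negate :: Int -> Int) 5
  readIORef trace
```

Scanning the trace backwards from the wrong result (43 instead of 42) immediately localises the fault to faultyDouble, which is the navigation style the full debugger would support across data-structure dependencies.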
Type Error Debugging for Haskell Programs
Contact: Olaf Chitil
The Hindley-Milner type system used in many functional programming languages gives the programmer much freedom, because types are inferred from the program structure. In return, however, type error messages are often hard to understand or misleading. Over the last 30 years numerous proposals have been made to ameliorate the situation, but very little has changed in practice. The aim of this project should *not* be to invent yet another method. Instead there are two lines of research: First, many of the existing methods have similarities. These similarities should be identified and formally described; thus the huge number of proposals can probably be reduced substantially. Second, I believe that most existing proposals have not made it into practice because their implementation for a full programming language would require too much effort and, as the language evolves, too much maintenance. Hence the implementation and maintenance effort required for any solution is paramount: a method that gives only a little support in type error debugging but is easy to implement has a better chance of succeeding in practice. An existing proposal based on delta-debugging, which reuses the existing compiler as a black box, is therefore promising.
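The black-box idea behind delta-debugging can be sketched as follows: treat the compiler as an oracle `fails` ("does this program still produce the type error?") and greedily remove program fragments while the error persists, leaving a small failing slice. This is a simplified one-at-a-time variant of Zeller's ddmin, written here over an abstract list of fragments.

```haskell
-- Greedy minimisation of a failing input against a black-box oracle.
deleteAt :: Int -> [a] -> [a]
deleteAt i xs = take i xs ++ drop (i + 1) xs

minimize :: ([a] -> Bool) -> [a] -> [a]
minimize fails = go
  where
    go xs = case [ xs' | i <- [0 .. length xs - 1]
                       , let xs' = deleteAt i xs
                       , fails xs' ] of
              (xs' : _) -> go xs'   -- some fragment could be dropped; repeat
              []        -> xs       -- 1-minimal: every remaining fragment matters
```

With `fails` implemented by re-running the compiler on a candidate program, `minimize` returns a slice in which every line contributes to the reported type error, without needing any access to the type checker's internals. (The oracle in the test below is a stand-in: the "error" needs fragments 3 and 7.)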
Characteristics, Performance, and Benchmarking of Huge Code Bases
Contact: Stefan Marr
A growing number of internet companies develop applications that are millions of lines of code in size. Many are too large to be compiled and optimized with traditional techniques. Their sheer size and their rate of change, often multiple times per hour, bring whole new classes of problems when it comes to executing such applications efficiently. Thus, research is needed to better understand how such applications work, how we could optimize their execution with interpreters or compilers, and how to create benchmarks that can be freely shared with other researchers.
Towards Automatic Mitigation of Concurrency Bugs and Better Debugging Tools
Contact: Stefan Marr
Modern programming languages offer concurrency abstractions that avoid the low-level issues of basic threads. Still, certain types of concurrency bugs will always be possible, but there are techniques for detecting them at run time. With such techniques we can prevent bugs from causing harm. Building on execution tracing, execution replay, and heap access detection, we could automatically prevent a whole class of bugs. Of course, one would eventually still want to fix them in the application as well. Thus, we want to research how to make detection techniques fast enough for use in production applications, and effective enough to guide developers to the root cause and possible bug fixes.
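One classic run-time detection technique can be sketched very compactly: the Eraser-style lockset algorithm intersects, for each shared variable, the sets of locks held at its accesses; an empty intersection flags a potential race. The trace format and example below are invented for illustration, and real detectors (happens-before tracking, vector clocks) are considerably more precise.

```haskell
-- A simplified lockset race detector over a recorded access trace.
import Data.List (intersect, nub)

type Access = (Int, String, [String])  -- (thread id, variable, locks held)

-- Variables whose accesses share no common lock: potential races.
potentialRaces :: [Access] -> [String]
potentialRaces trace =
  [ v | v <- nub [ v | (_, v, _) <- trace ]
      , null (foldr1 intersect [ ls | (_, v', ls) <- trace, v' == v ]) ]

-- Example: "x" is always protected by lock "m"; "y" is once accessed unlocked.
exampleTrace :: [Access]
exampleTrace = [ (1, "x", ["m"]), (2, "x", ["m", "n"])
               , (1, "y", ["m"]), (2, "y", []) ]
```

On this trace the detector reports only "y", showing how a cheap dynamic analysis can point a developer at the unprotected access without replaying the whole execution.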
Multi-stage semantics and typing via graded modalities
Contact: Dominic Orchard
Staged computation (otherwise known as metaprogramming) is a powerful technique for writing programs that generate programs. This might be in a simple macro language, a more involved compile-time code generator, or even a multi-stage pipeline of multiple languages or sublanguages, including JIT compilation. It turns out that intuitionistic modal S4 logic generates a useful type system for staged computation (through a Curry-Howard-like isomorphism), detailed in the work of Davies and Pfenning (A modal analysis of staged computation, 2001). The aim of this project is to generalise their approach to graded modalities, capturing truly multi-stage programs: not just two stages but n stages, with their interactions controlled and described by a graded modal type system. This approach can then be combined with other kinds of graded modality to explain the role of data across multiple levels of a staged compilation pipeline.
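Two-stage programming can be illustrated with the classic power example: stage 0 generates a stage-1 expression specialised to a fixed exponent, with the recursion unrolled away. The sketch below represents next-stage code as a tiny AST; a graded modal type would record which stage each value inhabits, and with n stages the generated code could itself be a generator.

```haskell
-- Next-stage code as a small expression AST.
data Code = Lit Int
          | Var String
          | Mul Code Code
          deriving (Eq, Show)

-- Stage 0: generate code computing x^n, with the recursion eliminated.
spower :: Int -> Code -> Code
spower 0 _ = Lit 1
spower n x = Mul x (spower (n - 1) x)

-- Stage 1: run the generated code in an environment.
eval :: [(String, Int)] -> Code -> Int
eval _   (Lit n)   = n
eval env (Var x)   = maybe 0 id (lookup x env)
eval env (Mul a b) = eval env a * eval env b
```

For instance spower 3 (Var "x") yields the straight-line code x * (x * (x * 1)), which eval then runs; in the modal reading, spower lives at stage 0 while its result is a boxed stage-1 program.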
Grade-directed compilation and verifying optimisations
Contact: Dominic Orchard
Graded types provide an opportunity to capture program properties at the type level, which can be used to expose static analysis information to the user. This aids the programmer in reasoning about their programs, but can also benefit the compiler: the information in graded types can be used to direct compilation. This project would develop a compiler for Granule (or a similar language with graded types) that uses the graded type information to inform optimisations, e.g., leveraging linear types and bounded-reuse modalities to statically de-allocate memory and use mutation where possible, or using security modalities to change how binaries are compiled to obscure data and data structures. There is already an initial LLVM compiler and a Granule-to-Haskell compiler, both of which could provide a starting point. An additional, or alternate, project is to push grades into a semantics in order to verify compiler optimisations at the semantic level. Initial work suggests that graded structures in semantics (e.g. graded monads and graded comonads) can allow common optimisations to be proved correct in a unified way at the denotational level, allowing a new compiler verification technique. There is a lot of work to be done here.
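A minimal caricature of grade-directed compilation: a usage analysis over a toy expression language computes how often a let-bound variable is consumed, and the compiler picks a memory strategy from that grade (0 means dead, 1 means linear and hence safe to mutate or deallocate in place, more means shared). Granule's actual grades and analyses are, of course, much richer than a usage count.

```haskell
-- A toy expression language with let bindings.
data Expr = Var String | Lit Int | Add Expr Expr | Let String Expr Expr

-- Grade of a variable: its number of uses in an expression.
uses :: String -> Expr -> Int
uses x (Var y)     = if x == y then 1 else 0
uses _ (Lit _)     = 0
uses x (Add a b)   = uses x a + uses x b
uses x (Let y e b) = uses x e + (if y == x then 0 else uses x b)  -- shadowing

-- The memory strategy a grade-directed compiler might choose.
strategy :: Int -> String
strategy 0 = "drop binding (dead)"
strategy 1 = "update in place, then deallocate"
strategy _ = "share; reference count"
```

For example, a variable used exactly once gets the in-place strategy, exactly the optimisation that linear and bounded-reuse modalities would justify statically.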
Cost analysis of higher-order probabilistic programs
Contact: Vineet Rajani
Randomness plays an important role in a variety of approaches used in the theory of machine learning. For example, algorithms like stochastic gradient descent and prediction with expert advice make crucial use of randomness to make decisions that minimize the expected loss of the algorithm (either absolute or relative to the best decision in hindsight). This is often done by giving upper bounds on this expected loss. However, computing and verifying such bounds can be both tedious and error-prone, often requiring principles of cost amortization along with probabilistic reasoning. As a result, there is a need for a logical framework that can be used to formally verify such bounds. The goal of this project is to develop both the theory and practice of such a logical framework. In particular, we will build a higher-order probabilistic language with a type system such that well-typed programs in this language always adhere to their expected cost bounds.
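The semantic side of such reasoning can be sketched with a finite distribution monad: a probabilistic program denotes a weighted list of outcomes, and its expected cost is a weighted sum. The toy program below is invented for illustration; the project's type system would certify such expectations statically rather than computing them by exhaustive enumeration.

```haskell
-- Finite distributions as weighted outcome lists (a probability monad sketch).
type Dist a = [(a, Rational)]

ret :: a -> Dist a
ret x = [(x, 1)]

bind :: Dist a -> (a -> Dist b) -> Dist b
bind d f = [ (y, p * q) | (x, p) <- d, (y, q) <- f x ]

coin :: Dist Bool
coin = [(True, 1/2), (False, 1/2)]

-- Expected value of a distribution over costs.
expectation :: Dist Rational -> Rational
expectation d = sum [ x * p | (x, p) <- d ]

-- A toy program: two lookups, each paying 1 on a cache hit and 3 on a miss.
expectedCost :: Rational
expectedCost =
  expectation (bind coin (\h1 -> bind coin (\h2 ->
    ret (cost h1 + cost h2))))
  where cost h = if h then 1 else 3
```

Each lookup costs 2 in expectation, so the program's expected cost is 4; a type for this program in the envisaged system would carry exactly that bound.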
Type-based verification for differential privacy
Contact: Vineet Rajani
Enormous amounts of data are collected every day, and this data could be extremely useful in making many automated decisions. However, often this data cannot be released or used because of privacy concerns. Prior work has already shown that simple aggregation and anonymization are not sufficient to prevent privacy loss for individuals. Differential privacy is a promising approach to this problem, offering a statistical guarantee of an individual’s privacy. Intuitively, a mechanism is differentially private if the probability of obtaining a result with (or without) an individual’s data is almost the same. This limits the amount of information that can be learned about that individual. The aim of this project is to develop a type-theoretic framework for analysing differential privacy. We will use ideas from cost analysis and information flow control, two well studied but very different domains in formal verification.
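The standard building block such a type system would reason about is the Laplace mechanism: a numeric query with sensitivity s, answered with Laplace noise of scale s/eps, is eps-differentially private. The sketch below samples by inverse CDF from an explicitly supplied uniform draw u in (0,1) so it stays deterministic; a real mechanism would draw u from a secure random source, and the function names here are illustrative.

```haskell
-- Inverse CDF of the Laplace distribution with mean 0 and scale b.
laplaceInv :: Double -> Double -> Double
laplaceInv b u = negate b * signum (u - 0.5) * log (1 - 2 * abs (u - 0.5))

-- A hypothetical eps-differentially private counting query.
-- Counting queries have sensitivity 1: one person changes the count by 1,
-- so noise of scale 1/eps gives eps-differential privacy.
noisyCount :: Double -> Double -> Double -> Double
noisyCount eps trueCount u = trueCount + laplaceInv (1 / eps) u
```

A type-theoretic analysis would track the sensitivity of the query and the scale of the noise through the types, so that the privacy budget eps is accounted for statically rather than by manual calculation.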