Probabilistic deep learning for drug discovery - Alpha Lee - Funding TBC

Predicting the bioactivity of small molecules is a central challenge in drug discovery. Deep learning is becoming the method of choice, but studies to date focus on mean accuracy as the main metric. However, to replace costly and mission-critical experiments with models, a high mean accuracy is not enough. Outliers can derail a discovery campaign, so models need to reliably predict when they will fail, even when the training data is biased. Experiments are expensive, so models need to be data-efficient and suggest informative training sets using active learning. The first stage of the project will focus on developing scalable Bayesian deep learning methodologies to estimate the prediction uncertainty. The second stage of the project will focus on developing active learning methodologies to guide the design of experiments.

Tackling pharmaceutical formulations with physics-based machine learning - Alpha Lee - Funding TBC

Colloidal aggregation is both an understudied nuisance and an opportunity for drug discovery. It is estimated that 45-80% of published bioactivity measurements are false positives because the compounds being tested aggregate into colloids and interfere with the assay. However, the same physics of aggregation can be used to design nanoparticle drug carriers that selectively deliver a drug payload based on cell type. Aggregation is poorly predicted: naive machine learning approaches overlook the physics of intermolecular interactions, and forcefield errors limit the predictive accuracy of molecular simulations. This project aims to use molecular simulations to extract physical descriptors of solute-solute and solute-water interactions, and to use machine learning to relate those descriptors to experimental data.

Realising a GPS for chemistry: Accelerating chemical synthesis using machine learning - Alpha Lee - Funding TBC

Making a complex chemical molecule from simpler constituents is the key challenge in organic chemistry. The state of the art is often trial-and-error. However, vast numbers of reactions are reported in the chemistry literature, and recent technological advances have made high-throughput synthesis experiments and high-throughput quantum chemistry calculations possible. The key question is how to assimilate those disparate pieces of data to realize a platform that can suggest synthetic pathways given a target molecule - a chemical GPS.

The first stage of the project will focus on predicting the outcome of chemical reactions using high-throughput experiments as training data. Guided by quantum chemistry, we will construct predictive molecular descriptors. The second stage of the project will focus on rational experiment design: developing models that can suggest the new experiments expected to yield the greatest information gain. Once diverse classes of reactions have been modeled, the ultimate goal of the project is to realize a computational synthesis recommender that recommends feasible synthetic schemes.

Predicting New Materials for Engineering Applications: Employing Data Science with Machine Learning - Jacqui Cole

The world needs new materials to stimulate the engineering industry in key sectors of our economy: aerospace, aeronautics, automotive, shipbuilding, energy, robotics, classical and additive manufacturing, to name just a few. Yet nearly all materials properties for engineering are still discovered by ‘trial-and-error’, whose lack of predictability creates a major materials bottleneck to technological innovation. This PhD project aims to tackle this materials discovery problem using data science with machine learning. By exploiting database auto-generation code that has been developed by the Molecular Engineering (MolE) group at the Cavendish Laboratory, a specialist repository of data for industrially-relevant engineering applications will first be captured from the literature and niche data sources. Machine learning tools (neural networks, genetic algorithms) will then be employed to determine data patterns that represent previously unobserved trends between material structure and key engineering properties. Accordingly, a series of material predictions will be generated and short-listed via statistical methods. The student will have the opportunity for their material predictions to be tested by experimentalists in the MolE group, in collaboration with the Rutherford Appleton Laboratory.

Creating and Applying New Software Tools for Chemical Data Science - Jacqui Cole

The UK’s $54 billion chemical industry is the largest exporter of any UK goods manufacturing sector. The development of data-driven materials discovery for the chemical industry lies at the heart of its economic future. The Molecular Engineering group at the University of Cambridge is developing new data science tools to meet this need in direct collaboration with industry. This PhD project will employ the latest advances in data science with artificial intelligence to develop new software tools that aim to reduce the average 20-year ‘molecule-to-market’ timeframe. This work will build a brand new strand of our parent software tool, ChemDataExtractor (www.chemdataextractor.org). This project presents a rare opportunity for a student to apply their scientific programming skills to create and apply niche software tools that will feed directly into industrial applications.

Predicting New Materials for Optoelectronic Applications: Employing Data Science with Machine Learning - Jacqui Cole

The world needs new materials to stimulate the optoelectronics industry in key sectors of our economy: photovoltaics, light-emitting diodes, field emission transistors, to name just a few. Yet nearly all materials properties for the optoelectronics industry are still discovered by ‘trial-and-error’, whose lack of predictability creates a major materials bottleneck to technological innovation. This PhD project aims to tackle this materials discovery problem using data science with machine learning. By exploiting database auto-generation code that has been developed by the Molecular Engineering (MolE) group at the Cavendish Laboratory, a specialist repository of {chemical, property} data for industrially-relevant optoelectronic applications will first be captured from the literature and niche data sources. Machine learning tools (neural networks, genetic algorithms) will then be employed to determine data patterns that represent previously unobserved trends between material structure and key optoelectronic properties. Accordingly, a series of material predictions will be generated and short-listed via statistical methods. The student will have the opportunity for their material predictions to be tested by experimentalists in the MolE group, in collaboration with the Rutherford Appleton Laboratory.

Functionalising Batteries with Computation, Data Science & Machine Learning - Jacqui Cole

Next-generation battery technologies are being developed in order to mitigate the growing global energy crisis. Most batteries are based upon an electrochemical cell, for which a molecular understanding is crucial when designing new types of batteries. While Li batteries have dominated the market over recent years, they have well-known drawbacks, most famously their tendency to catch fire and the limitations on their recharging abilities. This PhD project aims to develop a new type of battery by applying product design at the molecular scale. The development of advanced computational calculations will lie at the core of this project, using density functional theory with machine learning to realise this molecular engineering task. The project will provide a rare opportunity to perform high-throughput calculations using one of the largest supercomputers in the world, whose operators are collaborators on this research. The student will have the option to collaborate with experimentalists in the Molecular Engineering group at the Cavendish Laboratory, to transform the computationally-driven battery designs from this work into devices that will be experimentally trialed.

Exploring the energy landscape of histone tail proteins: structural disorder, epigenetic effects, and DNA binding - Rosana Collepardo-Guevara and David Wales

The three-dimensional organisation of DNA is one of the great marvels of physical biology. By winding around a special class of proteins known as histones, DNA manages to avoid entanglement, compresses enormously to fit inside tiny (6 μm) nuclei and, moreover, maintains exquisite control over the accessibility of its data. A set of chemical modifications that extend the genome, known as epigenetic marks, are responsible for this control. Unlike the DNA sequence, which is exactly the same in all our cells (e.g. liver, skin, brain), the distribution of epigenetic marks (epigenome) is different in each cell type. Epigenomes allow DNA sequences to be interpreted differently, generating diversity in cells and tissues. Increasing evidence suggests that epigenomes regulate gene function by directly transforming the nanostructure.
Deciphering how epigenetic marks govern the physiological form of the genome (chromatin) is critical for unravelling some of the most basic cellular functions, including transcription activation and gene silencing. Chromatin is the actual substrate for all DNA-directed processes, and thus changes in chromatin structure are intimately linked to gene regulation. Chromatin is formed by a sequence of DNA-protein particles (nucleosomes) joined by free DNA linker segments. The nucleosomes have evolved extraordinary charged and contoured surfaces, along with charged and flexible protruding ‘arms’, known as histone tails, that allow them to control the organization of the DNA inside chromatin with high precision. This project aims to elucidate the mechanisms by which histone tail structural diversity is modulated and the implications for control of chromatin structure. This insight will be achieved through characterization of the energy landscapes for histone tails H4 and H3 using powerful new tools within the energy landscape framework. We have chosen these two tails because they are known to mediate the majority of internucleosome interactions.
We will also investigate how binding to small DNA segments (which occurs within the chromatin context), and the presence of epigenetic modifications with strong effects on chromatin structure (e.g. acetylation and phosphorylation), transform the energy landscapes of such tails. This information will help us to understand the importance of protein structural disorder for the organization of the genome in three dimensions.

Design of metallic materials for additive manufacturing applications using artificial intelligence approaches - Gareth Conduit

Additive manufacturing is an exciting new technology for engineers. Our proposal is to create a workflow for designing and optimizing materials for additive manufacturing based on a novel artificial intelligence tool. The method exploits all available data, including first-principles calculations and experimental results, allowing the development of new materials that simultaneously satisfy multiple property targets.
The student will collate existing data, perform first-principles calculations, build the artificial intelligence model, and exploit the model to understand and propose new compositions of industrial relevance. The project will show how to develop new materials for additive manufacturing, and further the exploitation of artificial intelligence in the rapid design of new materials.

Topological materials at finite temperature - Bartomeu Monserrat

Topology has emerged as a new tool to classify and understand phases of matter. Materials with nontrivial topology carry currents that cannot be stopped by impurities, exhibit electromagnetic effects beyond the standard Maxwell equations, and provide realisations of particles such as Weyl fermions that had so far only been theorised in particle physics. This makes them attractive candidate materials for applications such as dissipationless electronics or quantum computers. We have a good understanding of topological materials at zero temperature, but very little is known about their behaviour at higher temperatures, and such knowledge is necessary for future applications of these materials in technology. In this project we will use quantum mechanical simulations to investigate the properties of topological materials when we include the effects of temperature. We will do this by including both thermal expansion and electron-phonon coupling in our calculations, and we will investigate a range of topological materials, including topological insulators, topological crystalline insulators, and Chern insulators.

Photovoltaics at finite temperature - Bartomeu Monserrat

The Sun is the only effectively inexhaustible source of energy available to us at present, and a large research effort is dedicated to the understanding and design of novel next-generation photovoltaic materials that could lead to more efficient, cheaper, and more environmentally friendly solar cells. Traditionally, the computational design of materials has been performed at zero temperature, but the properties of materials can undergo dramatic changes when temperature is included. In this project we will use quantum mechanical simulations to understand the optoelectronic properties of semiconductors for use in photovoltaics at realistic operating temperatures. We will look at a range of next-generation photovoltaic materials, including perovskites and kesterites.

Superconductivity at extreme pressures (see also Computational discovery of new superconductors with Chris Pickard in Materials Science) - Bartomeu Monserrat

Superconductivity was discovered over a century ago, but so far we have been unable to find a room temperature superconductor. Phonon-mediated superconductivity has traditionally been thought to yield lower critical temperatures than so-called high temperature superconductors. However, it has very recently been demonstrated that a compound of hydrogen and sulfur under extreme pressures (close to those at the centre of the Earth), which is a phonon-mediated superconductor, has a critical temperature of 200 K, the highest ever recorded. In this project, we will use quantum mechanical simulations to study superconductivity in a range of materials under extreme pressures. The objective will be to understand the necessary ingredients that lead to such high critical temperatures at high pressure, and to try to replicate them under ambient conditions.