Biology has changed radically in the last few decades, transitioning from a descriptive science into a design science. Synthetic biology goes beyond the traditional biology practice of describing and cataloguing (e.g. Linnaean taxonomic classification or developing phylogenetic trees), and aims to design biological systems to a given specification (e.g. produce x grams of this medical drug or invade this type of cancer cell). 

In this effort, new tools enable us to bioengineer cells faster than ever: CRISPR-enabled genetic editing has revolutionized our ability to modify DNA in vivo, DNA synthesis productivity improves as fast as Moore’s law, transcriptomics data volume has a doubling rate of 7 months, and high-throughput workflows for proteomics and metabolomics are becoming increasingly available.

However, our inability to predict the behavior of biological systems keeps synthetic biology from reaching its full potential. While we can make the DNA changes we intend, their effect on cell behavior is usually unpredictable.

Our group is devoted to developing algorithms that predict biological behavior: specifically, we aim to create algorithms that systematically predict actionable items that help reach a design specification.

In order to achieve this, we work on:

Machine Learning

Machine learning provides predictive power without the need for detailed mechanistic understanding. Thus, it is well suited to synthetic biology's current needs.

For example, we have used machine learning to choose promoters for a pathway, to predict the dynamics of a new pathway inserted in a host (kinetic learning), and to guide metabolic engineering efforts to produce a yeast variety that supersedes hops in beer production.

We are also working on applying deep learning methods to meet synthetic biology needs.

Flux-based Mechanistic Models

Mechanistic models in synthetic biology are still important, despite the success of machine learning approaches. Mechanisms offer a causally related set of processes and parts that produce the observed phenomena, so understanding them allows this knowledge to be transferred to different systems (pathways, strains, products, etc.).

Metabolic fluxes (i.e., the number of metabolite molecules traversing each biochemical reaction per unit time) are crucial because they map how carbon and electrons flow through metabolism to enable cell function. Among the most popular methods for studying metabolic fluxes are Flux Balance Analysis (FBA) and 13C Metabolic Flux Analysis (13C MFA). We created methods that combine the advantages of both by using 13C tracing experiments to efficiently constrain genome-scale models: Bayflux and 2S-13C MFA.

These methods have allowed us to predict the outcome of ~50 direct measurements of metabolite labeling, direct metabolic engineering to improve biofuel production, and map the metabolic effects of gene knock-outs.
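
As a rough illustration of the FBA idea mentioned above, the sketch below maximizes a "biomass" flux subject to steady-state mass balance (S v = 0) and flux bounds. Everything in it is an assumption made for illustration: the two-metabolite network, the reaction names, the bounds, and the function names `solve_square` and `toy_fba` are hypothetical and are not taken from Bayflux, 2S-13C MFA, or any real genome-scale model. It also swaps in brute-force enumeration of corner points for the linear-programming solver that real FBA tools use, which only works for very small networks.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical toy network (illustration only): 2 metabolites, 4 reactions.
REACTIONS = ["uptake", "A_to_B", "secretion", "biomass"]
S = [
    [1, -1, -1, 0],  # metabolite A: made by uptake, used by A_to_B and secretion
    [0, 1, 0, -1],   # metabolite B: made by A_to_B, used by biomass
]
BOUNDS = [(0, 10), (0, 8), (0, 10), (0, 100)]  # (lower, upper) per reaction
OBJECTIVE = [0, 0, 0, 1]                        # maximize the biomass flux


def solve_square(a, b):
    """Solve a x = b by Gauss-Jordan elimination; return None if a is singular."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [x / pivot_value for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[-1] for row in m]


def toy_fba(s, bounds, objective):
    """Maximize objective . v subject to s v = 0 and bounds, by checking corner points."""
    n_met, n_rxn = len(s), len(s[0])
    best_value, best_flux = None, None
    # A corner point pins (n_rxn - n_met) fluxes at a bound; the rest follow from s v = 0.
    for fixed in combinations(range(n_rxn), n_rxn - n_met):
        free = [j for j in range(n_rxn) if j not in fixed]
        for levels in product(*(bounds[j] for j in fixed)):
            flux = [Fraction(0)] * n_rxn
            for j, level in zip(fixed, levels):
                flux[j] = Fraction(level)
            a = [[s[i][j] for j in free] for i in range(n_met)]
            b = [-sum(s[i][j] * flux[j] for j in fixed) for i in range(n_met)]
            solution = solve_square(a, b)
            if solution is None:
                continue  # the free fluxes are not determined by this choice
            for j, value in zip(free, solution):
                flux[j] = value
            if any(not (lo <= v <= hi) for v, (lo, hi) in zip(flux, bounds)):
                continue  # violates a flux bound
            value = sum(c * v for c, v in zip(objective, flux))
            if best_value is None or value > best_value:
                best_value, best_flux = value, flux
    return best_value, best_flux


if __name__ == "__main__":
    value, flux = toy_fba(S, BOUNDS, OBJECTIVE)
    print("max biomass flux:", value)  # 8, limited by the A_to_B upper bound
    print(dict(zip(REACTIONS, flux)))
```

In this toy case the optimal biomass flux is fixed by the bound on the conversion step, while the uptake and secretion fluxes are not uniquely determined. That kind of ambiguity is one reason extra experimental constraints, such as 13C labeling data, are useful.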


Retrobiosynthesis

Retrobiosynthesis is a basic tool for metabolic engineering and synthetic biology. Given a target molecule of interest, retrobiosynthesis tools ‘walk’ backwards through the known chemical transformation rules to identify potential precursors and reactions.
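
The backward ‘walk’ can be sketched as a breadth-first search, shown below under strong simplifying assumptions. The rule table, the molecule names, and the function `retro_search` are hypothetical placeholders rather than real chemistry or the interface of any existing tool. Each rule here maps one product to a single precursor, whereas real retrobiosynthesis tools apply transformation rules to molecular structures.

```python
from collections import deque

# Hypothetical rule table (placeholders, not real chemistry):
# product -> list of (rule_name, precursor) pairs that can produce it.
RULES = {
    "target": [("rule_3", "intermediate_2")],
    "intermediate_2": [("rule_2", "intermediate_1"), ("rule_4", "dead_end")],
    "intermediate_1": [("rule_1", "native_metabolite")],
}


def retro_search(target, native, rules, max_steps=5):
    """Walk backwards from target; return routes that start at a native metabolite.

    Each route is a list of (precursor, rule_name, product) steps in forward order.
    """
    routes = []
    queue = deque([(target, [])])
    while queue:
        molecule, steps = queue.popleft()
        if molecule in native:
            # Steps were collected target-first, so reverse them into forward order.
            routes.append(list(reversed(steps)))
            continue
        if len(steps) >= max_steps:
            continue  # give up on routes longer than max_steps
        for rule_name, precursor in rules.get(molecule, []):
            queue.append((precursor, steps + [(precursor, rule_name, molecule)]))
    return routes


if __name__ == "__main__":
    for route in retro_search("target", {"native_metabolite"}, RULES):
        for precursor, rule_name, product_name in route:
            print(f"{precursor} --{rule_name}--> {product_name}")
```

Branches that end in a molecule with no known rule and no native source (here `dead_end`) are simply dropped, and `max_steps` keeps the search finite even if the rule table contains cycles.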

Polyketide synthases (PKSs) provide a systematic and modular way to synthesize millions of structurally distinct small molecules. Interestingly, type I modular PKSs follow a deterministic logic: the order of their modules can predict the resulting chemical product with surprising accuracy. However, engineering them to obtain a specified molecule is highly non-trivial.

We developed ClusterCAD as a computational retrobiosynthesis platform to streamline the process of designing PKS variants to obtain a desired molecule. 


Software Development

Predictive modeling requires standardized data collection and storage to be truly effective (Excel sheets really do not cut it long term).

We have developed the Experiment Data Depot (EDD), an online tool designed as a repository of experimental data and metadata. EDD can import experimental data, visualize these data, and produce downloadable data in several standard output formats through an API.

Synthetic Biology Automation

Machine learning and other modeling efforts require large quantities of high-quality data. While we have leveraged human-generated data successfully in the past, the need for larger, unbiased data sets and faster turnaround times can ultimately only be met through automated data collection.

We have focused our attention on three different approaches to automation:

You can see our publications here.