Biology has changed radically in the last few decades, transitioning from a descriptive science into a design science. Synthetic biology goes beyond the traditional biology practice of describing and cataloguing (e.g. Linnaean taxonomic classification or phylogenetic tree development), and aims to design biological systems to a given specification (e.g. produce x grams of this medical drug or invade this type of cancer cell).
In this effort, new tools enable us to bioengineer cells faster than ever: CRISPR-enabled genetic editing has revolutionized our ability to modify DNA in vivo, DNA synthesis productivity improves as fast as Moore’s law, transcriptomics data volume has a doubling rate of 7 months, and high-throughput workflows for proteomics and metabolomics are becoming increasingly available.
However, our inability to predict the behavior of biological systems keeps synthetic biology from reaching its full potential. While we can make the DNA changes we intend, the end result on cell behavior is usually unpredictable.
Our group is devoted to developing algorithms to predict biological behavior: specifically, we aim to create algorithms that systematically predict actionable items that help reach a design specification.
In order to achieve this, we work on:
Machine learning: provides the predictive capabilities that synthetic biology desperately needs.
Automation: produces the amounts of high-quality training data needed for machine learning.
Flux-based mechanistic models: complement machine learning approaches with models that incorporate prior biological knowledge and focus on mechanistic understanding.
Retrobiosynthesis: systematizes and streamlines the choice of targets for metabolic engineering.
Software development: vital in order to collect and visualize the large amounts of data required for successful modeling of biological systems.
For example, we have used machine learning to choose promoters for a pathway, to predict the dynamics of a new pathway inserted in a host (kinetic learning), and to guide metabolic engineering efforts to produce a yeast variety that can replace hops in beer production.
Flux-based Mechanistic Models
Mechanistic models in synthetic biology are still indispensable, in spite of the success of machine learning approaches. Mechanisms offer a causally related set of processes and parts that produce the observed phenomena, so understanding them allows this knowledge to be transferred to different systems (pathways, strains, products, etc.). Furthermore, mechanistic models impose stringent constraints on data quality, the lack of which is conspicuously revealed in failed predictions.
Metabolic fluxes (i.e., the number of metabolites traversing each biochemical reaction per unit time) are crucial because they map how carbon and electrons flow through metabolism to enable cell function. Among the most popular methods for studying metabolic fluxes are Flux Balance Analysis (FBA) and 13C Metabolic Flux Analysis (13C MFA). We created a new method that combines the advantages of both: 2S-13C MFA determines fluxes for a full genome-scale model, without the need to rely on maximum growth assumptions (it uses 13C labeling experiments to constrain fluxes).
2S-13C MFA has allowed us to predict the outcome of ~50 direct measurements of metabolite labeling, guide metabolic engineering to improve biofuel production, and map the metabolic effects of gene knock-outs.
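Both FBA and 13C MFA rest on the same stoichiometric bookkeeping: at metabolic steady state, the fluxes producing each internal metabolite must balance the fluxes consuming it. The sketch below illustrates only that shared constraint on a toy network that is invented for illustration (the metabolites, reactions, and flux values are assumptions, not taken from any published model). It is not an implementation of FBA, 13C MFA, or 2S-13C MFA.

```python
# Toy network, invented for illustration only:
#   uptake:     -> A
#   A_to_B:   A -> B
#   byproduct: A ->        (secreted)
#   biomass:  B ->         (growth)
REACTIONS = ["uptake", "A_to_B", "byproduct", "biomass"]
METABOLITES = ["A", "B"]

# Stoichiometric matrix: one row per internal metabolite,
# one column per reaction (positive = produced, negative = consumed).
S = [
    [1, -1, -1, 0],   # A
    [0, 1, 0, -1],    # B
]


def mass_balance_residual(stoich, fluxes):
    """Net production rate of each metabolite for a given flux vector."""
    return [sum(c * v for c, v in zip(row, fluxes)) for row in stoich]


def is_steady_state(stoich, fluxes, tol=1e-9):
    """True if every internal metabolite is balanced within tol."""
    return all(abs(r) <= tol for r in mass_balance_residual(stoich, fluxes))


# Two different hypothetical flux distributions with the same uptake rate.
all_to_growth = [10, 10, 0, 10]
partial_overflow = [10, 4, 6, 4]

print(is_steady_state(S, all_to_growth))      # both satisfy mass balance
print(is_steady_state(S, partial_overflow))
```

Both flux vectors satisfy the balance, which shows that stoichiometry alone does not pin down the fluxes. FBA resolves the ambiguity with an optimization assumption such as maximum growth, whereas 13C-based methods use labeling measurements as additional constraints.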
Retrobiosynthesis
Retrobiosynthesis is a basic tool for metabolic engineering and synthetic biology. Given a target molecule of interest, retrobiosynthesis tools ‘walk’ backwards through the known chemical transformation rules to identify potential precursors and reactions.
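The backward walk can be pictured as a graph search. The following is a minimal sketch under strong simplifying assumptions: the rule table, compound names, and reaction names are hypothetical, and each rule has a single precursor, whereas real transformation rules operate on chemical structures and may involve several substrates. It does not reflect how any particular retrobiosynthesis tool is implemented.

```python
from collections import deque

# Hypothetical rule table: product -> list of (reaction_name, precursor).
RULES = {
    "target": [("rxn3", "intermediate_2"), ("rxn4", "intermediate_3")],
    "intermediate_2": [("rxn2", "intermediate_1")],
    "intermediate_1": [("rxn1", "glucose")],
    "intermediate_3": [("rxn5", "unavailable_compound")],
}


def find_routes(target, native, rules, max_steps=5):
    """Breadth-first backward search from target to any native metabolite.

    Returns a list of routes; each route is a list of
    (reaction, precursor, product) steps in forward (synthesis) order.
    """
    routes = []
    queue = deque([(target, [])])
    while queue:
        compound, steps = queue.popleft()
        if compound in native:
            # Steps were collected backwards, so reverse them.
            routes.append(list(reversed(steps)))
            continue
        if len(steps) >= max_steps:
            continue  # bound the search depth (also guards against cycles)
        for reaction, precursor in rules.get(compound, []):
            queue.append((precursor, steps + [(reaction, precursor, compound)]))
    return routes


for route in find_routes("target", {"glucose"}, RULES):
    for reaction, precursor, product in route:
        print(f"{reaction}: {precursor} -> {product}")
```

In this toy table only the branch through intermediate_2 reaches a metabolite the host already makes; the branch through intermediate_3 dead-ends and is dropped.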
Polyketide synthases (PKSs) provide a systematic and modular way to synthesize millions of structurally distinct small molecules. Interestingly, type I modular PKSs follow a deterministic logic: the order of their modules can predict the resulting chemical product with surprising accuracy. However, engineering them to obtain a specified molecule is highly non-trivial.
We developed ClusterCAD as a computational retrobiosynthesis platform to streamline the process of designing PKS variants to obtain a desired molecule.
Software Development
Predictive modeling requires standardized data collection and storage to be truly effective (Excel spreadsheets really do not cut it in the long term).
We have developed the Experiment Data Depot (EDD), an online tool designed as a repository of experimental data and metadata. EDD can import experimental data, visualize these data, and provide downloadable data in several standard output formats through an API.
Furthermore, we have developed Arrowland, an online multiscale interactive tool for multiomics data visualization (paper in preparation).
Synthetic Biology Automation
Machine learning and other modeling efforts require large quantities of high-quality data. While we have leveraged human-generated data successfully in the past, the need for larger unbiased data sets and faster turnaround times can ultimately only be met through automated data collection.
We have focused our attention on three different approaches to automation: