Guided by artificial intelligence and powered by a robotic platform, a system developed by MIT researchers moves a step closer to automating the production of small molecules that could be used in medicine, solar energy, and polymer chemistry.
The system, described in the August 8 issue of Science, could free up bench chemists from a variety of routine and time-consuming tasks, and may suggest possibilities for how to make new molecular compounds, according to the study co-leaders Klavs F. Jensen, the Warren K. Lewis Professor of Chemical Engineering, and Timothy F. Jamison, the Robert R. Taylor Professor of Chemistry and associate provost at MIT.
The technology “has the promise to help people cut out all the tedious parts of molecule building,” including looking up potential reaction pathways and building the components of a molecular assembly line each time a new molecule is produced, says Jensen.
“And as a chemist, it may give you inspirations for new reactions that you hadn’t thought about before,” he adds.
Other MIT authors on the Science paper include Connor W. Coley, Dale A. Thomas III, Justin A. M. Lummiss, Jonathan N. Jaworski, Christopher P. Breen, Victor Schultz, Travis Hart, Joshua S. Fishman, Luke Rogers, Hanyu Gao, Robert W. Hicklin, Pieter P. Plehiers, Joshua Byington, John S. Piotti, William H. Green, and A. John Hart.
From inspiration to recipe to finished product
The new system combines three main steps. First, software guided by artificial intelligence suggests a route for synthesizing a molecule, then expert chemists review this route and refine it into a chemical “recipe,” and finally the recipe is sent to a robotic platform that automatically assembles the hardware and performs the reactions that build the molecule.
Coley and his colleagues have been working for more than three years to develop the open-source software suite that suggests and prioritizes possible synthesis routes. At the heart of the software are several neural network models, which the researchers trained on millions of previously published chemical reactions drawn from the Reaxys and U.S. Patent and Trademark Office databases. The software uses these data to identify the reaction transformations and conditions that it believes will be suitable for building a new compound.
“It helps make high-level decisions about what kinds of intermediates and starting materials to use, and then slightly more detailed analyses about what conditions you might want to use and if those reactions are likely to be successful,” says Coley.
“One of the primary motivations behind the design of the software is that it doesn’t just give you suggestions for molecules we know about or reactions we know about,” he notes. “It can generalize to new molecules that have never been made.”
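The route-planning step can be caricatured as a best-first search over scored reaction "templates." The sketch below is purely illustrative: the molecule names, templates, and scores are invented, and the real system uses neural-network models trained on millions of published reactions rather than a hand-written lookup table.

```python
# Toy recursive route search over scored "templates". All molecules,
# templates, and scores here are invented for illustration; the real
# planner uses neural networks trained on Reaxys/USPTO reaction data.

BUYABLE = {"aniline", "acetic anhydride", "nitrobenzene", "hydrogen"}

# Each entry maps a product to candidate disconnections:
# (list of precursors, model-assigned plausibility score).
TEMPLATES = {
    "acetanilide": [(["aniline", "acetic anhydride"], 0.9)],
    "aniline": [(["nitrobenzene", "hydrogen"], 0.8)],
}

def plan(target):
    """Expand target into buyable precursors; return (route, confidence)."""
    if target in BUYABLE:
        return [], 1.0
    best = None
    for precursors, step_score in TEMPLATES.get(target, []):
        route, score = [(target, precursors)], step_score
        for p in precursors:
            sub_route, sub_score = plan(p)
            route += sub_route
            score *= sub_score
        if best is None or score > best[1]:
            best = (route, score)
    return best if best is not None else ([], 0.0)

route, confidence = plan("acetanilide")
for product, precursors in route:
    print(f"{product} <= {' + '.join(precursors)}")
```

A real planner weighs many competing disconnections per step and asks a second set of models to propose reaction conditions; the recursive scoring above only mirrors the "rank routes, prioritize the plausible ones" shape of that process.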
Chemists then review the suggested synthesis routes produced by the software to build a more complete recipe for the target molecule. The chemists sometimes need to perform lab experiments or tinker with reagent concentrations and reaction temperatures, among other changes.
“They take some of the inspiration from the AI and convert that into an executable recipe file, largely because the chemical literature at present does not have enough information to move directly from inspiration to execution on an automated system,” Jamison says.
The final recipe is then loaded onto a platform where a robotic arm assembles modular reactors, separators, and other processing units into a continuous flow path, connecting pumps and lines that bring in the molecular ingredients.
“You load the recipe — that’s what controls the robotic platform — you load the reagents on, and press go, and that allows you to generate the molecule of interest,” says Thomas. “And then when it’s completed, it flushes the system and you can load the next set of reagents and recipe, and allow it to run.”
Unlike the continuous flow system the researchers presented last year, which had to be manually configured after each synthesis, the new system is entirely configured by the robotic platform.
“This gives us the ability to sequence one molecule after another, as well as generate a library of molecules on the system, autonomously,” says Jensen.
The design for the platform, which is about two cubic meters in size — slightly smaller than a standard chemical fume hood — resembles a telephone switchboard and operator system that moves connections between the modules on the platform.
“The robotic arm is what allowed us to manipulate the fluidic paths, which reduced the number of process modules and fluidic complexity of the system, and by reducing the fluidic complexity we can increase the molecular complexity,” says Thomas. “That allowed us to add additional reaction steps and expand the set of reactions that could be completed on the system within a relatively small footprint.”
Toward full automation
The researchers tested the full system by creating 15 different medicinal small molecules of varying synthesis complexity, with processes taking anywhere from two hours for the simplest creations to about 68 hours for manufacturing multiple compounds.
The team synthesized a variety of compounds: aspirin and the antibiotic secnidazole in back-to-back processes; the painkiller lidocaine and the antianxiety drug diazepam in back-to-back processes using a common feedstock of reagents; the blood thinner warfarin and the Parkinson’s disease drug safinamide, to show how the software could design compounds with similar molecular components but differing 3-D structures; and a family of five ACE inhibitor drugs and a family of four nonsteroidal anti-inflammatory drugs.
“I’m particularly proud of the diversity of the chemistry and the kinds of different chemical reactions,” says Jamison, who said the system handled about 30 different reactions compared to about 12 different reactions in the previous continuous flow system.
“We are really trying to close the gap between idea generation from these programs and what it takes to actually run a synthesis,” says Coley. “We hope that next-generation systems will further increase the fraction of time and effort that scientists can devote to creativity and design.”
The research was supported, in part, by the U.S. Defense Advanced Research Projects Agency (DARPA) Make-It program.
The microtubule-binding protein tau in neurons of the central nervous system can misfold into filamentous aggregates under certain conditions. These filaments are found in many neurodegenerative diseases such as Alzheimer’s disease, chronic traumatic encephalopathy (CTE), and progressive supranuclear palsy. Understanding the molecular structure and dynamics of tau fibrils is important for designing anti-tau inhibitors to combat these diseases.
Cryoelectron microscopy studies have recently shown that tau fibrils derived from postmortem brains of Alzheimer’s patients adopt disease-specific molecular conformations. These conformations consist of long sheets, known as beta sheets, that are formed by thousands of protein molecules aligned in parallel. In contrast, recombinant tau fibrillized using the anionic polymer heparin was reported to exhibit polymorphic structures. However, the origin of this in vitro structural polymorphism as compared to the in vivo structural homogeneity is unknown.
Using solid-state nuclear magnetic resonance (SSNMR) spectroscopy, MIT Professor Mei Hong, in collaboration with Professor Bill DeGrado at the University of California at San Francisco, has shown in a paper, published July 29 in PNAS, that the beta sheet core of heparin-fibrillized tau in fact adopts a single molecular conformation. The tau protein they studied contains four microtubule-binding repeats, and the beta sheet fibril core spans the second and third repeats.
Clarifying biochemical studies of tau and its fibril formation
Previous research on this subject had reported four polymorphic structures of four-repeat (4R) tau fibrils, a polymorphism that led many labs to believe that in vitro tau fibrils were poor mimics of the in vivo patient-brain tau. However, through the use of their SSNMR spectra, which show only a single set of peaks for the protein, Hong and DeGrado discovered a crucial biochemical problem that led to the previous polymorphism.
Once this error was corrected, 4R tau was found to display only a single molecular structure. The revelation of this common biochemical problem, which is protease contamination in the heparin used to fibrillize tau, will significantly clarify and positively impact the field of tau research.
Preventing the formation of tau aggregates in Alzheimer’s disease and beyond
The three-dimensional fold of this four-repeat tau fibril core is distinct from the fibril core of the Alzheimer’s disease tau, which consists of a mixture of three- and four-repeat isoforms. “The tau isoform we studied is the same as that in diseases such as progressive supranuclear palsy, [so] the structural model we determined suggests what the patient brain tau from PSP may look like. Knowing this structure will be important for designing anti-tau inhibitors to either disrupt fibrils or prevent fibrils from forming in the first place,” explains Hong.
This SSNMR study also reported detailed characterizations of the mobilities of amino acid residues outside the rigid beta sheet core. These residues, which appear as a “fuzzy coat” in transmission electron micrographs, exhibit increasingly large-amplitude motion toward the two ends of the polypeptide chain. Interestingly, the first and fourth microtubule-binding repeats, although excluded from the rigid core, display local β-strand conformations and are semi-rigid.
These structural and dynamical results suggest future medicinal interventions to disrupt or prevent the formation of tau aggregates in some neurodegenerative diseases.
Identifying species among plants and animals has been a full-time occupation for some biologists, but the task is even more daunting for the myriad microbes that inhabit the planet. Now, MIT researchers have developed a simple measurement of gene flow that can define ecologically important populations among bacteria and archaea, including pinpointing populations associated with human diseases.
The gene flow metric separates coexisting microbes into genetically and ecologically distinct populations, Martin Polz, a professor of civil and environmental engineering at MIT, and colleagues write in the August 8 issue of Cell.
Polz and his colleagues also developed a method to identify parts of the genome in these populations that show different adaptations that can be mapped onto different environments. When they tested their approach on a gut bacterium, for instance, they were able to determine that different populations of the bacteria were associated with healthy individuals and patients with Crohn’s disease.
Biologists often call a group of plants or animals a species if the group is reproductively isolated from others — that is, individuals in the group can reproduce with each other, but they can’t reproduce with others. As a result, members of a species share a set of genes that differs from other species. Much of evolutionary theory centers on species and populations, the representatives of a species in a particular area.
But microbes “defy the classic species concept for plants and animals,” Polz explains. Microbes tend to reproduce asexually, simply splitting themselves in two rather than combining their genes with other individuals to produce offspring. Microbes are also notorious for “taking up DNA from environmental sources, such as viruses,” he says. “Viruses can transfer DNA into microbial cells and that DNA can be incorporated into their genomes.”
These processes make it difficult to sort coexisting microbes into distinct populations based on their genetic makeup. “If we can’t identify those populations in microbes, we can’t one-to-one apply all this rich ecological and evolutionary theory that has been developed for plants and animals to microbes,” says Polz.
If researchers want to measure an ecosystem’s resilience in the face of environmental change, for instance, they might look at how populations within species change over time. “If we don’t know what a species is, it’s very difficult to measure and assess these types of perturbations,” he adds.
Christopher Marx, a microbiologist at the University of Idaho who was not part of the Cell study, says he and his colleagues “will immediately apply” the MIT researchers’ approach to their own work. “We can use this to answer the question, ‘What should we define as an ecologically important unit?’”
A yardstick for gene flow
Polz and his colleagues decided to look for another way to define ecologically meaningful populations in microbes. Led by microbiology graduate student Philip Arevalo, the researchers developed a metric of gene flow that they called PopCOGenT (Populations as Clusters Of Gene Transfer).
PopCOGenT measures recent gene flow or gene transfer between closely related genomes. In general, microbial genomes that have exchanged DNA recently should share longer and more frequent stretches of identical DNA than if individuals were just reproducing by splitting their DNA in two. Without this sort of recent exchange, the researchers suggested, the length of these shared stretches of identical DNA would shorten as mutations insert new “letters” into the stretch.
Two microbial strains that are not genetically identical to each other but share sizable “chunks” of identical DNA are probably exchanging more genetic material with each other than with other strains. This gene flow measurement can define distinct microbial populations, as the researchers discovered in their tests of three different kinds of bacteria.
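The intuition behind the metric can be sketched in a few lines of Python. This is a simplified illustration of the run-length idea only, not the published PopCOGenT implementation, and the toy sequences are invented:

```python
# Recent gene transfer leaves long runs of identical sequence between
# two genomes; divergence by mutation alone scatters mismatches and
# breaks identical runs into shorter pieces. Toy sequences; this is
# not the published PopCOGenT code.

def identical_run_lengths(seq_a, seq_b):
    """Lengths of maximal identical stretches between aligned sequences."""
    runs, current = [], 0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs

clonal_a   = "ACGTACGTACGTACGTACGT"
clonal_b   = "ACGAACGTACTTACGTACGA"  # mutations scattered along the genome
transfer_b = "TCTTACGTACGTACGTAGGT"  # differences confined near the ends

print(max(identical_run_lengths(clonal_a, clonal_b)))    # short longest run
print(max(identical_run_lengths(clonal_a, transfer_b)))  # long shared block
```

Both toy partners differ from `clonal_a` at a few sites, but only the "transfer" pair shares one long unbroken block of identical sequence, the signature the metric looks for.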
In Vibrio bacteria, for instance, closely related populations may share some core gene sequences, but they appear completely isolated from each other when viewed through this measurement of recent gene flow, Polz and colleagues found.
Polz says that the PopCOGenT method may work better at defining microbial populations than previous studies because it focuses on recent gene flow among closely related organisms, rather than including gene flow events that may have happened thousands of years in the past.
The method also suggests that while microbes are constantly taking in different DNA from their environment that might obscure patterns of gene flow, “it may be that this divergent DNA is really removed by selection from populations very quickly,” says Polz.
The reverse ecology approach
Microbiology graduate student David VanInsberghe then suggested a “reverse ecology” approach that could identify regions of the genome in these newly defined populations that show “selective sweeps” — places where DNA variation is reduced or eliminated, likely as a result of strong natural selection for a particular beneficial genetic variant.
By identifying specific sweeps within populations, and mapping the distribution of these populations, the method can reveal possible adaptations that drive microbes to inhabit a particular environment or host — without any prior knowledge of their environment. When the researchers tested this approach in the gut bacterium Ruminococcus gnavus, they uncovered separate populations of the microbe associated with healthy people and patients with Crohn’s disease.
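As a cartoon of the sweep-detection idea, one can scan aligned genomes from a population in windows and flag windows where variation collapses. The alignment and window size below are invented for illustration and are not the study's actual pipeline:

```python
# A selective sweep appears as a genome region where variation among
# strains of one population is sharply reduced. Toy alignment and
# window size for illustration; not the study's analysis pipeline.

def window_diversity(seqs, window):
    """Fraction of variable sites per window across aligned sequences."""
    scores = []
    for start in range(0, len(seqs[0]), window):
        cols = range(start, min(start + window, len(seqs[0])))
        variable = sum(len({s[i] for s in seqs}) > 1 for i in cols)
        scores.append(variable / len(cols))
    return scores

strains = [  # the middle four sites are invariant: a candidate sweep
    "AAGTCCCCTAGT",
    "ATGTCCCCTCGT",
    "AAGACCCCTAGG",
]
for i, d in enumerate(window_diversity(strains, 4)):
    flag = "  <- candidate sweep" if d == 0 else ""
    print(f"window {i}: diversity {d:.2f}{flag}")
```

In real data, how low the diversity must drop, and over how long a region, before a window counts as a sweep is a statistical question; the sketch only shows where such regions would stand out.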
Polz says the reverse ecology method is likely to be applied in the near future to studying the full diversity of the bacteria that inhabit the human body. “There is a lot of interest in sequencing closely related organisms within the human microbiome and looking for health and disease associations, and the datasets are growing.”
He hopes to use the approach to examine the “flexible genome” of microbes. Strains of E. coli bacteria, for instance, share about 40 percent of their genes in a “core genome,” while the other 60 percent — the flexible part — varies between strains. “For me, it’s one of the biggest questions in microbiology: Why are these genomes so diverse in gene content?” Polz explains. “Once we can define populations as evolutionary units, we can interpret gene frequencies in these populations in light of evolutionary processes.”
Polz and colleagues’ findings could increase estimates of microbe diversity, says Marx. “What I think is really cool about this approach from Martin’s group is that they actually suggest that the complexity that we see is even more complex than we’re giving it credit for. There may be even more types that are ecologically important out there, things that if they were plants and animals we would be calling them species.”
Other MIT authors on the paper include Joseph Elsherbini and Jeff Gore. The research was supported, in part, by the National Science Foundation and the Simons Foundation.
In recent years, MIT scientists have developed a new model for how key genes are controlled that suggests the cellular machinery that transcribes DNA into RNA forms specialized droplets called condensates. These droplets occur only at certain sites on the genome, helping to determine which genes are expressed in different types of cells.
In a new study that supports that model, researchers at MIT and the Whitehead Institute for Biomedical Research have discovered physical interactions among proteins, and between proteins and DNA, that help explain why these droplets, which stimulate the transcription of nearby genes, tend to cluster along specific stretches of DNA known as super enhancers. These enhancer regions do not encode proteins but instead regulate other genes.
“This study provides a fundamentally important new approach to deciphering how the ‘dark matter’ in our genome functions in gene control,” says Richard Young, an MIT professor of biology and member of the Whitehead Institute.
Young is one of the senior authors of the paper, along with Phillip Sharp, an MIT Institute Professor and member of MIT’s Koch Institute for Integrative Cancer Research; and Arup K. Chakraborty, the Robert T. Haslam Professor in Chemical Engineering, a professor of physics and chemistry, and a member of MIT’s Institute for Medical Engineering and Science and the Ragon Institute of MGH, MIT, and Harvard.
Graduate student Krishna Shrinivas and postdoc Benjamin Sabari are the lead authors of the paper, which appears in Molecular Cell on Aug. 8.
“A biochemical factory”
Every cell in an organism has an identical genome, but cells such as neurons or heart cells express different subsets of those genes, allowing them to carry out their specialized functions. Previous research has shown that many of these genes are located near super enhancers, which bind to proteins called transcription factors that stimulate the copying of nearby genes into RNA.
About three years ago, Sharp, Young, and Chakraborty joined forces to try to model the interactions that occur at enhancers. In a 2017 Cell paper, based on computational studies, they hypothesized that in these regions, transcription factors form droplets called phase-separated condensates. Similar to droplets of oil suspended in salad dressing, these condensates are collections of molecules that form distinct cellular compartments but have no membrane separating them from the rest of the cell.
In a 2018 Science paper, the researchers showed that these dynamic droplets do form at super enhancer locations. Made of clusters of transcription factors and other molecules, these droplets attract enzymes such as RNA polymerases that are needed to copy DNA into messenger RNA, keeping gene transcription active at specific sites.
“We had demonstrated that the transcription machinery forms liquid-like droplets at certain regulatory regions on our genome, however we didn't fully understand how or why these dewdrops of biological molecules only seemed to condense around specific points on our genome,” Shrinivas says.
As one possible explanation for that site specificity, the research team hypothesized that weak interactions between intrinsically disordered regions of transcription factors and other transcriptional molecules, along with specific interactions between transcription factors and particular DNA elements, might determine whether a condensate forms at a particular stretch of DNA. Biologists have traditionally focused on “lock-and-key” style interactions between rigidly structured protein segments to explain most cellular processes, but more recent evidence suggests that weak interactions between floppy protein regions also play an important role in cell activities.
In this study, computational modeling and experimentation revealed that the cumulative force of these weak interactions conspires with transcription factor-DNA interactions to determine whether a condensate of transcription factors will form at a particular site on the genome. Different cell types produce different transcription factors, which bind to different enhancers. When many transcription factors cluster around the same enhancers, weak interactions between the proteins are more likely to occur. Once a critical threshold concentration is reached, condensates form.
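The switch-like threshold behavior can be sketched with a Hill-type cooperative response. The parameters below are invented for illustration; the study's actual modeling is a detailed physical treatment, not this one-liner:

```python
# Cooperative, switch-like condensate formation: weak multivalent
# interactions make droplet formation a sharp function of local
# transcription-factor concentration. Hill-type toy model with
# invented parameters; not the study's physical model.

def condensate_propensity(conc, threshold=1.0, cooperativity=8):
    """Near 0 below the threshold concentration, near 1 above it."""
    c, t = conc ** cooperativity, threshold ** cooperativity
    return c / (t + c)

for conc in (0.5, 0.9, 1.0, 1.1, 2.0):
    p = condensate_propensity(conc)
    state = "condensate" if p > 0.5 else "dispersed"
    print(f"concentration {conc:.1f}: propensity {p:.3f} ({state})")
```

Higher cooperativity sharpens the switch: below the threshold almost nothing condenses, while just above it the propensity jumps toward 1, mirroring the on-demand formation and dissolution the researchers describe.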
“Creating these local high concentrations within the crowded environment of the cell enables the right material to be in the right place at the right time to carry out the multiple steps required to activate a gene,” Sabari says. “Our current study begins to tease apart how certain regions of the genome are capable of pulling off this trick.”
These droplets form on a timescale of seconds to minutes, and they blink in and out of existence depending on a cell’s needs.
“It’s an on-demand biochemical factory that cells can form and dissolve, as and when they need it,” Chakraborty says. “When certain signals happen at the right locus on a gene, the condensates form, which concentrates all of the transcription molecules. Transcription happens, and when the cells are done with that task, they get rid of them.”
A new view
Weak cooperative interactions between proteins may also play an important role in evolution, the researchers proposed in a 2018 Proceedings of the National Academy of Sciences paper. The sequences of intrinsically disordered regions of transcription factors need to change only a little to evolve new types of specific functionality. In contrast, evolving new specific functions via “lock-and-key” interactions requires much more significant changes.
“If you think about how biological systems have evolved, they have been able to respond to different conditions without creating new genes. We don’t have any more genes than a fruit fly, yet we’re much more complex in many of our functions,” Sharp says. “The incremental expanding and contracting of these intrinsically disordered domains could explain a large part of how that evolution happens.”
Similar condensates appear to play a variety of other roles in biological systems, offering a new way to look at how the interior of a cell is organized. Instead of floating through the cytoplasm and randomly bumping into other molecules, proteins involved in processes such as relaying molecular signals may transiently form droplets that help them interact with the right partners.
“This is a very exciting turn in the field of cell biology,” Sharp says. “It is a whole new way of looking at biological systems that is richer and more meaningful.”
Some of the MIT researchers, led by Young, have helped form a company called Dewpoint Therapeutics to develop potential treatments for a wide variety of diseases by exploiting cellular condensates. There is emerging evidence that cancer cells use condensates to control sets of genes that promote cancer, and condensates have also been linked to neurodegenerative disorders such as amyotrophic lateral sclerosis (ALS) and Huntington’s disease.
The research was funded by the National Science Foundation, the National Institutes of Health, and the Koch Institute Support (core) Grant from the National Cancer Institute.
The MIT Press has announced the release of a comprehensive report on the current state of all available open-source software for publishing. “Mind the Gap,” funded by a grant from The Andrew W. Mellon Foundation, “shed[s] light on the development and deployment of open source publishing technologies in order to aid institutions' and individuals' decision-making and project planning,” according to its introduction. It will be an unparalleled resource for the scholarly publishing community and complements the recently released Mapping the Scholarly Communication Landscape census.
The report authors, led by John Maxwell, associate professor and director of the Publishing Program at Simon Fraser University, catalog 52 open source online publishing platforms. These are defined as production and hosting systems for scholarly books and journals that meet the survey criteria, described in the report as “available, documented open-source software relevant to scholarly publishing,” as well as others in active development. This research provides the foundation for a thorough analysis of the open publishing ecosystem and the availability, affordances, and current limitations of these platforms and tools.
The number of open source online publishing platforms has proliferated in the last decade, but the report finds that they are often too small, too siloed, and too niche to have much impact beyond their host organization or institution. This leaves them vulnerable to shifts in organizational priorities and external funding sources that prioritize new projects over the maintenance and improvement of existing projects. This fractured ecosystem is difficult to navigate, and the report concludes that if open publishing is to become a durable alternative to complex and costly proprietary services, it must grapple with the dual challenges of siloed development and organization of the community-owned ecosystem itself.
“What are the forces — and organizations — that serve the larger community, that mediate between individual projects, between projects and use cases, and between projects and resources?” asks the report. “Neither a chaotic plurality of disparate projects nor an efficiency-driven, enforced standard is itself desirable, but mediating between these two will require broad agreement about high-level goals, governance, and funding priorities — and perhaps some agency for integration/mediation.”
“John Maxwell and his team have done a tremendous job collecting and analyzing data that confirm that open publishing is at a pivotal crossroads,” says Amy Brand, director of the MIT Press. “It is imperative that the scholarly publishing community come together to find new ways to fund and incentivize collaboration and adoption if we want these projects to succeed. I look forward to the discussions that will emerge from these findings.”
“We found that even though platform leaders and developers recognize that collaboration, standardization, and even common code layers can provide considerable benefit to project ambitions, functionality, and sustainability, the funding and infrastructure supporting open publishing projects discourages these activities,” explains Maxwell. “If the goal is to build a viable alternative to proprietary publishing models, then open publishing needs new infrastructure that incentivizes sustainability, cooperation, collaboration, and integration.”
Readers are invited to read, comment, and annotate “Mind the Gap” on the PubPub platform: mindthegap.pubpub.org