‘Automation of science’ holds the promise of enabling more science by making better decisions faster. In drug discovery, automated systems have had a long and fruitful history. For instance, high-throughput screening of specialized assays has become the standard in the pharmaceutical industry. Other applications of automated systems in drug discovery include decision-support systems, computational molecular design, robotic synthesis and hit identification. Nevertheless, the complete integration of all aspects of compound design, synthesis, testing and automated iteration through the molecular design cycle has not yet been consistently achieved, although there have been a few proof-of-concept studies.
New technologies, including ‘organ-on-a-chip’ and artificial intelligence (AI), are now providing the basis for the more widespread application of semi-autonomous or even fully autonomous processes to identify and optimize compounds in drug discovery. The benefits of automation include diminished measurement errors and reduced material consumption; shortened synthesize-and-test cycle times; fast feedback loops and compound optimization; and ‘objective’ molecular design free from personal bias. Sophisticated cell-based assays, which recapitulate disease biology more effectively and thereby improve the likelihood of identifying compounds that show efficacy in humans, require more rigorous compound prioritization. Automated approaches could be particularly important in this case, as these assays are not always suitable for high-throughput screening. Other applications of technological advances include automated high-throughput combinatorial synthesis, ‘big data’ and AI as applied to the drug discovery and design cycle.
Challenges for medicinal chemistry
The drug design cycle has been defined as starting with results obtained by high-throughput compound screening, fragment screening, computational modeling, or data from the literature, and then proceeding through a feedback-driven discovery process, eventually leading to optimized hit and lead compounds. In the ideal case, the entire design cycle can be performed in software. Constructing adaptive de novo design algorithms requires equipping them with both chemical knowledge (for in silico compound synthesis) and meaningful virtual screening models (as surrogates for laboratory-based biochemical and biological tests). In addition, active learning algorithms enable navigation of chemical space towards ‘hit’ compounds with the properties desired of active lead molecules. Medicinal chemists typically select, design, and prioritize molecular structures based on factors such as the desired biological activity of the compounds and their absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties. In addition, they must determine the availability of compounds and, if the compounds are to be synthesized rather than sourced from existing libraries or commercial suppliers, carry out a retrosynthetic analysis. As a consequence, medicinal chemists face challenging optimization problems across many different dimensions, with the relative importance of the individual parameters changing as the discovery, design and optimization process progresses from initial screening hit identification towards the selection of viable candidates with the appropriate absorption, distribution, metabolism and excretion characteristics.
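The shifting importance of parameters during a project can be illustrated with a small weighted-sum ranking. This is only a minimal sketch under stated assumptions: the compound names, the normalized property scores and the stage weights are all hypothetical, and a weighted sum is just one of several possible ways to combine objectives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical compound with property scores normalized to 0..1 (higher is better)."""
    name: str
    potency: float
    admet: float
    synthesizability: float


# Assumed weights: potency dominates early, ADMET gains importance later.
STAGE_WEIGHTS = {
    "hit_identification": {"potency": 0.7, "admet": 0.1, "synthesizability": 0.2},
    "lead_optimization": {"potency": 0.3, "admet": 0.5, "synthesizability": 0.2},
}


def score(candidate: Candidate, stage: str) -> float:
    """Weighted sum of the property scores for the given project stage."""
    weights = STAGE_WEIGHTS[stage]
    return sum(weight * getattr(candidate, prop) for prop, weight in weights.items())


def rank(candidates: list, stage: str) -> list:
    """Return the candidates sorted from best to worst for the given stage."""
    return sorted(candidates, key=lambda c: score(c, stage), reverse=True)


candidates = [
    Candidate("cmpd_A", potency=0.9, admet=0.3, synthesizability=0.5),
    Candidate("cmpd_B", potency=0.6, admet=0.8, synthesizability=0.6),
    Candidate("cmpd_C", potency=0.4, admet=0.5, synthesizability=0.9),
]

for stage in STAGE_WEIGHTS:
    print(stage, [c.name for c in rank(candidates, stage)])
```

With these made-up numbers, the most potent compound ranks first during hit identification but falls to last place once ADMET properties carry more weight, which is the kind of reprioritization described above.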
Over the past two decades, various concepts have emerged to help guide compound library design, hit-to-lead expansion and the enrichment of compound collections with new chemical entities that possess drug-like properties. Diversity-oriented synthesis (DOS) provides a rationale for generating collections of small molecules with diverse functional groups, stereochemistry, and frameworks in a controlled fashion. Biology-oriented synthesis (BIOS) takes natural products as templates for generating synthetically accessible derivatives and mimetics. Function-oriented synthesis (FOS) takes the BIOS concept to the next level by aiming to recapitulate or tune the function of a biologically active lead structure, in order to obtain simpler scaffolds, increase their ease of synthesis and achieve synthetic innovation. For each of these synthesis concepts, artificial neural networks have helped to rationalize the drug-likeness concept in more sophisticated terms and have enabled on-the-fly computational compound profiling. In addition, it has been recognized that compound quality may be improved by appropriate lead selection and optimization based on informed decisions rather than by the naive application of empirical rules. Generative and adversarial neural networks can infer weak rules that govern drug-receptor interactions and use them to guide the identification, selection, and optimization of better drugs. Today, fully fledged in silico decision-support systems greatly extend and augment such concepts and guidelines and assist medicinal chemists in multi-objective compound design, selection, and prioritization.
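The contrast between naively applying an empirical rule and making a graded, informed decision can be sketched in a few lines. The text does not name a specific rule; the example below assumes Lipinski's rule of five (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors and 10 acceptors) as a familiar instance. The soft score and its tolerance values are hypothetical stand-ins for a learned drug-likeness model, not a published method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one compound (the values used below are illustrative)."""
    mol_weight: float  # in Da
    logp: float
    h_donors: int
    h_acceptors: int


def ro5_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    return sum([d.mol_weight > 500, d.logp > 5, d.h_donors > 5, d.h_acceptors > 10])


def passes_hard_filter(d: Descriptors) -> bool:
    """Naive application of the rule: reject on any violation."""
    return ro5_violations(d) == 0


def desirability(value: float, limit: float, tolerance: float) -> float:
    """1.0 up to the limit, then falling linearly to 0.0 at limit + tolerance."""
    if value <= limit:
        return 1.0
    return max(0.0, 1.0 - (value - limit) / tolerance)


def soft_score(d: Descriptors) -> float:
    """Mean desirability over the four descriptors (tolerances are assumptions)."""
    parts = [
        desirability(d.mol_weight, 500, 200),
        desirability(d.logp, 5, 2),
        desirability(d.h_donors, 5, 3),
        desirability(d.h_acceptors, 10, 5),
    ]
    return sum(parts) / len(parts)


borderline = Descriptors(mol_weight=510, logp=4.0, h_donors=2, h_acceptors=6)
far_outside = Descriptors(mol_weight=900, logp=7.5, h_donors=8, h_acceptors=16)

for d in (borderline, far_outside):
    print(passes_hard_filter(d), round(soft_score(d), 4))
```

The hard filter rejects both compounds alike, whereas the graded score separates a compound that misses one threshold by a small margin from one that lies far outside all four, leaving the final call to a later prioritization step.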
Towards automated de novo design
Today, the probabilities of the underlying research hypotheses are recorded as experimental metadata and stored in databases. This enables automated semantic analysis, which both generates revised design hypotheses and derives new examples (chemical entities) for subsequent synthesis and testing. Numerous automated compound generators and selection operators have been conceived for this purpose, some of which use classes of ‘deep’ machine learning methods, for example generative and recurrent neural networks, inverse quantitative structure–activity relationship (QSAR) models, and reaction-based compound assembly techniques. De novo molecular design methods have now matured enough to be applicable in prospective settings and are receiving increasing attention. It has now become clear that computational, and even de novo, design can deliver original synthesizable chemical entities with the desired binding, pharmacokinetic and pharmacodynamic properties.
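How a compound generator, a selection operator and experimental feedback fit together can be shown with a toy loop. Everything in this sketch is an assumption made for illustration: ‘compounds’ are integers from 0 to 100, `assay` stands in for a laboratory measurement, `surrogate` stands in for a cheap but biased virtual screening model, and the generator simply proposes neighbors of the best compound measured so far. It is not one of the published methods mentioned above.

```python
def assay(x: int) -> float:
    """Stand-in for a costly laboratory test (hypothetical; optimum at x = 70)."""
    return 1.0 - ((x - 70) / 100) ** 2


def surrogate(x: int) -> float:
    """Stand-in for a virtual screening model: cheap but biased (optimum at x = 65)."""
    return 1.0 - ((x - 65) / 100) ** 2


def generate(parent: int, radius: int = 10) -> list[int]:
    """Compound generator: propose variants of the parent inside the space 0..100."""
    return [x for x in range(parent - radius, parent + radius + 1)
            if 0 <= x <= 100 and x != parent]


def design_cycle(seed: int, n_cycles: int, batch_size: int = 2) -> tuple[int, dict[int, float]]:
    """Iterate generate -> select -> test -> analyze; return the best compound and all results."""
    tested = {seed: assay(seed)}
    best = seed
    for _ in range(n_cycles):
        proposals = [x for x in generate(best) if x not in tested]
        if not proposals:
            break
        # Selection operator: rank proposals in silico and test only the top few.
        batch = sorted(proposals, key=surrogate, reverse=True)[:batch_size]
        for x in batch:
            tested[x] = assay(x)
        # Feedback: the best measured compound seeds the next round of design.
        best = max(tested, key=tested.get)
    return best, tested


best, tested = design_cycle(seed=10, n_cycles=12)
print(f"best compound: {best}, compounds tested: {len(tested)} of 101")
```

In this toy setting the loop reaches the optimum after testing only a fraction of the design space, and the measured feedback corrects the bias of the surrogate. Real chemical space is discrete, high-dimensional and multimodal, so such behavior should not be expected to carry over directly.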
Clearly, drug discovery is a challenging endeavor that requires skillful navigation in a multidimensional, multimodal search space. Activity ‘cliffs’ may affect lead optimization, and unexpected biochemical and pharmacological effects can derail lead compound expansion and development. State-of-the-art computational design can now reliably and reproducibly deliver new synthesizable chemical entities, either from existing fragments or ‘leads’, or entirely de novo, with desired properties for activity and kinetics. Compound selection strategies have shown their applicability to de novo design; they are useful not only for prioritizing chemically attractive lead-like and drug-like molecular structures but also for taking ligand–target promiscuity into account (estimates range between five and eleven pharmacologically relevant targets per drug). The next step will be to combine these and related techniques with automated synthesis and rapid compound testing into an integrated drug discovery pipeline. Automated drug discovery could help to considerably reduce the number of compounds to be tested in a medicinal chemistry project and establish a rational, unbiased foundation for adaptive molecular design.
Chapman, T. Lab automation and robotics: automation on the move. Nature 421, 661–666 (2003)
King, R. D. et al. The automation of science. Science 324, 85–89 (2009)
Sanderson, K. March of the synthesis machines. Nat. Rev. Drug Discov. 14, 299–300 (2015)
Schneider, G. Automating drug discovery. Nat. Rev. Drug Discov. 17, 97–113 (2018)