NP3 MS Workflow is open-source software that makes it easier to map the chemistry of biodiversity and use this information to develop new medications
Researchers from the Brazilian Center for Research in Energy and Materials (CNPEM), the University of São Paulo (USP), and the University of Campinas (UNICAMP) have published an article in the journal ACS Analytical Chemistry introducing NP3 MS Workflow, open-source software that can be used by scientists worldwide to discover new pharmaceuticals.
This new development is intended to accelerate the identification and chemical annotation of natural products with biological activity that may be key in discovering new pharmaceuticals. To promote chemical mapping of biodiversity, the software allows users to rapidly identify the different molecules present in samples of natural products, which are complex mixtures of initially unknown molecules. The software can also assign chemical structures to molecules that match known compounds, and distinguish molecules that are already known to science and recorded in databases from those that are still unknown. Additionally, NP3 MS Workflow can correlate molecules from biological sources with data from biological assays, pointing to the bioactive molecules in complex mixtures of natural products (such as extracts from plants, bacteria, and fungi).
Natural products and new analysis methods
Natural products or specialized metabolites are known to be the source of over half of all the medications that have been developed worldwide. But even though roughly 300,000 natural products have been discovered over the past 100 years, significant technical difficulties in researching them have impeded their integration into modern processes of medication discovery. Developing more efficient analysis methods is therefore very important to these fields of study, accelerating new discoveries and making it feasible to use natural products in current discovery platforms.
The new software uses the untargeted metabolomics approach, based on experimental data obtained through liquid chromatography tandem mass spectrometry (LC-MS/MS). This technique is used to separate and identify the different components (molecules) in a mixture, even when the quantities are small and the mixture is complex, such as extracts from plants, bacteria, and fungi.
Chromatography is an essential technique in analytical chemistry for separating the components of a mixture. The sample, a complex mixture of molecules, is injected into a chromatographic column (the stationary phase), and a solvent (the mobile phase) elutes the molecules according to their affinity for each of the two phases. Because molecules with different affinities migrate through the column at different rates, they leave it at different times and are thereby separated. In a system coupled to mass spectrometry, the separated molecules are then automatically injected into a mass spectrometer.
Mass spectrometry is an analytical technique that makes it possible to identify and quantify compounds. In this method, molecules are ionized to acquire a charge and then detected according to their mass-to-charge ratio (m/z). The research in question used a hybrid quadrupole time-of-flight (Q-TOF) system to select the ions generated and determine the m/z ratio of each one. This makes it possible to calculate the exact mass of the molecule in question, which is directly related to its chemical formula (the combination of atoms that make it up). Additionally, the molecules are fragmented in a collision chamber within the device, and the m/z ratio and intensity of the resulting fragments are also measured. This produces a mass spectrum of the sample (the m/z ratio of each ion detected), along with the fragmentation spectrum (MS/MS spectrum) characteristic of each ionized molecule.
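The link between chemical formula, exact mass, and m/z can be illustrated with a short calculation. The Python sketch below is only an illustration, not part of NP3 MS Workflow: it assumes the molecule is ionized by gaining a proton (one common ionization mode, not specified in the article), uses caffeine as an arbitrary example, and the function names are hypothetical.

```python
import re

# Monoisotopic masses (in daltons) of the most abundant isotope of each element.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
}
PROTON_MASS = 1.00727647  # mass of a proton, in daltons


def monoisotopic_mass(formula: str) -> float:
    """Sum the monoisotopic masses for a simple formula such as 'C8H10N4O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC_MASS[element] * (int(count) if count else 1)
    return total


def mz_protonated(formula: str, charge: int = 1) -> float:
    """m/z of a protonated ion: (M + z * proton mass) / z (assumed ionization mode)."""
    return (monoisotopic_mass(formula) + charge * PROTON_MASS) / charge


caffeine = "C8H10N4O2"  # arbitrary example molecule
print(round(monoisotopic_mass(caffeine), 4))  # exact (monoisotopic) mass
print(round(mz_protonated(caffeine), 4))      # m/z expected for the singly protonated ion
```

Running the calculation in reverse, from a measured m/z to candidate formulas, is what an accurate-mass instrument such as a Q-TOF makes practical.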
The MS/MS spectrum is related to the chemical structure of the molecule, in other words, how the atoms in the molecule are organized in space. It can be considered the “digital fingerprint” of each molecule, since identical molecules have the same MS/MS spectrum and similar molecules (or those that share chemical substructures) have similar MS/MS spectra. Data on known molecules (such as databases of MS/MS spectra) can therefore be used to annotate the chemical structure of identical or similar molecules. This process of identifying known molecules within an unknown sample is called dereplication. The comparison can also flag MS/MS spectra that show no similarity to known molecules, signaling potentially new molecules in the sample. From the collected spectra, molecular networks can be generated by comparing spectra pairwise and grouping them by spectral similarity, which in turn implies chemical similarity.
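The idea of comparing spectral "fingerprints" and linking similar ones into a network can be sketched in a few lines of Python. This is a generic binned cosine similarity, used here as a stand-in: the article does not describe the scoring NP3 MS Workflow actually applies, and the bin width, threshold, spectra, and function names below are all assumptions made for illustration.

```python
import math
from itertools import combinations


def bin_spectrum(peaks, bin_width=0.05):
    """Map (m/z, intensity) peaks to {bin index: summed intensity}."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.05):
    """Cosine of the angle between two binned spectra (0 = no shared peaks, 1 = identical)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_network(spectra, threshold=0.7):
    """Return edges (name_a, name_b, score) for every pair scoring at or above the threshold."""
    edges = []
    for (name_a, peaks_a), (name_b, peaks_b) in combinations(spectra.items(), 2):
        score = cosine_similarity(peaks_a, peaks_b)
        if score >= threshold:
            edges.append((name_a, name_b, score))
    return edges


# Made-up fragment lists: (fragment m/z, intensity).
spectra = {
    "known": [(100.0, 50.0), (150.0, 100.0), (200.0, 20.0)],
    "analog": [(100.0, 45.0), (150.0, 100.0), (214.0, 25.0)],  # shares two fragments
    "unrelated": [(110.0, 30.0), (160.0, 80.0)],               # shares none
}
for name_a, name_b, score in similarity_network(spectra):
    print(name_a, name_b, round(score, 3))
```

In this toy example only "known" and "analog" end up connected, which mirrors how an unannotated spectrum can inherit a tentative annotation from a similar library spectrum, while a spectrum with no neighbors is a candidate for a new molecule.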
According to Daniela Trivella, the corresponding author of the article and a researcher at the Brazilian Biosciences National Laboratory (LNBio) at CNPEM, “the LC-MS/MS technique is relatively recent, and has evolved significantly in this area of research on natural products. It allows you to directly manipulate virtually all the molecules that are present. This experiment is done quickly, in an automated and miniaturized manner. It uses very small quantities of samples from natural products in mixtures: we inject roughly 1 microliter of the sample into the device. This is very beneficial so that we can quickly access the chemistry of the biodiversity with minimal environmental impacts.”
Computational challenges of the method
This method’s ease of use comes with a major bottleneck: processing the data collected. Despite promising developments over the past decade in the computational aspects of this analysis method, major hurdles remained in handling the collected data and in defining, sensitively and precisely, the fragmentation spectra and their relationship with the molecules actually present in complex samples of natural products. For example, in a single analysis the same molecule may be detected multiple times (as multiple ions), producing multiple fragmentation spectra. Analyzing a sample that contains thousands of molecules can consequently yield a number of collected spectra that is orders of magnitude higher. Furthermore, very similar molecules that elute at different times in the chromatography (such as molecular isomers) were not distinguished. Detection of minority spectra (those with a low signal-to-noise ratio) was suboptimal, so these less apparent but still very important spectra were eliminated. And quantification of spectral data within a sample, and even more so between samples, was severely compromised or unfeasible.
From an operational point of view, multiple software tools and file conversions were required, along with significant hands-on work and specialist time, to extract the spectra from the dataset with a reasonable degree of precision. As a result, the use of untargeted metabolomics based on LC-MS/MS to research natural products had been limited to the analysis of very few samples at a time. “NP3 MS Workflow was designed to facilitate the use of untargeted metabolomics in various fields, maintaining scientific rigor and the limits of LC-MS/MS, using the data generated from an LC-MS/MS experiment in the best way possible. The software uses the same data generated in the experiment to correlate the m/z ratios detected at a given chromatographic retention time with the fragmentation spectra. Additionally, the noise data from the measurement is also utilized in the analyses. NP3 MS Workflow verifies and aligns different samples automatically and with high precision. It can separate out the noise very well and maintain spectra, even minority spectra, in the dataset. This is very relevant for this area; after all, the newest compounds that we are not yet familiar with are generally minorities,” explains Trivella.
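The value of pairing m/z with chromatographic retention time can be shown with a small grouping routine. The Python sketch below is a simplified, greedy illustration under assumed tolerances, not the algorithm implemented in NP3 MS Workflow; the records and the function name are hypothetical.

```python
def group_spectra(spectra, mz_tol=0.01, rt_tol=5.0):
    """Greedily group (precursor m/z, retention time in seconds) records.

    A record joins the first group whose first member lies within both
    tolerances; otherwise it starts a new group.
    """
    groups = []
    for mz, rt in sorted(spectra):
        for group in groups:
            ref_mz, ref_rt = group[0]
            if abs(mz - ref_mz) <= mz_tol and abs(rt - ref_rt) <= rt_tol:
                group.append((mz, rt))
                break
        else:
            # No existing group matched on both m/z and retention time.
            groups.append([(mz, rt)])
    return groups


# Made-up records: (precursor m/z, retention time in seconds).
records = [
    (195.0877, 120.4),  # same ion, fragmented three times as it elutes
    (195.0879, 121.0),
    (195.0875, 122.3),
    (195.0878, 310.7),  # same m/z but much later elution: a possible isomer
    (303.0499, 450.2),
]
for group in group_spectra(records):
    print(len(group), group[0])
```

Grouping on m/z alone would merge the first four records into one entry; adding the retention-time check collapses the three redundant spectra while keeping the later-eluting candidate isomer separate.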
In untargeted metabolomics studies, researchers never know exactly what they may find. For this reason, quantifying the compounds present in these samples is also a major challenge, since there is no clear comparison for scientists to base their analyses on. “Another major gain we had with NP3 MS Workflow was working with quantification related to the spectra collected in a series of samples from a single matrix, allowing us to use this information to correlate with data on biological activity,” she adds.
In this way, after successfully extracting fragmentation spectra and verifying their attributes for chromatographic retention time, corresponding m/z ratio, ion type, relative quantity, and annotated chemical structure, NP3 MS Workflow is able to chemically represent the sample of natural products which may contain unknown molecules. This makes it possible to map the chemistry of biodiversity and compare the chemical diversity and abundance of different samples obtained from different biological sources or distinct collection locations.
Within the context of pharmaceutical discovery, the software can also be used to correlate data from biological assays (such as anti-cancer activity) with the molecules present in the different samples. To do so, the samples are first tested in a biological assay (for activity in cancer cells, for example) and a biological activity value is recorded for each sample. Today these analyses are also conducted quickly and at a miniaturized scale, using nanoliters of samples from natural products.
NP3 MS Workflow directly combines the biological activity values obtained from bioassays of different samples (active, inactive, and partially active) with the MS/MS spectra measured in each sample, correlating the presence of each molecule with the biological activity of the samples that contain it. This generates an index of biological activity for each MS/MS spectrum, allowing researchers to rank the molecules present in the sample according to the probability that each one is the compound responsible for the measured biological activity (such as anti-cancer effects) directly in the complex mixture of natural products. From there, each candidate MS/MS spectrum and its chemical annotation are assessed for their chemical relevance to the therapeutic area under study. This allows researchers to analyze thousands of samples of natural products and prioritize samples and molecules to refine chemical research on the natural product in question, as well as the pharmaceutical development plan.
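One simple way to picture such a ranking is to correlate, across samples, the relative abundance of each spectrum with the measured activity. The Python sketch below uses a plain Pearson correlation as a stand-in; the article does not state how NP3 MS Workflow computes its index, and the data, names, and functions here are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_by_bioactivity(abundance, activity):
    """Rank spectra by how closely their per-sample abundance tracks activity."""
    scores = {name: pearson(values, activity) for name, values in abundance.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical activity values for four samples (1.0 = fully active, 0.0 = inactive).
activity = [0.9, 0.1, 0.5, 0.0]
# Hypothetical relative abundance of three spectra in the same four samples.
abundance = {
    "spectrum_A": [900.0, 100.0, 500.0, 0.0],    # rises and falls with activity
    "spectrum_B": [300.0, 310.0, 290.0, 305.0],  # near-constant background
    "spectrum_C": [0.0, 800.0, 200.0, 950.0],    # found mostly in inactive samples
}
for name, score in rank_by_bioactivity(abundance, activity):
    print(name, round(score, 2))
```

Spectra at the top of such a list are the ones worth examining first, which is the prioritization step described above; a correlation is only a lead, so the candidate still has to be confirmed by isolating and testing the molecule.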
The small quantity of sample required and the large number of samples that can be analyzed together in this analysis method are a major benefit for studies of natural products, at least during the initial research phases. Traditional methods require collection of large quantities of material in order to process and isolate the various molecules present in the sample, and only then could analysis be done using processes such as nuclear magnetic resonance to define the chemical structures and their biological activities. This process can involve years of work on each sample, and often results in the rediscovery of molecules that are already known to science. “The use of untargeted metabolomics via LC-MS/MS allows direct assessment of a complex mixture and a large number of samples at the same time. This method does not replace confirmation of the molecules using traditional methods, but analyzes many samples at the same time, speeding up analysis of large collections of natural products and providing support to decide which samples and molecules are most interesting for more in-depth evaluation. This saves a lot of time and significantly broadens the scale of the analyses,” reinforces Trivella.
Open-source software for the scientific community
Developing the software required major interdisciplinary collaboration involving researchers from various areas including mathematics, computation, biology, and chemistry. And because it is open-source, improvements in the future will be much easier.
“The community that researches natural products helps a lot. We ourselves used some open-source resources to develop this software, and it itself is also open-source. In other words, other researchers in the area can use and help us to improve it over time, and develop increasingly efficient tools for these assessments,” she adds.
Immediate impacts and prospects
Even though hundreds of thousands of natural products have been reported, many of these molecules still do not have demonstrated biological activity. Furthermore, many molecules from biodiversity sources are still unknown. Reducing the bottlenecks involved with data analysis is extremely important to broaden our knowledge of the chemistry of biodiversity as well as our capacity to discover new medications.
Brazil’s enormous biodiversity is a particularly important advantage for the development of pharmaceuticals in the country. The plants, fungi, and bacteria present in the nation’s biomes give Brazil a significant competitive edge in this field. This new tool will permit significant advances in mapping the chemistry of Brazilian biodiversity and developing new medications in the country.
The article was produced with support from the Serrapilheira Institute, FAPESP, and the Brazilian Ministry of Science, Technology and Innovation (MCTI).
About CNPEM
With a sophisticated and vibrant environment for research and development that is the only one of its kind in Brazil and found in only a few scientific centers in the world, the Brazilian Center for Research in Energy and Materials (CNPEM) is a private, non-profit organization overseen by the Ministry of Science, Technology and Innovation (MCTI). The Center operates four national laboratories and Sirius, the most complex project in Brazilian science and one of the world’s most advanced synchrotron light sources. CNPEM is home to highly specialized multi-thematic teams, globally competitive lab infrastructure that is open to the scientific community, strategic lines of research, innovative projects in partnerships with the productive sector, and training for researchers and students. The Center is an environment driven by research into solutions that impact the areas of health, energy and renewable materials, agri-environmental, and quantum technologies. In 2022, with support from the Brazilian Ministry of Education (MEC), CNPEM expanded its activities with the opening of the Ilum School of Science. This interdisciplinary program in science, technology, and innovation implements innovative ideas to provide a high quality free and full-time undergraduate education immersed in the research environment at CNPEM. The CNPEM 360 Platform provides visitors with a virtual immersive visit to the Center’s main environments and activities. Visit at: https://pages.cnpem.br/cnpem360/.