Open-source generation of sigma profiles: impact of quantum chemistry and solvation treatment on machine learning performance

Abstract

The combination of machine learning (ML) models with chemistry-related tasks requires the description of molecular structures in a machine-readable way. The nature of these so-called molecular descriptors has a direct and major impact on the performance of ML models and remains an open problem in the field. Structural descriptors like SMILES strings or molecular graphs are not size-independent and can be memory intensive. Machine-learned descriptors can be of low dimensionality and constant size but lack physical significance and human interpretability. Sigma profiles, which are unnormalized histograms of the surface charge distributions of solvated molecules, combine physical significance with low dimensionality and size-independence, making them a suitable candidate for a universal molecular descriptor. However, their widespread adoption in ML applications requires open access to sigma profile generation, which is currently not available. This work details the development of OpenSPGen, an open-source tool for generating sigma profiles. We also study how different quantum chemistry and solvation settings affect the efficacy of the generated sigma profiles at predicting thermophysical material properties when used as inputs to a Gaussian process, which serves as a simple surrogate ML model. We find that a higher level of theory does not translate to more accurate results. We also provide further recommendations for sigma profile calculation and use in ML models.
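To make the idea of a sigma profile as an unnormalized histogram concrete, the following minimal sketch bins per-segment surface charge densities, weighted by segment area, onto a fixed grid. It is not OpenSPGen's implementation: the function name `sigma_profile`, the grid of 51 bins from -0.025 to 0.025 e/Å², the nearest-bin assignment, and the segment values are all assumptions chosen for illustration.

```python
def sigma_profile(sigmas, areas, sigma_min=-0.025, sigma_max=0.025, n_bins=51):
    """Area-weighted, unnormalized histogram of surface charge densities.

    sigmas: charge density of each surface segment (e/Å^2), hypothetical input
    areas:  area of each surface segment (Å^2), hypothetical input
    The grid limits and bin count are assumed defaults, not taken from the paper.
    """
    if len(sigmas) != len(areas):
        raise ValueError("sigmas and areas must have the same length")
    step = (sigma_max - sigma_min) / (n_bins - 1)
    centers = [sigma_min + i * step for i in range(n_bins)]
    profile = [0.0] * n_bins
    for sigma, area in zip(sigmas, areas):
        # Assign each segment to the nearest bin center, clamping outliers
        # to the first or last bin.
        idx = round((sigma - sigma_min) / step)
        idx = min(max(idx, 0), n_bins - 1)
        profile[idx] += area
    return centers, profile


# Made-up segments for two "molecules" with different numbers of segments.
centers, profile = sigma_profile(
    sigmas=[-0.0102, 0.0004, 0.0004, 0.0081],
    areas=[0.30, 0.25, 0.20, 0.15],
)
_, profile_small = sigma_profile(sigmas=[0.0031, -0.0047], areas=[0.40, 0.35])
```

Because the histogram is unnormalized, the bin values sum to the total surface area passed in, and both profiles have the same length regardless of how many segments each molecule has. That fixed length is what allows the descriptor to be fed directly to a model such as a Gaussian process.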

Article information

Article type: Paper
Submitted: 05 Mar 2025
Accepted: 29 Jul 2025
First published: 12 Aug 2025
This article is Open Access (Creative Commons BY license)

Digital Discovery, 2025, Advance Article

F. Y. M. Salih, D. O. Abranches, E. J. Maginn and Y. J. Colón, Digital Discovery, 2025, Advance Article, DOI: 10.1039/D5DD00087D

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
