Graph-based intermolecular interaction prediction enables rational design of co-crystals

Xiurong Yang; Ying Wang; Linhu Pan; Ruihui Wang; Yi Wang; Honglei Xia; Siwei Song; Qinghua Zhang

doi:10.1039/D5TA02975A

View PDF Version

DOI: 10.1039/D5TA02975A (Paper) J. Mater. Chem. A, 2025, Advance Article

Graph-based intermolecular interaction prediction enables rational design of co-crystals†

Xiurong Yang‡ ^ab, Ying Wang‡^ab, Linhu Pan^ab, Ruihui Wang^ab, Yi Wang^ab, Honglei Xia^ab, Siwei Song*^ab and Qinghua Zhang*^ab
^aNational Key Laboratory of Solid Propulsion, Northwestern Polytechnical University, Xi'an, Shanxi 710072, China. E-mail: ssw_sv@nwpu.edu.cn
^bSchool of Astronautics, Northwestern Polytechnical University, Xi'an, Shanxi 710065, China. E-mail: qinghuazhang@nwpu.edu.cn

Received 15th April 2025 , Accepted 14th June 2025

First published on 27th June 2025

Abstract

Co-crystal engineering has emerged as a promising strategy to tailor the properties of organic materials, yet the mechanistic principles governing co-crystallization remain underexplored. Here, we systematically analyze co-crystal formation using intermolecular interaction as the criterion from a thermodynamic perspective. Building on these energetic insights, a computational framework integrating intermolecular interaction analysis and machine learning methods to enable rational design of co-crystals was established. First, anomaly detection via local outlier factor (LOF) algorithms preliminarily screened out abnormally designed co-crystals based on energy decomposition analysis of existing co-crystals. Next, a comparative energy benchmark was established, requiring that the intermolecular interaction between the two designed molecules is more stable than that in their individual component, ensuring thermodynamic feasibility. Furthermore, a graph neural network (GNN) model was implemented to rapidly predict intermolecular interaction profiles, enabling efficient co-crystal design. This machine learning-enhanced workflow accelerated the identification of two novel urea-based co-crystals, which were experimentally synthesized and characterized. Our findings not only quantify the energy steering supramolecular assembly but also break through the traditional method of co-crystal classification, providing new perspectives for the mechanism and prediction of co-crystal formation.

Introduction

Co-crystallization enables the design of functional materials through non-covalent assembly of molecules, offering advantages in cost, structural flexibility, and solution processing capability for pharmaceutical and material applications.^1–3 However, the selection of suitable coformers for making co-crystals remains a significant challenge, impeding the development of co-crystals. Recent advances in computational materials science have revolutionized co-crystal design strategies.^4–7 The integration of data-driven machine learning (ML) with high-throughput computational screening demonstrates remarkable success, enabling efficient generation and evaluation of potential coformer combinations,^8–11 as exemplified by Jiang et al.'s deep learning framework with transfer learning for co-crystal formation prediction with >96% accuracy and experimental validation.¹² Nevertheless, inherent limitations in accurately capturing non-co-crystallizing pairs (negative samples) persist in current ML binary classification methodologies, which are crucial for robust model training and reliable prediction.^13–15

The scarcity of experimentally confirmed negative samples stems from publication bias favoring successful cases in crystallization studies. This imbalance of the positive and negative samples is evident in early methodologies. Vriza et al. exclusively trained models using positive π–π co-crystal data,¹³ while Soroush Ahmadi's dataset exhibited a severe 5394 [thin space (1/6-em)] :994 positive-to-negative ratio.¹⁶ To address this limitation, computational strategies like network-based link prediction¹⁷ and molecular similarity metrics¹⁴ have been employed to generate synthetic negative samples. However, these rule-based approaches lack experimental validation, potentially compromising model reliability in real-world applications. Furthermore, it is challenging to determine whether these negative samples (failed co-crystals) result from experimental technical limitations or an inherent inability of the components. This ambiguity underscores the critical need for establishing a mechanism-based criterion to evaluate the co-crystallization potential of selected coformers.

Over the past few decades, the development of supramolecular chemistry has substantially advanced our understanding of the co-crystallization mechanism, defining co-crystals as multicomponent systems stabilized by non-covalent interactions.^18,19 Particularly, the supramolecular synthon concept, derived from statistical analyses of molecular interactions and involved functional groups, provides a theoretical foundation for coformer pairing.²⁰ The supramolecular assembly of co-crystals is predominantly governed by directional non-covalent interactions, including hydrogen bonding, van der Waals forces, π–π stacking, and electrostatic forces. For instance, molecular structures featuring complementary hydrogen bond donors/acceptors (e.g., –COOH and –NH₂) and π-conjugated systems (e.g., benzene rings) exhibit enhanced co-crystallization propensity, as validated across pharmaceutical and energetic material systems.^21–25 Advanced computational techniques, including symmetry adapted perturbation theory (SAPT),^26–28 Hirshfeld surface analysis and electrostatic potential mapping, have been used to elucidate the critical role of these interactions in co-crystalline systems.^29–32 These methods have also been successfully applied to guide the design of co-crystals and unveil their formation mechanism.^16,33 Furthermore, the analysis of the attention mechanism and feature importance in classification models also highlights the significance of hydrogen bond acceptors and donors. These insights into interactions shed light on a theoretical inspiration for the rational selection of coformers from a thermodynamic viewpoint.

Given the great potential of intermolecular interaction for predicting co-crystal formation, this work intends to establish an energy-based evaluation criterion for co-crystallization processes in energetic materials. To comprehensively analyze intermolecular interactions in co-crystals, we adopted a molecular cluster approach by resolving the crystal structure into multiple dimers, following the methodology established in our previous work.^34–36 Then, the SAPT energy decomposition analysis was employed to quantify weak interactions between components of dimers, including electrostatic and dispersion components.^37–39 By conducting statistical analysis of the distribution of energy values, we investigated the correlation between intermolecular interactions and co-crystal architectures, identifying dominant energy contributions during crystallization. Thereby, we proposed a thermodynamic criterion for evaluating the formation of designed co-crystals through a multi-step computational framework. First, Local Outlier Factor (LOF) algorithms were employed for outlier detection to identify abnormal co-crystal candidates. Then, an energetic benchmark was developed by comparing the intermolecular interaction between designed molecular pairs and their individual crystals, validating thermodynamic feasibility of co-crystallization. Finally, a graph neural network (GNN)-based framework was implemented to rapidly predict intermolecular interactions between co-crystal components (Fig. 1). With the help of the framework, two novel urea-based co-crystals were experimentally synthesized and characterized. This ML energy-based methodology avoids the inherent limitations of sample imbalance in traditional classification models, offering a new strategy for the design and screening of energetic co-crystals.


	Fig. 1 Overview of the co-crystal screening framework. (a) Criteria for screening data from the CSD. (b) Extraction of dimer configurations from co-crystals. (c) Energy decomposition analysis of intermolecular interactions in dimers. (d) Comparison of intermolecular interactions between the co-crystal (E_AB) and individual component (E_AA and E_BB). (e) Architecture of the GNN model. (f) Illustration of the message-passing mechanism and global attention mechanism. (g) Representation of the global state descriptor u used in the model.

Results and discussion

Data collection for co-crystals

Understanding the aggregation mechanisms of molecules into crystalline structures, particularly co-crystals, requires accurate characterization of intermolecular interactions and crystal packing motifs. To better reflect the true nature of intermolecular interactions in co-crystals, obtaining reliable crystal structure information is essential. Hence, the crystallographic information files for co-crystals were screened out from the Cambridge Structural Database (CSD) according to the following conditions, as shown in Fig. 1a:

1. Only containing two chemically distinct polyatomic units.

2. Having three-dimensional structures and excluding structures exhibiting positional disorder.

3. The density range of samples is 1.3–2.2 g cm⁻³.

4. Only containing C, H, O, N, F, Cl, Br, and I elements.

5. Being neutral components to exclude salts.

6. Some unrecognized structures were deleted.

7. Excluding crystals with nitrogen mass fractions below 20%.

8. Discarded systems containing >30 heavy atoms per molecule.

Since our team mainly focuses on the application of co-crystal technology in the field of energetic materials, the screening conditions we set are mainly to cover energetic co-crystals as much as possible, and most drug-like co-crystals are excluded. Consequently, 2100 crystal structures were retrieved using the CSD Python API (Data S1†). This dataset includes 1034 solvates due to their analogous formation mechanisms with co-crystals. The solvates containing H₂O were classified as hydrates, while those incorporating organic solvents such as methanol or acetone were categorized as non-aqueous solvates. There are 687 hydrates and 347 non-aqueous solvates.

Analysis of intermolecular interaction of dimer samples

To more fully analyze the co-crystal component interactions, multiple dimer configurations were extracted from the crystal, as illustrated in Fig. 1b, c and Methods. Using PSI4 SAPT calculations, weak intermolecular interactions were computed for 24 [thin space (1/6-em)]

760 validated dimer structures (Data S2†). It must be noted that the total energy equals the summation of E_exc, E_ele, E_ind and E_dis, as shown in eqn (1). As shown in Fig. 2a, energy distributions reveal similar patterns across co-crystals, hydrates, and solvates. In the range of co-crystal distribution, dispersion energy E_dis and electrostatic energy E_ele were generally distributed within 0–−122 kJ mol⁻¹, and induction energy E_ind was concentrated in the range of 0 ∼ −34 kJ mol⁻¹. The exchange repulsion E_exc of co-crystals also exhibited a broad distribution, which was equivalent with that of E_ele. Overall, E_ele and E_dis make great contributions in most structures. A comparative analysis of energy distributions across co-crystals, hydrates, and solvates reveals strikingly similar trends, suggesting analogous formation mechanisms. This similarity is further supported by examining the structure with the strongest intermolecular interaction in co-crystals, whose energy distribution is illustrated in Fig. 2b.


	Fig. 2 Energy distribution analysis. (a) Distribution of E_dis, E_ele, E_ind, and E_exc for all dimers in co-crystals, hydrates and solvates. Green represents E_dis, pink represents E_exc, orange represents E_ind, and blue represents E_ele. (b) Distribution of E_dis, E_ele, E_ind, and E_exc for the most stable dimers in co-crystals, hydrates and solvates.

It is also observed that in the structure with the strongest interaction, parts of the structures are the same molecules rotating and displacing. With analysis of the crystal cell of a co-crystal, this displacement structure was produced mainly due to crystal cell expansion, and the rotation structure was produced due to the asymmetric units in the unit cell, as illustrated in Fig. S1.† This demonstrates that interaction between the same components is critical for crystallization. As a result, we studies whether the most stable structures (with more powerful intermolecular interactions) were composed of similar or different molecules, and the distribution of energy is shown in Fig. S2.† Specifically, these interacting molecules with strong intermolecular interactions play a key role in the nucleation period of crystal growth.^40,41

The extracted dimer structures and their corresponding intermolecular interactions were systematically analyzed to find regular patterns and correlations. As illustrated in Fig. S3,† the dominant intermolecular interactions are hydrogen bonding and π–π stacking in most structures. To further elucidate the relationship between energy contributions and structures, we examined the energy partitioning (E_dis, E_ele, and E_ind) for the most stable dimer in each co-crystal, as shown in Fig. 3. E_dis% and E_ele% represent the value of E_dis/(E_dis + E_ele + E_ind) and E_ele/(E_dis + E_ele + E_ind), respectively. The analysis reveals that dispersion E_dis (E_dis% > 50%) dominates in 277 out of 938 co-crystals, while electrostatic interactions (E_ele% > 50%) play a more significant role in 606 cases. This trend extends to solvates and hydrates; the dimer with (E_dis% > 50%) and (E_ele% > 50%) has a total ratio of (264 [thin space (1/6-em)] :306) and (502:563), respectively. Notably, dimers with E_dis contributions between 50% and 90% universally exhibit parallel molecular packing, whereas those with E_ele contributions in the 50–80% range consistently feature hydrogen bonding. These findings are further supported by the four strongest interaction configurations shown in Fig. S4,† where the maximum E_dis and E_ele values correspond to π–π stacking and hydrogen bonding structure, respectively. Even in systems where neither E_dis nor E_ele exceeds 50% as shown in Fig. S5,† the majority of structures retain both hydrogen bonding and stacking interactions. This comprehensive energy-structure correlation analysis demonstrates that the structural features of co-crystals can be effectively characterized through their intermolecular interaction profiles.


	Fig. 3 Energy ratio analysis of the strongest interaction structures. The circular diagram illustrates the ratio of structures with different E_dis% and E_ele% values in the strongest interaction structures. Only values within the range of 0% to 100% were considered. The outer layer represents the distribution of E_dis%, and the inner layer depicts the distribution of E_ele%.

Energy-based discrimination via LOF algorithm ratios

Given the observed correlation between regular energy partitioning patterns and dimer structures, we developed an outlier detection framework to assess the potential of co-crystallization. Calculated interactions comprising the E_dis, E_ele, E_exc and E_ind from the most stable hetero-molecular dimers in each co-crystal (Data S3†) were analyzed using the LOF algorithm, as described in Methods. The distribution of these four energies is shown in Fig. S2.† As shown in Fig. 4A, 90% of the samples exhibit LOF scores below 1.3, indicating a dense data distribution. This is further supported by the energy distribution plot in Fig. 4B, where data points within dense regions are represented by smaller green spheres, reflecting lower outlier scores. These high-density regions, which reflect the energy preferences of existing co-crystals, correspond to energetically favorable configurations, indicating a higher probability of co-crystal formation. Specifically, the smaller the outlier score of a given energy configuration, the greater its probability of stabilizing a co-crystal structure.


	Fig. 4 Co-crystal formation criterion. (A) The outlier scores for the intermolecular interactions of the most stable dimers. (B) The energy values and outlier scores are visualized in three dimensions (E_dis, E_ele, and E_ind) and their two dimensional projections. A smaller, green-colored sphere represents a lower outlier score. (C) Comparison of intermolecular interactions between the co-crystal and single component. (D) Comparison of intermolecular interactions for failed co-crystals. The E_ABML is the intermolecular interaction predicted by the GNN model.

Criteria for co-crystal formation from a thermodynamic viewpoint

It is obviously not convincing enough to use the energy decomposition analysis results as the only formation criterion. An additional evaluation criterion for co-crystal formation was proposed from the perspective of supramolecular chemistry. As shown in Fig. 1d, when a co-crystal AB is formed, the original crystal structures of components A and B are disrupted, implying that the heteromolecular intermolecular interaction must exceed the strongest homomolecular interactions (E_AA and E_BB).⁴¹ To validate this criterion, we systematically compared the strongest E_AB in extracted dimers with the strongest E_AA and E_BB in single-component crystals for partial co-crystal systems. As summarized in Table S1† and Fig. 4C, E_AB was stronger than or equal to at least one of the single-component energies (E_AA or E_BB). However, there are some exceptions, such as the hexanitrohexaazaisowurtzitane (CL-20) co-crystals and nitrofurantoin co-crystals, BIZZAO, HAHYAU, JAQVOP, LEWTAK, LIGYOR, etc., where the intermolecular interaction energies are significantly weaker than those of the corresponding single-component crystals.

To gain deeper insights into the initial stages of co-crystal formation, we investigated the most stable configuration of molecular pairs (A and B), which likely represents the first step in co-crystal nucleation. Using the Molclus program,⁴² we generated the energetically most favorable dimer configurations, denoted as AB*, with a detailed methodology provided in ESI Methods.† Notably, as shown in Fig. S6,† some AB* configurations optimized by Molclus differ from their counterparts AB in co-crystals, suggesting that structural rearrangements occur during co-crystallization due to factors such as lattice constraints, multi-body interactions, and steric effects. This discrepancy is particularly pronounced for CL-20, where the cage-like structure is highly sensitive to surrounding molecular environments, leading to significant deviations between the optimized AB* and AB in co-crystals.

The intermolecular interaction E_AB* of these optimized dimers was compared with E_AA and E_BB extracted from single-component crystals, as summarized in Table S1.† As shown in Fig. 4C, intermolecular interaction E_AB* of the optimized dimers is generally comparable to or stronger than the corresponding single-component energies (E_AA and E_BB). This trend is further supported by the quantity 2E_AB* − E_AA − E_BB, which demonstrates the thermodynamic stability of E_AB* in most systems. However, certain CL-20 and nitrofurantoin co-crystals exhibit lower-than-expected E_AB*, which may arise from the neglect of experimental conditions, incomplete sampling of the most stable AB* configurations and unexpected binding sites between A and B molecule. For instance, molecule A may combine with molecule B at sites exhibiting weaker E_BB interactions. Furthermore, the elevated temperatures and polymorphic transformations may disrupt the native structures of A and B crystals, thereby promoting the formation of AB*.⁴³ Additionally, the failed co-crystal systems reported by Jiang et al.¹² were analyzed to validate the reliability of our co-crystal formation criterion; more details are shown in ESI Methods.† Fig. 4D demonstrates that in certain failed co-crystals, the intermolecular interactions of single-component (E_AA and E_BB) significantly exceed E_AB*. These results further validate the feasibility and rationality of our proposed co-crystal formation criterion from a complementary perspective. In general, both E_AB or E_AB* are expected to significantly exceed the intermolecular interaction of the single-component E_AA and E_BB. At a minimum, E_AB should be greater than either E_AA or E_BB, with a maximum energy gap not exceeding 50 kJ mol⁻¹, while the E_AB* should ideally surpass the intermolecular interaction of all single components. Based on these principles, we established a guideline for co-crystal design. However, since the calculation via PSI4 is computationally intensive and E_AB for newly designed components is unknown, we propose leveraging ML models to rapidly predict these intermolecular interactions, thereby accelerating the co-crystal screening and design process.

GNN model for intermolecular interaction prediction

We developed two distinct GNN frameworks for intermolecular interaction prediction (see Fig. 1e–g and Methods), each tailored to specific applications. The first model predicts the intermolecular interaction E_AB between designed coformers, trained on the strongest hetero-dimer intermolecular interaction from known co-crystals (Data S3†). Since the relative location information of designed coformers is unknown, this GNN model excludes the global state descriptor u and relies solely on molecular graph features; further implementation details are provided in ESI Methods.† The model achieves a mean absolute error (MAE) of 5.80 kJ mol⁻¹ and a coefficient of determination (R²) of 0.76. This framework was successfully applied to predict intermolecular interaction E_ABML for both failed and designed co-crystal systems. As illustrated in Fig. 4D, the E_ABML for failed co-crystals was compared with single-component energies (E_AA and E_BB) to elucidate the reasons for co-crystallization failure. Additionally, Tables S5 and S6† summarize the E_ABML for designed coformers, providing insights into their co-crystal formation probability.

The second model is designed to predict intermolecular interaction for dimers extracted from co-crystals. A large dataset comprising 18 [thin space (1/6-em)] 204 calculated energy values was split into training and test sets in an 8:2 ratio. To distinguish between dimers with identical components but differing intermolecular interactions, descriptors encoding relative positional information were incorporated as the global state descriptor u and concatenated with molecular graph features to form comprehensive descriptors. To evaluate the impact of these features, we conducted four comparative experiments with different feature combinations. The first model which only uses the molecular graph as features is labeled as Graph. The second model uses a feature combination of the molecular graph and smooth overlap of atomic positions (SOAP) descriptors, marked as Graph + SOAP, while the third model combines the molecular graph and intermolecular distance descriptors, labeled as Graph + DIST. At last, a combination of all descriptors was marked as Graph + SOAP + DIST. In addition, neural network models without graph features are also discussed. This systematic approach allows us to assess the contribution of each feature type to the predictive accuracy of the model.

Table 1 summarizes the performance of the models in predicting intermolecular interaction using different feature combinations. The Graph model, which relies solely on molecular graph features, exhibits poor predictive capability, achieving an R² of 0.13 and an MAE of 12.22 kJ mol⁻¹ on the test set. By augmenting the graph features with SOAP descriptors, the model's performance significantly improves, yielding an R² of 0.88 and an MAE of 2.68 kJ mol⁻¹. Similarly, incorporating DIST descriptors further enhances the model, resulting in an R² of 0.97 and an MAE of 1.18 kJ mol⁻¹. The most accurate Graph + SOAP + DIST model combines all feature types and achieves exceptional performance, and the R² and MAE on the test set are 0.97 and 1.17 kJ mol⁻¹, respectively. This enhancement is particularly critical given that most structures share similar node and edge features, making it essential to capture subtle differences in molecular configurations through comprehensive descriptors. This also shows that the relative position information of the components in the dimer configuration is very important for accurately predicting the intermolecular interactions. The neural network model trained without graph features achieved an R² of 0.97 and an MAE of 1.26 kJ mol⁻¹ on the test set.

Table 1 Performance evaluation of models in predicting intermolecular interactions using different descriptors

Descriptors	Train		Test
Descriptors	R²	MAE (kJ mol⁻¹)	R²	MAE (kJ mol⁻¹)
Graph	0.18	12.14	0.13	12.22
Graph + SOAP	0.97	2.02	0.88	2.68
Graph + DIST	0.99	1.03	0.97	1.18
SOAP + DIST	0.99	1.23	0.97	1.26
Graph + DIST + SOAP	0.99	1.06	0.97	1.17

As shown in Fig. 5A, the performance of the Graph + SOAP + DIST model in predicting individual energy components (E_dis, E_ele, E_ind, E_exc and E_tot) was evaluated using the kernel density estimation (KDE) to analyze the data distribution. The R² between predicted and true values was calculated for each energy component. The dispersion energy E_dis model demonstrates the highest accuracy, achieving an R² value of 0.99 on the test set. The E_exc, E_ele and E_ind models also perform well, with R² values of 0.97, 0.97 and 0.96, respectively, and larger deviations are observed for few points. Overall, the predicted values closely match the true values in high-density regions of the data distribution, demonstrating the high accuracy and reliability of the model. Furthermore, compared to other models in the literature,^44,45 our approach, which is based on true structural descriptors, exhibits superior accuracy and computational efficiency.


	Fig. 5 Model performance evaluation. (A) Density distribution of true and predicted values on the test set. (B) Error distribution of predicted values on the test set.

To enhance the capability and interpretability of the GNN, we incorporated an attention mechanism, which identifies and prioritizes critical structural features from complex molecular data. In graph-level tasks, global attention scores are computed based on node features, enabling the optimization of the feature space and providing insights into the model's decision-making process. For the dimers in co-crystal ACETAF01, as illustrated in Fig. 6A, the deeper color intensities on the benzene ring, hydroxyl group, and amino group highlight the focus of the model on regions involved in π–π conjugation and hydrogen bonding. In contrast, Fig. 6B shows dimers lacking significant hydrogen bonds, where the dark green regions emphasize the importance of π–π stacking interactions. These results demonstrate that the attention mechanism effectively captures key structural motifs governing intermolecular interactions, thereby significantly enhancing the model's predictive accuracy and interpretability.


	Fig. 6 Model applicability assessment including the attention score and sensitivity to distance. (A) Attentional visualizations for co-crystal ACETAF01. (B) Attentional visualizations for co-crystal LAZPIO, where greener regions indicate higher attention weights. (C) Heatmap of intermolecular interaction energies for dimers in ACETAF01 and LAZPIO at varying stretched distances. The horizontal and vertical axes represent 100 configurations, and the color bar indicates the difference in intermolecular interaction energy. Darker shades in the color bar represent closer energy values.

To validate the reliability and applicability of the model, we systematically adjusted the intermolecular distance in the co-crystal ACETAF01 and predicted the corresponding intermolecular interaction. The distance between molecules was incrementally stretched by 0.05 Å along the vector of the shortest intermolecular contact, generating 100 configurations over a range of 0–5 Å. As shown in Fig. 6C, the predicted intermolecular interactions exhibit a gradual decrease as the distance increases by 0.25 Å. However, within the range of 0.1–0.25 Å, the energy remains nearly constant, indicating that the model is insensitive to small variations in intermolecular distance. As a result, the model shows significant deviations when predicting intermolecular interactions for structures optimized by Molclus, as evidenced by the comparison in Table S2.† This limitation suggests that the model is best suited for predicting intermolecular interactions within co-crystal environments, while the PSI4 module remains the preferred method for calculating energies of Molclus-optimized dimers.

Design and synthesis of new co-crystals

Based on the intermolecular interaction rules and prediction model established above, several molecules were designed as potential coformers for CL-20 and urea co-crystals.^46,47 However, the selected CL-20 co-crystals were not successfully synthesized, despite the promising predictions in the ESI.† This outcome may be attributed to limitations in our synthetic techniques or an incomplete consideration of the thermodynamic and kinetic factors governing co-crystal formation. For instance, the obstacle of CL-20 co-crystallization may be attributed to the strong intermolecular interaction (−76.56 kJ mol⁻¹) of the CL-20 crystal, which hinders the disruption of its native crystal lattice. In contrast, urea, with its significantly weaker intermolecular interaction (−33.37 kJ mol⁻¹), is more susceptible to lattice disruption by external molecules, facilitating co-crystal formation. This is evidenced by the substantially higher number of urea co-crystals documented in the CSD. To explore this further, we selected six molecules as potential urea coformers. As shown in Table S2 and Fig. S7,† the energy E_ABML and E_AB* for these candidates fall within a favorable range for co-crystallization. Three molecules with the highest potential were identified based on their intermolecular interaction E_AB*, as highlighted by dashed boxes in Fig. S7.† Among these, the two most promising candidates nitrotriazolone (NTO) and 4H,8H-bis([1,2,5]oxadiazolo)[3,4-b:3′,4′-e]pyrazine (DFP) were successfully synthesized as urea co-crystals. Their crystal structures are presented in Fig. 7a and b, with detailed synthetic procedures provided in Methods. These results demonstrate the efficacy of our energy-driven approach in designing and synthesizing co-crystals, highlighting its potential for broader application in co-crystal engineering.


	Fig. 7 Intermolecular interaction analysis of co-crystals. (a) Crystal structure of NTO/urea. (b) Crystal structure of DFP/urea. (c) MESP analysis of co-crystals; the overlapping red and blue regions represent strong electrostatic interactions. (d) Hirshfeld surface analysis, with redder areas representing higher electron density and stronger intermolecular contacts. (e) E_dis, E_ind, E_exc, and E_ele of dimers for co-crystals NTO/urea and DFP/urea, respectively.

Further analysis of weak intermolecular interactions in the synthesized co-crystals is presented in Fig. 7c–e. Molecular electrostatic potential (MESP) mapping of components within the crystalline lattice reveals distinct electrostatic complementarity,^48,49 as shown in Fig. 7c, and significant van der Waals surface penetration is observed in hydrogen bond formation regions, where red (electron-rich) and blue (electron-deficient) areas interpenetrate, reflecting the electrostatic attraction nature of hydrogen bonding. Hirshfeld surface analysis further quantifies these interactions, demonstrating that hydrogen bonds constitute 48.11% and 62.08% of total surface contacts in the NTO/urea and DFP/urea co-crystals, respectively (Fig. 7d and S8†). This underscores the dominant role of hydrogen bonding in structural stabilization. Energy decomposition analysis via PSI4 calculations corroborates these findings, as shown in Fig. 7e. Electrostatic contributions (E_ele) account for a high proportion of the total intermolecular interaction in most structures, confirming the prevalence of hydrogen-bonding. The predictive capability of our GNN model is validated through comparative analysis of the strongest intermolecular interaction shown in Fig. 7e and predicted E_ABML in Table S2.† The model achieves prediction errors of 8.37 kJ mol⁻¹ and 3.79 kJ mol⁻¹ for NTO/urea and DFP/urea, respectively; these deviations fall within acceptable thresholds for screening applications, highlighting the utility of the model in guiding experimental synthesis.

Methods

Energy decomposition analysis of dimer configurations

To systematically examine intermolecular interactions in co-crystals, we analyzed pairwise structures within the crystal lattice. Dimer extraction was implemented where the central molecule in the cell served as the reference core, and all neighboring molecules within its spherical radius were identified. Intermolecular interactions between the central molecule and its surrounding counterparts were subsequently calculated. While this approach maximizes the inclusion of relevant molecular interactions in co-crystal systems, it may exhibit two limitations: potential undersampling in co-crystals and possible redundancy in configurations due to crystallographic symmetry. According to the above method, we extracted a total of 24 [thin space (1/6-em)]

760 dimer configurations from 2100 crystal structures.

For energy decomposition analysis, we employed the open-source quantum chemistry package PSI4 to perform SAPT calculations.⁵⁰ This method partitions the intermolecular interaction E_tot into four distinct components: dispersion E_dis, electrostatics E_ele, induction E_ind, and exchange repulsion E_exc values, as shown in eqn (1), enabling quantitative analysis of individual intermolecular interaction types. The SAPT0/jun-cc-pVDZ basis set was selected for calculation.


Etot = Edis + Eele + Eind + Eexc	(1)

Discrimination of outliers

The LOF algorithm was implemented to identify anomalous intermolecular interactions in the designed co-crystal. As a density-based anomaly detection method, the LOF offers distinct advantages over conventional clustering approaches: while traditional clustering algorithms (e.g., k-means or DBSCAN) provide binary classification (inlier/outlier), the LOF quantitatively assigns an outlier score to each data point, thereby enabling graded assessment of anomalous behavior.

The fundamental principle of this methodology lies in comparative density analysis.⁵¹ The algorithm posits that non-outlier data points exhibit local density characteristics consistent with their neighboring observations, whereas outliers demonstrate significant density deviations within their local neighborhoods. This intrinsic mechanism renders the LOF particularly suitable for detecting subtle anomalies in complex energy datasets, as it accounts for both global distribution patterns and localized structural variations.

For a given sample point P, the distances to all other points were calculated and sorted in ascending order, where the distance to the k-th nearest point is denoted as d_k(P); the points within the region centered at P with a radius of d_k(P) are denoted as N_k(P). When determining the accessible distance between P and another sample point O, the maximum value between d_k(O) and d(P,O) is selected, as expressed in eqn (2). The k-local accessible density of P is expressed as eqn (3):


reach_dist_k(P,O) = max{d_k(O),d(P,O)}	(2)


	(3)

The numerator represents the sum of the k-accessible distances from point P to all points in N_k(P).

The LOF is defined as the ratio of the average local accessible density of all points in N_k(P) to the local accessible density of P, as expressed in eqn (4):


	(4)

A ratio significantly greater than 1 indicates that P has a lower density compared to its neighbors, suggesting that it is an outlier. Conversely, a ratio close to or less than 1 implies that P has a higher density, classifying it as a normal point. Based on this principle, the LOF algorithm from the scikit-learn library was employed to calculate outlier scores for unknown components combining co-crystal datasets, enabling the evaluation of their co-crystal formation potential. The parameters n_neighbors and contamination were assigned values of 40 and 0.01, respectively.

Construction of the GNN model

To construct feature representations for the GNN model, we defined 35 atomic descriptors (node attributes) and 4 bond descriptors (edge attributes) to characterize the molecular structure. These features were systematically aggregated from individual co-crystal molecules to generate unified atom-feature and bond-feature matrices, as detailed in Table S3.† All descriptors were computed using the open-source cheminformatics toolkit RDKit (https://www.rdkit.org). To enhance the predictive capability of the GNN model, we augmented the basic graph representation with a supplementary global state descriptor involving molecular relative positions. These included distance-based descriptors categorized by connected atom types and the length range; the angle between the principal inertial axes of molecules; nearest-neighbor atom type; truncated SOAP. The SOAP descriptors and additional global features were calculated using the DScribe package⁵² (https://singroup.github.io/dscribe/latest/), with full implementation details provided in ESI Methods and descriptor specifications enumerated in Table S4.†

The GNN model is specifically designed to deal with non-Euclidean domains and represented as a graph with a complex relationship and interdependency between objects.⁵³ The graph of a molecule can be expressed as a combination of nodes and edges, in which nodes are represented as atoms and edges are represented as bonds. In molecular modeling, a co-crystal system can be represented as eqn (5):


G = (V, E, u)	(5)

where V denotes atomic feature vectors (nodes), E represents covalent bond attributes (edges) and u encodes the global state descriptor. As illustrated in Fig. 1e, this framework maps atoms to nodes and bonds to edges. Building upon the framework established by Guo et al.,⁵⁴ we extend the representation for co-crystal systems by concatenating node and edge features from individual molecules, while adding the global state descriptor u on these features, as displayed in Fig. 1g. The co-crystal graph was defined as eqn (6):


G = {(V(V1, V2), E(E1, E2), u)}	(6)

The GNN architecture employs iterative message-passing operations to propagate and update atomic information. As shown in Fig. 1f, this process updates and aggregates the nodes along with their neighbor information via an adjacency matrix. By repeating this procedure, the model progressively captures information from more distant neighbors, corresponding to the feature extraction process of each convolutional layer.

The message-passing phase is followed by the readout phase, which is designed to generate a comprehensive graph-level representation by aggregating the hidden embeddings of subgraphs. To enhance the interpretability and performance of the model, we incorporate an attention mechanism during the readout phase. This mechanism optimizes the hidden information by assigning attention coefficients to each atom, which are computed based on their node features.

Following feature vector extraction in the pooling layer, the global state descriptor u is concatenated into these features, as shown in Fig. 1g. A four-layer fully connected (FC) neural network is constructed to establish the mapping between structural features and intermolecular interactions. Rectified Linear Unit (ReLU) activation functions are implemented between FC layers to enhance model expressivity while mitigating gradient vanishing issues. To further optimize predictive performance, we adopted a multitask learning paradigm to predict five energies. Hyperparameter optimization was systematically conducted through Bayesian optimization, with the parameter space and final configurations documented in Table S5.†

The mean square error (MSE) was chosen as the loss function for backpropagation. The MAE and R² are used to evaluate the prediction performance of the GNN model. The MSE, MAE and R² values are defined using eqn (7)–(9):


	(7)


	(8)


	(9)

where y_i is the actual value,

is the predicted value, and

is the mean value of the true values.

Preparation and characterization

DFP was synthesized according to the method given in the literature.⁵⁵ Urea and organic solvents used in this experiment were all purchased from Aladdin with a mass purity of >98%. NTO was synthesized according to the method given in the literature.⁵⁶

Single crystals of urea co-crystals were obtained via slow evaporation in acetonitrile at room temperature (25 ± 2 °C). First, 10 mmol of DFP or NTO and 10 mmol of urea (1 [thin space (1/6-em)] :1 molar ratio) were dissolved in 10 mL of acetonitrile in a glass vial. After complete dissolution, the solution was filtered through a 0.22 μm membrane filter into a designated container to remove any undissolved particles. The filtrate was then left to evaporate slowly at room temperature. After approximately 1–2 days, single crystals of urea co-crystals were obtained. The multiscan absorption, Lorentz and polarization correction were carried out using the SADABS. The structures were solved by direct methods using SHELXS-2013 and OLEX2 software,⁵⁷ and the refinement was further carried out by the full-matrix least-squares technique. The H atoms were refined isotropically, and the non-H atoms were refined anisotropically.

Single crystal X-ray diffraction (SCXRD) was used to determine the crystal structures for DFP/urea and NTO/urea, collected using an XtaLAB diffractometer (Rigaku Co.) with Mo-Kα (λ = 0.71073 Å) radiation at 170 K or 100 K, as shown in Table S6.†

Conclusions

In this study, we systematically investigated the fundamental principles governing co-crystal formation by extracting dimers from existing co-crystals and analyzing their intermolecular interactions. The decomposition of intermolecular interaction revealed that dispersion E_dis and electrostatics E_ele emerged as the dominant contributors in most structures. Further analysis demonstrated a clear correlation between molecular configurations and energy partitioning: structures with parallel aromatic stacking exhibit a high ratio of dispersion energy, while hydrogen-bond-rich configurations are dominated by electrostatic interactions. These findings quantitatively highlight the importance of weak intermolecular interactions, such as packing effects and hydrogen bonding, in driving co-crystallization.

Based on these insights, we established the criterion for evaluating co-crystal formation. At first, the intermolecular interaction E_AB for designed coformers must reside within the dense energy region of known co-crystals, as indicated by a LOF score below 1.3. Then, from the perspective of thermodynamics, the E_AB is expected to exceed the intermolecular interaction (E_AA or E_BB) in single-component crystals, with the energy difference between them not more than 50 kJ mol⁻¹. And it is best for the intermolecular interaction E_AB* from the most stable structure of two molecules to exceed the energy E_AA and E_BB for single components.

To accelerate the co-crystal screening and design process, two distinct graph neural network (GNN) frameworks were established for different intermolecular interaction predictions. The first model, which relies solely on molecular graph representations without global state descriptors u, achieves an MAE of 5.80 kJ mol⁻¹ and is specifically designed to predict the strongest intermolecular interaction (E_AB) in designed co-crystals. The second model for predicting intermolecular interactions for dimers extracted from co-crystals can reach a high accuracy with an MAE value of 1.69 kJ mol⁻¹, which could be used for extensive co-crystal intermolecular interaction exploration. Guided by this framework, two urea co-crystals (NTO/urea and DFP/urea) were synthesized successfully. However, limitations such as the under-sampling of dimers extracted from co-crystals and AB* configurations from Molclus, discrepancies between computational and real-world conditions, and the fact that the energy-based criterion primarily applies to the initial stages of co-crystal formation may lead to the ineffective identification of potential co-crystals. Despite these challenges, this work provides a new strategy for predicting co-crystal formation. While concentrated on energetic co-crystals, the framework grounded in fundamental physicochemical principles demonstrates significant potential for broader application in pharmaceutical co-crystal discovery.

Data availability

The collected co-crystals and calculation results are presented in Data S1–S3.† Data and the code for deep learning can be acquired at https://github.com/rong-yang-tech/gnn/tree/master. The X-ray crystallographic coordinates for structures reported in this study have been deposited at the Cambridge Crystallographic Data Center (CCDC), under deposition numbers 2434054 and 2434055. The new crystallographic structures of co-crystals are presented in Data S4.†

Author contributions

Xiurong Yang: writing – review & editing, writing – original draft, methodology, data curation. Ying Wang: writing – review & editing, methodology, investigation. Linhu Pan: data curation, software. Ruihui Wang: visualization, formal analysis. Yi Wang: methodology, investigation. Honglei Xia: methodology, investigation. Siwei Song: writing – review & editing, methodology, conceptualization. Qinghua Zhang: writing – review & editing, methodology, conceptualization.

Conflicts of interest

There are no conflicts to declare.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 22205218, 22325504, 22075259 and 22405214), the Northwestern Polytechnical University Aerospace Power Future Industry Concept Proof Project (24GNYZ0001-06), and the Fundamental Research Funds for the Central Universities (D5000240045 and D5000240065).

References

P. Sanphui, V. K. Devi, D. Clara, N. Malviya, S. Ganguly and G. R. Desiraju, Mol. Pharm., 2015, 12, 1615–1622 CrossRef CAS .
M. Singh, K. Shen, W. Ye, Y. Gao, A. Lv, K. Liu, H. Ma, Z. Meng, H. Shi and Z. An, Angew. Chem., Int. Ed., 2024, 63, e202319694 CrossRef CAS .
L. Sun, Y. Wang, F. Yang, X. Zhang and W. Hu, Adv. Mater., 2019, 31, 1902328 CrossRef .
H. Tran, R. Gurnani, C. Kim, G. Pilania, H.-K. Kwon, R. P. Lively and R. Ramprasad, Nat. Rev. Mater., 2024, 9, 866–886 CrossRef CAS .
S. Song, F. Chen, Y. Wang, K. Wang, M. Yan and Q. Zhang, J. Mater. Chem. A, 2021, 9, 21723–21731 RSC .
S. Song, Y. Wang, F. Chen, M. Yan and Q. Zhang, Engineering, 2022, 10, 99–109 CrossRef CAS .
Y. Wang, Y. Liu, S. Song, Z. Yang, X. Qi, K. Wang, Y. Liu, Q. Zhang and Y. Tian, Nat. Commun., 2018, 9, 2444 CrossRef PubMed .
T. Heng, D. Yang, R. Wang, L. Zhang, Y. Lu and G. Du, ACS Omega, 2021, 6, 15543–15550 CrossRef CAS PubMed .
M. J. Lee, J. Y. Kim, P. Kim, I. S. Lee, M. E. Mswahili, Y. S. Jeong and G. J. Choi, Pharmaceutics, 2022, 14, 429 CrossRef CAS PubMed .
M. E. Mswahili, M. J. Lee, G. L. Martin, J. Kim, P. Kim, G. J. Choi and Y. S. Jeong, Appl. Sci., 2021, 11, 1323 CrossRef CAS .
M. Haghighatlari, J. Li, F. Heidar-Zadeh, Y. Liu, X. Guan and T. Head-Gordon, Chem, 2020, 6, 1527–1542 CAS .
Y. Jiang, Z. Yang, J. Guo, H. Li, Y. Liu, Y. Guo, M. Li and X. Pu, Nat. Commun., 2021, 12, 5950 CrossRef CAS .
A. Vriza, I. Sovago, D. Widdowson, V. Kurlin, P. A. Wood and M. S. Dyer, Digital Discovery, 2022, 1, 834–850 RSC .
D. Wang, Z. Yang, B. Zhu, X. Mei and X. Luo, Cryst. Growth Des., 2020, 20, 6610–6621 CrossRef CAS .
D. Yang, L. Wang, P. Yuan, Q. An, B. Su, M. Yu, T. Chen, K. Hu, L. Zhang, Y. Lu and G. Du, Chin. Chem. Lett., 2023, 34, 107964 CrossRef CAS .
S. Ahmadi, M. A. Ghanavati and S. Rohani, Chem. Mater., 2024, 36, 1153–1161 CrossRef CAS .
J. J. Devogelaer, H. Meekes, P. Tinnemans, E. Vlieg and R. de Gelder, Angew. Chem., Int. Ed., 2020, 59, 21711–21718 CrossRef CAS PubMed .
M. C. Etter and S. M. Reutzel, JACS, 1991, 113, 2586–2598 CrossRef CAS .
J. M. Lehn, Angew. Chem., Int. Ed., 1988, 27, 89–112 CrossRef .
J. A. R. P. Sarma and G. R. Desiraju, Cryst. Growth Des., 2002, 2, 93–100 CrossRef CAS .
K. B. Landenberger, O. Bolton and A. J. Matzger, Angew. Chem., Int. Ed., 2013, 52, 6468–6471 CrossRef CAS .
R. Bu, F. Jiao, G. Liu, J. Zhao and C. Zhang, Cryst. Growth Des., 2021, 21, 3–15 CrossRef CAS .
R. Bu, Y. Xiong, X. Wei, H. Li and C. Zhang, Cryst. Growth Des., 2019, 19, 5981–5997 CrossRef CAS .
R. Bu, Y. Xiong and C. Zhang, Cryst. Growth Des., 2020, 20, 2824–2841 CrossRef CAS .
G. R. Liu, S. H. Wei and C. Zhang, Cryst. Growth Des., 2020, 20, 7065–7079 CrossRef CAS .
W. D. Derricotte, J. Phys. Chem. A, 2019, 123, 7881–7891 CrossRef CAS .
A. Tekin, J. Chem. Phys., 2019, 151, 244302 CrossRef PubMed .
A. Tekin and G. Jansen, Phys. Chem. Chem. Phys., 2007, 9, 1680–1687 RSC .
Y. Liu, S. Zhang, R. Gou, Y. Chen, Y. Liu, J. Jiang, M. Chen and Z. Li, Mater. Today Commun., 2020, 24, 101020 CrossRef CAS .
J. Zhang, P. Gao, J. J. Xiao, F. Zhao and H. M. Xiao, Modell. Simul. Mater. Sci. Eng., 2016, 24, 085008 CrossRef .
F. Li, Z. Zheng, S. Xia and L. Yu, J. Mol. Struct., 2020, 1219, 128480 CrossRef CAS .
S. Akram, A. Mehmood, S. Noureen and M. Ahmed, CrystEngComm, 2023, 25, 6478–6488 RSC .
B. Li, S. Zhang, R. Gou, S. Zhu, Y. Liu, S. Feng, Q. Guo and B. Zhang, Chem. Pap., 2023, 77, 7515–7526 CrossRef CAS .
Y. Wang, Q. Zhang, F. Chen, S. Song and K. Wang, J. Phys. Chem. C, 2023, 127, 8887–8893 CrossRef .
J. Wu, S. Song, X. Qi, H. Yang and Y. Wang, Phys. Chem. Chem. Phys., 2024, 26, 4752–4758 RSC .
S. Song, X. Tian, Y. Wang, X. Qi and Q. Zhang, CrystEngComm, 2022, 24, 1537–1545 RSC .
G. G. Celik, B. Dedeoglu, A. G. Gurek, Y. Zorlu and M. M. Ayhan, Cryst. Growth Des., 2023, 23, 7285–7294 CrossRef .
Q. Liu, D. Yang, T. Chen, B. Zhang, C. Xing, L. Zhang, Y. Lu and G. Du, Cryst. Growth Des., 2021, 21, 6321–6331 CrossRef CAS .
T. Misiaszek and Z. Czyznikowska, J. Mol. Graphics Modell., 2014, 51, 73–78 CrossRef CAS PubMed .
K. N. Jarzembska, A. M. Goral, R. Gajda and P. M. Dominiak, Cryst. Growth Des., 2013, 13, 239–254 CrossRef CAS .
K. N. Jarzembska, A. A. Hoser, R. Kamiński, A. Ø. Madsen, K. Durka and K. Woźniak, Cryst. Growth Des., 2014, 14, 3453–3465 CrossRef CAS .
T. Lu, http://www.keinsci.com/research/molclus.html, 2023.
T. Fei, P. Lv, Y. Liu, C. He, C. Sun and S. Pang, Cryst. Growth Des., 2019, 19, 2779–2784 CrossRef CAS .
D. P. Metcalf, A. Koutsoukas, S. A. Spronk, B. L. Claus, D. A. Loughney, S. R. Johnson, D. L. Cheney and C. D. Sherrill, J. Chem. Phys., 2020, 152, 074103 CrossRef CAS PubMed .
M. Thürlemann, L. Böselt and S. Riniker, J. Chem. Theory Comput., 2023, 19, 562–579 CrossRef .
J. Du, B. Wang, Y. Chen, X. Li and C. Wang, J. Mol. Model., 2024, 30, 348 CrossRef CAS .
G. Hang, J. Wang, T. Wang, H. Shen, W. Yu and R. Shen, J. Mol. Model., 2022, 28, 58 CrossRef CAS .
J. Zhang and T. Lu, Phys. Chem. Chem. Phys., 2021, 23, 20323–20328 RSC .
T. Lu and F. Chen, J. Comput. Chem., 2012, 33, 580–592 CrossRef CAS .
D. G. A. Smith, L. A. Burns, A. C. Simmonett, R. M. Parrish, M. C. Schieber, R. Galvelis, P. Kraus, H. Kruse, R. Di Remigio, A. Alenaizan, A. M. James, S. Lehtola, J. P. Misiewicz, M. Scheurer, R. A. Shaw, J. B. Schriber, Y. Xie, Z. L. Glick, D. A. Sirianni, J. S. O'Brien, J. M. Waldrop, A. Kumar, E. G. Hohenstein, B. P. Pritchard, B. R. Brooks, H. F. Schaefer III, A. Y. Sokolov, K. Patkowski, A. E. DePrince III, U. Bozkaya, R. A. King, F. A. Evangelista, J. M. Turney, T. D. Crawford and C. D. Sherrill, J. Chem. Phys., 2020, 152, 184108 CrossRef CAS .
M. M. Breunig, H. P. Kriegel, R. T. Ng and J. Sander, presented in part at the Proceedings of the 2000 ACM SIGMOD international conference on Management of data, Dallas, Texas, USA, 2000 Search PubMed .
L. Himanen, M. O. J. Jäger, E. V. Morooka, F. Federici Canova, Y. S. Ranawat, D. Z. Gao, P. Rinke and A. S. Foster, Comput. Phys. Commun., 2020, 247, 106949 CrossRef CAS .
Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang and P. S. Yu, IEEE Trans. Neural Netw. Learn. Syst., 2021, 32, 4–24 Search PubMed .
J. Guo, M. Sun, X. Zhao, C. Shi, H. Su, Y. Guo and X. Pu, J. Chem. Inf. Model., 2023, 63, 1143–1156 CrossRef CAS .
H. Y. Zhu, Y. H. Liu, H. Y. Sun, D. D. Cao, l. Yuchuan and S. P. Pang, RSC Adv., 2023, 13, 16536–16548 RSC .
D. Viswanath, T. Ghosh and V. Boddu, Emerging energetic materials: synthesis, physicochemical, and detonation properties, Springer Nature, Dordrecht, 2018 Search PubMed .
O. Dolomanov, L. Bourhis, R. Gildea, J. Howard and H. Puschmann, J. Appl. Cryst., 2009, 42, 339–341 CrossRef CAS .

Footnotes

† Electronic supplementary information (ESI) available. CCDC 2434054 and 2434055. For ESI and crystallographic data in CIF or other electronic format see DOI: https://doi.org/10.1039/d5ta02975a

‡ These authors contributed equally to this work and should be considered co-first authors.

Click here to see how this site uses Cookies. View our privacy policy here.