Chenyang Weia,
Wenbo Mu*b,
Hongyuan Zhanga,
Zhenghui Liu*c and
Tiancheng Mu*ad
aSchool of Chemistry and Life Resources, Renmin University of China, Beijing 100872, P.R. China. E-mail: tcmu@ruc.edu.cn
bDepartment of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093-0404, USA. E-mail: wmu@uscd.edu
cSchool of Pharmaceutical and Chemical Engineering, Taizhou University, Taizhou 318000, Zhejiang, China. E-mail: liuzhenghui@iccas.ac.cn
dKey Laboratory of Green Chemical Media and Reactions, Ministry of Education, School of Chemistry and Chemical Engineering, Henan Normal University, Xinxiang, Henan 453007, P. R. China
First published on 10th July 2025
Double-atom catalysts (DACs) are promising electrocatalysts due to their synergistic metal–metal interactions and high atom utilization. However, the vast chemical space arising from diverse metal pairs and substrates presents a major challenge for rational design. Here, we combine high-throughput density functional theory (DFT) calculations with machine learning (ML) analysis to systematically investigate DACs for the CO2 reduction reaction (CO2RR), hydrogen evolution reaction (HER), and oxygen evolution reaction (OER). We establish a predictive ML framework capable of rapidly screening DAC candidates with near-DFT accuracy, enabling efficient evaluation across a wide range of substrates. Guided by ML and DFT approaches, we identify PtZn/N-C3N4 as a highly active OER catalyst with a theoretical overpotential of ∼0.15 eV, and CuNi/N-C3N4 as a top-performing bifunctional catalyst for overall water splitting. For CO2RR, VTi/N-C3N4 shows a limiting potential approaching −0.15 V, close to the optimal volcano plot peak, along with strong HER suppression. In summary, this work offers key insights for the design of atomic catalysts (ACs), providing substantial time savings and demonstrating the potential of ML as a broadly applicable tool in diverse energy-related fields.
Atomic catalysts (ACs), particularly double-atom catalysts (DACs), have garnered attention due to their exceptional metal atom utilization and enhanced catalytic activity.4–9 By dispersing metal atoms across two-dimensional (2D) substrates, ACs offer a high density of active sites. DACs, which introduce a second metal atom, further enhance catalytic versatility and promote complex inter-metallic interactions, modifying the electronic and geometric properties of active sites. These changes allow for targeted optimization of catalytic performance, highlighting DACs as a promising platform for advancing catalytic processes. However, DACs face a significant challenge: the vast chemical space of possible combinations (Fig. 1), especially when considering diverse 2D substrates. Within this vast chemical landscape, some DAC configurations may suffer from thermodynamic or electrochemical instability, while others may fail to meet desired catalytic criteria. This brings us to a key question: how can we effectively predict the properties of DACs within such an expansive chemical space? Traditional density functional theory (DFT) calculations, while accurate, are computationally expensive, and relying solely on chemists' intuition often lacks the precision needed to predict the effects of complex substrate–metal interactions.
The rapid advancement of machine learning (ML) has positioned it as a transformative tool in material discovery, synthesis, and characterization.10–17 ML efficiently detects patterns and relationships within large datasets, enabling rapid predictions and accelerating research. By uncovering complex relationships that are often beyond traditional models, ML has the potential to revolutionize scientific research. For instance, Google DeepMind's Graph Networks for Materials Exploration (GNoME) has enabled the creation of an extensive database of crystals, predicting approximately 2.2 million new structures.18 Likewise, Amir Kotobi's team used graph neural networks (GNNs) to predict X-ray absorption spectra (XAS) of organic molecules, enhancing interpretability through class activation maps (CAM).19 Another example is the work of Tongtong Yang and collaborators, who proposed an ML approach leveraging spectroscopic descriptors to predict catalytic properties and achieve structural inversion.20 These advancements highlight ML's potential to replace traditional labor-intensive methods and deliver substantial time savings. In particular, several studies indicate that ML holds promise for the design of ACs.4,5,21 However, most studies have focused on transition-metal-based ACs on specific substrates or employed conservative doping strategies that minimally alter the substrate's structural framework (Fig. S1†). This trend is even more pronounced in DAC research, where simplified approaches facilitate the analysis of metal–metal interactions and the development of straightforward predictive frameworks. While these methods are effective in idealized scenarios, they diverge from real-world complexities.
In practical applications, optimal ACs require exploring diverse substrate–metal combinations, as substrates can significantly alter the electron density and structure of metals, imparting unique properties to the catalysts.22,23 These complexities pose significant challenges for ML, which must account for the dynamic interactions between substrates and metals to accurately reflect real-world conditions.
To address these challenges, this study employs four prototypical 2D graphitic carbon nitride (CxNy) substrates, which, to our knowledge, have not been extensively used in DACs. These substrates introduce significant modifications to the substrate structure and coordination environment surrounding the metal atoms, bringing the study closer to realistic material design processes. Using high-throughput DFT calculations, we initially investigate DAC-driven CO2 reduction reaction (CO2RR) activity across 46 distinct DACs, revealing that the conventional descriptor, the Gibbs free energy change for CO adsorbates (ΔGCO*), does not adequately capture CO2RR performance. Consequently, we leverage ML to predict the limiting potential (UL) of CO2RR and to estimate three further metrics, namely the binding energy (ΔEbind), the dissolution potential (Udiss), and the Gibbs free energy for H adsorbates (ΔGH*), which serve as proxies for thermal stability, electrochemical stability, and competing hydrogen evolution reaction (HER) activity, respectively. In subsequent analyses, ML further dissects the adsorption behavior of intermediates, revealing that the d-electron count within bimetallic systems, acting through alterations in the d-band center and electron transfer, is crucial for modulating adsorption, while DAC stability is influenced by both substrate and intrinsic metal properties. Building on these insights, we identify graphitic carbon nitride with N vacancies (N-C3N4) as a promising substrate for stabilizing bimetallic atoms and explore its oxygen evolution reaction (OER) activity, noting that traditional descriptors such as the OH adsorption Gibbs free energy (ΔGOH*) fall short, thus prompting ML-based prediction of the theoretical overpotential (ηOER). Finally, we present an innovative framework that integrates high-throughput DFT with advanced ML techniques to predict DAC performance in key catalytic reactions.
By employing symbolic regression via the PySR library, we enhance the interpretability of adsorption predictions for key intermediates, thereby laying the groundwork for future descriptor development, while reducing computation time by a factor of approximately 3750. Ultimately, our approach streamlines the DAC discovery process and offers an efficient pathway for identifying high-performance catalysts for sustainable energy applications.
The overall selection criteria are as follows:
(1) stability: ΔEbind < 0, while Udiss > 0;
(2) high CO2RR activity: UL > −0.4 eV;40,41
(3) high HER activity: |ΔGH*| < 0.15 eV;
(4) high OER activity: ηOER < 0.66 eV (using IrO2 as the benchmark42).
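The four criteria above can be read as simple threshold tests. The following is a minimal sketch of such a filter; the `DACMetrics` container, the `screen` function, and all numerical values for the candidate are hypothetical and are not taken from this work's dataset or code.

```python
from dataclasses import dataclass


@dataclass
class DACMetrics:
    """Metrics for one DAC (illustrative container, not from the paper)."""
    name: str
    e_bind: float   # binding energy, ΔEbind
    u_diss: float   # dissolution potential, Udiss
    u_l: float      # CO2RR limiting potential, UL
    dg_h: float     # H adsorption free energy, ΔGH*
    eta_oer: float  # OER theoretical overpotential, ηOER


def screen(m: DACMetrics) -> dict:
    """Apply criteria (1)-(4); activity flags are only set for stable DACs."""
    stable = m.e_bind < 0 and m.u_diss > 0          # criterion (1)
    return {
        "stable": stable,
        "co2rr_active": stable and m.u_l > -0.4,     # criterion (2)
        "her_active": stable and abs(m.dg_h) < 0.15,  # criterion (3)
        "oer_active": stable and m.eta_oer < 0.66,   # criterion (4)
    }


# Hypothetical candidate with made-up values, for illustration only.
candidate = DACMetrics("hypothetical_AB", e_bind=-3.2, u_diss=0.4,
                       u_l=-0.35, dg_h=0.60, eta_oer=0.90)
flags = screen(candidate)
print(flags)
```

Gating every activity flag on stability is a design choice of this sketch: an unstable DAC is never reported as a candidate, whatever its predicted activity.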
Due to the high dimensionality of the Coulomb matrix (CM) and smooth overlap of atomic positions (SOAP) descriptors, we applied dimensionality reduction techniques (PCA or t-SNE) to retain essential information while reducing the computational burden.
![]() | (1) |
![]() | (2) |
![]() | (3) |
Table 1 summarizes the descriptors and their respective applications.
Feature category | Description | Usage |
---|---|---|
Elemental info | Atomic number, number of d-electrons, etc. | Used in all models |
Local structural info | Bond lengths around bimetallic centers | Used in analysis models only |
Global structural info | CM, SOAP | Used in the prediction framework only |
![]() | (4) |
ypred = ȳ + f(x1) + f(x2) + ⋯ + f(xn) |
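The additive form above is the kind of decomposition used in SHAP analysis: a baseline (mean) prediction plus one contribution per feature. The sketch below assumes that the f(xi) terms are Shapley contributions and computes them exactly by enumerating feature subsets for a toy model. The model, the background data, and the function name `shapley_values` are hypothetical, and the exhaustive enumeration is only practical for a handful of features (real workflows would use an optimized implementation such as a tree explainer).

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley contributions of each feature of x for a callable model.

    Features absent from a coalition are filled in from the background
    samples and the model output is averaged over them.
    """
    n = len(x)

    def value(subset):
        total = 0.0
        for b in background:
            z = [x[i] if i in subset else b[i] for i in range(n)]
            total += model(z)
        return total / len(background)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                s = set(s)
                phi += weight * (value(s | {i}) - value(s))
        contributions.append(phi)
    return value(set()), contributions  # (baseline, per-feature terms)


# Toy model and data, purely illustrative.
def toy_model(z):
    return 2.0 * z[0] + z[1] * z[2]


background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.0, 1.0]]
x = [1.0, 2.0, 3.0]
baseline, phi = shapley_values(toy_model, x, background)
print(baseline + sum(phi), toy_model(x))  # the two numbers agree
```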
In contrast, the limiting potential UL inherently captures the thermodynamic span of all elementary steps and explicitly identifies the highest-energy barrier across the entire reaction pathway. This makes it a more comprehensive, physically meaningful, and robust descriptor for multi-step reactions like CO2RR, where the potential-determining step (PDS) can shift dynamically depending on catalyst composition or coordination environment. To improve high-throughput prediction in this context, we further leveraged ML to construct composite descriptors that integrate electronic and geometric features, going beyond single-adsorbate energetics, to more accurately represent the complex multi-step nature of CO2RR on diverse DAC surfaces.
To support this framework, we constructed a comprehensive dataset encompassing both stability and reactivity metrics. In addition to UL, we computed the ΔEbind and Udiss of DACs to evaluate their stability. A more negative ΔEbind indicates enhanced thermal stability and a higher likelihood of experimental validation, while a positive Udiss suggests metal resistance to dissolution during electrochemical processes. An illustrative heatmap (Fig. S2†) compares these stability metrics across various DACs. Additionally, we computed ΔGH* for approximately 120 DACs to quantify their HER activity, thereby providing further data for subsequent ML modeling.
SHAP analysis (Fig. 3b) highlights that descriptors related to the coordination environment, such as the ratio of nitrogen atoms near the bimetallic center (denoted ‘number of N’), are highly influential. The high SHAP value associated with the nitrogen ratio indicates that nitrogen atoms are essential for modulating the electronic configuration of the bimetallic centers, which directly influences their ability to adsorb intermediates like CO. Previous studies have demonstrated how nitrogen atoms in the substrate can interact with the metal's d-electrons, enhancing or reducing CO adsorption strength.50–52 Furthermore, the contribution of transition metal d-electrons to predicting ΔGCO* suggests that both the nitrogen environment and the d-electron configuration of the metals are crucial for optimizing catalytic performance. By selecting appropriate substrates and metal pairs based on these features, we can design more efficient catalysts for reactions like CO2 reduction.
Fig. 3 Comprehensive analysis of electrocatalytic descriptors for the CO2RR process on DACs; (a) heatmap of p among various electron-related features. The color gradient (ranging from red to blue) indicates the magnitude of p, with intense colors at the extremes reflecting strong correlations (the complete heatmap is shown in ESI Fig. S8a†). (b) Bar plot of the top 11 features' relative importance based on SHAP values from the ML analysis of ΔGCO* for DACs, highlighting the most influential descriptors (full SHAP values for all features are provided in ESI Fig. S8b†). (c) Heatmap depicting the relationship between lC–O and the d-electron counts of Metal 1 (M1) and Metal 2 (M2). The accompanying color bar represents the range of lC–O, with the red region indicating elevated CO activation. (d) The four selected DACs for CO2RR discussed in this study: (i) PtPt_C2N, (ii) PtNi_C2N, (iii) NiSc_g-C3N4, and (iv) NiW_g-C3N4. (e) & (f) Correlation analyses of εd, |e|, and ΔGCO*: panels (e) and (f) display data points colored red, green, blue, and orange for DACs with CN, C2N, g-C3N4, and N-C3N4 substrates, respectively. The shaded area in (f) highlights DACs with εd near EF. Together, panels (e) & (f) elucidate the interrelationships among these key catalytic descriptors. |
To substantiate the impact of d-electrons on CO adsorption, we applied the Pearson correlation coefficient (p) to evaluate the relationships between the number of outermost electrons (Ne), s-electrons (θs), d-electrons (θd), and the C–O bond length (lC–O) (Fig. 3a). Our findings reveal a strong correlation between the number of d-electrons in transition metals and lC–O, a trend not observed for s-electrons, suggesting that d-electrons uniquely promote CO activation by elongating the C–O bond. Bader charge analysis (Table S4†) further confirms that transition metal d-electrons predominantly donate charge to CO, enhancing its activation. Moreover, correlating Bader charge transfer (|e|) with lC–O shows that greater d-electron transfer corresponds to longer C–O bonds, signaling amplified CO activation (Fig. S3†). A heatmap (Fig. 3c) further illustrates a critical balance in the electronic configuration of bimetallic systems: neither an excess nor a deficiency of d-electrons favors optimal CO activation, as fully occupied d-orbitals impede activation while too few d-electrons lead to insufficient electron donation. Both extremes are consequently detrimental to CO2RR efficiency.
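For reference, the Pearson coefficient used above is the covariance of two variables divided by the product of their standard deviations. A minimal sketch follows; the d-electron counts and C–O bond lengths in the example are invented for illustration and are not values from this study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical d-electron counts and C-O bond lengths (angstrom).
theta_d = [2, 3, 5, 6, 8]
l_co = [1.21, 1.20, 1.18, 1.17, 1.15]
print(pearson(theta_d, l_co))
```

A value near +1 or −1 indicates a strong linear relationship, while a value near 0 indicates none; the coefficient does not capture non-monotonic trends such as a volcano shape.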
Subsequently, to elucidate how bimetallic atoms influence adsorption behavior and electronic structure, we computed the projected density of states (PDOS) (Fig. S4† and 3d). These PDOS plots reveal the interaction between the two metals, evident from overlapping regions, and demonstrate how bimetallic structures modulate the d-band center (εd), as summarized in Table S5.† For example, in Pt–Pt and Pt–Ni combinations on a C2N substrate (Fig. 3di and ii), replacing one Pt atom with Ni shifts the remaining Pt's εd closer to the Fermi level (EF), so that the overall εd moves toward EF, thereby altering ΔGCO*. A similar trend is observed in other metal pairings (e.g., NiSc and NiW on g-C3N4 substrates), underscoring the mutual influence of the metal partners. Such εd shifts, by modulating the adsorption strength of CO, directly influence CO2RR activity through changes in intermediate stabilization and desorption energetics. Similar effects on OH adsorption are expected in OER, as discussed later.
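The d-band center is commonly defined as the first moment of the d-projected DOS, εd = ∫E ρd(E) dE / ∫ρd(E) dE, with energies referenced to EF. The sketch below assumes this standard definition (the integration window and any smearing used in this work are not specified here) and evaluates it by trapezoidal integration on a synthetic Gaussian PDOS; `d_band_center` and the test data are illustrative.

```python
import math


def trapezoid(y, x):
    """Trapezoidal integral of samples y over grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))


def d_band_center(energies, pdos_d, e_fermi=0.0):
    """First moment of the d-projected DOS, relative to the Fermi level."""
    shifted = [e - e_fermi for e in energies]
    weighted = [e * rho for e, rho in zip(shifted, pdos_d)]
    return trapezoid(weighted, shifted) / trapezoid(pdos_d, shifted)


# Synthetic PDOS: a Gaussian centred 2 eV below the Fermi level.
energies = [-10.0 + 0.01 * i for i in range(1601)]
pdos = [math.exp(-((e + 2.0) ** 2) / (2 * 0.5 ** 2)) for e in energies]
eps_d = d_band_center(energies, pdos)
print(round(eps_d, 4))
```

For a DAC, the same function could be applied to the summed d-PDOS of both metals or to each metal separately; which convention underlies Table S5† is not stated in this section.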
To gain a more integrated understanding of the factors influencing ΔGCO*, we examined the correlations among ΔGCO*, εd, and |e| (Fig. S5†, 3e and f). Fig. S5† presents a synthesized view of these parameters, revealing a volcano relationship between ΔGCO* and εd. Notably, εd has a more substantial impact on ΔGCO* than |e|, in line with the majority of related works that position εd as an informative descriptor for adsorption energetics for transition metals; although higher εd generally corresponds to increased |e|, the relationship is not strictly linear. Fig. 3e further illustrates a volcano-shaped dependence, where a smaller gap between εd and EF facilitates electron migration from the metal site to the adsorbate, thereby enhancing CO adsorption. This observation is supported by Fig. 3f, which shows that DACs with εd near EF exhibit significant charge transfer |e|, emphasizing the role of electron displacement in CO activation.
Interestingly, certain DACs enriched with early transition metals (e.g., Zr, Ti, Sc) display a positive εd due to their incompletely occupied d-orbitals, resulting in empty d-bands above EF. In DACs such as NbZr_g-C3N4, VZr_CN, and FeSc_CN, these unpaired d-electrons are more available, leading to higher Bader charge transfer values (Fig. 3f) and thereby affecting ΔGCO* and CO2RR activity. Aside from these anomalies, the overall volcano trend between εd and ΔGCO* is evident. Moreover, we hypothesize that these exceptions may partly arise from alternative CO adsorption modes: while most DACs adopt an end-on conformation (with C as the adsorption site), the anomalous DACs favor side-on adsorption (Fig. S6†).
Finally, we evaluated ΔGH* for a subset of 12 DACs along with their εd and |e| values (Fig. S7†). These results indicate that an εd closer to EF enhances hydrogen adsorption and that d-electron migration modulates ΔGH* in a manner similar to CO adsorption. The high SHAP values associated with d-electrons further reinforce these findings.
For ΔEbind, our ML model (Fig. 4b) indicates that both the distance between the transition metals and the substrate atoms and the distance between the two metal atoms themselves are crucial. This is intuitively plausible: if the metal atoms are too close or if they possess excessively large atomic radii, strong repulsive forces may arise, compromising DAC stability. Substrate-related features are also highly influential, underscoring the critical role of substrate type and structure. To present these findings clearly and reinforce our ML analysis, we detail ΔEbind in Fig. 4a. Both the substrate and the transition metal type significantly affect ΔEbind, consistent with our ML insights. Among the substrates, N-C3N4 exhibits a notably lower ΔEbind than the others, suggesting a stronger potential for securely anchoring dual metal atoms.53 We postulate that this enhanced binding is attributable not only to the large pore structures and surface area of N-C3N4, but also critically to the presence of nitrogen vacancies, which introduce undercoordinated carbon atoms and locally distorted triazine units that provide flexible, asymmetric binding pockets for dual-metal anchoring. These defect-induced coordination environments enable diverse metal–support interactions and facilitate charge transfer from the substrate to the metal centers, thereby stabilizing the metal pair and modulating their oxidation states.54,55 This dual effect of structural adaptability and electronic enrichment underlies the superior anchoring capability of N-C3N4 observed in our ΔEbind analysis.
Fig. 4 Detailed evaluation of ΔEbind and descriptor significance in DACs; (a) compilation of ΔEbind values for DACs with various substrate structures. Data points are color-coded: red, green, blue, and orange represent DACs on CN, C2N, g-C3N4, and N-C3N4 substrates, respectively, facilitating comparison across different substrates. (b) Bar chart of the top 11 features ranked by relative importance, as determined by SHAP values from the ML model for ΔEbind analysis. This chart highlights the most influential factors affecting ΔEbind predictions. For a full comparison of SHAP values for all features, refer to ESI Fig. S9.† |
Furthermore, our analysis reveals that the atomic radii of the metal pairs impact ΔEbind. Metal pairs with similar radii tend to have comparable ΔEbind values, while those with significant radii differences show pronounced disparities. For instance, on the CN substrate, metal pairs such as NiFe, NiCo, CoFe, and FeFe, which have similar atomic radii, exhibit similar ΔEbind values. A similar pattern is observed for NiCo and NiFe on N-C3N4 and for CoPt, CrPt, and CuPt on C2N. Conversely, metal pairs with large radii differences (e.g., ZrV and ZrNb on N-C3N4, or NbSc, NiSc, VFe, and VW on g-C3N4) display substantial variations in ΔEbind.
Finally, the d-electron configuration of transition metals also influences DAC thermal stability. Transition metals such as Zn, Cd, Ag, and Au typically have a stable configuration with 10 d-electrons, which limits their ability to bond with surrounding atoms and hinders DAC formation, often resulting in altered ΔEbind values and distorted configurations.
Since OER activity is closely linked to the adsorption behavior of intermediates (OH*, O*, and OOH*) within the associative mechanism59 and ΔGOH* is a common descriptor for OER,60,61 we first investigated ΔGOH* in DACs with N-C3N4. Given that the OH adsorption strength also depends on the d electron count in transition metals,14,61 we performed DFT calculations across diverse DACs to correlate ΔGOH* with the number of d electrons. However, unlike in single-atom catalyst systems, the presence of dual metal sites in DACs makes it challenging to determine the relevant d electron count. To address this, we developed a preliminary method to identify the dominant metal for OH adsorption by measuring the distances between the adsorbed OH and each metal atom. For example, in a MoFe DAC, the proximity of OH to Fe allowed us to assign a d electron count of 6 based on Fe's configuration (Fig. S10†). Cases with ambiguous adsorption preference were excluded from the initial analysis. The resulting violin plot in Fig. 5a shows that OH adsorption strength generally decreases as the d electron count increases, although potential biases from data exclusion mean that this conclusion should be treated with caution.
To more comprehensively probe the relationship between ΔGOH* and d electron count, we refined our method by introducing a weighted descriptor ϕ. In this approach, weights are assigned to the d electron count of each metal based on its proximity to the adsorbed O atom, since OH typically adsorbs with the O atom closest to the metal. Detailed methodology is provided in the ESI note accompanying Fig. S10.† This refined approach allowed us to include DACs where preferential OH adsorption was previously ambiguous. As illustrated in Fig. 5b, the data now more clearly reveal that a higher average d electron count correlates with weaker OH adsorption. Complementary SHAP analysis (Fig. S11†) further supports the critical role of d electron count and the spatial relationship between the adsorbed OH and the metal sites in determining ΔGOH*.
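The exact weighting scheme for ϕ is given in the ESI and is not reproduced here. As an illustration of the idea only, the sketch below assumes inverse-distance weights, so that the metal closer to the adsorbed O atom dominates. The function name, the weighting form, the distances, and the use of ground-state d-electron counts (Mo: 5, Fe: 6) are all assumptions of this example.

```python
def weighted_d_count(d_counts, distances):
    """Distance-weighted average d-electron count for a metal pair.

    d_counts:  d-electron counts of the two metals
    distances: metal-O distances (same order); a shorter distance
               gives a larger weight (assumed inverse-distance form)
    """
    weights = [1.0 / r for r in distances]
    return sum(w * d for w, d in zip(weights, d_counts)) / sum(weights)


# Hypothetical MoFe example: OH sits closer to Fe (1.9 A) than to Mo (3.0 A).
phi_mofe = weighted_d_count(d_counts=[5, 6], distances=[3.0, 1.9])
print(round(phi_mofe, 3))
```

With equal distances the descriptor reduces to the plain average of the two d-electron counts, which is why it can also handle the bridging cases that the nearest-metal assignment had to exclude.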
To gain a refined understanding of adsorption in DACs, we selected six representative systems for detailed analysis to re-establish the relationship between ΔGOH*, εd, and |e| (Fig. 5c and g). Our findings are twofold: (1) an εd closer to EF correlates with stronger OH adsorption; and (2) increased d-electron transfer, as indicated by a higher |e|, correlates with a lower ΔGOH*. However, these parameters do not exhibit a strict linear correlation, suggesting that additional factors (e.g., varied adsorption modes and steric hindrances in different DAC structures) also influence OH adsorption. An R2 of approximately 0.6 (Fig. 5b) further indicates that, while the d-electron count is a key determinant, it alone cannot fully capture OH adsorption strength. Finally, to validate that ϕ captures the essential adsorption characteristics, we examined its correlations with εd and |e| (Fig. S12†). We found that a higher ϕ corresponds to a more negative εd (i.e., shifted further from EF), which attenuates OH adsorption, a trend consistent with previous findings linking increased d-electron presence to larger ΔGOH*. Moreover, a higher ϕ is associated with reduced charge transfer, reflecting diminished electron donation as εd moves away from EF. Thus, ϕ not only reflects the overall adsorption energy but also encapsulates intrinsic factors governing adsorption, validating it as a generally suitable and rational parameter for assessing OER activity in DACs on a single substrate.
Following our examination of the correlation between ΔGOH* and d electrons, we extended our investigation to the remaining OER intermediates, namely O* and OOH*, and their relationship with OH*. Accordingly, we computed the energy barriers along the OER pathway for approximately 30 DACs (Fig. S13†). Concurrently, we calculated the Gibbs free energy changes for O* and OOH* (ΔGO* and ΔGOOH*, respectively) and visualized their relationships with ΔGOH* in Fig. 5d and e. The results indicate a linear relationship between ΔGOH* and both ΔGO* and ΔGOOH*, although the correlation for ΔGO* is notably weaker than that for ΔGOOH*. Traditionally, on transition metal surfaces, robust linear relationships for both yield a characteristic volcano plot correlating ΔGOH* with ηOER, thereby establishing ΔGOH* as a reliable descriptor of OER activity.22 However, in the DACs studied here, these relationships are obscured; when examining the correlation between ηOER and ΔGOH* (Fig. 5f), the expected volcano trend is not clearly observed. This obscured trend underscores the complexity of intermediate adsorption in DACs, where adsorbates may bind to one or both metal atoms or even to the substrate, deviating from conventional transition metal surface models.
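To make the link between the three adsorption free energies and ηOER concrete, the sketch below uses the widely adopted four-step associative scheme under the computational hydrogen electrode convention, with 4.92 eV (4 × 1.23 eV) for the overall reaction. This is assumed to match the convention used in this work; the function name and the example free energies are hypothetical.

```python
TOTAL_OER_ENERGY = 4.92   # eV for 2H2O -> O2 + 4(H+ + e-), i.e. 4 x 1.23 eV
EQUILIBRIUM_POTENTIAL = 1.23  # V


def oer_overpotential(dg_oh, dg_o, dg_ooh):
    """Return (eta_OER, index of the potential-determining step, 1-based)."""
    steps = [
        dg_oh,                      # * + H2O -> OH*
        dg_o - dg_oh,               # OH* -> O*
        dg_ooh - dg_o,              # O* + H2O -> OOH*
        TOTAL_OER_ENERGY - dg_ooh,  # OOH* -> * + O2
    ]
    largest = max(steps)
    return largest - EQUILIBRIUM_POTENTIAL, steps.index(largest) + 1


# Hypothetical adsorption free energies (eV), for illustration only.
eta, pds = oer_overpotential(dg_oh=0.80, dg_o=2.40, dg_ooh=3.90)
print(round(eta, 2), pds)
```

Because ηOER depends on the largest of four differences, a descriptor based on ΔGOH* alone works only when ΔGO* and ΔGOOH* scale tightly with it, which is the condition that the text above finds weakened in DACs.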
In summary, our workflow (Fig. 6a) consists of several modules built on the XGBoost algorithm, which performs well on small to medium-sized datasets and is less prone to the overfitting common with neural networks. XGBoost's numerous hyperparameters allow extensive optimization for superior performance. We extract optimized atomic coordinates and unit cell information in batches and convert these data into either CM or SOAP, two widely used methods for constructing features from atomic structures. Our comparison reveals that SOAP outperforms CM (Fig. 6b) because it accounts for crystal structure repeatability and complex van der Waals interactions, thereby better representing the true atomic environment in crystalline materials. t-SNE visualization of SOAP features (Fig. 6c) shows that the four clusters correspond to the four substrate types in our dataset, with g-C3N4 and N-C3N4 grouped distinctly on the right and CN and C2N on the left, indicating that SOAP effectively retains the key structural information of these materials. We also tested the impact of hyperparameter tuning on model performance, investigating whether CM could outperform SOAP. However, even after additional hyperparameter tuning (using Optuna for optimization, Table S6†), the XGBoost model with SOAP consistently outperforms the one using CM. This confirms that capturing structural repeatability is crucial for materials feature engineering. Nonetheless, SOAP has limitations due to its high storage requirements and longer computational times, highlighting the need for future advancements in feature engineering methods.
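To illustrate why CM is the simpler of the two descriptors, the sketch below builds a Coulomb matrix from atomic numbers and Cartesian positions using its standard molecular definition (diagonal 0.5·Z^2.4, off-diagonal ZiZj/|Ri − Rj|). It is not the feature pipeline used in this work: the coordinates are hypothetical, and the original definition uses atomic units for distances whereas arbitrary length units are used here.

```python
import math


def coulomb_matrix(atomic_numbers, positions):
    """Standard (non-periodic) Coulomb matrix for a set of atoms."""
    n = len(atomic_numbers)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal: fitted self-interaction term of each atom
                matrix[i][j] = 0.5 * atomic_numbers[i] ** 2.4
            else:
                # Off-diagonal: Coulomb repulsion between nuclei i and j
                r_ij = math.dist(positions[i], positions[j])
                matrix[i][j] = atomic_numbers[i] * atomic_numbers[j] / r_ij
    return matrix


# Hypothetical Cu-Ni pair (Z = 29, 28) separated by 2.5 length units.
cm = coulomb_matrix([29, 28], [(0.0, 0.0, 0.0), (0.0, 0.0, 2.5)])
print(cm[0][1])
```

This plain form sees only the atoms listed and ignores periodic images of the unit cell, which is consistent with the observation above that a descriptor capturing structural repeatability performs better for these 2D crystalline substrates.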
Fig. 6 The innovative prediction framework, module performance, and PySR-generated formula; (a) overview of the innovative prediction framework. (b) Comparison of accuracy between CM and SOAP feature construction methods. (c) t-SNE visualization of SOAP features for the four substrates studied, with different metal pairs. (d) Prediction results for each metric (ΔEbind, Udiss, ΔGH*, UL and ηOER), with the blue-grey shaded area indicating satisfaction of the screening criteria. (e) Prediction performance of each ML model. (f) Selected examples of 5 N-C3N4 DACs with the comparison between DFT and ML prediction. Blue shaded area indicates ideal ηOER or UL, while red shaded area denotes high ΔGH* activity. (g) Summary of PySR-generated formula and the corresponding key intermediates' adsorption energies. This figure presents simplified versions of the formulas, with more detailed and high-precision fitting results provided in Table S10 and Fig. S14.† |
In our framework, we employed an active learning strategy that iteratively selects the most uncertain examples (uncertainty sampling) in each learning loop. This approach, effective even when data are generated via generative diffusion models (Fig. S15†), continuously refines model predictions by focusing on uncertain cases. To ensure the reliability of the generated data, we performed a comprehensive analysis of the samples produced by the diffusion model (Table S7 and Fig. S15†). The analysis involved three key metrics: Maximum Mean Discrepancy (MMD), Average Cosine Similarity (ACS), and Nearest-Neighbor Consistency (NNC), along with t-SNE visualization of the generated data. These results confirm that the generated data align closely with the real data, thereby ensuring their suitability for model training and validation. In large-scale DFT predictions, active learning significantly reduces computational costs and broadens the model's applicability. Despite its reliance on uncertainty estimation, its iterative nature provides a strong theoretical and practical foundation for automated materials exploration. Using this strategy, we achieved high-precision predictions for DAC properties (Fig. 6d, e and Table S8†) sequentially: first assessing DAC stability (ΔEbind and Udiss) using multi-objective predictions, then evaluating HER activity (ΔGH*) and the CO2RR limiting potential (UL) with single-objective predictions to rapidly screen promising CO2RR electrocatalytic DACs. Furthermore, by integrating these predictions with ranking recommendations (Table S9†), we quickly identified the most stable catalysts, particularly those with potential dual functionality for overall water splitting (the purple dots located in the blue shaded area in Fig. 6d). Additionally, we selected five DACs with the N-C3N4 substrate, which were shown to be stable by AIMD analysis (Fig. S16†), and compared DFT-calculated and ML-predicted values (Fig. 6f).
The results show minimal discrepancies between the calculated and predicted values. Notably, CuNi emerges as the most promising bifunctional water splitting catalyst, while CuPd and PtZn metal pairs excel only in OER performance. Among them, PtZn demonstrates an ηOER close to 0.15 eV, significantly outperforming other materials in OER. Both PtMn and VTi exhibit reasonable UL values, but VTi stands out with strong CO2RR activity and HER resistance, with its CO2RR limiting potential approaching −0.15 V, near the peak of the theoretical volcano plot, while PtMn significantly underperforms in CO2RR owing to its low HER resistance.
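The uncertainty-sampling step of the active learning loop described above can be sketched as follows. How uncertainty is estimated in this work is not stated in this section; the example assumes committee disagreement (the spread of predictions across several models), and the committee members, the candidate pool, and the function name are hypothetical stand-ins.

```python
import statistics


def select_most_uncertain(pool, committee, k):
    """Return indices of the k pool entries on which the committee disagrees most.

    pool:      candidate inputs that have not been labelled by DFT yet
    committee: list of callables, each mapping an input to a prediction
    """
    scored = []
    for index, x in enumerate(pool):
        predictions = [model(x) for model in committee]
        scored.append((statistics.pstdev(predictions), index))
    # Largest spread first; ties broken by original order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [index for _, index in scored[:k]]


# Toy committee: three stand-in "models" whose predictions diverge as |x| grows.
committee = [lambda x: x, lambda x: 2.0 * x, lambda x: 0.5 * x]
pool = [0.1, 5.0, 1.0, 3.0]
picked = select_most_uncertain(pool, committee, k=2)
print(picked)
```

In a full loop, the selected candidates would be labelled by DFT, added to the training set, and the models retrained before the next selection round.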
However, we still encountered a challenge: while high-dimensional descriptors such as SOAP and CM enhance accuracy by retaining more information, their high dimensionality compromises interpretability. To address this, we incorporated High-Performance Symbolic Regression (PySR) into our framework. By integrating prior knowledge from SHAP analysis and chemical logic, we provided PySR with the descriptors most likely to influence the adsorption of key intermediates, achieving high-precision fits (Table S10† and Fig. 6g). The interpretability offered by PySR yields valuable insights into adsorption patterns; furthermore, the success of models built on atomic features in the immediate vicinity of the adsorbate suggests that the adsorption strength in DACs is primarily determined by the local bimetallic environment. This makes the selection of such features a more efficient approach for ML model construction, with minimal loss in predictive accuracy. In particular, we emphasize that the “simple” versions of the PySR models rely solely on elemental properties such as electronegativity, electron affinity, and the number of valence d-electrons, which can be obtained directly from publicly available databases (e.g., the NIST database) without requiring prior DFT calculations. This enables rapid, interpretable estimation of catalytic behavior without incurring the computational cost of quantum mechanical simulations. Consequently, these compact symbolic expressions can serve as efficient surrogates for predicting adsorption energetics of key intermediates, allowing researchers to pre-screen candidate DACs at negligible cost. This balance between transparency and computational simplicity makes the simple models particularly suitable for early-stage catalyst discovery.
Importantly, our ML framework significantly expedites the identification of potential CO2RR and water-splitting DACs. As shown in Fig. S17,† our strategy is approximately 3750 times faster than the corresponding DFT calculations, underscoring its efficiency. Note that the DFT computational time referenced pertains only to the DACs in our dataset; expanding DFT analysis to cover the entire chemical space would lead to an exponential increase in computational time and cost, rendering it impractical for real-world applications. We therefore anticipate that ML will increasingly supplant less efficient DFT methods and become more prevalent in the discovery and development of novel materials systems.
Moreover, in the latter part of our study, we established an ML framework that not only delivers precise catalytic predictions consistent with theoretical results but also substantially reduces computational costs, underscoring ML's transformative potential in catalyst development. Ultimately, the methodologies and insights from this study chart a clear course for using ML to advance the design of atomic catalysts for energy conversion and storage, a key step toward a sustainable energy paradigm.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5ta03021h |
This journal is © The Royal Society of Chemistry 2025 |