Yifan Ke,a Xinming Qin,*a Wei Hu*a and Jinlong Yang*b
aHefei National Research Center for Physical Sciences at the Microscale, University of Science and Technology of China, Hefei, Anhui 230026, China. E-mail: xmqin03@ustc.edu.cn; whuustc@ustc.edu.cn
bKey Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, China. E-mail: jlyang@ustc.edu.cn
First published on 11th August 2025
Density functional theory (DFT) calculations with hybrid functionals have traditionally been limited to small systems containing hundreds of atoms owing to their substantial computational cost. In this work, we introduce an interface between DeepH, a machine learning-based Hamiltonian approach, and HONPAS, a DFT software package. By leveraging DeepH's ability to bypass self-consistent field (SCF) iterations, DFT calculations in HONPAS become significantly more efficient, including computationally intensive hybrid functional calculations. This combined approach is particularly advantageous for twisted van der Waals systems, as demonstrated through examples of twisted bilayer graphene and twisted bilayer MoS2. The substantial reduction in computation time for the HSE06 functional suggests that our method effectively addresses the efficiency-accuracy trade-off in DFT calculations, making high-accuracy calculations feasible for large systems containing more than ten thousand atoms.
Neural network models are increasingly being integrated into materials science research to enhance computational efficiency and predictive accuracy.12–18 To model and predict DFT results effectively, previous training targets have primarily focused on charge densities or wave functions. These studies predominantly advocate surrogate models in which atomic structures serve as inputs and electron densities as outputs.19,20 For instance, DeePHF efficiently predicts Hartree–Fock orbitals and density matrices,21 while DeePKS explores generalized functionals applicable to diverse chemical systems.22 Deep Density employs the deep potential neural network framework to map atomic structures to electron densities.23,24 Chemception applies deep convolutional neural networks directly to 2D molecular images, achieving performance competitive with expert-designed QSAR/QSPR models without relying on explicit chemical descriptors or domain knowledge.25 Machine learning can also be integrated into materials modelling by training tight-binding Hamiltonians extracted from ab initio data.26 For instance, DeePTB employs deep neural networks to generate transferable tight-binding models across diverse atomic configurations, enabling efficient and accurate band structure predictions for a wide range of materials.27 Additionally, a kernel-based machine learning approach has been developed to predict DFT Hamiltonians directly from atomic environments, offering an accurate, transferable, and scalable alternative to traditional semi-empirical approximations for electronic structure calculations.28 Meanwhile, DeePMD leverages neural networks to fit interatomic potential energy surfaces, significantly accelerating molecular dynamics simulations with accuracy comparable to DFT.23,29 For large-scale materials such as twisted van der Waals interfaces, the Deep Hamiltonian (DeepH) framework30 and its specialized variants, including DeepH-E3,31 DeepH-Hybrid,32 xDeepH,33 and DeepH-PW,34 represent a significant advancement in Hamiltonian prediction for complex systems. DeepH exploits the nearsightedness of pseudo-atomic orbital (PAO) bases,35 transforming what would typically be a global optimization problem into a series of localized calculations that align well with the locality of neural network layers. DeepH applies graph neural networks (GNNs) to represent materials, a technique that has been used effectively in AI-driven materials studies.13 Similarly, HamGNN also utilizes GNNs to efficiently predict Hamiltonians.36 Compared with wavefunctions and charge densities, the Hamiltonian provides a more comprehensive representation of the system, and DeepH-E3 leverages equivariant neural networks to learn local environments and predict the Hamiltonian with smaller model sizes. It also enables density functional perturbation theory calculations37 and contributes to the development of a universal model applicable to all materials.38 The integration of AI with DFT further expands the scope of material design, particularly by advancing molecular dynamics simulations.23,29,39,40 This progress represents a significant milestone in computational materials science, fostering new insights and broadening applications in the study of complex material systems.
Among the various DFT software packages, HONPAS (Hefei Order-N Packages for Ab Initio Simulations) stands out as a robust tool, especially notable for implementing the HSE06 hybrid functional.41 The HSE06 functional, developed by Jochen Heyd, Gustavo E. Scuseria, and Matthias Ernzerhof, mixes a fraction of screened Hartree–Fock exchange with semi-local DFT exchange and correlation, making it effective for accurately predicting electronic structures, including in systems with challenging van der Waals interactions.42 HONPAS has been successfully applied to materials simulations involving systems with more than 10 000 atoms, demonstrating its robustness and scalability for large-scale electronic structure calculations.43–47 However, despite optimizations such as NAO2GTO for efficient computation of the Hartree–Fock exchange (HFX) matrix,48,49 hybrid functional calculations remain computationally demanding and require extensive parallelization, typically limiting practical applications to systems with fewer than a hundred atoms.
Even when DeepH is generalized to plane-wave DFT, a reconstruction from plane-wave Hamiltonians to atomic-orbital Hamiltonians remains necessary in DeepH-PW.34 Owing to the locality of the DeepH scheme, it currently depends on atomic orbital-based DFT packages, such as SIESTA,50 HONPAS,49 FHI-aims,51 OpenMX,52,53 and ABACUS,54 among others. At present, DeepH supports only OpenMX and ABACUS; in this work, we therefore extend its compatibility to HONPAS. Like HONPAS, OpenMX utilizes low-scaling methods and achieves excellent parallel performance, although its most accurate functional support is limited to the GGA level. In contrast, HONPAS supports the HSE06 functional with a parallel implementation. Compared to ABACUS, which utilizes the LCAO technique based on atomic orbitals,54,55 HONPAS adopts the NAO2GTO approach and the ISDF method for HFX calculations.48,49,56 These methods significantly improve the efficiency of handling two-electron integrals and hybrid functional calculations, thereby streamlining the dataset preparation process. Built upon SIESTA, HONPAS shares a similar input/output structure and workflow, which facilitates adoption by the large existing SIESTA user community. In DFT calculations, common basis sets include plane waves and atomic orbitals, with the latter well suited to large-scale parallel computations.51,57,58 For typical materials, double-zeta (DZ) and double-zeta polarized (DZP) basis sets are often used.
Given the well-established hybrid functional capabilities of HONPAS, the DeepH + HONPAS combination is well-suited for machine learning-assisted material calculations, particularly for systems requiring the accuracy of hybrid functionals. Our work leverages the strengths of both HONPAS and DeepH by creating an interface that facilitates hybrid functional calculations.
However, utilizing DeepH effectively requires a solid understanding of material structure and the mechanisms of DFT, as setting its input parameters demands physical insight. To address this, we first introduce the DeepH + HONPAS architecture. We then present our code implementation and performance testing. Finally, we apply this code to predict electronic properties in twisted van der Waals bilayer systems, specifically comparing twisted bilayer graphene and twisted bilayer MoS2. Notably, the hybrid HSE06 functional produces a larger band gap than the PBE functional in gapped MoS2 and at the Γ point in graphene systems.
The SCF iterations, however, constitute the most time-consuming component of DFT calculations. To bypass this bottleneck, DeepH leverages machine learning techniques to learn the Hamiltonian features of a wide range of materials, enabling it to predict the Hamiltonian for new materials with similar chemical composition. The validity of this approach is grounded in the one-to-one mapping between the material structure and the DFT Hamiltonian, provided that the predicted Hamiltonian is consistent with the one obtained from SCF iterations.
In their work on DeepH, He Li et al. emphasized the importance of directly training the Hamiltonian, not merely because it underlies ab initio DFT, but because this approach bypasses intermediate quantities such as total energies or forces and allows for a more physically interpretable and transferable model, especially in electronic structure prediction tasks.30 To facilitate learning the Hamiltonian, DeepH introduces a cutoff radius, Rc, in the Hamiltonian matrix, defining a local environment that typically comprises only a few atoms. This locality, or nearsightedness, is ensured by the destructive interference between many-particle eigenstates.59,60 Leveraging this locality, the full Hamiltonian can be related to smaller local blocks through transformations under which the matrix elements transform covariantly. Consequently, the information within these local environments can effectively capture the behavior of the entire Hamiltonian once transformed back.
The local environment Hamiltonian matrix, denoted as H′, can be trained independently and then transformed back into the full Hamiltonian matrix H through a rotation transformation:
$$H_{ij} = D^{\dagger}(\mathcal{R})\, H'_{ij}\, D(\mathcal{R}) \qquad (1)$$
where Hij denotes the block of matrix elements between the orbitals of atoms i and j, and D(𝓡) is the block-diagonal orbital rotation (Wigner D) matrix representing the rotation 𝓡 that relates the global and local coordinate frames.
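As an illustration of eq. (1), the minimal NumPy sketch below rotates a single Hamiltonian block back from a local frame, assuming the block contains only p orbitals ordered as (px, py, pz), for which the orbital rotation matrix coincides with the ordinary 3 × 3 rotation matrix; the actual DeepH implementation handles general angular momenta with Wigner D matrices.

```python
import numpy as np

def rotate_back(H_local, R):
    """Rotate a p-orbital Hamiltonian block from local to global coordinates.

    H_local : (3, 3) block expressed in the local coordinate frame
    R       : (3, 3) rotation matrix carrying global-frame vectors into the
              local frame
    For l = 1 orbitals ordered (px, py, pz), the orbital rotation matrix D(R)
    is R itself; other l values would require the corresponding Wigner D
    matrices (this simplification is an assumption for illustration only).
    """
    D = R
    return D.T @ H_local @ D

# toy usage: undo a 30-degree rotation about the z axis
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
H_local = np.diag([0.1, 0.2, 0.3])   # made-up local-frame block (eV)
H_global = rotate_back(H_local, R)
```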
In HONPAS, numerical atomic orbitals (NAOs) are fitted with Gaussian-type orbitals (GTOs) to handle the non-local exchange of hybrid functionals. GTOs are particularly useful since their smooth analytic tails extend over long-range distances and allow the required integrals to be evaluated accurately and efficiently. The approximation is given by:
$$\phi^{\mathrm{NAO}}_{lm}(\mathbf{r}) \approx \sum_{k} c_{k}\, r^{l}\, e^{-\alpha_{k} r^{2}}\, Y_{lm}(\hat{\mathbf{r}}) \qquad (2)$$
where the contraction coefficients ck and exponents αk are obtained by fitting each NAO radial function.
This approach is highly efficient for evaluating two-electron integrals:
$$(\mu\nu|\lambda\sigma) = \iint \frac{\phi_{\mu}(\mathbf{r}_{1})\,\phi_{\nu}(\mathbf{r}_{1})\,\phi_{\lambda}(\mathbf{r}_{2})\,\phi_{\sigma}(\mathbf{r}_{2})}{\left|\mathbf{r}_{1}-\mathbf{r}_{2}\right|}\,\mathrm{d}\mathbf{r}_{1}\,\mathrm{d}\mathbf{r}_{2} \qquad (3)$$
which reduce to analytic expressions once the orbitals are expanded in GTOs.
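The fitting step behind eq. (2) can be illustrated with a short least-squares sketch; the radial function, grid, and even-tempered exponents below are made up for demonstration and are not the GTO parameters actually used by HONPAS.

```python
import numpy as np

# Sketch of the NAO2GTO idea: fit a numerical radial orbital to a linear
# combination of Gaussians (coefficients c_k for fixed exponents alpha_k).
r = np.linspace(1e-3, 6.0, 400)                 # radial grid (bohr)
nao = np.exp(-1.2 * r) * (1.0 - r / 5.0)        # toy strictly localized NAO
nao[r > 5.0] = 0.0                              # hard cutoff at 5 bohr

alphas = 0.15 * 2.0 ** np.arange(6)             # even-tempered exponents (assumed)
G = np.exp(-np.outer(r ** 2, alphas))           # Gaussian basis sampled on the grid

coeffs, *_ = np.linalg.lstsq(G, nao, rcond=None)  # linear least-squares fit
fit = G @ coeffs
rms = np.sqrt(np.mean((fit - nao) ** 2))
print(f"RMS fitting error: {rms:.2e}")
```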
The basis functions remain fixed throughout the DeepH procedure. For the prediction of large-scale structures, necessary information, such as atomic positions, basis sets, and bonds, is represented by the overlap matrix Sij = 〈ϕi|ϕj〉, which encapsulates non-SCF material information. This overlap matrix serves as the initial feature in the input layer, which is then mapped to the Hamiltonian matrix through the trained neural network model:
$$\{S_{ij}\} \;\xrightarrow{\;f_{\theta}\;}\; \{H_{ij}\} \qquad (4)$$
where fθ denotes the trained neural network mapping.
The core architecture of DeepH's neural network is a message-passing neural network (MPNN) constructed on a crystal graph representation of the material. In this setup, DeepH transforms the material structure into a crystal graph, where each atom is represented by a vertex, vi, and each atom pair by an edge, eij. The initial vertex feature vectors are derived from elemental embeddings, while edge feature vectors are based on Gaussian-type distances. Edge features in DeepH are further represented using real spherical harmonics, Ylm: l = 0 represents distance, while higher l values incorporate relative directional information and Hamiltonian elements, or hoppings, between atoms.
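A common recipe for the Gaussian-type distance features mentioned above is to expand each interatomic distance in a fixed set of Gaussian basis functions; the sketch below shows this expansion with illustrative parameters (range, number of Gaussians, and width are assumptions, not DeepH's defaults).

```python
import numpy as np

def gaussian_distance_expansion(d, d_min=0.0, d_max=8.0, n=64, width=0.25):
    """Expand a scalar distance d into a vector of Gaussian basis functions,
    the usual initial edge feature for a crystal graph."""
    centers = np.linspace(d_min, d_max, n)
    return np.exp(-((d - centers) / width) ** 2)

edge_feature = gaussian_distance_expansion(2.46)  # e.g. a C-C distance (angstrom)
print(edge_feature.shape)                         # (64,)
```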
In the message-passing (MP) layers, both vertex and edge features are iteratively updated, with the final edge output representing the Hamiltonian matrix element, Hij. According to the original DeepH paper, the model uses five MP layers and one local coordinate message-passing (LCMP) layer. The neural network contains 471,409 + 129 × Nout parameters, where Nout is the number of selected orbital pairs.
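As a purely illustrative count (not a configuration used in this work): if a single carbon–carbon pair type is trained with the SZ basis, each atom contributes 4 orbitals, so selecting all Nout = 4 × 4 = 16 orbital pairs would give 471,409 + 129 × 16 = 473,473 trainable parameters.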
For a previously unseen structure, creating an effective dataset can be challenging. Consequently, dataset preparation demands physical intuition and a deep understanding of the material structure to make accurate predictions. For instance, in a two-dimensional bilayer system, it is appropriate to randomly shift the top layer in-plane by distances on the order of the unit cell length to simulate various stacking patterns. Alternatively, datasets can be generated from molecular dynamics trajectories. These perturbations on atomic positions incorporate factors like stacking order, interlayer folding, and bond length variations. Thus, this model is capable of predicting similar materials with different bond lengths or interlayer configurations.
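One of the perturbation strategies described above, random in-plane shifts of the top layer, can be sketched as follows; the array layout, cell, and shift magnitudes are assumptions for illustration and do not reproduce the exact recipe used to build the published datasets.

```python
import numpy as np

def shift_top_layer(positions, z_split, cell, rng):
    """Return a copy of `positions` (N, 3 Cartesian coordinates) in which all
    atoms with z > z_split (the top layer) are shifted by a random fraction of
    the in-plane lattice vectors, generating a different stacking pattern."""
    frac = rng.uniform(0.0, 1.0, size=2)              # random fractional shift
    shift = frac[0] * cell[0] + frac[1] * cell[1]     # in-plane Cartesian shift
    shifted = positions.copy()
    shifted[positions[:, 2] > z_split, :] += shift
    return shifted

rng = np.random.default_rng(0)
cell = np.array([[2.46, 0.00, 0.0],
                 [-1.23, 2.13, 0.0],
                 [0.00, 0.00, 20.0]])                 # graphene-like cell (angstrom)
positions = np.array([[0.00, 0.00, 5.00], [1.23, 0.71, 5.00],
                      [0.00, 0.00, 8.35], [1.23, 0.71, 8.35]])
structures = [shift_top_layer(positions, 6.0, cell, rng) for _ in range(5)]
```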
The data structure used in DeepH primarily consists of HDF5 (.h5) files. To convert DFT data into a format suitable for neural network input, a preprocessing step is required. This can be executed using the deeph-preprocess tool after configuring the preprocess.ini file. For datasets generated from HONPAS, four essential files are required for each structure: .STRUCT_OUT, .HSX, .ORB_INDX, and .XV. Additionally, an assertion line is added in DeepH to ensure that the DFT output files are located in the correct directory. It is recommended to name the output file as output when running HONPAS, which can be achieved by executing the following command:
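A typical invocation is shown below; the executable name and the number of MPI ranks are assumptions, not prescribed values, and should be adapted to the local installation.

```bash
mpirun -np 4 honpas < input.fdf > output
```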
In this line, the MPI parallel execution is enabled, and the output file name is set to “output”. Additionally, to save the .HSX file in the output directory, the flag SaveHS .true. is set in the input.fdf file. The .HSX file contains the Hamiltonian and overlap matrices, which are essential for training since the Hamiltonian matrix elements, Hij, are the targets. In the HONPAS input configuration, selecting the appropriate size of the atomic orbitals is crucial as it impacts both the accuracy and computational cost. For example, for carbon atoms, the SZ (single-zeta) basis includes 4 orbitals (one 2s and three 2p orbitals), while the DZP (double-zeta polarized) basis includes 13 orbitals (two 2s, six 2p, and five 3d orbitals). This information, including detailed global positions and symmetries, is available in the .ORB_INDX file.
For the training process, it is recommended to utilize a GPU to accelerate computations, although DeepH is also compatible with the CPU version of PyTorch. Similar to preprocessing, training is conducted using the “deeph-train” command with settings specified in the train.ini configuration file. Two key parameters for the training phase are “atom_fea_len” and “edge_fea_len”, which define the embedding for atomic features and interatomic distances, respectively. These parameters control the dimensionality of the graph neural network. Additionally, the orbital parameter plays an essential role. This parameter is structured as a dictionary that lists all the Hamiltonian elements for the desired orbital pairs. Below is an example showcasing simple SZ orbital overlaps in graphene:
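A minimal example of such an entry, written here as an illustration (the exact syntax in the published configuration files may differ), is:

```ini
; hypothetical minimal train.ini fragment; the real file lists every desired orbital pair
orbital = [{"6 6": [1, 2]}]
```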
In the dictionary configuration, the key “6 6” represents a pair of carbon atoms, while the entry “[1, 2]” specifies the Hamiltonian matrix element at position (2, 3) for these two atoms, each having 4 orbitals. For a larger basis set or when different elements are included, the dictionary will adjust in dimension according to the number of orbitals assigned to describe each atom. The DeepH package provides a script to automatically generate all orbital pairs in dictionary format.
During training, the learning rate is adapted based on the decay rate of the validation loss. Starting at an initial rate of 0.001, it is progressively reduced to 0.0004, then to 0.0002, each time the loss plateaus. For extended training sessions, DeepH includes options for “pretrained_model” and “resume” to continue from a saved state, while all training history is stored in the specified “train_dir”.
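The plateau-based schedule described above resembles PyTorch's built-in ReduceLROnPlateau scheduler; the sketch below is a stand-alone illustration and does not reuse DeepH's internal training loop (the model, factor, and patience values are assumptions, and the reported 0.001 → 0.0004 → 0.0002 sequence does not correspond to a single constant factor).

```python
import torch

model = torch.nn.Linear(8, 8)                       # stand-in for the DeepH model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.4, patience=50)  # reduce LR when loss plateaus

for epoch in range(3000):
    # ... forward pass, loss.backward(), optimizer.step() would go here ...
    val_loss = 1.0 / (epoch + 1)                    # placeholder validation loss
    scheduler.step(val_loss)                        # adapt the learning rate
```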
In the prediction phase, the “deeph-inference” code is used to apply the trained model to a new structure for inference. The “inference.ini” file contains a “task” list parameter, which includes the following steps: (1) parse overlap, (2) obtain local coordinates, (3) compute the predicted Hamiltonian, (4) rotate back, and (5) perform sparse calculations. These tasks can be executed sequentially.
Since DeepH requires the overlap matrix of the final large structure, we set the maximum number of self-consistent field (SCF) iteration steps to 1 in HONPAS, so that no self-consistent loop is carried out during the DFT run. This approach minimizes the time required to obtain the overlap matrix, which is stored in the .HSX file. To retrieve only the overlap, the following settings in the input.fdf file are necessary:
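A minimal set of flags consistent with this description is shown below; the flag names follow SIESTA/HONPAS conventions, and the exact spelling should be checked against the manual of the installed version.

```text
MaxSCFIterations   1        # a single step, so no self-consistent loop is run
SaveHS             .true.   # write the Hamiltonian and overlap to the .HSX file
ForceAuxCell       .true.   # keep the auxiliary supercell in the Gamma-only run
```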
The “ForceAuxCell” flag is activated to ensure that the diagonal elements of the overlap matrix are set to 1 during the single Gamma point calculation. The .HSX file contains both the Hamiltonian and overlap matrix, which can be parsed using the same code as in the preprocessing stage. After task (4), the predicted Hamiltonian is stored in the working directory. To verify the prediction, the band structure is calculated in task (5). In the interface, we output the .bands file in HONPAS format, which is generated after performing sparse matrix diagonalization of the band Hamiltonians. Additionally, we also output a result.mat file containing the band structure, making it easy to read using MATLAB and eliminating the need to search for post-processing scripts specific to the DFT package.
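The kind of sparse band calculation performed in task (5) can be sketched as follows, assuming the predicted real-space Hamiltonian and parsed overlap blocks have already been loaded from the .h5 output; the block dictionaries and orbital offsets are assumptions about the data layout, not DeepH's actual internal format.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import eigsh

def bands_at_k(h_blocks, s_blocks, offsets, n_orb, kpoint, n_bands=8):
    """Assemble the Bloch matrices H(k) and S(k) from real-space blocks and
    solve a few bands near E = 0.

    h_blocks / s_blocks : dict mapping (i, j, R) -> dense (n_i, n_j) block,
        with i, j atom indices and R an integer lattice translation
    offsets : offsets[i] is the first orbital index of atom i
    kpoint  : fractional k-point coordinates
    """
    Hk = np.zeros((n_orb, n_orb), dtype=complex)
    Sk = np.zeros((n_orb, n_orb), dtype=complex)
    for (i, j, R), h in h_blocks.items():
        phase = np.exp(2j * np.pi * np.dot(kpoint, R))
        ri = slice(offsets[i], offsets[i] + h.shape[0])
        rj = slice(offsets[j], offsets[j] + h.shape[1])
        Hk[ri, rj] += phase * h
        Sk[ri, rj] += phase * s_blocks[(i, j, R)]
    # generalized sparse eigenproblem H c = E S c, targeting states near E = 0
    vals = eigsh(csr_matrix(Hk), k=n_bands, M=csr_matrix(Sk),
                 sigma=0.0, which="LM", return_eigenvectors=False)
    return np.sort(vals.real)
```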
We highlight that, through the machine learning approach, the computational cost is reduced so substantially that such structures can be calculated on personal computers rather than supercomputers. In Fig. 2, we present the band structures of twisted bilayer graphene (TBG) at the three twist angles, calculated directly with HONPAS and predicted by the trained model. The bands are well aligned, demonstrating the remarkable ability of the model to predict unseen structures accurately. Fig. 2(d)–(f) display the band structures predicted using the DeepH framework, while Fig. 2(g)–(i) present results obtained with the DeepH-E3 framework. Both approaches exhibit similar performance; however, the DeepH-E3 framework features a smaller model size, leading to reduced memory consumption during training. Additionally, our implementation includes scripts for utilizing the DeepH-E3 framework.
We prepare the HONPAS dataset based on the published dataset from DeepH's work.67 To allow the results of this work to be reproduced, we also publish the HONPAS dataset.68 The graphene dataset consists of 300 4 × 4 bilayer graphene cells (Fig. 3(a)), generated from molecular dynamics simulations of AA-stacked bilayer graphene at T = 300 K. These structures are then calculated directly in HONPAS to produce their Hamiltonians, overlap matrices, and band structures for benchmarking. The k-points are sampled using the Monkhorst–Pack method with a 4 × 4 × 1 grid. The pseudo-atomic orbital (PAO) basis set is chosen to be DZ to ensure accuracy. We adopt the generalized gradient approximation (GGA) exchange–correlation functional parameterized by Perdew, Burke, and Ernzerhof (PBE). The convergence threshold for the density matrix is set to 10⁻⁵ eV.
Fig. 3 (a) A dataset consisting of 300 distinct 4 × 4 bilayer graphene (BG) cells used for training. Each structure contributes a Hamiltonian matrix to the dataset. (b) Heatmap of the converged validation loss function for interactions between carbon atoms, with a convergence threshold of 1 × 10⁻⁴ for the learning rate. (c) The moiré pattern of magic-angle twisted bilayer graphene (MATBG) at a twist angle of 1.05°, with the frame indicating its unit cell, which contains 11 908 carbon atoms.
The dimension of the overlap matrix for each pair of atoms is uniformly 8 × 8, since carbon is the only element present and the DZ basis set for carbon contains 8 orbitals. To highlight the performance advantage of GPU acceleration in our workflow, we benchmarked training times on the bilayer graphene dataset using both CPU and CUDA backends. Specifically, we compared a CPU workstation with two Hygon C86 7285 32-core processors (PyTorch 2.1.0+cpu) and a desktop equipped with an Intel i9-10850K and an NVIDIA RTX 4090 GPU (PyTorch 2.1.0+cu121). When using 32 CPU threads, the training time per epoch ranged from 280 to 300 seconds. In contrast, with GPU acceleration and 8 CPU threads, the training time was reduced to approximately 10 seconds per epoch. This >25× speedup underscores the practical necessity of CUDA support for efficiently handling large or high-resolution datasets in our framework. Typically, convergence is reached within 2000 to 3000 epochs, depending on the cutoff radius and target accuracy. The preprocessing time is typically in the range of hundreds of seconds (e.g., 262.96 s with 10 threads). Therefore, the combined time cost of preprocessing and training is practically negligible, and this process only needs to be performed once, provided the structure to be predicted shares the same element species and similar dimensions as the dataset.
Using the model trained on the bilayer graphene dataset, we study larger-scale twisted graphene systems, including magic-angle twisted bilayer graphene (MATBG) and magic-angle twisted trilayer graphene (MATTG). The twisting configuration of MATBG is (32, 31), with a twist angle of 1.05° and 11 908 carbon atoms (Fig. 3(c)), and that of MATTG (ABA stacking) is approximately (22, 21), corresponding to a twist angle of 1.54° and 8322 carbon atoms (Fig. 4(a)). Here the integer pairs (32, 31) and (22, 21) denote the commensurate rotation vectors expressed in the basis of the graphene unit cell vectors. The theoretically predicted magic angle for ABA-stacked trilayer graphene is near 1.5°,
√2 times that of its bilayer counterpart. ABA-stacked MATTG is predicted to host both flat bands and a Dirac cone, as well as superconductivity.69–75 In Fig. 3(b), the validation loss at the final step of training has a maximum amplitude of 0.1300 meV, indicating that the neural network model has converged with an acceptable error margin of 10⁻⁴ eV. The largest error, located at matrix position (3, 3), corresponds to the hopping between py orbitals of carbon atoms. Other significant entries correspond to hoppings between the in-plane orbitals, px and py, where the z-axis is defined as the out-of-plane direction perpendicular to the material. The model's ability to predict large structures, as shown in Fig. 3(c), demonstrates DeepH's excellent transferability to unseen structures with similar chemical compositions. In Fig. 3(d), we present the flat band structure of magic-angle graphene. The predicted bands near the zero-energy Fermi level are flat, consistent with the continuum model, the tight-binding (TB) model, and, qualitatively, with experimental observations.65
We also show the predicted electronic band structure of MATTG, which contains a set of flat bands and a Dirac cone (Fig. 4(b)). Although the local environment surrounding a particular atom in TTG differs from that in TBG, both can be described in terms of interlayer and intralayer hoppings. Therefore, the MATTG Hamiltonian can still be predicted by the neural network model trained on the bilayer graphene dataset.
Hybrid functional Hamiltonians are fundamentally different from LDA or GGA ones, as the non-local exact exchange modifies both the matrix elements and the optimal cutoff radius for training. We prepare the dataset by performing HSE06 calculations on 300 bilayer graphene structures using HONPAS. The cutoff radius is optimized to 5.8 bohr to achieve the best fit to the electronic band structure. The fitted bands for TBG are shown in Fig. 5. In panel (a), direct DFT calculations of 7.34° TBG using PBE and HSE06 are compared to highlight the slight difference at the Γ point: the gap at Γ obtained with HSE06 (2.52 eV) is larger than that with PBE (2.15 eV). In Fig. 5(b)–(d), band structures for 7.34°, 4.41°, and 2.88° TBG, obtained with both HONPAS and DeepH + HONPAS, are presented. These results show that the neural network is able to capture the differences between functionals, provided that the corresponding dataset is prepared properly.
Hybrid functionals play a more important role in gapped semiconductors, where the PBE functional underestimates band gaps. In Fig. 7, the HSE06 band structures of 7.34° twisted bilayer MoS2 (TBMoS2) calculated using both HONPAS and DeepH + HONPAS show excellent agreement at the valence and conduction band edges. The deviations away from the band edges arise from the finite cutoff radius in the crystal graph neural network, which must balance efficiency against accuracy owing to the inherent nearsightedness assumed in the training. In addition, the PBE band gap (1.18 eV) in Fig. 6(d) is smaller than the HSE06 value (1.69 eV) in Fig. 7.
Fig. 7 HSE06 band structures of TBMoS2. The band fitting parameters are provided in the SI. The predicted HSE06 band gap is 0.51 eV larger than the PBE results.
In Fig. 9, we present a comparison of single-core CPU time between DeepH + HONPAS and HONPAS. This time cost includes the first three substeps of the inference process. DeepH + HONPAS shows a remarkable improvement in efficiency, accelerating the acquisition of the Hamiltonian matrix by nearly 100 times for PBE and by roughly 500 times for HSE06. This efficiency also enables the study of larger systems, where machine learning offers even greater advantages over traditional DFT calculations.
The SI includes model training validation errors, statistical analyses, band fitting parameters, and the effects of cutoff radius selection. See DOI: https://doi.org/10.1039/d5dd00128e.