Machine learning on multiple topological materials datasets

He, Yuqing;De Breuck, Pierre-Paul;Weng, Hongming;Giantomassi, Matteo;Rignanese, Gian-Marco
npj Computational Materials, Vol. 11, no. 1, p. 181 (2025)

Files

he2025.pdf
  • Open Access
  • Adobe PDF
  • 1.88 MB

Details

Authors
  • He, Yuqing (UCLouvain)
    Author
  • De Breuck, Pierre-Paul (UCLouvain)
    Author
  • Weng, Hongming (Beijing National Laboratory for Condensed Matter Physics and Institute of Physics, Chinese Academy of Sciences, Beijing, China)
    Author
Abstract
A dataset of 35,608 materials with their topological properties is constructed by combining the density functional theory (DFT) results of Materiae and the Topological Materials Database. Thanks to this, machine-learning approaches are developed to categorize materials into five distinct topological types, with the XGBoost model achieving an impressive 85.2% classification accuracy. By conducting generalization tests on different sub-datasets, differences are identified between the original datasets in terms of topological types, chemical elements, unknown magnetic compounds, and feature space coverage. Their impact on model performance is analyzed. Turning to the simpler binary classification between trivial insulators and nontrivial topological materials, three different approaches are also tested. Key characteristics influencing material topology are identified, with the maximum packing efficiency and the fraction of p valence electrons being highlighted as critical features.
Citations

He, Y., De Breuck, P.-P., Weng, H., Giantomassi, M., & Rignanese, G.-M. (2025). Machine learning on multiple topological materials datasets. npj Computational Materials, 11(1), 181. https://doi.org/10.1038/s41524-025-01687-2 (Original work published 2025)