Datatractor: Metadata, automation, and registries for extractor interoperability in the chemical and materials sciences

Evans, Matthew;Rignanese, Gian-Marco;Elbert, David;Kraus, Peter
MRS Bulletin, Vol. 50, No. 7, pp. 838-845 (2025)

Files

evans2025.pdf
  • Open Access
  • Adobe PDF
  • 1.55 MB

Details

Authors
  • Evans, Matthew (UCLouvain)
    Author
  • Rignanese, Gian-Marco
    Author
  • Elbert, David (Hopkins Extreme Materials Institute, Johns Hopkins University, Baltimore, USA)
    Author
  • Kraus, Peter (Technische Universität Berlin, Berlin, Germany)
    Author
Abstract
Two key issues hindering the transition toward findable, accessible, interoperable, and reusable (FAIR) data science are the poor discoverability of, and inconsistent instructions for using, data extractor tools (i.e., the tools that take us from raw data files created by instruments to accessible metadata and scientific insight). If the existing format conversion tools are hard to find, install, and use, their reimplementation is inevitable, leading to duplicated effort and an increased maintenance burden. The Datatractor framework presented in this article addresses these issues. First, it provides a curated registry of such extractor tools, increasing their discoverability. Second, it describes them using a standardized but lightweight schema, making their installation and use machine-actionable. Finally, we provide a reference implementation for such data extraction. The Datatractor framework can be used to provide a public-facing data extraction service, or be incorporated into other research data management tools to provide added value.
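The abstract's central idea, that an extractor described by a lightweight, structured schema becomes machine-actionable, can be illustrated with a minimal sketch. Everything named below is hypothetical: the field names (`supported_filetypes`, `pip_packages`, `command_template`), the example registry entries, and the `{input_path}`/`{output_path}` placeholder convention are assumptions made for illustration and are not the actual Datatractor schema. The sketch only shows how a structured registry entry lets a program look up extractors by file type and derive install and run commands without a human reading the documentation.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass


@dataclass
class ExtractorEntry:
    """Hypothetical, simplified registry entry; field names are illustrative only."""

    id: str
    supported_filetypes: list[str]
    pip_packages: list[str]
    # Assumed convention: placeholders are substituted with file paths at run time.
    command_template: str


# A toy in-memory registry; both entries are invented examples, not real tools.
REGISTRY = [
    ExtractorEntry(
        id="example-echem-parser",
        supported_filetypes=["example-biologic-mpr"],
        pip_packages=["example-echem-parser"],
        command_template="example-parse {input_path} --output {output_path}",
    ),
    ExtractorEntry(
        id="example-xrd-reader",
        supported_filetypes=["example-xrdml"],
        pip_packages=["example-xrd-reader"],
        command_template="example-xrd {input_path} {output_path}",
    ),
]


def find_extractors(filetype: str, registry: list[ExtractorEntry]) -> list[ExtractorEntry]:
    """Discoverability: return every entry that declares support for `filetype`."""
    return [entry for entry in registry if filetype in entry.supported_filetypes]


def install_command(entry: ExtractorEntry) -> list[str]:
    """Build a pip install argument list from the entry's declared packages."""
    return ["pip", "install", *entry.pip_packages]


def usage_command(entry: ExtractorEntry, input_path: str, output_path: str) -> list[str]:
    """Fill the command template and split it into an argument list.

    Paths are shell-quoted before substitution so that paths containing
    spaces survive the final shlex.split as single arguments.
    """
    filled = entry.command_template.format(
        input_path=shlex.quote(input_path),
        output_path=shlex.quote(output_path),
    )
    return shlex.split(filled)


if __name__ == "__main__":
    for match in find_extractors("example-biologic-mpr", REGISTRY):
        print(match.id)
        print(install_command(match))
        print(usage_command(match, "run 1.mpr", "out.json"))
```

Returning argument lists rather than shell strings is a deliberate choice in this sketch: a service built along these lines could pass them straight to `subprocess.run` without invoking a shell, which avoids quoting bugs for file names containing spaces.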

Citations

Evans, M., Rignanese, G.-M., Elbert, D., & Kraus, P. (2025). Datatractor: Metadata, automation, and registries for extractor interoperability in the chemical and materials sciences. MRS Bulletin, 50(7), 838-845. https://doi.org/10.1557/s43577-025-00925-8