The large-scale dissemination of information in digital environments has increased the volume and heterogeneity of content, complicating the assessment of information quality. We present a lightweight, interpretable, and non-invasive framework for assessing information quality based solely on diffusion features grounded in theoretical information quality dimensions, validated here through academic citation dynamics as a structured empirical proxy. Using a heterogeneous dataset of 29,264 science, technology, engineering, and mathematics (STEM) and social science papers from ArnetMiner and OpenAlex, we represent the diffusion network of each paper by three theoretically motivated features: diversity, timeliness, and salience. A Generalized Additive Model (GAM) trained on these features achieved a Pearson correlation of 0.834 for next-year citation gain and up to 95.62% accuracy in predicting high-impact papers. Feature relevance studies reveal timeliness and salience as the most robust predictors, while diversity offers less stable benefits in the academic setting but may be more informative in social media contexts. The framework’s transparency, domain-generalizable design, and minimal feature requirements position it as a scalable tool for information quality assessment in observable diffusion networks. Although the framework is demonstrated here on academic citation dynamics, the feature definitions are directly adaptable to other diffusion settings; empirical validation in non-academic domains remains an important avenue for future work.
Lopes Temporao, A., Temporão, M., Vande Kerckhove, C., & Abreu Araujo, F. (2026). Towards a general diffusion-based information quality assessment model. npj Artificial Intelligence. Accepted/in-press. https://doi.org/10.1038/s44387-026-00119-w (Original work published 2026)