Mutual information is one of the most popular criteria in feature selection, and many techniques have been proposed to estimate it. The large majority of them rely on probability density estimation and perform poorly on high-dimensional data because of the curse of dimensionality. However, being able to robustly evaluate the mutual information between a subset of features and an output vector is of great interest in feature selection, particularly when some features are only jointly redundant or relevant. In this paper, several mutual information estimators are compared according to criteria that are important for feature selection, and the value of a nearest-neighbors-based estimator is shown.
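To make the notion of jointly relevant features concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not name the nearest-neighbors estimator it evaluates; the sketch assumes the Kraskov-Stögbauer-Grassberger (KSG) estimator, one well-known estimator of this kind. The toy data, the function names `digamma_int` and `ksg_mutual_information`, and the parameter values (`k=5`, 400 samples, noise level 0.3) are all illustrative assumptions. Estimates are in nats.

```python
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def ksg_mutual_information(xs, ys, k=5):
    """KSG (algorithm 1) estimate of I(X; Y) in nats.

    xs: list of feature tuples (one tuple per sample, any dimension).
    ys: list of scalar outputs.
    Distances use the max norm, as in the KSG construction.
    """
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Max-norm distances from sample i in the feature space and the output space.
        dx = [max(abs(a - b) for a, b in zip(xs[i], xs[j])) for j in range(n)]
        dy = [abs(ys[i] - ys[j]) for j in range(n)]
        # Distance to the k-th nearest neighbor in the joint (X, Y) space.
        joint = sorted(max(dx[j], dy[j]) for j in range(n) if j != i)
        eps = joint[k - 1]
        # Count neighbors strictly closer than eps in each marginal space.
        n_x = sum(1 for j in range(n) if j != i and dx[j] < eps)
        n_y = sum(1 for j in range(n) if j != i and dy[j] < eps)
        total += digamma_int(n_x + 1) + digamma_int(n_y + 1)
    return digamma_int(k) + digamma_int(n) - total / n


# Hypothetical XOR-like toy problem: y depends on the sign of x1 * x2,
# so neither feature alone carries information about y.
rng = random.Random(0)
n_samples = 400
x1 = [rng.uniform(-1, 1) for _ in range(n_samples)]
x2 = [rng.uniform(-1, 1) for _ in range(n_samples)]
y = [(1.0 if a * b > 0 else -1.0) + rng.gauss(0, 0.3) for a, b in zip(x1, x2)]

mi_x1 = ksg_mutual_information([(a,) for a in x1], y)
mi_x2 = ksg_mutual_information([(b,) for b in x2], y)
mi_joint = ksg_mutual_information(list(zip(x1, x2)), y)

print(f"I(x1; y)     ~ {mi_x1:.3f} nats")
print(f"I(x2; y)     ~ {mi_x2:.3f} nats")
print(f"I(x1, x2; y) ~ {mi_joint:.3f} nats")
```

On data of this kind, the single-feature estimates should stay close to zero while the joint estimate is clearly positive, which is the situation a univariate criterion would miss. The estimator works directly on neighbor counts and never builds an explicit density estimate.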
Doquire, G., & Verleysen, M. (2013). A Performance Evaluation of Mutual Information Estimators for Multivariate Feature Selection. In P. L. Carmona et al. (Eds.), Pattern Recognition - Applications and Methods (pp. 51-63). Springer-Verlag. https://doi.org/10.1007/978-3-642-36530-0_5