Parameter-free feature selection with mutual information

Verleysen, Michel; François, Damien

(2008) First workshop of the ERCIM Working Group on Computing and Statistics. Location: Neuchâtel, Switzerland (19 June 2008)

Details

Authors

Verleysen, Michel (UCLouvain)
François, Damien (UCLouvain)

Abstract

Machine learning on high-dimensional data faces the curse of dimensionality, a set of phenomena that limit the performance of the tools. Many limitations come directly from the representation of the data, not from the analysis tool. It is therefore necessary to reduce the dimensionality of the data. There are basically two ways to do this: either select features among the original variables, or project the original variables onto new ones. Although more general and thus more powerful in theory, projection induces a loss of interpretability. By contrast, selecting original features makes it possible to come back to the application and interpret which factors are relevant for the analysis; this is an important advantage in many applications. This paper shows how to use Mutual Information (MI) for feature selection. In practice, the MI criterion has to be estimated, and the search over possible feature subsets has to be restricted to keep computation time reasonable. It is shown how to use resampling and permutation tests to select optimal parameters for the estimator and to stop the search procedure in a sound way. It is also shown how to design an estimator of feature subset relevance inspired by the mutual information criterion, with the additional advantage of restricting the estimation to a two-dimensional problem.
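
The general idea of MI-based forward selection with a permutation-based stopping rule can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes discrete features, a simple plug-in (count-based) MI estimate, and a stopping rule that compares the MI of the augmented subset against the 95th percentile obtained when the candidate feature is shuffled. The function names (`mutual_information`, `null_threshold`, `forward_select`), the toy data, and the parameters `n_perm`, `quantile`, and `tol` are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in MI estimate (in nats) between two discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def null_threshold(selected_cols, candidate, target, rng,
                   n_perm=100, quantile=0.95):
    """MI quantile reached when the candidate feature is randomly permuted."""
    shuffled = list(candidate)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between candidate and target
        subset = [s + (v,) for s, v in zip(selected_cols, shuffled)]
        null.append(mutual_information(subset, target))
    null.sort()
    return null[min(n_perm - 1, int(quantile * n_perm))]


def forward_select(features, target, n_perm=100, seed=0, tol=1e-9):
    """Greedy forward search; stops when the best candidate is not significant."""
    rng = random.Random(seed)
    selected = []
    joint = [()] * len(target)  # joint symbol of the features selected so far
    remaining = sorted(features)
    while remaining:
        # Score each candidate by the MI of (selected + candidate) with the target.
        scores = {name: mutual_information(
                      [s + (v,) for s, v in zip(joint, features[name])], target)
                  for name in remaining}
        best = max(remaining, key=scores.get)
        threshold = null_threshold(joint, features[best], target, rng, n_perm)
        if scores[best] <= threshold + tol:  # tol guards against rounding ties
            break
        selected.append(best)
        joint = [s + (v,) for s, v in zip(joint, features[best])]
        remaining.remove(best)
    return selected


# Toy data (assumed for illustration): the target depends on a and b only.
n = 400
features = {
    "a": [i % 2 for i in range(n)],
    "b": [(i // 2) % 2 for i in range(n)],
    "c": [(i // 4) % 2 for i in range(n)],  # irrelevant feature
}
target = [a + 2 * b for a, b in zip(features["a"], features["b"])]
selected = forward_select(features, target)
```

On this toy data the search keeps the two features that determine the target and stops before adding the irrelevant one, without any user-set threshold on the MI value itself. With continuous features the plug-in estimate would have to be replaced by an estimator with its own parameters, which is where the resampling step mentioned in the abstract would come in.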

Affiliations

UCLouvain, FSA/ELEC - Département d'électricité

Citations

Verleysen, M., & François, D. (2008). Parameter-free feature selection with mutual information. Proceedings of the first workshop of the ERCIM Working Group on Computing and Statistics, p. 13. https://hdl.handle.net/2078.5/254139