Forget early exaggeration in t-SNE: early hierarchization preserves global structure

Lee, John; Couplet, Edouard; Lambert, Pierre; Journaux, Ludovic; Verleysen, Michel; et al.
(2024) European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (9 October 2024)

Files

Lee2024-146.pdf
  • Open Access
  • Adobe PDF
  • 3.63 MB

Details

Authors
  • Lee, John (UCLouvain)
  • Couplet, Edouard
  • Lambert, Pierre (UCLouvain)
  • Journaux, Ludovic (Laboratoire d’Informatique de Bourgogne)
  • Mulders, Dounia (UCLouvain)
  • de Bodt, Cyril (UCLouvain)
  • Verleysen, Michel
Abstract
As a local method of dimensionality reduction, t-SNE requires careful initialization to preserve the global structure of the data as well as possible. In regular t-SNE, the low-dimensional embedding is initialized either randomly or with PCA; gradient descent then refines the embedding coordinates in two phases. In the first phase, called early exaggeration, attractive forces between points are artificially strengthened to delay any detrimental effect of repulsive forces while the points are still poorly organized. In this paper, a novel initialization of t-SNE is proposed. It hierarchizes the data points into a space-partitioning binary tree and runs t-SNE successively with 4, 8, 16, ..., N points, where N is the size of the data set. Between two runs, the prototypical point in each tree branch is split into its two child prototypes, with a small amount of random noise, and the embedding is rescaled to account for the increased population. Experimental results show the effectiveness of the method. The proposed method is compatible with any neighbor embedding method (t-SNE, UMAP, etc.) provided that early exaggeration can be disabled and initial coordinates can be supplied.
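The coarse-to-fine initialization described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the median split along the highest-variance axis, prototypes taken as branch means, noise proportional to the current embedding spread, and the `count ** (1/dim)` rescaling rule are all assumptions. `hierarchical_init`, `build_tree`, `leaves` and `refine` are hypothetical names; `refine` is the hook where a t-SNE or UMAP run with early exaggeration disabled would go, and the default `identity_refine` is only a placeholder.

```python
import numpy as np


def build_tree(X, idx=None):
    """Space-partitioning binary tree (assumed k-d-tree style rule): split each
    node at the median of its highest-variance axis.
    A leaf is an int data index; an inner node is a (left, right) tuple."""
    if idx is None:
        idx = np.arange(len(X))
    if len(idx) == 1:
        return int(idx[0])
    axis = int(np.argmax(X[idx].var(axis=0)))
    order = idx[np.argsort(X[idx, axis], kind="stable")]
    half = len(order) // 2
    return (build_tree(X, order[:half]), build_tree(X, order[half:]))


def leaves(node):
    """All data indices stored below a tree node."""
    if isinstance(node, int):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def identity_refine(P, Y):
    """Hypothetical placeholder for one neighbor-embedding run (t-SNE, UMAP, ...)
    on prototypes P, started from coordinates Y, with early exaggeration off."""
    return Y


def hierarchical_init(X, refine=identity_refine, dim=2, noise=0.05, seed=0):
    rng = np.random.default_rng(seed)
    frontier = [build_tree(X)]   # current prototypes, as tree nodes
    Y = np.zeros((1, dim))       # low-dimensional coordinates of the prototypes
    while any(not isinstance(node, int) for node in frontier):
        n_new = sum(1 if isinstance(node, int) else 2 for node in frontier)
        # Assumed rescaling rule: the area (volume if dim > 2) occupied by the
        # embedding grows in proportion to the number of prototypes.
        center = Y.mean(axis=0)
        Y = center + (Y - center) * (n_new / len(frontier)) ** (1.0 / dim)
        spread = np.sqrt(((Y - center) ** 2).sum(axis=1).mean())
        sigma = noise * (spread if spread > 0 else 1.0)
        new_frontier, new_Y = [], []
        for node, y in zip(frontier, Y):
            if isinstance(node, int):
                # A leaf (single data point) cannot split; it keeps its position.
                new_frontier.append(node)
                new_Y.append(y)
            else:
                # Both children start at the parent's position plus small noise.
                for child in node:
                    new_frontier.append(child)
                    new_Y.append(y + sigma * rng.standard_normal(dim))
        frontier, Y = new_frontier, np.array(new_Y)
        if len(frontier) >= 4:
            # Assumed prototype: mean of the data points in the branch.
            P = np.array([X[leaves(node)].mean(axis=0) for node in frontier])
            Y = refine(P, Y)
    out = np.empty((len(X), dim))
    out[frontier] = Y   # all nodes are leaves now, i.e. data indices
    return out
```

With N a power of two, `refine` is called on 4, 8, 16, ..., N prototypes, mirroring the schedule in the abstract; otherwise the last levels mix split nodes and leaves that carry over unchanged. As one possible hook, `refine` could return scikit-learn's `TSNE(n_components=2, init=Y, early_exaggeration=1.0, perplexity=min(30, len(P) - 1)).fit_transform(P)`. Setting `early_exaggeration=1.0` leaves the attractive forces unexaggerated, and `perplexity` must stay below the number of prototypes.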

Citations

Lee, J., Couplet, E., Lambert, P., Journaux, L., Mulders, D., de Bodt, C., & Verleysen, M. (2024). Forget early exaggeration in t-SNE: early hierarchization preserves global structure. In ESANN 2024 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. https://hdl.handle.net/2078.5/233864