An Extensible Massively Multilingual Lexical Simplification Pipeline Dataset using the MultiLS Framework

Shardlow, Matthew; Alva-Manchego, Fernando; Batista-Navarro, Riza; Bott, Stefan; Calderon Ramirez, Saul; Cardon, Rémi; François, Thomas; Hayakawa, Akio; Horbach, Andrea; Hülsing, Anna; Ide, Yusuke; Marvin Imperial, Joseph; Nohejl, Adam; North, Kai; Occhipinti, Laura; Peréz Rojas, Nelson; Raihan, Nishat; Ranasinghe, Tharindu; Solis Salazar, Martin; Zampieri, Marcos

(2024) Tools and Resources for People with REAding DIfficulties (READI) @ LREC-COLING 2024. Location: Torino, Italia (20 May 2024)

Files

2024readi-14.pdf (Open Access, Adobe PDF, 194.63 KB)

Details

Authors

Shardlow, Matthew (Manchester Metropolitan University)
Alva-Manchego, Fernando (Cardiff University)
Batista-Navarro, Riza (University of Manchester)
Bott, Stefan (Universitat Pompeu Fabra)
Cardon, Rémi (UCLouvain)
François, Thomas (UCLouvain)
Saggion, Horacio (Universitat Pompeu Fabra)

Abstract

We present preliminary findings on the MultiLS dataset, developed in support of the 2024 Multilingual Lexical Simplification Pipeline (MLSP) Shared Task. The dataset currently comprises 300 instances of lexical complexity prediction and lexical simplification across 10 languages. In this paper, we (1) describe the annotation protocol, to support the contribution of future datasets, and (2) present summary statistics on the data gathered so far. Multilingual lexical simplification can help low-ability readers engage with otherwise difficult texts in their native, often low-resourced, languages.

Affiliations

UCLouvain, SSH/ILC/PLIN - Pôle de recherche en linguistique

Citations

Shardlow, M., Alva-Manchego, F., Batista-Navarro, R., Bott, S., Calderon Ramirez, S., Cardon, R., François, T., Hayakawa, A., Horbach, A., Hülsing, A., Ide, Y., Marvin Imperial, J., Nohejl, A., North, K., Occhipinti, L., Peréz Rojas, N., Raihan, N., Ranasinghe, T., Solis Salazar, M., et al. (2024). An Extensible Massively Multilingual Lexical Simplification Pipeline Dataset using the MultiLS Framework. In Rodrigo Wilkens, Rémi Cardon, Amalia Todirascu, & Núria Gala (Eds.), Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI) @ LREC-COLING 2024 (pp. 38-46). ELRA and ICCL. https://hdl.handle.net/2078.5/231508