Compiling a multi-register corpus of learner English: Rationale and design

De Cock, Sylvie; Gilquin, Gaëtanelle; Granger, Sylviane; Jadoulle, Pauline; Paquot, Magali

(2025) PLIN Linguistic Day 2025, Université catholique de Louvain (18 April 2025)

Details

Authors

De Cock, Sylvie (UCLouvain)
Gilquin, Gaëtanelle (UCLouvain)
Granger, Sylviane (UCLouvain)
Jadoulle, Pauline (UCLouvain)
Paquot, Magali (UCLouvain)

Abstract

Register variation is a crucial aspect of language production. Depending on the context in which it is used and the communicative purposes that it serves, language tends to display distinctive characteristics, which may have to do with lexis, but also phraseology or syntax, among others (e.g. Biber 2012). While register is important for any type of language, it is particularly relevant to learner language, because learners may not show the same register awareness as native or expert writers/speakers (Gilquin & Paquot 2008). Corpora have raised awareness of register variation and have provided valuable insights into linguistic patterns associated with different registers. Specific methods based on corpora have also been developed to study register variation, most notably multi-dimensional analysis (Biber 1988). In learner corpus research, register has been taken into account in the sense that studies are carried out on the basis of learner corpora representing certain registers (e.g. telecollaborative discourse in Vyatkina 2012). However, studies comparing learner language registers are still relatively rare, with a few exceptions such as Fuchs et al. (2016) or Larsson et al. (2021). One reason for the lack of register studies in learner corpus research is that, until recently, learner corpora represented only a small range of registers, most notably argumentative essays for writing (as in ICLE, Granger et al. 2020) and interviews for speech (as in LINDSEI, Gilquin et al. 2010). In addition, when learner corpus researchers have compared different registers, it has mainly been on the basis of texts produced by different learners (e.g. argumentative essays produced by one group of students and interviews produced by another group). A possible issue with this method is that individual writers’/speakers’ styles may affect the comparison of registers.
In this poster, we describe the compilation of a new corpus of learner English that brings together texts from multiple registers produced by the same learners. The data are being collected at UCLouvain among (mainly) French-speaking learners of English who are students in their second year of English major studies. These students are required to produce written and spoken texts in English representing both formal and less formal registers. The written registers include a career readiness essay, a cover letter, a persuasive essay, a critical literacy narrative and diary entries. The spoken registers include a monologue about students’ future career, a debate and an informal conversation between two students. We have made special efforts to collect rich metadata about the learners and the tasks, relying on Paquot et al.’s (2024) Core Metadata Schema for Learner Corpora. We have included information such as learners’ knowledge of languages, their exposure to English in different situations, but also their literacy. Detailed information about the tasks (e.g. instructions, time constraints and use of language reference tools for writing) and the registers (e.g. communicative purposes, settings, number of addressees) is provided too. Once completed, this corpus, which we intend to release in open access format, will make it possible to compare registers while controlling for individual styles, and to investigate the effect of register on the linguistic features of learner English.

Affiliations

UCLouvain, SSH/ILC/PLIN - Pôle de recherche en linguistique

Citations

APA

De Cock, S., Gilquin, G., Granger, S., Jadoulle, P., & Paquot, M. (2025). Compiling a multi-register corpus of learner English: Rationale and design. PLIN Linguistic Day 2025, Université catholique de Louvain. https://hdl.handle.net/2078.5/248823