CultureLab Seminar III/2026

Note-taking

Semantic similarity and etymology

Venue
Meeting room of the Institut Jean Nicod (ENS, 29 rue d'Ulm, 75005 Paris)
Date
Friday 19 June 2026, 3:30 pm
The next session of the CultureLab seminar will take place on Friday 19 June 2026 at 3:30 pm at the École Normale Supérieure (meeting room of the Institut Jean Nicod, 29 rue d'Ulm, 75005 Paris). We will be pleased to welcome Alexandre François, Konstantin Henke and Mathieu Dehouck (LATTICE, CNRS, École Normale Supérieure, Université Sorbonne Nouvelle), who will present the EvoSem project, which studies the semantic evolution of concepts on the basis of their etymology.


Abstract of the talk (in English)

Assessing semantic similarity through EvoSem, a catalogue of etymological links

Semantic similarity among concepts can be assessed using many different metrics. One of them is colexification, i.e. when two concepts can be expressed by the same form in certain languages (François 2008); e.g. FEEL and HEAR are colexified by Italian sentire. A more recent approach to assessing semantic closeness is dialexification (François f/c), i.e. when two meanings can at least be related through etymology. For example, the concepts FAMILY and CITY are semantically so distant that no language colexifies them in synchrony; but they are still close enough that they can be “dialexified”, i.e. expressed by cognate words (descendants of the same root). Thus, Gothic gards FAMILY is cognate with Polish gród CITY, and both descend from the Proto-Indo-European root *gʰórdʰos. The same dialexification pattern {FAMILY : CITY} is found under Proto-Semitic *ʔahl-, with {Afar áhli FAMILY : Akkadian ālum CITY}. All in all, dialexification captures broader semantic networks than colexification can on its own.
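
The distinction can be made concrete with a minimal Python sketch. It uses only the word forms cited above; the record layout (`Word`) and the helper `pairs_sharing` are hypothetical and do not reflect EvoSem's actual data model. Colexification is approximated here as two concepts sharing one form in one language, and dialexification as two concepts sharing one etymological root.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Word:
    language: str
    form: str
    concept: str
    root: str  # etymon the word descends from


# Toy data limited to the examples cited in the abstract.
# The Latin etymon of "sentire" is not given there; it is added for illustration.
WORDS = [
    Word("Italian", "sentire", "FEEL", "Latin sentīre"),
    Word("Italian", "sentire", "HEAR", "Latin sentīre"),
    Word("Gothic", "gards", "FAMILY", "PIE *gʰórdʰos"),
    Word("Polish", "gród", "CITY", "PIE *gʰórdʰos"),
    Word("Afar", "áhli", "FAMILY", "Proto-Semitic *ʔahl-"),
    Word("Akkadian", "ālum", "CITY", "Proto-Semitic *ʔahl-"),
]


def pairs_sharing(words, key):
    """Return the concept pairs expressed by words that share the same key."""
    groups = defaultdict(set)
    for w in words:
        groups[key(w)].add(w.concept)
    pairs = set()
    for concepts in groups.values():
        pairs.update(frozenset(p) for p in combinations(sorted(concepts), 2))
    return pairs


# Same form in the same language: colexification.
colexified = pairs_sharing(WORDS, key=lambda w: (w.language, w.form))
# Same root, possibly across languages: dialexification.
dialexified = pairs_sharing(WORDS, key=lambda w: w.root)

print(sorted(sorted(p) for p in colexified))
print(sorted(sorted(p) for p in dialexified))
```

In this toy data, {FEEL : HEAR} appears under both groupings, whereas {FAMILY : CITY} appears only when words are grouped by root, which mirrors the claim that dialexification captures a broader network.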

These reflections led to the creation of EvoSem (tiny.cc/EvoSem), an empirical catalogue of dialexification patterns across the world’s languages (Dehouck et al. 2023). In this joint talk, Alex François will present the main tenets underlying the creation of EvoSem. Konstantin Henke will present the challenges of harvesting etymological dictionaries in an automated way, and will show the value of creating etymology trees to represent the history of words ‒ including borrowings. Finally, Mathieu Dehouck will show how such a resource can be used to detect interesting pairs of concepts – i.e. concepts dialexified more than expected in a given context – and will discuss why such pairs can appear.
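
The abstract does not say how "more than expected" is measured. As a purely illustrative assumption, one simple baseline compares how often two concepts co-occur in a cognate set with how often they would co-occur if concepts were distributed over cognate sets independently. The cognate sets below are invented toy data, and `lift` is a hypothetical helper, not the method used by the EvoSem authors.

```python
from collections import Counter
from itertools import combinations

# Invented toy data: each set lists the concepts attested under one root.
COGNATE_SETS = [
    {"FAMILY", "CITY", "FENCE"},
    {"FAMILY", "CITY"},
    {"FEEL", "HEAR"},
    {"FEEL", "SMELL"},
    {"CITY", "FENCE"},
    {"HEAR", "OBEY"},
]


def lift(cognate_sets):
    """Observed / expected co-occurrence for every attested concept pair.

    The expected count assumes independence (an assumption of this sketch):
    count(a) * count(b) / number_of_sets.
    """
    n = len(cognate_sets)
    concept_counts = Counter(c for s in cognate_sets for c in s)
    pair_counts = Counter(
        frozenset(p) for s in cognate_sets for p in combinations(sorted(s), 2)
    )
    scores = {}
    for pair, observed in pair_counts.items():
        a, b = tuple(pair)
        expected = concept_counts[a] * concept_counts[b] / n
        scores[pair] = observed / expected
    return scores


scores = lift(COGNATE_SETS)
# Print pairs from most to least over-represented.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(sorted(pair), round(score, 2))
```

A ratio above 1 marks a pair that is dialexified more often than the independence baseline predicts. Very rare concepts inflate this ratio, so a real analysis would likely need a significance test or a smoothed estimate.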

References
Dehouck, Mathieu, Alexandre François, Siva Kalyan, Martial Pastor & David Kletz. 2023. EvoSem: A database of polysemous cognate sets. In Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change, 66–75. Singapore: Association for Computational Linguistics.
François, Alexandre. 2008. Semantic maps and the typology of colexification: Intertwining polysemous networks across languages. In M. Vanhove (ed.), From Polysemy to Semantic Change: Towards a typology of lexical semantic associations, 163–215. Amsterdam: Benjamins.
François, Alexandre. f/c. Recent advances in lexical typology: Comparing patterns of lexification. In A. Aikhenvald & R. M. W. Dixon (eds.), The Cambridge Handbook of Linguistic Typology, 2nd ed. Cambridge: CUP.