
University of Padova @ DIACR-Ita

Wang B.; Di Buccio E.; Melucci M.
2020

Abstract

Semantic change detection in a relatively low-resource language like Italian is challenging. Using contextualized word embeddings, we formalize the task as computing a distance between two variable-size sets of vectors. Several distance metrics are used: average Euclidean distance, average Canberra distance, Hausdorff distance, and Jensen-Shannon divergence between cluster distributions obtained from K-means clustering and a Gaussian mixture model. The final prediction is given by an ensemble of the top-ranked words under each distance metric. The proposed method achieved better performance than frequency-based and collocation-based baselines.
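The set-to-set distances named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that "average" distance means the mean over all cross-set pairs, and that cluster assignments are already available from K-means or a Gaussian mixture model. The clustering step and the ensemble step are omitted. All function names are hypothetical, and the vectors are toy 2-D points rather than real contextualized embeddings.

```python
import math
from itertools import product


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def canberra(u, v):
    """Canberra distance; coordinates where both values are 0 contribute 0."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def average_pairwise(set_a, set_b, dist):
    """Mean distance over all cross-set pairs (assumed reading of 'average')."""
    pairs = list(product(set_a, set_b))
    return sum(dist(u, v) for u, v in pairs) / len(pairs)


def hausdorff(set_a, set_b, dist=euclidean):
    """Symmetric Hausdorff distance between two finite sets of vectors."""
    def directed(xs, ys):
        # Largest distance from a point in xs to its nearest point in ys.
        return max(min(dist(x, y) for y in ys) for x in xs)
    return max(directed(set_a, set_b), directed(set_b, set_a))


def cluster_distribution(labels, n_clusters):
    """Relative frequency of each cluster id among the given labels."""
    counts = [0] * n_clusters
    for label in labels:
        counts[label] += 1
    return [c / len(labels) for c in counts]


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy usage vectors of one word in two time periods (sizes may differ).
old_usages = [(0.0, 0.0), (1.0, 0.0)]
new_usages = [(0.0, 0.0), (1.0, 0.0), (4.0, 3.0)]

avg_euclid = average_pairwise(old_usages, new_usages, euclidean)
avg_canberra = average_pairwise(old_usages, new_usages, canberra)
hausdorff_dist = hausdorff(old_usages, new_usages)

# Hypothetical cluster ids, as if produced by clustering both periods jointly.
old_dist = cluster_distribution([0, 0], n_clusters=2)
new_dist = cluster_distribution([0, 0, 1], n_clusters=2)
jsd = jensen_shannon(old_dist, new_dist)
```

Under this reading, a word whose usage vectors spread into a new region of the embedding space, or whose usages shift toward a different cluster, receives a larger score and ranks higher as a candidate for semantic change.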
2020
CEUR Workshop Proceedings
7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop, EVALITA 2020


Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3368700