University of Padova @ DIACR-Ita
Wang B.;Di Buccio E.;Melucci M.
2020
Abstract
Semantic change detection in a relatively low-resource language like Italian is challenging. Using contextualized word embeddings, we formalize the task as measuring the distance between two variable-size sets of vectors. We use various distance metrics: average Euclidean distance, average Canberra distance, Hausdorff distance, and Jensen-Shannon divergence between cluster distributions obtained from K-means clustering and a Gaussian mixture model. The final prediction is given by an ensemble of the top-ranked words under each distance metric. The proposed method achieved better performance than frequency- and collocation-based baselines.
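The set-to-set distances named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes that "average" means the mean over all cross-set pairs, that the Hausdorff distance uses Euclidean distance between points, and that cluster labels come from a clustering fitted elsewhere (for example K-means). All function names and the toy vectors are hypothetical.

```python
import math
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def canberra(u, v):
    """Canberra distance; coordinates where both values are 0 contribute 0."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def average_pairwise(xs, ys, dist):
    """Mean of dist(x, y) over all cross-set pairs (an assumed reading of 'average')."""
    return sum(dist(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def hausdorff(xs, ys, dist=euclidean):
    """Symmetric Hausdorff distance between two finite sets of vectors."""
    def directed(a, b):
        # Largest distance from a point in a to its nearest point in b.
        return max(min(dist(p, q) for q in b) for p in a)
    return max(directed(xs, ys), directed(ys, xs))


def cluster_distribution(labels, n_clusters):
    """Relative frequency of each cluster id among the given labels."""
    counts = Counter(labels)
    return [counts[k] / len(labels) for k in range(n_clusters)]


def jensen_shannon(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the value lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy embeddings of one target word in two time periods (made-up values).
old_vectors = [[0.0, 0.0], [1.0, 0.0]]
new_vectors = [[0.0, 1.0], [3.0, 0.0], [1.0, 1.0]]

print(average_pairwise(old_vectors, new_vectors, euclidean))
print(average_pairwise(old_vectors, new_vectors, canberra))
print(hausdorff(old_vectors, new_vectors))

# Hypothetical cluster ids assigned to each usage by a shared clustering.
old_dist = cluster_distribution([0, 0, 1, 1], n_clusters=3)
new_dist = cluster_distribution([1, 2, 2, 2], n_clusters=3)
print(jensen_shannon(old_dist, new_dist))
```

Because every function accepts sets of different sizes, a word may occur a different number of times in each period. Ranking the target words by each score and combining the top-ranked ones would then give the kind of ensemble the abstract describes, although the exact combination rule is not specified here.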