
Exploring the structure of BERT through Kernel Learning

Lauriola I.; Aiolli F.
2021

Abstract

Combining the internal representations of a pre-trained Transformer model, such as the popular BERT, is an interesting and challenging task. These representations are usually combined through simple heuristics, e.g. the concatenation or the average of a subset of layers, which introduces further hyper-parameters to calibrate during the fine-tuning phase. Inspired by the recent literature, we propose a principled approach to optimally combine the internal representations of a Transformer model via Multiple Kernel Learning strategies. Broadly speaking, the proposed system consists of two elements. The first is a canonical Transformer model fine-tuned on the target task. The second is a Multiple Kernel Learning algorithm that extracts and combines the representations developed in the internal layers of the Transformer and performs predictions. Most importantly, we use the system as a powerful tool to inspect the information encoded in the Transformer network, emphasizing the limits of state-of-the-art models.
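
The abstract does not detail the implementation, but the pipeline it describes (extract per-layer representations, build one kernel per layer, learn their combination) can be illustrated with a minimal sketch. The snippet below assumes the Hugging Face transformers library for BERT and EasyMKL from the MKLpy library as the Multiple Kernel Learning algorithm; the model name, toy data, and the lam hyper-parameter are illustrative placeholders, and depending on the MKLpy version the kernel matrices may need to be converted to torch tensors.

    # Illustrative sketch (not the authors' code): per-layer [CLS] kernels from BERT
    # combined with EasyMKL. Model name, data, and hyper-parameters are placeholders.
    import numpy as np
    import torch
    from transformers import AutoTokenizer, AutoModel
    from sklearn.metrics.pairwise import linear_kernel
    from MKLpy.algorithms import EasyMKL

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
    model.eval()

    def layer_representations(texts):
        """Return one (n_texts, hidden_size) matrix per hidden state of the model."""
        enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            # embedding output plus 12 encoder layers for BERT-base
            hidden_states = model(**enc).hidden_states
        # Use the [CLS] vector of each layer as the sentence representation.
        return [h[:, 0, :].numpy() for h in hidden_states]

    # Toy data; in practice these would be the target task's train/test splits.
    X_train = ["a great movie", "a boring movie", "loved it", "terrible plot"]
    y_train = np.array([1, 0, 1, 0])
    X_test = ["what a fantastic film", "awful and dull"]

    reps_train = layer_representations(X_train)
    reps_test = layer_representations(X_test)

    # One linear kernel per layer: train-vs-train and test-vs-train Gram matrices.
    KL_train = [linear_kernel(R, R) for R in reps_train]
    KL_test = [linear_kernel(Rte, Rtr) for Rte, Rtr in zip(reps_test, reps_train)]

    # EasyMKL learns a convex combination of the per-layer kernels and a classifier.
    mkl = EasyMKL(lam=0.1)
    mkl.fit(KL_train, y_train)
    print(mkl.predict(KL_test))
    # In MKLpy the learned combination weights are typically exposed as
    # mkl.solution.weights; inspecting them shows how much each layer contributes,
    # which is the kind of analysis the paper uses to probe the network.
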
Proceedings of the International Joint Conference on Neural Networks
2021 International Joint Conference on Neural Networks, IJCNN 2021
ISBN: 978-1-6654-3900-8
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11577/3440276
Citations
  • PMC: n/a
  • Scopus: 0
  • Web of Science: 0
  • OpenAlex: n/a