Exploring the structure of BERT through Kernel Learning
Lauriola I.; Aiolli F.
2021
Abstract
Combining the internal representations of a pre-trained Transformer model, such as the popular BERT, is an interesting and challenging task. These representations are usually combined through simple heuristics, e.g., concatenating or averaging a subset of layers, which in turn requires calibrating multiple hyper-parameters during the fine-tuning phase. Inspired by the recent literature, we propose a principled approach to optimally combine the internal representations of a Transformer model via Multiple Kernel Learning strategies. Broadly speaking, the proposed system consists of two elements. The first is a canonical Transformer model fine-tuned on the target task. The second is a Multiple Kernel Learning algorithm that extracts and combines the representations developed in the internal layers of the Transformer and performs the final prediction. Most importantly, we use the system as a powerful tool to inspect the information encoded in the Transformer network, emphasizing the limits of state-of-the-art models.
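To make the two-step idea concrete, the following is a minimal sketch, not the authors' exact pipeline: it extracts the representation developed by each internal layer of BERT and builds one kernel per layer, which a Multiple Kernel Learning algorithm would then combine with learned weights. Here the uniform weighting is an assumption standing in for such learned weights (an MKL solver such as EasyMKL would optimize them instead), and `texts` and `labels` are placeholder data.

```python
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.svm import SVC

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased",
                                  output_hidden_states=True)
model.eval()

texts = ["a positive example", "a negative example"]  # placeholder corpus
labels = np.array([1, 0])                             # placeholder labels

with torch.no_grad():
    batch = tokenizer(texts, padding=True, return_tensors="pt")
    out = model(**batch)

# out.hidden_states is a tuple of (num_layers + 1) tensors of shape
# [batch, seq_len, dim]; take the [CLS] vector of every layer as that
# layer's representation of the input.
reps = [h[:, 0, :].numpy() for h in out.hidden_states]

# One linear kernel (gram matrix) per internal layer.
kernels = [R @ R.T for R in reps]

# Assumption: uniform weights as a stand-in for the per-layer weights a
# Multiple Kernel Learning algorithm would learn from data.
weights = np.ones(len(kernels)) / len(kernels)
K = sum(w * k for w, k in zip(weights, kernels))

# Final predictor on the combined kernel.
clf = SVC(kernel="precomputed").fit(K, labels)
```

Because the per-layer weights are explicit, inspecting their learned values indicates which internal layers contribute most to the task, which is how such a system can probe what the network encodes.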