Vers une méthodologie pour l’extraction et la classification automatiques des collocations terminologiques verbales en langue médicale
Federica Vezzani
2023
Abstract
This study describes an efficient methodology for developing a new supervised machine learning system that extracts and classifies semi-compositional and compositional « verbal terminological collocations », as defined in Costa/Silva (2004), in medical language. In particular, we describe three phases: preprocessing of the specialized corpus, automatic extraction of base (noun) + collocate (verb) pairs, and construction of the ground truth for the classification of terminological collocations. The goal is to train and validate an effective automatic system for the French language, optimized on the basis of the adopted theoretical premise.

File | Access | Type | License | Size | Format
---|---|---|---|---|---
vezzani-degruyter.pdf | Open Access since 25/10/2024 | Published (publisher's version) | Open access | 143.93 kB | Adobe PDF