Efficient Multilingual Deep Learning Model for Keyword Categorization

Polato, M.; Demchenko, D.; Kuanyshkereyev, A.; Navarin, N.

doi:10.1109/SSCI50451.2021.9660132

Keyword categorization is an essential tool for SEO (Search Engine Optimization), digital marketing, and online advertising. Keywords are among the most valuable sources of information for inferring users' intents and interests. An effective keyword categorization method makes it possible to understand which types of content are in greatest demand and can help improve future content strategies and marketing or advertising campaigns. In this paper, we present a novel deep learning model for multilingual keyword categorization. The model relies on fastText multilingual word embeddings, and its architecture is inspired by the DeepSets model. To make use of training words that are not included in the pre-trained fastText embeddings, we initialize each of them as the average embedding of all its co-occurring words. We then fine-tune these representations by allowing the network to back-propagate the error to the input. We assess the quality of our proposal on a real-world dataset provided by a Spanish company, in which keywords are categorized according to the Google Product Taxonomy (GPT). Empirical results show that our model achieves high accuracy while being extremely efficient.
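The initialization of out-of-vocabulary words described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "co-occurring" means "appearing in the same keyword", that every co-occurrence counts once toward the average, and that a word with no known neighbours falls back to a zero vector. The function name `init_oov_embeddings`, the toy vectors, and the example keywords are all hypothetical, and plain Python lists stand in for real fastText vectors.

```python
from collections import defaultdict


def init_oov_embeddings(keywords, pretrained, dim):
    """Initialize each out-of-vocabulary word as the average embedding
    of the in-vocabulary words it co-occurs with.

    keywords   -- list of tokenized keywords (each a list of words)
    pretrained -- dict mapping known words to vectors of length `dim`
    Assumption: co-occurrence means sharing a keyword.
    """
    sums = defaultdict(lambda: [0.0] * dim)   # running vector sums per OOV word
    counts = defaultdict(int)                 # number of vectors accumulated

    for tokens in keywords:
        known = [t for t in tokens if t in pretrained]
        unknown = {t for t in tokens if t not in pretrained}
        for word in unknown:
            acc = sums[word]                  # also registers words with no known neighbours
            for neighbour in known:
                vec = pretrained[neighbour]
                for i in range(dim):
                    acc[i] += vec[i]
                counts[word] += 1

    result = {}
    for word, acc in sums.items():
        if counts[word] > 0:
            result[word] = [x / counts[word] for x in acc]
        else:
            # Assumed fallback: zero vector when no known word co-occurs.
            result[word] = list(acc)
    return result


# Toy 2-dimensional "pre-trained" embeddings (illustrative values only).
toy_pretrained = {
    "red": [1.0, 0.0],
    "shoes": [0.0, 1.0],
    "cheap": [1.0, 1.0],
}
toy_keywords = [
    ["red", "zapatillas"],
    ["cheap", "zapatillas", "shoes"],
    ["xyzzy"],
]
oov_vectors = init_oov_embeddings(toy_keywords, toy_pretrained, dim=2)
```

In a trainable model, vectors produced this way would be copied into the embedding layer before training, and that layer would be left unfrozen so that back-propagation can refine them, as the abstract describes.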