A Probabilistic Model for Stemmer Generation

Bacchin, Michela; Ferro, Nicola; Melucci, Massimo
2004

Abstract

Managing textual resources and providing full-text search over them is now a relevant issue for database management systems as well. Stemming is part of the indexing and searching processes whenever textual resources are involved. In this paper we present a language-independent probabilistic model that can automatically generate stemmers for several different languages. The variety of word forms can prevent the end user’s words from matching the document words even when they refer to the same concept, and this mismatch degrades retrieval performance. Stemmers can improve retrieval effectiveness, but designing and implementing them requires considerable effort. The proposed model describes the mutual reinforcement relationship between stems and derivations and then provides a probabilistic interpretation of it. A series of experiments shows that the stemmers generated by the probabilistic model are as effective as those based on linguistic knowledge.
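
The abstract does not spell out how the mutual reinforcement between stems and derivations is computed, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes a HITS-style iteration over a graph linking each candidate prefix (stem) to each candidate suffix (derivation) it forms a word with; the function names `splits`, `reinforce`, and `stem`, the `min_stem` threshold, and the product-based choice of the best split are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def splits(word, min_stem=3):
    """All (prefix, suffix) splits with a prefix of at least
    min_stem characters and a non-empty suffix (an assumption)."""
    return [(word[:i], word[i:]) for i in range(min_stem, len(word))]


def normalise(scores):
    """Scale the scores so that they sum to 1."""
    total = sum(scores.values())
    return {key: value / total for key, value in scores.items()}


def reinforce(words, min_stem=3, iterations=20):
    """HITS-style mutual reinforcement (assumed, not from the paper):
    a suffix is good if it attaches to good prefixes, and vice versa."""
    # Bipartite graph: one edge per (prefix, suffix) pair seen in a word.
    edges = {pair for w in set(words) for pair in splits(w, min_stem)}
    if not edges:
        return {}, {}
    prefix_score = {p: 1.0 for p, _ in edges}
    suffix_score = {s: 1.0 for _, s in edges}
    for _ in range(iterations):
        new_suffix = defaultdict(float)
        for p, s in edges:
            new_suffix[s] += prefix_score[p]
        suffix_score = normalise(new_suffix)
        new_prefix = defaultdict(float)
        for p, s in edges:
            new_prefix[p] += suffix_score[s]
        prefix_score = normalise(new_prefix)
    return prefix_score, suffix_score


def stem(word, prefix_score, suffix_score, min_stem=3):
    """Return the prefix of the highest-scoring known split,
    or the word itself when no split is known."""
    candidates = [(p, s) for p, s in splits(word, min_stem)
                  if p in prefix_score and s in suffix_score]
    if not candidates:
        return word
    best = max(candidates,
               key=lambda ps: prefix_score[ps[0]] * suffix_score[ps[1]])
    return best[0]


if __name__ == "__main__":
    vocabulary = ["walks", "walked", "talks", "talked", "jumps", "jumped"]
    p_scores, s_scores = reinforce(vocabulary)
    for w in vocabulary:
        print(w, "->", stem(w, p_scores, s_scores))
```

On this toy vocabulary the iteration favours the splits shared by many words, for example `walk` + `ed` over `walke` + `d`. A corpus-scale stemmer would need real word frequencies and the probabilistic interpretation the abstract refers to, which this sketch does not attempt.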
Proc. 12th Italian Symposium on Advanced Database Systems (SEBD 2004)
ISBN: 8890140917
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/2441589