Chronological analysis of textual data and curve clustering: preliminary results based on wavelets
TUZZI, ARJUNA
2012
Abstract
In textual analysis, many corpora include texts that follow a chronological order. The temporal evolution of (key) words is relevant for highlighting the distinctive features of a chronological corpus. In a typical bag-of-words approach, data are organized in word-type × time-point contingency tables. Such discrete data can be thought of as continuous objects represented by functional relationships. The aims of this study are to identify a specific sequential pattern for each word as a functional object, and to determine prototype patterns representing clusters of words that portray a similar evolution. We propose the application of a flexible wavelet-based model for curve clustering to a corpus of end-of-year addresses delivered by the ten Presidents of the Italian Republic in the period 1949-2011.
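To make the data layout concrete, the following is a minimal sketch of a word-type × time-point contingency table and of a wavelet representation of each word's trajectory. It is not the model used in the study: the toy corpus, the function names `contingency_table` and `haar_transform`, and the choice of a plain orthonormal Haar transform are all assumptions made for illustration.

```python
from collections import Counter

import numpy as np

# Hypothetical toy corpus: one text per time point (not the real addresses).
corpus = {
    1949: "peace work peace republic",
    1950: "work work republic",
    1951: "europe work peace",
    1952: "europe europe republic",
}


def contingency_table(texts_by_year):
    """Return (vocab, years, table) with table[i, j] = count of word i in year j."""
    years = sorted(texts_by_year)
    counts = [Counter(texts_by_year[y].split()) for y in years]
    vocab = sorted(set().union(*counts))
    table = np.array([[c[w] for c in counts] for w in vocab], dtype=float)
    return vocab, years, table


def haar_transform(signal):
    """Orthonormal Haar wavelet transform of a signal whose length is a power of two."""
    approx = np.asarray(signal, dtype=float)
    details = []
    while len(approx) > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))  # detail coefficients at this scale
        approx = (even + odd) / np.sqrt(2)         # coarser approximation
    # Order: overall approximation first, then details from coarse to fine.
    return np.concatenate([approx] + details[::-1])


vocab, years, table = contingency_table(corpus)

# Relative frequencies: divide each column by the size of that year's text.
rel_freq = table / table.sum(axis=0, keepdims=True)

# One row of wavelet coefficients per word; these rows could then be passed
# to any clustering procedure to group words with similar temporal patterns.
coeffs = np.array([haar_transform(row) for row in rel_freq])
```

Each row of `coeffs` describes one word's evolution at several time scales, which is the kind of functional representation on which a curve-clustering step can operate. A real analysis would need to handle a number of time points that is not a power of two and would replace the Haar basis with whatever wavelet family the chosen model prescribes.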