How much flexibility do we need? Investigating the use of polytomous IRT models in psychological measurement.
Giovanni Bruno; Gioia Bottesi; Daniela Di Riso; Andrea Spoto
2024
Abstract
Measurement models based on Item Response Theory (IRT) have garnered substantial credit in the fields of psychometrics and psychological testing. When dealing with items on a polytomous response scale, opting for an IRT model requires considering a number of theoretical assumptions, such as the nature of the measurement scale or the type of construct under investigation. A typically adopted solution is the application of IRT models with few assumptions (e.g., the Graded Response Model), which allow greater flexibility but may not reflect the theory that guided the psychometrician in developing the psychological scale. This research delves into this topic by comparing three IRT models for polytomous items with an increasing level of flexibility (Rating Scale Model, Graded Response Model, and Partial Credit Model) in terms of model and item fit, in both a simulation study and an applied case using real data. The main aim of the present contribution is to underline that IRT model selection (a priori or following a model comparison approach) holds the potential to influence the description of a psychological construct, which may have a series of repercussions if applied to psychopathological assessment. Interestingly, results derived from the administration of a tool may not match its theoretical assumptions, leaving room for theoretical and psychometric rethinking. In conclusion, the strength of a “good” IRT model comes not only from the results of statistical testing but, at least in the first instance, also from its consistency with the foundational theory that guided the scale development.
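To make the models under comparison concrete, the following is a minimal sketch of how category response probabilities are computed under Samejima's Graded Response Model, one of the three models the abstract names. It is an illustrative implementation from the standard GRM formula, not code from the study itself, and the parameter values in the example are invented for demonstration.

```python
import math

def grm_category_probs(theta, a, b):
    """Category response probabilities under the Graded Response Model.

    theta : latent trait level of the respondent
    a     : item discrimination parameter
    b     : ordered category thresholds b_1 < ... < b_{K-1}
    Returns a list of K probabilities, one per response category.
    """
    # Cumulative probability of responding in category k or higher:
    # P*(k) = 1 / (1 + exp(-a * (theta - b_k))), with P*(0) = 1 and P*(K) = 0
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b]
    cum += [0.0]
    # The probability of each category is the difference of adjacent
    # cumulative curves: P(k) = P*(k) - P*(k + 1)
    return [cum[k] - cum[k + 1] for k in range(len(b) + 1)]

# Example: a 4-category item with hypothetical parameters
probs = grm_category_probs(theta=0.0, a=1.5, b=[-1.0, 0.0, 1.0])
```

The more restrictive Rating Scale Model would constrain the threshold structure to be identical across items (up to a location shift), while the Partial Credit Model relaxes the GRM's cumulative formulation in favor of adjacent-category comparisons; the flexibility ordering discussed in the abstract lies precisely in such constraints.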