How much flexibility do we need? Investigating the use of polytomous IRT models in psychological measurement.

Giovanni Bruno; Gioia Bottesi; Daniela Di Riso; Andrea Spoto
2024

Abstract

Measurement models based on Item Response Theory (IRT) have garnered substantial credit in the field of psychometrics and psychological testing. When dealing with items on a polytomous response scale, opting for an IRT model requires considering a number of theoretical assumptions, such as the nature of the measurement scale or the type of construct under investigation. A typically adopted solution is the application of an IRT model with few assumptions (e.g., the Graded Response Model), which allows greater flexibility but may not reflect the theory that guided the psychometrician in developing the psychological scale. This research delves into this topic by comparing three IRT models for polytomous items with increasing levels of flexibility (Rating Scale Model, Graded Response Model, and Partial Credit Model) in terms of goodness of model and item fit, according to both a simulation study and an applied case using real data. The main aim of the present contribution is to underline that IRT model selection (a priori or following a model comparison approach) holds the potential to influence the description of a psychological construct, which may have a series of repercussions if applied to psychopathological assessment. Interestingly, results derived from the administration of the tool may not match its theoretical assumptions, leaving room for theoretical and psychometric rethinking. In conclusion, the strength of a "good" IRT model comes not only from the results of statistical testing, but (at least in the first instance) also from its consistency with the founding theory that guided the scale's development.
AIP Sperimentale 2024, 30° Congresso annuale - Book of Abstracts

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3541739