Likelihood-based inference for moderate to high dimensional models

Huang, Caizhu

Likelihood-based statistics and their standard asymptotic distributional results offer a general solution for hypothesis testing in parametric models. However, such first-order approximations are not reliable when the dimension $p$ of the parameter increases with the sample size $n$. In such divergent dimensional regimes, higher-order likelihood approximations, such as the directional test \citep{sartori:2014} and modifications of the log-likelihood ratio test \cite{skovgaard:2001}, may still give substantial improvements over standard first-order solutions, even though they were developed for the fixed-$p$ scenario.

Taking inspiration from a classification of asymptotic regimes recently introduced by \cite{battey2022some}, we focus on two settings: a moderate dimensional asymptotic setting, in which $p/n \to 0$, for instance with $p=O(n^\tau)$ for some $\tau \in (0,1)$, and a high dimensional asymptotic setting, in which $p/n \to \kappa \in (0,1)$. We do not consider ultra-high dimensional settings, in which $p/n$ converges to a constant greater than 1 or diverges.

Within several prominent frameworks, we aim to provide reliable solutions via higher-order approximations. The first part of the thesis examines higher-order likelihood solutions for moderate and high dimensional multivariate normal models. In the high dimensional regime, we prove that the directional $p$-value is exactly uniformly distributed under the null hypothesis for seven prominent hypotheses concerning means and/or covariance matrices of multivariate normal distributions.

We also consider a multivariate Behrens-Fisher problem, that is, testing the hypothesis of equality of the mean vectors of $k$ independent multivariate normal distributions with different covariance matrices. In this case, the parameter being tested is not a canonical parameter of an exponential family, and therefore we cannot expect the accuracy of the methods to hold in high dimensional regimes. For this reason, we restrict ourselves to moderate dimensional regimes. Simulation results show that the higher-order approximations outperform the standard first-order solutions.

Finally, we study moderate dimensional logistic regression models. We consider three types of hypotheses: one in which the whole parameter is of interest (i.e., there are no nuisance parameters), one in which a scalar component of the parameter is of interest, and one in which a vector component of the parameter is of interest. We give a tentative proof that the directional test gives reliable results provided that $p=o(n^{3/4})$, under a particular Gaussian assumption on the design matrix. Extensive simulation results show that the higher-order approximations perform well when the dimension of the parameter of interest is small or the dimension of the nuisance parameter is large. In this setting, Skovgaard's modified likelihood ratio statistic is also empirically found to provide very accurate results. A more thorough theoretical study of these statistics in this setting is an interesting future development of this thesis.
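
The unreliability of first-order approximations when $p$ grows with $n$ can be illustrated with a small simulation. The sketch below is not taken from the thesis and does not implement the directional test; it is an illustration under stated assumptions. It considers the test of a zero mean vector in a $p$-variate normal model with unknown covariance matrix, for which the log-likelihood ratio statistic is $W = n\log\{1 + T^2/(n-1)\}$, with $T^2$ Hotelling's statistic. By the classical result that $T^2(n-p)/\{p(n-1)\}$ follows an $F_{p,n-p}$ distribution under the null hypothesis, $W$ has the same distribution as $n\log(1 + A/B)$ with independent $A \sim \chi^2_p$ and $B \sim \chi^2_{n-p}$, so no matrix computations are needed. The function names are hypothetical, and the $\chi^2$ critical value is obtained with the Wilson-Hilferty approximation to stay within the standard library.

```python
import math
import random
from statistics import NormalDist


def chi2_quantile(prob, df):
    """Approximate chi-square quantile (Wilson-Hilferty; an approximation,
    used here only to avoid dependencies outside the standard library)."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def lrt_rejection_rate(n, p, level=0.05, reps=20000, seed=1):
    """Empirical null rejection rate of the first-order (chi-square) LRT
    for H0: mu = 0 in N_p(mu, Sigma), Sigma unknown, sample size n > p."""
    if not 0 < p < n:
        raise ValueError("this sketch requires 0 < p < n")
    rng = random.Random(seed)
    crit = chi2_quantile(1.0 - level, p)
    rejections = 0
    for _ in range(reps):
        # A chi-square variable with k degrees of freedom is Gamma(k/2, scale 2).
        a = rng.gammavariate(p / 2.0, 2.0)
        b = rng.gammavariate((n - p) / 2.0, 2.0)
        # Under H0, W has the same law as n * log(1 + A / B).
        w = n * math.log1p(a / b)
        if w > crit:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    n = 100
    for p in (5, 20, 50):
        rate = lrt_rejection_rate(n, p)
        print(f"n={n}, p={p}, p/n={p / n:.2f}: empirical size {rate:.3f}")
```

With $n$ fixed, the empirical size should stay close to the nominal level for small $p/n$ and move well above it as $p/n$ increases, which is the kind of breakdown of the first-order $\chi^2_p$ approximation that motivates higher-order methods.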

Likelihood-based inference for moderate to high dimensional models / Huang, Caizhu. - (2023 Apr 27).



	Title in English
	
				Likelihood-based inference for moderate to high dimensional models
			
	Date of defense
	
				27 Apr 2023
			
			
			
	Appears in categories:
	
				08.01 - Doctoral Thesis UNIPD (Legal Deposit)

Files in this item:

File	Size	Format
Final_Thesis_Caizhu_Huang.pdf (under embargo until 26/04/2026; description: Final_Thesis_Caizhu_Huang; type: doctoral thesis; license: other)	16.61 MB	Adobe PDF