Please use this identifier to cite or link to this item: http://hdl.handle.net/1843/61118
Type: Thesis
Title: Robust linear mixed models, alternative methods to quantile regression for panel data, and adaptive LASSO quantile regression with fixed effects
Other Titles: Modelos lineares mistos robustos, métodos alternativos de regressão quantílica para dados de painel, e regressão quantílica com LASSO adaptativo para efeitos fixos
Authors: Ian Meneghel Danilevicz
First Advisor: Valdério Anselmo Reisen
Second Advisor: Pascal Bondon
First Referee: Arthur Tenenhaus
Second Referee: Stéphane Chrétien
Third Referee: Paulo Canas Rodrigues
Fourth Referee: Maria Helena Mourino Silva Nunes
Fifth Referee: Glaura da Conceição Franco
Abstract: This thesis consists of three chapters on longitudinal data analysis. Linear mixed models are discussed in both their random-effects form (where individual intercepts are interpreted as random variables) and their fixed-effects form (where individual intercepts are treated as unknown constants that must be estimated). Furthermore, robust models (resistant to outliers) and efficient models (with low estimator variability) are proposed for repeated measures. The second part of the thesis is dedicated to quantile regression, which explores the full conditional distribution of an outcome given its predictors. It introduces a more general method for dealing with heteroscedastic variables and longitudinal data.
The first chapter is motivated by the evaluation of the statistical association between air pollution exposure and the lung function of children and adolescents over six months. A robust linear mixed model combined with an equally robust principal component analysis is proposed to deal with multicollinearity between covariates and with the impact of extreme observations on the estimates. Huber and Tukey loss functions (examples of M-estimation) are considered in order to obtain estimators more robust than those given by the least squares loss usually used to estimate the parameters of linear mixed models. A finite-sample study is carried out for the case where the covariates follow linear time series models with or without additive outliers. The impact of time correlation and outliers on the fixed effect parameter estimates in linear mixed models is investigated. In addition, weights are introduced to further reduce the bias of the estimates. The analysis of the real data shows that the robust principal component analysis yields three principal components explaining more than 90% of the total variability. The second principal component, which corresponds to particles smaller than 10 microns, significantly affects respiratory capacity. In addition, biological indicators such as passive smoking have a negative and significant effect on children's lung function.
The second chapter analyses fixed effect panel data with three different loss functions. To prevent the number of parameters from increasing with the sample size, we propose to penalize each regression method with the least absolute shrinkage and selection operator (LASSO). The asymptotic properties of two of these new techniques are established. A Monte Carlo study is performed for homoscedastic and heteroscedastic models. Although the heteroscedastic case is more challenging to estimate for most statistical methods, the proposed methods perform well in both scenarios. This confirms that the proposed quantile regression methods are robust to heteroscedasticity. Their performance is tested on economic panel data from the Organisation for Economic Cooperation and Development (OECD).
The objective of the third chapter is to simultaneously restrict the number of individual regression constants and explanatory covariates. In addition to the LASSO, an adaptive LASSO is proposed, which enjoys the oracle properties, i.e., it asymptotically selects the true model, if one exists, and it has the classical asymptotic normality property. Monte Carlo simulations are performed in the case of low dimensionality (many more observations than parameters) and in the case of moderate dimensionality (a comparable number of observations and parameters). In both cases, the adaptive method performs much better than the non-adaptive methods. Finally, we apply our methodology to a cohort dataset of moderate dimensionality. For each chapter, open-source software has been written and is available to the scientific community.
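The loss functions named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis software: the function names are hypothetical, the tuning constants 1.345 and 4.685 are the conventional textbook defaults rather than values taken from the thesis, and the check loss is the standard quantile regression loss, assumed here for illustration.

```python
import numpy as np

def huber_loss(r, c=1.345):
    """Huber loss: quadratic for |r| <= c, linear beyond (c is an assumed default)."""
    r = np.asarray(r, dtype=float)
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * (a - 0.5 * c))

def tukey_loss(r, c=4.685):
    """Tukey biweight loss: bounded, equal to c^2 / 6 whenever |r| >= c."""
    r = np.asarray(r, dtype=float)
    u = np.clip(np.abs(r) / c, 0.0, 1.0)
    return (c**2 / 6.0) * (1.0 - (1.0 - u**2) ** 3)

def check_loss(r, tau=0.5):
    """Quantile (check) loss: r * (tau - 1{r < 0}) for a quantile level tau in (0, 1)."""
    r = np.asarray(r, dtype=float)
    return r * (tau - (r < 0))

if __name__ == "__main__":
    residuals = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
    # Least squares grows quadratically, Huber linearly, Tukey stays bounded.
    print("squared:", 0.5 * residuals**2)
    print("huber:  ", huber_loss(residuals))
    print("tukey:  ", tukey_loss(residuals))
    print("check:  ", check_loss(residuals, tau=0.25))
```

Comparing the rows for the residuals of size 10 shows why the Huber and Tukey losses resist outliers: a large residual contributes far less to the objective than it does under least squares.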
Abstract: The thesis consists of three chapters. The first focuses on the relationship between air pollution exposure and respiratory diseases in children and adolescents. The cohort comprises 82 individuals observed monthly for six months. We propose a robust linear mixed model combined with a principal component analysis to handle multicollinearity among the covariates and the impact of extreme observations on the estimates. The second chapter analyses panel data using fixed effects models and different loss functions. To prevent the parameter dimension from growing with the sample size, we penalize each regression method with the LASSO. The asymptotic properties of these new techniques are established. We test the performance of the methods on economic panel data from the OECD. The goal of the third chapter is to restrict the individual regression constants and the explanatory covariates simultaneously. The adaptive LASSO reduces dimensionality while asymptotically guaranteeing correct model selection. We test the accuracy of the proposed methods on cohort data of moderate size.
Subject: Statistics – Theses
Longitudinal data – Theses
Quantile regression – Theses
M-estimation – Theses
Outliers – Theses
language: eng
Publisher Country: Brazil
Publisher: Universidade Federal de Minas Gerais
Publisher Initials: UFMG
Publisher Department: ICX - DEPARTAMENTO DE ESTATÍSTICA
Publisher Program: Programa de Pós-Graduação em Estatística
Rights: Restricted Access
Rights URI: http://creativecommons.org/licenses/by-nc-nd/3.0/pt/
URI: http://hdl.handle.net/1843/61118
Issue Date: 15-Dec-2022
Embargo Date: 15-Dec-2024
Appears in Collections: Teses de Doutorado

Files in This Item:
File: exemplo_v2 (1) (1).pdf (restricted until 2024-12-15)
Description: manuscrito da tese de doutorado ian m danilevicz
Size: 3.76 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License.