Existence of maximum likelihood estimators in logistic regression models

Denise Pimenta Nacle

Please use this identifier to cite or link to this item: http://hdl.handle.net/1843/BIRC-BB4PJ6

Type:	Master's dissertation
Title:	Existência de estimadores de máxima verossimilhança em modelos de regressão logística (Existence of maximum likelihood estimators in logistic regression models)
Authors:	Denise Pimenta Nacle
First Advisor:	Enrico Antonio Colosimo
First Referee:	Braulio Roberto Gonçalves Marinho Couto
Second Referee:	Clarice Garcia Borges Demétrio
Abstract:	Logistic regression is a statistical model frequently used for binary responses, and its coefficients are usually estimated by maximum likelihood. Because this method relies on asymptotic properties of the estimators, it generally requires large samples; the asymptotic results may be inadequate, or the estimates may not exist at all, even for large samples when the data are sparse. According to Albert and Anderson (1984), logistic data fall into three mutually exclusive and exhaustive categories: complete separation, quasi-complete separation, and overlap. For the first two categories, the maximum likelihood estimators do not exist. This work was motivated by two real data sets that fall into the quasi-complete separation category and therefore have no maximum likelihood estimators. Two solutions from the literature are presented (exact logistic regression and adding a small constant to the data), together with a new solution that consists simply of removing, at random, one observation from one of the non-null cells (with the same covariate value or the same response value) and adding it to the null cell. The three proposals were compared by Monte Carlo simulation using the mean squared error as the criterion; the best results were obtained by adding a small constant to the data, and the new proposal proved effective.
Subject:	Regression analysis; Monte Carlo method; Statistics; Likelihood (Statistics)
Language:	Portuguese
Publisher:	Universidade Federal de Minas Gerais
Publisher Initials:	UFMG
Rights:	Open Access
URI:	http://hdl.handle.net/1843/BIRC-BB4PJ6
Issue Date:	9-Nov-2004
Appears in Collections:	Master's Dissertations

Files in This Item:

File	Description	Size	Format
disserta__odenisenacle21122004.pdf		1.02 MB	Adobe PDF