On the impact of attribute noise on movie review polarity classification

Karen Stéfany Martins

Use this identifier to cite or link to this item: http://hdl.handle.net/1843/39044

Type:	Dissertation
Title:	On the impact of attribute noise on movie review polarity classification
Alternative title(s):	O impacto do ruído de atributo na classificação da polaridade de críticas de filmes
Author(s):	Karen Stéfany Martins
First advisor:	Pedro Olmo Stancioli Vaz de Melo
First co-advisor:	Rodrygo Luis Teodoro Santos
First committee member:	Helena de Medeiros Caseli
Second committee member:	Adriano Alonso Veloso
Abstract:	With the growth of the internet, movie review websites have changed the film industry; for example, they affect a movie's box office performance. Review polarity matters in several applications, some of which use machine learning classifiers to determine it. However, these classifiers are not perfect, and they are often criticized for not explaining their successes and failures. This work helps to fill this gap by proposing a methodology to characterize, identify, and measure the impact of problematic instances in the task of polarity classification of movie reviews. We characterize such instances by two types of attribute noise: neutrality, where the review text does not convey a clear polarity, and discrepancy, where the polarity of the text does not match the polarity of its rating. To do that, we propose a human classifier composed of three independent human annotators. Each annotator classifies the reviews on two levels. On the first level, they classify the polarity of the review as positive or negative. On the second level, they state whether they are confident in their classification and why. We then aggregate their answers by majority vote. Finally, we test state-of-the-art machine learning classifiers on these reviews. From these steps, we quantify the amount of attribute noise in polarity classification of movie reviews and provide empirical evidence of the need to pay attention to such problematic instances, as they are much harder to classify for both machine and human classifiers. Our proposed methodology is simple and can be easily applied to other classification tasks. To the best of our knowledge, this is the first systematic analysis of the impact of attribute noise on polarity detection from well-formed textual reviews.
Abstract (translated from Portuguese):	Since the growth of the Internet, movie review websites have changed the film industry. They can affect movies' box office results, for example. The polarity of these reviews is very important in several applications. Some of them use machine-learning-based classifiers to determine the polarity. However, these classifiers are not perfect. They are often criticized for the lack of explanation of their successes and failures. This work helps fill this gap by proposing a methodology to characterize, identify, and measure the impact of problematic instances in the task of classifying the polarity of movie reviews. We characterize these instances by two types of attribute noise: neutrality, when the review text does not convey a clear polarity, and discrepancy, when the polarity of the text does not match the polarity defined by the author. To do this, we propose a human classifier composed of three independent human judges. Each judge classifies the reviews on two levels. On the first level, they classify the review by its polarity, that is, positive or negative. Then, on the second level, they state whether or not they are confident in their classification and why. Next, we aggregate their answers by majority vote. Finally, we test machine-learning-based classifiers on these reviews. From these steps, we quantify the amount of attribute noise in polarity classification of movie reviews and provide empirical evidence of the need to pay attention to these problematic instances, since they are much harder to classify for both machine and human classifiers. Our proposed methodology is simple and can easily be applied to other classification tasks. To the best of our knowledge, this is the first systematic analysis of the impact of attribute noise on polarity detection from well-formed textual reviews.
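The aggregation step described in the abstract (three annotators, two answer levels, majority vote) might look roughly like the sketch below. This is an illustration only, not the dissertation's implementation: the `Annotation` schema, the `aggregate` function, and in particular the rules used to flag neutrality and discrepancy are assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One annotator's answer for one review (hypothetical schema)."""
    polarity: str    # first level: "positive" or "negative"
    confident: bool  # second level: is the annotator confident?


def majority(values):
    """Return the most common value; three binary votes cannot tie."""
    return Counter(values).most_common(1)[0][0]


def aggregate(annotations, rating_polarity):
    """Combine three annotations and flag both noise types (assumed rules)."""
    if len(annotations) != 3:
        raise ValueError("expected exactly three annotations")
    polarity = majority(a.polarity for a in annotations)
    confident = majority(a.confident for a in annotations)
    return {
        "polarity": polarity,
        "confident": confident,
        # Assumption: a review is neutral when most annotators are unsure.
        "neutrality": not confident,
        # Assumption: discrepancy is a confident majority that contradicts
        # the polarity implied by the review's rating.
        "discrepancy": confident and polarity != rating_polarity,
    }


# Example: two confident "negative" votes against a positive rating.
votes = [
    Annotation("negative", True),
    Annotation("negative", True),
    Annotation("positive", False),
]
result = aggregate(votes, rating_polarity="positive")
```

Using an odd number of annotators with binary answers means each level always has a strict majority, so no tie-breaking rule is needed in this sketch.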
Subject:	Computer science – Theses; Deep learning – Theses; Opinion mining – Theses; Film criticism – Theses
Language:	eng
Country:	Brazil
Publisher:	Universidade Federal de Minas Gerais
Institution acronym:	UFMG
Department:	ICX - DEPARTAMENTO DE CIÊNCIA DA COMPUTAÇÃO
Program:	Programa de Pós-Graduação em Ciência da Computação
Access type:	Restricted access
URI:	http://hdl.handle.net/1843/39044
Issue date:	16-Oct-2020
Embargo end date:	16-Oct-2021
Appears in collections:	Dissertações de Mestrado

Files associated with this item:

File	Description	Size	Format
Dissertação_Karen_Martins_ON THE IMPACT OF ATTRIBUTE NOISE ON MOVIE REVIEW POLARITY_UFMG.pdf		1.11 MB	Adobe PDF
