On the impact of attribute noise on movie review polarity classification

Karen Stéfany Martins

Please use this identifier to cite or link to this item: http://hdl.handle.net/1843/39044

Full metadata record

DC Field	Value	Language
dc.contributor.advisor1	Pedro Olmo Stancioli Vaz de Melo	pt_BR
dc.contributor.advisor1Lattes	http://lattes.cnpq.br/3262926164579789	pt_BR
dc.contributor.advisor-co1	Rodrygo Luis Teodoro Santos	pt_BR
dc.contributor.referee1	Helena de Medeiros Caseli	pt_BR
dc.contributor.referee2	Adriano Alonso Veloso	pt_BR
dc.creator	Karen Stéfany Martins	pt_BR
dc.creator.Lattes	http://lattes.cnpq.br/4045223666758899	pt_BR
dc.date.accessioned	2022-01-08T02:40:53Z	-
dc.date.available	2022-01-08T02:40:53Z	-
dc.date.issued	2020-10-16	-
dc.identifier.uri	http://hdl.handle.net/1843/39044	-
dc.description.abstract	With the growth of the Internet, movie review websites have changed the film industry. They can affect movies' box office results, for example. The polarity of these reviews is very important in several applications. Some of them use machine learning classifiers to determine the polarity. However, these classifiers are not perfect. They are often criticized for not explaining their successes and failures. This work helps fill this gap by proposing a methodology to characterize, identify, and measure the impact of problematic instances on the task of classifying the polarity of movie reviews. We characterize these instances by two types of attribute noise: neutrality, when the review text does not convey a clear polarity, and discrepancy, when the polarity of the text does not match the polarity assigned by the author. To do so, we propose a human classifier composed of three independent human judges. Each judge classifies the reviews on two levels. On the first level, they classify the review's polarity, that is, positive or negative. On the second level, they state whether or not they are confident in their classification, and why. We then aggregate their answers by majority vote. Finally, we test machine learning classifiers on these reviews. From these steps, we quantify the amount of attribute noise in movie review polarity classification and provide empirical evidence of the need to pay attention to these problematic instances, as they are much harder to classify for both machine and human classifiers. Our proposed methodology is simple and can be easily applied to other classification tasks. To the best of our knowledge, this is the first systematic analysis of the impact of attribute noise on polarity detection from well-formed textual reviews.	pt_BR
dc.description.resumo	With the growth of the internet, movie review websites have changed the film industry. They can affect a movie's box office performance, for example. Review polarity is very important in several applications. Some of them use machine learning classifiers to determine the review polarity. However, these classifiers are not perfect. They are often criticized for the lack of explanation of their successes and failures. This work helps to fill this gap by proposing a methodology to characterize, identify, and measure the impact of problematic instances in the task of polarity classification of movie reviews. We characterize such instances by two types of attribute noise: neutrality, where the review text does not convey a clear polarity, and discrepancy, where the polarity of the text does not match the polarity of its rating. To do that, we propose a human classifier composed of three independent human annotators. Each annotator classifies the reviews on two levels. On the first level, they classify the review by its polarity, that is, positive or negative. On the second level, they state whether or not they are confident in their classification, and why. Then, we aggregate their answers by majority vote. Finally, we test state-of-the-art machine learning classifiers on these reviews. From these steps, we quantify the amount of attribute noise in polarity classification of movie reviews and provide empirical evidence of the need to pay attention to such problematic instances, as they are much harder to classify for both machine and human classifiers. Our proposed methodology is simple and can be easily applied to other classification tasks. To the best of our knowledge, this is the first systematic analysis of the impact of attribute noise on polarity detection from well-formed textual reviews.	pt_BR
dc.language	eng	pt_BR
dc.publisher	Universidade Federal de Minas Gerais	pt_BR
dc.publisher.country	Brasil	pt_BR
dc.publisher.department	ICX - DEPARTAMENTO DE CIÊNCIA DA COMPUTAÇÃO	pt_BR
dc.publisher.program	Programa de Pós-Graduação em Ciência da Computação	pt_BR
dc.publisher.initials	UFMG	pt_BR
dc.rights	Restricted access	pt_BR
dc.subject	Attribute noise	pt_BR
dc.subject	Deep Learning	pt_BR
dc.subject	Explainability	pt_BR
dc.subject	Opinion reviews	pt_BR
dc.subject	Opinion mining	pt_BR
dc.subject	Movie reviews	pt_BR
dc.subject.other	Computing – Theses	pt_BR
dc.subject.other	Deep learning – Theses	pt_BR
dc.subject.other	Opinion mining – Theses	pt_BR
dc.subject.other	Film criticism – Theses	pt_BR
dc.title	On the impact of attribute noise on movie review polarity classification	pt_BR
dc.title.alternative	O impacto do ruído de atributo na classificação da polaridade de críticas de filmes	pt_BR
dc.type	Dissertation	pt_BR
dc.description.embargo	2021-10-16	-
dc.identifier.orcid	https://orcid.org/0000-0001-7949-4573	pt_BR
Appears in Collections:	Dissertações de Mestrado
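
The abstract describes a human classifier built from three independent annotators whose two-level answers are aggregated by majority vote. A minimal sketch of that aggregation step follows. It is an illustration only, not the dissertation's code: the names `Judgment` and `aggregate` are hypothetical, the free-text "why" answer is omitted, and applying the majority vote to the confidence level as well as to the polarity is an assumption, since the abstract does not say how the second level is combined.

```python
from collections import Counter
from typing import NamedTuple, Sequence


class Judgment(NamedTuple):
    """One annotator's answer for one review (hypothetical structure)."""
    polarity: str    # first level: "positive" or "negative"
    confident: bool  # second level: is the annotator confident?


def aggregate(judgments: Sequence[Judgment]) -> Judgment:
    """Combine an odd number of annotator answers by majority vote."""
    if len(judgments) % 2 == 0:
        # With two possible labels, an odd panel guarantees no ties.
        raise ValueError("majority vote needs an odd number of annotators")
    # Majority polarity: the most frequent label among the annotators.
    polarity = Counter(j.polarity for j in judgments).most_common(1)[0][0]
    # Assumption: confidence is also decided by majority.
    confident = sum(j.confident for j in judgments) > len(judgments) / 2
    return Judgment(polarity, confident)


if __name__ == "__main__":
    answers = [
        Judgment("positive", True),
        Judgment("negative", False),
        Judgment("positive", False),
    ]
    print(aggregate(answers))
```

Under this sketch, a review whose aggregated answer is not confident, or whose aggregated polarity disagrees with the rating-derived label, would be a candidate for the neutrality or discrepancy categories described in the abstract; the exact decision rule used in the dissertation is not stated in this record.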

Files in This Item:

File	Description	Size	Format
Dissertação_Karen_Martins_ON THE IMPACT OF ATTRIBUTE NOISE ON MOVIE REVIEW POLARITY_UFMG.pdf		1.11 MB	Adobe PDF
