Testing statistical methods for sociolinguistic profiling of Brazilian Portuguese speakers

João Victor Pessoa Rocha

Please use this identifier to cite or link to this item: http://hdl.handle.net/1843/66155

Type:	Master's thesis
Title:	Testing statistical methods for sociolinguistic profiling of Brazilian Portuguese speakers
Other Titles:	Testando métodos estatísticos para o perfilamento sociolinguístico de falantes do Português Brasileiro
Authors:	João Victor Pessoa Rocha
First Advisor:	Heliana Ribeiro de Mello
First Co-advisor:	Crysttian Arantes Paixão
First Referee:	Flavio Codeco Coelho
Second Referee:	Livia Oushiro
Abstract:	This work constitutes a computationally driven and cross-methodological analysis of sociolectal marker recognition, positioning it in the growing area of Computational Sociolinguistics. This research had two main goals: (i) selecting an efficient method for sociolect (dis)similarity recognition; and (ii) describing how speech transcriptions can help profile a speaker. The main term we used to describe an in-group’s language was sociolect because we believe it is more accurate regarding what sociolinguists deal with. To this end, a spontaneous speech corpus of Brazilian Portuguese compiled according to the Language into Act theory (L-AcT) framework was used to extract the data. This linguistic resource provides, besides the transcriptions, metadata about the interaction and the speakers, sound files, sound-text alignment files, and transcriptions annotated with the PALAVRAS parser (Bick, 2000). To achieve the aforementioned goals, three methods were tested: (i) Variation-Based Distance and Similarity Modeling (VADIS) (Szmrecsanyi et al., 2019); (ii) the Mann-Whitney test; and (iii) Poisson and Negative binomial (parametric modeling) with Estimated Marginal Means (EMM) (Searle et al., 1980) and Compact Letter Display (CLD) (Piepho, 2004). Each method was assessed in relation to twelve linguistic variables: apheretic forms, apocopated diminutives, foreign words, interjections, reduced and articulated prepositions, pronoun phenomena, rhotacism, pronunciation of senhor/senhora, non-standard negation particles, non-standard plural marking in noun phrases, non-standard verb conjugation, and non-standard verb agreement. The VADIS methodology was not successful at fitting our data, because of data conversion from numerical to categorical and the amount of data available.
On the other hand, the non-parametric model was able to retrieve significant predictors for ten linguistic phenomena and show the sociolect similarity, but it did not capture any predictor interaction. However, the parametric model retrieved significant predictors for seven response variables and two double predictor interactions, displaying more intricate sociolect groupings. Therefore, according to the findings, the Poisson and Negative binomial models alongside EMM and CLD are productive methods to linguistically profile speakers through speech transcription. Furthermore, our study emphasized the role of sociolects as powerful social markers, uncovering complex relations between society and language. Finally, this thesis advances the sociolinguistics field by the implementation of computational methods in research about Brazilian Portuguese.
Subject:	Sociolinguistics; Speech acts (Linguistics); Corpus linguistics
language:	eng
metadata.dc.publisher.country:	Brazil
Publisher:	Universidade Federal de Minas Gerais
Publisher Initials:	UFMG
metadata.dc.publisher.department:	FALE - FACULDADE DE LETRAS
metadata.dc.publisher.program:	Programa de Pós-Graduação em Estudos Linguísticos
Rights:	Open Access
URI:	http://hdl.handle.net/1843/66155
Issue Date:	23-Feb-2024
Appears in Collections:	Master's Theses (Dissertações de Mestrado)

Files in This Item:

File	Description	Size	Format
Testing statistical methods for sociolinguistic profiling of Brazilian speakers.pdf		2.88 MB	Adobe PDF
