Please use this identifier to cite or link to this item:
http://hdl.handle.net/1843/66155
Type: | Master's dissertation |
Title: | Testing statistical methods for sociolinguistic profiling of Brazilian Portuguese speakers |
Other Titles: | Testando métodos estatísticos para o perfilamento sociolinguístico de falantes do Português Brasileiro |
Authors: | João Victor Pessoa Rocha |
First Advisor: | Heliana Ribeiro de Mello |
First Co-advisor: | Crysttian Arantes Paixão |
First Referee: | Flavio Codeco Coelho |
Second Referee: | Livia Oushiro |
Abstract: | This work constitutes a computationally driven and cross-methodological analysis of sociolectal marker recognition, positioning it in the growing area of Computational Sociolinguistics. This research had two main goals: (i) selecting an efficient method for sociolect (dis)similarity recognition; and (ii) describing how speech transcriptions can help profile a speaker. The main term we used to describe an in-group’s language was sociolect because we believe it is more accurate regarding what sociolinguists deal with. To this end, a spontaneous speech corpus of Brazilian Portuguese compiled according to the Language into Act theory (L-AcT) framework was used to extract the data. This linguistic resource provides, besides the transcriptions, the metadata information about the interaction and the speakers, sound files, sound-text alignment files, and transcriptions annotated with the PALAVRAS parser (Bick, 2000). To achieve the aforementioned goals, three methods were tested: (i) Variation-Based Distance and Similarity Modeling (VADIS) (Szmrecsanyi et al., 2019); (ii) the Mann-Whitney test; and (iii) Poisson and negative binomial models (parametric modeling) with Estimated Marginal Means (EMM) (Searle et al., 1980) and Compact Letter Display (CLD) (Piepho, 2004). Each method was assessed in relation to twelve linguistic variables: apheretic forms, apocopated diminutives, foreign words, interjections, reduced and articulated prepositions, pronoun phenomena, rhotacism, pronunciation of senhor/senhora, non-standard negation particles, non-standard plural marking in noun phrases, non-standard verb conjugation, and non-standard verb agreement. The VADIS methodology was not successful at fitting our data, because of the conversion of the data from numerical to categorical and the amount of data available.
On the other hand, the non-parametric Mann-Whitney test was able to retrieve significant predictors for ten linguistic phenomena and show the sociolect similarity, but it did not capture any predictor interaction. The parametric models, in contrast, retrieved significant predictors for seven response variables and two double predictor interactions, displaying more intricate sociolect groupings. Therefore, according to the findings, the Poisson and negative binomial models alongside EMM and CLD are productive methods to linguistically profile speakers through speech transcription. Furthermore, our study emphasized the role of sociolects as powerful social markers, uncovering complex relations between society and language. Finally, this thesis advances the sociolinguistics field by implementing computational methods in research about Brazilian Portuguese. |
Subject: | Sociolinguistics; Speech acts (Linguistics); Corpus linguistics |
Language: | eng |
Publisher Country: | Brasil |
Publisher: | Universidade Federal de Minas Gerais |
Publisher Initials: | UFMG |
Publisher Department: | FALE - FACULDADE DE LETRAS |
Publisher Program: | Programa de Pós-Graduação em Estudos Linguísticos |
Rights: | Open Access |
URI: | http://hdl.handle.net/1843/66155 |
Issue Date: | 23-Feb-2024 |
Appears in Collections: | Dissertações de Mestrado |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Testing statistical methods for sociolinguistic profiling of Brazilian speakers.pdf | | 2.88 MB | Adobe PDF |