Testing statistical methods for sociolinguistic profiling of Brazilian Portuguese speakers

João Victor Pessoa Rocha

dc.creator	João Victor Pessoa Rocha
dc.date.accessioned	2024-03-20T15:39:06Z
dc.date.accessioned	2025-09-08T23:42:28Z
dc.date.available	2024-03-20T15:39:06Z
dc.date.issued	2024-02-23
dc.description.sponsorship	CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
dc.identifier.uri	https://hdl.handle.net/1843/66155
dc.language	eng
dc.publisher	Universidade Federal de Minas Gerais
dc.rights	Open Access
dc.subject	Sociolinguistics
dc.subject	Speech acts (Linguistics)
dc.subject	Corpus linguistics
dc.subject.other	sociolinguistic profiling
dc.subject.other	sociolects
dc.subject.other	speaker
dc.subject.other	modeling
dc.title	Testing statistical methods for sociolinguistic profiling of Brazilian Portuguese speakers
dc.title.alternative	Testando métodos estatísticos para o perfilamento sociolinguístico de falantes do Português Brasileiro
dc.type	Master's dissertation
local.contributor.advisor-co1	Crysttian Arantes Paixão
local.contributor.advisor1	Heliana Ribeiro de Mello
local.contributor.advisor1Lattes	http://lattes.cnpq.br/5724573734505786
local.contributor.referee1	Flavio Codeco Coelho
local.contributor.referee1	Livia Oushiro
local.creator.Lattes	http://lattes.cnpq.br/6202477988100226
local.description.resumo	This work is a computationally driven, cross-methodological analysis of sociolectal marker recognition, situated in the growing area of Computational Sociolinguistics. The research had two main goals: (i) selecting an efficient method for recognizing sociolect (dis)similarity; and (ii) describing how speech transcriptions can help profile a speaker. We use the term sociolect to describe an in-group's language because we believe it more accurately reflects what sociolinguists study. The data were extracted from a spontaneous speech corpus of Brazilian Portuguese compiled according to the Language into Act Theory (L-AcT) framework. In addition to the transcriptions, this resource provides metadata about the interactions and the speakers, sound files, sound-text alignment files, and transcriptions annotated with the PALAVRAS parser (Bick, 2000). To achieve these goals, three methods were tested: (i) Variation-Based Distance and Similarity Modeling (VADIS) (Szmrecsanyi et al., 2019); (ii) the Mann-Whitney test; and (iii) Poisson and negative binomial regression (parametric modeling) with Estimated Marginal Means (EMM) (Searle et al., 1980) and Compact Letter Display (CLD) (Piepho, 2004). Each method was assessed on twelve linguistic variables: apheretic forms, apocopated diminutives, foreign words, interjections, reduced and articulated prepositions, pronoun phenomena, rhotacism, pronunciation of senhor/senhora, non-standard negation particles, non-standard plural marking in noun phrases, non-standard verb conjugation, and non-standard verb agreement. The VADIS methodology failed to fit our data because it required converting numerical data to categorical data and because of the limited amount of data available.
The non-parametric approach, in contrast, retrieved significant predictors for ten linguistic phenomena and revealed sociolect similarities, but it captured no predictor interactions. The parametric models retrieved significant predictors for seven response variables, as well as two two-way predictor interactions, and revealed more intricate sociolect groupings. These findings indicate that Poisson and negative binomial models, combined with EMM and CLD, are productive methods for linguistically profiling speakers from speech transcriptions. The study also highlights the role of sociolects as powerful social markers, uncovering complex relations between society and language. Finally, this thesis advances the field of sociolinguistics by applying computational methods to research on Brazilian Portuguese.
local.publisher.country	Brazil
local.publisher.department	FALE - FACULDADE DE LETRAS
local.publisher.initials	UFMG
local.publisher.program	Programa de Pós-Graduação em Estudos Linguísticos
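The parametric pipeline the abstract recommends (count model, then EMM, then CLD) can be illustrated with a minimal Python sketch. It is not the author's code, and several parts of it are assumptions:

- The speaker data and all helper names (`speakers`, `group_rates`, `pairwise_significant`, `compact_letters`) are hypothetical.
- Only a Poisson model is covered, with one categorical social predictor and a log(word tokens) offset. There is no negative binomial model and there are no interactions.
- Pairwise comparisons use Bonferroni-adjusted Wald z-tests on log rates.
- The letters come from the maximal cliques of the "not significantly different" graph. This replaces the insert-and-absorb algorithm of Piepho (2004), but it still guarantees that two groups share a letter exactly when they are not significantly different.

With a single factor, each group's Poisson estimate is its total count divided by its total tokens, and that fitted rate is also its EMM.

```python
import math
from itertools import combinations

# HYPOTHETICAL illustration data (not from the dissertation):
# (social group, occurrences of one linguistic variable, speaker's word tokens)
speakers = [
    ("A", 12, 1500), ("A", 9, 1100), ("A", 15, 1700),
    ("B", 30, 1600), ("B", 25, 1400), ("B", 28, 1500),
    ("C", 14, 1200), ("C", 20, 1500), ("C", 16, 1300),
]


def group_rates(data):
    """Poisson MLE per group with a log(tokens) offset: rate = sum(y) / sum(t).

    With one categorical predictor, these fitted rates are also the EMMs.
    Returns {group: (rate, total_count)}; Var(log rate) is about 1 / total_count.
    """
    totals = {}
    for group, y, t in data:
        y_sum, t_sum = totals.get(group, (0, 0))
        totals[group] = (y_sum + y, t_sum + t)
    return {g: (y / t, y) for g, (y, t) in totals.items()}


def pairwise_significant(rates, alpha=0.05):
    """Bonferroni-adjusted Wald z-tests on log-rate differences (an assumed choice)."""
    pairs = list(combinations(sorted(rates), 2))
    significant = set()
    for a, b in pairs:
        (ra, ya), (rb, yb) = rates[a], rates[b]
        z = (math.log(ra) - math.log(rb)) / math.sqrt(1 / ya + 1 / yb)
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        if min(1.0, p * len(pairs)) < alpha:
            significant.add(frozenset((a, b)))
    return significant


def compact_letters(groups, significant):
    """One letter per maximal clique of the 'not significantly different' graph."""
    groups = sorted(groups)
    similar = {
        g: {h for h in groups if h != g and frozenset((g, h)) not in significant}
        for g in groups
    }
    cliques = []

    def bron_kerbosch(r, p, x):  # classic maximal-clique enumeration
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in sorted(p):
            bron_kerbosch(r | {v}, p & similar[v], x & similar[v])
            p = p - {v}
            x = x | {v}

    bron_kerbosch(set(), set(groups), set())
    letters = {g: "" for g in groups}
    for letter, clique in zip("abcdefghijklmnopqrstuvwxyz", sorted(cliques)):
        for g in clique:
            letters[g] += letter
    return letters


rates = group_rates(speakers)
letters = compact_letters(rates, pairwise_significant(rates))
for g in sorted(rates):
    print(g, round(1000 * rates[g][0], 2), letters[g])  # rate per 1,000 tokens
# A 8.37 a
# B 18.44 b
# C 12.5 ab
```

Groups that share a letter are not significantly different. In this toy example, A and B differ, while C overlaps with both. With overdispersed counts, a negative binomial model would be used instead of the Poisson, as in the thesis.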

Files

Original bundle

Name: Testing statistical methods for sociolinguistic profiling of Brazilian speakers.pdf
Size: 2.82 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.07 KB
Format: Plain Text

Collections

Pós-Graduação em Estudos Lingüísticos - Dissertações