Testing statistical methods for sociolinguistic profiling of Brazilian Portuguese speakers

dc.creator: João Victor Pessoa Rocha
dc.date.accessioned: 2024-03-20T15:39:06Z
dc.date.accessioned: 2025-09-08T23:42:28Z
dc.date.available: 2024-03-20T15:39:06Z
dc.date.issued: 2024-02-23
dc.description.sponsorship: CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
dc.identifier.uri: https://hdl.handle.net/1843/66155
dc.language: eng
dc.publisher: Universidade Federal de Minas Gerais
dc.rights: Open Access
dc.subject: Sociolinguistics
dc.subject: Speech acts (Linguistics)
dc.subject: Corpus linguistics
dc.subject.other: sociolinguistic profiling
dc.subject.other: sociolects
dc.subject.other: speaker
dc.subject.other: modeling
dc.title: Testing statistical methods for sociolinguistic profiling of Brazilian Portuguese speakers
dc.title.alternative: Testando métodos estatísticos para o perfilamento sociolinguístico de falantes do Português Brasileiro
dc.type: Master's dissertation
local.contributor.advisor-co1: Crysttian Arantes Paixão
local.contributor.advisor1: Heliana Ribeiro de Mello
local.contributor.advisor1Lattes: http://lattes.cnpq.br/5724573734505786
local.contributor.referee1: Flavio Codeco Coelho
local.contributor.referee1: Livia Oushiro
local.creator.Lattes: http://lattes.cnpq.br/6202477988100226
local.description.resumo: This work is a computationally driven, cross-methodological analysis of sociolectal marker recognition, situated in the growing area of Computational Sociolinguistics. The research had two main goals: (i) selecting an efficient method for recognizing sociolect (dis)similarity; and (ii) describing how speech transcriptions can help profile a speaker. We adopted sociolect as the main term for an in-group's language because we believe it reflects more accurately what sociolinguists deal with. To this end, the data were extracted from a spontaneous speech corpus of Brazilian Portuguese compiled according to the Language into Act Theory (L-AcT) framework. Besides the transcriptions, this linguistic resource provides metadata about the interactions and the speakers, sound files, sound-text alignment files, and transcriptions annotated with the PALAVRAS parser (Bick, 2000). To achieve these goals, three methods were tested: (i) Variation-Based Distance and Similarity Modeling (VADIS) (Szmrecsanyi et al., 2019); (ii) the Mann-Whitney test; and (iii) Poisson and negative binomial regression (parametric modeling) with Estimated Marginal Means (EMM) (Searle et al., 1980) and Compact Letter Display (CLD) (Piepho, 2004). Each method was assessed on twelve linguistic variables: apheretic forms, apocopated diminutives, foreign words, interjections, reduced and articulated prepositions, pronoun phenomena, rhotacism, pronunciation of senhor/senhora, non-standard negation particles, non-standard plural marking in noun phrases, non-standard verb conjugation, and non-standard verb agreement. The VADIS methodology did not fit our data well, owing to the conversion of the data from numerical to categorical and to the amount of data available.
On the other hand, the non-parametric method (the Mann-Whitney test) retrieved significant predictors for ten linguistic phenomena and revealed sociolect similarity, but it did not capture any interaction between predictors. The parametric models, in turn, retrieved significant predictors for seven response variables as well as two two-way interactions between predictors, yielding more intricate sociolect groupings. The findings therefore indicate that Poisson and negative binomial models combined with EMM and CLD are productive methods for linguistically profiling speakers from speech transcriptions. Furthermore, the study highlights the role of sociolects as powerful social markers, uncovering complex relations between society and language. Finally, this thesis advances the field of sociolinguistics by applying computational methods to research on Brazilian Portuguese.
local.publisher.country: Brazil
local.publisher.department: FALE - FACULDADE DE LETRAS
local.publisher.initials: UFMG
local.publisher.program: Programa de Pós-Graduação em Estudos Linguísticos
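As an illustration of the non-parametric comparison named in the abstract, the following minimal Python sketch implements a two-sided Mann-Whitney U test. It assumes, hypothetically, that each speaker is summarized by one normalized frequency of a linguistic variable and that speakers are split into two social groups; the record does not describe how the dissertation prepared its data. The function names and sample values are illustrative only. The p-value uses the tie-corrected normal approximation without continuity correction, so an exact test (for instance `scipy.stats.mannwhitneyu` with `method="exact"`) is preferable for small samples without ties.

```python
import math
from typing import Sequence


def rank_with_ties(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, two-sided p-value).
    Hypothetical helper for illustration, not the dissertation's code.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    # U for x: rank sum of x minus its minimum possible value.
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    counts: dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    sigma = math.sqrt(max(0.0, variance))
    if sigma == 0:
        # Every value is tied: no evidence of a difference.
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


if __name__ == "__main__":
    # Invented per-speaker rates (occurrences per 1,000 words), NOT dissertation data.
    group_a = [2.1, 3.4, 1.8, 4.0, 2.9]
    group_b = [5.2, 4.8, 6.1, 3.9, 5.5]
    u, p = mann_whitney_u(group_a, group_b)
    print(f"U = {u}, p = {p:.4f}")
```

A low p-value would only indicate that the two groups' distributions of the variable differ. Unlike the parametric models, the test compares one grouping at a time and does not model interactions between predictors.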

Files

Original bundle

Name: Testing statistical methods for sociolinguistic profiling of Brazilian speakers.pdf
Size: 2.82 MB
Format: Adobe Portable Document Format
