Annotation of non-canonical texts: an exploratory study of Grande sertão: veredas using Universal Dependencies

Description

Type

Conference paper

Abstract

This paper reports on an exploratory study of a sample of 175 sentences from the renowned Brazilian novel Grande Sertão: Veredas (literally "Great Backlands: Paths"; published in English as The Devil to Pay in the Backlands), which were annotated for part-of-speech (POS) tags and syntactic relations following the Universal Dependencies guidelines. The study explores the feasibility of annotating non-canonical text to create treebanks for Brazilian Portuguese. We computed the accuracy and precision of the model to identify which categories were annotated more and less successfully. The results show that the model performed slightly better on POS tags than on dependency relations, and that the categories requiring the most manual revision are those related to the orality phenomena that Guimarães Rosa represents in his novel. The study shows the potential of annotating non-canonical text to enhance existing models with categories that are underrepresented in treebanks.

Subject

Natural language processing (Computing), Corpus linguistics

Keywords

Universal Dependencies, Non-canonical text, Brazilian Portuguese

External link

https://aclanthology.org/2022.udfestbr-1.1.pdf
