CorpuScript: an automated text-cleaning tool for corpus linguistics
Publisher
Universidade Federal de Minas Gerais
Type
Conference paper
Abstract
Corpus compilation remains a significant challenge in corpus linguistics. This paper introduces CorpuScript, a text-cleaning tool designed to help researchers prepare corpora. By combining software engineering with corpus linguistics methods, the tool can significantly improve the corpus compilation workflow, specifically the task of corpus cleaning. The need for CorpuScript emerged from recurring challenges faced by our research team, particularly during our current corpus research project, in which a considerably large number of texts had to be cleaned before being used for data analysis. Given the pressing need for an automated solution to improve the text-cleaning process in that project, CorpuScript was developed to help us accelerate corpus compilation while meeting the requirements outlined in our corpus design.
Subject
Corpus linguistics, Computational linguistics, Software engineering
External link
https://www.elc-ebralc.net.br/_files/ugd/75f182_29b728735a1a48b99531566f48678cdc.pdf