Building the first English-Brazilian Portuguese corpus for automatic post-editing

dc.creator: Felipe de Almeida Costa
dc.creator: Thiago Castro Ferreira
dc.creator: Adriana Silvina Pagano
dc.creator: Wagner Meira Junior
dc.date.accessioned: 2024-03-14T23:40:11Z
dc.date.accessioned: 2025-09-09T01:32:52Z
dc.date.available: 2024-03-14T23:40:11Z
dc.date.issued: 2020-12
dc.format.mimetype: pdf
dc.identifier.doi: https://doi.org/10.18653/v1/2020.coling-main.533
dc.identifier.uri: https://hdl.handle.net/1843/65891
dc.language: eng
dc.publisher: Universidade Federal de Minas Gerais
dc.relation.ispartof: International Conference on Computational Linguistics
dc.rights: Open Access
dc.subject: Corpus linguistics
dc.subject: Machine translation
dc.title: Building the first English-Brazilian Portuguese corpus for automatic post-editing
dc.type: Conference paper
local.citation.epage: 6069
local.citation.issue: 28
local.citation.spage: 6063
local.description.resumo: This paper introduces the first corpus for Automatic Post-Editing of English and a low-resource language, Brazilian Portuguese. The source English texts were extracted from the WebNLG corpus and automatically translated into Portuguese using a state-of-the-art industrial neural machine translator. Post-edits were then obtained in an experiment with native speakers of Brazilian Portuguese. To assess the quality of the corpus, we performed error analysis and computed complexity indicators measuring how difficult the APE task would be. We report preliminary results of Phrase-Based and Neural Machine Translation Models on this new corpus. Data and code are publicly available in our repository.
local.identifier.orcid: https://orcid.org/0000-0002-3150-3503
local.identifier.orcid: https://orcid.org/0000-0003-0200-3646
local.identifier.orcid: https://orcid.org/0000-0002-2614-2723
local.publisher.country: Brazil
local.publisher.department: FALE - FACULDADE DE LETRAS
local.publisher.initials: UFMG

Files

Original bundle

Name: Building the first English-Brazilian Portuguese corpus for automatic post-editing.pdf
Size: 161.91 KB
Format: Adobe Portable Document Format

License bundle

Name: License.txt
Size: 1.99 KB
Format: Plain Text