Building the first English-Brazilian Portuguese: corpus for automatic post-editing

Descrição

Tipo

Artigo de evento

Título alternativo

Primeiro orientador

Membros da banca

Resumo

This paper introduces the first corpus for Automatic Post-Editing of English and a low-resource language, Brazilian Portuguese. The source English texts were extracted from the WebNLG corpus and automatically translated into Portuguese using a state-of-the-art industrial neural machine translator. Post-edits were then obtained in an experiment with native speakers of Brazilian Portuguese. To assess the quality of the corpus, we performed error analysis and computed complexity indicators measuring how difficult the APE task would be. We report preliminary results of Phrase-Based and Neural Machine Translation Models on this new corpus. Data and code publicly available in our repository.

Abstract

Assunto

Linguística de corpus, Tradução mecânica

Palavras-chave

Citação

Curso

Endereço externo

Avaliação

Revisão

Suplementado Por

Referenciado Por