Please use this identifier to cite or link to this item: http://hdl.handle.net/1843/49361
Full metadata record
DC Field | Value | Language
dc.creator | João Vítor Possamai de Menezes | pt_BR
dc.creator | Maria Mendes Cantoni | pt_BR
dc.creator | Denis Burnham | pt_BR
dc.creator | Adriano Vilela Barbosa | pt_BR
dc.date.accessioned | 2023-02-01T14:11:36Z | -
dc.date.available | 2023-02-01T14:11:36Z | -
dc.date.issued | 2020 | -
dc.citation.volume | 9 | pt_BR
dc.citation.spage | 93 | pt_BR
dc.citation.epage | 104 | pt_BR
dc.identifier.doi | https://doi.org/10.20396/joss.v9i00.14960 | pt_BR
dc.identifier.issn | 2236-9740 | pt_BR
dc.identifier.uri | http://hdl.handle.net/1843/49361 | -
dc.description.resumo | This work presents a method for lexical tone classification in audio-visual speech. The method is applied to a speech data set consisting of syllables and words produced by a female native speaker of Cantonese. The data were recorded in an audio-visual speech production experiment. The visual component of speech was measured by tracking the positions of active markers placed on the speaker's face, whereas the acoustic component was measured with an ordinary microphone. A pitch tracking algorithm is used to estimate F0 from the acoustic signal. A procedure for head motion compensation is applied to the tracked marker positions in order to separate the head and face motion components. The data are then organized into four signal groups: F0, Face, Head, Face+Head. The signals in each of these groups are parameterized by means of a polynomial approximation and then used to train an LDA (Linear Discriminant Analysis) classifier that maps the input signals into one of the output classes (the lexical tones of the language). One classifier is trained for each signal group. The ability of each signal group to predict the correct lexical tones was assessed by the accuracy of the corresponding LDA classifier. The accuracy of the classifiers was obtained by means of a k-fold cross-validation method. The classifiers for all signal groups performed above chance, with F0 achieving the highest accuracy, followed by Face+Head, Face, and Head, respectively. The differences in performance between all signal groups were statistically significant. | pt_BR
dc.format.mimetype | pdf | pt_BR
dc.language | eng | pt_BR
dc.publisher | Universidade Federal de Minas Gerais | pt_BR
dc.publisher.country | Brasil | pt_BR
dc.publisher.department | FALE - FACULDADE DE LETRAS | pt_BR
dc.publisher.initials | UFMG | pt_BR
dc.relation.ispartof | Journal of Speech Sciences | pt_BR
dc.rights | Acesso Aberto (Open Access) | pt_BR
dc.subject | Multimodal speech | pt_BR
dc.subject | Lexical tone | pt_BR
dc.subject | Cantonese language | pt_BR
dc.subject | Statistical learning | pt_BR
dc.subject | Linear discriminant analysis | pt_BR
dc.subject.other | Fala (Speech) | pt_BR
dc.title | A method for lexical tone classification in audio-visual speech | pt_BR
dc.type | Artigo de Periódico (Journal Article) | pt_BR
dc.url.externa | https://econtents.bc.unicamp.br/inpec/index.php/joss/article/view/14960 | pt_BR
dc.identifier.orcid | http://orcid.org/0000-0002-7612-9754 | pt_BR
dc.identifier.orcid | https://orcid.org/0000-0001-9515-1802 | pt_BR
dc.identifier.orcid | http://orcid.org/0000-0002-1980-3458 | pt_BR
dc.identifier.orcid | http://orcid.org/0000-0003-1083-8256 | pt_BR
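The classification pipeline summarized in the abstract (each signal parameterized by a polynomial approximation, then fed to an LDA classifier whose accuracy is estimated by k-fold cross-validation) can be sketched as below. This is a minimal illustration on synthetic contours, not the paper's actual data or settings: the polynomial degree, number of tokens, contour length, and class structure are all invented stand-ins.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)


def poly_features(signal, t, degree=3):
    # Fit a low-order polynomial to a time-varying signal (e.g. an F0
    # contour) and use its coefficients as a fixed-length feature vector.
    return np.polyfit(t, signal, degree)


# Synthetic stand-in data: 120 tokens, 50 samples per contour, 6 tone
# classes (Cantonese has six lexical tones). Shapes and values are
# assumptions for illustration only.
n_tokens, n_samples, n_tones = 120, 50, 6
t = np.linspace(0.0, 1.0, n_samples)
labels = rng.integers(0, n_tones, size=n_tokens)
# Give each class a different mean slope so the toy problem is learnable.
contours = np.array(
    [labels[i] * t + rng.normal(scale=0.3, size=n_samples)
     for i in range(n_tokens)]
)

# Parameterize every contour: (n_tokens, degree + 1) feature matrix.
X = np.array([poly_features(c, t) for c in contours])

# One LDA classifier per signal group; accuracy via k-fold cross-validation.
clf = LinearDiscriminantAnalysis()
scores = cross_val_score(clf, X, labels, cv=5)
print("mean accuracy:", scores.mean())
```

In the paper this procedure is repeated once per signal group (F0, Face, Head, Face+Head), and the groups are compared by their cross-validated accuracies; chance level for six tones would be about 0.17.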
Appears in Collections: Artigo de Periódico (Journal Article)

Files in This Item:
File | Description | Size | Format
A method for lexical tone classification in audio-visual speech.pdf | - | 394.82 kB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.