Please use this identifier to cite or link to this item: http://hdl.handle.net/1843/49361
Full metadata record
DC Field | Value | Language
dc.creator | João Vítor Possamai de Menezes | pt_BR
dc.creator | Maria Mendes Cantoni | pt_BR
dc.creator | Denis Burnham | pt_BR
dc.creator | Adriano Vilela Barbosa | pt_BR
dc.date.accessioned | 2023-02-01T14:11:36Z | -
dc.date.available | 2023-02-01T14:11:36Z | -
dc.date.issued | 2020 | -
dc.citation.volume | 9 | pt_BR
dc.citation.spage | 93 | pt_BR
dc.citation.epage | 104 | pt_BR
dc.identifier.doi | https://doi.org/10.20396/joss.v9i00.14960 | pt_BR
dc.identifier.issn | 2236-9740 | pt_BR
dc.identifier.uri | http://hdl.handle.net/1843/49361 | -
dc.description.resumo | This work presents a method for lexical tone classification in audio-visual speech. The method is applied to a speech data set consisting of syllables and words produced by a female native speaker of Cantonese. The data were recorded in an audio-visual speech production experiment. The visual component of speech was measured by tracking the positions of active markers placed on the speaker's face, whereas the acoustic component was measured with an ordinary microphone. A pitch tracking algorithm is used to estimate F0 from the acoustic signal. A procedure for head motion compensation is applied to the tracked marker positions in order to separate the head and face motion components. The data are then organized into four signal groups: F0, Face, Head, Face+Head. The signals in each of these groups are parameterized by means of a polynomial approximation and then used to train an LDA (Linear Discriminant Analysis) classifier that maps the input signals into one of the output classes (the lexical tones of the language). One classifier is trained for each signal group. The ability of each signal group to predict the correct lexical tones was assessed by the accuracy of the corresponding LDA classifier. The accuracy of the classifiers was obtained by means of a k-fold cross-validation method. The classifiers for all signal groups performed above chance, with F0 achieving the highest accuracy, followed by Face+Head, Face, and Head, respectively. The differences in performance between all signal groups were statistically significant. | pt_BR
dc.format.mimetype | pdf | pt_BR
dc.language | eng | pt_BR
dc.publisher | Universidade Federal de Minas Gerais | pt_BR
dc.publisher.country | Brasil | pt_BR
dc.publisher.department | FALE - FACULDADE DE LETRAS | pt_BR
dc.publisher.initials | UFMG | pt_BR
dc.relation.ispartof | Journal of Speech Sciences | pt_BR
dc.rights | Acesso Aberto (Open Access) | pt_BR
dc.subject | Multimodal speech | pt_BR
dc.subject | Lexical tone | pt_BR
dc.subject | Cantonese language | pt_BR
dc.subject | Statistical learning | pt_BR
dc.subject | Linear discriminant analysis | pt_BR
dc.subject.other | Fala (Speech) | pt_BR
dc.title | A method for lexical tone classification in audio-visual speech | pt_BR
dc.type | Artigo de Periódico (Journal Article) | pt_BR
dc.url.externa | https://econtents.bc.unicamp.br/inpec/index.php/joss/article/view/14960 | pt_BR
dc.identifier.orcid | http://orcid.org/0000-0002-7612-9754 | pt_BR
dc.identifier.orcid | https://orcid.org/0000-0001-9515-1802 | pt_BR
dc.identifier.orcid | http://orcid.org/0000-0002-1980-3458 | pt_BR
dc.identifier.orcid | http://orcid.org/0000-0003-1083-8256 | pt_BR
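The classification pipeline summarized in the abstract (each signal parameterized by a polynomial approximation, then fed to an LDA classifier whose accuracy is estimated by k-fold cross-validation) can be sketched as below. This is a minimal illustration on synthetic contours, not the paper's actual data or settings: the polynomial degree, number of tokens, contour length, and class structure are all invented stand-ins.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)


def poly_features(signal, t, degree=3):
    # Fit a low-order polynomial to a time-varying signal (e.g. an F0
    # contour) and use its coefficients as a fixed-length feature vector.
    return np.polyfit(t, signal, degree)


# Synthetic stand-in data: 120 tokens, 50 samples per contour, 6 tone
# classes (Cantonese has six lexical tones). Shapes and values are
# assumptions for illustration only.
n_tokens, n_samples, n_tones = 120, 50, 6
t = np.linspace(0.0, 1.0, n_samples)
labels = rng.integers(0, n_tones, size=n_tokens)
# Give each class a different mean slope so the toy problem is learnable.
contours = np.array(
    [labels[i] * t + rng.normal(scale=0.3, size=n_samples)
     for i in range(n_tokens)]
)

# Parameterize every contour: (n_tokens, degree + 1) feature matrix.
X = np.array([poly_features(c, t) for c in contours])

# One LDA classifier per signal group; accuracy via k-fold cross-validation.
clf = LinearDiscriminantAnalysis()
scores = cross_val_score(clf, X, labels, cv=5)
print("mean accuracy:", scores.mean())
```

In the paper this procedure is repeated once per signal group (F0, Face, Head, Face+Head), and the groups are compared by their cross-validated accuracies; chance level for six tones would be about 0.17.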
Appears in Collections: Artigo de Periódico (Journal Article)

Files in This Item:
File | Description | Size | Format
A method for lexical tone classification in audio-visual speech.pdf | - | 394.82 kB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.