Reconhecimento de comandos de voz em português brasileiro em ambientes ruidosos usando laringofone

Bibliographic details
Year of defense: 2019
Main author: Ribeiro, Fábio Cisne
Advisor: Cortez, Paulo César
Examination committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: por
Defense institution: Not provided by the institution
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
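Keywords in Portuguese: Teleinformática; Reconhecimento automático da voz; Sistemas de reconhecimento de padrões; Redes neurais (Computação)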
Access link: http://www.repositorio.ufc.br/handle/riufc/40251
Abstract: The main objective of this thesis is to develop a speaker-independent system for recognizing isolated-word voice commands in noisy environments, with emphasis on the throat microphone (laryngophone), a speech acquisition sensor that is more robust in this type of environment. The technology studied takes the form of integrated hardware and software devices that allow speech to be used to operate technological equipment. To that end, the techniques best suited to the proposed voice processing were investigated. Because the surveyed literature contains no other dataset of Brazilian Portuguese voice commands captured with a throat microphone, a dataset of isolated voice commands was created from utterances of 150 people (men and women). All voice samples were captured in Brazilian Portuguese and comprise the digits “0” through “9” and the words “Ok” and “Cancelar” (“Cancel”). To remove the captured noise, two filters were used: Least Mean Squares (LMS) in the time domain and the Wavelet Transform in the frequency domain; together they removed the noise picked up by the throat microphone. The best feature extractor tested was Perceptual Linear Prediction (PLP), and its best configuration uses 9 or 10 as the order of its coefficients. For classification, a voting committee of three classifiers, a Multilayer Perceptron (MLP), a Binary Multilayer Perceptron (BMLP), and a Self-Organizing Map (SOM), was used to recognize the voice command. The results show that the throat microphone is robust in noisy environments, with the voice command recognition system reaching a 96.6% hit rate. Low-intensity vowels and the fricatives in the Portuguese words for “3” and “7” were observed to confuse the classifier.
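The voting committee mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the three votes are combined or how ties are resolved, so the plurality rule, the tie-break in favour of the first classifier, and the names `committee_vote` and `COMMANDS` are all assumptions made for illustration, not the thesis's implementation.

```python
from collections import Counter

# The twelve command labels named in the abstract: digits 0-9 plus two words.
COMMANDS = [str(d) for d in range(10)] + ["Ok", "Cancelar"]


def committee_vote(predictions):
    """Return the label chosen by a plurality of the committee members.

    `predictions` holds one label per classifier, for example
    [mlp_label, bmlp_label, som_label]. Ties are broken in favour of the
    earliest classifier in the list (an assumed rule, not from the thesis).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    # Walk in committee order so the first member wins any tie.
    for label in predictions:
        if counts[label] == best:
            return label


# Two of three members agree on "3", so the committee answers "3".
print(committee_vote(["3", "7", "3"]))
# All three disagree, so the assumed tie-break returns the first member's label.
print(committee_vote(["3", "7", "Ok"]))
```

With three members and twelve classes, a full three-way disagreement is possible, which is why some tie rule has to be chosen; a confidence-weighted vote would be another reasonable option.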
id UFC-7_96dd26b74fd0537bf0f15ffe976ded41
oai_identifier_str oai:repositorio.ufc.br:riufc/40251
network_acronym_str UFC-7
network_name_str Repositório Institucional da Universidade Federal do Ceará (UFC)
repository_id_str
dc.title.pt_BR.fl_str_mv Reconhecimento de comandos de voz em português brasileiro em ambientes ruidosos usando laringofone
title Reconhecimento de comandos de voz em português brasileiro em ambientes ruidosos usando laringofone
spellingShingle Reconhecimento de comandos de voz em português brasileiro em ambientes ruidosos usando laringofone
Ribeiro, Fábio Cisne
Teleinformática
Reconhecimento automático da voz
Sistemas de reconhecimento de padrões
Redes neurais (Computação)
Speech recognition
Throat microphone
Pattern recognition
Neural networks
title_short Reconhecimento de comandos de voz em português brasileiro em ambientes ruidosos usando laringofone
title_full Reconhecimento de comandos de voz em português brasileiro em ambientes ruidosos usando laringofone
title_fullStr Reconhecimento de comandos de voz em português brasileiro em ambientes ruidosos usando laringofone
title_full_unstemmed Reconhecimento de comandos de voz em português brasileiro em ambientes ruidosos usando laringofone
title_sort Reconhecimento de comandos de voz em português brasileiro em ambientes ruidosos usando laringofone
author Ribeiro, Fábio Cisne
author_facet Ribeiro, Fábio Cisne
author_role author
dc.contributor.author.fl_str_mv Ribeiro, Fábio Cisne
dc.contributor.advisor1.fl_str_mv Cortez, Paulo César
contributor_str_mv Cortez, Paulo César
dc.subject.por.fl_str_mv Teleinformática
Reconhecimento automático da voz
Sistemas de reconhecimento de padrões
Redes neurais (Computação)
Speech recognition
Throat microphone
Pattern recognition
Neural networks
topic Teleinformática
Reconhecimento automático da voz
Sistemas de reconhecimento de padrões
Redes neurais (Computação)
Speech recognition
Throat microphone
Pattern recognition
Neural networks
publishDate 2019
dc.date.accessioned.fl_str_mv 2019-03-12T11:11:39Z
dc.date.available.fl_str_mv 2019-03-12T11:11:39Z
dc.date.issued.fl_str_mv 2019
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv RIBEIRO, F. C. Reconhecimento de comandos de voz em português brasileiro em ambientes ruidosos usando laringofone. 2019. 17 f. Tese (Doutorado em Engenharia de Teleinformática)–Centro de Tecnologia, Universidade Federal do Ceará, Fortaleza, 2019.
dc.identifier.uri.fl_str_mv http://www.repositorio.ufc.br/handle/riufc/40251
identifier_str_mv RIBEIRO, F. C. Reconhecimento de comandos de voz em português brasileiro em ambientes ruidosos usando laringofone. 2019. 17 f. Tese (Doutorado em Engenharia de Teleinformática)–Centro de Tecnologia, Universidade Federal do Ceará, Fortaleza, 2019.
url http://www.repositorio.ufc.br/handle/riufc/40251
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Institucional da Universidade Federal do Ceará (UFC)
instname:Universidade Federal do Ceará (UFC)
instacron:UFC
instname_str Universidade Federal do Ceará (UFC)
instacron_str UFC
institution UFC
reponame_str Repositório Institucional da Universidade Federal do Ceará (UFC)
collection Repositório Institucional da Universidade Federal do Ceará (UFC)
bitstream.url.fl_str_mv http://repositorio.ufc.br/bitstream/riufc/40251/4/license.txt
http://repositorio.ufc.br/bitstream/riufc/40251/3/2019_tese_fcribeiro.pdf
bitstream.checksum.fl_str_mv 8a4605be74aa9ea9d79846c1fba20a33
19207b4fd7bdd9ca0b3589eda650d714
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC)
repository.mail.fl_str_mv bu@ufc.br || repositorio@ufc.br
_version_ 1847793099464507392