Reconhecimento de comandos de voz em português brasileiro em ambientes ruidosos usando laringofone
| Year of defense: | 2019 |
|---|---|
| Main author: | Ribeiro, Fábio Cisne |
| Advisor: | Cortez, Paulo César |
| Examination committee: | |
| Document type: | Thesis |
| Access type: | Open access |
| Language: | por |
| Defense institution: | Not provided by the institution |
| Graduate program: | Not provided by the institution |
| Department: | Not provided by the institution |
| Country: | Not provided by the institution |
| Keywords in Portuguese: | |
| Access link: | http://www.repositorio.ufc.br/handle/riufc/40251 |
Abstract: | The main objective of this thesis is to develop a system for recognizing voice commands in noisy environments from isolated, speaker-independent words, with emphasis on the throat microphone (laryngophone), a speech acquisition sensor that is more robust in this type of environment. The technology studied takes the form of integrated hardware and software devices that allow speech to be used to operate technological equipment. We therefore investigated which techniques are best suited to the proposed voice processing. Because the surveyed literature contains no other database of voice commands captured with a throat microphone in Brazilian Portuguese, we created a database of isolated voice commands with utterances captured from 150 people (men and women). All voice samples are in Brazilian Portuguese and consist of the digits “0” through “9” and the words “Ok” and “Cancelar” (“Cancel”). Two filters were used to remove the captured noise: Least Mean Squares in the time domain and the Wavelet Transform in the frequency domain; together they removed the noise captured by the laryngophone. The best feature extractor tested was Perceptual Linear Prediction, and its best configuration uses a coefficient order of 9 or 10. For classification, a voting committee of three classifiers, Multilayer Perceptron (MLP), Binary Multilayer Perceptron (BMLP), and Self-Organizing Maps (SOM), recognizes the voice command. The results show that the throat microphone is robust in noisy environments, reaching a hit rate of 96.6% in our voice command recognition system. Low-intensity vowels and the fricatives present in the Portuguese words for “3” and “7” were observed to confuse the classifier. |
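The abstract describes a voting committee of three classifiers but does not say how the votes are combined. The sketch below illustrates one plausible reading, a plain majority vote, and is not taken from the thesis. The function name `committee_vote`, the `fallback_index` parameter, and the tie-break rule (fall back to one designated classifier when all three disagree) are assumptions made for illustration.

```python
from collections import Counter


def committee_vote(predictions, fallback_index=0):
    """Combine classifier outputs by majority vote (illustrative sketch).

    predictions: list of labels, one per classifier, e.g. the outputs of
    an MLP, a BMLP and a SOM for the same utterance.
    fallback_index: which classifier to trust when no label wins a strict
    majority. This tie-break rule is an assumption, not the thesis method.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    # Most frequent label and the number of classifiers that chose it.
    label, votes = Counter(predictions).most_common(1)[0]
    # Accept the label only if more than half of the classifiers agree.
    if votes > len(predictions) / 2:
        return label
    # No strict majority: defer to the designated classifier.
    return predictions[fallback_index]


if __name__ == "__main__":
    # Two of three hypothetical classifiers agree on the digit "3".
    print(committee_vote(["3", "7", "3"]))
```

With three members, any label chosen by two classifiers wins; the fallback only matters when all three disagree, which is the situation the abstract suggests can arise for easily confused words such as “3” and “7”.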
| id |
UFC-7_96dd26b74fd0537bf0f15ffe976ded41 |
|---|---|
| oai_identifier_str |
oai:repositorio.ufc.br:riufc/40251 |
| network_acronym_str |
UFC-7 |
| network_name_str |
Repositório Institucional da Universidade Federal do Ceará (UFC) |
| repository_id_str |
|
| dc.title.pt_BR.fl_str_mv |
Reconhecimento de comandos de voz em português brasileiro em ambientes ruidosos usando laringofone |
| title |
Reconhecimento de comandos de voz em português brasileiro em ambientes ruidosos usando laringofone |
| author |
Ribeiro, Fábio Cisne |
| author_facet |
Ribeiro, Fábio Cisne |
| author_role |
author |
| dc.contributor.author.fl_str_mv |
Ribeiro, Fábio Cisne |
| dc.contributor.advisor1.fl_str_mv |
Cortez, Paulo César |
| contributor_str_mv |
Cortez, Paulo César |
| dc.subject.por.fl_str_mv |
Teleinformática; Reconhecimento automático da voz; Sistemas de reconhecimento de padrões; Redes neurais (Computação); Speech recognition; Throat microphone; Pattern recognition; Neural networks |
| topic |
Teleinformática; Reconhecimento automático da voz; Sistemas de reconhecimento de padrões; Redes neurais (Computação); Speech recognition; Throat microphone; Pattern recognition; Neural networks |
| publishDate |
2019 |
| dc.date.accessioned.fl_str_mv |
2019-03-12T11:11:39Z |
| dc.date.available.fl_str_mv |
2019-03-12T11:11:39Z |
| dc.date.issued.fl_str_mv |
2019 |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
| format |
doctoralThesis |
| status_str |
publishedVersion |
| dc.identifier.citation.fl_str_mv |
RIBEIRO, F. C. Reconhecimento de comandos de voz em português brasileiro em ambientes ruidosos usando laringofone. 2019. 17 f. Tese (Doutorado em Engenharia de Teleinformática)–Centro de Tecnologia, Universidade Federal do Ceará, Fortaleza, 2019. |
| dc.identifier.uri.fl_str_mv |
http://www.repositorio.ufc.br/handle/riufc/40251 |
| identifier_str_mv |
RIBEIRO, F. C. Reconhecimento de comandos de voz em português brasileiro em ambientes ruidosos usando laringofone. 2019. 17 f. Tese (Doutorado em Engenharia de Teleinformática)–Centro de Tecnologia, Universidade Federal do Ceará, Fortaleza, 2019. |
| url |
http://www.repositorio.ufc.br/handle/riufc/40251 |
| dc.language.iso.fl_str_mv |
por |
| language |
por |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.source.none.fl_str_mv |
reponame:Repositório Institucional da Universidade Federal do Ceará (UFC) instname:Universidade Federal do Ceará (UFC) instacron:UFC |
| instname_str |
Universidade Federal do Ceará (UFC) |
| instacron_str |
UFC |
| institution |
UFC |
| reponame_str |
Repositório Institucional da Universidade Federal do Ceará (UFC) |
| collection |
Repositório Institucional da Universidade Federal do Ceará (UFC) |
| bitstream.url.fl_str_mv |
http://repositorio.ufc.br/bitstream/riufc/40251/4/license.txt http://repositorio.ufc.br/bitstream/riufc/40251/3/2019_tese_fcribeiro.pdf |
| bitstream.checksum.fl_str_mv |
8a4605be74aa9ea9d79846c1fba20a33 19207b4fd7bdd9ca0b3589eda650d714 |
| bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
| repository.name.fl_str_mv |
Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC) |
| repository.mail.fl_str_mv |
bu@ufc.br || repositorio@ufc.br |
| _version_ |
1847793099464507392 |