Avaliação experimental de um classificador para apoiar a detecção de fraudes em compras públicas

Bibliographic details
Year of defense: 2022
Main author: Fontes, Raphael Silva
Advisor: Rodrigues Júnior, Methanias Colaço
Defense committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: por
Defending institution: Not provided by the institution
Graduate program: Pós-Graduação em Ciência da Computação
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Keywords in English:
CNPq knowledge area:
Access link: https://ri.ufs.br/jspui/handle/riufs/15098
Abstract:
Context: The United Nations (UN) describes corruption as an insidious plague that has a wide range of corrosive effects on societies. In practice, corruption takes many forms, from small payments to speed up the granting of licenses to large-scale fraud in bidding processes in different areas of the country. In health, for example, spending on medicines involves a significant volume of resources, about R$ 18 billion in 2018, potentially exposed to conduct harmful to the public purse. In fuel, another high-impact area, persistent debtors, those who fail to pay the tax due, were responsible for R$ 14 billion in tax evasion in 2020. To help combat these problems, the Electronic Invoices (NF-es) issued for the purchase of these products need to be classified and subtotaled automatically, considering their unique identification codes and descriptions. However, suppliers do not always register the codes correctly. Furthermore, when the product description is used as an alternative to the code, it is not a standardized field: it is free-text and variable. Finally, some products have a hierarchical classification in their descriptions, which is important for complete identification.
Objective: To build and evaluate the effectiveness of a classifier of invoices for fuels and medicines, based on mining the unstructured text of these invoices, in the context of purchases made by public bodies in the states of Sergipe and Rio Grande do Norte and analyzed by the State and Federal Prosecution Offices (MPE; MPF), the Special Action Group to Combat Organized Crime (GAECO), and the State Finance Departments.
Method: After the development and initial parameterization of the classifier, two controlled experiments were carried out with NF-es held by the prosecution offices (MPs), respecting the fiscal secrecy of those involved.
Results: Considering statistical significance, the classifier was able to identify medicine descriptions and their hierarchical subclasses, with the following average results: accuracy of 99.81%, precision of 100%, recall (sensitivity) of 99.64%, and F1-measure of 99.82%. For fuels, the classifier reached an accuracy of 100% and an F1-measure of 100%.
Conclusion: The results show that it is feasible to automate the classification of fuels and medicines, which supports investigations. For medicines, it was also possible to extract the hierarchical subclasses of the descriptions, namely: active ingredient, dosage, pharmaceutical form, and quantity.
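As a quick consistency check on the medicine figures reported in the abstract, the F1-measure is the harmonic mean of precision and recall. The sketch below is not taken from the dissertation; the function name `f1_score` is illustrative, and it assumes the reported averages can be plugged in directly, which is only approximately valid because an average of per-run F1 values need not equal the F1 of the averaged precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # conventional value when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Average values reported in the abstract for medicines (assumed usable as-is).
precision = 1.0     # 100%
recall = 0.9964     # 99.64%

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2%}")  # prints "F1 = 99.82%"
```

Under that assumption, the computed value agrees with the reported F1-measure of 99.82%.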
id UFS-2_2583f4ca92c6c4cbc90daf6b6763b615
oai_identifier_str oai:ufs.br:riufs/15098
network_acronym_str UFS-2
network_name_str Repositório Institucional da UFS
repository_id_str
dc.title.pt_BR.fl_str_mv Avaliação experimental de um classificador para apoiar a detecção de fraudes em compras públicas
title Avaliação experimental de um classificador para apoiar a detecção de fraudes em compras públicas
spellingShingle Avaliação experimental de um classificador para apoiar a detecção de fraudes em compras públicas
Fontes, Raphael Silva
Medicamento
Combustível
Corrupção
Mineração de dados
Nota Fiscal Eletrônica
Medicine
Fuel
Corruption
Data mining
Electronic invoice
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
title_short Avaliação experimental de um classificador para apoiar a detecção de fraudes em compras públicas
title_full Avaliação experimental de um classificador para apoiar a detecção de fraudes em compras públicas
title_fullStr Avaliação experimental de um classificador para apoiar a detecção de fraudes em compras públicas
title_full_unstemmed Avaliação experimental de um classificador para apoiar a detecção de fraudes em compras públicas
title_sort Avaliação experimental de um classificador para apoiar a detecção de fraudes em compras públicas
author Fontes, Raphael Silva
author_facet Fontes, Raphael Silva
author_role author
dc.contributor.author.fl_str_mv Fontes, Raphael Silva
dc.contributor.advisor1.fl_str_mv Rodrigues Júnior, Methanias Colaço
contributor_str_mv Rodrigues Júnior, Methanias Colaço
dc.subject.por.fl_str_mv Medicamento
Combustível
Corrupção
Mineração de dados
Nota Fiscal Eletrônica
Medicine
topic Medicamento
Combustível
Corrupção
Mineração de dados
Nota Fiscal Eletrônica
Medicine
Fuel
Corruption
Data mining
Electronic invoice
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
dc.subject.eng.fl_str_mv Fuel
Corruption
Data mining
Electronic invoice
dc.subject.cnpq.fl_str_mv CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
publishDate 2022
dc.date.accessioned.fl_str_mv 2022-03-04T12:49:50Z
dc.date.available.fl_str_mv 2022-03-04T12:49:50Z
dc.date.issued.fl_str_mv 2022-01-31
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv FONTES, Raphael Silva. Avaliação experimental de um classificador para apoiar a detecção de fraudes em compras públicas. 2022. 68 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Sergipe, São Cristóvão, 2022.
dc.identifier.uri.fl_str_mv https://ri.ufs.br/jspui/handle/riufs/15098
identifier_str_mv FONTES, Raphael Silva. Avaliação experimental de um classificador para apoiar a detecção de fraudes em compras públicas. 2022. 68 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Sergipe, São Cristóvão, 2022.
url https://ri.ufs.br/jspui/handle/riufs/15098
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.program.fl_str_mv Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv Universidade Federal de Sergipe
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFS
instname:Universidade Federal de Sergipe (UFS)
instacron:UFS
instname_str Universidade Federal de Sergipe (UFS)
instacron_str UFS
institution UFS
reponame_str Repositório Institucional da UFS
collection Repositório Institucional da UFS
bitstream.url.fl_str_mv https://ri.ufs.br/jspui/bitstream/riufs/15098/3/RAPHAEL_SILVA_FONTES.pdf.txt
https://ri.ufs.br/jspui/bitstream/riufs/15098/4/RAPHAEL_SILVA_FONTES.pdf.jpg
https://ri.ufs.br/jspui/bitstream/riufs/15098/1/license.txt
https://ri.ufs.br/jspui/bitstream/riufs/15098/2/RAPHAEL_SILVA_FONTES.pdf
bitstream.checksum.fl_str_mv 76936b47f383610eaa26f8506f291ec9
944c1dd4533c1d16a465dab3e15042df
098cbbf65c2c15e1fb2e49c5d306a44c
83f9922f959cbf5d61693455a72e0249
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
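The MD5 values above can be used to check a downloaded bitstream for corruption. The following is a minimal sketch, assuming the checksums are listed in the same order as the bitstream URLs (so the fourth value belongs to the PDF) and that the file has been saved locally under its original name; the helper `md5_of_file` and the local path are illustrative, not part of the repository.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Fourth checksum in the record, assumed to correspond to the PDF bitstream.
EXPECTED_MD5 = "83f9922f959cbf5d61693455a72e0249"
local_copy = Path("RAPHAEL_SILVA_FONTES.pdf")  # assumed download location

if local_copy.exists():
    matches = md5_of_file(local_copy) == EXPECTED_MD5
    print("checksum matches" if matches else "checksum mismatch")
else:
    print("file not found; download it first")
```

MD5 is adequate here for detecting accidental corruption in transfer, but it is not collision-resistant and should not be relied on as proof of authenticity.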
repository.name.fl_str_mv Repositório Institucional da UFS - Universidade Federal de Sergipe (UFS)
repository.mail.fl_str_mv repositorio@academico.ufs.br
_version_ 1802111161300156416