Classificação associativa sob demanda

Bibliographic details
Year of defense: 2009
Main author: Adriano Alonso Veloso
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: por
Defending institution: Universidade Federal de Minas Gerais
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://hdl.handle.net/1843/SLSS-7WFMGG
Abstract: The ultimate goal of machines is to help humans solve problems. The solutions for such problems are typically programmed by experts, and the machines need only follow the specified steps to solve the problem. However, the solution of some problems may be too difficult to be explicitly programmed. In such difficult cases, instead of directly programming machines to solve the problem, machines can be programmed to learn the solution. Machine Learning encompasses techniques used to program machines to learn. It is one of the fastest-growing research areas today, mainly motivated by the fact that the advent of improved learning techniques would open up many new uses for machines (i.e., problems for which the solution is hard to program by hand). A prominent approach to machine learning is to repeatedly demonstrate how the problem is solved and let the machine learn by example, so that it generalizes some rules about the solution and turns these into a program. This process is known as supervised learning. Specifically, the machine takes matched values of inputs (instantiations of the problem to be solved) and outputs (the solution) and absorbs whatever information their relation contains in order to emulate the true mapping of inputs to outputs. When outputs are drawn from a pre-specified and finite set of possibilities, the process is known as classification, which is a major data mining task. Some classification problems are hard to solve, and they motivate this thesis. The key insight exploited in this thesis is that a difficult problem can be decomposed into several much simpler sub-problems. This thesis shows that, instead of directly solving a difficult problem, independently solving its sub-problems while taking into account their particular demands often leads to improved classification performance.
This is shown empirically by solving real-world problems (for which the solutions are hard to program) using the computationally efficient algorithms presented in this thesis. These problems include categorization of documents and name disambiguation in digital libraries, ranking documents retrieved by search engines, protein functional analysis, and revenue optimization, among others. Improvements in classification performance are reported for all these problems (in some cases with gains of more than 100%). Further, theoretical evidence supporting our algorithms is also provided.
id UFMG_0d24e7b4e9277f3f830eeba6abcbfb63
oai_identifier_str oai:repositorio.ufmg.br:1843/SLSS-7WFMGG
network_acronym_str UFMG
network_name_str Repositório Institucional da UFMG
repository_id_str
spelling Contributors (roles not stated in the record): Wagner Meira Junior; André Carlos Ponce de Leon Ferreira de Carvalho; Bianca Zadrozny; Mohammed J. Zaki; Alberto Henrique Frade Laender; Marcos Andre Goncalves
Abstract (translated from the Portuguese version): The primary goal of machines is to help people solve problems. The solutions to such problems are usually programmed by experts, so that the machines need only follow the steps specified in the program. However, the solutions to some problems are too difficult to be programmed explicitly. In these cases, instead of programming the machine to solve the problem, the machine is programmed to learn the solution to the problem. Machine Learning comprises the development of techniques that can be used to program machines to learn.
One approach to machine learning is to repeatedly demonstrate to the machine how the problem is solved and simply let it learn from these examples, so that it can generalize rules about the solution and finally turn those rules into a program that solves the problem. This process is called supervised learning. In this case, examples of inputs and their respective outputs are provided, so that the machine can, after absorbing as much information as possible from these examples, emulate the mapping of inputs to outputs. When the outputs take pre-specified values, this process is called classification. Classification is one of the most traditional tasks in data mining. Some classification problems are extremely difficult to solve, and they motivate this thesis. The intuition explored in this thesis is that a hard problem can be decomposed into several simpler sub-problems. This thesis shows that independently solving simpler sub-problems, instead of directly solving a difficult problem, generally leads to better results. This is shown empirically by solving useful and important problems using the algorithms presented in this thesis. Such problems include document categorization and disambiguation in digital libraries, ranking of documents returned by search engines, and revenue optimization, among many others. Gains in effectiveness are reported for all these problems (in some cases with gains greater than 100%).
In addition, we present theoretical evidence that supports our algorithms.
Files: adrianoalonsoveloso.pdf (application/pdf, size 3608755); adrianoalonsoveloso.pdf.txt (text/plain, size 319269)
dc.title.none.fl_str_mv Classificação associativa sob demanda
title Classificação associativa sob demanda
spellingShingle Classificação associativa sob demanda
Adriano Alonso Veloso
Computação
Mineração de dados (Computação)
author Adriano Alonso Veloso
author_facet Adriano Alonso Veloso
author_role author
dc.contributor.author.fl_str_mv Adriano Alonso Veloso
dc.subject.por.fl_str_mv Computação
Mineração de dados (Computação)
topic Computação
Mineração de dados (Computação)
publishDate 2009
dc.date.issued.fl_str_mv 2009-03-09
dc.date.accessioned.fl_str_mv 2019-08-12T12:15:43Z
2025-09-09T00:15:22Z
dc.date.available.fl_str_mv 2019-08-12T12:15:43Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/1843/SLSS-7WFMGG
url https://hdl.handle.net/1843/SLSS-7WFMGG
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal de Minas Gerais
publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFMG
instname:Universidade Federal de Minas Gerais (UFMG)
instacron:UFMG
instname_str Universidade Federal de Minas Gerais (UFMG)
instacron_str UFMG
institution UFMG
reponame_str Repositório Institucional da UFMG
collection Repositório Institucional da UFMG
bitstream.url.fl_str_mv https://repositorio.ufmg.br//bitstreams/8131f96c-dc29-464e-8cfb-f14e4a0b7ed5/download
https://repositorio.ufmg.br//bitstreams/09477370-4a35-426d-af18-c3f65e50b46b/download
bitstream.checksum.fl_str_mv afb64040d438e87c1be979a018a57a1d
3b6d5ea6b00e9c9509838367d0f99a90
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv repositorio@ufmg.br
_version_ 1862106007195353088