Classificação associativa sob demanda

Adriano Alonso Veloso

Bibliographic details
Year of defense:	2009
Main author:	Adriano Alonso Veloso
Advisor:	Not provided by the institution
Defense committee:	Not provided by the institution
Document type:	Thesis
Access type:	Open access
Language:	por
Defending institution:	Universidade Federal de Minas Gerais
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords in Portuguese:	Computação Mineração de dados (Computação)
Access link:	https://hdl.handle.net/1843/SLSS-7WFMGG
Abstract:	The ultimate goal of machines is to help humans solve problems. The solutions to such problems are typically programmed by experts, and the machines need only follow the specified steps to solve the problem. However, the solution to some problems may be too difficult to program explicitly. In such difficult cases, instead of directly programming machines to solve the problem, machines can be programmed to learn the solution. Machine Learning encompasses techniques used to program machines to learn. It is one of the fastest-growing research areas today, mainly motivated by the fact that the advent of improved learning techniques would open up many new uses for machines (i.e., problems for which the solution is hard to program by hand). A prominent approach to machine learning is to repeatedly demonstrate how the problem is solved and let the machine learn by example, so that it generalizes some rules about the solution and turns these into a program. This process is known as supervised learning. Specifically, the machine takes matched values of inputs (instantiations of the problem to be solved) and outputs (the solution) and absorbs whatever information their relation contains in order to emulate the true mapping of inputs to outputs. When outputs are drawn from a pre-specified and finite set of possibilities, the process is known as classification, which is a major data mining task. Some classification problems are hard to solve, and they motivate this thesis. The key insight exploited in this thesis is that a difficult problem can be decomposed into several much simpler sub-problems. This thesis shows that, instead of directly solving a difficult problem, independently solving its sub-problems by taking into account their particular demands often leads to improved classification performance.
This is shown empirically, by solving real-world problems (for which the solutions are hard to program) using the computationally efficient algorithms presented in this thesis. These problems include categorization of documents and name disambiguation in digital libraries, ranking documents retrieved by search engines, protein functional analysis, and revenue optimization, among others. Improvements in classification performance are reported for all these problems (in some cases with gains of more than 100%). Further, theoretical evidence supporting our algorithms is also provided.

Item metadata

id	UFMG_0d24e7b4e9277f3f830eeba6abcbfb63
oai_identifier_str	oai:repositorio.ufmg.br:1843/SLSS-7WFMGG
network_acronym_str	UFMG
network_name_str	Repositório Institucional da UFMG
repository_id_str
spelling	2019-08-12T12:15:43Z; 2025-09-09T00:15:22Z; 2019-08-12T12:15:43Z; 2009-03-09; https://hdl.handle.net/1843/SLSS-7WFMGG; Universidade Federal de Minas Gerais; Computação; Mineração de dados (Computação); Classificação associativa sob demanda; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/doctoralThesis; Adriano Alonso Veloso; info:eu-repo/semantics/openAccess; por; reponame:Repositório Institucional da UFMG; instname:Universidade Federal de Minas Gerais (UFMG); instacron:UFMG; Wagner Meira Junior; André Carlos Ponce de Leon Ferreira de Carvalho; Bianca Zadrozny; Mohammed J. Zaki; Alberto Henrique Frade Laender; Marcos Andre Goncalves; The primary goal of machines is to help people solve problems. The solutions to such problems are usually programmed by experts, so that the machines need only follow the steps specified in the program. However, the solutions to some problems are very difficult to program explicitly. In these cases, instead of programming the machine to solve the problem, the machine is programmed to learn the solution to that problem. Machine Learning comprises the development of techniques that can be used to program machines to learn. One approach to machine learning is to show the machine, repeatedly, how the problem is solved, and simply let it learn from these examples, so that it can generalize rules about the solution and finally turn those rules into a program that solves the problem. This process is called supervised learning. In this case, examples of inputs and their respective outputs are provided, so that the machine can, after absorbing as much information as possible from these examples, emulate the mapping from inputs to outputs. When the outputs take pre-specified values, this process is called classification. Classification is one of the most traditional tasks in data mining. Some classification problems are extremely difficult to solve, and they motivate this thesis. The intuition explored in this thesis is that a hard problem can be decomposed into several simpler sub-problems. This thesis shows that independently solving simpler sub-problems, instead of solving a difficult problem directly, usually leads to better results. This is shown empirically, by solving useful and important problems using the algorithms presented in this thesis. Such problems include document categorization and ambiguity removal in digital libraries, ranking of documents returned by search engines, and revenue optimization, among many others. Gains in effectiveness are reported for all these problems (in some cases with gains greater than 100%). In addition, we present theoretical evidence that supports our algorithms.; UFMG; ORIGINAL; adrianoalonsoveloso.pdf; application/pdf; 3608755; https://repositorio.ufmg.br//bitstreams/8131f96c-dc29-464e-8cfb-f14e4a0b7ed5/download; afb64040d438e87c1be979a018a57a1d; MD5; 1; true; Anonymous; READ; TEXT; adrianoalonsoveloso.pdf.txt; text/plain; 319269; https://repositorio.ufmg.br//bitstreams/09477370-4a35-426d-af18-c3f65e50b46b/download; 3b6d5ea6b00e9c9509838367d0f99a90; MD5; 2; false; Anonymous; READ; 1843/SLSS-7WFMGG; 2025-09-08 21:15:22.346; open.access; oai:repositorio.ufmg.br:1843/SLSS-7WFMGG; https://repositorio.ufmg.br/; Repositório Institucional; PUB; https://repositorio.ufmg.br/oai; repositorio@ufmg.br; opendoar:2025-09-09T00:15:22; Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG); false
dc.title.none.fl_str_mv	Classificação associativa sob demanda
title	Classificação associativa sob demanda
spellingShingle	Classificação associativa sob demanda Adriano Alonso Veloso Computação Mineração de dados (Computação)
title_short	Classificação associativa sob demanda
title_full	Classificação associativa sob demanda
title_fullStr	Classificação associativa sob demanda
title_full_unstemmed	Classificação associativa sob demanda
title_sort	Classificação associativa sob demanda
author	Adriano Alonso Veloso
author_facet	Adriano Alonso Veloso
author_role	author
dc.contributor.author.fl_str_mv	Adriano Alonso Veloso
dc.subject.por.fl_str_mv	Computação Mineração de dados (Computação)
topic	Computação Mineração de dados (Computação)
description	The ultimate goal of machines is to help humans solve problems. The solutions to such problems are typically programmed by experts, and the machines need only follow the specified steps to solve the problem. However, the solution to some problems may be too difficult to program explicitly. In such difficult cases, instead of directly programming machines to solve the problem, machines can be programmed to learn the solution. Machine Learning encompasses techniques used to program machines to learn. It is one of the fastest-growing research areas today, mainly motivated by the fact that the advent of improved learning techniques would open up many new uses for machines (i.e., problems for which the solution is hard to program by hand). A prominent approach to machine learning is to repeatedly demonstrate how the problem is solved and let the machine learn by example, so that it generalizes some rules about the solution and turns these into a program. This process is known as supervised learning. Specifically, the machine takes matched values of inputs (instantiations of the problem to be solved) and outputs (the solution) and absorbs whatever information their relation contains in order to emulate the true mapping of inputs to outputs. When outputs are drawn from a pre-specified and finite set of possibilities, the process is known as classification, which is a major data mining task. Some classification problems are hard to solve, and they motivate this thesis. The key insight exploited in this thesis is that a difficult problem can be decomposed into several much simpler sub-problems. This thesis shows that, instead of directly solving a difficult problem, independently solving its sub-problems by taking into account their particular demands often leads to improved classification performance.
This is shown empirically, by solving real-world problems (for which the solutions are hard to program) using the computationally efficient algorithms presented in this thesis. These problems include categorization of documents and name disambiguation in digital libraries, ranking documents retrieved by search engines, protein functional analysis, and revenue optimization, among others. Improvements in classification performance are reported for all these problems (in some cases with gains of more than 100%). Further, theoretical evidence supporting our algorithms is also provided.
publishDate	2009
dc.date.issued.fl_str_mv	2009-03-09
dc.date.accessioned.fl_str_mv	2019-08-12T12:15:43Z 2025-09-09T00:15:22Z
dc.date.available.fl_str_mv	2019-08-12T12:15:43Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://hdl.handle.net/1843/SLSS-7WFMGG
url	https://hdl.handle.net/1843/SLSS-7WFMGG
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Universidade Federal de Minas Gerais
publisher.none.fl_str_mv	Universidade Federal de Minas Gerais
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFMG instname:Universidade Federal de Minas Gerais (UFMG) instacron:UFMG
instname_str	Universidade Federal de Minas Gerais (UFMG)
instacron_str	UFMG
institution	UFMG
reponame_str	Repositório Institucional da UFMG
collection	Repositório Institucional da UFMG
bitstream.url.fl_str_mv	https://repositorio.ufmg.br//bitstreams/8131f96c-dc29-464e-8cfb-f14e4a0b7ed5/download https://repositorio.ufmg.br//bitstreams/09477370-4a35-426d-af18-c3f65e50b46b/download
bitstream.checksum.fl_str_mv	afb64040d438e87c1be979a018a57a1d 3b6d5ea6b00e9c9509838367d0f99a90
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv	repositorio@ufmg.br
_version_	1862106007195353088
