Structure learning of Bayesian networks via data perturbation

Bibliographic details
Year of defense: 2018
Main author: Gross, Tadeu Junior
Advisor: Not provided by the institution
Examining committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: English
Institution: Biblioteca Digital de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://www.teses.usp.br/teses/disponiveis/18/18153/tde-19022019-134517/
Abstract: Structure learning of Bayesian networks (BNs) is an NP-hard problem, so sub-optimal strategies are essential in domains involving many variables. One such strategy is to generate multiple approximate structures and then reduce the ensemble to a single representative structure. The occurrence frequency of a directed edge across the ensemble can serve as the criterion for accepting a dominant directed edge between two nodes, thereby yielding the single structure. In this doctoral research, an analogy with an adapted one-dimensional random walk was used to analytically derive an appropriate decision threshold for this occurrence frequency. The resulting closed-form expression was validated on benchmark datasets, using the Matthews Correlation Coefficient as the performance metric. In experiments with a recent medical dataset, the BN obtained with the analytical cutoff frequency captured the expected associations among nodes. It also achieved better prediction performance than BNs learned with thresholds neighbouring the computed one. In the literature, the feature counted across the perturbed structures has been the edge, not the directed edge (arc) as in this thesis. This modified strategy was also applied to a dataset of elderly individuals to identify potential relationships between variables of medical interest. In that application, a threshold higher than the one predicted by the proposed formula was used; this caution is due to the possible social implications of the findings. The motivation for this application is that the proportion of elderly individuals in the population has increased substantially over the last few decades. Even so, the risk factors that should be managed in advance to ensure a natural process of age-related mental decline remain unknown.
In the learned structural model, two variables of medical interest were examined: the suspected risk factor known as Metabolic Syndrome and the indicator of mental decline referred to as Cognitive Impairment. The probabilistic dependence mechanism between them was investigated graphically using the BN concept of D-separation. The results revealed that a dependence between Metabolic Syndrome and the Cognitive Variables does exist and that it depends on both Body Mass Index and age.
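The ensemble-reduction strategy described above can be sketched in Python under explicit assumptions. The sketch proceeds in four steps:

1. It draws bootstrap replicas of the data (the data perturbation).
2. It passes each replica to a caller-supplied structure learner. `learn_structure` is a hypothetical placeholder, because the abstract does not name the search algorithm.
3. It counts how often each directed arc occurs across the ensemble.
4. It keeps the dominant direction of a node pair when that direction's frequency reaches a cutoff.

The cutoff is a free parameter here. The thesis derives it from a closed-form expression based on a random-walk analogy, which this sketch does not reproduce.

For contrast, `edge_frequencies` shows the orientation-agnostic count used in earlier literature. `mcc` scores a learned arc set against a reference DAG by treating every ordered node pair as a binary arc-present/arc-absent prediction. This is one plausible way to apply the Matthews Correlation Coefficient to structures, not necessarily the thesis's exact protocol. All function names are illustrative.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, FrozenSet, Iterable, List, Sequence, Set, Tuple

Arc = Tuple[str, str]  # directed edge (parent, child)


def bootstrap_replica(rows: Sequence[dict], rng: random.Random) -> List[dict]:
    """One perturbed copy of the data: resample rows with replacement."""
    return [rows[rng.randrange(len(rows))] for _ in range(len(rows))]


def learn_ensemble(
    rows: Sequence[dict],
    learn_structure: Callable[[List[dict]], Set[Arc]],  # hypothetical learner
    n_replicas: int,
    seed: int = 0,
) -> List[Set[Arc]]:
    """Learn one structure per bootstrap replica."""
    rng = random.Random(seed)
    return [learn_structure(bootstrap_replica(rows, rng)) for _ in range(n_replicas)]


def arc_frequencies(ensemble: Sequence[Set[Arc]]) -> Dict[Arc, float]:
    """Fraction of ensemble members containing each directed arc."""
    counts = Counter(arc for structure in ensemble for arc in structure)
    return {arc: c / len(ensemble) for arc, c in counts.items()}


def edge_frequencies(ensemble: Sequence[Set[Arc]]) -> Dict[FrozenSet[str], float]:
    """Orientation-agnostic variant: u->v and v->u count as the same edge."""
    counts = Counter(e for structure in ensemble for e in {frozenset(a) for a in structure})
    return {e: c / len(ensemble) for e, c in counts.items()}


def consensus_arcs(freq: Dict[Arc, float], cutoff: float) -> Set[Arc]:
    """Keep u->v if it reaches the cutoff and is strictly more frequent than v->u."""
    return {
        (u, v)
        for (u, v), f in freq.items()
        if f >= cutoff and f > freq.get((v, u), 0.0)
    }


def mcc(learned: Set[Arc], reference: Set[Arc], nodes: Iterable[str]) -> float:
    """Matthews Correlation Coefficient over ordered pairs of distinct nodes (0.0 if undefined)."""
    nodes = list(nodes)
    tp = fp = fn = tn = 0
    for u in nodes:
        for v in nodes:
            if u == v:
                continue
            predicted, actual = (u, v) in learned, (u, v) in reference
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
            else:
                tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy ensemble standing in for structures learned from four replicas.
    ensemble = [
        {("A", "B"), ("B", "C")},
        {("A", "B"), ("C", "B")},
        {("A", "B"), ("B", "C")},
        {("B", "A"), ("B", "C")},
    ]
    freq = arc_frequencies(ensemble)
    consensus = consensus_arcs(freq, cutoff=0.5)
    print(sorted(freq.items()))
    print(sorted(consensus))  # [('A', 'B'), ('B', 'C')]
    print(mcc(consensus, {("A", "B"), ("B", "C")}, "ABC"))  # 1.0
```

In the toy ensemble, A→B and B→C each occur in 3 of 4 structures (frequency 0.75), so they survive a cutoff of 0.5. Their reversals (frequency 0.25) are dropped. The undirected count, by contrast, gives both edges frequency 1.0 and hides the orientation disagreement. A frequency cutoff alone does not guarantee an acyclic result, so a practical implementation would also need a cycle check.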
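The D-separation analysis mentioned above can be illustrated with a small Python sketch. It uses a standard criterion equivalent to d-separation. X and Y are d-separated by Z exactly when they are disconnected in a graph built as follows: take the ancestors of X, Y and Z, moralize that subgraph, then remove Z. The DAG and node names are hypothetical placeholders, not the model learned in the thesis. The graph is assumed to be given as a child-to-parents mapping.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Iterable, Set

Parents = Dict[str, Set[str]]  # child -> set of parents


def ancestors(parents: Parents, nodes: Iterable[str]) -> Set[str]:
    """The given nodes together with all of their ancestors."""
    seen: Set[str] = set()
    stack = list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents.get(n, ()))
    return seen


def d_separated(parents: Parents, xs: Iterable[str], ys: Iterable[str], zs: Iterable[str]) -> bool:
    """True if every node in xs is d-separated from every node in ys given zs."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)
    # Moralize the ancestral subgraph: undirected parent-child links plus
    # links between every pair of parents that share a child.
    adj: Dict[str, Set[str]] = {n: set() for n in keep}
    for child in keep:
        pa = list(parents.get(child, ()))
        for p in pa:
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(pa, 2):
            adj[p].add(q)
            adj[q].add(p)
    # Delete the conditioning set, then search for any path from xs to ys.
    frontier = deque(x for x in xs if x not in zs)
    visited = set(frontier)
    while frontier:
        n = frontier.popleft()
        if n in ys:
            return False
        for m in adj[n]:
            if m not in zs and m not in visited:
                visited.add(m)
                frontier.append(m)
    return True


if __name__ == "__main__":
    # Hypothetical toy DAG: A -> C <- B, C -> D (C is a collider).
    dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}
    print(d_separated(dag, {"A"}, {"B"}, set()))  # True
    print(d_separated(dag, {"A"}, {"B"}, {"C"}))  # False
    print(d_separated(dag, {"A"}, {"B"}, {"D"}))  # False
```

In the toy collider A → C ← B, A and B are d-separated marginally. They are no longer d-separated once C, or its descendant D, is conditioned on. Whether two variables are d-separated can therefore depend on which other variables are in the conditioning set.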
id USP_40093c1de135f85b2b21297de97bf9c4
oai_identifier_str oai:teses.usp.br:tde-19022019-134517
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str
dc.title.none.fl_str_mv Structure learning of Bayesian networks via data perturbation
Aprendizagem estrutural de Redes Bayesianas via perturbação de dados (Portuguese-language title)
author Gross, Tadeu Junior
author_facet Gross, Tadeu Junior
author_role author
dc.contributor.none.fl_str_mv Maciel, Carlos Dias
dc.contributor.author.fl_str_mv Gross, Tadeu Junior
dc.subject.por.fl_str_mv Analytical threshold
Associations discovery
Bayesian network
Closed-form expression to compute the cutoff-frequency
Cognitive impairment
D-separation
Data perturbation via bootstrap replicas
Directed acyclic graph
Learning of robust structures
Metabolic syndrome
Model averaging
Population ageing
Risk factors
Stability of arcs
publishDate 2018
dc.date.none.fl_str_mv 2018-11-29
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://www.teses.usp.br/teses/disponiveis/18/18153/tde-19022019-134517/
url http://www.teses.usp.br/teses/disponiveis/18/18153/tde-19022019-134517/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1815258263603642368