Designing convolutional neural network architectures based on dynamical system concepts

Martha Dais Ferreira

Bibliographic details
Year of defense:	2019
Main author:	Martha Dais Ferreira
Advisor:	Rodrigo Fernandes de Mello
Defense committee:	Fabio Gagliardi Cozman, Ana Carolina Lorena, Hélio Pedrini
Document type:	Thesis
Access type:	Open access
Language:	eng
Defending institution:	Universidade de São Paulo
Graduate program:	Ciências da Computação e Matemática Computacional
Department:	Not provided by the institution
Country:	BR
Access link:	https://doi.org/10.11606/T.55.2019.tde-26042019-082539
Abstract:	Technological advances have motivated the production and storage of large amounts of data and, consequently, the need to process them in order to support decision making. In this context, Deep Learning (DL) has emerged and provided major advances in solving complex supervised tasks through the direct manipulation of raw data content, such as images, audio and video. Convolutional Neural Networks (CNNs) are among the state-of-the-art strategies in DL, achieving relevant performance results in tasks from different domains. Currently, the design of CNN architectures is one of the major challenges in the practical use of DL, since it requires considerable knowledge about the application domain and about linear and nonlinear algebraic transformations. Architectures are designed either manually, using empirical procedures, or with the support of evolutionary algorithms, an option that consumes excessive computational resources while analyzing candidate solutions. In addition to architecture design, the possibility of overfitting has attracted the scientific community to study the effect of such complex models and whether they produce some memorization effect on training sets. These two main challenges motivated this PhD thesis, which proposes an approach to support the automatic design of CNN architectures based on Dynamical System (DS) concepts. Initially, CNN architectures were algebraically formulated, which allowed conclusions to be drawn about the relationships between CNN input data organizations and spatial immersions from DS, leading to the development of an immersion tool called Image-based False Nearest Neighbors (IFNN). IFNN estimates the convolutional mask sizes and helps find an adequate number of convolutional units per CNN layer by taking advantage of well-known effects caused by the reconstruction of phase spaces. This tool is based on the False Nearest Neighbors (FNN) method, typically used to estimate the minimal embedding dimension needed to represent recurrence patterns of time series. Experiments confirm that architectures designed with the support of IFNN usually produce results similar to those of deeper (and thus more complex) architectures. Based on those experiments, we concluded that IFNN supports the design of simpler, shallower, yet efficient CNN architectures, which are faster to train and provide tighter learning guarantees according to Statistical Learning Theory (SLT), thus requiring smaller training samples. Finally, the CNN architectures obtained with IFNN were analyzed based on their shattering coefficients in an attempt to verify their relative complexities and, most essentially, the cardinalities of their spaces of admissible functions, a.k.a. biases.
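
Illustrative note: the abstract refers to the False Nearest Neighbors (FNN) method for estimating a minimal embedding dimension of a time series. The sketch below shows one common form of that idea (the distance-ratio criterion on delay-coordinate vectors), written in Python with NumPy. It is not the thesis's IFNN tool and is not taken from the thesis; the function names, the delay `tau`, the ratio threshold `rtol=15.0` and the stopping `threshold=0.01` are all assumptions chosen for illustration.

```python
import numpy as np

def delay_embed(x, dim, tau=1):
    """Delay-coordinate vectors [x(t), x(t+tau), ..., x(t+(dim-1)*tau)]."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

def fnn_fraction(x, dim, tau=1, rtol=15.0):
    """Fraction of nearest neighbours in dimension `dim` that are 'false',
    i.e. that move far apart once the (dim+1)-th coordinate is added.
    rtol is an assumed, conventional-looking threshold, not a thesis value."""
    x = np.asarray(x, dtype=float)
    n = len(x) - dim * tau                 # points that also exist in dim+1
    emb = delay_embed(x, dim, tau)[:n]
    extra = x[dim * tau : dim * tau + n]   # the coordinate added in dim+1
    false = 0
    for i in range(n):
        d = np.linalg.norm(emb - emb[i], axis=1)
        d[i] = np.inf                      # exclude the point itself
        j = int(np.argmin(d))              # nearest neighbour in dimension dim
        # False neighbour: the new coordinate separates the pair by more
        # than rtol times their distance in the current dimension.
        if abs(extra[i] - extra[j]) > rtol * d[j]:
            false += 1
    return false / n

def minimal_embedding_dimension(x, max_dim=10, tau=1, threshold=0.01):
    """Smallest dim whose false-neighbour fraction is at most `threshold`
    (None if no dimension up to max_dim qualifies)."""
    for dim in range(1, max_dim + 1):
        if fnn_fraction(x, dim, tau) <= threshold:
            return dim
    return None

# Usage: a pure sinusoid obeys a two-term linear recurrence, so two delay
# coordinates are expected to be enough to unfold it.
series = np.sin(np.arange(500))
print(minimal_embedding_dimension(series))
```

The brute-force neighbour search is quadratic in the series length, which keeps the sketch short; a spatial index would be the usual choice for longer series.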

Item metadata

id	USP_733a72a829ae8bcf624f41eae066b6af
oai_identifier_str	oai:teses.usp.br:tde-26042019-082539
network_acronym_str	USP
network_name_str	Biblioteca Digital de Teses e Dissertações da USP
repository_id_str
spelling	Keywords: Aprendizado profundo; Convolutional neural networks; Deep learning; Dynamical systems; Falsos vizinhos mais próximos em imagens; Image-based False nearest neighbors; Redes neurais convolucionais; Sistemas dinâmicos; Statistical learning theory; Teoria do aprendizado estatístico
dc.title.en.fl_str_mv	Designing convolutional neural network architectures based on dynamical system concepts
dc.title.alternative.pt.fl_str_mv	Projeto de arquiteturas de redes neurais artificiais convolucionais com o apoio de conceitos oriundos da área de sistemas dinâmicos
title	Designing convolutional neural network architectures based on dynamical system concepts
author	Martha Dais Ferreira
author_facet	Martha Dais Ferreira
author_role	author
dc.contributor.advisor1.fl_str_mv	Rodrigo Fernandes de Mello
dc.contributor.referee1.fl_str_mv	Fabio Gagliardi Cozman
dc.contributor.referee2.fl_str_mv	Ana Carolina Lorena
dc.contributor.referee3.fl_str_mv	Hélio Pedrini
dc.contributor.author.fl_str_mv	Martha Dais Ferreira
contributor_str_mv	Rodrigo Fernandes de Mello Fabio Gagliardi Cozman Ana Carolina Lorena Hélio Pedrini
publishDate	2019
dc.date.issued.fl_str_mv	2019-02-26
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://doi.org/10.11606/T.55.2019.tde-26042019-082539
url	https://doi.org/10.11606/T.55.2019.tde-26042019-082539
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Universidade de São Paulo
dc.publisher.program.fl_str_mv	Ciências da Computação e Matemática Computacional
dc.publisher.initials.fl_str_mv	USP
dc.publisher.country.fl_str_mv	BR
publisher.none.fl_str_mv	Universidade de São Paulo
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP
instname_str	Universidade de São Paulo (USP)
instacron_str	USP
institution	USP
reponame_str	Biblioteca Digital de Teses e Dissertações da USP
collection	Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv	virginia@if.usp.br || atendimento@aguia.usp.br || virginia@if.usp.br
_version_	1786377154902097920
