An investigation on deep reinforcement learning algorithms for resource management and workload scheduling

Abner Sousa Nascimento

Bibliographic details
Year of defense:	2022
Main author:	Abner Sousa Nascimento
Advisor:	Not provided by the institution
Examination committee:	Not provided by the institution
Document type:	Master's thesis
Access type:	Open access
Language:	eng
Institution of defense:	Universidade Federal de Minas Gerais
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords:	Computing – Theses; Machine learning – Theses; Deep learning – Theses; High-performance computing – Theses; Reinforcement learning – Theses; reinforcement learning; deep learning; high power computer clusters; workload management; resource management
Access link:	https://hdl.handle.net/1843/84048
Abstract:	Efficiency is a key operational requirement for most computer systems, given that the resources they rely on are usually subject to availability constraints. Computing clusters should ideally complete as many tasks as possible while making the most of hardware assets such as CPU and memory. In this context, the temporal ordering of the jobs submitted to a cluster can affect its capacity to operate at maximum utilization. It is therefore important that such tasks are scheduled properly to ensure efficiency. Several algorithms and techniques, both principled and learning-based, can be applied to this problem, but the goal-oriented nature of reinforcement learning, powered by deep neural networks, can help deal robustly with its particularities and complexities. In this work, we investigate the use of deep reinforcement learning techniques for job allocation in computing clusters, applying hyperparameter search and comparing the performance and training stability of the learning-based solutions with previously designed algorithms for a target metric. We found that it is possible to obtain equal or better performance under the right environmental conditions within the appropriate parametric domain. Results also indicate that such agents can achieve better generalization if trained in a graduated-difficulty setup, with increasingly challenging scenarios, instead of a random initialization approach that starts from a difficult configuration.

Item metadata

id	UFMG_d0f4bc3275cfddf96d203d9f20a86119
oai_identifier_str	oai:repositorio.ufmg.br:1843/84048
network_acronym_str	UFMG
network_name_str	Repositório Institucional da UFMG
repository_id_str
spelling	FAPESP - Fundação de Amparo à Pesquisa do Estado de São Paulo; Universidade Federal de Minas Gerais; 2025-08-05T17:08:46Z; 2025-09-08T22:58:49Z; 2025-08-05T17:08:46Z; 2022-12-21; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; https://hdl.handle.net/1843/84048; eng; http://creativecommons.org/licenses/by-nd/3.0/pt/; info:eu-repo/semantics/openAccess; Abner Sousa Nascimento; reponame:Repositório Institucional da UFMG; instname:Universidade Federal de Minas Gerais (UFMG); instacron:UFMG; 2025-09-08T22:58:49Z; oai:repositorio.ufmg.br:1843/84048; Repositório Institucional; PUB; https://repositorio.ufmg.br/oai; repositorio@ufmg.br; opendoar:2025-09-08T22:58:49; Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG); false
dc.title.none.fl_str_mv	An investigation on deep reinforcement learning algorithms for resource management and workload scheduling
title	An investigation on deep reinforcement learning algorithms for resource management and workload scheduling
author	Abner Sousa Nascimento
author_facet	Abner Sousa Nascimento
author_role	author
dc.contributor.author.fl_str_mv	Abner Sousa Nascimento
dc.subject.por.fl_str_mv	Computação – Teses; Aprendizado do computador – Teses; Aprendizado profundo – Teses; Computação de alto desempenho – Teses; Aprendizado por reforço – Teses; reinforcement learning; deep learning; high power computer clusters; workload management; resource management
topic	Computação – Teses; Aprendizado do computador – Teses; Aprendizado profundo – Teses; Computação de alto desempenho – Teses; Aprendizado por reforço – Teses; reinforcement learning; deep learning; high power computer clusters; workload management; resource management
publishDate	2022
dc.date.none.fl_str_mv	2022-12-21 2025-08-05T17:08:46Z 2025-09-08T22:58:49Z 2025-08-05T17:08:46Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://hdl.handle.net/1843/84048
url	https://hdl.handle.net/1843/84048
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	http://creativecommons.org/licenses/by-nd/3.0/pt/ info:eu-repo/semantics/openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by-nd/3.0/pt/
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de Minas Gerais
publisher.none.fl_str_mv	Universidade Federal de Minas Gerais
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFMG instname:Universidade Federal de Minas Gerais (UFMG) instacron:UFMG
instname_str	Universidade Federal de Minas Gerais (UFMG)
instacron_str	UFMG
institution	UFMG
reponame_str	Repositório Institucional da UFMG
collection	Repositório Institucional da UFMG
repository.name.fl_str_mv	Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv	repositorio@ufmg.br
_version_	1856414115198664704
