An investigation on deep reinforcement learning algorithms for resource management and workload scheduling
| Year of defense: | 2022 |
|---|---|
| Defense date: | 2022-12-21 |
| Main author: | Abner Sousa Nascimento |
| Advisor: | |
| Defense committee: | |
| Document type: | Master's thesis (dissertação) |
| Access type: | Open access |
| Language: | English (eng) |
| Degree-granting institution: | Universidade Federal de Minas Gerais |
| Graduate program: | Not informed by the institution |
| Department: | Not informed by the institution |
| Country: | Not informed by the institution |
| Keywords (Portuguese): | Computação – Teses; Aprendizado do computador – Teses; Aprendizado profundo – Teses; Computação de alto desempenho – Teses; Aprendizado por reforço – Teses |
| Keywords (English): | reinforcement learning; deep learning; high power computer clusters; workload management; resource management |
| Funding: | FAPESP - Fundação de Amparo à Pesquisa do Estado de São Paulo |
| License: | Creative Commons BY-ND 3.0 (http://creativecommons.org/licenses/by-nd/3.0/pt/) |
| Access link: | https://hdl.handle.net/1843/84048 |
| Abstract: | Efficiency is a key operational requirement for most computer systems, since the resources these systems depend on are usually subject to availability constraints. Computing clusters should therefore complete as many tasks as possible while making the most of hardware assets such as CPU and memory. In this context, the temporal ordering of the jobs submitted to a cluster can affect its ability to operate at full utilization, so proper scheduling of these tasks is essential for efficiency. Several algorithms and techniques, both principled and learning-based, can be applied to this problem, but the goal-oriented nature of reinforcement learning, powered by deep neural networks, can handle its particularities and complexities robustly. In this work, we investigate the use of deep reinforcement learning techniques for job allocation in computing clusters, applying hyperparameter search and comparing the performance and training stability of the learning-based solutions against previously designed algorithms on a target metric. We found that it is possible to obtain equal or better performance under the right environmental conditions within the appropriate parametric domain. Results also indicate that such agents generalize better when trained in a graduated-difficulty set-up, with increasingly challenging scenarios, than with an approach that starts directly from a difficult, randomly initialized configuration. |
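The abstract's final claim concerns a graduated-difficulty (curriculum-style) training set-up, in which scheduling scenarios become progressively harder, versus training that starts directly from a difficult configuration. The sketch below is only an illustration of that training-schedule idea under assumed details: it uses a hypothetical toy single-resource cluster environment (`ToyClusterEnv`) and tabular Q-learning, not the deep reinforcement learning agents, environment, rewards, or hyperparameters studied in the dissertation.

```python
# Illustrative sketch only: a toy single-resource cluster and a tabular
# Q-learning agent, used to contrast a graduated-difficulty (curriculum)
# training schedule with one that starts from the hardest configuration.
# Environment dynamics, rewards, and names are assumptions for illustration,
# not the dissertation's actual set-up.
import random
from collections import defaultdict


class ToyClusterEnv:
    """Each step: schedule the next queued job if it fits, or wait a tick."""

    def __init__(self, difficulty: int, capacity: int = 10, seed: int = 0):
        rng = random.Random(seed)
        self.capacity = capacity
        # Higher difficulty -> more jobs and larger resource demands.
        self.jobs = [rng.randint(1, 2 + difficulty) for _ in range(5 * difficulty)]
        self.free = capacity
        self.t = 0

    def state(self):
        return (self.free, self.jobs[0] if self.jobs else 0)

    def step(self, action):
        # action 1 = place the next job now; action 0 = wait (resources free up).
        if action == 1 and self.jobs and self.jobs[0] <= self.free:
            self.free -= self.jobs.pop(0)
            reward = 1.0                                   # successful placement
        else:
            self.free = min(self.capacity, self.free + 2)  # running jobs finish
            reward = -0.1                                  # waiting delays the workload
        self.t += 1
        done = not self.jobs or self.t >= 200
        return self.state(), reward, done


def train(schedule, episodes_per_level=200, alpha=0.1, gamma=0.99, eps=0.1):
    """Q-learning over a sequence of difficulty levels (the training schedule)."""
    q = defaultdict(float)
    for level in schedule:
        for ep in range(episodes_per_level):
            env = ToyClusterEnv(difficulty=level, seed=ep)
            s, done = env.state(), False
            while not done:
                if random.random() < eps:
                    a = random.randrange(2)                   # explore
                else:
                    a = max((0, 1), key=lambda x: q[(s, x)])  # exploit
                s2, r, done = env.step(a)
                target = r + gamma * max(q[(s2, 0)], q[(s2, 1)])
                q[(s, a)] += alpha * (target - q[(s, a)])
                s = s2
    return q


def evaluate(q, difficulty=4, episodes=50):
    """Average return of the greedy policy on the hardest difficulty."""
    total = 0.0
    for ep in range(episodes):
        env = ToyClusterEnv(difficulty=difficulty, seed=1000 + ep)
        s, done = env.state(), False
        while not done:
            a = max((0, 1), key=lambda x: q[(s, x)])
            s, r, done = env.step(a)
            total += r
    return total / episodes


if __name__ == "__main__":
    curriculum_q = train(schedule=[1, 2, 3, 4])   # graduated difficulty
    hard_start_q = train(schedule=[4, 4, 4, 4])   # always the hard setting
    print("curriculum :", evaluate(curriculum_q))
    print("hard start :", evaluate(hard_start_q))
```

Running the script trains one agent across difficulty levels 1 through 4 and another only on the hardest level, then compares their greedy-policy returns on the hard setting; in the dissertation the agents are deep reinforcement learning policies and the environment models cluster workloads rather than this toy dynamics.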