Energy savings and performance improvements with SSDs in the Hadoop Distributed File System
| Ano de defesa: | 2016 |
|---|---|
| Autor(a) principal: | |
| Orientador(a): | |
| Banca de defesa: | |
| Tipo de documento: | Tese |
| Tipo de acesso: | Acesso aberto |
| Idioma: | eng |
| Instituição de defesa: |
Biblioteca Digitais de Teses e Dissertações da USP
|
| Programa de Pós-Graduação: |
Não Informado pela instituição
|
| Departamento: |
Não Informado pela instituição
|
| País: |
Não Informado pela instituição
|
| Palavras-chave em Português: | |
| Link de acesso: | http://www.teses.usp.br/teses/disponiveis/45/45134/tde-31102016-155908/ |
Resumo: | Energy issues gathered strong attention over the past decade, reaching IT data processing infrastructures. Now, they need to cope with such responsibility, adjusting existing platforms to reach acceptable performance while promoting energy consumption reduction. As the de facto platform for Big Data, Apache Hadoop has evolved significantly over the last years, with more than 60 releases bringing new features. By implementing the MapReduce programming paradigm and leveraging HDFS, its distributed file system, Hadoop has become a reliable and fault tolerant middleware for parallel and distributed computing over large datasets. Nevertheless, Hadoop may struggle under certain workloads, resulting in poor performance and high energy consumption. Users increasingly demand that high performance computing solutions address sustainability and limit energy consumption. In this thesis, we introduce HDFSH, a hybrid storage mechanism for HDFS, which uses a combination of Hard Disks and Solid-State Disks to achieve higher performance while saving power in Hadoop computations. HDFSH brings, to the middleware, the best from HDs (affordable cost per GB and high storage capacity) and SSDs (high throughput and low energy consumption) in a configurable fashion, using dedicated storage zones for each storage device type. We implemented our mechanism as a block placement policy for HDFS, and assessed it over six recent releases of Hadoop with different architectural properties. Results indicate that our approach increases overall job performance while decreasing the energy consumption under most hybrid configurations evaluated. Our results also showed that, in many cases, storing only part of the data in SSDs results in significant energy savings and execution speedups |
| id |
USP_154f5e405711be8e2aa134ea54ec5196 |
|---|---|
| oai_identifier_str |
oai:teses.usp.br:tde-31102016-155908 |
| network_acronym_str |
USP |
| network_name_str |
Biblioteca Digital de Teses e Dissertações da USP |
| repository_id_str |
|
| spelling |
Energy savings and performance improvements with SSDs in the Hadoop Distributed File SystemEconomia de energia e aumento de desempenho usando SSDs no Hadoop Distributed File SystemArmazenamento híbridoComputação verdeDiscos de estado sólidoDistributed file systemsEficiência energéticaEnergy efficiencyGreen computingHadoopHadoopHDFSHDFSHybrid storageParallel file systemsSistema de arquivos distribuídoSistemas de arquivos paraleloSolid-state diskSSDsSSDsEnergy issues gathered strong attention over the past decade, reaching IT data processing infrastructures. Now, they need to cope with such responsibility, adjusting existing platforms to reach acceptable performance while promoting energy consumption reduction. As the de facto platform for Big Data, Apache Hadoop has evolved significantly over the last years, with more than 60 releases bringing new features. By implementing the MapReduce programming paradigm and leveraging HDFS, its distributed file system, Hadoop has become a reliable and fault tolerant middleware for parallel and distributed computing over large datasets. Nevertheless, Hadoop may struggle under certain workloads, resulting in poor performance and high energy consumption. Users increasingly demand that high performance computing solutions address sustainability and limit energy consumption. In this thesis, we introduce HDFSH, a hybrid storage mechanism for HDFS, which uses a combination of Hard Disks and Solid-State Disks to achieve higher performance while saving power in Hadoop computations. HDFSH brings, to the middleware, the best from HDs (affordable cost per GB and high storage capacity) and SSDs (high throughput and low energy consumption) in a configurable fashion, using dedicated storage zones for each storage device type. We implemented our mechanism as a block placement policy for HDFS, and assessed it over six recent releases of Hadoop with different architectural properties. Results indicate that our approach increases overall job performance while decreasing the energy consumption under most hybrid configurations evaluated. Our results also showed that, in many cases, storing only part of the data in SSDs results in significant energy savings and execution speedupsAo longo da última década, questões energéticas atraíram forte atenção da sociedade, chegando às infraestruturas de TI para processamento de dados. Agora, essas infraestruturas devem se ajustar a essa responsabilidade, adequando plataformas existentes para alcançar desempenho aceitável enquanto promovem a redução no consumo de energia. Considerado um padrão para o processamento de Big Data, o Apache Hadoop tem evoluído significativamente ao longo dos últimos anos, com mais de 60 versões lançadas. Implementando o paradigma de programação MapReduce juntamente com o HDFS, seu sistema de arquivos distribuídos, o Hadoop tornou-se um middleware tolerante a falhas e confiável para a computação paralela e distribuída para grandes conjuntos de dados. No entanto, o Hadoop pode perder desempenho com determinadas cargas de trabalho, resultando em elevado consumo de energia. Cada vez mais, usuários exigem que a sustentabilidade e o consumo de energia controlado sejam parte intrínseca de soluções de computação de alto desempenho. Nesta tese, apresentamos o HDFSH, um sistema de armazenamento híbrido para o HDFS, que usa uma combinação de discos rígidos e discos de estado sólido para alcançar maior desempenho, promovendo economia de energia em aplicações usando Hadoop. O HDFSH traz ao middleware o melhor dos HDs (custo acessível por GB e grande capacidade de armazenamento) e SSDs (alto desempenho e baixo consumo de energia) de forma configurável, usando zonas de armazenamento dedicadas para cada dispositivo de armazenamento. Implementamos nosso mecanismo como uma política de alocação de blocos para o HDFS e o avaliamos em seis versões recentes do Hadoop com diferentes arquiteturas de software. Os resultados indicam que nossa abordagem aumenta o desempenho geral das aplicações, enquanto diminui o consumo de energia na maioria das configurações híbridas avaliadas. Os resultados também mostram que, em muitos casos, armazenar apenas uma parte dos dados em SSDs resulta em economia significativa de energia e aumento na velocidade de execuçãoBiblioteca Digitais de Teses e Dissertações da USPKon, FabioPolato, Ivanilton2016-08-29info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/doctoralThesisapplication/pdfhttp://www.teses.usp.br/teses/disponiveis/45/45134/tde-31102016-155908/reponame:Biblioteca Digital de Teses e Dissertações da USPinstname:Universidade de São Paulo (USP)instacron:USPLiberar o conteúdo para acesso público.info:eu-repo/semantics/openAccesseng2018-10-02T20:03:01Zoai:teses.usp.br:tde-31102016-155908Biblioteca Digital de Teses e Dissertaçõeshttp://www.teses.usp.br/PUBhttp://www.teses.usp.br/cgi-bin/mtd2br.plvirginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.bropendoar:27212018-10-02T20:03:01Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)false |
| dc.title.none.fl_str_mv |
Energy savings and performance improvements with SSDs in the Hadoop Distributed File System Economia de energia e aumento de desempenho usando SSDs no Hadoop Distributed File System |
| title |
Energy savings and performance improvements with SSDs in the Hadoop Distributed File System |
| spellingShingle |
Energy savings and performance improvements with SSDs in the Hadoop Distributed File System Polato, Ivanilton Armazenamento híbrido Computação verde Discos de estado sólido Distributed file systems Eficiência energética Energy efficiency Green computing Hadoop Hadoop HDFS HDFS Hybrid storage Parallel file systems Sistema de arquivos distribuído Sistemas de arquivos paralelo Solid-state disk SSDs SSDs |
| title_short |
Energy savings and performance improvements with SSDs in the Hadoop Distributed File System |
| title_full |
Energy savings and performance improvements with SSDs in the Hadoop Distributed File System |
| title_fullStr |
Energy savings and performance improvements with SSDs in the Hadoop Distributed File System |
| title_full_unstemmed |
Energy savings and performance improvements with SSDs in the Hadoop Distributed File System |
| title_sort |
Energy savings and performance improvements with SSDs in the Hadoop Distributed File System |
| author |
Polato, Ivanilton |
| author_facet |
Polato, Ivanilton |
| author_role |
author |
| dc.contributor.none.fl_str_mv |
Kon, Fabio |
| dc.contributor.author.fl_str_mv |
Polato, Ivanilton |
| dc.subject.por.fl_str_mv |
Armazenamento híbrido Computação verde Discos de estado sólido Distributed file systems Eficiência energética Energy efficiency Green computing Hadoop Hadoop HDFS HDFS Hybrid storage Parallel file systems Sistema de arquivos distribuído Sistemas de arquivos paralelo Solid-state disk SSDs SSDs |
| topic |
Armazenamento híbrido Computação verde Discos de estado sólido Distributed file systems Eficiência energética Energy efficiency Green computing Hadoop Hadoop HDFS HDFS Hybrid storage Parallel file systems Sistema de arquivos distribuído Sistemas de arquivos paralelo Solid-state disk SSDs SSDs |
| description |
Energy issues gathered strong attention over the past decade, reaching IT data processing infrastructures. Now, they need to cope with such responsibility, adjusting existing platforms to reach acceptable performance while promoting energy consumption reduction. As the de facto platform for Big Data, Apache Hadoop has evolved significantly over the last years, with more than 60 releases bringing new features. By implementing the MapReduce programming paradigm and leveraging HDFS, its distributed file system, Hadoop has become a reliable and fault tolerant middleware for parallel and distributed computing over large datasets. Nevertheless, Hadoop may struggle under certain workloads, resulting in poor performance and high energy consumption. Users increasingly demand that high performance computing solutions address sustainability and limit energy consumption. In this thesis, we introduce HDFSH, a hybrid storage mechanism for HDFS, which uses a combination of Hard Disks and Solid-State Disks to achieve higher performance while saving power in Hadoop computations. HDFSH brings, to the middleware, the best from HDs (affordable cost per GB and high storage capacity) and SSDs (high throughput and low energy consumption) in a configurable fashion, using dedicated storage zones for each storage device type. We implemented our mechanism as a block placement policy for HDFS, and assessed it over six recent releases of Hadoop with different architectural properties. Results indicate that our approach increases overall job performance while decreasing the energy consumption under most hybrid configurations evaluated. Our results also showed that, in many cases, storing only part of the data in SSDs results in significant energy savings and execution speedups |
| publishDate |
2016 |
| dc.date.none.fl_str_mv |
2016-08-29 |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
| format |
doctoralThesis |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
http://www.teses.usp.br/teses/disponiveis/45/45134/tde-31102016-155908/ |
| url |
http://www.teses.usp.br/teses/disponiveis/45/45134/tde-31102016-155908/ |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.relation.none.fl_str_mv |
|
| dc.rights.driver.fl_str_mv |
Liberar o conteúdo para acesso público. info:eu-repo/semantics/openAccess |
| rights_invalid_str_mv |
Liberar o conteúdo para acesso público. |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.coverage.none.fl_str_mv |
|
| dc.publisher.none.fl_str_mv |
Biblioteca Digitais de Teses e Dissertações da USP |
| publisher.none.fl_str_mv |
Biblioteca Digitais de Teses e Dissertações da USP |
| dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP |
| instname_str |
Universidade de São Paulo (USP) |
| instacron_str |
USP |
| institution |
USP |
| reponame_str |
Biblioteca Digital de Teses e Dissertações da USP |
| collection |
Biblioteca Digital de Teses e Dissertações da USP |
| repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
| repository.mail.fl_str_mv |
virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br |
| _version_ |
1865491699638206464 |