Scan and join operators for asymmetric media

Bibliographic details
Year of defense: 2019
Main author: Alencar, Namom Alves
Advisor: Monteiro Filho, José Maria
Examination committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: eng
Defending institution: Not provided by the institution
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://www.repositorio.ufc.br/handle/riufc/50777
Abstract: Solid State Drives (SSDs) have become an attractive alternative for storing large databases. SSDs have no mechanical parts. Consequently, they have different characteristics and capabilities from those of Hard Disk Drives (HDDs). The computer industry is moving towards large-scale construction of chips with hundreds of cores in order to increase on-chip parallelism. One of the most important features of SSDs is that they implement different levels of internal parallelism for executing read/write operations. Computers with SSDs that provide petabytes of storage are emerging. Nonetheless, database systems were designed based upon two premises. The first is the use of HDDs for storing databases. The second is that distributed database systems could scale beyond what a single-node Database Management System (DBMS) can support. However, the latter premise only holds for a small number of CPU cores per node and for a limited number of nodes. Thus, to fully exploit the benefits provided by the parallelism and high Input/Output Operations Per Second (IOPS) rates supported by many-core machines with SSDs, database systems should be aware of upcoming CPU architectures and storage technologies. Accordingly, this research claims that, to take full advantage of SSD characteristics, DBMS components should be aware of the read/write asymmetry of SSD devices. It is well known that the join operation is the query operator that requires the highest number of accesses (read/write operations) to secondary memory. This dissertation presents a new scan algorithm and a new join algorithm, called Divide and Conquer Scan (DaC Scan) and Divide and Conquer Join (DaC Join), respectively. The key goal of these algorithms is to take advantage of the internal parallelism of SSD devices. DaC Join also reduces the number of write operations during the execution of a join operation R ⋈ S. By performing fewer writes, we intend to extend the lifetime of SSD media while also requiring less main memory space. Furthermore, the proposed operators are evaluated for effectiveness and efficiency through experiments on a database using the TPC-H benchmark. The results show that the proposed algorithms are quite efficient. For instance, DaC Join can reduce the number of write operations by up to 77% with respect to the number of write operations performed by Flash join (TSIROGIANNIS et al., 2009; GRAEFE; HARIZOPOULOS, 2010) and, consequently, it can be up to 61% faster than Flash join.
id UFC-7_d5e051883e2bff588587d8765be5f63d
oai_identifier_str oai:repositorio.ufc.br:riufc/50777
network_acronym_str UFC-7
network_name_str Repositório Institucional da Universidade Federal do Ceará (UFC)
repository_id_str
dc.title.pt_BR.fl_str_mv Scan and join operators for asymmetric media
dc.title.en.pt_BR.fl_str_mv Scan and join operators for asymmetric media
title Scan and join operators for asymmetric media
author Alencar, Namom Alves
author_facet Alencar, Namom Alves
author_role author
dc.contributor.co-advisor.none.fl_str_mv Brayner, Ângelo Alencar
dc.contributor.author.fl_str_mv Alencar, Namom Alves
dc.contributor.advisor1.fl_str_mv Monteiro Filho, José Maria
contributor_str_mv Monteiro Filho, José Maria
dc.subject.por.fl_str_mv Solid state memory (Memória de estado sólido)
Database query processing (Processamento de consulta de banco de dados)
Parallel join operator (Operador de junção paralela)
Parallel scan operator (Operador de leitura paralela)
publishDate 2019
dc.date.issued.fl_str_mv 2019
dc.date.accessioned.fl_str_mv 2020-03-18T17:46:36Z
dc.date.available.fl_str_mv 2020-03-18T17:46:36Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv ALENCAR, Namom Alves. Scan and join operators for asymmetric media. 2019. 108 f. Dissertation (Master's in Computer Science) - Universidade Federal do Ceará, Fortaleza, 2019.
dc.identifier.uri.fl_str_mv http://www.repositorio.ufc.br/handle/riufc/50777
identifier_str_mv ALENCAR, Namom Alves. Scan and join operators for asymmetric media. 2019. 108 f. Dissertation (Master's in Computer Science) - Universidade Federal do Ceará, Fortaleza, 2019.
url http://www.repositorio.ufc.br/handle/riufc/50777
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Institucional da Universidade Federal do Ceará (UFC)
instname:Universidade Federal do Ceará (UFC)
instacron:UFC
instname_str Universidade Federal do Ceará (UFC)
instacron_str UFC
institution UFC
reponame_str Repositório Institucional da Universidade Federal do Ceará (UFC)
collection Repositório Institucional da Universidade Federal do Ceará (UFC)
bitstream.url.fl_str_mv http://repositorio.ufc.br/bitstream/riufc/50777/3/2019_dis_naalencar.pdf
http://repositorio.ufc.br/bitstream/riufc/50777/4/license.txt
bitstream.checksum.fl_str_mv 0aaea6d05dde8b84abcb1c3459f47449
8a4605be74aa9ea9d79846c1fba20a33
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC)
repository.mail.fl_str_mv bu@ufc.br || repositorio@ufc.br
_version_ 1847793402796572672