Processamento eficiente de junção espacial em ambiente paralelo e distribuído baseado em Spatialhadoop

Mendes, Eduardo Fernando

Processamento eficiente de junção espacial em ambiente paralelo e distribuído baseado em Spatialhadoop

Detalhes bibliográficos
Ano de defesa:	2017
Autor(a) principal:	Mendes, Eduardo Fernando
Orientador(a):	Ciferri, Ricardo Rodrigues
Banca de defesa:	Não Informado pela instituição
Tipo de documento:	Tese
Tipo de acesso:	Acesso aberto
Idioma:	por
Instituição de defesa:	Universidade Federal de São Carlos Câmpus São Carlos
Programa de Pós-Graduação:	Programa de Pós-Graduação em Ciência da Computação - PPGCC
Departamento:	Não Informado pela instituição
País:	Não Informado pela instituição
Palavras-chave em Português:	Banco de dados espaciais Processamento de consulta Junção espacial Processamento paralelo e distribuído Clusters de computadores
Palavras-chave em Inglês:	Spatial databases Query processing Spatial join Parallel and distributed processing Cluster computing
Área do conhecimento CNPq:	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Link de acesso:	https://repositorio.ufscar.br/handle/20.500.14289/9168
Resumo:	The huge volume of spatial data generated and made available in recent years from different sources, such as remote sensing, smart phones, space telescopes, and satellites, has motivated researchers and practitioners around the world to find out a way to process efficiently this huge volume of spatial data. Systems based on the MapReduce programming paradigm, such as Hadoop, have proven to be an efficient framework for processing huge volumes of data in many applications. However, Hadoop has showed not to be adequate in native support for spatial data due to its central structure is not aware of the spatial characteristics of such data. The solution to this problem gave rise to SpatialHadoop, which is a Hadoop extension with native support for spatial data. However, SpatialHadoop does not enable to jointly allocate related spatial data and also does not take into account any characteristics of the data in the process of task scheduler for processing on the nodes of a cluster of computers. Given this scenario, this PhD dissertation aims to propose new strategies to improve the performance of the processing of the spatial join operations for huge volumes of data using SpatialHadoop. For this purpose, the proposed solutions explore the joint allocation of related spatial data and the scheduling strategy of MapReduce for related spatial data also allocated in a jointly form. The efficient data access is an essential step in achieving better performance during query processing. Therefore, the proposed solutions allow the reduction of network traffic and I/O operations to the disk and consequently improve the performance of spatial join processing by using SpatialHadoop. By means of experimental evaluations, it was possible to show that the novel data allocation policies and scheduling tasks actually improve the total processing time of the spatial join operations. The performance gain varied from 14.7% to 23.6% if compared to the baseline proposed by CoS-HDFS and varied from 8.3% to 65% if compared to the native support of SpatialHadoop.

Metadados do item

id	SCAR_6313a2b692b937c8168901b1e502d26f
oai_identifier_str	oai:repositorio.ufscar.br:20.500.14289/9168
network_acronym_str	SCAR
network_name_str	Repositório Institucional da UFSCAR
repository_id_str
spelling	Mendes, Eduardo FernandoCiferri, Ricardo Rodrigueshttp://lattes.cnpq.br/8382221522817502http://lattes.cnpq.br/4759269927514121e8738b6a-54bc-4e8d-b63b-64dd91844d9d2017-10-25T18:01:51Z2017-10-25T18:01:51Z2017-02-17MENDES, Eduardo Fernando. Processamento eficiente de junção espacial em ambiente paralelo e distribuído baseado em Spatialhadoop. 2017. Tese (Doutorado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2017. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/9168.https://repositorio.ufscar.br/handle/20.500.14289/9168The huge volume of spatial data generated and made available in recent years from different sources, such as remote sensing, smart phones, space telescopes, and satellites, has motivated researchers and practitioners around the world to find out a way to process efficiently this huge volume of spatial data. Systems based on the MapReduce programming paradigm, such as Hadoop, have proven to be an efficient framework for processing huge volumes of data in many applications. However, Hadoop has showed not to be adequate in native support for spatial data due to its central structure is not aware of the spatial characteristics of such data. The solution to this problem gave rise to SpatialHadoop, which is a Hadoop extension with native support for spatial data. However, SpatialHadoop does not enable to jointly allocate related spatial data and also does not take into account any characteristics of the data in the process of task scheduler for processing on the nodes of a cluster of computers. Given this scenario, this PhD dissertation aims to propose new strategies to improve the performance of the processing of the spatial join operations for huge volumes of data using SpatialHadoop. For this purpose, the proposed solutions explore the joint allocation of related spatial data and the scheduling strategy of MapReduce for related spatial data also allocated in a jointly form. The efficient data access is an essential step in achieving better performance during query processing. Therefore, the proposed solutions allow the reduction of network traffic and I/O operations to the disk and consequently improve the performance of spatial join processing by using SpatialHadoop. By means of experimental evaluations, it was possible to show that the novel data allocation policies and scheduling tasks actually improve the total processing time of the spatial join operations. The performance gain varied from 14.7% to 23.6% if compared to the baseline proposed by CoS-HDFS and varied from 8.3% to 65% if compared to the native support of SpatialHadoop.A explosão no volume de dados espaciais gerados e disponibilizados nos últimos anos, provenientes de diferentes fontes, por exemplo, sensoriamento remoto, telefones inteligentes, telescópios espaciais e satélites, motivaram pesquisadores e profissionais em todo o mundo a encontrar uma forma de processar de forma eficiente esse grande volume de dados espaciais. Sistemas baseados no paradigma de programação MapReduce, como exemplo Hadoop, provaram ser durante anos um framework eficiente para o processamento de enormes volumes de dados em muitas aplicações. No entanto, o Hadoop demonstrou não ser adequado no suporte nativo a dados espaciais devido a sua estrutura central não ter conhecimento das características espaciais desses dados. A solução para este problema deu origem ao SpatialHadoop, uma extensão do Hadoop, com suporte nativo para dados espaciais. Entretanto o SpatialHadoop não é capaz de alocar conjuntamente dados espaciais relacionados e também não leva em consideração qualquer característica dos dados no processo de escalonamento das tarefas para processamento nos nós de um cluster de computadores. Diante deste cenário, esta tese tem por objetivo propor novas estratégias para melhorar o desempenho do processamento das operações de junção espacial para grandes volumes de dados usando o SpatialHadoop. Para tanto, as soluções propostas exploram a alocação conjunta dos dados espaciais relacionados e a estratégia de escalonamento de tarefas MapReduce para dados espaciais relacionados também alocados de forma conjunta. Acredita-se que o acesso eficiente aos dados é um passo essencial para alcançar um melhor desempenho durante o processamento de consultas. Desta forma, as soluções propostas permitem a redução do tráfego de rede e operações de Entrada/Saída para o disco e consequentemente melhoram o desempenho no processamento de junção espacial usando SpatialHadoop. Por meio de testes de desempenho experimentais foi possível comprovar que as novas políticas de alocação de dados e escalonamento de tarefas de fato melhoram o tempo total de processamento das operações de junção espacial. O ganho de desempenho variou de 14,7% a 23,6% com relação ao baseline proposto por CoS-HDFS e variou de 8,3% a 65% com relação ao suporte nativo do SpatialHadoop.Não recebi financiamentoporUniversidade Federal de São CarlosCâmpus São CarlosPrograma de Pós-Graduação em Ciência da Computação - PPGCCUFSCarBanco de dados espaciaisProcessamento de consultaJunção espacialProcessamento paralelo e distribuídoClusters de computadoresSpatial databasesQuery processingSpatial joinParallel and distributed processingCluster computingCIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAOProcessamento eficiente de junção espacial em ambiente paralelo e distribuído baseado em Spatialhadoopinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/doctoralThesisOnline6006003b1d5172-8bf0-4d0b-8777-ab82599bbf09info:eu-repo/semantics/openAccessreponame:Repositório Institucional da UFSCARinstname:Universidade Federal de São Carlos (UFSCAR)instacron:UFSCARORIGINALTeseEFM.pdfTeseEFM.pdfapplication/pdf31334481https://repositorio.ufscar.br/bitstreams/ecd144f9-c124-4c02-ade7-9a814682a215/download966afb8a981794db0aee3bc97ee11d5bMD51trueAnonymousREADLICENSElicense.txtlicense.txttext/plain; charset=utf-81957https://repositorio.ufscar.br/bitstreams/2694be0f-f3d1-47bd-ad46-4650ccf64477/downloadae0398b6f8b235e40ad82cba6c50031dMD52falseAnonymousREADTEXTTeseEFM.pdf.txtTeseEFM.pdf.txtExtracted texttext/plain334743https://repositorio.ufscar.br/bitstreams/c2d3153d-a8e9-4ccd-8495-04714084686c/download123143096ebb6d0b59b509b162510d59MD55falseAnonymousREADTHUMBNAILTeseEFM.pdf.jpgTeseEFM.pdf.jpgIM Thumbnailimage/jpeg6636https://repositorio.ufscar.br/bitstreams/3b506c27-f45a-4d40-a2c2-8adb44121c3e/download74b61db2537aa673c2e7dcf177e8e011MD56falseAnonymousREAD20.500.14289/91682025-02-05 18:59:36.77Acesso abertoopen.accessoai:repositorio.ufscar.br:20.500.14289/9168https://repositorio.ufscar.brRepositório InstitucionalPUBhttps://repositorio.ufscar.br/oai/requestrepositorio.sibi@ufscar.bropendoar:43222025-02-05T21:59:36Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)falseTElDRU7Dh0EgREUgRElTVFJJQlVJw4fDg08gTsODTy1FWENMVVNJVkEKCkNvbSBhIGFwcmVzZW50YcOnw6NvIGRlc3RhIGxpY2Vuw6dhLCB2b2PDqiAobyBhdXRvciAoZXMpIG91IG8gdGl0dWxhciBkb3MgZGlyZWl0b3MgZGUgYXV0b3IpIGNvbmNlZGUgw6AgVW5pdmVyc2lkYWRlCkZlZGVyYWwgZGUgU8OjbyBDYXJsb3MgbyBkaXJlaXRvIG7Do28tZXhjbHVzaXZvIGRlIHJlcHJvZHV6aXIsICB0cmFkdXppciAoY29uZm9ybWUgZGVmaW5pZG8gYWJhaXhvKSwgZS9vdQpkaXN0cmlidWlyIGEgc3VhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyAoaW5jbHVpbmRvIG8gcmVzdW1vKSBwb3IgdG9kbyBvIG11bmRvIG5vIGZvcm1hdG8gaW1wcmVzc28gZSBlbGV0csO0bmljbyBlCmVtIHF1YWxxdWVyIG1laW8sIGluY2x1aW5kbyBvcyBmb3JtYXRvcyDDoXVkaW8gb3UgdsOtZGVvLgoKVm9jw6ogY29uY29yZGEgcXVlIGEgVUZTQ2FyIHBvZGUsIHNlbSBhbHRlcmFyIG8gY29udGXDumRvLCB0cmFuc3BvciBhIHN1YSB0ZXNlIG91IGRpc3NlcnRhw6fDo28KcGFyYSBxdWFscXVlciBtZWlvIG91IGZvcm1hdG8gcGFyYSBmaW5zIGRlIHByZXNlcnZhw6fDo28uCgpWb2PDqiB0YW1iw6ltIGNvbmNvcmRhIHF1ZSBhIFVGU0NhciBwb2RlIG1hbnRlciBtYWlzIGRlIHVtYSBjw7NwaWEgYSBzdWEgdGVzZSBvdQpkaXNzZXJ0YcOnw6NvIHBhcmEgZmlucyBkZSBzZWd1cmFuw6dhLCBiYWNrLXVwIGUgcHJlc2VydmHDp8Ojby4KClZvY8OqIGRlY2xhcmEgcXVlIGEgc3VhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyDDqSBvcmlnaW5hbCBlIHF1ZSB2b2PDqiB0ZW0gbyBwb2RlciBkZSBjb25jZWRlciBvcyBkaXJlaXRvcyBjb250aWRvcwpuZXN0YSBsaWNlbsOnYS4gVm9jw6ogdGFtYsOpbSBkZWNsYXJhIHF1ZSBvIGRlcMOzc2l0byBkYSBzdWEgdGVzZSBvdSBkaXNzZXJ0YcOnw6NvIG7Do28sIHF1ZSBzZWphIGRlIHNldQpjb25oZWNpbWVudG8sIGluZnJpbmdlIGRpcmVpdG9zIGF1dG9yYWlzIGRlIG5pbmd1w6ltLgoKQ2FzbyBhIHN1YSB0ZXNlIG91IGRpc3NlcnRhw6fDo28gY29udGVuaGEgbWF0ZXJpYWwgcXVlIHZvY8OqIG7Do28gcG9zc3VpIGEgdGl0dWxhcmlkYWRlIGRvcyBkaXJlaXRvcyBhdXRvcmFpcywgdm9jw6oKZGVjbGFyYSBxdWUgb2J0ZXZlIGEgcGVybWlzc8OjbyBpcnJlc3RyaXRhIGRvIGRldGVudG9yIGRvcyBkaXJlaXRvcyBhdXRvcmFpcyBwYXJhIGNvbmNlZGVyIMOgIFVGU0NhcgpvcyBkaXJlaXRvcyBhcHJlc2VudGFkb3MgbmVzdGEgbGljZW7Dp2EsIGUgcXVlIGVzc2UgbWF0ZXJpYWwgZGUgcHJvcHJpZWRhZGUgZGUgdGVyY2Vpcm9zIGVzdMOhIGNsYXJhbWVudGUKaWRlbnRpZmljYWRvIGUgcmVjb25oZWNpZG8gbm8gdGV4dG8gb3Ugbm8gY29udGXDumRvIGRhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyBvcmEgZGVwb3NpdGFkYS4KCkNBU08gQSBURVNFIE9VIERJU1NFUlRBw4fDg08gT1JBIERFUE9TSVRBREEgVEVOSEEgU0lETyBSRVNVTFRBRE8gREUgVU0gUEFUUk9Dw41OSU8gT1UKQVBPSU8gREUgVU1BIEFHw4pOQ0lBIERFIEZPTUVOVE8gT1UgT1VUUk8gT1JHQU5JU01PIFFVRSBOw4NPIFNFSkEgQSBVRlNDYXIsClZPQ8OKIERFQ0xBUkEgUVVFIFJFU1BFSVRPVSBUT0RPUyBFIFFVQUlTUVVFUiBESVJFSVRPUyBERSBSRVZJU8ODTyBDT01PClRBTULDiU0gQVMgREVNQUlTIE9CUklHQcOHw5VFUyBFWElHSURBUyBQT1IgQ09OVFJBVE8gT1UgQUNPUkRPLgoKQSBVRlNDYXIgc2UgY29tcHJvbWV0ZSBhIGlkZW50aWZpY2FyIGNsYXJhbWVudGUgbyBzZXUgbm9tZSAocykgb3UgbyhzKSBub21lKHMpIGRvKHMpCmRldGVudG9yKGVzKSBkb3MgZGlyZWl0b3MgYXV0b3JhaXMgZGEgdGVzZSBvdSBkaXNzZXJ0YcOnw6NvLCBlIG7Do28gZmFyw6EgcXVhbHF1ZXIgYWx0ZXJhw6fDo28sIGFsw6ltIGRhcXVlbGFzCmNvbmNlZGlkYXMgcG9yIGVzdGEgbGljZW7Dp2EuCg==
dc.title.por.fl_str_mv	Processamento eficiente de junção espacial em ambiente paralelo e distribuído baseado em Spatialhadoop
title	Processamento eficiente de junção espacial em ambiente paralelo e distribuído baseado em Spatialhadoop
spellingShingle	Processamento eficiente de junção espacial em ambiente paralelo e distribuído baseado em Spatialhadoop Mendes, Eduardo Fernando Banco de dados espaciais Processamento de consulta Junção espacial Processamento paralelo e distribuído Clusters de computadores Spatial databases Query processing Spatial join Parallel and distributed processing Cluster computing CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
title_short	Processamento eficiente de junção espacial em ambiente paralelo e distribuído baseado em Spatialhadoop
title_full	Processamento eficiente de junção espacial em ambiente paralelo e distribuído baseado em Spatialhadoop
title_fullStr	Processamento eficiente de junção espacial em ambiente paralelo e distribuído baseado em Spatialhadoop
title_full_unstemmed	Processamento eficiente de junção espacial em ambiente paralelo e distribuído baseado em Spatialhadoop
title_sort	Processamento eficiente de junção espacial em ambiente paralelo e distribuído baseado em Spatialhadoop
author	Mendes, Eduardo Fernando
author_facet	Mendes, Eduardo Fernando
author_role	author
dc.contributor.authorlattes.por.fl_str_mv	http://lattes.cnpq.br/4759269927514121
dc.contributor.author.fl_str_mv	Mendes, Eduardo Fernando
dc.contributor.advisor1.fl_str_mv	Ciferri, Ricardo Rodrigues
dc.contributor.advisor1Lattes.fl_str_mv	http://lattes.cnpq.br/8382221522817502
dc.contributor.authorID.fl_str_mv	e8738b6a-54bc-4e8d-b63b-64dd91844d9d
contributor_str_mv	Ciferri, Ricardo Rodrigues
dc.subject.por.fl_str_mv	Banco de dados espaciais Processamento de consulta Junção espacial Processamento paralelo e distribuído Clusters de computadores
topic	Banco de dados espaciais Processamento de consulta Junção espacial Processamento paralelo e distribuído Clusters de computadores Spatial databases Query processing Spatial join Parallel and distributed processing Cluster computing CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
dc.subject.eng.fl_str_mv	Spatial databases Query processing Spatial join Parallel and distributed processing Cluster computing
dc.subject.cnpq.fl_str_mv	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
description	The huge volume of spatial data generated and made available in recent years from different sources, such as remote sensing, smart phones, space telescopes, and satellites, has motivated researchers and practitioners around the world to find out a way to process efficiently this huge volume of spatial data. Systems based on the MapReduce programming paradigm, such as Hadoop, have proven to be an efficient framework for processing huge volumes of data in many applications. However, Hadoop has showed not to be adequate in native support for spatial data due to its central structure is not aware of the spatial characteristics of such data. The solution to this problem gave rise to SpatialHadoop, which is a Hadoop extension with native support for spatial data. However, SpatialHadoop does not enable to jointly allocate related spatial data and also does not take into account any characteristics of the data in the process of task scheduler for processing on the nodes of a cluster of computers. Given this scenario, this PhD dissertation aims to propose new strategies to improve the performance of the processing of the spatial join operations for huge volumes of data using SpatialHadoop. For this purpose, the proposed solutions explore the joint allocation of related spatial data and the scheduling strategy of MapReduce for related spatial data also allocated in a jointly form. The efficient data access is an essential step in achieving better performance during query processing. Therefore, the proposed solutions allow the reduction of network traffic and I/O operations to the disk and consequently improve the performance of spatial join processing by using SpatialHadoop. By means of experimental evaluations, it was possible to show that the novel data allocation policies and scheduling tasks actually improve the total processing time of the spatial join operations. The performance gain varied from 14.7% to 23.6% if compared to the baseline proposed by CoS-HDFS and varied from 8.3% to 65% if compared to the native support of SpatialHadoop.
publishDate	2017
dc.date.accessioned.fl_str_mv	2017-10-25T18:01:51Z
dc.date.available.fl_str_mv	2017-10-25T18:01:51Z
dc.date.issued.fl_str_mv	2017-02-17
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	MENDES, Eduardo Fernando. Processamento eficiente de junção espacial em ambiente paralelo e distribuído baseado em Spatialhadoop. 2017. Tese (Doutorado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2017. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/9168.
dc.identifier.uri.fl_str_mv	https://repositorio.ufscar.br/handle/20.500.14289/9168
identifier_str_mv	MENDES, Eduardo Fernando. Processamento eficiente de junção espacial em ambiente paralelo e distribuído baseado em Spatialhadoop. 2017. Tese (Doutorado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2017. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/9168.
url	https://repositorio.ufscar.br/handle/20.500.14289/9168
dc.language.iso.fl_str_mv	por
language	por
dc.relation.confidence.fl_str_mv	600 600
dc.relation.authority.fl_str_mv	3b1d5172-8bf0-4d0b-8777-ab82599bbf09
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Universidade Federal de São Carlos Câmpus São Carlos
dc.publisher.program.fl_str_mv	Programa de Pós-Graduação em Ciência da Computação - PPGCC
dc.publisher.initials.fl_str_mv	UFSCar
publisher.none.fl_str_mv	Universidade Federal de São Carlos Câmpus São Carlos
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFSCAR instname:Universidade Federal de São Carlos (UFSCAR) instacron:UFSCAR
instname_str	Universidade Federal de São Carlos (UFSCAR)
instacron_str	UFSCAR
institution	UFSCAR
reponame_str	Repositório Institucional da UFSCAR
collection	Repositório Institucional da UFSCAR
bitstream.url.fl_str_mv	https://repositorio.ufscar.br/bitstreams/ecd144f9-c124-4c02-ade7-9a814682a215/download https://repositorio.ufscar.br/bitstreams/2694be0f-f3d1-47bd-ad46-4650ccf64477/download https://repositorio.ufscar.br/bitstreams/c2d3153d-a8e9-4ccd-8495-04714084686c/download https://repositorio.ufscar.br/bitstreams/3b506c27-f45a-4d40-a2c2-8adb44121c3e/download
bitstream.checksum.fl_str_mv	966afb8a981794db0aee3bc97ee11d5b ae0398b6f8b235e40ad82cba6c50031d 123143096ebb6d0b59b509b162510d59 74b61db2537aa673c2e7dcf177e8e011
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)
repository.mail.fl_str_mv	repositorio.sibi@ufscar.br
_version_	1851688886227435520

Processamento eficiente de junção espacial em ambiente paralelo e distribuído baseado em Spatialhadoop

Registros relacionados