Paralelização de algoritmos de busca de documentos mais relevantes na web utilizando GPUs

Gaioso, Roussian Di Ramos Alves

Bibliographic details
Defense year:	2019
Main author:	Gaioso, Roussian Di Ramos Alves
Advisor:	Senger, Hermes
Examination committee:	Not provided by the institution
Document type:	Thesis
Access type:	Open access
Language:	por (Portuguese)
Defending institution:	Universidade Federal de São Carlos, Câmpus São Carlos
Graduate program:	Programa de Pós-Graduação em Ciência da Computação - PPGCC
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords (Portuguese):	Busca na Web; Processamento de consultas; Algoritmos DAAT; Algoritmos de Poda; Algoritmo WAND; Algoritmo MaxScore; Algoritmos paralelos; Arquitetura GPU
Keywords (English):	Web search; Query processing; DAAT algorithms; Pruning algorithms; WAND algorithm; MaxScore algorithm; Parallel algorithms
CNPq knowledge area:	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Access link:	https://repositorio.ufscar.br/handle/20.500.14289/11481
Abstract:	Search engines face performance challenges as the number of documents and the query load on the Web grow. A search engine succeeds when its query processing system finds, within a short time, the documents that match the information needs expressed in user queries. Although the collection is large, users are interested in only a few results per query, so in most queries only a few documents are highly relevant. DAAT (document-at-a-time) dynamic pruning algorithms exploit this to make query processing more efficient: they avoid spending time ranking documents that are unlikely to be relevant. To handle the scale and dynamics of user query traffic, query processing must also use hardware resources efficiently. The main objective of this doctoral thesis is to investigate the use of parallel computing on the GPU architecture to identify the documents most relevant to a given query. To this end, the thesis proposes and evaluates on the GPU parallelization strategies that aim to reduce the response latency of a single query and to increase query throughput. The proposals target both the DAAT category of algorithms and dynamic pruning algorithms. For DAAT, the proposed partitioning strategies investigate where the occurrences of the same document are placed in the GPU memory hierarchy. For dynamic pruning, threshold propagation policies among processors are proposed, and their impact on the efficiency of the parallel algorithms is analyzed. To verify efficiency in practice, the parallel proposals were implemented and tested on the Pascal GPU architecture, where they achieved performance gains of 4x to 40x over the fundamental algorithms.
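The pruning idea summarized in the abstract can be illustrated with a small sequential sketch. It is not the thesis's GPU implementation and it is neither full WAND nor full MaxScore: it only shows how a top-k threshold combined with per-term score upper bounds lets a document-at-a-time traversal abandon a document early. The function name `daat_top_k`, the `postings` layout (term to `{doc_id: score contribution}`), and the sample scores are all assumptions made for illustration.

```python
import heapq


def daat_top_k(postings, k):
    """Return the k best (score, doc_id) pairs, best first.

    postings: hypothetical layout, dict term -> dict doc_id -> score contribution.
    """
    if k <= 0:
        return []
    # Per-term upper bounds, the quantity that MaxScore/WAND-style pruning relies on.
    bound = {t: max(postings[t].values(), default=0.0) for t in postings}
    terms = sorted(postings, key=lambda t: bound[t], reverse=True)
    # suffix[i] = largest score still obtainable from terms[i:].
    suffix = [0.0] * (len(terms) + 1)
    for i in range(len(terms) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + bound[terms[i]]

    heap = []        # min-heap holding the current top-k as (score, doc_id)
    threshold = 0.0  # score of the k-th best document seen so far
    # Document-at-a-time: visit documents in increasing doc_id order.
    for doc in sorted(set().union(*postings.values())):
        score, pruned = 0.0, False
        for i, term in enumerate(terms):
            # Even with every remaining upper bound, this document cannot
            # beat the current threshold, so stop scoring it.
            if len(heap) == k and score + suffix[i] <= threshold:
                pruned = True
                break
            score += postings[term].get(doc, 0.0)
        if pruned:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > threshold:
            heapq.heapreplace(heap, (score, doc))
        if len(heap) == k:
            threshold = heap[0][0]
    return sorted(heap, key=lambda pair: (-pair[0], pair[1]))


# Illustrative data only; the scores are made up.
postings = {
    "gpu": {1: 2.0, 3: 1.0, 5: 0.5},
    "search": {1: 1.0, 2: 0.4, 5: 0.2, 7: 0.1},
}
print(daat_top_k(postings, 2))  # [(3.0, 1), (1.0, 3)]
```

In this example document 7 is abandoned before its only posting is read, because the threshold is already 1.0 and the remaining upper bound is 1.0. In a parallel setting, each processor would hold its own threshold, which is where the threshold propagation policies mentioned in the abstract come in; that part is not modeled here.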

Item metadata

id	SCAR_da582e03bab9ff74a11ab70dbc5c3395
oai_identifier_str	oai:repositorio.ufscar.br:20.500.14289/11481
network_acronym_str	SCAR
network_name_str	Repositório Institucional da UFSCAR
repository_id_str
spelling	Funding agency: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). Embargo: 18 months after the defense date.
dc.title.por.fl_str_mv	Paralelização de algoritmos de busca de documentos mais relevantes na web utilizando GPUs
dc.title.alternative.eng.fl_str_mv	Parallelization of search algorithms of most relevant documents on the web using GPUs
title	Paralelização de algoritmos de busca de documentos mais relevantes na web utilizando GPUs
author	Gaioso, Roussian Di Ramos Alves
author_facet	Gaioso, Roussian Di Ramos Alves
author_role	author
dc.contributor.authorlattes.por.fl_str_mv	http://lattes.cnpq.br/3536210071193629
dc.contributor.author.fl_str_mv	Gaioso, Roussian Di Ramos Alves
dc.contributor.advisor1.fl_str_mv	Senger, Hermes
dc.contributor.advisor1Lattes.fl_str_mv	http://lattes.cnpq.br/3691742159298316
dc.contributor.authorID.fl_str_mv	da35d675-7730-4e71-94a7-c16f77f55df4
contributor_str_mv	Senger, Hermes
dc.subject.por.fl_str_mv	Busca na Web; Processamento de consultas; Algoritmos DAAT; Algoritmos de Poda; Algoritmo WAND; Algoritmo MaxScore; Algoritmos paralelos; Arquitetura GPU
topic	Busca na Web; Processamento de consultas; Algoritmos DAAT; Algoritmos de Poda; Algoritmo WAND; Algoritmo MaxScore; Algoritmos paralelos; Arquitetura GPU; Web search; Query processing; DAAT Algorithms; Pruning algorithms; WAND Algorithm; MaxScore algorithm; Parallel algorithms; CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
dc.subject.eng.fl_str_mv	Web search; Query processing; DAAT Algorithms; Pruning algorithms; WAND Algorithm; MaxScore algorithm; Parallel algorithms
dc.subject.cnpq.fl_str_mv	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
publishDate	2019
dc.date.accessioned.fl_str_mv	2019-07-05T18:12:01Z
dc.date.available.fl_str_mv	2019-07-05T18:12:01Z
dc.date.issued.fl_str_mv	2019-02-13
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	GAIOSO, Roussian Di Ramos Alves. Paralelização de algoritmos de busca de documentos mais relevantes na web utilizando GPUs. 2019. Tese (Doutorado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2019. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/11481.
dc.identifier.uri.fl_str_mv	https://repositorio.ufscar.br/handle/20.500.14289/11481
url	https://repositorio.ufscar.br/handle/20.500.14289/11481
dc.language.iso.fl_str_mv	por
language	por
dc.relation.confidence.fl_str_mv	600
dc.relation.authority.fl_str_mv	2947c428-30b1-4d14-8369-e5871a4d7acc
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Universidade Federal de São Carlos Câmpus São Carlos
dc.publisher.program.fl_str_mv	Programa de Pós-Graduação em Ciência da Computação - PPGCC
dc.publisher.initials.fl_str_mv	UFSCar
publisher.none.fl_str_mv	Universidade Federal de São Carlos Câmpus São Carlos
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFSCAR instname:Universidade Federal de São Carlos (UFSCAR) instacron:UFSCAR
instname_str	Universidade Federal de São Carlos (UFSCAR)
instacron_str	UFSCAR
institution	UFSCAR
reponame_str	Repositório Institucional da UFSCAR
collection	Repositório Institucional da UFSCAR
bitstream.url.fl_str_mv	https://repositorio.ufscar.br/bitstreams/3959780a-9805-4e52-8ed5-9ed5136e4c08/download https://repositorio.ufscar.br/bitstreams/0947f761-9304-45bc-bac1-f1b31b241fb9/download https://repositorio.ufscar.br/bitstreams/01598182-d9dc-4c84-ac03-5e9fcb3a5aff/download https://repositorio.ufscar.br/bitstreams/6eaeefa6-31dd-4bc0-884d-eb68f15d4884/download
bitstream.checksum.fl_str_mv	a7b72f33dd16e9f235f807bbd26f6dae ae0398b6f8b235e40ad82cba6c50031d b9fc6d5b5ca8e95c0c292c1e922c8dee 7c16c7d18b92948bbefcb9594159f207
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)
repository.mail.fl_str_mv	repositorio.sibi@ufscar.br
_version_	1851688843480137728
