Analysis of variants of similarity retrieval operators in Metric Access Methods

Bibliographic details
Year of defense: 2025
Main author: Silva, William Zaniboni
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: eng
Defense institution: Biblioteca Digital de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/55/55134/tde-06082025-183402/
Abstract: Database Management Systems (DBMS) are the tools par excellence for storing and retrieving data. However, their features were developed with a focus on scalar data, such as numbers and short texts. For complex (non-scalar) data such as images and videos, the resources these tools offer become less useful for data retrieval and organization. Furthermore, for complex data, retrieval based on the similarity between elements makes more sense. The definition of similarity depends on particular characteristics of the application domain, and Metric Access Methods (MAMs) are employed to structure these data so that similarity retrievals are processed more efficiently. The literature presents a few works that integrate open-source Relational Database Management Systems (RDBMS) and similarity queries using MAMs. In most of them, the available similarity retrieval operations are only the most basic ones, and they tend to work only on the complex data. The objective of this master's thesis is to define and evaluate useful variations of similarity queries that should be accelerated, to create support for executing them using MAMs within the execution engine of a relational DBMS, and to assess potential applications of these concepts in NoSQL systems. The focus is on the MAM and the development of its retrieval operators, so that operations performed in a DBMS can be varied and optimized (integrating the MAM into the RDBMS is outside the scope of this work).
As a result, this master's thesis: defines and details the applicability of several similarity query variations; develops a new MAM, called Include-Slim, which allows these queries to be accelerated within an RDBMS; defines and develops a new framework, called Similarity-Slim, which enables similarity queries to be accelerated and their financial costs reduced when executed on a NoSQL cloud database that uses document collections; and presents two new post-processing algorithms, Slim-Bloat and Slim-Replace, that can be applied to any MAM to generate a better structure. The evaluations performed show that Include-Slim can accelerate query variations by up to four orders of magnitude, and that Similarity-Slim can reduce the financial cost of queries by up to three orders of magnitude. To the best of our knowledge, this is the first work to address the acceleration of the presented variations and the first applied to reducing financial costs in NoSQL environments.
id USP_52cdedf020b17a2908697f884cb826b3
oai_identifier_str oai:teses.usp.br:tde-06082025-183402
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str
dc.title.none.fl_str_mv Analysis of variants of similarity retrieval operators in Metric Access Methods
Análise de variantes dos operadores de recuperação por similaridade em Métodos de Acesso Métricos
title Analysis of variants of similarity retrieval operators in Metric Access Methods
author Silva, William Zaniboni
author_facet Silva, William Zaniboni
author_role author
dc.contributor.none.fl_str_mv Traina Junior, Caetano
dc.contributor.author.fl_str_mv Silva, William Zaniboni
dc.subject.por.fl_str_mv Billing reduction
Consultas por similaridade
DBMS
Métodos de acesso métricos
Metric access methods
Query variations
Redução de custo financeiro
SGBDs
Similarity query
Variações de consulta
publishDate 2025
dc.date.none.fl_str_mv 2025-05-28
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://www.teses.usp.br/teses/disponiveis/55/55134/tde-06082025-183402/
url https://www.teses.usp.br/teses/disponiveis/55/55134/tde-06082025-183402/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1844786329451233280