Analysis of variants of similarity retrieval operators in Metric Access Methods
| Year of defense: | 2025 |
|---|---|
| Main author: | Silva, William Zaniboni |
| Advisor: | |
| Examination committee: | |
| Document type: | Master's thesis |
| Access type: | Open access |
| Language: | eng |
| Defending institution: |
Biblioteca Digital de Teses e Dissertações da USP
|
| Graduate program: |
Not provided by the institution
|
| Department: |
Not provided by the institution
|
| Country: |
Not provided by the institution
|
| Keywords in Portuguese: | |
| Link de acesso: | https://www.teses.usp.br/teses/disponiveis/55/55134/tde-06082025-183402/ |
Abstract: | Database Management Systems (DBMS) are the tools par excellence for storing and retrieving data. However, their features were developed with a focus on scalar data, such as numbers and short text. For complex (non-scalar) data such as images and videos, the resources these tools offer become less useful for data retrieval and organization. Furthermore, for complex data, retrieval based on the similarity between elements makes more sense. The definition of similarity depends on particular characteristics of the application domain, and Metric Access Methods (MAMs) are employed to structure these data so that similarity retrievals are processed more efficiently. The literature presents a few works that integrate open-source Relational Database Management Systems (RDBMS) with similarity queries using MAMs. In most of them, the available similarity retrieval operations are only the most basic ones, and they tend to work only on the complex data. The objective of this master's thesis is to define and evaluate useful variations of similarity queries that should be accelerated, to create support for executing them using MAMs within the execution engine of a relational DBMS, and to assess potential applications of these concepts in NoSQL systems. The focus is on the MAM and the development of its retrieval operators, so that operations performed in a DBMS can be varied and optimized (integrating the MAM into the RDBMS is outside the scope of this work).
As a result, this master's thesis: defines and details the applicability of several similarity query variations; develops a new MAM, called Include-Slim, which accelerates these queries within an RDBMS; defines and develops a new framework, called Similarity-Slim, which accelerates similarity queries and reduces their financial cost when they are executed on a NoSQL cloud database that uses document collections; and presents two new post-processing algorithms, Slim-Bloat and Slim-Replace, that can be applied to any MAM to generate a better structure. The evaluations show that Include-Slim can accelerate the query variations by up to four orders of magnitude, and that Similarity-Slim can reduce the financial cost of queries by up to three orders of magnitude. To the best of our knowledge, this is the first work that addresses accelerating the presented variations and the first one applied to reducing financial costs in NoSQL environments. |
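For readers unfamiliar with the two basic similarity queries the abstract refers to, the sketch below illustrates a range query and a k-nearest-neighbor query over a metric space, with candidates discarded through the triangle inequality. It is a generic single-pivot filter written only for illustration and is not Include-Slim, Similarity-Slim, or any structure from the thesis; the class `PivotIndex`, its methods, and the choice of Euclidean distance are all assumptions.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance, one example of a metric (assumed for this sketch)."""
    return math.dist(a, b)


class PivotIndex:
    """Hypothetical toy index: stores each element's distance to one pivot."""

    def __init__(self, data: Sequence[Point],
                 distance: Callable[[Point, Point], float] = euclidean) -> None:
        if not data:
            raise ValueError("data must not be empty")
        self.distance = distance
        self.pivot = data[0]  # arbitrary pivot choice
        self.entries = [(distance(self.pivot, x), x) for x in data]

    def range_query(self, q: Point, radius: float) -> List[Point]:
        """Return every element within `radius` of `q`."""
        d_qp = self.distance(q, self.pivot)
        result = []
        for d_px, x in self.entries:
            # Triangle inequality: d(q, x) >= |d(q, p) - d(p, x)|.
            if abs(d_qp - d_px) > radius:
                continue  # cannot qualify, so the real distance is never computed
            if self.distance(q, x) <= radius:
                result.append(x)
        return result

    def knn_query(self, q: Point, k: int) -> List[Point]:
        """Return the `k` elements closest to `q`, nearest first."""
        if k <= 0:
            return []
        d_qp = self.distance(q, self.pivot)
        heap: List[Tuple[float, Point]] = []  # max-heap via negated distances
        # Visit elements in increasing order of their lower bound on d(q, x).
        for d_px, x in sorted(self.entries, key=lambda e: abs(d_qp - e[0])):
            lower_bound = abs(d_qp - d_px)
            if len(heap) == k and lower_bound >= -heap[0][0]:
                break  # no remaining element can beat the current k-th neighbor
            d = self.distance(q, x)
            if len(heap) < k:
                heapq.heappush(heap, (-d, x))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, x))
        return [x for _, x in sorted((-neg_d, x) for neg_d, x in heap)]


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0), (6.0, 5.0)]
    index = PivotIndex(points)
    print(index.range_query((0.0, 0.0), 1.0))
    print(index.knn_query((5.0, 4.0), 2))
```

Tree-based MAMs apply the same pruning idea hierarchically, so whole subtrees can be skipped rather than single elements; the flat pivot table here is chosen only to keep the example short.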
| id |
USP_52cdedf020b17a2908697f884cb826b3 |
|---|---|
| oai_identifier_str |
oai:teses.usp.br:tde-06082025-183402 |
| network_acronym_str |
USP |
| network_name_str |
Biblioteca Digital de Teses e Dissertações da USP |
| repository_id_str |
|
| dc.title.none.fl_str_mv |
Analysis of variants of similarity retrieval operators in Metric Access Methods / Análise de variantes dos operadores de recuperação por similaridade em Métodos de Acesso Métricos |
| title |
Analysis of variants of similarity retrieval operators in Metric Access Methods |
| author |
Silva, William Zaniboni |
| author_facet |
Silva, William Zaniboni |
| author_role |
author |
| dc.contributor.none.fl_str_mv |
Traina Junior, Caetano |
| dc.contributor.author.fl_str_mv |
Silva, William Zaniboni |
| dc.subject.por.fl_str_mv |
Billing reduction; Consultas por similaridade; DBMS; Métodos de acesso métricos; Metric access methods; Query variations; Redução de custo financeiro; SGBDs; Similarity query; Variações de consulta |
| publishDate |
2025 |
| dc.date.none.fl_str_mv |
2025-05-28 |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
| format |
masterThesis |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
https://www.teses.usp.br/teses/disponiveis/55/55134/tde-06082025-183402/ |
| url |
https://www.teses.usp.br/teses/disponiveis/55/55134/tde-06082025-183402/ |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.relation.none.fl_str_mv |
|
| dc.rights.driver.fl_str_mv |
Release the content for public access. info:eu-repo/semantics/openAccess |
| rights_invalid_str_mv |
Release the content for public access. |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.coverage.none.fl_str_mv |
|
| dc.publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
| publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
| dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP |
| instname_str |
Universidade de São Paulo (USP) |
| instacron_str |
USP |
| institution |
USP |
| reponame_str |
Biblioteca Digital de Teses e Dissertações da USP |
| collection |
Biblioteca Digital de Teses e Dissertações da USP |
| repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
| repository.mail.fl_str_mv |
virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br |
| _version_ |
1844786329451233280 |