Ulysses-RFSQ: improving information retrieval through relevance feedback for similar queries

Bibliographic details
Year of defense: 2025
Main author: VITÓRIO, Douglas Álisson Marques de Sá
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defending institution: Universidade Federal de Pernambuco
UFPE
Brasil
Programa de Pos Graduacao em Ciencia da Computacao
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
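Keywords in Portuguese: Recuperação de informação (information retrieval); Feedback de relevância (relevance feedback); Consultas similares (similar queries); Re-ranqueamento (re-ranking); Domínio legislativo (legislative domain)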
Access link: https://repositorio.ufpe.br/handle/123456789/67960
Abstract: Relevance Feedback can improve Information Retrieval (IR) performance, but it is usually applied only to the query currently being processed. When relevance information from past searches is available, it may also help future searches: if two queries are sufficiently similar, the documents relevant to one may also be relevant to the other. However, few studies in the literature address this use of relevance information from past queries, since benchmark datasets containing such information for similar queries are lacking. This study presents Ulysses-RFSQ, a novel IR method that aims to improve the results for future queries by using Relevance Feedback information from similar past queries. It re-ranks the list of documents retrieved by a base IR algorithm by adding a bonus or a penalty to each document's score. It can therefore be used with any algorithm that computes a score for the documents, such as BM25 or Sentence-BERT models. To evaluate Ulysses-RFSQ, a Relevance Feedback dataset called Ulysses-RFCorpus was built together with the Brazilian Chamber of Deputies and made available to the community. The method was also evaluated on a larger dataset provided by the Chamber (the Preliminary Search corpus), which could not be made publicly available. Evaluating the method in the legislative scenario is justified by the fact that most of the queries used in the Brazilian legislative process are redundant. The results indicate that Ulysses-RFSQ can use past feedback from similar queries to improve the base algorithm's performance on future queries. Improvements in MAP, MRP, MRR, and nDCG showed that the method can re-rank the retrieved list so that relevant documents move to the first positions, while also fetching relevant documents that the base IR algorithm did not retrieve. The improvements were most visible in scenarios where the base IR algorithm performed poorly and when a larger set of stored queries was used. For instance, the observed improvements in MAP ranged from 0.0384 to 0.0773 for the Preliminary Search corpus, in some cases more than doubling the baseline's performance.
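The re-ranking idea described in the abstract (a bonus or penalty added to the base algorithm's scores, driven by feedback stored for similar past queries) can be illustrated with a minimal Python sketch. This is not the thesis's actual formulation: the `PastQuery` structure, the Jaccard token similarity, the `min_sim` threshold, and the `bonus` and `penalty` weights are all assumptions made here for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class PastQuery:
    """A stored query plus the feedback given for it (hypothetical structure)."""
    text: str
    relevant: set[str] = field(default_factory=set)
    irrelevant: set[str] = field(default_factory=set)


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for the real query similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def rerank(query, base_scores, past_queries, min_sim=0.5, bonus=1.0, penalty=1.0):
    """Adjust base scores using feedback from sufficiently similar past queries.

    base_scores maps document id -> score from any base ranker (e.g. BM25).
    Returns (doc_id, adjusted_score) pairs sorted by descending score.
    """
    adjusted = dict(base_scores)
    for past in past_queries:
        sim = jaccard(query, past.text)
        if sim < min_sim:
            continue  # ignore past queries that are not similar enough
        for doc_id in past.relevant:
            # Bonus; also pulls in documents the base ranker did not retrieve.
            adjusted[doc_id] = adjusted.get(doc_id, 0.0) + bonus * sim
        for doc_id in past.irrelevant:
            if doc_id in adjusted:
                adjusted[doc_id] -= penalty * sim  # penalty for retrieved docs only
    return sorted(adjusted.items(), key=lambda item: item[1], reverse=True)


# Illustrative data: one stored query with feedback, three base-ranked documents.
history = [PastQuery("tax reform bill", relevant={"d3", "d9"}, irrelevant={"d1"})]
base = {"d1": 2.0, "d2": 1.5, "d3": 1.0}
ranked = rerank("tax reform bill proposal", base, history)
print([doc_id for doc_id, _ in ranked])  # ['d3', 'd2', 'd1', 'd9']
```

In this toy run, the similar past query promotes `d3` to the top, demotes `d1`, and adds `d9`, which the base ranker never returned. Scaling the adjustment by the similarity value is one possible design choice; a fixed bonus or a learned weight would fit the same interface.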
id UFPE_071096275dcc8e1fe4a78541a1eb4cdc
oai_identifier_str oai:repositorio.ufpe.br:123456789/67960
network_acronym_str UFPE
network_name_str Repositório Institucional da UFPE
repository_id_str
dc.title.none.fl_str_mv Ulysses-RFSQ: improving information retrieval through relevance feedback for similar queries
author VITÓRIO, Douglas Álisson Marques de Sá
author_facet VITÓRIO, Douglas Álisson Marques de Sá
author_role author
dc.contributor.none.fl_str_mv OLIVEIRA, Adriano Lorena Inácio de
PEREIRA, Ellen Polliana Ramos Souza
http://lattes.cnpq.br/2138402381175111
http://lattes.cnpq.br/5194381227316437
http://lattes.cnpq.br/6593918610781356
https://orcid.org/0000-0003-2285-574X
https://orcid.org/0000-0002-7706-4809
dc.contributor.author.fl_str_mv VITÓRIO, Douglas Álisson Marques de Sá
dc.subject.por.fl_str_mv Recuperação de informação
Feedback de relevância
Consultas similares
Re-ranqueamento
Domínio legislativo
publishDate 2025
dc.date.none.fl_str_mv 2025-11-27
2026-01-28T14:51:34Z
2026-01-28T14:51:34Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv VITORIO, Douglas Álisson Marques de Sá. Ulysses-RFSQ: improving information retrieval through relevance feedback for similar queries. 2025. Tese (Doutorado em Ciência da Computação) - Universidade Federal de Pernambuco, Recife, 2025.
https://repositorio.ufpe.br/handle/123456789/67960
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv https://creativecommons.org/licenses/by-nc-nd/4.0/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-nc-nd/4.0/
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de Pernambuco
UFPE
Brasil
Programa de Pos Graduacao em Ciencia da Computacao
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFPE
instname:Universidade Federal de Pernambuco (UFPE)
instacron:UFPE
instname_str Universidade Federal de Pernambuco (UFPE)
instacron_str UFPE
institution UFPE
reponame_str Repositório Institucional da UFPE
collection Repositório Institucional da UFPE
repository.name.fl_str_mv Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE)
repository.mail.fl_str_mv attena@ufpe.br
_version_ 1856041970751766528