Answering differentially private multi-dimensional queries

Bibliographic details
Year of defense: 2025
Main author: Costa Filho, José Serafim da
Advisor: Machado, Javam de Castro
Examining committee: Not provided by the institution
Document type: Doctoral thesis
Access type: Open access
Language: English
Defending institution: Universidade Federal do Ceará (UFC)
Graduate program: Computer Science (Doutorado em Ciência da Computação)
Department: Not provided by the institution
Country: Brazil
CNPq knowledge area: CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Access link: http://repositorio.ufc.br/handle/riufc/83226
Abstract: Answering multi-dimensional range queries under differential privacy is a critical problem that has attracted significant attention in recent years, since range queries are a fundamental operation in data analysis. Four principal technical challenges remain: (i) effectively capturing correlations among attributes, (ii) mitigating the curse of dimensionality, (iii) handling large attribute domains, and (iv) accommodating heterogeneous user privacy requirements. Existing methods do not address all of these challenges together. Our approach is built on multi-dimensional grids: users' data are mapped onto grid structures, which are perturbed to ensure privacy before being sent to an aggregator. The aggregator uses the perturbed grid information to estimate the underlying data distribution and then answers range queries. Grid granularity involves a trade-off: finer grids amplify noise-induced error, whereas coarser grids introduce bias-induced error. To address this trade-off, we propose a grid construction optimization that considers multiple factors to improve accuracy. In addition, we build a correlation model that determines how attributes are correlated, enabling the use of fewer, strategically constructed grids and thereby improving the signal-to-noise ratio. We also introduce a novel optimization procedure that accounts for workload-specific characteristics; it finds the user-to-grid assignment that minimizes the total expected error. Finally, we combine different differentially private estimators to improve accuracy when answering multi-dimensional range queries. Extensive experiments on real-world and synthetic datasets show that our method significantly outperforms existing state-of-the-art techniques.
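The basic pipeline the abstract describes (map each user to a grid cell, perturb locally, estimate cell frequencies at the aggregator, answer range queries from the estimated grid) can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's method: it assumes a single uniform g × g grid over two integer attributes in [0, D), uses Generalized Randomized Response (GRR) as a stand-in local frequency oracle with ε > 0, and treats users as uniform inside each cell. All function names (`grr_perturb`, `grr_estimate`, `cell_of`, `range_query`, `simulate`) and the synthetic correlated data are hypothetical. The correlation model, the workload-driven user-to-grid assignment, and the estimator combination are not modeled.

```python
import math
import random

# Hypothetical sketch: one uniform g x g grid, GRR as a stand-in LDP frequency
# oracle, uniformity assumed inside each cell. Requires epsilon > 0.


def grr_perturb(value: int, k: int, epsilon: float, rng: random.Random) -> int:
    """Generalized Randomized Response over {0, ..., k-1}: report the true value
    with probability p = e^eps / (e^eps + k - 1), else a uniformly random other value."""
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    other = rng.randrange(k - 1)  # index among the k - 1 values different from `value`
    return other if other < value else other + 1


def grr_estimate(reports: list[int], k: int, epsilon: float) -> list[float]:
    """Unbiased estimate of each value's frequency (as a fraction of users)."""
    e = math.exp(epsilon)
    p, q = e / (e + k - 1), 1.0 / (e + k - 1)
    counts = [0] * k
    for r in reports:
        counts[r] += 1
    n = len(reports)
    # E[c/n] = q + f * (p - q), so invert that relation per cell.
    return [(c / n - q) / (p - q) for c in counts]


def cell_of(x: int, y: int, domain: int, g: int) -> int:
    """Index of the g x g grid cell containing the point (x, y) in [0, domain)^2."""
    return (x * g // domain) * g + (y * g // domain)


def range_query(freqs, domain, g, x_lo, x_hi, y_lo, y_hi) -> float:
    """Estimated fraction of users in [x_lo, x_hi) x [y_lo, y_hi). Partially covered
    cells contribute in proportion to the overlapped area (uniformity assumption),
    which is where coarse grids incur bias."""
    w = domain / g
    total = 0.0
    for cx in range(g):
        ox = max(0.0, min(x_hi, (cx + 1) * w) - max(x_lo, cx * w))
        for cy in range(g):
            oy = max(0.0, min(y_hi, (cy + 1) * w) - max(y_lo, cy * w))
            total += freqs[cx * g + cy] * (ox / w) * (oy / w)
    return total


def simulate(g: int, n: int = 20000, domain: int = 64, epsilon: float = 1.0, seed: int = 0):
    """Return (estimated, true) answers to one range query on synthetic data."""
    rng = random.Random(seed)
    # Hypothetical data: y stays within 4 of x, so the two attributes are correlated.
    points = []
    for _ in range(n):
        x = rng.randrange(domain)
        y = min(domain - 1, max(0, x + rng.randint(-4, 4)))
        points.append((x, y))
    k = g * g
    reports = [grr_perturb(cell_of(x, y, domain, g), k, epsilon, rng) for x, y in points]
    freqs = grr_estimate(reports, k, epsilon)
    est = range_query(freqs, domain, g, 0, 20, 0, 20)
    true = sum(1 for x, y in points if x < 20 and y < 20) / n
    return est, true


if __name__ == "__main__":
    # Average over a few seeds: larger g tends to add noise, smaller g bias.
    for g in (2, 4, 8, 16, 32):
        errors = [abs(e - t) for e, t in (simulate(g, seed=s) for s in range(3))]
        print(f"g={g:2d}  mean abs error={sum(errors) / len(errors):.4f}")
```

Running the script prints the mean absolute error for several grid sizes. A small g inflates the uniformity (bias) error on partially covered cells, while a large g spreads the same privacy budget over g² cells, so the GRR noise per cell grows. The printed values depend on the seed and on the synthetic data, so they illustrate the trade-off rather than benchmark anything.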
id UFC-7_4e3feaf8d6b4369adc9853c597190dd0
oai_identifier_str oai:repositorio.ufc.br:riufc/83226
network_acronym_str UFC-7
network_name_str Repositório Institucional da Universidade Federal do Ceará (UFC)
repository_id_str
ORCID and Lattes identifiers 0000-0002-8452-1975
http://lattes.cnpq.br/5014333194028146
https://orcid.org/0000-0002-8430-9421
http://lattes.cnpq.br/9884980518986225
Bitstream sizes license.txt (text/plain; charset=utf-8): 1748 bytes; 2025_tese_jscostafilho.pdf (application/pdf): 2798678 bytes
OAI-PMH endpoint http://www.repositorio.ufc.br/ri-oai/request
dc.title.pt_BR.fl_str_mv Answering differentially private multi-dimensional queries
dc.title.en.pt_BR.fl_str_mv Answering differentially private multi-dimensional queries
author Costa Filho, José Serafim da
author_facet Costa Filho, José Serafim da
author_role author
dc.contributor.author.fl_str_mv Costa Filho, José Serafim da
dc.contributor.advisor1.fl_str_mv Machado, Javam de Castro
contributor_str_mv Machado, Javam de Castro
dc.subject.cnpq.fl_str_mv CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
dc.subject.ptbr.pt_BR.fl_str_mv Privacidade diferencial
Dados multidimensionais
Correlações entre atributos
Resposta a consultas
dc.subject.en.pt_BR.fl_str_mv Differential privacy
Multi-dimensional data
Attribute correlations
Query answering
publishDate 2025
dc.date.accessioned.fl_str_mv 2025-10-28T13:34:02Z
dc.date.available.fl_str_mv 2025-10-28T13:34:02Z
dc.date.issued.fl_str_mv 2025
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv COSTA FILHO, José Serafim da. Answering differentially private multi-dimensional queries. 2025. 150 f. Tese (Doutorado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2025.
dc.identifier.uri.fl_str_mv http://repositorio.ufc.br/handle/riufc/83226
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Institucional da Universidade Federal do Ceará (UFC)
instname:Universidade Federal do Ceará (UFC)
instacron:UFC
bitstream.url.fl_str_mv http://repositorio.ufc.br/bitstream/riufc/83226/4/license.txt
http://repositorio.ufc.br/bitstream/riufc/83226/3/2025_tese_jscostafilho.pdf
bitstream.checksum.fl_str_mv 8a4605be74aa9ea9d79846c1fba20a33
26eb67f0151b7cbe3c4979991d93e4c6
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC)
repository.mail.fl_str_mv bu@ufc.br || repositorio@ufc.br
_version_ 1847793071957213184