Characterization and identification of synonyms on anonymous social networks
| Year of defense: | 2019 |
|---|---|
| Main author: | Gomes, Janaína Sant’Anna Gomide |
| Advisor: | |
| Defense committee: | |
| Document type: | Thesis |
| Access type: | Open access |
| Language: | eng |
| Defending institution: |
Universidade Federal do Rio de Janeiro
Brasil Instituto Alberto Luiz Coimbra de Pós-Graduação e Pesquisa de Engenharia Programa de Pós-Graduação em Engenharia de Sistemas e Computação UFRJ |
| Graduate program: |
Not provided by the institution
|
| Department: |
Not provided by the institution
|
| Country: |
Not provided by the institution
|
| Keywords in Portuguese: | |
| Access link: | http://hdl.handle.net/11422/13547 |
Abstract: | In many scenarios, objects are referred to using multiple labels, and this diversity leads to ambiguities. Addressing name ambiguity is an important step in data consolidation and, with the growth in the amount of digital data, has become even more pressing. Moreover, growing privacy concerns among individuals and enterprises are leading to the removal of personally identifiable information (PII) from publicly available data. In this work, we focus on the characterization and identification of synonyms in anonymous social networks, where only the network structure is considered and all PII has been discarded. The main contributions of this thesis are to classify name usage patterns of individuals who use multiple names, to propose a probabilistic model for synonyms in social networks, and to propose algorithms to identify synonyms in anonymous social networks. The first algorithm considers the distance between nodes and the number of common neighbors to identify synonyms in a social network. The second algorithm considers ego-centered collaboration networks and identifies the different nodes that correspond to the egonet owner; it is based on the dominating set and independent set problems in graphs. The last algorithm is a framework that classifies nodes as having duplicates in social networks. It extracts subgraphs to generate features for nodes, which are then used as input to a two-level neural network designed specifically for this problem. Real collaboration networks, extracted from DBLP and Google Scholar, as well as familial networks, are used to evaluate the proposed algorithms. Experimental results indicate that synonyms can be identified effectively even on anonymous social networks by leveraging only network structure. |
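
The abstract names the signals used by the first algorithm (distance between nodes and number of common neighbors) but not the procedure itself. The following is only a minimal sketch of how those two signals could be combined, not the thesis's algorithm: the functions `bfs_distance` and `candidate_synonyms`, the thresholds `max_distance` and `min_common`, and the toy graph are all hypothetical and chosen for illustration.

```python
from collections import deque
from itertools import combinations


def bfs_distance(adj, source, target, cutoff):
    """Hop distance from source to target, or None if it exceeds cutoff."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == cutoff:
            continue  # do not expand beyond the cutoff
        for nxt in adj[node]:
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def candidate_synonyms(adj, max_distance=2, min_common=2):
    """Rank node pairs that are close together and share many neighbors.

    Hypothetical heuristic: the thresholds are illustrative assumptions.
    """
    pairs = []
    for u, v in combinations(sorted(adj), 2):
        common = len(adj[u] & adj[v])
        if common < min_common:
            continue
        dist = bfs_distance(adj, u, v, max_distance)
        if dist is not None:
            pairs.append((u, v, dist, common))
    # Pairs with the most shared neighbors come first.
    pairs.sort(key=lambda p: (-p[3], p[0], p[1]))
    return pairs


# Toy anonymous graph: nodes are opaque integer ids, no PII attached.
edges = [(0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (4, 5)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

result = candidate_synonyms(adj)
for u, v, dist, common in result:
    print(f"nodes {u} and {v}: distance {dist}, {common} common neighbors")
```

In this toy graph, nodes 0 and 1 are not adjacent but share three neighbors, so they rank first. Any two nodes that share a neighbor are at most two hops apart, so the distance cutoff only filters pairs when `min_common` is set to 0; the reported distance mainly separates adjacent pairs from non-adjacent ones.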
| id |
UFRJ_3c805ae8d15037ee4c6cdcd8e44a05e2 |
|---|---|
| oai_identifier_str |
oai:pantheon.ufrj.br:11422/13547 |
| network_acronym_str |
UFRJ |
| network_name_str |
Repositório Institucional da UFRJ |
| repository_id_str |
|
| dc.title.none.fl_str_mv |
Characterization and identification of synonyms on anonymous social networks Caracterização e identificação de sinônimos em redes sociais anônimas |
| title |
Characterization and identification of synonyms on anonymous social networks |
| author |
Gomes, Janaína Sant’Anna Gomide |
| author_facet |
Gomes, Janaína Sant’Anna Gomide |
| author_role |
author |
| dc.contributor.none.fl_str_mv |
Figueiredo, Daniel Ratton http://lattes.cnpq.br/3621433615334969 Zaverucha, Gerson Barbosa, Valmir Carneiro Laender, Alberto Henrique Frade Ribeiro, Bruno Felisberto Martins |
| dc.contributor.author.fl_str_mv |
Gomes, Janaína Sant’Anna Gomide |
| dc.subject.por.fl_str_mv |
Ambiguity Social networks Algorithm CNPQ::ENGENHARIAS |
| topic |
Ambiguity Social networks Algorithm CNPQ::ENGENHARIAS |
| publishDate |
2019 |
| dc.date.none.fl_str_mv |
2019-05 2021-01-22T00:10:26Z 2023-12-21T03:07:21Z |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
| format |
doctoralThesis |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/11422/13547 |
| url |
http://hdl.handle.net/11422/13547 |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.publisher.none.fl_str_mv |
Universidade Federal do Rio de Janeiro Brasil Instituto Alberto Luiz Coimbra de Pós-Graduação e Pesquisa de Engenharia Programa de Pós-Graduação em Engenharia de Sistemas e Computação UFRJ |
| dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFRJ instname:Universidade Federal do Rio de Janeiro (UFRJ) instacron:UFRJ |
| instname_str |
Universidade Federal do Rio de Janeiro (UFRJ) |
| instacron_str |
UFRJ |
| institution |
UFRJ |
| reponame_str |
Repositório Institucional da UFRJ |
| collection |
Repositório Institucional da UFRJ |
| repository.name.fl_str_mv |
Repositório Institucional da UFRJ - Universidade Federal do Rio de Janeiro (UFRJ) |
| repository.mail.fl_str_mv |
pantheon@sibi.ufrj.br |
| _version_ |
1831773709350731776 |