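Um método automático para estimativa da qualidade de enciclopédias colaborativas on-line: um estudo de caso sobre a wikipédia (An automatic method for estimating the quality of collaborative online encyclopedias: a case study on Wikipedia)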

Bibliographic details
Year of defense: 2009
Main author: Daniel Hasan Dalip
Advisor: Not provided by the institution
Examining committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: Portuguese (por)
Degree-granting institution: Universidade Federal de Minas Gerais
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords (Portuguese): Bibliotecas digitais (digital libraries); Recuperação de informação (information retrieval)
Access link: https://hdl.handle.net/1843/SLSS-7WJN62
Repository: Repositório Institucional da UFMG
Abstract: The old dream of a universal repository containing all human knowledge and culture is becoming possible through the Internet and the Web. Moreover, this is happening through the direct, collaborative participation of people. Wikipedia is a great example: an enormous repository of information, freely accessible and editable, created collaboratively by its community. However, this large amount of information, made available democratically and virtually without any control, raises questions about its relative quality. In this work we explore a significant number of quality indicators, some of them proposed by us and used here for the first time, and study their capability to assess the quality of Wikipedia articles. Furthermore, we explore machine learning techniques to combine these quality indicators into a single assessment judgment. Through experiments, we show that the most important quality indicators are the easiest ones to extract in an open digital library, namely textual features related to length, structure and style. We were also able to determine which indicators did not contribute significantly to the quality assessment. These were, notably, the most complex features, such as those based on link analysis. Finally, we compare our combination method with state-of-the-art solutions and show significant improvements in effective quality prediction.
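To make "textual features related to length, structure and style" concrete, the following is a minimal Python sketch that computes a few such indicators from MediaWiki markup. The feature names, the heading regex and the naive sentence splitter are illustrative assumptions, not the indicator set or extraction code used in the dissertation; combining these values into a single quality judgment would additionally require a trained machine learning model, which is not shown here.

```python
import re
from dataclasses import dataclass

# Assumption: section headings use MediaWiki "== Title ==" syntax on their own line.
HEADING = re.compile(r"^=+[^=\n]+=+[ \t]*$", re.MULTILINE)


@dataclass
class TextFeatures:
    # Hypothetical subset of indicators, chosen only for illustration.
    char_count: int             # length
    word_count: int             # length
    section_count: int          # structure
    sentence_count: int         # style
    avg_sentence_length: float  # style: words per sentence


def extract_features(wikitext: str) -> TextFeatures:
    """Compute a few length, structure and style indicators from wiki markup."""
    # Structure: count heading lines.
    section_count = len(HEADING.findall(wikitext))
    # Remove heading lines so that only body prose is measured.
    prose = HEADING.sub("", wikitext)
    words = re.findall(r"\w+", prose)
    # Style: naive sentence split on terminal punctuation, ignoring empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", prose) if s.strip()]
    sentence_count = len(sentences)
    avg = len(words) / sentence_count if sentence_count else 0.0
    return TextFeatures(
        char_count=len(prose.strip()),
        word_count=len(words),
        section_count=section_count,
        sentence_count=sentence_count,
        avg_sentence_length=avg,
    )


if __name__ == "__main__":
    sample = "== History ==\nWikipedia was launched in 2001. It grew fast.\n== Quality ==\nQuality varies!"
    print(extract_features(sample))
```

Each article would yield one such feature vector; a learned model (of whatever kind) would then map the vector to a quality estimate.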