Uso de Seleção de Características da Wikipedia na Classificação Automática de Textos.

Alvarenga, Leonel Diógenes Carvalhaes


Bibliographic details
Year of defense:	2012
Main author:	Alvarenga, Leonel Diógenes Carvalhaes
Advisor:	Rosa, Thierson Couto
Examining committee:	Not provided by the institution
Document type:	Master's dissertation
Access type:	Open access
dARK ID:	ark:/38995/0013000007vhv
Language:	Portuguese (por)
Institution:	Universidade Federal de Goiás
Graduate program:	Graduate Program in Computer Science (Programa de Pós-Graduação em Ciência da Computação, INF)
Department:	Institute of Informatics (Instituto de Informática, INF)
Country:	Brazil
Keywords (Portuguese):	Recuperação de informação; classificação de textos; seleção de características; expansão de documentos; aprendizado de máquina
Keywords (English):	Information retrieval; text classification; feature selection; document expansion; machine learning
CNPq knowledge area:	CIENCIA DA COMPUTACAO::SISTEMAS DE COMPUTACAO (Computer Science :: Computer Systems)
Access link:	http://repositorio.bc.ufg.br/tede/handle/tde/2870
Abstract:	Traditional text classification methods typically represent documents only as a set of words, known as a "Bag of Words" (BOW). Several studies have reported good results when thesauri and encyclopedias are used as external information sources to expand the BOW representation, by identifying synonymy and hyponymy relationships between the terms present in a document collection. However, the expansion process may introduce terms that lead to erroneous classification. In this work, we propose using feature selection measures to select the features extracted from Wikipedia, in order to improve the effectiveness of the expansion process. The study also proposes a feature selection measure called Tendency Factor to One Category (TF1C); the experiments showed that this measure is competitive with Information Gain, Gain Ratio and Chi-squared in this process, delivering the best microF1 and macroF1 gains in most experiments. Using all of the selected features assisted classification more stably, whereas restricting their insertion to documents of the classes in which these features are well scored by the selection measures yielded lower performance. When applied to the Reuters-21578, Ohsumed first-20000 and 20Newsgroups collections, our feature selection approach reduced the insertion of noise inherent in the expansion process, improved the results obtained with hyponyms, and showed that Wikipedia's synonymy relationships can also be used for document expansion, increasing the effectiveness of automatic text classification.
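The abstract describes scoring features extracted from Wikipedia with a selection measure before inserting them into documents. The sketch below illustrates that idea under stated assumptions; it is not the author's implementation. It uses Chi-squared (one of the measures the study compares against; TF1C's formula is not given in this record), a hypothetical `relations` dictionary standing in for Wikipedia synonym/hyponym links, and a hypothetical `threshold`. The helper names are illustrative, and the sketch follows the "full use" variant, inserting each selected feature into every document that triggers it.

```python
from typing import Dict, List, Set, Tuple


def chi_squared(has_feature: List[bool], labels: List[str], category: str) -> float:
    """Chi-squared statistic of the 2x2 feature/category contingency table."""
    a = b = c = d = 0
    for present, label in zip(has_feature, labels):
        in_category = label == category
        if present and in_category:
            a += 1  # feature present, document in the category
        elif present:
            b += 1  # feature present, document in another category
        elif in_category:
            c += 1  # feature absent, document in the category
        else:
            d += 1  # feature absent, document in another category
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denominator if denominator else 0.0


def candidate_features(doc: List[str], relations: Dict[str, List[str]]) -> Set[str]:
    """Expansion terms (e.g. synonyms or hyponyms) triggered by a document's words."""
    return {related for word in doc for related in relations.get(word, [])}


def select_and_expand(
    docs: List[List[str]],
    labels: List[str],
    relations: Dict[str, List[str]],
    threshold: float,
) -> Tuple[List[List[str]], Set[str]]:
    """Score every candidate expansion feature on the labelled documents and
    append only those whose best per-category chi-squared reaches `threshold`."""
    candidates = [candidate_features(doc, relations) for doc in docs]
    vocabulary = set().union(*candidates)
    categories = sorted(set(labels))
    selected: Set[str] = set()
    for feature in vocabulary:
        present = [feature in cand for cand in candidates]
        score = max(chi_squared(present, labels, cat) for cat in categories)
        if score >= threshold:
            selected.add(feature)
    # "Full use": every selected feature is inserted into any document that triggers it.
    expanded = [doc + sorted(cand & selected) for doc, cand in zip(docs, candidates)]
    return expanded, selected


if __name__ == "__main__":
    docs = [["stock", "market"], ["stock", "shares"], ["football", "goal"], ["football", "match"]]
    labels = ["finance", "finance", "sport", "sport"]
    # Hypothetical stand-in for Wikipedia-derived relations; "event" is a noisy term.
    relations = {"stock": ["equity"], "shares": ["equity"], "football": ["soccer"],
                 "market": ["event"], "match": ["event"]}
    expanded, selected = select_and_expand(docs, labels, relations, threshold=1.0)
    print(sorted(selected))  # ['equity', 'soccer']
    print(expanded[0])       # ['stock', 'market', 'equity']
```

In this toy run, "event" appears equally in both classes, scores 0, and is filtered out, which is the kind of noisy insertion the abstract says selection reduces. The restricted variant the abstract also evaluates would additionally check that a document's class is one where the feature scores well before inserting it.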

Item metadata

id	UFG-2_3eb627c69c9784d3edcd0188d2a3608a
oai_identifier_str	oai:repositorio.bc.ufg.br:tde/2870
network_acronym_str	UFG-2
network_name_str	Repositório Institucional da UFG
repository_id_str
Funding:	Fundação de Amparo à Pesquisa do Estado de Goiás (FAPEG)
Full text (PDF, 1,449,954 bytes, MD5 9086dec3868b6b703340b550c614d33d):	http://repositorio.bc.ufg.br/tede/bitstreams/31694246-d804-46d5-854a-5a8cf3901c5c/download
dc.title.por.fl_str_mv	Uso de Seleção de Características da Wikipedia na Classificação Automática de Textos.
dc.title.alternative.eng.fl_str_mv	Selection of Wikipedia features for automatic text classification
author	Alvarenga, Leonel Diógenes Carvalhaes
author_facet	Alvarenga, Leonel Diógenes Carvalhaes
author_role	author
dc.contributor.advisor1.fl_str_mv	Rosa, Thierson Couto
dc.contributor.advisor1Lattes.fl_str_mv	http://lattes.cnpq.br/4414718560764818
dc.contributor.authorLattes.fl_str_mv	http://lattes.cnpq.br/9542541522845372
dc.contributor.author.fl_str_mv	Alvarenga, Leonel Diógenes Carvalhaes
contributor_str_mv	Rosa, Thierson Couto
dc.subject.por.fl_str_mv	Recuperação de informação; classificação de textos; seleção de características; expansão de documentos; aprendizado de máquina
topic	Recuperação de informação; classificação de textos; seleção de características; expansão de documentos; aprendizado de máquina; Information retrieval; text classification; feature selection; document expansion; machine learning; CIENCIA DA COMPUTACAO::SISTEMAS DE COMPUTACAO
dc.subject.eng.fl_str_mv	Information retrieval; text classification; feature selection; document expansion; machine learning
dc.subject.cnpq.fl_str_mv	CIENCIA DA COMPUTACAO::SISTEMAS DE COMPUTACAO
publishDate	2012
dc.date.issued.fl_str_mv	2012-09-20
dc.date.accessioned.fl_str_mv	2014-07-31T14:43:10Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	ALVARENGA, Leonel Diógenes Carvalhaes. Uso de Seleção de Características da Wikipedia na Classificação Automática de Textos. 2012. 114 leaves. Dissertation (Master's in) - Universidade Federal de Goiás, Goiânia, 2012
dc.identifier.uri.fl_str_mv	http://repositorio.bc.ufg.br/tede/handle/tde/2870
dc.identifier.dark.fl_str_mv	ark:/38995/0013000007vhv
url	http://repositorio.bc.ufg.br/tede/handle/tde/2870
dc.language.iso.fl_str_mv	por
language	por
dc.relation.program.fl_str_mv	1102159680310750095
dc.relation.confidence.fl_str_mv	600 600 600 600
dc.relation.department.fl_str_mv	306626487509624506
dc.relation.cnpq.fl_str_mv	8930092515683771531
dc.relation.sponsorship.fl_str_mv	-961409807440757778
dc.rights.driver.fl_str_mv	http://creativecommons.org/licenses/by-nc-nd/4.0/ info:eu-repo/semantics/openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by-nc-nd/4.0/
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de Goiás
dc.publisher.program.fl_str_mv	Programa de Pós Graduação em Ciência da Computação (INF)
dc.publisher.initials.fl_str_mv	UFG
dc.publisher.country.fl_str_mv	Brasil
dc.publisher.department.fl_str_mv	Instituto de Informática (INF)
publisher.none.fl_str_mv	Universidade Federal de Goiás
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFG instname:Universidade Federal de Goiás (UFG) instacron:UFG
instname_str	Universidade Federal de Goiás (UFG)
instacron_str	UFG
institution	UFG
reponame_str	Repositório Institucional da UFG
collection	Repositório Institucional da UFG
bitstream.url.fl_str_mv	http://repositorio.bc.ufg.br/tede/bitstreams/b96dc716-5e78-4e81-a331-69ce6dc40698/download http://repositorio.bc.ufg.br/tede/bitstreams/217f58eb-f16f-4f55-923e-d331160d91be/download http://repositorio.bc.ufg.br/tede/bitstreams/35f0b9d8-f8f2-4a19-ad46-d4354e4777b3/download http://repositorio.bc.ufg.br/tede/bitstreams/8e5ae9eb-d0e3-4393-b945-182576c5b91a/download http://repositorio.bc.ufg.br/tede/bitstreams/6bd3f3fa-05a1-4c34-8b18-7b38bf80e32c/download http://repositorio.bc.ufg.br/tede/bitstreams/31694246-d804-46d5-854a-5a8cf3901c5c/download http://repositorio.bc.ufg.br/tede/bitstreams/fa6b40de-8772-458c-a75e-be42ff5e15b2/download
bitstream.checksum.fl_str_mv	9dbaf2878f7143f2ad40285eaca704cd 232e528055260031f4e2af4136033daa 4afdbb8c545fd630ea7db775da747b2f 9833653f73f7853880c94a6fead477b1 9da0b6dfac957114c6a7714714b86306 9086dec3868b6b703340b550c614d33d 62d04f44e977a0c5fa58b24d668f493c
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5 MD5 MD5 MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFG - Universidade Federal de Goiás (UFG)
repository.mail.fl_str_mv	grt.bc@ufg.br
_version_	1846536687009660928
