Vector representation of texts applied to prediction models

Bibliographic details
Year of defense: 2020
Main author: Stern, Deborah Bassi
Advisor: Izbicki, Rafael
Defense committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: eng
Defending institution: Universidade Federal de São Carlos
Câmpus São Carlos
Graduate program: Programa Interinstitucional de Pós-Graduação em Estatística - PIPGEs
Department: Not provided by the institution
Country: Not provided by the institution
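Keywords in Portuguese: Processamento de linguagem natural; Redes neurais; Representação vetorial de palavras; Modelos de predição
Keywords in English: Natural language processing; Neural networks; WordVectors; Prediction models
CNPq knowledge area: CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA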
Access link: https://repositorio.ufscar.br/handle/20.500.14289/12362
Abstract: Natural Language Processing has gone through substantial changes over time. It was only recently that statistical approaches started receiving attention. The Word2Vec model is one of these. It is a shallow neural network designed to fit vector representations of words according to their syntactic and semantic values. The word embeddings obtained by this method are state of the art. This method has many uses, one of which is the fitting of prediction models based on texts. It is common in the literature for a text to be represented as the mean of its word embeddings. The resulting vector is then used in the predictive model as a set of explanatory variables. In this dissertation, we propose extracting more information from the text by adding other summary statistics besides the mean, such as other moments and quantiles. The improvement of the prediction models is studied on real datasets.
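The idea in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the toy embeddings, the function name `text_features`, and the particular statistics chosen (standard deviation as a second-moment summary; median, minimum and maximum as quantiles) are assumptions made here for illustration only. Each text is mapped to a fixed-length vector by computing the statistics separately for every embedding dimension and concatenating the results.

```python
from statistics import fmean, median, pstdev

# Hypothetical 3-dimensional toy embeddings; a real application would use
# vectors fitted by Word2Vec or a similar model.
TOY_EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "bad": [-0.7, 0.3, 0.0],
    "movie": [0.0, 0.5, 0.9],
}


def text_features(tokens, embeddings):
    """Concatenate per-dimension summary statistics of a text's word vectors."""
    # Tokens without an embedding are skipped (an assumption of this sketch).
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no token of the text has an embedding")
    # Transpose: one tuple of values per embedding dimension.
    columns = list(zip(*vectors))
    means = [fmean(c) for c in columns]     # the usual baseline representation
    stds = [pstdev(c) for c in columns]     # spread (second central moment)
    medians = [median(c) for c in columns]  # 0.5 quantile
    mins = [min(c) for c in columns]        # 0 quantile
    maxs = [max(c) for c in columns]        # 1 quantile
    return means + stds + medians + mins + maxs


features = text_features("a great movie".split(), TOY_EMBEDDINGS)
print(len(features))  # 15: five statistics for each of the three dimensions
```

The resulting list can be passed as a row of explanatory variables to any prediction model; with the mean alone the row would have as many entries as the embedding dimension, and each extra statistic adds the same number again.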
id SCAR_5ecc26fd16a328ec04fbf8f5f9f1ad8d
oai_identifier_str oai:repositorio.ufscar.br:20.500.14289/12362
network_acronym_str SCAR
network_name_str Repositório Institucional da UFSCAR
repository_id_str
funding Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
CAPES: Finance Code 001
dc.title.eng.fl_str_mv Vector representation of texts applied to prediction models
dc.title.alternative.por.fl_str_mv Representações vetoriais de textos aplicados a modelos preditivos
title Vector representation of texts applied to prediction models
spellingShingle Vector representation of texts applied to prediction models
Stern, Deborah Bassi
Processamento de linguagem natural
Redes neurais
Representação vetorial de palavras
Modelos de predição
Natural language processing
Neural networks
WordVectors
Prediction models
CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA
title_short Vector representation of texts applied to prediction models
title_full Vector representation of texts applied to prediction models
title_fullStr Vector representation of texts applied to prediction models
title_full_unstemmed Vector representation of texts applied to prediction models
title_sort Vector representation of texts applied to prediction models
author Stern, Deborah Bassi
author_facet Stern, Deborah Bassi
author_role author
dc.contributor.authorlattes.por.fl_str_mv http://lattes.cnpq.br/0580971589615088
dc.contributor.author.fl_str_mv Stern, Deborah Bassi
dc.contributor.advisor1.fl_str_mv Izbicki, Rafael
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/9991192137633896
dc.contributor.authorID.fl_str_mv 3fafee68-46cd-4b16-addf-945cc55088d8
contributor_str_mv Izbicki, Rafael
dc.subject.por.fl_str_mv Processamento de linguagem natural
Redes neurais
Representação vetorial de palavras
Modelos de predição
topic Processamento de linguagem natural
Redes neurais
Representação vetorial de palavras
Modelos de predição
Natural language processing
Neural networks
WordVectors
Prediction models
CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA
dc.subject.eng.fl_str_mv Natural language processing
Neural networks
WordVectors
Prediction models
dc.subject.cnpq.fl_str_mv CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA
description Natural Language Processing has gone through substantial changes over time. It was only recently that statistical approaches started receiving attention. The Word2Vec model is one of these. It is a shallow neural network designed to fit vector representations of words according to their syntactic and semantic values. The word embeddings obtained by this method are state of the art. This method has many uses, one of which is the fitting of prediction models based on texts. It is common in the literature for a text to be represented as the mean of its word embeddings. The resulting vector is then used in the predictive model as a set of explanatory variables. In this dissertation, we propose extracting more information from the text by adding other summary statistics besides the mean, such as other moments and quantiles. The improvement of the prediction models is studied on real datasets.
publishDate 2020
dc.date.accessioned.fl_str_mv 2020-03-27T16:25:42Z
dc.date.available.fl_str_mv 2020-03-27T16:25:42Z
dc.date.issued.fl_str_mv 2020-03-09
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv STERN, Deborah Bassi. Vector representation of texts applied to prediction models. 2020. Dissertação (Mestrado em Estatística) – Universidade Federal de São Carlos, São Carlos, 2020. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/12362.
dc.identifier.uri.fl_str_mv https://repositorio.ufscar.br/handle/20.500.14289/12362
identifier_str_mv STERN, Deborah Bassi. Vector representation of texts applied to prediction models. 2020. Dissertação (Mestrado em Estatística) – Universidade Federal de São Carlos, São Carlos, 2020. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/12362.
url https://repositorio.ufscar.br/handle/20.500.14289/12362
dc.language.iso.fl_str_mv eng
language eng
dc.relation.confidence.fl_str_mv 600
600
dc.relation.authority.fl_str_mv 3e57f161-19fe-4345-9e87-bc60eb7be98f
dc.rights.driver.fl_str_mv Attribution-NonCommercial-NoDerivs 3.0 Brazil
http://creativecommons.org/licenses/by-nc-nd/3.0/br/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Attribution-NonCommercial-NoDerivs 3.0 Brazil
http://creativecommons.org/licenses/by-nc-nd/3.0/br/
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal de São Carlos
Câmpus São Carlos
dc.publisher.program.fl_str_mv Programa Interinstitucional de Pós-Graduação em Estatística - PIPGEs
dc.publisher.initials.fl_str_mv UFSCar
publisher.none.fl_str_mv Universidade Federal de São Carlos
Câmpus São Carlos
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFSCAR
instname:Universidade Federal de São Carlos (UFSCAR)
instacron:UFSCAR
instname_str Universidade Federal de São Carlos (UFSCAR)
instacron_str UFSCAR
institution UFSCAR
reponame_str Repositório Institucional da UFSCAR
collection Repositório Institucional da UFSCAR
bitstream.url.fl_str_mv https://repositorio.ufscar.br/bitstreams/07e276e7-c740-4ef9-9439-cd4913c0d04f/download
https://repositorio.ufscar.br/bitstreams/557ce8c1-4e8e-4308-b685-e300cfbe1d39/download
https://repositorio.ufscar.br/bitstreams/6f06a9d6-9c48-4fac-90e8-73957ea3969a/download
https://repositorio.ufscar.br/bitstreams/d5bc9ee4-6ffb-4ab9-a906-569ef2a9bcfb/download
https://repositorio.ufscar.br/bitstreams/73b698fd-4869-48f9-955e-a6a3569a640d/download
https://repositorio.ufscar.br/bitstreams/8bb6cf9b-0244-4a31-aa80-18d2958ff130/download
https://repositorio.ufscar.br/bitstreams/28de3935-96fc-42fd-b0ca-085484d782f0/download
bitstream.checksum.fl_str_mv 9cf919af9fe04ae0ab3391925268f534
f1a5e158a646cc4da736d267e95271c4
e39d27027a6cc9cb039ad269a5db8e34
3ad80f646b83608578ebe3ad8c440946
55d83805226deb1ac3e0ecf9322b273a
23bda2c9724269e4fa5034a67efdd5ec
b69e5acb165d8fed523acf1027021583
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)
repository.mail.fl_str_mv repositorio.sibi@ufscar.br
_version_ 1851688889486409728