Vector representation of texts applied to prediction models
| Year of defense: | 2020 |
|---|---|
| Main author: | Stern, Deborah Bassi |
| Advisor: | Izbicki, Rafael |
| Defense committee: | |
| Document type: | Dissertation |
| Access type: | Open access |
| Language: | eng |
| Defending institution: |
Universidade Federal de São Carlos
Câmpus São Carlos |
| Graduate program: |
Programa Interinstitucional de Pós-Graduação em Estatística - PIPGEs
|
| Department: |
Not provided by the institution
|
| Country: |
Not provided by the institution
|
| Keywords in Portuguese: | Processamento de linguagem natural; Redes neurais; Representação vetorial de palavras; Modelos de predição |
| Keywords in English: | Natural language processing; Neural networks; WordVectors; Prediction models |
| CNPq knowledge area: | CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA |
| Access link: | https://repositorio.ufscar.br/handle/20.500.14289/12362 |
Abstract: | Natural Language Processing has gone through substantial changes over time. It was only recently that statistical approaches started receiving attention. The Word2Vec model is one of these. It is a shallow neural network designed to fit vector representations of words according to their syntactic and semantic values. The word embeddings obtained by this method are state of the art. This method has many uses, one of which is fitting prediction models based on texts. It is common in the literature for a text to be represented as the mean of its word embeddings. The resulting vector is then used in the predictive model as explanatory variables. In this dissertation, we propose extracting more information from the text by adding other summary statistics besides the mean, such as other moments and quantiles. The improvement of the prediction models is studied on real datasets. |
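
The idea described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function name `text_features`, the toy two-dimensional embedding table, and the choice of variance and quartiles as the extra statistics are all assumptions made for illustration. In practice the embeddings would come from a trained Word2Vec model.

```python
import numpy as np


def text_features(tokens, embeddings, quantiles=(0.25, 0.5, 0.75)):
    """Summarize the word embeddings of a text, dimension by dimension.

    tokens: list of words in the text.
    embeddings: dict mapping a word to a 1-D numpy array
        (hypothetical lookup table standing in for a trained model).
    Returns the concatenation [mean, variance, quantiles...].
    """
    # Keep only the words that have an embedding.
    vectors = np.array([embeddings[t] for t in tokens if t in embeddings])
    if vectors.size == 0:
        raise ValueError("no token in the text has an embedding")

    mean = vectors.mean(axis=0)                   # the usual baseline representation
    var = vectors.var(axis=0)                     # second central moment
    qs = np.quantile(vectors, quantiles, axis=0)  # shape (len(quantiles), dim)

    # The predictive model receives all statistics as explanatory variables.
    return np.concatenate([mean, var, qs.ravel()])


# Toy embeddings, invented for this example only.
toy_embeddings = {
    "good": np.array([1.0, 0.0]),
    "bad": np.array([-1.0, 0.0]),
    "movie": np.array([0.0, 2.0]),
}

# "unknown" has no embedding and is skipped.
features = text_features(["good", "movie", "unknown"], toy_embeddings)
```

With embeddings of dimension d and three quantiles, each text is mapped to a vector of length 5d, compared with length d when only the mean is used. The resulting vectors can be stacked into a design matrix and passed to any regression or classification model.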
| id |
SCAR_5ecc26fd16a328ec04fbf8f5f9f1ad8d |
|---|---|
| oai_identifier_str |
oai:repositorio.ufscar.br:20.500.14289/12362 |
| network_acronym_str |
SCAR |
| network_name_str |
Repositório Institucional da UFSCAR |
| repository_id_str |
|
| spelling |
Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Finance Code 001 |
| dc.title.eng.fl_str_mv |
Vector representation of texts applied to prediction models |
| dc.title.alternative.por.fl_str_mv |
Representações vetoriais de textos aplicados a modelos preditivos |
| author |
Stern, Deborah Bassi |
| author_role |
author |
| dc.contributor.authorlattes.por.fl_str_mv |
http://lattes.cnpq.br/0580971589615088 |
| dc.contributor.advisor1.fl_str_mv |
Izbicki, Rafael |
| dc.contributor.advisor1Lattes.fl_str_mv |
http://lattes.cnpq.br/9991192137633896 |
| dc.contributor.authorID.fl_str_mv |
3fafee68-46cd-4b16-addf-945cc55088d8 |
| dc.subject.por.fl_str_mv |
Processamento de linguagem natural Redes neurais Representação vetorial de palavras Modelos de predição |
| dc.subject.eng.fl_str_mv |
Natural language processing Neural networks WordVectors Prediction models |
| dc.subject.cnpq.fl_str_mv |
CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA |
| publishDate |
2020 |
| dc.date.accessioned.fl_str_mv |
2020-03-27T16:25:42Z |
| dc.date.available.fl_str_mv |
2020-03-27T16:25:42Z |
| dc.date.issued.fl_str_mv |
2020-03-09 |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
| format |
masterThesis |
| status_str |
publishedVersion |
| dc.identifier.citation.fl_str_mv |
STERN, Deborah Bassi. Vector representation of texts applied to prediction models. 2020. Dissertação (Mestrado em Estatística) – Universidade Federal de São Carlos, São Carlos, 2020. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/12362. |
| dc.identifier.uri.fl_str_mv |
https://repositorio.ufscar.br/handle/20.500.14289/12362 |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.relation.confidence.fl_str_mv |
600 600 |
| dc.relation.authority.fl_str_mv |
3e57f161-19fe-4345-9e87-bc60eb7be98f |
| dc.rights.driver.fl_str_mv |
Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/ info:eu-repo/semantics/openAccess |
| rights_invalid_str_mv |
Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/ |
| eu_rights_str_mv |
openAccess |
| dc.publisher.none.fl_str_mv |
Universidade Federal de São Carlos Câmpus São Carlos |
| dc.publisher.program.fl_str_mv |
Programa Interinstitucional de Pós-Graduação em Estatística - PIPGEs |
| dc.publisher.initials.fl_str_mv |
UFSCar |
| dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFSCAR instname:Universidade Federal de São Carlos (UFSCAR) instacron:UFSCAR |
| bitstream.url.fl_str_mv |
https://repositorio.ufscar.br/bitstreams/07e276e7-c740-4ef9-9439-cd4913c0d04f/download https://repositorio.ufscar.br/bitstreams/557ce8c1-4e8e-4308-b685-e300cfbe1d39/download https://repositorio.ufscar.br/bitstreams/6f06a9d6-9c48-4fac-90e8-73957ea3969a/download https://repositorio.ufscar.br/bitstreams/d5bc9ee4-6ffb-4ab9-a906-569ef2a9bcfb/download https://repositorio.ufscar.br/bitstreams/73b698fd-4869-48f9-955e-a6a3569a640d/download https://repositorio.ufscar.br/bitstreams/8bb6cf9b-0244-4a31-aa80-18d2958ff130/download https://repositorio.ufscar.br/bitstreams/28de3935-96fc-42fd-b0ca-085484d782f0/download |
| bitstream.checksum.fl_str_mv |
9cf919af9fe04ae0ab3391925268f534 f1a5e158a646cc4da736d267e95271c4 e39d27027a6cc9cb039ad269a5db8e34 3ad80f646b83608578ebe3ad8c440946 55d83805226deb1ac3e0ecf9322b273a 23bda2c9724269e4fa5034a67efdd5ec b69e5acb165d8fed523acf1027021583 |
| bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 MD5 MD5 MD5 |
| repository.name.fl_str_mv |
Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR) |
| repository.mail.fl_str_mv |
repositorio.sibi@ufscar.br |
| _version_ |
1851688889486409728 |