Descrição sintático-semântica de nomes predicadores em tweets do mercado financeiro em português

Barbosa, Bryan Khelven da Silva

Bibliographic details
Year of defense:	2024
Main author:	Barbosa, Bryan Khelven da Silva
Advisor:	Di Felippo, Ariani
Examining committee:	Not provided by the institution
Document type:	Master's dissertation
Access type:	Open access
Language:	Portuguese (por)
Degree-granting institution:	Universidade Federal de São Carlos Câmpus São Carlos
Graduate program:	Programa de Pós-Graduação em Linguística - PPGL (Graduate Program in Linguistics)
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords (Portuguese):	PLN (NLP); Tweet; Nome predicador (predicative noun); Estrutura de argumentos (argument structure)
CNPq knowledge area:	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO (Exact and Earth Sciences::Computer Science); LINGUISTICA, LETRAS E ARTES::LINGUISTICA (Linguistics, Letters and Arts::Linguistics)
Access link:	https://repositorio.ufscar.br/handle/20.500.14289/20503
Abstract:	This study describes the argument structure (A-structure) of the predicative nouns (Npreds) that occur in DANTEStocks, a corpus of financial market tweets, motivated by the preference for this type of predicator in digital genres of this domain. The specific objective was to verify (i) the presence or absence of arguments (Args) in the tweets, (ii) the syntactic realization of the Args, and (iii) the influence of tweet-specific linguistic phenomena on the realization of the nouns' A-structure. In total, 145 Npreds and 1,756 instances (tweets with at least one Npred) from the corpus were described at the syntactic-semantic level. At the syntactic level, the entire DANTEStocks corpus was annotated semi-automatically according to the Universal Dependencies (UD) model. At the semantic level, the syntactic dependency trees guided the manual annotation of the instances according to NomBank. The syntactic-semantic mapping revealed that: (i) the A-structure of valency-one (V1) Npreds is always filled in the syntax; (ii) the A-structure of Npreds with V>1 has some missing Args; (iii) most of the analyzed Npreds have valency three (V3), yet only two Args are realized in most instances; (iv) the dependency relations (deprels) that most frequently connect Npreds to their Args are nmod and amod; and (v) in 24 instances, the syntactic realization of the A-structure was reduced by tweet-specific phenomena (truncation and juxtaposition of elements). These results enrich the body of descriptive work on lexical aspects of the language of financial market tweets. In addition, the syntactic-semantic valency descriptions of the Npreds were systematized in NounBank.DS, an online lexical repository that can support further linguistic-computational research. A notable contribution to Natural Language Processing (NLP) is the UD syntactic annotation of DANTEStocks, which gave rise to the first Portuguese tweebank. This resource enabled the development of the first UD parser for Portuguese user-generated content (UGC).
The NomBank-style semantic annotation of part of the corpus is another significant resource. Thus, this study produced reference linguistic resources and a tool (a parser) for the automatic processing of Portuguese tweets, which are essential for developing NLP applications that target this type of UGC.
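The abstract states that UD dependency trees guided the NomBank-style annotation and that nmod and amod are the deprels that most often link an Npred to its Args. The snippet below is a minimal sketch of that first step, not the dissertation's actual pipeline: it reads one sentence in the standard 10-column CoNLL-U format used by UD treebanks and lists the direct nmod/amod dependents of a chosen noun as argument candidates. The example tweet ("Forte queda do Ibovespa", roughly "sharp fall of the Ibovespa") is invented for illustration and is not taken from DANTEStocks, and the helper names parse_conllu and candidate_args are hypothetical.

```python
from typing import Dict, List, Tuple

# Invented CoNLL-U sentence (NOT from DANTEStocks), used only to illustrate the format.
# "do" is a multiword token split into "de" + "o", as in UD Portuguese treebanks.
SAMPLE = """\
# text = Forte queda do Ibovespa
1\tForte\tforte\tADJ\t_\t_\t2\tamod\t_\t_
2\tqueda\tqueda\tNOUN\t_\t_\t0\troot\t_\t_
3-4\tdo\t_\t_\t_\t_\t_\t_\t_\t_
3\tde\tde\tADP\t_\t_\t5\tcase\t_\t_
4\to\to\tDET\t_\t_\t5\tdet\t_\t_
5\tIbovespa\tIbovespa\tPROPN\t_\t_\t2\tnmod\t_\t_
"""

# Deprels treated as argument candidates (assumption based on the abstract's finding).
CANDIDATE_DEPRELS = {"nmod", "amod"}

Token = Tuple[str, str, int, str]  # (form, upos, head, deprel)


def parse_conllu(block: str) -> Dict[int, Token]:
    """Map token ID -> (form, upos, head, deprel), skipping comment lines,
    multiword-token ranges (e.g. '3-4') and empty nodes (e.g. '5.1')."""
    tokens: Dict[int, Token] = {}
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        # CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        tokens[int(cols[0])] = (cols[1], cols[3], int(cols[6]), cols[7])
    return tokens


def candidate_args(tokens: Dict[int, Token], noun_id: int) -> List[Tuple[str, str]]:
    """Return (form, deprel) for direct dependents of noun_id whose deprel
    is in CANDIDATE_DEPRELS, in sentence order."""
    return [
        (form, deprel)
        for _tid, (form, _upos, head, deprel) in sorted(tokens.items())
        if head == noun_id and deprel in CANDIDATE_DEPRELS
    ]


tokens = parse_conllu(SAMPLE)
print(candidate_args(tokens, 2))  # [('Forte', 'amod'), ('Ibovespa', 'nmod')]
```

Deciding which candidates are Args remains a manual semantic step: here "Ibovespa" plausibly fills an argument slot of "queda", while "Forte" is a modifier even though it is also attached to the noun by amod.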

Item metadata

id	SCAR_1a02bac144269bc812dfd8e09c447a53
oai_identifier_str	oai:repositorio.ufscar.br:20.500.14289/20503
network_acronym_str	SCAR
network_name_str	Repositório Institucional da UFSCAR
Funding:	Other ("Outra"); grant no. 2019/07665-4
Files:	Dissertacao_Bryan_Khelven_Versao_Final.pdf (bundle ORIGINAL; "Versão final da Dissertação de Mestrado", i.e., final version of the master's dissertation; application/pdf; 16,654,509 bytes; MD5 e5df26599501feaf5a0f7a675d512206)
	Dissertacao_Bryan_Khelven_Versao_Final.pdf.txt (bundle TEXT; extracted text; text/plain; 103,893 bytes; MD5 c78bb51c01baadde9fe1db4065a033d8)
	Dissertacao_Bryan_Khelven_Versao_Final.pdf.jpg (bundle THUMBNAIL; generated thumbnail; image/jpeg; 6,549 bytes; MD5 2805e214a5c19e2ec7ae78b484319df0)
	license_rdf (bundle CC-LICENSE; application/rdf+xml; charset=utf-8; 810 bytes; MD5 f337d95da1fce0a22c77480e5e9a7aec)
Record last modified:	2025-02-06 03:10:58.707
OAI endpoint:	https://repositorio.ufscar.br/oai/request (OpenDOAR 4322)
dc.title.por.fl_str_mv	Descrição sintático-semântica de nomes predicadores em tweets do mercado financeiro em português
dc.title.alternative.eng.fl_str_mv	Syntactic-semantic description of predicative nouns in Portuguese financial market tweets
dc.contributor.authorlattes.por.fl_str_mv	http://lattes.cnpq.br/8007703579388553
dc.contributor.authororcid.por.fl_str_mv	https://orcid.org/0000-0002-4637-6498
dc.contributor.advisor1orcid.por.fl_str_mv	https://orcid.org/0000-0002-4566-9352
dc.contributor.author.fl_str_mv	Barbosa, Bryan Khelven da Silva
dc.contributor.advisor1.fl_str_mv	Di Felippo, Ariani
dc.contributor.advisor1Lattes.fl_str_mv	http://lattes.cnpq.br/8648412103197455
dc.subject.por.fl_str_mv	PLN (NLP); Tweet; Nome predicador (predicative noun); Estrutura de argumentos (argument structure)
dc.subject.cnpq.fl_str_mv	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO (Exact and Earth Sciences::Computer Science); LINGUISTICA, LETRAS E ARTES::LINGUISTICA (Linguistics, Letters and Arts::Linguistics)
dc.date.accessioned.fl_str_mv	2024-09-10T11:50:54Z
dc.date.available.fl_str_mv	2024-09-10T11:50:54Z
dc.date.issued.fl_str_mv	2024-08-12
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
dc.identifier.citation.fl_str_mv	BARBOSA, Bryan Khelven da Silva. Descrição sintático-semântica de nomes predicadores em tweets do mercado financeiro em português. 2024. Master's dissertation (Linguistics) – Universidade Federal de São Carlos, São Carlos, 2024. Available at: https://repositorio.ufscar.br/handle/20.500.14289/20503.
dc.identifier.uri.fl_str_mv	https://repositorio.ufscar.br/handle/20.500.14289/20503
dc.language.iso.fl_str_mv	por
dc.rights.driver.fl_str_mv	Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/ info:eu-repo/semantics/openAccess
dc.publisher.none.fl_str_mv	Universidade Federal de São Carlos Câmpus São Carlos
dc.publisher.program.fl_str_mv	Programa de Pós-Graduação em Linguística - PPGL
dc.publisher.initials.fl_str_mv	UFSCar
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFSCAR instname:Universidade Federal de São Carlos (UFSCAR) instacron:UFSCAR
bitstream.url.fl_str_mv	https://repositorio.ufscar.br/bitstreams/d880629b-0220-41d9-b681-0e6d685e65c7/download https://repositorio.ufscar.br/bitstreams/c51eb39a-bd87-414f-9b44-961376bb89a6/download https://repositorio.ufscar.br/bitstreams/78bb090c-4700-49f8-9c4d-c1cb5c283d2f/download https://repositorio.ufscar.br/bitstreams/432bcc8e-2a2f-4288-8bf4-3280bb3d5a03/download
bitstream.checksum.fl_str_mv	c78bb51c01baadde9fe1db4065a033d8 2805e214a5c19e2ec7ae78b484319df0 e5df26599501feaf5a0f7a675d512206 f337d95da1fce0a22c77480e5e9a7aec
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)
repository.mail.fl_str_mv	repositorio.sibi@ufscar.br
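This record lists an MD5 checksum for each bitstream; for the main PDF, Dissertacao_Bryan_Khelven_Versao_Final.pdf, it is e5df26599501feaf5a0f7a675d512206. A minimal sketch of checking a downloaded copy against that value with Python's standard hashlib module follows. It assumes the file has already been saved locally, and the helper names md5_of_file and matches_record are hypothetical.

```python
import hashlib
from pathlib import Path

# MD5 checksum recorded in this item's metadata for the main PDF bitstream.
EXPECTED_PDF_MD5 = "e5df26599501feaf5a0f7a675d512206"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: Path, expected: str = EXPECTED_PDF_MD5) -> bool:
    """Hypothetical helper: True if the file's MD5 equals the expected value
    (comparison is case-insensitive because hexdigest() is lowercase)."""
    return md5_of_file(path) == expected.strip().lower()
```

For example, matches_record(Path("Dissertacao_Bryan_Khelven_Versao_Final.pdf")) should return True for an intact download saved under that name in the current directory. Reading in chunks keeps memory use flat even for the roughly 16 MB PDF.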
