Employing syntactical dependency and a mesoscopic scale to model books' narratives through recurrence networks

Bibliographic details
Year of defense: 2023
Main author: Souza, Bárbara Cortes e
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: eng
Defense institution: Biblioteca Digital de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Portuguese keywords:
NLP
PLN
Access link: https://www.teses.usp.br/teses/disponiveis/55/55134/tde-02042024-162451/
Abstract: In recent years, science has been deeply affected by the growing amount of data available for research. In particular, the continuous increase in the availability of textual data has been essential for the development of new methodologies for text processing problems. Several new approaches focus on different components of linguistics, such as lexicon, syntax and semantics. Natural Language Processing (NLP), for example, is a multidisciplinary field concerned with the interaction between natural languages and computers; typical problems in this field include topic detection, text classification, stylometry and automatic summarization. Since natural languages are complex systems, it is also appropriate to represent them as complex networks to help address these challenges. A well-known text modelling method is the word adjacency network, which maps each word in a text to a node and creates an edge between any pair of words that occur adjacent to each other in the text. In this master's work, however, we focus on a larger, mesoscopic scale, with the intent of capturing the overall context of a narrative. In this methodology, each node represents a sequence of paragraphs of the text, and edges are created between the most similar sequences. Additionally, we apply syntactical dependency information to make the representation more informative and, therefore, better capture the contextual semantics of the text. Finally, significant network measures can be extracted to characterize the network, including accessibility, symmetry and the newly proposed recurrence signature, as a way of capturing topological properties that reflect the narrative's context.
Several validations of the method were performed: a comparison with other, more trivial network measures; two experiments, one discriminating real texts from meaningless (randomized) ones and another discriminating between literary genres; and, finally, a comparison of the proposed method with more conventional approaches, namely co-occurrence networks and doc2vec.
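The contrast between the two scales described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the paragraph window size, the bag-of-words segment representation, cosine similarity and the rule of linking each segment to its k most similar segments are all assumptions chosen for illustration, the syntactical dependency step is omitted, and the helper names (`tokenize`, `word_adjacency_edges`, `cosine`, `mesoscopic_network`) are hypothetical.

```python
from collections import Counter
from math import sqrt
import re


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a crude stand-in for real preprocessing."""
    return re.findall(r"[a-z]+", text.lower())


def word_adjacency_edges(text: str) -> set[tuple[str, str]]:
    """Word adjacency network: one node per distinct word and an undirected
    edge between any two words that appear next to each other."""
    tokens = tokenize(text)
    edges = set()
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # skip self-loops from repeated words
            edges.add(tuple(sorted((a, b))))
    return edges


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def mesoscopic_network(paragraphs: list[str], window: int, k: int) -> dict[int, set[int]]:
    """Hypothetical mesoscopic network: node i groups `window` consecutive
    paragraphs; each node is linked to its k most similar other nodes
    (assumed rule), and every edge is stored in both directions."""
    segments = [
        Counter(tokenize(" ".join(paragraphs[i:i + window])))
        for i in range(0, len(paragraphs), window)
    ]
    adj = {i: set() for i in range(len(segments))}
    for i, seg in enumerate(segments):
        # Rank all other segments by similarity to segment i (stable on ties).
        ranked = sorted(
            (j for j in range(len(segments)) if j != i),
            key=lambda j: cosine(seg, segments[j]),
            reverse=True,
        )
        for j in ranked[:k]:
            adj[i].add(j)
            adj[j].add(i)  # keep the graph undirected
    return adj


if __name__ == "__main__":
    paragraphs = [
        "the ship left the harbour",
        "storm hit the ship",
        "the village held a feast",
        "a feast in the village square",
    ]
    print(mesoscopic_network(paragraphs, window=1, k=1))
    # → {0: {1}, 1: {0}, 2: {3}, 3: {2}}
```

In this toy run the two "ship" paragraphs and the two "village" paragraphs form separate linked pairs, whereas a word adjacency network built from the same text would have one node per word. In the method the abstract describes, syntactical dependency information is additionally used to enrich the representation; the sketch leaves that out.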
id USP_700aa17a1da9bc2b2521ff1dbf913dde
oai_identifier_str oai:teses.usp.br:tde-02042024-162451
dc.title.none.fl_str_mv Employing syntactical dependency and a mesoscopic scale to model books' narratives through recurrence networks
Aplicando dependência sintática e uma escala mesoscópica para modelar narrativas de livros a partir de redes de recorrência
dc.contributor.none.fl_str_mv Amancio, Diego Raphael
dc.contributor.author.fl_str_mv Souza, Bárbara Cortes e
dc.subject.por.fl_str_mv Complex networks
Dependência sintática
Escala mesoscópica
Mesoscopic scale
NLP
PLN
Recurrence networks
Redes complexas
Redes de recorrência
Syntactical dependency
dc.date.none.fl_str_mv 2023-12-05
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
dc.identifier.uri.fl_str_mv https://www.teses.usp.br/teses/disponiveis/55/55134/tde-02042024-162451/
dc.language.iso.fl_str_mv eng
dc.rights.driver.fl_str_mv Content released for public access.
info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br, atendimento@aguia.usp.br