Employing syntactical dependency and a mesoscopic scale to model books' narratives through recurrence networks

Souza, Bárbara Cortes e

Bibliographic details
Year of defense:	2023
Main author:	Souza, Bárbara Cortes e
Advisor:	Not provided by the institution
Defense committee:	Not provided by the institution
Document type:	Master's thesis
Access type:	Open access
Language:	eng
Defending institution:	Biblioteca Digital de Teses e Dissertações da USP
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords (English and Portuguese):	Complex networks; Dependência sintática; Escala mesoscópica; Mesoscopic scale; NLP; PLN; Recurrence networks; Redes complexas; Redes de recorrência; Syntactical dependency
Access link:	https://www.teses.usp.br/teses/disponiveis/55/55134/tde-02042024-162451/
Abstract:	In recent years, science has been deeply affected by the growing amount of data available for research. In particular, the continuous growth in available textual data has been essential to the development of new methodologies for text processing problems. Several new approaches focus on different components of linguistics, such as lexicon, syntax and semantics. Natural Language Processing, for example, is a multidisciplinary field concerned with the interaction between natural languages and computers. Problems in this field include topic detection, text classification, stylometry and automatic summarization, among others. Since natural languages are complex systems, it is also appropriate to represent them as complex networks to help address these challenges. One well-known text modelling method is the word adjacency network, which maps each word in a text to a node and creates an edge between any pair of words that occur adjacent to each other in the text. In this Master's work, however, we focus on a larger, mesoscopic scale, with the intent of capturing the overall context of a narrative. In this methodology, each node represents a sequence of paragraphs in the text, and edges are created between the most similar nodes. Additionally, we apply syntactical dependency knowledge to increase informativeness and thereby better capture the contextual semantics of the text. Finally, network measures can be extracted to characterize the resulting network, including accessibility, symmetry and the newly proposed recurrence signature, as a way of capturing topological properties that reflect the narrative's context. The method was validated in several ways: a comparison with other, trivial network measures; two experiments, one discriminating real from meaningless (randomized) texts and one discriminating between literary genres; and a comparison of the proposed method with more orthodox approaches, namely co-occurrence networks and doc2vec.
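The two network constructions described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not say how paragraphs are grouped, how similarity is measured, or how many edges are kept, so the non-overlapping windows, the bag-of-words cosine similarity, and the names `word_adjacency_edges`, `mesoscopic_edges`, `window` and `keep` are all assumptions made for illustration. The syntactical dependency step is left out entirely.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def word_adjacency_edges(text):
    """Word adjacency network: one undirected edge per pair of neighbouring words."""
    words = text.lower().split()
    return {tuple(sorted(p)) for p in zip(words, words[1:]) if p[0] != p[1]}


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mesoscopic_edges(paragraphs, window=2, keep=0.25):
    """Mesoscopic network: nodes are blocks of consecutive paragraphs.

    Assumptions (not taken from the thesis): blocks do not overlap, similarity
    is bag-of-words cosine, and the top `keep` fraction of pairs become edges.
    """
    # Node k covers paragraphs k*window .. k*window + window - 1.
    nodes = [
        Counter(" ".join(paragraphs[i:i + window]).lower().split())
        for i in range(0, len(paragraphs), window)
    ]
    # Score every pair of nodes, most similar first.
    scored = sorted(
        ((cosine(nodes[i], nodes[j]), i, j)
         for i, j in combinations(range(len(nodes)), 2)),
        reverse=True,
    )
    n_keep = max(1, int(keep * len(scored))) if scored else 0
    return [(i, j) for _, i, j in scored[:n_keep]]
```

Called on a list of paragraph strings, `mesoscopic_edges` returns pairs of node indices. An edge between two nodes that are far apart in the text marks a point where the narrative returns to earlier vocabulary, which is the kind of recurrence the network measures are meant to capture.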

Item metadata

id	USP_700aa17a1da9bc2b2521ff1dbf913dde
oai_identifier_str	oai:teses.usp.br:tde-02042024-162451
network_acronym_str	USP
network_name_str	Biblioteca Digital de Teses e Dissertações da USP
repository_id_str
dc.title.none.fl_str_mv	Employing syntactical dependency and a mesoscopic scale to model books' narratives through recurrence networks; Aplicando dependência sintática e uma escala mesoscópica para modelar narrativas de livros a partir de redes de recorrência
title	Employing syntactical dependency and a mesoscopic scale to model books' narratives through recurrence networks
author	Souza, Bárbara Cortes e
author_facet	Souza, Bárbara Cortes e
author_role	author
dc.contributor.none.fl_str_mv	Amancio, Diego Raphael
dc.contributor.author.fl_str_mv	Souza, Bárbara Cortes e
dc.subject.por.fl_str_mv	Complex networks; Dependência sintática; Escala mesoscópica; Mesoscopic scale; NLP; PLN; Recurrence networks; Redes complexas; Redes de recorrência; Syntactical dependency
topic	Complex networks; Dependência sintática; Escala mesoscópica; Mesoscopic scale; NLP; PLN; Recurrence networks; Redes complexas; Redes de recorrência; Syntactical dependency
publishDate	2023
dc.date.none.fl_str_mv	2023-12-05
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://www.teses.usp.br/teses/disponiveis/55/55134/tde-02042024-162451/
url	https://www.teses.usp.br/teses/disponiveis/55/55134/tde-02042024-162451/
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv	Release the content for public access. info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Release the content for public access.
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP
instname_str	Universidade de São Paulo (USP)
instacron_str	USP
institution	USP
reponame_str	Biblioteca Digital de Teses e Dissertações da USP
collection	Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv	virginia@if.usp.br || atendimento@aguia.usp.br || virginia@if.usp.br
_version_	1815257806438137856
