Redes da textualidade: ciência das redes aplicada à modelagem da coesão lexical

Oliveira, Davi Alves

Bibliographic details
Year of defense:	2025
Main author:	Oliveira, Davi Alves
Advisor:	Not provided by the institution
Examining committee:	Not provided by the institution
Document type:	Thesis
Access type:	Open access
Language:	por (Portuguese)
Defending institution:	Instituto Federal de Educação, Ciência e Tecnologia da Bahia; Brazil; PROGRAMA DE PÓS-GRADUAÇÃO MULTI-INSTITUCIONAL EM DIFUSÃO DO CONHECIMENTO (DMMDC); Doutorado Multi-Institucional e Multidisciplinar em Difusão do Conhecimento (DMMDC); IFBA
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords:	Coesão textual; Coesão lexical; Ciência das redes; Redes semânticas; Redes textuais; Textual cohesion; Lexical cohesion; Network science; Semantic networks; Textual networks; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::SISTEMAS DE COMPUTACAO
Access link:	https://repositorio.ifba.edu.br/jspui/handle/123456789/1064
Abstract:	This doctoral thesis aimed to develop a generalizable model of lexical cohesion based on network science to evaluate patterns in texts of different genres. Its specific objectives were: (a) to define network-based indices of lexical cohesion that differentiate texts from random sets of sentences, (b) to characterize the topology of the resulting networks in order to detect emergent patterns and establish analysis criteria, (c) to construct a network-based method for analyzing lexical cohesion, and (d) to analyze the lexical cohesion of different text genres in order to classify them. Adopting critical rationalism and Grounded Theory, four experiments were conducted based on data mining and network analysis. Experiment 1 defined six indices of lexical cohesion and evaluated them on 60 texts from 6 genres and on 60 pseudo-texts. The indices were calculated for each sentence, and average indices were then calculated for each text and each pseudo-text. Three average indices differentiated texts from pseudo-texts, and one differentiated text genres. The first 10 to 60 sentences proved sufficient for analyzing lexical cohesion with the proposed method. Experiment 2 investigated how different definitions of the elements used in network construction affect the results. These definitions did not affect the identification of small-world behavior, as measured by efficiency, but did influence the identification of scale invariance. Experiment 3 defined levels of pseudo-texts and non-texts and compared their cohesion indices with those of texts. It also evaluated whether manual text cleaning is necessary, and the edge definition method was revised. The revised method and the use of uncleaned texts proved efficient. Non-texts generated from random tokens and uniform random selection provided the most appropriate null model.
Experiment 4 replicated the results of Experiment 1 with 870 texts and 145 non-texts, using the Method for Lexical Cohesion Analysis Based on Network Science that was initiated in Experiment 1 and refined in Experiments 2 and 3. The average lexical cohesion indices showed statistically significant differences between texts and non-texts in the six genres, most notably the Average Global Backward Vertex Cohesion Index, which is affected by text size. The global backward vertex and edge cohesion indices evolved differently across the text depending on the genre. Text classification using the Average Global Backward Vertex Cohesion Index and the logarithm of the number of sentences achieved an average accuracy of 70%. It was concluded that the network-based indices, chiefly the Average Global Backward Vertex Cohesion Index, capture lexical repetition patterns that characterize texts, distinguishing them from non-texts and capturing differences between genres. The model contributes to unifying computational methods of textual analysis with psycholinguistic theories of text, formalizing the computational representation of text as a complex system. Based on the results, six hypotheses were formulated to extend the model.
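The abstract does not give the definitions of the six indices, so the following is only a minimal Python sketch of one plausible reading of a "global backward" cohesion index, not the thesis's method. It assumes that each sentence is already tokenized, that vertices are word types, and that every sentence contributes a clique of edges between its distinct word types (all assumptions). Each sentence is scored by the share of its vertices (or edges) already present in the network built from all preceding sentences, and the per-sentence scores are averaged per text, as the abstract describes. The names `sentence_elements`, `backward_cohesion`, `mean`, and `random_non_text` are hypothetical, and the non-text generator is just one reading of "random tokens and uniform random selection".

```python
import random
from itertools import combinations


def sentence_elements(sentence):
    """Vertices (word types) and clique edges contributed by one tokenized sentence.

    Assumption for illustration: every pair of distinct word types that
    co-occur in a sentence is linked; this is not taken from the thesis.
    """
    vertices = set(sentence)
    edges = {frozenset(pair) for pair in combinations(sorted(vertices), 2)}
    return vertices, edges


def backward_cohesion(sentences):
    """Per-sentence share of vertices and edges already present in the
    cumulative network of all preceding sentences (hypothetical index).

    The first sentence has no predecessor, so it is not scored.
    """
    seen_vertices, seen_edges = set(), set()
    vertex_scores, edge_scores = [], []
    for i, sentence in enumerate(sentences):
        vertices, edges = sentence_elements(sentence)
        if i > 0:
            if vertices:
                vertex_scores.append(len(vertices & seen_vertices) / len(vertices))
            if edges:
                edge_scores.append(len(edges & seen_edges) / len(edges))
        # Grow the cumulative network only after the sentence has been scored.
        seen_vertices |= vertices
        seen_edges |= edges
    return vertex_scores, edge_scores


def mean(values):
    """Average of the per-sentence scores; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def random_non_text(sentences, seed=0):
    """Hypothetical null model: keep each sentence's length, but fill it with
    tokens drawn uniformly at random (with replacement) from the text's vocabulary."""
    rng = random.Random(seed)
    vocabulary = sorted({token for sentence in sentences for token in sentence})
    return [[rng.choice(vocabulary) for _ in sentence] for sentence in sentences]


if __name__ == "__main__":
    text = [
        ["network", "science", "models", "text"],
        ["text", "cohesion", "uses", "repetition"],
        ["repetition", "links", "network", "text"],
    ]
    v, e = backward_cohesion(text)
    print(v)  # → [0.25, 0.75]
    print(mean(v), mean(e))
    print(mean(backward_cohesion(random_non_text(text))[0]))
```

Comparing `mean(v)` for a text with the same quantity computed on `random_non_text(text)` mirrors, in spirit only, the text versus non-text comparison reported in the abstract; the actual indices, network construction, and null models are those defined in the thesis itself.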

Item metadata

id	IFBA-1_c29cd6a7ed0f2ac6e6ee0492cb00b038
oai_identifier_str	oai:repositorio.ifba.edu.br:123456789/1064
network_acronym_str	IFBA-1
network_name_str	Repositório Institucional da IFBA
repository_id_str
dc.title.none.fl_str_mv	Redes da textualidade: ciência das redes aplicada à modelagem da coesão lexical
author	Oliveira, Davi Alves
author_facet	Oliveira, Davi Alves
author_role	author
dc.contributor.none.fl_str_mv	Pereira, Hernane Borges de Barros (http://lattes.cnpq.br/1706259684834362); Kritz, Maurício Vieira; Miranda, José Garcia Vivas; Oliveira, Roberta Pires de; Pacheco, Roberto Carlos dos Santos
dc.contributor.author.fl_str_mv	Oliveira, Davi Alves
dc.subject.por.fl_str_mv	Coesão textual; Coesão lexical; Ciência das redes; Redes semânticas; Redes textuais; Textual cohesion; Lexical cohesion; Network science; Semantic networks; Textual networks; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::SISTEMAS DE COMPUTACAO
publishDate	2025
dc.date.none.fl_str_mv	2025-12-02 2025-10-06 2026-04-13T12:37:52Z 2026-04-13T12:37:52Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	OLIVEIRA, Davi Alves. Redes da textualidade: ciência das redes aplicada à modelagem da coesão lexical. 2025. 172 f. Tese (Doutorado) – Programa de Pós-Graduação Multi-Institucional em Difusão do Conhecimento (PPGDC/DMMDC), Instituto Federal da Bahia, Salvador, 2025. https://repositorio.ifba.edu.br/jspui/handle/123456789/1064
identifier_str_mv	OLIVEIRA, Davi Alves. Redes da textualidade: ciência das redes aplicada à modelagem da coesão lexical. 2025. 172 f. Tese (Doutorado) – Programa de Pós-Graduação Multi-Institucional em Difusão do Conhecimento (PPGDC/DMMDC), Instituto Federal da Bahia, Salvador, 2025.
url	https://repositorio.ifba.edu.br/jspui/handle/123456789/1064
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	Attribution 3.0 United States http://creativecommons.org/licenses/by/3.0/us/ info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Attribution 3.0 United States http://creativecommons.org/licenses/by/3.0/us/
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf application/pdf
dc.publisher.none.fl_str_mv	Instituto Federal de Educação, Ciência e Tecnologia da Bahia Brasil PROGRAMA DE PÓS-GRADUAÇÃO MULTI-INSTITUCIONAL EM DIFUSÃO DO CONHECIMENTO (DMMDC) Doutorado Multi-Institucional e Multidisciplinar em Difusão do Conhecimento (DMMDC) IFBA
publisher.none.fl_str_mv	Instituto Federal de Educação, Ciência e Tecnologia da Bahia Brasil PROGRAMA DE PÓS-GRADUAÇÃO MULTI-INSTITUCIONAL EM DIFUSÃO DO CONHECIMENTO (DMMDC) Doutorado Multi-Institucional e Multidisciplinar em Difusão do Conhecimento (DMMDC) IFBA
dc.source.none.fl_str_mv	reponame:Repositório Institucional da IFBA instname:Instituto Federal de Educação, Ciência e Tecnologia da Bahia (IFBA) instacron:IFBA
instname_str	Instituto Federal de Educação, Ciência e Tecnologia da Bahia (IFBA)
instacron_str	IFBA
institution	IFBA
reponame_str	Repositório Institucional da IFBA
collection	Repositório Institucional da IFBA
repository.name.fl_str_mv	Repositório Institucional da IFBA - Instituto Federal de Educação, Ciência e Tecnologia da Bahia (IFBA)
repository.mail.fl_str_mv	sib@ifba.edu.br, andreiasr@ifba.edu.br, repositorio@ifba.edu.br
_version_	1865813333068742656
