Um estudo direcionado por corpora: estruturas lexicais em um corpus especializado (Corpus-driven study: phrase frame in a specialized corpus)
| Year of defense: | 2024 |
|---|---|
| Main author: | Edilson Rosa da Rocha |
| Advisor: | |
| Examining committee: | |
| Document type: | Master's dissertation |
| Access type: | Open access |
| Language: | Portuguese (por) |
| Defense institution: | Universidade Federal de Minas Gerais |
| Graduate program: | Not provided by the institution |
| Department: | Not provided by the institution |
| Country: | Not provided by the institution |
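| Keywords: | English language – Study and teaching; Corpus linguistics; English language – Lexicology; Corpus-driven; Lexical structures; Lexical bundles; Multivariate data analysis; Clusters |
| Access link: | https://hdl.handle.net/1843/75902 |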
| Abstract: | Linguistic studies of phraseology have been gaining credibility, especially regarding the formation and analysis of formulaic language (Hunston & Francis, 1999; Wray, 2002; Biber, 2009). This research aimed to identify, analyze, and classify phrase frames (p-frames) directly, from a corpus-driven perspective (Biber, 2012). We tested the hypothesis that p-frames can be identified independently of lexical bundles (LBs) by analyzing the internal structure of lexical units (ULs, after the Portuguese "unidades lexicais"). The data come from the specialized Corpus of Articles of Applied Linguistics (CorAAL), compiled from six high-impact English-language Applied Linguistics journals and totaling 973,844 words from 150 articles published between 2014 and 2018. The list of ULs was generated with the N-gram tool of AntConc (Anthony, 2022). We examined five-word sequences containing one variable slot, with a minimum frequency of 20 occurrences per million words and a minimum dispersion of 10, which yielded a final list of 66 ULs. Of these, 11 ULs do not correspond to any LB reported by Biber et al. (1999); however, absence from that list does not by itself qualify them as p-frames. To classify them, the variability and predictability parameters of Tan and Römer (2022) were combined with the frequency criterion. Agglomerative hierarchical clustering and R scripts were used to compare the frequency, variability, and internal entropy of the ULs, revealing units with low variability (0.02 to 0.05) and predictability (0.0). For instance, the units (at, in) the + of the [end, beginning, time], English as a + language [foreign, second], and it is + to note [important], where + marks the variable slot and the brackets list attested fillers, display p-frame characteristics: they are discontinuous and allow the slot to be filled by both function and content words. Identifying p-frames only from continuous ULs would therefore exclude those with low variability, as the analysis shows. In the 11 identified ULs (slot patterns 1*345, 12*45, and 123*5), the internal slot is filled by content words with a nominal base (Nb), verbal base (Vb), or adjectival base (Ab), as in the + of the [purpose(s), validity, teaching, use, majority, etc.]. These clusters show high internal variability (0.11 to 0.74) and predictability (0.58 to 0.97) and split into subgroups as the similarity between the clusters being merged decreases. The second group shows distinct subdivisions in the dendrogram. The results show that as internal variability increases, p-frames whose slots are filled with content words, and which differ from one another, tend to form distinct groups. Thus, statistical analysis of internal variability and entropy made it possible to identify p-frames not derived from LBs. |
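The frame extraction and the variability and entropy measures summarized in the abstract can be illustrated with a minimal Python sketch; the study itself used AntConc and R scripts, so this is not the author's implementation. It assumes naive lowercase whitespace tokenization, treats dispersion as range (the number of texts a frame occurs in), defines variability as the variant-to-frame ratio (distinct fillers divided by frame frequency), and uses the Shannon entropy of the slot fillers, normalized to [0, 1], as the entropy measure. These definitions and all function names (`min_raw_count`, `extract_frames`, `select_frames`, and so on) are illustrative assumptions, not necessarily the exact formulas of Tan and Römer (2022) or of the thesis. The helper `min_raw_count` also shows that 20 occurrences per million words in a 973,844-word corpus means at least 20 raw occurrences.

```python
import math
from collections import Counter, defaultdict

# 0-based positions of the single variable slot in a five-word frame:
# 1 -> 1*345, 2 -> 12*45, 3 -> 123*5 (the patterns named in the abstract).
INTERNAL_SLOTS = (1, 2, 3)


def min_raw_count(per_million, corpus_size):
    """Smallest raw frequency that meets a per-million-words threshold."""
    return math.ceil(per_million * corpus_size / 1_000_000)


def extract_frames(texts, n=5, slots=INTERNAL_SLOTS):
    """Return filler counts and the set of text ids for every frame.

    A frame is an n-gram with one internal position replaced by '*'.
    """
    fillers = defaultdict(Counter)  # frame -> Counter of slot fillers
    ranges = defaultdict(set)       # frame -> ids of texts containing it
    for text_id, text in enumerate(texts):
        tokens = text.lower().split()  # naive tokenizer (assumption)
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            for s in slots:
                frame = tuple(gram[:s]) + ("*",) + tuple(gram[s + 1:])
                fillers[frame][gram[s]] += 1
                ranges[frame].add(text_id)
    return fillers, ranges


def variability(counts):
    """Variant-to-frame ratio: distinct fillers / frame frequency (assumed)."""
    return len(counts) / sum(counts.values())


def normalized_entropy(counts):
    """Shannon entropy of the fillers divided by its maximum, log2(#fillers)."""
    if len(counts) < 2:
        return 0.0  # a single filler is fully predictable
    total = sum(counts.values())
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(len(counts))


def select_frames(texts, min_freq, min_range):
    """Keep frames meeting both thresholds; return rows sorted by frequency."""
    fillers, ranges = extract_frames(texts)
    rows = []
    for frame, counts in fillers.items():
        freq = sum(counts.values())
        text_range = len(ranges[frame])
        if freq >= min_freq and text_range >= min_range:
            rows.append((" ".join(frame), freq, text_range,
                         round(variability(counts), 2),
                         round(normalized_entropy(counts), 2)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    print(min_raw_count(20, 973_844))  # 20
    docs = [
        "we stop at the end of the study",
        "we start at the beginning of the study",
        "we rest at the end of the day",
    ]
    print(select_frames(docs, min_freq=3, min_range=2))
    # [('at the * of the', 3, 3, 0.67, 0.92)]
```

Each five-word n-gram contributes one frame per internal slot position, so a single n-gram can feed three different frames; the frequency and range thresholds are then applied per frame rather than per n-gram.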
| id | UFMG_aabba0a03063eda4dacb3e2dab7fdffc |
|---|---|
| oai_identifier_str | oai:repositorio.ufmg.br:1843/75902 |
| network_acronym_str | UFMG |
| network_name_str | Repositório Institucional da UFMG |
| dc.title.none.fl_str_mv | Um estudo direcionado por corpora: estruturas lexicais em um corpus especializado; Corpus-driven study: phrase frame in a specialized corpus |
| author | Edilson Rosa da Rocha |
| dc.subject.por.fl_str_mv | English language – Study and teaching; Corpus linguistics; English language – Lexicology; Corpus-driven; Lexical structures; Lexical bundles; Multivariate data analysis; Clusters |
| publishDate | 2024 |
| dc.date.none.fl_str_mv | 2024-09-03T14:30:57Z; 2024-09-03T14:30:57Z; 2024-08-02; 2025-09-08T23:50:43Z |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
| dc.identifier.uri.fl_str_mv | https://hdl.handle.net/1843/75902 |
| dc.language.iso.fl_str_mv | por |
| dc.relation.none.fl_str_mv | Programa Institucional de Internacionalização – CAPES - PrInt |
| dc.rights.driver.fl_str_mv | http://creativecommons.org/licenses/by-nd/3.0/pt/; info:eu-repo/semantics/openAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.publisher.none.fl_str_mv | Universidade Federal de Minas Gerais |
| dc.source.none.fl_str_mv | reponame:Repositório Institucional da UFMG; instname:Universidade Federal de Minas Gerais (UFMG); instacron:UFMG |
| repository.name.fl_str_mv | Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG) |
| repository.mail.fl_str_mv | repositorio@ufmg.br |