Processing a learner corpus to identify differences: the influence of task, genre and student background
| Year of defense: | 2016 |
|---|---|
| Main author: | Andressa Rodrigues Gomide |
| Advisor: | |
| Examination committee: | |
| Document type: | Master's thesis |
| Access type: | Open access |
| Language: | por |
| Defending institution: | Universidade Federal de Minas Gerais |
| Graduate program: | Not provided by the institution |
| Department: | Not provided by the institution |
| Country: | Not provided by the institution |
| Keywords in Portuguese: | |
| Access link: | https://hdl.handle.net/1843/MGSS-A9KGY5 |
| Abstract: | This master's thesis deals with the technical and methodological aspects of creating, cleaning and processing a Brazilian university-level learner corpus, the Corpus do Inglês sem Fronteiras (CorIsF) v 1.0. The study has two main goals: to make the processing of CorIsF replicable, and to investigate and describe how some linguistic characteristics vary across different learner groups, tasks and genres. The procedure was carried out in R, a free software environment for statistical computing and graphics, and was divided into four parts: dataset compilation and preprocessing; dataset processing; extraction of the key features; and data visualization. The first step covers the method used to collect the data and the initial cleaning, which eliminates unwanted data and keeps the relevant data. In the second step, CorIsF was subset into five small corpora covering different learner profiles, two different tasks, and one genre, and was annotated with a part-of-speech (POS) tagger. In the third step, the variability of POS within subcorpora, the frequency of types and tokens, and the usage of n-grams were investigated. In the final step, some exploratory data visualization was performed through the creation and analysis of plots and word clouds. After the data had been prepared, the language used in each subcorpus was contrasted and analysed; the results suggest that task, genre and student background are likely to influence learners' written production. |
| id | UFMG_b3c3b13ee5f5900facf6bcf756bd18dd |
|---|---|
| oai_identifier_str | oai:repositorio.ufmg.br:1843/MGSS-A9KGY5 |
| network_acronym_str | UFMG |
| network_name_str | Repositório Institucional da UFMG |
| repository_id_str | |
| dc.title.none.fl_str_mv | Processing a learner corpus to identify differences: the influence of task, genre and student background |
| author | Andressa Rodrigues Gomide |
| author_facet | Andressa Rodrigues Gomide |
| author_role | author |
| dc.contributor.author.fl_str_mv | Andressa Rodrigues Gomide |
| dc.subject.por.fl_str_mv | English language, Study and teaching, Portuguese speakers, Brazil; English language, Study and teaching, Foreign speakers; Text linguistics; Second language acquisition; English language, Grammar; Corpus linguistics; English for academic purposes; Learner corpus; Corpus design |
| topic | English language, Study and teaching, Portuguese speakers, Brazil; English language, Study and teaching, Foreign speakers; Text linguistics; Second language acquisition; English language, Grammar; Corpus linguistics; English for academic purposes; Learner corpus; Corpus design |
| publishDate | 2016 |
| dc.date.none.fl_str_mv | 2016-03-21 2019-08-14T21:55:50Z 2019-08-14T21:55:50Z 2025-09-08T23:39:36Z |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
| format | masterThesis |
| status_str | publishedVersion |
| dc.identifier.uri.fl_str_mv | https://hdl.handle.net/1843/MGSS-A9KGY5 |
| url | https://hdl.handle.net/1843/MGSS-A9KGY5 |
| dc.language.iso.fl_str_mv | por |
| language | por |
| dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.publisher.none.fl_str_mv | Universidade Federal de Minas Gerais |
| publisher.none.fl_str_mv | Universidade Federal de Minas Gerais |
| dc.source.none.fl_str_mv | reponame:Repositório Institucional da UFMG instname:Universidade Federal de Minas Gerais (UFMG) instacron:UFMG |
| instname_str | Universidade Federal de Minas Gerais (UFMG) |
| instacron_str | UFMG |
| institution | UFMG |
| reponame_str | Repositório Institucional da UFMG |
| collection | Repositório Institucional da UFMG |
| repository.name.fl_str_mv | Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG) |
| repository.mail.fl_str_mv | repositorio@ufmg.br |
| _version_ | 1856414097768185856 |