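As marcas lexicais da discriminação étnico-racial entre negros e brancos: um estudo da linguística de corpus (The lexical marks of the ethnic-racial discrimination between blacks and whites: a Corpus Linguistics study)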

Bibliographic details
Year of defense: 2020
Main author: Souza, Rafael Webster Ferreira de
Advisor: Sardinha, Antonio Paulo Berber (Lattes CV: http://lattes.cnpq.br/6940454346543706)
Examining committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: Portuguese (por)
Institution: Pontifícia Universidade Católica de São Paulo
Graduate program: Programa de Estudos Pós-Graduados em Linguística Aplicada e Estudos da Linguagem
Department: Faculdade de Filosofia, Comunicação, Letras e Artes
Country: Brazil
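Keywords (Portuguese): Discriminação racial; Negros; Brancos; Linguística de corpus; Grupos étnicos; Linguagem e línguas - Variação
Keywords (English): Race discrimination; Blacks; Whites; Corpora (Linguistics); Ethnic groups; Language and languages - Variation
CNPq knowledge area: CNPQ::LINGUISTICA, LETRAS E ARTES::LINGUISTICA::LINGUISTICA APLICADA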
Access link: https://repositorio.pucsp.br/jspui/handle/handle/24088
Abstract: Language serves as a means of social indexing: it varies between groups of speakers and allows social groups to distinguish insiders from outsiders, that is, individuals who belong to a particular group from those who do not (HALL, 2016; BAKER, 2010). The purpose of this study is threefold: (a) to identify the subject matters, themes, or discourses produced by black and white speakers of Brazilian Portuguese; (b) to determine which of these subject matters, themes, or discourses are similar or different between black and white speakers; and (c) to analyze whether the speakers' ethnic-racial membership can be predicted from the lexical variables used in the texts. These goals were pursued from a Corpus Linguistics perspective (HUNSTON; ESIMAJE, 2019; BAKER; EGBERT, 2016; BIBER, 2009; BERBER SARDINHA, 2004). To this end, a corpus was compiled to represent speech situations involving black and white speakers of Brazilian Portuguese. Named Corpus de Registros Étnico-Raciais (CRER; 'Corpus of Ethnic-Racial Registers'), it comprises 788 texts sampling language in use from different registers of Brazilian Portuguese: popular songs, vlogs, and oral life histories. Within each register (and subregister), the texts were balanced between black and white speakers. Once the corpus was complete, the files were morphosyntactically tagged, and the words tagged as nouns, adjectives, verbs, and adverbs were selected from each text. Their counts were then normalized per 1,000 words. The resulting lexical variables were analyzed with two methods: Lexical Multidimensional Analysis (BERBER SARDINHA, 2014) and Discriminant Function Analysis (BERBER SARDINHA; VEIRANO PINTO, 2014, 2019; NORRIS, 2015).
In the Lexical Multidimensional Analysis, dimensions of lexical variation were identified statistically and interpreted communicatively to reveal the underlying subject matters, themes, or discourses. Statistical comparisons showed marked differences among the registers along these dimensions, indicating that register is a powerful predictor of language variation and corroborating previous multidimensional studies (BIBER, 1988; BERBER SARDINHA; VEIRANO PINTO, 2014, 2019). However, the dimensions did not reveal variation between black and white speakers. In the Discriminant Function Analysis, the selected lexis was entered as predictors, yielding sets of words that predict each ethnic group. Prediction accuracy was high for both groups: 88.13% for black speakers and 72.29% for white speakers. The predictor words for each group were then interpreted in relation to their underlying subject matters, themes, and discourses, which surprisingly revealed subtle yet undeniable traces of discrimination, racism, and bias against black people. In short, the two methods produced different outcomes: while the dimensions of lexical variation did not distinguish black from white speakers, the Discriminant Function Analysis revealed clear differences between the two groups. We therefore conclude that speakers' ethnic group can be identified through the lexis of their subject matters, themes, and discourses and, unexpectedly, that there is hidden racial discrimination and prejudice in the topics black and white Brazilians talk about every day.
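The two computational steps the abstract names, normalizing content-word counts per 1,000 words and reporting prediction accuracy separately for each group, can be illustrated with a minimal Python sketch. It is not the dissertation's pipeline. It assumes that each text is already a list of (word, tag) pairs using Universal Dependencies-style tags (NOUN, ADJ, VERB, ADV); this record does not state which tagger or tagset the study used. The names `CONTENT_TAGS`, `normalized_lexis`, and `per_group_accuracy` are hypothetical. Per-group accuracy is computed here as the share of each group's texts assigned to the correct group, a common way discriminant-analysis classification tables report it; the record does not say whether the reported figures were cross-validated.

```python
from collections import Counter

# Assumption (hypothetical): texts are already POS-tagged as (word, tag) pairs
# with Universal Dependencies-style tags; the dissertation's tagger may differ.
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "ADV"}


def normalized_lexis(tagged_text, per=1000):
    """Rate per `per` words of each noun, adjective, verb and adverb in one text.

    The denominator is the total number of tokens in the text (assumed to be
    word tokens only), so texts of different lengths become comparable.
    """
    total = len(tagged_text)
    if total == 0:
        return {}
    counts = Counter(word.lower() for word, tag in tagged_text if tag in CONTENT_TAGS)
    return {word: n * per / total for word, n in counts.items()}


def per_group_accuracy(actual, predicted):
    """Percentage of each group's texts that were assigned to the correct group."""
    totals = Counter(actual)
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {group: 100 * hits[group] / totals[group] for group in totals}


if __name__ == "__main__":
    # Toy five-token text: "música" occurs twice, "fala" once.
    text = [("A", "DET"), ("música", "NOUN"), ("fala", "VERB"),
            ("de", "ADP"), ("música", "NOUN")]
    print(normalized_lexis(text))  # {'música': 400.0, 'fala': 200.0}
    print(per_group_accuracy(["black", "black", "white", "white"],
                             ["black", "white", "white", "white"]))
    # {'black': 50.0, 'white': 100.0}
```

The discriminant model itself (for example, scikit-learn's `LinearDiscriminantAnalysis`) is deliberately left out, since this record does not state which implementation or validation scheme the study used.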
id PUC_SP-1_add2387fb0abc8a9e9058bb494fb01f7
oai_identifier_str oai:repositorio.pucsp.br:handle/24088
network_acronym_str PUC_SP-1
network_name_str Biblioteca Digital de Teses e Dissertações da PUC_SP
repository_id_str
Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – CAPES
Abstract: also provided in Portuguese (Resumo)
Files:
ORIGINAL: Rafael Webster F. Souza.pdf (application/pdf, 2574481 bytes)
TEXT: Rafael Webster F. Souza.pdf.txt (extracted text, text/plain, 360128 bytes)
THUMBNAIL: Rafael Webster F. Souza.pdf.jpg (generated thumbnail, image/jpeg, 1253 bytes)
Last modified: 2021-11-29 12:10:46.382 (2021-11-29T15:10:46)
Repository: Biblioteca Digital de Teses e Dissertações, https://sapientia.pucsp.br/ (OAI endpoint: https://sapientia.pucsp.br/oai/request)
dc.title.pt_BR.fl_str_mv As marcas lexicais da discriminação étnico-racial entre negros e brancos: um estudo da linguística de corpus
dc.title.alternative.en_US.fl_str_mv The lexical marks of the ethnic-racial discrimination between blacks and whites: a Corpus Linguistics study
author_role author
dc.contributor.advisor1.fl_str_mv Sardinha, Antonio Paulo Berber
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/6940454346543706
dc.contributor.author.fl_str_mv Souza, Rafael Webster Ferreira de
contributor_str_mv Sardinha, Antonio Paulo Berber
dc.subject.cnpq.fl_str_mv CNPQ::LINGUISTICA, LETRAS E ARTES::LINGUISTICA::LINGUISTICA APLICADA
dc.subject.por.fl_str_mv Discriminação racial
Negros
Brancos
Linguística de corpus
Grupos étnicos
Linguagem e línguas - Variação
dc.subject.eng.fl_str_mv Race discrimination
Blacks
Whites
Corpora (Linguistics)
Ethnic groups
Language and languages - Variation
publishDate 2020
dc.date.issued.fl_str_mv 2020-03-12
dc.date.accessioned.fl_str_mv 2021-11-26T19:59:30Z
dc.date.available.fl_str_mv 2021-11-26T19:59:30Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv Souza, Rafael Webster Ferreira de. As marcas lexicais da discriminação étnico-racial entre negros e brancos: um estudo da linguística de corpus. 2020. Master's dissertation (Linguística Aplicada e Estudos da Linguagem), Programa de Estudos Pós-Graduados em Linguística Aplicada e Estudos da Linguagem, Pontifícia Universidade Católica de São Paulo, São Paulo, 2020.
dc.identifier.uri.fl_str_mv https://repositorio.pucsp.br/jspui/handle/handle/24088
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Pontifícia Universidade Católica de São Paulo
dc.publisher.program.fl_str_mv Programa de Estudos Pós-Graduados em Linguística Aplicada e Estudos da Linguagem
dc.publisher.initials.fl_str_mv PUC-SP
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv Faculdade de Filosofia, Comunicação, Letras e Artes
publisher.none.fl_str_mv Pontifícia Universidade Católica de São Paulo
dc.source.none.fl_str_mv reponame:Biblioteca Digital de Teses e Dissertações da PUC_SP
instname:Pontifícia Universidade Católica de São Paulo (PUC-SP)
instacron:PUC_SP
instname_str Pontifícia Universidade Católica de São Paulo (PUC-SP)
instacron_str PUC_SP
institution PUC_SP
reponame_str Biblioteca Digital de Teses e Dissertações da PUC_SP
collection Biblioteca Digital de Teses e Dissertações da PUC_SP
bitstream.url.fl_str_mv https://repositorio.pucsp.br/xmlui/bitstream/handle/24088/1/Rafael%20Webster%20F.%20Souza.pdf
https://repositorio.pucsp.br/xmlui/bitstream/handle/24088/2/Rafael%20Webster%20F.%20Souza.pdf.txt
https://repositorio.pucsp.br/xmlui/bitstream/handle/24088/3/Rafael%20Webster%20F.%20Souza.pdf.jpg
bitstream.checksum.fl_str_mv 660fbaa108be4aa99432be973ace1a2e
16fb9d324d05dc425a8e3294c6b31c7d
f580bc456d722ad67fbfa33ba53547ce
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da PUC_SP - Pontifícia Universidade Católica de São Paulo (PUC-SP)
repository.mail.fl_str_mv bngkatende@pucsp.br||rapassi@pucsp.br
_version_ 1793424101935677440