Proposta de uma abordagem para sumarização extrativa de textos científicos longos
| Defense year: | 2022 |
|---|---|
| Main author: | Cinthia Mikaela de Souza |
| Advisor: | |
| Examination committee: | |
| Document type: | Dissertation |
| Access type: | Open access |
| Language: | por |
| Defending institution: | Universidade Federal de Minas Gerais |
| Graduate program: | Not provided by the institution |
| Department: | Not provided by the institution |
| Country: | Not provided by the institution |
| Keywords in Portuguese: | |
| Access link: | https://hdl.handle.net/1843/51324 |
| Abstract: | Automatic text summarization helps users identify the most relevant information in a textual document and thereby reduces the time spent searching for information. The technique condenses a text into a simple, descriptive summary that gives the reader a general idea of the text without requiring them to read all of it. Most of the literature on automatic text summarization focuses on proposing and improving deep learning methods so that they can be applied to long texts. Unfortunately, these models still limit the length of the input sequence, and this limitation may cause a loss of information that impairs the quality of the generated summaries. For this reason, this dissertation proposes a new approach to extractive summarization of long texts. We have two hypotheses. The first is that subdividing the summarization problem into smaller problems, solving them separately, and later combining the solutions can benefit the summarization of long texts. The second is that other characteristics of the text can be useful in creating the summary. With this in mind, we model text summarization as a binary classification problem. We tested different algorithms and showed that multi-section summarization outperforms single-section summarization, with BERTScore gains of approximately 14% and 5% on the PLOS ONE and arXiv datasets, respectively. We also evaluated the proposed summarizer with different representations of the text and showed that the single-view representation of attributes yields the best results. This shows that, for extractive text summarization, the attributes selected to compose the attributes view make it possible to better identify the importance of the sentences. Finally, we compare the proposed method with state-of-the-art extractive, abstractive, and hybrid summarization models and show that our approach outperforms them. |
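
The abstract describes splitting a long document into sections, deciding for each sentence whether it belongs in the summary (a binary classification), and combining the per-section selections. The following is a minimal Python sketch of that pipeline, not the dissertation's implementation: the function names, the three sentence attributes, and the rule-based `toy_classifier` are all hypothetical stand-ins, and a trained classifier would normally replace the rule.

```python
import re
from collections import Counter
from typing import Callable, Dict, List

Features = Dict[str, float]


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_features(sentence: str, index: int, counts: Counter) -> Features:
    """Hand-crafted attributes of one sentence (hypothetical feature set)."""
    words = re.findall(r"\w+", sentence.lower())
    # Average frequency, within the section, of the words in this sentence.
    avg_freq = sum(counts[w] for w in words) / len(words) if words else 0.0
    return {
        "position": float(index),     # index of the sentence in its section
        "length": float(len(words)),  # number of word tokens
        "avg_term_freq": avg_freq,
    }


def toy_classifier(features: Features) -> bool:
    """Placeholder binary decision; an assumption, not a trained model."""
    return features["length"] >= 5 and features["avg_term_freq"] >= 1.5


def summarize_section(
    text: str, classify: Callable[[Features], bool] = toy_classifier
) -> List[str]:
    """Keep the sentences of one section that the classifier labels positive."""
    sentences = split_sentences(text)
    counts = Counter(re.findall(r"\w+", text.lower()))
    return [
        s for i, s in enumerate(sentences)
        if classify(sentence_features(s, i, counts))
    ]


def summarize_document(
    sections: Dict[str, str],
    classify: Callable[[Features], bool] = toy_classifier,
) -> str:
    """Solve each section separately, then concatenate the selections."""
    selected: List[str] = []
    for text in sections.values():  # dicts preserve insertion order
        selected.extend(summarize_section(text, classify))
    return " ".join(selected)
```

Passing a different `classify` callable, for example `lambda f: f["position"] == 0.0` to keep only the first sentence of each section, changes the selection rule without touching the section-wise structure, which is the part the multi-section hypothesis is about.
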
| id | UFMG_70640e93d43fbc2fcec5ffaedad4d6f9 |
|---|---|
| oai_identifier_str | oai:repositorio.ufmg.br:1843/51324 |
| network_acronym_str | UFMG |
| network_name_str | Repositório Institucional da UFMG |
| repository_id_str | |
| dc.title.none.fl_str_mv | Proposta de uma abordagem para sumarização extrativa de textos científicos longos |
| title | Proposta de uma abordagem para sumarização extrativa de textos científicos longos |
| author | Cinthia Mikaela de Souza |
| author_facet | Cinthia Mikaela de Souza |
| author_role | author |
| dc.contributor.author.fl_str_mv | Cinthia Mikaela de Souza |
| dc.subject.por.fl_str_mv | Computação – Teses; Sumarização automática de textos – Teses; Aprendizado de máquina multivisão – Teses; Classificação – Teses; Sumarização extrativa de textos; Aprendizado Multi-visão; Classificação |
| topic | Computação – Teses; Sumarização automática de textos – Teses; Aprendizado de máquina multivisão – Teses; Classificação – Teses; Sumarização extrativa de textos; Aprendizado Multi-visão; Classificação |
| publishDate | 2022 |
| dc.date.none.fl_str_mv | 2022-12-05 2023-03-29T14:51:16Z 2023-03-29T14:51:16Z 2025-09-08T23:59:05Z |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
| format | masterThesis |
| status_str | publishedVersion |
| dc.identifier.uri.fl_str_mv | https://hdl.handle.net/1843/51324 |
| url | https://hdl.handle.net/1843/51324 |
| dc.language.iso.fl_str_mv | por |
| language | por |
| dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.publisher.none.fl_str_mv | Universidade Federal de Minas Gerais |
| publisher.none.fl_str_mv | Universidade Federal de Minas Gerais |
| dc.source.none.fl_str_mv | reponame:Repositório Institucional da UFMG instname:Universidade Federal de Minas Gerais (UFMG) instacron:UFMG |
| instname_str | Universidade Federal de Minas Gerais (UFMG) |
| instacron_str | UFMG |
| institution | UFMG |
| reponame_str | Repositório Institucional da UFMG |
| collection | Repositório Institucional da UFMG |
| repository.name.fl_str_mv | Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG) |
| repository.mail.fl_str_mv | repositorio@ufmg.br |
| _version_ | 1856414054142181376 |