Exploring the radiomics approach for covid-19 identification in lung computed tomography

Bibliographic details
Year of defense: 2022
Main author: Oliveira, Christian Mattjie de
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: eng
Defending institution: Pontifícia Universidade Católica do Rio Grande do Sul
Escola de Medicina
Brazil
PUCRS
Programa de Pós-Graduação em Gerontologia Biomédica
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://tede2.pucrs.br/tede2/handle/tede/10223
Abstract: The COVID-19 pneumonia outbreak has caused global turmoil and was declared a pandemic by the World Health Organization on March 11, 2020. Chest radiological examinations, such as chest X-rays or CT scans, play a vital role in the diagnosis of COVID-19. Several studies have proposed classification models based on radiomic features extracted from the lungs in radiological images, mainly for COVID-19 diagnosis and severity assessment. However, few of these studies explore how feature extraction parameters, such as discretization, affect the extracted features. Therefore, this study aims to implement models for identifying COVID-19 through the radiomic signature while investigating different preprocessing and discretization parameters. Our dataset was composed of 180 chest CT scans (128 COVID and 52 non-COVID) performed at Hospital São Lucas da PUCRS, which were divided into training (50%), validation (25%), and test (25%) sets. We performed lung segmentation, applied several filters, and discretized the image with six different bin sizes: 1, 5, 10, 25, 50, and 75. Features were extracted for every applied filter and bin size. Wavelet and non-wavelet features were merged into 36 combinations of bin sizes, with 1774 features for each lung. A classification model was trained with each combination of features, and the best three models were chosen for optimization. We identified some of our limitations and used four alternative strategies to try to overcome them: SMOTE, undersampling, feature selection, and using only features from the original image. The best performance was achieved by the SMOTE NW25-1 model, with an AUC of 0.800. The best three models for each of these alternative strategies were also optimized. Of the 15 optimized models, the six best were selected for feature importance analysis. The Laplacian of Gaussian and wavelet filters generated the most relevant features. Our results indicate that smaller bin sizes, in a range from 1 to 25, could be investigated further for feature extraction in the original image and most filters. Laplacian of Gaussian and wavelet filters may perform better with even smaller bin sizes, in a range from 1 to 10.
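The discretization and bin-size pairing described in the abstract can be illustrated with a minimal sketch. It assumes a common fixed-bin-width scheme (bin index = floor(x / width) - floor(min / width) + 1); the abstract does not state which formula or software was used. The function names, the sample intensity values, and the `NW…-W…` label format are hypothetical, loosely modeled on the "NW25-1" model name, which the abstract does not define.

```python
from itertools import product

import numpy as np

# Bin sizes listed in the abstract.
BIN_SIZES = [1, 5, 10, 25, 50, 75]


def discretize_fixed_bin_width(values, bin_width):
    """Map intensities to integer bin indices starting at 1.

    Assumed scheme (not confirmed by the abstract):
    floor(x / width) - floor(min(x) / width) + 1
    """
    values = np.asarray(values, dtype=float)
    offset = np.floor(values.min() / bin_width)
    return (np.floor(values / bin_width) - offset + 1).astype(int)


def bin_size_combinations(bin_sizes=BIN_SIZES):
    """Pair every non-wavelet bin size with every wavelet bin size."""
    return list(product(bin_sizes, repeat=2))


combos = bin_size_combinations()
# Hypothetical labels: non-wavelet bin size first, wavelet bin size second.
labels = [f"NW{nw}-W{w}" for nw, w in combos]

# Made-up intensity values, for illustration only.
sample = np.array([-1000.0, -990.0, -975.0, -900.0])
print(len(combos))                             # 6 x 6 = 36 combinations
print(discretize_fixed_bin_width(sample, 25))  # [1 1 2 5]
```

Under this scheme a smaller bin width keeps more distinct gray levels, which is one way to read the abstract's interest in bin sizes between 1 and 25.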
id P_RS_031f72a9828b5bcf212a56f1b2fbaafb
oai_identifier_str oai:tede2.pucrs.br:tede/10223
network_acronym_str P_RS
network_name_str Biblioteca Digital de Teses e Dissertações da PUC_RS
repository_id_str
Funding agency: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
author Oliveira, Christian Mattjie de
author_role author
dc.contributor.none.fl_str_mv Silva, Ana Maria Marques da
http://lattes.cnpq.br/5375482124482980
dc.contributor.author.fl_str_mv Oliveira, Christian Mattjie de
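dc.subject.por.fl_str_mv Tomografia Computadorizada (Computed Tomography)
COVID-19
Radiômica (Radiomics)
Discretização (Discretization)
Importância de Atributos (Feature Importance)
CIENCIAS DA SAUDE::MEDICINA (Health Sciences::Medicine)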
publishDate 2022
dc.date.none.fl_str_mv 2022-05-17T13:48:10Z
2022-03-25
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://tede2.pucrs.br/tede2/handle/tede/10223
url https://tede2.pucrs.br/tede2/handle/tede/10223
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Pontifícia Universidade Católica do Rio Grande do Sul
Escola de Medicina
Brasil
PUCRS
Programa de Pós-Graduação em Gerontologia Biomédica
dc.source.none.fl_str_mv reponame:Biblioteca Digital de Teses e Dissertações da PUC_RS
instname:Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
instacron:PUC_RS
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da PUC_RS - Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
repository.mail.fl_str_mv biblioteca.central@pucrs.br
_version_ 1850041308594307072