Identification of microproteins encoded by small ORFs in Mycolicibacterium smegmatis (Mycobacterium smegmatis) through proteogenomic analysis

Dalberto, Pedro Ferrari


Bibliographic details
Year of defense:	2021
Main author:	Dalberto, Pedro Ferrari
Advisor:	Bizarro, Cristiano Valim
Examination committee:	Not provided by the institution
Document type:	Thesis
Access type:	Open access
Language:	por
Defending institution:	Pontifícia Universidade Católica do Rio Grande do Sul
Graduate program:	Programa de Pós-Graduação em Biologia Celular e Molecular
Department:	Escola de Ciências
Country:	Brazil
Keywords (translated from Portuguese):	Tuberculosis; Mass Spectrometry; Proteogenomics; SmORFs; Microproteins
CNPq knowledge area:	CIENCIAS BIOLOGICAS::BIOLOGIA GERAL
Access link:	http://tede2.pucrs.br/tede2/handle/tede/9772
Abstract:	Tuberculosis is an infectious disease mainly caused by Mycobacterium tuberculosis. Despite the availability of treatment and a vaccine, this disease accounts for millions of deaths annually. Moreover, the emergence of strains resistant to first-line drugs is increasing. Therefore, understanding mycobacterial biology is essential to the development of new therapeutic strategies to reduce the incidence of tuberculosis worldwide. Since the first prokaryotic DNA sequencing twenty-five years ago, it has become possible to delve deeper into bacterial genomes and to better understand the organization and regulation of their genes. Traditionally, annotation pipelines only include in their workflow Open Reading Frames (ORFs) with at least 300 nucleotides, or 100 codons. Hence, the set of ORFs with fewer than 100 codons, known as small ORFs (smORFs), is excluded by an arbitrary cutoff, since countless smORF sequences may be found in any genome just by chance, with a high probability of being biologically meaningless and not, in fact, coding sequences. In this work, we investigated the hidden universe of microproteins encoded by smORFs in Mycolicibacterium smegmatis mc²155 (Mycobacterium smegmatis), commonly used as a model of M. tuberculosis because it is non-pathogenic and fast-growing, by applying a proteogenomic approach. Combining genomics, transcriptomics, and proteomics, we were able to accurately identify and annotate these smORFs. We improved the results by applying different methods to enrich low-molecular-weight proteins, as they may have low abundance in a complex biological sample. Throughout our analysis, we identified 16 unannotated ORFs, one of which has 23 paralogues spread across copies of IS1096, a well-known M. smegmatis transposon, showing that this element encodes a total of three ORFs, one more than previously reported. Our workflow also allowed us to extend the sequence of a previously annotated protein and to find the shortest ORF yet in the genome of M. smegmatis. We were able to show that the most prominent start codon in these sequences is GTG, followed by the canonical ATG and the alternatives ATT and TTG. Many of these new ORFs have both annotated and unannotated orthologous sequences in mycobacteria and other closely related bacteria. Moreover, we showed how the combination of multiple proteomes can properly exclude known proteins that would otherwise be deemed novel. We expect this study to contribute to the understanding of mycobacterial genomes and proteomes. Moreover, we believe that the discovery of these proteins will provide insights for new studies dedicated to their structures, functions, and essentiality.
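The length cutoff and start codons described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis workflow, which combined genomic, transcriptomic, and proteomic evidence; it only scans a DNA string for candidate ORFs shorter than 100 codons. The function names (`find_smorfs`, `reverse_complement`), the `min_codons` floor, and the use of TAA/TAG/TGA as stop codons are assumptions made for illustration.

```python
# Illustrative sketch only: a naive six-frame scan for candidate smORFs.
# Names and the min_codons floor are hypothetical, not taken from the thesis.

START_CODONS = {"GTG", "ATG", "ATT", "TTG"}  # start codons named in the abstract
STOP_CODONS = {"TAA", "TAG", "TGA"}          # assumed standard stop codons
MAX_CODONS = 100                             # conventional annotation cutoff

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_smorfs(seq, min_codons=10, max_codons=MAX_CODONS):
    """List (strand, start, end, n_codons) for ORFs below the cutoff.

    Coordinates are 0-based offsets on the scanned strand; `end` is the
    position just past the stop codon. n_codons counts the start codon
    but not the stop codon. Only the first start codon after the
    previous stop is used, so each stop yields at most one ORF.
    """
    hits = []
    seq = seq.upper()
    for strand, dna in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(dna) - 2, 3):
                codon = dna[pos:pos + 3]
                if start is None and codon in START_CODONS:
                    start = pos
                elif codon in STOP_CODONS:
                    if start is not None:
                        n_codons = (pos - start) // 3
                        if min_codons <= n_codons < max_codons:
                            hits.append((strand, start, pos + 3, n_codons))
                    start = None
    return hits
```

A scan like this returns many hits on any real genome, which is the point the abstract makes: most such sequences arise by chance, so independent evidence (here, mass spectrometry) is needed before a candidate is annotated as coding.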

Item metadata

id	P_RS_31ae807a248d32d74eae9e8bd4cffe61
oai_identifier_str	oai:tede2.pucrs.br:tede/9772
network_acronym_str	P_RS
network_name_str	Biblioteca Digital de Teses e Dissertações da PUC_RS
repository_id_str
spelling	Bizarro, Cristiano Valim | http://lattes.cnpq.br/8237569228020224 | http://lattes.cnpq.br/0215025477816008 | Dalberto, Pedro Ferrari | 2021-07-01T13:45:32Z | 2021-03-29 | http://tede2.pucrs.br/tede2/handle/tede/9772
	[English abstract: identical to the `description` field below]
	[Portuguese abstract, translated] Tuberculosis is an infectious disease caused mainly by Mycobacterium tuberculosis. Despite the availability of treatment and a vaccine, it is responsible for millions of deaths annually. In addition, the emergence of strains resistant to first-line drugs is steadily increasing. Therefore, understanding mycobacterial biology is essential for developing new therapeutic strategies that reduce the prevalence of tuberculosis worldwide. Since the first prokaryotic DNA sequencing, more than twenty-five years ago, it has become possible to unravel the mysteries of bacterial genomes and to better understand the organization and regulation of their genes. Traditionally, annotation pipelines include in their workflow only ORFs (Open Reading Frames) with at least 300 nucleotides, or 100 codons. ORFs shorter than 100 codons, known as small ORFs (smORFs), are excluded by an arbitrary cutoff, since a large number of smORFs can be found in any genome purely by chance, with a high probability of being biologically meaningless and not encoding proteins. In this work, we investigated the hidden universe of microproteins encoded by smORFs in Mycolicibacterium smegmatis mc²155 (Mycobacterium smegmatis), commonly used as a model for M. tuberculosis because it is non-pathogenic and fast-growing, by means of a proteogenomic approach. Combining genomics, transcriptomics, and proteomics, we were able to accurately identify and annotate smORFs in M. smegmatis. We increased the efficiency of microprotein identification by means of different methods for enriching low-molecular-weight proteins, since these are normally considered low-abundance proteins in a complex biological sample. We identified 16 unannotated ORFs, one of which has 23 paralogs spread across copies of the insertion element IS1096, a known M. smegmatis transposon. We thus describe the existence of a third functional ORF in the IS1096 element, in addition to the two already known and characterized. Our work also allowed us to extend the sequence of a previously annotated protein and to identify the shortest coding sequence yet found in the M. smegmatis genome. We show that the most frequent start codon in these sequences is GTG, followed by the canonical ATG and then by the alternative codons ATT and TTG. Many of these new ORFs have both annotated and unannotated orthologous sequences in mycobacteria and other closely related bacteria. In addition, we show how combining different proteome databases can properly exclude known proteins that would otherwise be considered novel. We expect this study to contribute to the understanding of mycobacterial genomes and proteomes. We also believe that the discovery of these proteins will enable new studies dedicated to elucidating their structures and functions and to assessing their essentiality.
	Submitted by PPG Biologia Celular e Molecular (bcm@pucrs.br) on 2021-06-30T15:42:08Z No. of bitstreams: 1 PEDRO_FERRARI_DALBERTO_TES.pdf: 16668307 bytes, checksum: f56c0d217df790c46d1e5b644c7b49a9 (MD5)
	Approved for entry into archive by Sheila Dias (sheila.dias@pucrs.br) on 2021-07-01T13:35:37Z (GMT) No. of bitstreams: 1 PEDRO_FERRARI_DALBERTO_TES.pdf: 16668307 bytes, checksum: f56c0d217df790c46d1e5b644c7b49a9 (MD5)
	Made available in DSpace on 2021-07-01T13:45:32Z (GMT). No. of bitstreams: 1 PEDRO_FERRARI_DALBERTO_TES.pdf: 16668307 bytes, checksum: f56c0d217df790c46d1e5b644c7b49a9 (MD5) Previous issue date: 2021-03-29
	Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES | application/pdf | http://tede2.pucrs.br:80/tede2/retrieve/181521/TES_PEDRO_FERRARI_DALBERTO_CONFIDENCIAL.pdf.jpg | por | Pontifícia Universidade Católica do Rio Grande do Sul | Programa de Pós-Graduação em Biologia Celular e Molecular | PUCRS | Brasil | Escola de Ciências
	Tuberculose | Espectrometria de Massa | Proteogenômica | SmORFs | Microproteínas | CIENCIAS BIOLOGICAS::BIOLOGIA GERAL
	Identificação de microproteínas codificadas por pequenas ORFs em Mycolicibacterium smegmatis (Mycobacterium smegmatis) por meio de análise proteogenômica | info:eu-repo/semantics/publishedVersion | info:eu-repo/semantics/doctoralThesis
	Trabalho será publicado como artigo ou livro (work will be published as an article or book) | 60 meses (60 months) | 01/07/2026
	3463594373552466096 | 500 500 600 | -1634559385931244697 | 3590462550136975366 | info:eu-repo/semantics/openAccess
	reponame:Biblioteca Digital de Teses e Dissertações da PUC_RS | instname:Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS) | instacron:PUC_RS
	THUMBNAIL | TES_PEDRO_FERRARI_DALBERTO_CONFIDENCIAL.pdf.jpg | image/jpeg | 4083 | http://tede2.pucrs.br/tede2/bitstream/tede/9772/4/TES_PEDRO_FERRARI_DALBERTO_CONFIDENCIAL.pdf.jpg | f647f5b28a894bd2f298303e6b098edc | MD5 | 4
	TEXT | TES_PEDRO_FERRARI_DALBERTO_CONFIDENCIAL.pdf.txt | text/plain | 1810 | http://tede2.pucrs.br/tede2/bitstream/tede/9772/3/TES_PEDRO_FERRARI_DALBERTO_CONFIDENCIAL.pdf.txt | 6085edbda60668471870f9fb65f53ecb | MD5 | 3
	ORIGINAL | TES_PEDRO_FERRARI_DALBERTO_CONFIDENCIAL.pdf | application/pdf | 342501 | http://tede2.pucrs.br/tede2/bitstream/tede/9772/2/TES_PEDRO_FERRARI_DALBERTO_CONFIDENCIAL.pdf | 38b5f7fb33c45aa488e82e83a71dca77 | MD5 | 2
	LICENSE | license.txt | text/plain; charset=utf-8 | 590 | http://tede2.pucrs.br/tede2/bitstream/tede/9772/1/license.txt | 220e11f2d3ba5354f917c7035aadef24 | MD5 | 1
	tede/9772 | 2021-07-01 12:00:24.609 | oai:tede2.pucrs.br:tede/9772
	QXV0b3JpemE/P28gcGFyYSBQdWJsaWNhPz9vIEVsZXRyP25pY2E6IENvbSBiYXNlIG5vIGRpc3Bvc3RvIG5hIExlaSBGZWRlcmFsIG4/OS42MTAsIGRlIDE5IGRlIGZldmVyZWlybyBkZSAxOTk4LCBvIGF1dG9yIEFVVE9SSVpBIGEgcHVibGljYT8/byBlbGV0cj9uaWNhIGRhIHByZXNlbnRlIG9icmEgbm8gYWNlcnZvIGRhIEJpYmxpb3RlY2EgRGlnaXRhbCBkYSBQb250aWY/Y2lhIFVuaXZlcnNpZGFkZSBDYXQ/bGljYSBkbyBSaW8gR3JhbmRlIGRvIFN1bCwgc2VkaWFkYSBhIEF2LiBJcGlyYW5nYSA2NjgxLCBQb3J0byBBbGVncmUsIFJpbyBHcmFuZGUgZG8gU3VsLCBjb20gcmVnaXN0cm8gZGUgQ05QSiA4ODYzMDQxMzAwMDItODEgYmVtIGNvbW8gZW0gb3V0cmFzIGJpYmxpb3RlY2FzIGRpZ2l0YWlzLCBuYWNpb25haXMgZSBpbnRlcm5hY2lvbmFpcywgY29ucz9yY2lvcyBlIHJlZGVzID9zIHF1YWlzIGEgYmlibGlvdGVjYSBkYSBQVUNSUyBwb3NzYSBhIHZpciBwYXJ0aWNpcGFyLCBzZW0gP251cyBhbHVzaXZvIGFvcyBkaXJlaXRvcyBhdXRvcmFpcywgYSB0P3R1bG8gZGUgZGl2dWxnYT8/byBkYSBwcm9kdT8/byBjaWVudD9maWNhLgo=
	Biblioteca Digital de Teses e Dissertações | http://tede2.pucrs.br/tede2/ | PRI | https://tede2.pucrs.br/oai/request | biblioteca.central@pucrs.br\|\| | opendoar: | 2021-07-01T15:00:24 | Biblioteca Digital de Teses e Dissertações da PUC_RS - Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS) | false
dc.title.por.fl_str_mv	Identificação de microproteínas codificadas por pequenas ORFs em Mycolicibacterium smegmatis (Mycobacterium smegmatis) por meio de análise proteogenômica
title	Identificação de microproteínas codificadas por pequenas ORFs em Mycolicibacterium smegmatis (Mycobacterium smegmatis) por meio de análise proteogenômica
spellingShingle	Identificação de microproteínas codificadas por pequenas ORFs em Mycolicibacterium smegmatis (Mycobacterium smegmatis) por meio de análise proteogenômica Dalberto, Pedro Ferrari Tuberculose Espectrometria de Massa Proteogenômica SmORFs Microproteínas CIENCIAS BIOLOGICAS::BIOLOGIA GERAL
title_short	Identificação de microproteínas codificadas por pequenas ORFs em Mycolicibacterium smegmatis (Mycobacterium smegmatis) por meio de análise proteogenômica
title_full	Identificação de microproteínas codificadas por pequenas ORFs em Mycolicibacterium smegmatis (Mycobacterium smegmatis) por meio de análise proteogenômica
title_fullStr	Identificação de microproteínas codificadas por pequenas ORFs em Mycolicibacterium smegmatis (Mycobacterium smegmatis) por meio de análise proteogenômica
title_full_unstemmed	Identificação de microproteínas codificadas por pequenas ORFs em Mycolicibacterium smegmatis (Mycobacterium smegmatis) por meio de análise proteogenômica
title_sort	Identificação de microproteínas codificadas por pequenas ORFs em Mycolicibacterium smegmatis (Mycobacterium smegmatis) por meio de análise proteogenômica
author	Dalberto, Pedro Ferrari
author_facet	Dalberto, Pedro Ferrari
author_role	author
dc.contributor.advisor1.fl_str_mv	Bizarro, Cristiano Valim
dc.contributor.advisor1Lattes.fl_str_mv	http://lattes.cnpq.br/8237569228020224
dc.contributor.authorLattes.fl_str_mv	http://lattes.cnpq.br/0215025477816008
dc.contributor.author.fl_str_mv	Dalberto, Pedro Ferrari
contributor_str_mv	Bizarro, Cristiano Valim
dc.subject.por.fl_str_mv	Tuberculose Espectrometria de Massa Proteogenômica SmORFs Microproteínas
topic	Tuberculose Espectrometria de Massa Proteogenômica SmORFs Microproteínas CIENCIAS BIOLOGICAS::BIOLOGIA GERAL
dc.subject.cnpq.fl_str_mv	CIENCIAS BIOLOGICAS::BIOLOGIA GERAL
description	Tuberculosis is an infectious disease mainly caused by Mycobacterium tuberculosis. Despite the availability of treatment and a vaccine, this disease accounts for millions of deaths annually. Moreover, the emergence of strains resistant to first-line drugs is increasing. Therefore, understanding mycobacterial biology is essential to the development of new therapeutic strategies to reduce the incidence of tuberculosis worldwide. Since the first prokaryotic DNA sequencing twenty-five years ago, it has become possible to delve deeper into bacterial genomes and to better understand the organization and regulation of their genes. Traditionally, annotation pipelines only include in their workflow Open Reading Frames (ORFs) with at least 300 nucleotides, or 100 codons. Hence, the set of ORFs with fewer than 100 codons, known as small ORFs (smORFs), is excluded by an arbitrary cutoff, since countless smORF sequences may be found in any genome just by chance, with a high probability of being biologically meaningless and not, in fact, coding sequences. In this work, we investigated the hidden universe of microproteins encoded by smORFs in Mycolicibacterium smegmatis mc²155 (Mycobacterium smegmatis), commonly used as a model of M. tuberculosis because it is non-pathogenic and fast-growing, by applying a proteogenomic approach. Combining genomics, transcriptomics, and proteomics, we were able to accurately identify and annotate these smORFs. We improved the results by applying different methods to enrich low-molecular-weight proteins, as they may have low abundance in a complex biological sample. Throughout our analysis, we identified 16 unannotated ORFs, one of which has 23 paralogues spread across copies of IS1096, a well-known M. smegmatis transposon, showing that this element encodes a total of three ORFs, one more than previously reported. Our workflow also allowed us to extend the sequence of a previously annotated protein and to find the shortest ORF yet in the genome of M. smegmatis. We were able to show that the most prominent start codon in these sequences is GTG, followed by the canonical ATG and the alternatives ATT and TTG. Many of these new ORFs have both annotated and unannotated orthologous sequences in mycobacteria and other closely related bacteria. Moreover, we showed how the combination of multiple proteomes can properly exclude known proteins that would otherwise be deemed novel. We expect this study to contribute to the understanding of mycobacterial genomes and proteomes. Moreover, we believe that the discovery of these proteins will provide insights for new studies dedicated to their structures, functions, and essentiality.
publishDate	2021
dc.date.accessioned.fl_str_mv	2021-07-01T13:45:32Z
dc.date.issued.fl_str_mv	2021-03-29
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://tede2.pucrs.br/tede2/handle/tede/9772
url	http://tede2.pucrs.br/tede2/handle/tede/9772
dc.language.iso.fl_str_mv	por
language	por
dc.relation.program.fl_str_mv	3463594373552466096
dc.relation.confidence.fl_str_mv	500 500 600
dc.relation.cnpq.fl_str_mv	-1634559385931244697
dc.relation.sponsorship.fl_str_mv	3590462550136975366
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Pontifícia Universidade Católica do Rio Grande do Sul
dc.publisher.program.fl_str_mv	Programa de Pós-Graduação em Biologia Celular e Molecular
dc.publisher.initials.fl_str_mv	PUCRS
dc.publisher.country.fl_str_mv	Brasil
dc.publisher.department.fl_str_mv	Escola de Ciências
publisher.none.fl_str_mv	Pontifícia Universidade Católica do Rio Grande do Sul
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da PUC_RS instname:Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS) instacron:PUC_RS
instname_str	Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
instacron_str	PUC_RS
institution	PUC_RS
reponame_str	Biblioteca Digital de Teses e Dissertações da PUC_RS
collection	Biblioteca Digital de Teses e Dissertações da PUC_RS
bitstream.url.fl_str_mv	http://tede2.pucrs.br/tede2/bitstream/tede/9772/4/TES_PEDRO_FERRARI_DALBERTO_CONFIDENCIAL.pdf.jpg http://tede2.pucrs.br/tede2/bitstream/tede/9772/3/TES_PEDRO_FERRARI_DALBERTO_CONFIDENCIAL.pdf.txt http://tede2.pucrs.br/tede2/bitstream/tede/9772/2/TES_PEDRO_FERRARI_DALBERTO_CONFIDENCIAL.pdf http://tede2.pucrs.br/tede2/bitstream/tede/9772/1/license.txt
bitstream.checksum.fl_str_mv	f647f5b28a894bd2f298303e6b098edc 6085edbda60668471870f9fb65f53ecb 38b5f7fb33c45aa488e82e83a71dca77 220e11f2d3ba5354f917c7035aadef24
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5 MD5
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da PUC_RS - Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
repository.mail.fl_str_mv	biblioteca.central@pucrs.br\|\|
_version_	1796793250097397760
