Fast recovery in parallel state machine replication

Mendizabal, Odorico Machado

Bibliographic details
Year of defense:	2016
Main author:	Mendizabal, Odorico Machado
Advisor:	Dotti, Fernando Luís
Examination committee:	Not provided by the institution
Document type:	Thesis
Access type:	Open access
Language:	eng
Defense institution:	Pontifícia Universidade Católica do Rio Grande do Sul
Graduate program:	Programa de Pós-Graduação em Ciência da Computação
Department:	Faculdade de Informática
Country:	Brazil
Keywords in Portuguese:	PROCESSAMENTO DISTRIBUÍDO; TOLERÂNCIA A FALHAS (INFORMÁTICA); INFORMÁTICA
CNPq knowledge area:	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Access link:	http://tede2.pucrs.br/tede2/handle/tede/6879
Abstract:	A well-established technique used to design fault-tolerant systems is state machine replication. In part, this is explained by the simplicity of the approach and its strong consistency guarantees. The traditional state machine replication model builds on the sequential execution of requests to ensure consistency among the replicas. Sequentiality of execution, however, threatens the scalability of replicas. Recently, some proposals have suggested parallelizing the execution of replicas to achieve higher performance. Despite the success of parallel state machine replication in accomplishing high performance, the implications of such models for recovery are mostly left unaddressed. Even for the traditional state machine replication approach, relatively few studies have considered the issues involved in recovering faulty replicas. The motivation of this thesis is to clarify the challenges and performance implications involved in checkpointing and recovery for parallel state machine replication. The thesis also aims to advance the state of the art by proposing novel algorithms for checkpointing and recovery in the context of parallel state machine replication. Performing checkpoints efficiently in such parallel models is more challenging than in classic state machine replication because the checkpoint operation must account for the execution of concurrent commands. In this thesis, we review checkpointing techniques for parallel approaches to state machine replication and compare their impact on performance through simulation. Furthermore, we propose two checkpoint techniques for one of these parallel models. Recovering a replica requires (a) retrieving and installing an up-to-date replica checkpoint, and (b) restoring and re-executing the log of commands not reflected in the checkpoint. Parallel state machine replication renders recovery particularly challenging since throughput under normal execution (i.e., in the absence of failures) is very high. Consequently, the log of commands that needs to be applied before the replica becomes available is typically large, which delays the recovery. We present two novel techniques to optimize recovery in parallel state machine replication. The first technique allows new commands to execute concurrently with the execution of logged commands, before replicas are completely updated. The second technique introduces on-demand state recovery, which allows segments of a checkpoint to be recovered concurrently. We experimentally assess the performance of our recovery techniques using a full-fledged parallel state machine replication prototype and compare the performance of these techniques to traditional recovery mechanisms under different scenarios.
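
The two recovery steps named in the abstract, (a) installing a checkpoint and (b) re-executing the commands logged after it, can be pictured with a minimal sequential sketch. This is only an illustration of the classic baseline, not the thesis's parallel or on-demand algorithms; the names `Checkpoint`, `Replica`, and `recover`, and the key-value command format, are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Snapshot of replica state plus the index of the last command it reflects."""
    last_applied: int
    state: dict


@dataclass
class Replica:
    state: dict = field(default_factory=dict)
    last_applied: int = -1

    def apply(self, index, command):
        # Hypothetical command format: a (key, value) write.
        key, value = command
        self.state[key] = value
        self.last_applied = index


def recover(checkpoint, log):
    """Classic recovery: (a) install the checkpoint, (b) replay the log suffix in order."""
    replica = Replica(state=dict(checkpoint.state),
                      last_applied=checkpoint.last_applied)
    for index, command in enumerate(log):
        if index > checkpoint.last_applied:
            replica.apply(index, command)
    return replica


log = [("x", 1), ("y", 2), ("x", 3), ("z", 4)]
checkpoint = Checkpoint(last_applied=1, state={"x": 1, "y": 2})
recovered = recover(checkpoint, log)
print(recovered.state, recovered.last_applied)
```

In this baseline the replica serves no new commands until the whole log suffix has been replayed, which is the delay the abstract attributes to large logs under high throughput.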

Item metadata

id	P_RS_7f31eff65081d3df3db895d75aedec15
oai_identifier_str	oai:tede2.pucrs.br:tede/6879
network_acronym_str	P_RS
network_name_str	Biblioteca Digital de Teses e Dissertações da PUC_RS
repository_id_str
dc.title.por.fl_str_mv	Fast recovery in parallel state machine replication
title	Fast recovery in parallel state machine replication
spellingShingle	Fast recovery in parallel state machine replication Mendizabal, Odorico Machado PROCESSAMENTO DISTRIBUÍDO TOLERÂNCIA A FALHAS (INFORMÁTICA) INFORMÁTICA CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
title_short	Fast recovery in parallel state machine replication
title_full	Fast recovery in parallel state machine replication
title_fullStr	Fast recovery in parallel state machine replication
title_full_unstemmed	Fast recovery in parallel state machine replication
title_sort	Fast recovery in parallel state machine replication
author	Mendizabal, Odorico Machado
author_facet	Mendizabal, Odorico Machado
author_role	author
dc.contributor.advisor1.fl_str_mv	Dotti, Fernando Luís
dc.contributor.advisor1ID.fl_str_mv	502.796.290-87
dc.contributor.advisor1Lattes.fl_str_mv	http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4782513J6
dc.contributor.advisor-co1.fl_str_mv	Pedone, Fernando
dc.contributor.advisor-co1Lattes.fl_str_mv	http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K8164827J6
dc.contributor.authorID.fl_str_mv	978.941.170-72
dc.contributor.authorLattes.fl_str_mv	http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4777868A7
dc.contributor.author.fl_str_mv	Mendizabal, Odorico Machado
contributor_str_mv	Dotti, Fernando Luís Pedone, Fernando
dc.subject.por.fl_str_mv	PROCESSAMENTO DISTRIBUÍDO TOLERÂNCIA A FALHAS (INFORMÁTICA) INFORMÁTICA
topic	PROCESSAMENTO DISTRIBUÍDO TOLERÂNCIA A FALHAS (INFORMÁTICA) INFORMÁTICA CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
dc.subject.cnpq.fl_str_mv	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
description	A well-established technique used to design fault-tolerant systems is state machine replication. In part, this is explained by the simplicity of the approach and its strong consistency guarantees. The traditional state machine replication model builds on the sequential execution of requests to ensure consistency among the replicas. Sequentiality of execution, however, threatens the scalability of replicas. Recently, some proposals have suggested parallelizing the execution of replicas to achieve higher performance. Despite the success of parallel state machine replication in accomplishing high performance, the implications of such models for recovery are mostly left unaddressed. Even for the traditional state machine replication approach, relatively few studies have considered the issues involved in recovering faulty replicas. The motivation of this thesis is to clarify the challenges and performance implications involved in checkpointing and recovery for parallel state machine replication. The thesis also aims to advance the state of the art by proposing novel algorithms for checkpointing and recovery in the context of parallel state machine replication. Performing checkpoints efficiently in such parallel models is more challenging than in classic state machine replication because the checkpoint operation must account for the execution of concurrent commands. In this thesis, we review checkpointing techniques for parallel approaches to state machine replication and compare their impact on performance through simulation. Furthermore, we propose two checkpoint techniques for one of these parallel models. Recovering a replica requires (a) retrieving and installing an up-to-date replica checkpoint, and (b) restoring and re-executing the log of commands not reflected in the checkpoint. Parallel state machine replication renders recovery particularly challenging since throughput under normal execution (i.e., in the absence of failures) is very high. Consequently, the log of commands that needs to be applied before the replica becomes available is typically large, which delays the recovery. We present two novel techniques to optimize recovery in parallel state machine replication. The first technique allows new commands to execute concurrently with the execution of logged commands, before replicas are completely updated. The second technique introduces on-demand state recovery, which allows segments of a checkpoint to be recovered concurrently. We experimentally assess the performance of our recovery techniques using a full-fledged parallel state machine replication prototype and compare the performance of these techniques to traditional recovery mechanisms under different scenarios.
publishDate	2016
dc.date.accessioned.fl_str_mv	2016-08-04T16:39:32Z
dc.date.issued.fl_str_mv	2016-05-16
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://tede2.pucrs.br/tede2/handle/tede/6879
url	http://tede2.pucrs.br/tede2/handle/tede/6879
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.program.fl_str_mv	1974996533081274470
dc.relation.confidence.fl_str_mv	600 600 600
dc.relation.department.fl_str_mv	-3008542510401149144
dc.relation.cnpq.fl_str_mv	3671711205811204509
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Pontifícia Universidade Católica do Rio Grande do Sul
dc.publisher.program.fl_str_mv	Programa de Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv	PUCRS
dc.publisher.country.fl_str_mv	Brasil
dc.publisher.department.fl_str_mv	Faculdade de Informática
publisher.none.fl_str_mv	Pontifícia Universidade Católica do Rio Grande do Sul
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da PUC_RS instname:Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS) instacron:PUC_RS
instname_str	Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
instacron_str	PUC_RS
institution	PUC_RS
reponame_str	Biblioteca Digital de Teses e Dissertações da PUC_RS
collection	Biblioteca Digital de Teses e Dissertações da PUC_RS
bitstream.url.fl_str_mv	http://tede2.pucrs.br/tede2/bitstream/tede/6879/5/TES_ODORICO_MACHADO_MENDIZABAL_COMPLETO.pdf.jpg http://tede2.pucrs.br/tede2/bitstream/tede/6879/4/TES_ODORICO_MACHADO_MENDIZABAL_COMPLETO.pdf.txt http://tede2.pucrs.br/tede2/bitstream/tede/6879/3/license.txt http://tede2.pucrs.br/tede2/bitstream/tede/6879/2/TES_ODORICO_MACHADO_MENDIZABAL_COMPLETO.pdf
bitstream.checksum.fl_str_mv	eca6bbd41d643644e781486b18159dd6 19c74684056816f30763f73436eb70c1 5a9d6006225b368ef605ba16b4f6d1be 8ab2360ff12ca83b15b415cba7eda7de
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5 MD5
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da PUC_RS - Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
repository.mail.fl_str_mv	biblioteca.central@pucrs.br
_version_	1796793221642190848
