Please use this identifier to cite or link to this item: http://localhost:8081/xmlui/handle/123456789/1759
Title: EFFICIENT CHECKPOINTING PROTOCOLS FOR DISTRIBUTED COMPUTING SYSTEMS
Authors: Kumar, Lalit
Keywords: ELECTRONICS AND COMPUTER ENGINEERING;CHECKPOINTING PROTOCOLS;DISTRIBUTED COMPUTING SYSTEMS;COMPUTING SYSTEM
Issue Date: 2002
Abstract: Computing systems are vulnerable to many failure modes. In distributed systems, the likelihood of failure increases with the number of processes, and a single failure often renders the entire system unusable. Checkpointing and rollback recovery is a common technique for increasing system reliability against anticipated and unanticipated failures. Checkpointing can be used in both centralized and distributed systems. The recent integration of mobile computing devices into general distributed systems makes the problem of checkpointing distributed systems many times harder and renders conventional checkpointing protocols useless. Research on uniprocessor checkpointing focuses primarily on the performance evaluation of such systems, i.e., finding the optimal checkpoint interval and the average number of transactions in the system. We studied the performance of a centralized transaction system with checkpointing, failures, and recovery, and proposed an exact solution for finding the average number of transactions and the optimal checkpoint interval in such a system under more general assumptions. The proposed solutions use generating function and spectral expansion methods. For distributed computing systems, the focus is on the design of efficient non-blocking coordinated checkpointing protocols. These protocols require extra information to be piggybacked on each message. We designed a non-blocking coordinated checkpointing protocol for distributed systems that minimizes the information piggybacked on each message. The main emphasis of our research was on designing a checkpointing protocol for distributed systems with mobile hosts. Coordinated checkpointing can be useful for mobile distributed computing systems (MDCSs), provided that only a minimum number of processes take checkpoints and the protocol is non-blocking. However, if minimum-process checkpointing is combined with non-blocking operation, the resulting protocol may force many useless checkpoints that are discarded when the checkpointing operation completes.
We designed an efficient coordinated checkpointing protocol that is non-blocking, requires the coordination of only a minimum number of processes, and greatly reduces the overhead of useless checkpoints. Simulation studies show that our protocol reduces the number of useless checkpoints almost to zero. We also investigated the possibility of an efficient probabilistic quasi-synchronous checkpointing protocol for MDCSs that reduces checkpointing overheads. Quasi-synchronous checkpointing is non-blocking and requires no extra synchronization messages, but there is no control over the number of checkpoints that such protocols force. In case of a fault, only a small subset of these checkpoints is used for recovery. We designed a checkpointing protocol that allows mobile hosts to skip some of the checkpoints, which reduces the checkpointing overhead at mobile nodes. Additionally, messages for mobile nodes are logged at the mobile support stations (MSSs). In case of a fault, the processes roll back to a consistent global state. If the consistent global state contains skipped checkpoints, these checkpoints are constructed from previous actual checkpoints and the message logs. Simulation studies show that fewer than 20% of the checkpoints needed for recovery are dummy checkpoints, whereas more than 50% of all checkpoints are skipped. The simulations were carried out on a Pentium IV machine under Linux and Windows environments.
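The reconstruction step mentioned in the abstract (building a skipped checkpoint from the previous actual checkpoint and the message log) can be illustrated with a minimal Python sketch. This is not the thesis's protocol. The function name `reconstruct_checkpoint`, the list-based layout, the integer process state, and the assumption that replaying the logged messages in order deterministically reproduces the state are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Optional


def reconstruct_checkpoint(
    checkpoints: List[Optional[int]],
    logs: List[List[int]],
    index: int,
    apply: Callable[[int, int], int],
) -> int:
    """Return the state at checkpoint `index`, rebuilding it if it was skipped.

    Assumed (hypothetical) layout:
      checkpoints[i] is the saved state, or None if checkpoint i was skipped.
      logs[i] holds the messages delivered after checkpoint i-1 and
      before checkpoint i, in delivery order.
    """
    # Walk back to the latest actual checkpoint at or before `index`.
    base = index
    while base >= 0 and checkpoints[base] is None:
        base -= 1
    if base < 0:
        raise ValueError("no actual checkpoint precedes the skipped one")

    # Replay the logged messages of every interval between base and index.
    state = checkpoints[base]
    for interval in range(base + 1, index + 1):
        for msg in logs[interval]:
            state = apply(state, msg)
    return state


# Toy example: the state is a running sum of the received messages.
checkpoints = [10, None, None, 25]   # checkpoints 1 and 2 were skipped
logs = [[], [1, 2], [3], [9]]        # messages logged per interval
rebuilt = reconstruct_checkpoint(checkpoints, logs, 2, lambda s, m: s + m)
print(rebuilt)
```

In this toy run, checkpoint 2 is rebuilt by starting from the actual checkpoint 0 and replaying the messages logged for intervals 1 and 2. A real protocol would also have to handle nondeterministic events and messages sent by the host, which this sketch ignores.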
URI: http://hdl.handle.net/123456789/1759
Other Identifiers: Ph.D
Research Supervisor/Guide: Joshi, R. C.; Misra, Manoj
Type: Doctoral Thesis
Appears in Collections: DOCTORAL THESES (E & C)

Files in This Item:
File: EFFICIENT CHECKPOINTING PROTOCOLS FOR DISTRIBUTED COMPUTING SYSTEM.pdf
Size: 6.97 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.