Please use this identifier to cite or link to this item: http://localhost:8081/xmlui/handle/123456789/2356
Authors: Dubey, Ashutosh
Issue Date: 2008
Abstract: Introduction and Statement of the Problem

1.1 Introduction

The performance and reliability of computer systems have increased in recent years, but not at the same rate for all components. Components such as the processor and primary memory have improved rapidly, while the reliability of I/O systems has improved much more slowly. Disk drives still fail, and their failure behaviour has changed considerably. In light of the new failure models proposed for modern disk drives, many new kinds of errors and faults can occur inside the disk. Popular present-day file systems have not been able to cope with these failures; journaling file systems, for example, have been shown to commit failed transactions in such cases. This is dangerous for a file system, as important data may be lost, or worse, the file system may become unmountable. Various solutions have been proposed to increase the robustness of file systems [33]. Data duplication, parity, and checksums are among the most important and popular methods for tolerating these faults and errors.

1.2 Motivation

The importance of building dependable systems cannot be overstated. One of the fundamental requirements of computer systems is to store and retrieve information reliably. Disk drives have served as the primary storage medium for several decades in many systems, including but not limited to personal computers, distributed file systems, database systems, high-end storage arrays, archival systems, and mobile devices. Unfortunately, disk failures do occur. Traditionally, systems have been built on the assumption that disks operate in a "fail-stop" manner [1]; in this classic model, a disk either works perfectly or fails absolutely and in an easily detectable way. Based on this assumption, storage systems such as RAID arrays have been built to tolerate whole-disk failures [2].
For example, a file system or database system can store its data on a RAID array and withstand the failure of an entire disk drive. The fault model presented by modern disk drives, however, is much more complex. For example, modern drives can exhibit latent sector faults, where a block or set of blocks becomes inaccessible. With a latent sector fault, the fault occurs at some point in the past but is detected only when the sector is later accessed to store or retrieve information. Blocks sometimes become corrupted, and worse, this can happen silently, without the disk being able to detect it. Finally, disks sometimes exhibit transient performance problems [3].

Several factors are responsible for these kinds of errors. One of the most important is the drive industry's tendency to pack ever more bits per square inch, increasing areal density and thereby profit; the higher areal density causes bit spillover onto adjacent tracks. Second, the increased use of low-end desktop drives worsens the reliability problem. Finally, the amount of software in the storage stack has grown; the firmware on desktop drives usually contains many bugs, which cause various types of disk errors [4].

The failures described above, the trends that cause them, and the need for reliable storage systems raise two important questions: 1) what measures do present file systems employ to handle these kinds of errors, and 2) how can we improve the robustness of prevalent file systems? Answering these two questions forms the core of this research. We discuss various robustness measures for present file systems and how they can be implemented. As part of this dissertation, we design a parity-based approach and implement it in the popular Ext3 file system.

1.3 Problem Statement

The problem statement of this dissertation comprises the following components:
1. To aggregate knowledge from various sources about the failure models of modern disk drives and to arrive at a more realistic Fail-Partial model.
2. To analyze the failure policies of journaling file systems, which are already robust to some extent.
3. To analyze the effectiveness of various robustness measures.
4. To design a framework for introducing these robustness measures into prevalent journaling file systems.
5. To implement the above framework in a popular journaling file system, namely Ext3, under the Linux operating system.

1.4 Organization of the Thesis

The rest of this dissertation is organized as follows. Chapter 2 gives the background of the problem and presents the literature review. In Chapter 3, we present the common causes of disk failures, followed by the Fail-Partial failure model. Chapter 4 gives a brief introduction to journaling file systems, analyzes their failure policies and how they handle failures, and discusses various robustness measures that can be employed in journaling file systems.....
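The parity technique the abstract names can be illustrated with a minimal sketch. This is not code from the dissertation; it is a generic demonstration, in Python for brevity, of the block-level XOR parity idea that RAID arrays and parity-based file-system schemes rely on: the XOR of all data blocks is stored as a parity block, and any single lost block can be rebuilt by XOR-ing the survivors with the parity.

```python
def xor_parity(blocks):
    """Compute the byte-wise XOR parity of equal-sized data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def recover(surviving_blocks, parity):
    """Rebuild one missing block: XOR the surviving blocks with the parity."""
    return xor_parity(list(surviving_blocks) + [parity])

# Three hypothetical 8-byte "disk blocks" plus their parity block.
data = [b"disk-blk", b"journal!", b"metadata"]
p = xor_parity(data)

# Lose the second block; reconstruct it from the other two plus parity.
assert recover([data[0], data[2]], p) == b"journal!"
```

Because XOR is its own inverse, the same routine computes the parity and performs the recovery; this single-parity scheme tolerates the loss of exactly one block per parity group.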
Other Identifiers: M.Tech
Research Supervisor/ Guide: Joshi, R. C.
metadata.dc.type: M.Tech Dissertation
Appears in Collections:MASTERS' DISSERTATIONS (E & C)

Files in This Item:
File           Size     Format
ECDG22842.pdf  3.52 MB  Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.