Distributed Storage
Evan Danaher
Abstract:
In many computer labs, there are large numbers of computers with unused drive
space. There may also be relatively large quantities of data to be backed up.
The goal of this project will be to develop a system for storing data
distributed over many computers, with enough redundancy so that data can still
be recovered if several of the machines are unavailable (due to inevitable
hardware failure). It could be enhanced to provide encryption or some other
means of privacy for this distributed data.
This project will distribute data over computers such that the data can be
recovered even if some machines are no longer available.
To use existing computer hardware efficiently and to store data safely, a
system must be devised for spreading that data across many machines. This
project aims to provide such a system.
The basic project would be to choose a method for redundantly storing the data,
implement it, and implement a method for distributing and retrieving the data
over multiple computers. If there is extra time, the project could be extended
to include privacy features, distributed computation of the redundancy, or
other useful extras.
An article, A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like Systems, describes a variant of Reed-Solomon coding suited to certain applications. The idea is that Reed-Solomon coding becomes relatively simple when you know which data is lost. Normally it is used in devices such as CD-ROMs, where it is impossible to know in advance which data is correct and which was mangled. But in "RAID-like Systems," it is known which device failed, so data recovery is simpler. In particular, I can use this algorithm to recover data from multiple computers when some of those computers are dead.
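The known-failure case can be illustrated with a minimal sketch. It is not the algorithm from the tutorial: as a simplifying assumption it works over the prime field of integers modulo 257 rather than the GF(2^w) arithmetic normally used for Reed-Solomon, and the names encode, recover, and _interpolate are hypothetical. The k data symbols are treated as values of a polynomial at the points 0..k-1, the parity symbols are further values of the same polynomial, and any k surviving symbols are enough to interpolate the rest.

```python
P = 257  # assumption: smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through points [(xi, yi)], modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # Division modulo a prime via Fermat's little theorem: den^(P-2) = 1/den.
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data, n):
    """Return n shares: the k data symbols followed by n - k parity symbols."""
    k = len(data)
    if not 0 < k <= n <= P:
        raise ValueError("need 0 < k <= n <= 257")
    points = list(enumerate(data))
    return list(data) + [_interpolate(points, x) for x in range(k, n)]


def recover(shares, k):
    """Rebuild the k data symbols; lost shares are marked with None."""
    known = [(x, y) for x, y in enumerate(shares) if y is not None][:k]
    if len(known) < k:
        raise ValueError("too many shares lost")
    return [_interpolate(known, x) for x in range(k)]


if __name__ == "__main__":
    data = list(b"Data")            # k = 4 data symbols
    shares = encode(data, 7)        # 3 parity symbols, one share per machine
    shares[0] = shares[2] = shares[5] = None   # three machines are dead
    print(bytes(recover(shares, 4)))
```

Each position in shares stands for one machine, so None marks exactly the "known which device failed" situation. One consequence of the modulo-257 shortcut is that a parity symbol may take the value 256, which does not fit in a byte; a real implementation would use a field of size 2^w to avoid this.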
- Learn about existing methods for redundant storage. One possibility for
this is Reed-Solomon encoding, a common method. The chosen method should then
be implemented, either with new code or with an existing library.
- Once files can be stored redundantly, split each file into several smaller
files, such that the original data can be reconstructed from various subsets
of those files.
- Write code to distribute the files onto and retrieve them from different
computers.
- If time is left over, add additional features, such as privacy features or
distributed computation of the redundancy.
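The file-splitting step in the list above can be sketched as follows. This is only an illustration under stated assumptions: it swaps in a single XOR parity chunk for Reed-Solomon, so it survives the loss of just one chunk, and the function names split_with_parity and rebuild are hypothetical. Each returned chunk would be written to a different computer.

```python
def split_with_parity(data: bytes, k: int):
    """Split data into k equal-size chunks (zero-padded) plus one XOR parity chunk.

    Returns the k + 1 chunks and the original length, which is needed to
    strip the padding again after reassembly.
    """
    size = -(-len(data) // k)                 # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(size)                      # starts as all zero bytes
    for chunk in chunks:
        parity = bytes(a ^ b for a, b in zip(parity, chunk))
    return chunks + [parity], len(data)


def rebuild(chunks, length):
    """Reassemble the original data; at most one entry of chunks may be None."""
    missing = [i for i, chunk in enumerate(chunks) if chunk is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one missing chunk")
    if missing:
        size = len(next(c for c in chunks if c is not None))
        fixed = bytes(size)
        # XOR of all surviving chunks equals the missing one.
        for chunk in chunks:
            if chunk is not None:
                fixed = bytes(a ^ b for a, b in zip(fixed, chunk))
        chunks = chunks[:missing[0]] + [fixed] + chunks[missing[0] + 1:]
    return b"".join(chunks[:-1])[:length]     # drop the parity chunk and padding


if __name__ == "__main__":
    pieces, n = split_with_parity(b"backup me please", 3)
    pieces[1] = None                          # the machine holding chunk 1 is down
    print(rebuild(pieces, n))
```

Tolerating several unavailable machines, as the project intends, would require replacing the single parity chunk with multiple Reed-Solomon parity chunks; the splitting, padding, and reassembly logic would stay much the same.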
Ideally, this will result in a working system that could be applied to various labs, possibly also providing a basis for future improvements.
Evan Danaher
2003-10-09