Introduction
An article, A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like Systems, describes the use of a varient of Raid-Solomon for use in certain applications. The idea is that Raid-Solomon can be used relatively simply when you know which data is lost; normally it is used in devices such as CD-ROMS, where it is impossible to know which data is correct and which was mangled. But in "RAID-like Systems," it is known which device failed, so data recovery is simpler. In particular, I can use this algorithm for recovering data from multiple computers when some computers are dead.
RSraid
Reed-Solomon coding is designed for use in systems where data loss is expected, and it is impossible to tell where the data is lost. However, in this project, it will be known which data was lost (which computer died), and so the full power is not needed. However, a simpler version, dubbed "RSraid" or "Reed-Solomon for Raid-like devices" is put forward in [1].
The basic idea is to put the data, d0 through dn - 1 into an n x 1 matrix, and multiply some other (n + m) x n matrix by it, resulting in another m x n matrix. Each row in this matrix represents the data to be sent to a certain device. As a result of this, if n devices out of the n + m are available, an n x n matrix can be constructed and the resulting equation can be solved for the original data. There are two tricky parts to this: that multiplying matrices tends to leads to large numbers, and how to find an initial n + m x n matrix to multiply by so that the data can be recovered from any n rows of the final matrix.
The first problem, that of large numbers, is surmounted by using galois fields, also known as finite fields. They are most rigorously thought of as polynomials up to a certain degree, taken modulus some prime polynomial. However, then can be efficiently implemently using binary arithmetic, where "addition" and "subtraction" are both done as binary XOR, and multiplication is done using a table of "logarithms". While this is an odd way to treat numbers, the result is a consistent set of rules that can be used for multiplication and addition of numbers less than 2n with results also in this range the perfect solution for multiplying matricies on binary-based computers.
The second is solved as shown in [1] - using the matrix where ai,j$ is equivalent to ij. This matrix is calculated in the finite field, giving a set of values all less than 2n. Then, to speed computation, it is reduced by a Gaussian elimination type so that the top n x n matrix is the identity matrix; this means that n out of the n+m rows are simply the original data; this has a significant savings in computing time.
Communication
The other part is to distribute these pieces over various computers - this will be done by sockets, with some simple system - probably simple transmission of data with a short prefix and suffix to signal beginning and end of data.
References
A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like Systems
The Distributed Internet Backup System