People think of education as something they can finish. I. Asimov
Imagine you want to store two files called a and b. Assume each file has size 1GB and we are going to use 3 storage disks. Also assume that each disk has capacity 1GB.
How can you store so that if any one of the 3 disks fails, you call still recover both your files?
The answer is of course, you store a and b in the first two disks and their bitwise XOR in the third disk. The bitwise XOR is called a parity and has size exactly equal to the size of a and b. If we do arithmetic modulo 2, the bitwise XOR is written as a+b.
To introduce some notation, n=3 disks and k=2 files and this is called an MDS code. Also called a single parity (and used quite a bit, for example in RAID 5).
Clearly, this code has the property it can tolerate any single disk failure. This means that even after one disk fails, a data collector (shown as a laptop) can communicate the information from the remaining two nodes and reconstruct the file. This is shown here:
In the picture it is assumed that each block (what we called file before, lets call them blocks from now on) has size GB and the arrows show how much information is communicated. This is simply reconstruction of the whole data object. Clearly reconstruction costs GB of communication in this example.
We are now interested in repair which is slightly different. For repair, a new node is no longer interested in reconstructing all the data, but rather only a single lost block. This is also called the disk rebuild problem.
Two different notions of repair are shown. Exact repair corresponds to building exactly the lost block. We used to call that
systematic repair but we don't do that anymore.
Functional Repair corresponds to simply reconstructing a new block that combined with the existing ones still forms an (n,k) MDS code. In linear algebra terms this means that the new block is in general position (maximally linearly independent) with respect to the existing blocks. Exact repair is a special case of functional repair and is strictly harder.
While dealing with repair, apart from the parameters n and k, a third parameter is used to specify the number of nodes a replacement node connects to during the repair process. In the following examples, the parameter is assumed to be , i.e., the node replacing a failed node connects to all the remaining nodes for repair.
For the case of a single parity code, repairing also requires communication of GB– both blocks have to be communicated, even when one block is really needed. Can we do better ? Not for a single parity but for two parities we can!
Beyond the single parity case, codes can be constructed to tolerate any number of failures if enough redundancy is allowed. The standard way to construct these parities is to use a Reed-Solomon code, but we won't worry about this yet. Here we show an Evenodd code ((by Blaum and Bruck) which uses only XORing (i.e. it is a binary code). Here we have storage nodes (i.e. disks) and blocks of stored information. Notice that we can tolerate any node failures which means that this is a MDS code. A node failing means that both blocks in that node are lost.
One key trick we used is that each node is now storing two blocks. This sub-packetization is necessary to create binary MDS codes with more than one parity. Assuming each block has size GB so that each node is storing GB, reconstructing the whole file would, of course, require blocks of total size GB of communication.
The key idea is that repairing a node failure can be done by communicating only 1.5GB. See .
The next question is perhaps we could further reduce the repair communication. Maybe we can further sub-packetize and make each node store 100 sub-blocks and perhaps it will be possible to make the repair communication approach 1GB. It certainly cannot get below 1GB since we are reconstructing 1GB of useful information. It turns out that for nodes and blocks, 1.5 blocks is the information theoretic minimum to repair communication. We call this the cut-set bound and in general it is .
Interestingly Evenodd codes can be repaired by matching the cutset bound for the , case. Another case of repairing another node failure is shown here:
Observe that in this case the two packets at the second node are XORed and is communicated. This idea of making linear combinations locally at the nodes and then transmitting linear combinations allows the reduction in repair bandwidth.
The concept of Regenerating Codes was proposed in  where it is shown conceptually that an increase in the storage per node as compared to the MDS case can lower the amount of data downloaded for repair. Also in  , lower bounds on the repair communication are derived using cut-set arguments.
The case when the code is MDS is called the Minimum Storage Regenerating (MSR) case.  showed that for rate the cut-set bound can be matched when repairing the systematic parts of the code. Subsequently  developed a complete framework that guarantees exact regeneration of all nodes while achieving the cut-set bound for MSR codes. The results in  and  cover the case of .
The explicit codes presented in  cover the case of . These codes, constructed using a new Product-Matrix framework, is the first exact-MSR codes that do not have the constraint of requiring a replacement node to download data from all the remaining nodes during repair (i.e., allow ). This property is valuable in geographically distributed / peer-to-peer systems where connectivity may be low, and when the number of nodes in the system may vary with time and need not be restricted by the parameter .
The high rate case of and is covered in  ,  and  when . Such codes are especially useful in co-located storage systems which typically have low redundancy levels and a fixed number of nodes. The special case of in the high rate regime was earlier covered in  ,  and  .  and  show asymptotic existence of exact-MSR codes for all , while  shows the non-existence of scalar linear exact-MSR codes for the high rate case of .
The case when the amount of download for repair is minimized (by allowing greater storage than MDS codes) is the Minimum Bandwidth Regenerating (MBR) case. Explicit codes for are constructed in  , and explicit codes for all values of the parameters are constructed in  . The latter code, constructed using the Product-Matrix framework, again works for the case when .
 shows the impossibility of constructing exact-repair codes meeting the cut-set lower bound for essentially all cases between MSR and MBR.
Apart from the above, there are many other codes in the literature that address different aspects of the repair problem. A list of code constructions addressing the repair problem in distributed storage can be found here.
In multimedia we include some recent tutorial videos.