Description
Proposal
I have an idea of how to handle auto-repair on read and other features that use Btrfs's error detection of files.
As of this writing, Btrfs supports the DUP, RAID1-like, and (experimentally) RAID5/6 profiles. These profiles can be used on the data, metadata, and system block groups. From what I understand, these all function similarly, in that:
- Data is stored with 2 copies.
- During a read operation, if one file fails the checksum, the other can be used if it passes the checksum.
- During a read operation, if both files fail the checksum, the file is unrecoverable.
The problem is that if both files are corrupt (even if they are corrupt on different blocks), there is no way to recover the file. I would like to propose a new profile that uses 3 files to provide error detection AND error correction.
Execution
The execution is similar to the DUP profile, but with extra steps:
- Data is stored with 3 copies.
- During a read operation, if one file fails the checksum, any of the others can be used if they pass the checksum.
- During a read operation, if all 3 files fail the checksum, Btrfs attempts to build a new file that passes the checksum from the 3 corrupted files.
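To make the read path above concrete, here is a rough Python sketch. It is only an illustration of the idea, not Btrfs code: `read_with_fallback` and its `rebuild` parameter are hypothetical names, and `zlib.crc32` is used purely as a stand-in for whatever checksum the filesystem is configured with.

```python
import zlib
from typing import Callable, Optional, Sequence


def read_with_fallback(
    copies: Sequence[bytes],
    expected_checksum: int,
    rebuild: Optional[Callable[[Sequence[bytes]], Optional[bytes]]] = None,
) -> Optional[bytes]:
    """Return verified data, or None if the data is unrecoverable.

    Hypothetical helper: zlib.crc32 stands in for the real checksum.
    """
    # Return the first copy whose checksum matches.
    for copy in copies:
        if zlib.crc32(copy) == expected_checksum:
            return copy

    # Every copy failed: optionally try to build a candidate from the
    # corrupted copies, and accept it only if it passes the checksum.
    if rebuild is not None:
        candidate = rebuild(copies)
        if candidate is not None and zlib.crc32(candidate) == expected_checksum:
            return candidate

    return None
```

With `rebuild` left unset, this behaves like plain checksum-and-fallback across the copies. The proposed profile would differ only in supplying a reconstruction step for the case where every copy fails.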
If all 3 files are corrupt, Btrfs attempts to make a new file by applying majority rule on a block-by-block basis: Btrfs compares the first block of each file and uses the version that appears at least twice, then compares the second block of each file, and so on until every block has been compared. The resulting file is then checksummed just like the original 3 files. If it passes the checksum, it is used and the original 3 files are replaced with it. If it fails, the file is unrecoverable.
For each block position:
- If all three blocks are the same, use that block for the new file.
- If two blocks are the same, but one is different, use a block from the matching two for the new file.
- If all three blocks are different, Btrfs can stop here because the file is deemed unrecoverable.
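The block-by-block majority rule can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not an implementation: `majority_rebuild` and `BLOCK_SIZE` are hypothetical names, the block size is kept tiny so the example is easy to follow, and `zlib.crc32` again stands in for the real checksum.

```python
import zlib
from typing import Optional, Sequence

BLOCK_SIZE = 4  # hypothetical, deliberately tiny for illustration


def majority_rebuild(
    copies: Sequence[bytes],
    expected_checksum: int,
    block_size: int = BLOCK_SIZE,
) -> Optional[bytes]:
    """Rebuild data from exactly 3 corrupted copies by per-block majority vote."""
    a, b, c = copies
    if not (len(a) == len(b) == len(c)):
        return None  # this sketch only handles equally sized copies

    rebuilt = bytearray()
    for off in range(0, len(a), block_size):
        x = a[off:off + block_size]
        y = b[off:off + block_size]
        z = c[off:off + block_size]
        if x == y or x == z:
            rebuilt += x      # x agrees with at least one other copy
        elif y == z:
            rebuilt += y      # x is the odd one out
        else:
            return None       # all three differ: unrecoverable

    # The rebuilt data is accepted only if it passes the original checksum.
    candidate = bytes(rebuilt)
    return candidate if zlib.crc32(candidate) == expected_checksum else None


original = b"AAAABBBBCCCC"
checksum = zlib.crc32(original)
# Every copy is corrupt, but each one in a different block.
damaged = [b"xAAABBBBCCCC", b"AAAAByBBCCCC", b"AAAABBBBCCzC"]
recovered = majority_rebuild(damaged, checksum)
```

In this example all three copies fail the checksum on their own, yet `recovered` equals `original` because no block position is damaged in more than one copy. The final checksum test also covers the case where two copies are corrupted identically at the same position: the vote would pick the wrong block, but the rebuilt data would then fail the checksum and be rejected.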
Pros
- Superior data corruption resistance. Unlike the DUP profile, this does not just add another file (which would only be more corruption detection); it adds logic to repair files. This means that all 3 files can be corrupt and it's still possible to recover the data. The data only becomes unrecoverable when two or more files are corrupt at the same block location, which is far less likely than a file having corruption anywhere.
- Performance should be the same as DUP when not making a new file.
Cons
- Only 1/3 of the raw storage space is usable.
Additional comments
I figured I'd bring this idea to the Btrfs project because I think that sacrificing 2/3 of storage for error correction is better than sacrificing 1/2 of storage for error detection. I don't see too many people using this for the data block group, but I can 100% see this being the default for the metadata and system block groups, since they're already set to DUP by default on single drives.