Old RAID controllers may destroy your data

This just happened to a friend of mine. He had two 3 TB disks in a RAID-1 configuration, attached to an older Marvell RAID controller (the manufacturer really doesn't matter, though). Suddenly Windows started complaining that it couldn't access folders, and then the disk went away completely. Disk Management just showed it as "unformatted", yet both disks reported as OK. What happened?

The problem is that some older controllers don't know how to handle drives larger than 2TB, and they deal with it in the worst way possible: a wrap around. If the system accesses the sector exactly 2TB from the start of the disk, it actually gets the first sector, the Master Boot Record. This happens not only when reading but also when writing. The worst thing about it is that the problem doesn't show up immediately, only after you've copied around 2TB worth of data to the disk. Once you hit that mark, important filesystem information at the start of the disk gets overwritten and you suddenly lose your data :( First folders and files become unreadable, then the partition isn't recognized anymore.
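The sector number 4294967296 used in the tests further down is 2^32, so the limit looks like a 32-bit sector counter overflowing (that is my assumption about the cause, the controller vendors don't document it). With 512-byte sectors that works out to exactly 2 TiB. A small sketch of the arithmetic, where wrapped_sector is a made-up helper name and the shell is assumed to do 64-bit arithmetic:

```shell
# Bytes addressable with 32-bit sector numbers and 512-byte sectors.
echo $(( (1 << 32) * 512 ))    # 2199023255552 bytes = 2 TiB

# Hypothetical helper: the sector a wrapping controller would really touch
# when asked for sector $1 (assumes the counter wraps at 2^32).
wrapped_sector() {
    echo $(( $1 % (1 << 32) ))
}

wrapped_sector 4294967296      # prints 0, the Master Boot Record
wrapped_sector 4294967297      # prints 1
```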

If you have such a setup, there is an easy way to test this: if you've just bought the disks, try filling them up all the way with some data you can afford to lose, for example by copying any large HD movie to the disk a few hundred times. If you have more than 2TB on your disk and it is still there after a reboot, you're probably in the clear.
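On Linux the copying can be scripted. The following is only a sketch: fill_disk is a made-up helper name, the paths in the comments are examples, and it assumes the GNU sha256sum tool is installed. It also writes a checksum list so that you can check after the reboot whether the copies are really intact, not just still listed.

```shell
# Hypothetical helper: copy one file many times and record checksums.
# $1 = source file, $2 = target directory on the array, $3 = number of copies
fill_disk() {
    src=$1
    dest=$2
    copies=$3
    i=1
    while [ "$i" -le "$copies" ]; do
        cp "$src" "$dest/fill_$i.bin" || return 1
        i=$(( i + 1 ))
    done
    sync    # flush everything to the disks before checksumming
    ( cd "$dest" && sha256sum fill_*.bin > checksums.sha256 )
}

# Example call (paths are made up):
#   fill_disk /home/me/movie.mkv /mnt/raid/filltest 300
# After a reboot, verify the copies:
#   cd /mnt/raid/filltest && sha256sum -c --quiet checksums.sha256
```

Pick the number of copies so that the total clearly exceeds 2TB, otherwise the test says nothing.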

To be 100% certain you have to examine the disk.

Linux

Open a terminal and become root (sudo -i), then enter the following commands, replacing /dev/sda with your RAID array. If you don't know which disk to test, try "cat /proc/partitions" to get a list:

dd if=/dev/sda of=/tmp/part1 bs=512 count=10
dd if=/dev/sda of=/tmp/part2 bs=512 count=10 skip=4294967296

If you get a message like this from the second "dd" command then everything's OK as your disk is smaller than 2TB (or you have the wrong disk, check again):

dd: ‘/dev/sda’: cannot skip: Invalid argument

If it succeeded, compare the two with this command:

diff /tmp/part1 /tmp/part2

The diff should print out something like this:

Binary files /tmp/part1 and /tmp/part2 differ

If diff doesn't print anything then it means that the files are identical and your system most likely suffers from the wrap around (read below). Once you're done remove the temporary files:

rm /tmp/part1 /tmp/part2
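The steps above can be rolled into one function. This is a sketch, not a polished tool: check_wrap is a made-up name, it uses cmp instead of diff because cmp -s reports the result purely through its exit status, and the second parameter (the wrap point in sectors) only exists so the function can be tried on a small test file. It only reads from the disk.

```shell
# Hypothetical helper: compare the first 10 sectors of a disk with the
# 10 sectors starting at the suspected wrap point. Run as root for real disks.
# $1 = device, $2 = wrap point in 512-byte sectors (default 2^32)
# Returns 0 = regions differ, 1 = identical (wrap likely), 2 = could not read
check_wrap() {
    dev=$1
    wrap=${2:-4294967296}
    first=$(mktemp) || return 2
    second=$(mktemp) || return 2
    dd if="$dev" of="$first" bs=512 count=10 2>/dev/null
    if ! dd if="$dev" of="$second" bs=512 count=10 skip="$wrap" 2>/dev/null \
        || [ ! -s "$second" ]; then
        echo "could not read at sector $wrap: disk too small or wrong device"
        result=2
    elif cmp -s "$first" "$second"; then
        echo "identical: wrap around likely"
        result=1
    else
        echo "different: no wrap around detected"
        result=0
    fi
    rm -f "$first" "$second"
    return "$result"
}

# Example call (device name is an example, use your own array):
#   check_wrap /dev/sda
```

One caveat: on a completely blank disk both regions would read as zeros and compare as identical, so only trust an "identical" result on a disk that has a partition table.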

Windows

On Windows we have to get some extra software. Download the HxD Hex-Editor (Freeware) from here: https://mh-nexus.de/en/downloads.php

Then click the button to open your disk. Use the disks labeled "Hard disk 1" or "Hard disk 2", NOT the drive (C: or similar), and open the disk in read-only mode. Take a screenshot of the first sector, then type in sector number 4294967296 and press Enter. Compare the contents of this sector with the one in your screenshot. If they are identical, your system most likely suffers from the wrap around.

My system has this bug. What next?

First of all, make a backup. Seriously, even if you are far from the 2TB mark, copy everything to a new, preferably external, disk. Don't postpone it; buy a USB disk immediately if you don't have one with enough capacity to spare.

Then, wipe the disks and decide what to do next:

  • If it's a PCI/PCIe controller, throw it away and buy a new one. Better yet, break the controller card in half so that no one accidentally uses it again. (That's what I did to mine when this happened to me a couple of years ago.)
  • If it's on the mainboard and you don't want to buy a new mainboard, buy a (new!) PCIe controller card and attach the disks to it instead.
  • Or you could put the disks in USB 3.0 enclosures, as this will bypass the faulty controller.
  • If you absolutely can't afford any of the previous solutions, delete the partition and create one with a size of just below 2TB. The rest of the capacity must stay unpartitioned and cannot be used for anything.
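For the last option, "just below 2TB" can be pinned down, assuming 512-byte sectors and a wrap at sector 4294967296 as in the tests above: the partition has to end at sector 4294967295 or earlier. The sketch below computes the largest size in MiB for a given start sector; safe_partition_mib is a made-up helper, and the start at sector 2048 (1 MiB) is just the usual alignment convention, not a requirement of the controller.

```shell
# Hypothetical helper: largest partition size in whole MiB that stays
# below the wrap point. $1 = start sector (512-byte sectors).
safe_partition_mib() {
    wrap=4294967296                      # first sector that wraps (2^32)
    echo $(( (wrap - $1) / 2048 ))       # 2048 sectors per MiB, rounded down
}

safe_partition_mib 2048                  # prints 2097151
```

Enter a size no larger than this value in your partitioning tool, and leave some extra margin if you are unsure how the tool rounds.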

 

My system has this bug and I've lost data. What next?

Well, that's bad. Really bad. Recovery is very hard and you will certainly not get 100% of your data back. I can recommend OnTrack EasyRecovery, but if you had very valuable data on the disks, you are better off handing them to a recovery specialist.

 
