This article uses an example to describe the necessary steps involved in exchanging a defective drive in a software RAID (mdadm).
IMPORTANT NOTE: All commands are just examples. You should adjust them accordingly!
Example scenario
Here is the example configuration:
# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sda4[0] sdb4[1]
1822442815 blocks super 1.2 [2/2] [UU]
md2 : active raid1 sda3[0] sdb3[1]
1073740664 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
524276 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[1]
33553336 blocks super 1.2 [2/2] [UU]
unused devices: <none>
There are four RAID arrays in total, one per partition:
- /dev/md0 as swap
- /dev/md1 as /boot
- /dev/md2 as /
- /dev/md3 as /home
/dev/sdb is the defective drive in this case. A missing or defective drive is shown by [U_] and/or [_U]. If the RAID array is intact, it shows [UU].
# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sda4[0] sdb4[1](F)
1822442815 blocks super 1.2 [2/1] [U_]
md2 : active raid1 sda3[0] sdb3[1](F)
1073740664 blocks super 1.2 [2/1] [U_]
md1 : active raid1 sda2[0] sdb2[1](F)
524276 blocks super 1.2 [2/1] [U_]
md0 : active raid1 sda1[0] sdb1[1](F)
33553336 blocks super 1.2 [2/1] [U_]
unused devices: <none>
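The check for [U_] or [_U] can also be scripted. The following is a minimal sketch, not part of the original procedure: the function name list_degraded is hypothetical, and it assumes the status field is the last field on the "blocks" line, as in the output above.

```shell
#!/bin/sh
# list_degraded (hypothetical helper): print the name of every md array
# whose status field contains an underscore, for example [U_] or [_U].
# Reads mdstat-formatted text from the file given as $1
# (default: /proc/mdstat).
list_degraded() {
    awk '
        # Remember the array name from lines such as "md3 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }
        # The status field is the last field of the "blocks" line
        /blocks/ && $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }
    ' "${1:-/proc/mdstat}"
}
```

Calling list_degraded without arguments on the example system above would be expected to list all four arrays, since each of them shows [U_].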
You can perform the changes to the software RAID while the system is running. If /proc/mdstat shows that the drive is failing, as in the example here, you can make an appointment with our support technicians to replace the drive (see further below).
# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sda4[0]
1822442815 blocks super 1.2 [2/1] [U_]
md2 : active raid1 sda3[0]
1073740664 blocks super 1.2 [2/1] [U_]
md1 : active raid1 sda2[0]
524276 blocks super 1.2 [2/1] [U_]
md0 : active raid1 sda1[0]
33553336 blocks super 1.2 [2/1] [U_]
unused devices: <none>
Removal of the defective drive
Before you can add a new drive, you need to first remove the old defective drive from the RAID array. You need to do this for each individual partition.
# mdadm /dev/md0 -r /dev/sdb1
# mdadm /dev/md1 -r /dev/sdb2
# mdadm /dev/md2 -r /dev/sdb3
# mdadm /dev/md3 -r /dev/sdb4
The following command shows the drives that are part of an array:
# mdadm --detail /dev/md0
In some cases, a drive may only be partly defective, so for example, only /dev/md0 is in the [U_] state, whereas all other devices are in the [UU] state. In this case, the command
# mdadm /dev/md1 -r /dev/sdb2
fails because the /dev/md1 array is ok.
In this event, you need to execute the command
# mdadm --manage /dev/md1 --fail /dev/sdb2
first to move the RAID into [U_] status.
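Deciding per partition whether --fail is needed before -r can be automated. The sketch below only prints the commands for review and runs nothing itself. The function name removal_commands is hypothetical, and it assumes member names in /proc/mdstat start with the disk name (sdb1, sdb2, ...) and that failed members carry the (F) flag shown earlier.

```shell
#!/bin/sh
# removal_commands DISK [MDSTAT_FILE] (hypothetical helper)
# For every array member located on DISK (for example "sdb"), print the
# mdadm commands needed to remove it. Members not yet flagged (F) get a
# --fail command first. Nothing is executed; the output is for review.
removal_commands() {
    awk -v disk="$1" '
        /^md[0-9]+ :/ {
            for (i = 3; i <= NF; i++) {
                # Only fields such as sdb4[1] or sdb4[1](F) are members
                if ($i !~ /\[[0-9]+\]/) continue
                member = $i
                failed = (member ~ /\(F\)$/)
                sub(/\[.*$/, "", member)      # sdb4[1](F) -> sdb4
                # Assumption: partition names begin with the disk name
                if (index(member, disk) != 1) continue
                if (!failed)
                    print "mdadm --manage /dev/" $1 " --fail /dev/" member
                print "mdadm /dev/" $1 " -r /dev/" member
            }
        }
    ' "${2:-/proc/mdstat}"
}
```

For example, removal_commands sdb prints one or two lines per affected array, which you can check against mdadm --detail before running them by hand.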
Arranging an appointment with the support team to exchange the defective drive
To exchange the defective drive, you need to make an appointment with the support team in advance. The support team will need to take the server offline for a short time.
Preparing the new drive
Both drives in the array need to have the exact same partitioning. Depending on the partition table type you are using (MBR or GPT), you need to use appropriate utilities to copy the partition table. The GPT partition table is usually used in drives that are larger than 2 TiB (for example, 3 TB HDDs in EX4 and EX6).
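If you are unsure which table type a drive uses, one option is to ask blkid and pick the tool from the answer. The sketch below assumes a util-linux blkid that reports PTTYPE as "gpt" or "dos" (MBR). The function name copy_table_commands is hypothetical. It only prints the copy commands this article describes for each type and does not run them.

```shell
#!/bin/sh
# copy_table_commands PTTYPE SRC DST (hypothetical helper)
# PTTYPE is the partition table type, for example as printed by
#   blkid -o value -s PTTYPE /dev/sda
# Prints the commands for copying the partition table from SRC to DST.
copy_table_commands() {
    pttype=$1 src=$2 dst=$3
    # Backup file name follows the pattern used in this article
    backup="${src##*/}_parttable_gpt.bak"
    case $pttype in
        gpt)
            printf '%s\n' \
                "sgdisk --backup=$backup $src" \
                "sgdisk --load-backup=$backup $dst" \
                "sgdisk -G $dst"
            ;;
        dos)
            printf '%s\n' "sfdisk -d $src | sfdisk $dst"
            ;;
        *)
            echo "unsupported partition table type: $pttype" >&2
            return 1
            ;;
    esac
}
```

A possible call is copy_table_commands "$(blkid -o value -s PTTYPE /dev/sda)" /dev/sda /dev/sdb. Double-check source and target before running the printed commands, since swapping them would overwrite the intact drive's table.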
Backing up the MBR/GPT
Before copying the MBR/GPT to a new drive, you need to back it up. That way, if something goes wrong during the copying, you will still be able to restore the original.
Backup with MBR
sfdisk --dump /dev/sda > sda_parttable_mbr.bak
sfdisk --dump /dev/sdb > sdb_parttable_mbr.bak
sfdisk --dump /dev/sdX > sdX_parttable_mbr.bak
Restore with MBR
sfdisk /dev/sda < sda_parttable_mbr.bak
sfdisk /dev/sdb < sdb_parttable_mbr.bak
sfdisk /dev/sdX < sdX_parttable_mbr.bak
Backup with GPT
sgdisk --backup=sda_parttable_gpt.bak /dev/sda
sgdisk --backup=sdb_parttable_gpt.bak /dev/sdb
sgdisk --backup=sdX_parttable_gpt.bak /dev/sdX
Restore with GPT
sgdisk --load-backup=sda_parttable_gpt.bak /dev/sda
sgdisk --load-backup=sdb_parttable_gpt.bak /dev/sdb
sgdisk --load-backup=sdX_parttable_gpt.bak /dev/sdX
Drives with GPT
There are several redundant copies of the GUID partition table (GPT) stored on the drive, so you need to use tools that support GPT (for example parted or GPT fdisk) to edit the table. You can use the sgdisk tool from GPT fdisk to easily copy the partition table to a new drive. Here’s an example of copying the partition table from sda to sdb:
sgdisk --backup=sda_parttable_gpt.bak /dev/sda
sgdisk --load-backup=sda_parttable_gpt.bak /dev/sdb
You then need to assign new random UUIDs to the drive and its partitions:
sgdisk -G /dev/sdb
After this, you can add the drive to the array. As a final step, you need to install the bootloader.
Drives with MBR
You can simply copy the partition table to a new drive using sfdisk:
# sfdisk -d /dev/sda | sfdisk /dev/sdb
where /dev/sda is the source drive and /dev/sdb is the target drive.
(Optional): If the partitions are not detected by the system, then the partition table has to be reread by the kernel:
# blockdev --rereadpt /dev/sdb
Naturally, you can also create the partitions manually using fdisk, cfdisk or other tools. The partitions should be of type Linux raid autodetect (ID fd).
Integration of the new drive
Once you have removed the defective drive and installed the new one, you need to integrate it into the RAID array. You need to do this for each partition.
# mdadm /dev/md0 -a /dev/sdb1
# mdadm /dev/md1 -a /dev/sdb2
# mdadm /dev/md2 -a /dev/sdb3
# mdadm /dev/md3 -a /dev/sdb4
The new drive is now part of the array and will be synchronized. Depending on the size of the partitions, this procedure can take some time. You can check the status of the synchronization using cat /proc/mdstat.
# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb4[1] sda4[0]
1028096 blocks [2/2] [UU]
[==========>..........] resync = 50.0% (514048/1028096) finish=97.3min speed=65787K/sec
md2 : active raid1 sdb3[1] sda3[0]
208768 blocks [2/2] [UU]
md1 : active raid1 sdb2[1] sda2[0]
2104448 blocks [2/2] [UU]
md0 : active raid1 sdb1[1] sda1[0]
208768 blocks [2/2] [UU]
unused devices: <none>
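To follow the synchronization without reading the full output each time, the progress line can be extracted. This is a minimal sketch with a hypothetical function name, sync_progress. It assumes the progress line contains "resync =" or "recovery =" followed by a percentage, as in the output above.

```shell
#!/bin/sh
# sync_progress [MDSTAT_FILE] (hypothetical helper)
# Print "<array> <percent>" for every array that is currently
# synchronizing. Reads /proc/mdstat unless another file is given.
sync_progress() {
    awk '
        # Remember which array the following lines belong to
        /^md[0-9]+ :/ { name = $1 }
        # Progress lines look like "[====>...] resync = 50.0% (...) ..."
        /(resync|recovery) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print name, $i
        }
    ' "${1:-/proc/mdstat}"
}
```

Arrays that are already in sync produce no output, so an empty result means no synchronization is currently running.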
Bootloader installation
Since the serial number of the disk changed, we need to generate a new device map with GRUB2:
grub-mkdevicemap -n
If you are doing this repair in a booted system, then for GRUB2, running grub-install on the new drive is enough. For example:
grub-install /dev/sdb
In GRUB1 (grub-legacy), depending on which drive was defective, you may need to perform additional steps.
1. Start the GRUB console: grub
2. Specify the partition where /boot is located: root (hd0,1) (/dev/sda2 = (hd0,1))
3. Install the bootloader in MBR: setup (hd0)
4. To also install the bootloader on the second drive:
   - Map the second drive as hd0: device (hd0) /dev/sdb
   - Repeat steps 2 and 3 exactly (don’t change the commands!)
5. Exit the GRUB console: quit
Probing devices to guess BIOS drives. This may take a long time.
GNU GRUB version 0.97 (640K lower / 3072K upper memory)
[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename.]
grub> device (hd0) /dev/sdb
device (hd0) /dev/sdb
grub> root (hd0,1)
root (hd0,1)
Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
setup (hd0)
Checking if "/boot/grub/stage1" exists... yes
Checking if "/boot/grub/stage2" exists... yes
Checking if "/boot/grub/e2fs_stage1_5" exists... yes
Running "embed /boot/grub/e2fs_stage1_5 (hd0)"... 26 sectors are embedded.
succeeded
Running "install /boot/grub/stage1 (hd0) (hd0)1+26 p (hd0,1)/boot/grub/stage2 /boot/grub/grub.conf"... succeeded
Done.
grub> quit
#
If you are not doing the repair in the booted system, mount the installed system (under /mnt in this example) and also perform the following bind mounts:
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
You then need to carry out all the above GRUB installation steps in the chroot environment. You can safely ignore the warning grub-install couldn't find physical volumes.