Replacing a disk in Software RAID1 (Linux).

Tuesday November 28th, 2023
Posted by Igor Shats

In this article, we describe the steps needed to replace a faulty hard disk in a software RAID 1 array on Linux distributions such as CentOS, Debian, and Ubuntu.

Problem Identification

Let's start with the problem. You have a physical server running CentOS 7 with two 2 TB HDDs, /dev/sda and /dev/sdb, configured as a software RAID 1. Let's assume that disk sdb has failed. When you check the state of the arrays, you'll see the following:

# cat /proc/mdstat

We have three arrays:

# /dev/md125 – /boot

# /dev/md126 – swap

# /dev/md127 – /

In this output you can see that the disks are indeed configured as RAID 1. A healthy array is displayed as [UU]; a missing or failed member is shown as an underscore, for example [U_]. Since the disks are mirrored, each array pairs a partition on one disk with its counterpart on the other. For example, md125 consists of sda2 and sdb2 and holds /boot. You can get more detailed information about the disk layout using the following command:

# lsblk

If you want detailed information about the array and its contents, use the command:

# mdadm --detail /dev/md125
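The [UU] / [U_] check described above can also be scripted. The following is a minimal sketch, not part of mdadm: the helper name degraded_arrays is hypothetical, and it assumes the usual /proc/mdstat layout in which each array's name line is followed by a line containing a status field such as [UU] or [U_].

```shell
# Hypothetical helper: print the name of every md array whose status
# field contains "_" (a missing or failed member).
# Reads mdstat-formatted text from stdin.
degraded_arrays() {
    awk '
        # Remember the array name from lines like "md125 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1; next }
        # The first following line with a [UU]-style field holds the status
        name != "" && match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_")) print name
            name = ""
        }
    '
}

# Usage on a live system:
#   degraded_arrays < /proc/mdstat
```

An empty result means that every array reports all of its members as up.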

Removing the Faulty Disk

Before a new disk can be installed in a RAID 1 array, the faulty disk must be removed from the array. This is done for each partition:

# mdadm /dev/md125 -r /dev/sdb2

# mdadm /dev/md126 -r /dev/sdb1

# mdadm /dev/md127 -r /dev/sdb3

In some cases, the hard disk may be only partially damaged. For example, the /dev/md127 array has a status of [U_], while the other arrays still show [UU]. In this case, only one partition can be removed directly:

# mdadm /dev/md127 -r /dev/sdb3

The other partitions, /dev/sdb1 and /dev/sdb2, are still intact and active in their arrays. If you try to remove them in the same way, you will see an error, because a partition that is still active in an array cannot be removed.

To remove them, first mark them as failed:

# mdadm --manage /dev/md125 --fail /dev/sdb2

# mdadm --manage /dev/md126 --fail /dev/sdb1

This will change the status of these arrays to [U_]. Then remove the partitions with -r, as you did for the md127 array.
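The fail-then-remove sequence can be sketched as a small dry-run helper. This is only an illustration: retire_member_cmds is a hypothetical name, and the function prints the mdadm commands instead of running them, so the output can be reviewed before anything is executed as root.

```shell
# Hypothetical dry-run helper: print the two mdadm commands needed to take
# an active member out of an array (mark it failed, then remove it).
# Nothing is executed; the commands are only written to stdout.
retire_member_cmds() {
    array=$1     # e.g. /dev/md125
    member=$2    # e.g. /dev/sdb2
    printf 'mdadm --manage %s --fail %s\n' "$array" "$member"
    printf 'mdadm --manage %s --remove %s\n' "$array" "$member"
}

# Usage with the layout from this article:
#   retire_member_cmds /dev/md125 /dev/sdb2
#   retire_member_cmds /dev/md126 /dev/sdb1
```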

Check the disks and partitions included in the array to ensure that the disk has been fully removed:

# mdadm --detail /dev/md125

# mdadm --detail /dev/md126

# mdadm --detail /dev/md127

# cat /proc/mdstat
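As a quick cross-check, a sketch like the one below lists any partitions of a given disk that are still listed as array members. The helper name members_on_disk is hypothetical, and it assumes member entries in /proc/mdstat look like sdb2[1] or sdb2[1](F).

```shell
# Hypothetical helper: list array members that still belong to a given disk.
# Reads mdstat-formatted text from stdin; the disk name (e.g. sdb) is $1.
members_on_disk() {
    disk=$1
    awk -v disk="$disk" '
        /^md[0-9]+ :/ {
            for (i = 3; i <= NF; i++) {
                dev = $i
                sub(/\[.*$/, "", dev)    # strip "[1]" and flags such as "(F)"
                # Report the member if its name starts with the disk name
                if (index(dev, disk) == 1) print $1 ": " dev
            }
        }
    '
}

# Usage on a live system:
#   members_on_disk sdb < /proc/mdstat
```

If the command prints nothing, no partition of that disk is part of any array.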

Now the disk is ready for replacement. You will need to submit a request through our ticket system to replace the disk and coordinate the timing of the work with a technician.

P.S. The server will be down for some time!

Preparing the New Disk

Determining the partition table (GPT, MBR) and transferring it to the new disk.

The new disk must have exactly the same partition layout as the disk that remains in the array. Depending on the type of partition table used (GPT or MBR), use the appropriate utility to copy it:

GPT – sgdisk

MBR – sfdisk

In our example, the 2 TB HDDs use GPT, so we will use the sgdisk utility. To check the partition table type and see exactly what will be copied to the second disk, use the command:

# gdisk -l /dev/sda
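The choice between sgdisk and sfdisk can also be derived from the partition table type. The sketch below is an assumption-laden illustration: it relies on parted being installed (parted is not used elsewhere in this article), on its "Partition Table:" output line, and on a hypothetical helper name, table_tool.

```shell
# Hypothetical helper: read the output of "parted -s DEVICE print" on stdin
# and print the utility suited to the partition table type.
table_tool() {
    awk -F': *' '
        /^Partition Table:/ {
            if ($2 == "gpt")        print "sgdisk"
            else if ($2 == "msdos") print "sfdisk"
            else                    print "unknown"
        }
    '
}

# Usage on a live system (assumes parted is available):
#   parted -s /dev/sda print | table_tool
```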

You can install the utilities from your operating system's repository using its package manager. sgdisk is provided by the gdisk package. sfdisk is normally already installed as part of util-linux (on recent Debian/Ubuntu releases it is in the fdisk package).

CentOS: yum install gdisk

Debian/Ubuntu: apt install gdisk

Creating and Restoring MBR/GPT Backup

Before copying the partition table to the new disk, it is recommended to make a backup. In case of any problems, you can restore the original partition table.

For MBR

Create:

# sfdisk --dump /dev/sdx > sdx_parttable_mbr.bak

Restore:

# sfdisk /dev/sdb < sdx_parttable_mbr.bak

For GPT

Create:

# sgdisk --backup=sdx_parttable_gpt.bak /dev/sda

Restore:

# sgdisk --load-backup=sdx_parttable_gpt.bak /dev/sdb

sda – the disk from which the copy is made.

sdb – the disk onto which the copy of the table is loaded.
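The GPT copy can be wrapped into a small script. The sketch below is a suggestion rather than the procedure used above: the helper names run and copy_gpt_table and the DRY_RUN variable are hypothetical, and it adds one step that the article does not mention, sgdisk -G, which randomizes the disk and partition GUIDs on the target so that the two disks do not share identical GUIDs. By default it only prints the commands.

```shell
# Print commands instead of executing them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: copy the GPT from a healthy disk to a new one.
#   $1 = source disk, $2 = target disk, $3 = backup file (optional)
copy_gpt_table() {
    src=$1
    dst=$2
    backup=${3:-parttable_gpt.bak}
    # Refuse to overwrite the source disk with its own table
    [ "$src" != "$dst" ] || { echo "source and target must differ" >&2; return 1; }
    run sgdisk --backup="$backup" "$src"
    run sgdisk --load-backup="$backup" "$dst"
    # Assumption: give the target new GUIDs after loading the copied table
    run sgdisk -G "$dst"
}

# Usage (dry run):   copy_gpt_table /dev/sda /dev/sdb
# Usage (for real):  DRY_RUN=0 copy_gpt_table /dev/sda /dev/sdb
```

Keeping the dry run as the default makes it harder to swap source and target by accident, which would overwrite the partition table of the healthy disk.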

Adding a Hard Disk to the Array After Replacement

First, copy the partition table from the healthy disk to the new one using the commands above. You can then add the new disk to the arrays. This must be done for each partition:

# mdadm /dev/md125 -a /dev/sdb2

# mdadm /dev/md126 -a /dev/sdb1

# mdadm /dev/md127 -a /dev/sdb3

Now the new disk is part of the array. You can monitor the synchronization of the disks by entering the following command:

# cat /proc/mdstat
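To follow the rebuild without reading the whole file, the progress can be extracted from the same output. This is a minimal sketch under the assumption that a rebuilding array shows a line such as "recovery = 12.6% (...)" or "resync = ..." below its name line; sync_progress is a hypothetical helper name.

```shell
# Hypothetical helper: print "<array> <percent>" for every array that is
# currently rebuilding. Reads mdstat-formatted text from stdin.
sync_progress() {
    awk '
        # Remember the most recent array name
        /^md[0-9]+ :/ { name = $1 }
        # Progress lines contain "recovery = NN.N%" or "resync = NN.N%"
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print name, $i
        }
    '
}

# Usage on a live system:
#   sync_progress < /proc/mdstat
```

No output means that no array is reporting a percentage at the moment, for example because synchronization has finished.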

Once synchronization has finished, reboot the server and check that all partitions are correctly mounted:

# lsblk

Conclusion

Replacing a disk in a Software RAID 1 is a necessary procedure to maintain data security and integrity on both a Cloud KVM Server and a Dedicated Server. Before performing this procedure, make sure you have sufficient knowledge and experience, or seek assistance from a professional. Following the recommendations of your hosting provider and regularly replacing disks can prevent data loss and ensure uninterrupted operation of your business applications.