Software RAID Howto

Red Hat Enterprise Linux 3 doesn't come with a good guide to installing and managing a RHEL3 system on a pair of mirrored disks using software RAID, so here is mine. This guide should work equally well for the RHEL clones, e.g. White Box Linux, CentOS, Tao Linux and so on.

Installing RHEL

The hardware I installed onto was a Pentium 4 machine with two 80GB Maxtor IDE hard disks and 1GB of RAM. I booted the installer from RHEL CD 1 and started working through it.

At the point where disk partitioning takes place, I chose Disk Druid (instead of fdisk or automatic partitioning) to partition the disks. I created two 100MB software RAID primary partitions, one on each disk; two 512MB Linux swap partitions; and two 79GB partitions filling the rest of each disk. I made the two 100MB partitions a single RAID 1 device mounted on /boot, and the two large partitions a second RAID 1 device mounted on /. The rest of the install proceeds as normal.

When the machine reboots back into RHEL it will have working software RAID, but the boot loader will only be installed on the first disk (/dev/hda). To install it on the second disk (/dev/hdc) as well, we need to run grub.

$ grub

grub> device (hd0) /dev/hdc
grub> root (hd0,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  15 sectors are embedded.
 succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+15 p (hd0,0)/grub/stage2
 /grub/grub.conf"... succeeded
 Done.
  
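If you find yourself doing this often, the same commands can be fed to grub non-interactively. This is a sketch using grub's batch mode; the commands are exactly those typed above:

$ grub --batch <<EOF
device (hd0) /dev/hdc
root (hd0,0)
setup (hd0)
quit
EOF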

The next thing to do is to take a backup of the partition table on the disk - you will need this to restore onto a replacement disk. You can get it by running fdisk and picking option 'p' (print the partition table).


/dev/hda
Device      Boot  Start  End     Blocks    Id  System
/dev/hda1   *     1      203     102280+   fd  Linux raid autodetect
/dev/hda2         204    1243    524160    82  Linux swap
/dev/hda3         1244   158816  79416792  fd  Linux raid autodetect
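
As well as the printed copy, sfdisk can dump the table in a format it can restore from later. A sketch - the output file name is my own choice, and the file should be copied somewhere off this machine, since the disk it describes may be the one that dies:

$ sfdisk -d /dev/hda > hda-partition-table.txt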

Monitoring the RAID array

This is for my setup of two disks, /dev/hda and /dev/hdc, each carrying identical data.

$ cat /proc/mdstat
Personalities : [raid1] 
read_ahead 1024 sectors
Event: 1                   
md0 : active raid1 hda3[0] hdc3[1]
      79416704 blocks [2/2] [UU]
...	  

This gives the status of the RAID arrays. If both disks are operating, the line for md0 looks like this:

md0 : active raid1 hdc3[1] hda3[0]

If the array is broken and only one disk is operating, it looks like this:

md0 : active raid1 hdc3[1]

If it's recovering onto a replacement disk, it looks like this:

md0 : active raid1 hda3[2] hdc3[1]
....
[=>.........] recovery = 3% (.../...) finish=128min speed=10000K/sec
...
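
A quick way to spot a degraded array from a script - a sketch, which keys off the [UU] field, where an underscore marks a missing disk:

#!/bin/sh
# Mail root if any md device is missing a disk (an _ in the [UU] field)
if grep -q '\[U*_U*\]' /proc/mdstat; then
    echo "RAID array degraded" | mail -s "RAID failure on `hostname`" root
fi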

More information comes from

mdadm --query --detail /dev/md0

.... lots of stuff ...

Number  Major  Minor  RaidDevice  State
   0      0      0        0       faulty, removed
   1     22      3        1       active sync   /dev/hdc3

This tells us that device 0 is missing - device 1 is working fine.

In theory the mdmonitor service will spot a failing disk and mail you about it, provided mdadm is told where to send the mail; I wouldn't trust it without testing it.
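
A sketch of the pieces involved, assuming the mdadm package shipped the mdmonitor init script (as it does on RHEL3) and that mail to root is what you want:

# generate ARRAY lines for the running arrays
mdadm --detail --scan >> /etc/mdadm.conf
# tell mdadm where to send failure mail
echo "MAILADDR root" >> /etc/mdadm.conf
# start the monitor now and on every boot
service mdmonitor start
chkconfig mdmonitor on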

How to restore from a broken RAID array

In this case /dev/hda has failed and I'm inserting a replacement disk. I start by rebooting the machine from install CD 1 and entering rescue mode by typing 'linux rescue' at the CD's boot prompt.

Do not mount any disks or set up the network; you will be dropped into a command prompt.

Partition the new disk with the same partition table as the old disk. It is very important to make sure you partition the correct disk; you may wish to unplug the working disk during this step to protect yourself against user error.

$ fdisk /dev/hda

n (new)
p 1 (primary partition #1)
1 203 (start and end cylinders)
t 1 fd (set the partition type to Linux raid autodetect)

n
p 2
204 1243
t 2 82 (set the partition type to Linux swap)

n
p 3
1244 158816
t 3 fd (set the partition type to Linux raid autodetect)

a 1 (mark partition #1 bootable, as on the old disk)
w (write the table to disk and exit)
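
If you took an sfdisk dump of the old disk's partition table earlier, restoring it is a single command instead of the interactive session above - assuming the dump file is reachable from the rescue environment, e.g. on a floppy or fetched over the network:

$ sfdisk /dev/hda < hda-partition-table.txt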

I then boot the machine from its working disk, and add the replacement disk's partitions back into the arrays to trigger the rebuild.

mdadm --manage --add /dev/md0 /dev/hda3
mdadm --manage --add /dev/md1 /dev/hda1
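
The rebuild can be watched in /proc/mdstat. Note that swap is not covered by either array, so the replacement disk's swap partition needs re-creating by hand - this sketch assumes /etc/fstab refers to /dev/hda2 directly:

watch cat /proc/mdstat    # watch the rebuild progress
mkswap /dev/hda2          # re-initialise swap on the replacement disk
swapon -a                 # enable everything listed in /etc/fstab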

The new disk has no boot sector - that's not covered by the RAID arrays. We need to write it back to the disk as we did earlier.

$ grub

grub> device (hd0) /dev/hda
grub> root (hd0,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  15 sectors are embedded.
 succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+15 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
 Done.

At this point our system is fully restored.

Notes

It's entirely possible to do the recovery by booting from the working disk rather than from a rescue CD, but this increases the chance of accidentally destroying all your data. I'd recommend not doing it that way until you can perform a recovery with a CD without referencing this guide at any point.
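
One way to practise is to fail a disk deliberately on a machine whose data you can afford to lose. A sketch using mdadm's fail/remove/add cycle:

mdadm --manage /dev/md0 --fail /dev/hda3      # mark the partition as failed
mdadm --manage /dev/md0 --remove /dev/hda3    # pull it out of the array
mdadm --manage /dev/md0 --add /dev/hda3       # re-add it and watch the rebuild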
