Ben Stienstra

Linux, Unix, network, radio and more...

mdadm - replace failed RAID1 disk

Check and remove disk

  • Check status:
    # cat /proc/mdstat
    Personalities : [raid1]
    md125 : active raid1 sdb3[1]
          450747264 blocks super 1.2 [2/1] [_U]
    
    md126 : active raid1 sdb2[1]
          36949888 blocks super 1.2 [2/1] [_U]
          bitmap: 1/1 pages [4KB], 65536KB chunk
    
    md127 : active raid1 sdb1[1]
          524224 blocks super 1.0 [2/1] [_U]
    
    unused devices: <none>
  • /dev/sda is missing from every array; check dmesg for errors:
    # dmesg |grep sda
    [2212380.750253] end_request: I/O error, dev sda, sector 2048
    [2212380.750263] md/raid1:md127: Disk failure on sda1, disabling device.
  • Or with smartctl:
    # smartctl /dev/sda
    smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-229.1.2.el7.x86_64] (local build)
    Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
    
    Smartctl open device: /dev/sda failed: No such device
  • Remove the drive from the array if it isn't already removed. In this case md had already disabled the drive, as the kernel log shows:
    messages-20150504.gz:May  3 00:41:06 kvm2 kernel: md/raid1:md126: Disk failure on sda2, disabling device.
    messages-20150504.gz:May  3 00:41:06 kvm2 kernel: md/raid1:md125: Disk failure on sda3, disabling device.
    messages-20150504.gz:May  3 01:02:01 kvm2 kernel: md/raid1:md127: Disk failure on sda1, disabling device.
  • Note the serial number of the good drive, so you can tell which disk must stay in place:
    # smartctl -a /dev/sdb |grep -i serial
    Serial Number:    9WJ0MGBM
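
The status check above can also be scripted. This is a minimal sketch, not part of the original procedure: degraded_arrays is a hypothetical helper name, and it assumes the /proc/mdstat layout shown above, where the last field of each "blocks" line is the member map (for example [_U]).

```shell
# List md arrays whose member map (e.g. [_U]) contains "_",
# meaning at least one member is missing.
# degraded_arrays is a hypothetical helper, not an mdadm command.
degraded_arrays() {
    # $1: file in /proc/mdstat format (defaults to /proc/mdstat)
    awk '
        /^md[0-9]+ :/ { name = $1 }    # remember the current array name
        /blocks/ && $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }
    ' "${1:-/proc/mdstat}"
}

# Usage on a live system:
#   degraded_arrays
```

If a failed member is still listed in an array, `mdadm /dev/md125 --fail /dev/sda3 --remove /dev/sda3` is the usual way to drop it. When the device node has already vanished, as it did here, `mdadm /dev/md125 --remove detached` should clear any stale slot.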

Replace drive

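  • Replace the failed drive. Double-check the serial number on the drive label first, so the good drive stays in place.
  • Copy the partition table from the good drive to the new drive (this copies the partition layout only; the boot loader is reinstalled in a later step):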
    sfdisk -d /dev/sdb | sfdisk /dev/sda
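
After the copy, it may be worth confirming that both disks really carry the same layout before adding partitions to the arrays. The sketch below compares two saved `sfdisk -d` dumps. The helper names normalize_dump and same_layout and the dump file paths are assumptions made for illustration, and the pattern only handles sdX-style device names.

```shell
# Reduce an `sfdisk -d` dump to the parts that should match between two
# disks: drop comment lines and per-disk header lines, and strip the disk
# name from each partition line so only the partition number remains.
normalize_dump() {
    sed -e '/^#/d' \
        -e '/^device:/d' \
        -e '/^label-id:/d' \
        -e 's|^/dev/[a-z]*\([0-9][0-9]*\) :|\1 :|'
}

# Succeeds when the two dump files describe the same partition layout.
same_layout() {
    # $1, $2: files holding `sfdisk -d` output for the two disks
    [ "$(normalize_dump < "$1")" = "$(normalize_dump < "$2")" ]
}

# Usage on a live system (paths are examples only):
#   sfdisk -d /dev/sdb > /root/sdb.dump
#   sfdisk -d /dev/sda > /root/sda.dump
#   same_layout /root/sdb.dump /root/sda.dump && echo "layouts match"
```

Saving the dump of the good drive to a file before writing anything also leaves a backup of the partition table.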

Rebuild RAID1 array

  • Add the new partitions to the arrays, then reinstall the boot loader on the new drive:
    mdadm /dev/md125 -a /dev/sda3
    mdadm /dev/md126 -a /dev/sda2
    mdadm /dev/md127 -a /dev/sda1
    grub2-install --recheck /dev/sda
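  • Watch the rebuild in /proc/mdstat. Arrays that share the same disks rebuild one at a time; the others show resync=DELAYED until it is their turn: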
    # cat /proc/mdstat 
    Personalities : [raid1] 
    md125 : active raid1 sda3[2] sdb3[1]
          450747264 blocks super 1.2 [2/1] [_U]
          [>....................]  recovery =  0.0% (121856/450747264) finish=6938.0min speed=1082K/sec
          
    md126 : active raid1 sda2[2] sdb2[1]
          36949888 blocks super 1.2 [2/1] [_U]
            resync=DELAYED
          bitmap: 1/1 pages [4KB], 65536KB chunk
    
    md127 : active raid1 sda1[2] sdb1[1]
          524224 blocks super 1.0 [2/1] [_U]
            resync=DELAYED
          
    unused devices: <none>
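
To follow the rebuild without reading the full output each time, the progress line can be extracted with awk. This is a rough sketch under the assumption that the recovery line looks like the one above; rebuild_progress is a hypothetical helper name, not an mdadm feature.

```shell
# Print "<array> <percent> <estimated time left>" for every array that is
# currently rebuilding. Arrays marked resync=DELAYED are skipped because
# they have no progress figures yet.
rebuild_progress() {
    # $1: file in /proc/mdstat format (defaults to /proc/mdstat)
    awk '
        /^md[0-9]+ :/ { name = $1 }    # remember the current array name
        /(recovery|resync) *= *[0-9]/ {
            pct = ""; eta = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/)       pct = $i             # e.g. 0.0%
                if ($i ~ /^finish=/) eta = substr($i, 8)  # text after "finish="
            }
            print name, pct, eta
        }
    ' "${1:-/proc/mdstat}"
}

# Usage on a live system:
#   rebuild_progress
```

For a simple live view, `watch -n 5 cat /proc/mdstat` refreshes the full status every five seconds.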

Speed up rebuild

  • Show current min/max limit (values are in KB/s per device):
    sysctl dev.raid.speed_limit_min
    sysctl dev.raid.speed_limit_max
  • Increase speed by raising the guaranteed minimum rebuild rate:
    sysctl -w dev.raid.speed_limit_min=100000
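
A raised minimum takes bandwidth away from normal I/O, so it can be useful to remember the old value and put it back once the rebuild is done. The sketch below does this through /proc/sys/dev/raid, the file view of the same dev.raid sysctl keys. The function name raise_min and the RAID_SYSCTL_DIR variable are assumptions for illustration only.

```shell
# Directory that exposes the dev.raid.* sysctl keys as files.
# RAID_SYSCTL_DIR is a hypothetical override, handy for dry runs.
RAID_SYSCTL_DIR=${RAID_SYSCTL_DIR:-/proc/sys/dev/raid}

# Set a new minimum rebuild speed and print the previous value so the
# caller can restore it later.
raise_min() {
    # $1: new minimum in KB/s per device
    old=$(cat "$RAID_SYSCTL_DIR/speed_limit_min") || return 1
    echo "$1" > "$RAID_SYSCTL_DIR/speed_limit_min" || return 1
    echo "$old"
}

# Usage on a live system (as root):
#   prev=$(raise_min 100000)
#   ... wait for the rebuild to finish ...
#   echo "$prev" > /proc/sys/dev/raid/speed_limit_min
```

Values set this way, like those set with `sysctl -w`, do not survive a reboot.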

mdadm_raid_1_replace_disk.txt · Last modified: 2015/05/14 17:12 by admin