Help with recovering a botched RAID 0 grow operation with mdadm

Understanding and Recovering Data from a Faulty RAID 0 Array with mdadm: A Guide for Data Preservation

Introduction

Managing RAID configurations, especially array expansions or modifications, can be complex and fraught with risk. With RAID 0 arrays, which stripe data across multiple drives for performance, the failure or misconfiguration of even a single disk can jeopardize the entire data set. This article addresses a real-world scenario involving an attempted RAID 0 array expansion with mdadm, the Linux software RAID management tool, and highlights potential recovery strategies and best practices.

Scenario Overview

The user initially configured a two-drive NVMe RAID 0 array providing approximately 2TB of storage. An attempt was then made to add a third 2TB NVMe drive and expand the array using mdadm's --grow operation. However, the third drive was faulty, leading to a kernel panic roughly 1.5% into the reshape.
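For context, a two-to-three-drive RAID 0 expansion with mdadm typically looks like the following sketch. The array name /dev/md0 and the new drive's partition name are assumptions for illustration, not details taken from the original report:

```shell
# Add the new (third) drive to the existing array as a spare
# (/dev/md0 and /dev/nvme2n1p1 are illustrative names)
mdadm --add /dev/md0 /dev/nvme2n1p1

# Reshape the array across three devices; internally mdadm converts
# the RAID 0 to a temporary RAID 4 layout for the duration of the
# reshape, which explains the RAID 4 metadata seen later
mdadm --grow /dev/md0 --raid-devices=3
```

This is why a kernel panic mid-reshape leaves the member disks describing themselves as part of a RAID 4 array rather than the original RAID 0.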

Current Status

The system now perceives a three-drive configuration:

  • Two drives (designated as /dev/nvme1n1p1 and /dev/nvme0n1p1) recognize themselves as part of a four-drive RAID4 array.
  • The third drive appears unresponsive or damaged and is considered a write-off.

It appears that the array is in a partially reshaped state: some data has been migrated onto the two healthy disks, while the third remains inaccessible. Fortunately, only a small portion of the data (~200 GB) was likely rewritten during the failed reshape, so most of the original data may still be recoverable.

Understanding Drive Metadata

Examining the output from mdadm --examine reveals that:

  • Both drives recognize themselves as active devices within a RAID4 array with four total devices.
  • The array state is marked as ‘clean’ on both drives, indicating that the disks believe their data integrity is intact.
  • There is a ‘reshape position’ suggesting an ongoing expansion from three to four drives.
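The metadata described above can be inspected per member device. A sketch, using the partition names mentioned earlier:

```shell
# Print the md superblock of each healthy member; the fields of
# interest are "Raid Level", "Raid Devices", "Array State", and
# "Reshape pos'n" (the byte offset the reshape had reached)
mdadm --examine /dev/nvme0n1p1
mdadm --examine /dev/nvme1n1p1
```

Comparing the reshape position reported by both drives indicates how far the expansion progressed before the panic, and whether the two healthy members agree with each other.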

Implications

Given that only a fraction of data has been processed, the majority of the data, especially on the remaining healthy drives, may still be intact. The key is to carefully interpret the current metadata and avoid further writes that could complicate data recovery.

Recommended Recovery Steps

  1. Stop any ongoing array operations:
    To prevent unintended writes, halt the array and avoid executing any commands that could further modify it.
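A sketch of halting the array, assuming it was assembled as /dev/md0 (the actual array name may differ on the affected system):

```shell
# Stop the array so the kernel performs no further writes or
# reshape progress against the member devices
mdadm --stop /dev/md0

# Confirm the array is no longer active
cat /proc/mdstat
```

If the array must be examined later, assembling it with --readonly (or setting the kernel's start_ro behavior) avoids accidental writes while inspecting it.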

  2. Backup current disk metadata:
    Save the output of `mdadm --examine` for every member device to a safe location before attempting any repair, so the current superblock state can be consulted or restored later.
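A sketch of backing up the metadata, again assuming the partition names above; the output filenames are illustrative:

```shell
# Save human-readable superblock details for each healthy member
mdadm --examine /dev/nvme0n1p1 > nvme0n1p1.examine.txt
mdadm --examine /dev/nvme1n1p1 > nvme1n1p1.examine.txt

# Optionally image the start of each member as well: with the common
# v1.2 metadata format, the md superblock sits 4 KiB from the start
# of the device, so the first few MiB capture it comfortably
dd if=/dev/nvme0n1p1 of=nvme0n1p1.head.img bs=1M count=16
dd if=/dev/nvme1n1p1 of=nvme1n1p1.head.img bs=1M count=16
```

With the superblocks preserved, more aggressive recovery attempts (such as re-creating the array with --assume-clean, or assembling with overlay files) can be tried without risk of losing the current on-disk state.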
