Retiring Volumes


Modern disk storage devices and interfaces are sophisticated: the underlying disk system already performs many error detection steps, bad sector remapping, and retry attempts. If a physical error still propagates up to the Netmail Store software level, there is little chance that a deterministic set of steps can work around the failure.

Additionally, there is no guarantee that the extent of the error can be isolated, or that the node can continue to operate normally with its other storage devices while the failing device remains in use. As a result, Netmail Store takes the conservative approach of retiring a volume when it receives a configurable number of I/O errors for that volume.

If the number of additional I/O errors set by disk.ioErrorTolerance is received during the retire, Netmail Store immediately marks the volume as Unavailable and starts both the volume recovery process (FVR) and the erasure coding recovery process (ERC) to relocate all of the volume's objects.

Netmail Store changes a volume's state to Retiring when any of the following occur:

  • You click Retire next to a volume on the node status page in the Admin Console.

Note: If you click Retire Node, all volumes on the node are retired at the same time.

  • The number of I/O errors specified by disk.ioErrorToRetire occurs within the time period specified by disk.ioErrorWindow.

A Retiring volume accepts no new or updated objects. A volume remains in the Retiring state until all of the objects stored on that volume (including replicas) have been moved to other volumes in the cluster. The Retiring state persists even if the node is rebooted. You may see the object count increase.

When all objects have been moved, the volume state changes to Retired and Netmail Store no longer uses the volume. At that point, you can remove the volume for repair or discard it.

Note: If I/O errors continue while the volume is in the Retiring state and exceed the number specified by disk.ioErrorTolerance, the volume state changes to Unavailable, regardless of whether Netmail Store has finished moving objects to other volumes.
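The interaction of the three settings can be sketched as a small shell function. This is only an illustrative model, not Netmail Store code: the function name volume_state, its argument order, and the sample values are hypothetical. It assumes error timestamps are given in seconds in ascending order, that the window is a sliding one, and that the volume becomes Unavailable once the errors seen while Retiring exceed the tolerance, as the note above states. The transition to Retired after all objects are moved is not modeled.

```shell
#!/bin/sh
# Hypothetical model of the state transitions described above.
# Usage: volume_state TO_RETIRE WINDOW_SECONDS TOLERANCE [TIMESTAMP...]
#   TO_RETIRE       stands in for disk.ioErrorToRetire
#   WINDOW_SECONDS  stands in for disk.ioErrorWindow
#   TOLERANCE       stands in for disk.ioErrorTolerance
# Timestamps are I/O error times in seconds, in ascending order.
volume_state() {
    to_retire=$1; window=$2; tolerance=$3
    shift 3
    state=OK
    recent=""   # error timestamps still inside the sliding window
    extra=0     # errors seen after the volume entered Retiring
    for ts in "$@"; do
        if [ "$state" = OK ]; then
            kept=""
            count=0
            # Keep only errors inside the window, including the new one.
            for old in $recent "$ts"; do
                if [ $((ts - old)) -lt "$window" ]; then
                    kept="$kept $old"
                    count=$((count + 1))
                fi
            done
            recent=$kept
            if [ "$count" -ge "$to_retire" ]; then
                state=Retiring
            fi
        elif [ "$state" = Retiring ]; then
            extra=$((extra + 1))
            # Assumption: "exceed" means strictly more than the tolerance.
            if [ "$extra" -gt "$tolerance" ]; then
                state=Unavailable
            fi
        fi
    done
    echo "$state"
}
```

For example, with illustrative values, volume_state 2 100 1 10 50 reports Retiring because two errors fall within 100 seconds of each other, while volume_state 2 100 1 10 500 reports OK because the first error has aged out of the window by the time the second arrives.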

Canceling an Ongoing Retire

You can cancel an ongoing retire by using the castorCancelVolumeRetire SNMP action. It takes a string value: either the name of a specific volume, or all to cancel the retire on every volume.

Canceling retire on a specific volume:

caringo@Zebstrika:~/Caringo/CAStor/code/trunk/caringo/castor/protocol/snmp$ snmpset -v2c -c ourpwdofchoicehere \
    -m ./CARINGO-MIB.txt:./CARINGO-CASTOR-MIB.txt \
    192.168.99.100 castorCancelVolumeRetire s "/dev/sda"
CARINGO-CASTOR-MIB::castorCancelVolumeRetire = STRING: "/dev/sda"

Canceling retire on all volumes:

caringo@Zebstrika:~/Caringo/CAStor/code/trunk/caringo/castor/protocol/snmp$ snmpset -v2c -c ourpwdofchoicehere \
    -m ./CARINGO-MIB.txt:./CARINGO-CASTOR-MIB.txt \
    192.168.99.100 castorCancelVolumeRetire s "all"
CARINGO-CASTOR-MIB::castorCancelVolumeRetire = STRING: "all"
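
If you cancel retires often, the two commands above could be wrapped in a small helper. The following is a minimal sketch, not part of Netmail Store: the function name cancel_retire_cmd and its argument order are hypothetical, and the MIB paths are simply the ones used in the examples above. It only prints the snmpset command so that you can review it before running it, and it assumes the host, community string, and volume name contain no spaces or shell metacharacters.

```shell
#!/bin/sh
# Hypothetical helper: print the snmpset command that cancels a retire.
# Usage: cancel_retire_cmd HOST COMMUNITY [VOLUME]
# VOLUME defaults to "all", which cancels the retire on every volume.
cancel_retire_cmd() {
    host=$1; community=$2; volume=${3:-all}
    if [ -z "$host" ] || [ -z "$community" ]; then
        echo "usage: cancel_retire_cmd HOST COMMUNITY [VOLUME]" >&2
        return 1
    fi
    # Same flags and MIB files as in the examples above.
    printf 'snmpset -v2c -c %s -m %s %s castorCancelVolumeRetire s "%s"\n' \
        "$community" "./CARINGO-MIB.txt:./CARINGO-CASTOR-MIB.txt" \
        "$host" "$volume"
}
```

Calling cancel_retire_cmd 192.168.99.100 ourpwdofchoicehere /dev/sda prints the first command shown above on a single line; omitting the volume argument prints the all variant.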
