The Netmail Store SNMP agent implementation allows you to monitor the health of cluster nodes, collect usage data, and control node actions. This appendix describes how to integrate a Netmail Store cluster into an enterprise SNMP monitoring infrastructure.
SNMP MIB Reference
If you boot from a CSN, see the following MIBs located at /usr/share/snmp/mibs:
- CASTOR-MGR-MIB.txt. An aggregate MIB for all cluster nodes.
- CARINGO-CASTOR-MIB.txt. A standard Netmail Store hardware MIB provided with the Netmail Store SNMP agent.
If you do not boot from a CSN, see the CARINGO-CASTOR-MIB.txt MIB located in the root directory of the Netmail Store software distribution.
Netmail Store now allows you to access the standard hardware MIBs distributed with the Net-SNMP package. These MIBs provide hardware reporting for areas such as processor load, memory availability, and network bandwidth. For complete information on the available OIDs, see the Net-SNMP MIB documentation.
Managing Netmail Store Nodes
Netmail Store cluster nodes are controlled through SNMP action commands. The following OIDs enable you to disable nodes, and volumes within nodes, in a Netmail Store cluster:
- castorShutdownAction. Disable nodes and volumes within nodes for servicing.
- castorRetireAction. Disable nodes and volumes within nodes for retirement.
Shutdown Action for Nodes
To gracefully shut down a Netmail Store node, write the string “shutdown” to the castorShutdownAction OID. Similarly, writing the string “reboot” to this OID causes a Netmail Store node to reboot.
When a node receives a shutdown or reboot action, it stops gracefully by unmounting all of its volumes and removing itself from the cluster. For a shutdown, the node powers off if the hardware supports this action. For a reboot, the node restarts the machine, rereads the node or cluster configuration files, and starts Netmail Store.
A graceful shutdown is required to perform a quick reboot. Performing an ungraceful shutdown forces the node to perform consistency checks on all its volumes before it can rejoin the cluster.
Before shutting down or rebooting a node, check the node status page or the SNMP castorErrTable OID for critical error messages. Any logged critical messages are cleared when the node reboots.
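The following Python sketch shows one way to script this pre-shutdown check around the Net-SNMP snmpwalk command used in the examples in this appendix. The function names are hypothetical, the MIB file is assumed to be in the working directory, and the sketch assumes that snmpwalk prints each returned value as NAME = TYPE: VALUE and prints nothing, or only a "No Such Object" placeholder line, when castorErrTable is empty.

```python
import subprocess

# Assumption: CARINGO-CASTOR-MIB.txt is in the current working directory.
MIB_ARG = "+./CARINGO-CASTOR-MIB.txt"


def build_err_table_walk(host: str, community: str) -> list[str]:
    """Build the snmpwalk command line for castorErrTable on one node."""
    return ["snmpwalk", "-v", "2c", "-c", community, "-m", MIB_ARG,
            host, "castorErrTable"]


def error_rows(walk_output: str) -> list[str]:
    """Keep only the output lines that carry table values.

    Assumption: values are printed as 'NAME = TYPE: VALUE', and an empty
    table produces no such lines or only a placeholder message.
    """
    rows = []
    for line in walk_output.splitlines():
        line = line.strip()
        if " = " not in line:
            continue
        if "No Such" in line or "No more variables" in line:
            continue
        rows.append(line)
    return rows


def logged_errors(host: str, community: str) -> list[str]:
    """Walk castorErrTable on a node and return any rows it reported."""
    result = subprocess.run(build_err_table_walk(host, community),
                            capture_output=True, text=True, check=True)
    return error_rows(result.stdout)
```

For example, logged_errors("172.16.0.32", "read-only-password") returns an empty list when nothing is logged. Record any rows it returns before writing shutdown or reboot, because they are cleared when the node restarts.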
Retire Action for Nodes and Volumes
The Retire action is used to permanently remove a node or a volume within a node from the cluster. This action is intended for retiring legacy hardware or pre-emptively pushing content away from a volume with a history of I/O errors. Retired volumes and nodes are visible in the Netmail Store Admin Console until the cluster is rebooted. See Retiring Volumes for more information about retiring volumes.
Note: The Retire action may take an extended amount of time to complete and requires at least three health processor cycles.
When a volume is retired, all of its stored objects are moved to other nodes in the Netmail Store cluster. After you initiate a volume retirement, the volume becomes read-only and no additional objects can be stored on it. After all of the objects are moved to other locations in the cluster, the volume is idled and receives no further read or write requests. Each volume has a unique name within its node: the device string from the vols line in the configuration file. To retire a volume, write its name as a string to the castorRetireAction OID. The volume retirement process starts immediately and cannot be aborted after it starts.
To manually retire a volume using the Netmail Store Admin Console, click the targeted node IP address in the console interface. In the Actions column, click Retire next to the targeted volume.
Retiring a node means all volumes on the node are retired at the same time. After all volumes in the node are retired and the node data is copied elsewhere in the cluster, the node is permanently out of service and will not respond to further requests.
To retire a node and all of its volumes, write the string all to the castorRetireAction OID. The node retirement process starts immediately and cannot be aborted after it starts.
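Because a retirement cannot be aborted once it starts, it can help to build and review the exact command before running it. The following Python sketch builds the snmpset command line that writes either a volume's device string or all to castorRetireAction, using the same options as the snmpset examples in this appendix. The function name is hypothetical, the MIB file is assumed to be in the working directory, and the sketch only returns the command; it does not run it.

```python
# Assumption: CARINGO-CASTOR-MIB.txt is in the current working directory.
MIB_ARG = "+./CARINGO-CASTOR-MIB.txt"
RETIRE_WHOLE_NODE = "all"


def build_retire_command(host: str, community: str, target: str) -> list[str]:
    """Build an snmpset command that writes a retire target to castorRetireAction.

    target is the device string of one volume, as it appears on the node's
    vols line, or "all" to retire every volume on the node.
    """
    # Assumption: the vols line is space separated, so a device string
    # never contains whitespace.
    if not target or any(ch.isspace() for ch in target):
        raise ValueError("target must be one volume device string or 'all'")
    # "s" tells snmpset to send the value as a string.
    return ["snmpset", "-v", "2c", "-c", community, "-m", MIB_ARG,
            host, "castorRetireAction", "s", target]
```

For example, build_retire_command("172.16.0.32", "read-write-password", RETIRE_WHOLE_NODE) ends with castorRetireAction s all. Pass the list to subprocess.run only after you have confirmed the target.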
Warning: Ensure that the cluster has both enough free space and enough nodes to store the objects on a retiring volume. When subclusters are in use, these requirements apply to the subcluster where the retiring volume resides. If the nodes in the cluster or subcluster do not have enough space to store at least two replicas of all objects, the retiring node cannot complete the retirement process until you add more nodes. The Retire action does not require that the configured minreps value be maintained to complete retirement. If there are not enough nodes to maintain minreps, the Retire action logs messages stating that sufficient replicas cannot be created.
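A rough pre-check can be scripted from the capacity variables described in Practical SNMP with Netmail Store. The following Python sketch is only an approximation under stated assumptions, not the cluster's actual placement logic: it assumes that at least two other nodes must remain so that two replicas can exist, and that the content on the retiring volume or node must fit in the free space of those remaining nodes. The function name is hypothetical. When subclusters are in use, pass only the nodes in the retiring volume's subcluster.

```python
def retirement_precheck(retiring_used_mb: float,
                        remaining_free_mb: list[float]) -> list[str]:
    """Return reasons why a retirement might not complete (empty if none).

    retiring_used_mb: space used by content on the volume(s) being retired.
    remaining_free_mb: free space on each node that will remain in the
    cluster (or subcluster).
    """
    reasons = []
    # Assumption: two replicas need at least two distinct remaining nodes.
    if len(remaining_free_mb) < 2:
        reasons.append("fewer than two nodes would remain")
    # Assumption: the retiring content must fit in the remaining free space.
    if sum(remaining_free_mb) < retiring_used_mb:
        reasons.append("not enough free space on the remaining nodes")
    return reasons
```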
SNMP Tools and Monitoring Systems
Any standard SNMP query tool and monitoring system can be used to interact with Netmail Store. The examples in this section use the open source Net-SNMP (formerly UCD-SNMP) package that is available for UNIX and Microsoft Windows® platforms. Before using most tools and monitoring packages, install the Netmail Store MIB definition file. See the instructions included with the tool or package for more information.
Open Source Tools
The following tools can be useful to monitor and manage Netmail Store. Netmail endorses neither the applicability nor the fitness of these products when used within any environment.
- Net-SNMP (http://net-snmp.sourceforge.net). Provides command-line tools for UNIX and Windows environments to send and receive SNMP requests.
- Nagios (http://www.nagios.org). Provides a web-based monitoring system for UNIX environments that can monitor systems and send alerts by email and pager.
- Zenoss (http://www.zenoss.com). An SNMP-based system for IT monitoring and management.
SNMP Examples with Netmail Store
Before you use the examples in this section, perform the following procedures:
- Record the IP address of a Netmail Store cluster node. If the cluster is not in your subnet, record the address of the SCSP Proxy.
In the examples below, the node's IP address is 172.16.0.32.
- Run the command from the directory that contains CARINGO-CASTOR-MIB.txt.
For example, copy CARINGO-CASTOR-MIB.txt from the root directory of the USB flash drive or distribution to a local directory.
- Record the following passwords:
  - read-only-password. The password value for the snmpuser defined in the security.operators configuration variable. By default, this password is public.
  - read-write-password. The password value for the snmpuser defined in the security.administrators configuration variable. By default, this password is
For more information, see “Managing Netmail Store Administrators and Users”.
The following example shows an SNMP walk of all the Netmail Store values on a node.
snmpwalk -v 2c -c read-only-password -m +./CARINGO-CASTOR-MIB.txt 172.16.0.32 caringo
The following example shows a request for a specific SNMP variable from a Netmail Store node.
snmpget -v 2c -c read-only-password -m +./CARINGO-CASTOR-MIB.txt 172.16.0.32 reads
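When you script these queries, the Net-SNMP -Oqv output option prints only the value of each variable, which is easier to parse than the default NAME = TYPE: VALUE form. The following Python sketch wraps the snmpget example above; the function names are hypothetical and the MIB file is assumed to be in the working directory. If the agent reports No Such Instance for a scalar variable such as reads, try appending the .0 instance suffix that SNMP uses for scalar objects.

```python
import subprocess

# Assumption: CARINGO-CASTOR-MIB.txt is in the current working directory.
MIB_ARG = "+./CARINGO-CASTOR-MIB.txt"


def build_get_command(host: str, community: str, variable: str) -> list[str]:
    """Build an snmpget command that prints only the variable's value."""
    return ["snmpget", "-v", "2c", "-c", community, "-m", MIB_ARG,
            "-Oqv", host, variable]


def parse_value(raw: str) -> str:
    """Strip whitespace and any surrounding double quotes from one value."""
    value = raw.strip()
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]
    return value


def get_value(host: str, community: str, variable: str) -> str:
    """Run snmpget against one node and return the value as a string."""
    result = subprocess.run(build_get_command(host, community, variable),
                            capture_output=True, text=True, check=True)
    return parse_value(result.stdout)
```

For example, int(get_value("172.16.0.32", "read-only-password", "reads")) converts the returned value to an integer, assuming the variable is numeric.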
The following example shows a set request that will shut down a Netmail Store node.
snmpset -v 2c -c read-write-password -m +./CARINGO-CASTOR-MIB.txt 172.16.0.32 castorShutdownAction s shutdown
CARINGO-CASTOR-MIB::castorShutdownAction =
The following example shows a set request that changes the cluster's sleepAfter setting to 7260 seconds (121 minutes).
snmpset -v2c -c read-write-password -m +./CARINGO-CASTOR-MIB.txt 172.16.0.32 sleepAfter i 7260
SNMP Action OIDs
The “action” OIDs in Netmail Store are the SNMP objects that affect the operation of a node or the cluster. To prevent conflicts when changing cluster-level parameters such as volumeRecoverySuspend, write the action to a single node only, so that the persisted settings UUID is updated from one node.
Writing to this object allows you to restart a feed on a node using SNMP. When you set the OID value to a specific feed value, the feed restarts on all nodes in the cluster.
The castorFeedTable OID allows you to view the Netmail Store feed information for a specific node. Each entry indicates a feed running on the selected node. The Netmail Store Admin Console allows you to view the SNMP Repository Dump page, which provides node-specific information. See “SNMP Repository Dump” for more information.
Writing to this object allows you to change the logging level. When a node boots, it sets the logging level based on the loglevel parameter. You can increase the logging level to debug an issue and then return it to its previous value when you are done.
Writing to this object allows you to remove the contents of a disk volume or an entire node in an orderly fashion. Instead of physically removing disks, consider retiring them, so that content that may not be stored on another disk is preserved. Write either the device name from the node configuration vols parameter or the string all to this OID. You can simultaneously retire volumes from multiple nodes in the cluster.
Writing to this object allows you to gracefully shut down or reboot a node or an entire cluster. The supported values are:
- shutdown. Shuts down this node only.
- reboot. Reboots this node only.
- clustershutdown. Shuts down all nodes in the cluster.
- clusterreboot. Reboots all nodes in the cluster.
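The following Python sketch builds the snmpset command for castorShutdownAction and rejects values other than the four listed above. It also flags the cluster-wide values; the sketch assumes that, like other cluster-level actions, they should be written to a single node only. The function names are hypothetical, the MIB file is assumed to be in the working directory, and the options match the snmpset examples in this appendix.

```python
# Assumption: CARINGO-CASTOR-MIB.txt is in the current working directory.
MIB_ARG = "+./CARINGO-CASTOR-MIB.txt"

NODE_ACTIONS = {"shutdown", "reboot"}
CLUSTER_ACTIONS = {"clustershutdown", "clusterreboot"}


def build_shutdown_command(host: str, community: str, action: str) -> list[str]:
    """Build an snmpset command that writes action to castorShutdownAction."""
    if action not in NODE_ACTIONS | CLUSTER_ACTIONS:
        raise ValueError(f"unsupported castorShutdownAction value: {action!r}")
    return ["snmpset", "-v", "2c", "-c", community, "-m", MIB_ARG,
            host, "castorShutdownAction", "s", action]


def is_cluster_wide(action: str) -> bool:
    """True for values that affect every node; send these to one node only."""
    return action in CLUSTER_ACTIONS
```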
Writing to this object allows you to change the logging host for log messages. When a node boots, it sets the logging host based on the loghost parameter. You can also redirect syslog messages to your workstation to debug an issue.
Writing to this object allows you to suspend volume recovery behavior in the cluster during an upgrade or a network outage.
Practical SNMP with Netmail Store
This section outlines some practical approaches to using the built-in SNMP agent to monitor the health and operation of a Netmail Store cluster. Although you can set up a simple ICMP ping monitor of a Netmail Store node, the SNMP variables give more detailed indications of disk and capacity problems.
The following variables can be used to monitor the basic health of a Netmail Store node. In the volume table, n ranges from 1 to the number of disk volumes.
- caringo.castor.castorState. Should equal “OK”.
- caringo.castor.castorVolTable.volEntry.volState.n. Should equal “OK”.
- caringo.castor.castorVolTable.volEntry.volErrors.n. Should be zero.
If the monitoring console times out when trying to read these variables, something is wrong with the node. If the state values are anything other than “OK”, the node or its disks are transitioning from their normal state.
The valid states for a node are:
The valid states for a disk volume are:
Any non-zero value in the volume error count indicates that a hard error has surfaced from the disk hardware through the OS driver and to the Netmail Store process.
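The following Python sketch applies these checks to values that a monitoring script has already read, for example with snmpget. The function name is hypothetical; it compares states case-insensitively against OK and assumes that volume index n corresponds to position n in the lists. A timeout while reading the values should be treated as a separate failure, as described above.

```python
def health_problems(castor_state: str, vol_states: list[str],
                    vol_errors: list[int]) -> list[str]:
    """Return human-readable problems for one node (empty list if healthy)."""
    if len(vol_states) != len(vol_errors):
        raise ValueError("vol_states and vol_errors need one entry per volume")
    problems = []
    if castor_state.strip().upper() != "OK":
        problems.append(f"node state is {castor_state!r}")
    # Volume indexes in castorVolTable run from 1 to the number of volumes.
    for n, (state, errors) in enumerate(zip(vol_states, vol_errors), start=1):
        if state.strip().upper() != "OK":
            problems.append(f"volume {n} state is {state!r}")
        if errors != 0:
            problems.append(f"volume {n} reported {errors} error(s)")
    return problems
```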
The following variables can be monitored and collected for capacity alerting and reporting. In the volume table, n ranges from 1 to the number of disk volumes.
- caringo.castor.castorFreeSlots. Should be greater than zero.
The castorFreeSlots variable indicates how many more objects a node can hold before it exhausts its memory index. If this occurs, the node is unable to store additional objects until objects are deleted or moved to other cluster nodes (or more RAM is added to the node). The number of free slots depends on how much RAM is required per object. For more information, see the section on memory effects on node storage in the chapter on hardware considerations in the Netmail Store Getting Started Guide.
To compute the amount of disk space that is available for writing content, add the values volFreeMbytes and volTrappedMbytes. Thus, the percentage of free space on a disk volume is:
(volFreeMbytes + volTrappedMbytes) / volMaxMbytes × 100
Similarly, the percentage of space used by current content is:
volUsedMbytes / volMaxMbytes × 100
These disk usage variables can be totaled for all disk volumes in a node and all nodes in a cluster to produce capacity utilization reports.
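The following Python sketch applies these formulas to one volume and to a set of volumes, such as all volumes in a node or in the cluster. The Volume class and function names are hypothetical; each field holds the value of the SNMP variable named in its comment, in megabytes.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    max_mb: float      # volMaxMbytes
    free_mb: float     # volFreeMbytes
    trapped_mb: float  # volTrappedMbytes
    used_mb: float     # volUsedMbytes


def percent_free(v: Volume) -> float:
    """Space available for writing content, as a percentage of capacity."""
    return 100.0 * (v.free_mb + v.trapped_mb) / v.max_mb


def percent_used(v: Volume) -> float:
    """Space used by current content, as a percentage of capacity."""
    return 100.0 * v.used_mb / v.max_mb


def total(volumes: list[Volume]) -> Volume:
    """Sum the variables across volumes so the same formulas give totals."""
    return Volume(
        max_mb=sum(v.max_mb for v in volumes),
        free_mb=sum(v.free_mb for v in volumes),
        trapped_mb=sum(v.trapped_mb for v in volumes),
        used_mb=sum(v.used_mb for v in volumes),
    )
```

For example, a volume with volMaxMbytes 1000, volFreeMbytes 200, volTrappedMbytes 50, and volUsedMbytes 700 is 25% free and 70% used.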
Client Activity Reporting
You can collect and report the amount of client activity received by each node to understand end-user usage patterns and to identify nodes that receive significantly more activity than others. Such an imbalance can indicate a poor primary access node selection mechanism in the client application code.
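One way to spot such an imbalance is to sample each node's request counters at two points in time and compare the increases. The following Python sketch assumes the counters are 32-bit SNMP counters that wrap around at most once between samples, and flags nodes whose increase exceeds a chosen multiple of the cluster mean. The function names and the default factor of 2 are hypothetical choices, not product defaults.

```python
from statistics import mean


def counter_delta(previous: int, current: int, bits: int = 32) -> int:
    """Increase of an SNMP counter between two samples, allowing one wrap."""
    if current >= previous:
        return current - previous
    return current + (1 << bits) - previous


def busy_nodes(requests_by_node: dict[str, int], factor: float = 2.0) -> list[str]:
    """Return nodes whose request count exceeds factor times the mean."""
    if not requests_by_node:
        return []
    average = mean(requests_by_node.values())
    return sorted(node for node, count in requests_by_node.items()
                  if count > factor * average)
```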
The following SNMP variables indicate client request activity on a Netmail Store node.
SNMP Repository Dump
The SNMP Repository Dump page provides node-specific information that is not shown on the other pages of the Netmail Store Admin Console.
To access the SNMP Repository Dump page for a cluster node:
1. Open the Netmail Store Admin Console.
2. In the Node IP column, click the IP address of the target node.
3. Scroll down and maximize Node Info.
4. Scroll down and click SNMP Repository.