The SNMP management information base (MIB) for the Netmail Store product begins at OID .iso.org.dod.internet.private.enterprises.caringo.castor (.1.3.6.1.4.1.24659.1). Access to the SNMP agent is controlled by the community string in the node or cluster configuration file. The SNMP MIB definition file for Netmail Store is located on the USB flash device in a file called CASTORMIB.txt and is included below for reference.
Any standard SNMP query tool or monitoring system may be used to interact with Netmail Store. Examples in this section use the open source Net-SNMP (formerly UCD-SNMP) package, which is available for Unix and Windows platforms. Most tools and monitoring packages require the Netmail Store MIB definition file to be installed before use. Please refer to the tool or package’s specific instructions for doing this.
An administrator may find the following tools useful in monitoring and managing Netmail Store. Messaging Architects does not endorse the applicability or fitness of these products for any particular environment; rather, these tools may be helpful in certain environments, particularly when used by an administrator already familiar with the tools’ capabilities.
The following example shows an SNMP walk of all the Netmail Store values on a node.
snmpwalk -v 2c -c pwd -m +CASTOR-MIB 192.168.1.101 caringo
The following example shows the request of a specific SNMP variable from a Netmail Store node.
snmpget -v 2c -c pwd -m +CASTOR-MIB 192.168.1.101
The following example shows a set request that will shut down a Netmail Store node.
snmpset -v 2c -c pwd -m +CASTOR-MIB 192.168.1.101 caringo.castor.castorShutdownAction = "shutdown"
The following example shows a set request that will retire the volume “/dev/sda”.
snmpset -v 2c -c pwd -m +CASTOR-MIB 192.168.1.101 caringo.castor.castorRetireAction = "/dev/sda"
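The two set requests above differ only in the object name and the value, so a monitoring script could assemble them with one helper. The following Python sketch is illustrative only: build_snmpset_args is a hypothetical helper name, not part of Netmail Store or Net-SNMP, and the host, community string, and object names are simply the ones used in the examples above. It only builds the argument list and does not send anything to a node.

```python
def build_snmpset_args(host, community, obj, value,
                       mib="+CASTOR-MIB", version="2c"):
    """Return the argument list for a Net-SNMP snmpset call."""
    return [
        "snmpset",
        "-v", version,    # SNMP protocol version
        "-c", community,  # community string from the node or cluster configuration
        "-m", mib,        # load the Netmail Store MIB in addition to the defaults
        host,
        obj,
        "=",              # let Net-SNMP take the value type from the loaded MIB
        value,
    ]


shutdown_args = build_snmpset_args(
    "192.168.1.101", "pwd", "caringo.castor.castorShutdownAction", "shutdown")
retire_args = build_snmpset_args(
    "192.168.1.101", "pwd", "caringo.castor.castorRetireAction", "/dev/sda")

print(" ".join(shutdown_args))
print(" ".join(retire_args))
```

On a machine with Net-SNMP installed, the list could be handed to subprocess.run(shutdown_args, check=True). Passing a list rather than a single string avoids shell quoting problems with values such as "/dev/sda".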
The action OIDs in Netmail Store are the SNMP objects that can be written to in order to affect the operation of a node or the cluster. To prevent conflicts for cluster-level parameters such as volumeRecoverySuspend, write the action to a single node only, so that the persisted settings UUID is updated from one node.
Writing to this object (castorShutdownAction in the example above) allows for the graceful shutdown or reboot of a node or an entire cluster. Valid values are:
Writing to this object (castorRetireAction in the example above) allows for the orderly removal of the contents of a disk volume or an entire node. Retire a disk before removing it, because the disk may hold the only copy of some content. The value written to this OID is either the device name, taken from the “vols” parameter of the node configuration, or the string “all”. You may simultaneously retire volumes from multiple nodes in the cluster.
Writing to this object allows the logging level to be changed. When a node is booted, it sets the logging level based on the “loglevel” parameter. An administrator may wish to increase the logging level in order to debug an issue and then return it to its previous value when finished.
Writing to this object allows an administrator to change the location to which log messages are written. When a node is booted, it sets the logging host based on the “loghost” parameter. It may be desirable to redirect syslog messages to an administrator’s workstation in order to debug an issue.
Writing to this object allows an administrator to suspend volume recovery behavior in the cluster during an upgrade or a network outage.
This section outlines some practical approaches to using the built-in SNMP agent in order to monitor the health and operational aspects of a Netmail Store cluster. Although an administrator may set up a simple ICMP ping monitor of a Netmail Store node, using the SNMP variables allows detailed indications of disk and capacity problems.
The following variables are useful for monitoring the basic health of a Netmail Store node. In the volume table, the index n runs from 1 to the number of disk volumes.
If the monitoring console receives timeouts when trying to read these variables, the node is unreachable or its SNMP agent is not responding. If a state value is anything other than ok, the node or disk is leaving, or has left, its normal state.
The valid states for a node are: ok, retiring, retired.
The valid states for a disk volume are: ok, retiring, retired, unavailable.
Any non-zero value in a volume’s error count indicates that a hard error has surfaced from the disk hardware through the OS driver and to the Netmail Store process.
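The rules above (timeouts, non-ok states, and non-zero error counts) can be combined into a single check. The following Python sketch is one possible way to do that, not a prescribed method: health_alerts is a hypothetical function name, and its arguments are plain values that a monitoring script is assumed to have already read from the agent, with None standing in for a read that timed out.

```python
# Valid states as listed in this section.
NODE_STATES = {"ok", "retiring", "retired"}
VOLUME_STATES = {"ok", "retiring", "retired", "unavailable"}


def health_alerts(node_state, volumes):
    """Return a list of alert strings for one node.

    node_state: state string read from the node, or None if the read timed out.
    volumes: list of (state, error_count) pairs, one per disk volume.
    """
    if node_state is None:
        return ["node: no SNMP response (timeout)"]

    alerts = []
    if node_state not in NODE_STATES:
        alerts.append(f"node: unknown state {node_state!r}")
    elif node_state != "ok":
        alerts.append(f"node: state is {node_state}")

    # Volume numbering starts at 1, matching the volume table index n.
    for n, (state, error_count) in enumerate(volumes, start=1):
        if state not in VOLUME_STATES:
            alerts.append(f"volume {n}: unknown state {state!r}")
        elif state != "ok":
            alerts.append(f"volume {n}: state is {state}")
        if error_count != 0:
            alerts.append(f"volume {n}: {error_count} hard error(s) reported")
    return alerts


print(health_alerts("ok", [("ok", 0), ("unavailable", 3)]))
```

An empty list means nothing needs attention. Treating an unrecognized state as an alert is a design choice here, so that a state added in a later release is surfaced rather than silently ignored.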
The following variables can be monitored and collected for capacity alerting and reporting. In the volume table, the index n runs from 1 to the number of disk volumes.
The castorFreeSlots variable indicates how many more objects a node can hold before it exhausts its memory index. If this happens, the node will be unable to store additional objects until streams are deleted or moved to other cluster nodes.
To compute the amount of disk space that is available for writing content, add the values volFreeMbytes and volTrappedMbytes. Thus, the percentage of free space on a disk volume is:
100 × (volFreeMbytes + volTrappedMbytes) / volMaxMbytes
Similarly, the percentage of space being used by current content is:
100 × volUsedMbytes / volMaxMbytes
These disk usage variables can be totaled for all disk volumes in a node and all nodes in a cluster in order to produce capacity utilization reports.
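As a worked illustration of the two ratios above, expressed as percentages (each ratio multiplied by 100), and of totaling them across volumes, here is a short Python sketch. The function names are hypothetical and the megabyte figures are made-up sample values, not measurements from a real cluster.

```python
def volume_usage(free_mb, trapped_mb, used_mb, max_mb):
    """Return (percent free, percent used) for one disk volume."""
    if max_mb <= 0:
        raise ValueError("volMaxMbytes must be positive")
    free_pct = 100.0 * (free_mb + trapped_mb) / max_mb
    used_pct = 100.0 * used_mb / max_mb
    return free_pct, used_pct


def total_usage(volumes):
    """Sum each column over many volumes, then apply the same formulas.

    volumes: iterable of (volFreeMbytes, volTrappedMbytes,
                          volUsedMbytes, volMaxMbytes) tuples.
    """
    rows = list(volumes)
    free = sum(r[0] for r in rows)
    trapped = sum(r[1] for r in rows)
    used = sum(r[2] for r in rows)
    maximum = sum(r[3] for r in rows)
    return volume_usage(free, trapped, used, maximum)


# Sample values for two volumes (illustrative only).
sample = [(600, 100, 300, 1000), (200, 50, 750, 1000)]

print(volume_usage(*sample[0]))  # first volume on its own
print(total_usage(sample))       # both volumes combined
```

Summing the megabyte values before dividing, rather than averaging the per-volume percentages, keeps the totals correct when volumes differ in size.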
It can be useful to collect and report the amount of client activity received by nodes in order to understand end-user usage patterns. It is also useful for identifying nodes that may be receiving significantly more activity than others. This may indicate a poor PAN selection mechanism in the client application code.
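One simple way to spot a node that receives significantly more activity than the others is to compare each node's request total with the median across the cluster. The Python sketch below assumes a monitoring script has already gathered one request total per node; busy_nodes, the factor of 2, and the sample counts are all hypothetical choices for illustration, not values recommended by the product.

```python
from statistics import median


def busy_nodes(requests_by_node, factor=2.0):
    """Return the nodes whose request total exceeds factor times the median.

    requests_by_node: dict mapping a node address to its request total
    over the reporting period.
    """
    if not requests_by_node:
        return []
    typical = median(requests_by_node.values())
    if typical <= 0:
        return []  # no meaningful baseline to compare against
    return sorted(node for node, count in requests_by_node.items()
                  if count > factor * typical)


# Sample totals for three nodes (illustrative only).
sample_counts = {
    "192.168.1.101": 1000,
    "192.168.1.102": 1100,
    "192.168.1.103": 4200,
}

print(busy_nodes(sample_counts))
```

The median is used rather than the mean so that a single very busy node does not raise the baseline it is being compared against.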
The following SNMP variables indicate client request activity on a Netmail Store node: