Netmail Archive 5.x
Overview of Netmail Archive backup and restore solutions.
Preserving the integrity of archived data and the corresponding indexes is critical. Like all electronic data, archived data is subject to risks unless properly managed with respect to integrity and recoverability. Indexes can easily be rebuilt from the source data (i.e., archives), but if source data is destroyed, it cannot be regenerated from the indexes. When considering the various data backup solutions available for protecting archived data, the type of data and recovery requirements are the critical factors for choosing the most appropriate solution. A discussion of all available backup solutions and how to use them is beyond the scope of this document. Instead, it will focus on the backup/restore solutions most commonly used for backing up archived data generated by Netmail Archive.
Backup vs. Archiving
Archiving and backing up data are two different operations. When data is archived, items are simply moved from the live mailbox to the archive system. Whether the data resides on the live system or in the archive, it still exists as only one copy in one location. A backup, however, creates a replica of the data, which can be stored locally or off-site.
Netmail Archive Data Formats
Netmail Archive separates archived data into three components:
An XML representation of the message body, aka “the XML file.”
An XML representation of auditing information, such as the actions taken against that message whilst in the archives, aka “the ADT file.”
A binary copy of the attachments (if any), aka “the ATT files.”
The ADT file is the entry point for any Netmail operation, so it is very important that this file exists and that it has a one-to-one correlation with the XML file. The ADT file will point to the XML file, which will point to the ATT file(s).
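The one-to-one relationship between ADT and XML files can be checked before a backup is trusted. The following is a minimal sketch only: it assumes that a message identifier has already been extracted for each ADT and XML file, and both the function name `check_archive_chain` and the identifiers are hypothetical. The actual on-disk naming used by Netmail Archive is not described in this document.

```python
def check_archive_chain(adt_ids, xml_ids):
    """Compare message IDs found as ADT files with those found as XML files.

    Returns a tuple (missing_xml, missing_adt):
      missing_xml - IDs that have an ADT file but no XML file
      missing_adt - IDs that have an XML file but no ADT file
    Both lists are empty when the correlation is one-to-one.
    """
    adt_ids, xml_ids = set(adt_ids), set(xml_ids)
    return sorted(adt_ids - xml_ids), sorted(xml_ids - adt_ids)


# Hypothetical message IDs, for illustration only.
missing_xml, missing_adt = check_archive_chain(
    adt_ids=["msg001", "msg002", "msg003"],
    xml_ids=["msg001", "msg003", "msg004"],
)
print("ADT without XML:", missing_xml)
print("XML without ADT:", missing_adt)
```

An ADT file without its XML file means the entry point leads nowhere, while an XML file without its ADT file cannot be reached by a Netmail operation at all.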
Email messages tend to be fairly small if only the message text and message metadata are considered. A typical message is usually about 4kB in size (equivalent to almost 500 words). The added attachments, graphics, and templates are what account for almost 99% of the actual message size. Audit files contain message metadata indicating message chain of custody, message security, and message auditing data, and are typically only 1-2kB in size.
The average user sends and receives approximately 7500 messages per year, and in a company with 1000 users, this amounts to 7.5 million messages per year. Since retention policies often require many years of storage, the number of messages, and consequently the number of files, retained on the file system quickly becomes large. This needs to be taken into consideration, as many backup solutions are based on file backup.
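The arithmetic above can be turned into a rough file-count estimate. This sketch assumes one XML file and one ADT file per message, as described earlier; the function name, the seven-year retention period, and the average of 0.5 attachments per message are illustrative assumptions, not figures from Netmail.

```python
def estimate_archive_files(users, messages_per_user_per_year, retention_years,
                           attachments_per_message=0.0):
    """Return (messages, files) retained over the whole retention period."""
    messages = users * messages_per_user_per_year * retention_years
    # One XML file and one ADT file per message, plus any ATT files.
    files = messages * (2 + attachments_per_message)
    return messages, int(files)


# Figures from the text: 1000 users, 7500 messages per user, one year.
print(estimate_archive_files(1000, 7500, 1))

# Assumed values: 7 years of retention, 0.5 attachments per message on average.
print(estimate_archive_files(1000, 7500, 7, attachments_per_message=0.5))
```

Even without attachments, the one-year case already doubles the message count in files, which is why file-by-file backup tools slow down as the archive ages.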
Backing Up Your Archives
Recommended Backup Strategy for Netmail Archive
Every organization’s email environment is unique. As such, each Netmail Archive environment will have a different setup. Nevertheless, the basic components that make up the Netmail environment will remain the same: Index Server, Archive Server, Client Access Server (i.e., Netmail Search). Depending on the size of the environment, an organization may have more than one of each. The following outlines a sample Netmail Archive environment with multiple Archive Servers and Client Access Servers, and the backup strategy for each.
|Index01 - Netmail Index Server|
|Archive01 - Netmail Archive Server 1|
|Archive02 - Netmail Archive Server 2|
|Archive03 - Netmail Archive Server 3|
|RP01 - Netmail Remote Provider 1|
|RP02 - Netmail Remote Provider 2|
What Files to Back Up
Netmail Archive works with a number of different file types that are needed for the archiving of data. It is therefore important to know what each file type is responsible for and whether or not it is recoverable. The following is a list of Archive file types that must be considered in every backup strategy:
|File Type||Backup Strategy|
|XML||The actual mailbox contents. If data is lost, it cannot be recovered from within Netmail. Backup is critical.|
|ATT||The email attachments. If data is lost, it cannot be recovered from within Netmail. Backup is critical.|
|ADT||Contains auditing information. If data is lost, that information is not recoverable; however, the files can be re-created in a blank state. Backup is very important.|
|LDAP||Contains your Netmail Archive configuration information. Backup is important.|
|IDX||Contains your indexes (required for viewing and searching archives). If lost, indexes can be rebuilt, but with several days of down time. Backup is important.|
Netmail Store is a content addressable storage (CAS) system. This means that data is not stored as traditional files on a file system but is stored on the device as objects with each object having a set of metadata which defines the properties of the data.
Netmail Store is self-managing and automatically backs up data without administrator intervention. As each piece of data is written to the system, the system creates a hash value for the data so that it can maintain the integrity of the data and determine whether the data has changed or become corrupt. The system also automatically replicates the data into two or more copies and distributes them across several different nodes or disk drives, which can be local or located in a separate disaster recovery site. In the event of a hardware failure, the system replicates all files that existed on the failed device to ensure that multiple copies are always maintained. In the event of corruption, the system automatically repairs the corrupted file by using a valid copy. In effect, the system maintains a real-time backup, ensuring that a copy of the files always exists.
The system only allows files to be written, read, and deleted; it does not allow existing files to be modified, which eliminates any risk of file overwrites or changes. If required, it supports WORM (write once, read many) storage, whereby a file can be locked against deletion for a specified period of time.
Netmail Store also provides the best method for records management—when a file is destroyed according to organizational retention policies, all other copies are destroyed along with it.
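The hash, replicate, repair, and destroy behavior described above can be illustrated with a toy content-addressable store. This is a sketch of the general technique, not the Netmail Store implementation or its API: the class `TinyCAS`, the use of SHA-256, and the rule for placing copies on nodes are all assumptions made for the example.

```python
import hashlib


class TinyCAS:
    """Toy content-addressable store; each node is a dict of digest -> bytes."""

    def __init__(self, node_count=3, replicas=2):
        self.nodes = [dict() for _ in range(node_count)]
        self.replicas = replicas

    @staticmethod
    def digest(data):
        return hashlib.sha256(data).hexdigest()

    def write(self, data):
        """Store data under its own hash and place copies on several nodes."""
        key = self.digest(data)
        # Assumed placement rule: pick a start node from the hash, then
        # put the copies on consecutive nodes.
        start = int(key, 16) % len(self.nodes)
        for i in range(self.replicas):
            self.nodes[(start + i) % len(self.nodes)][key] = data
        return key

    def read(self, key):
        """Return a copy whose hash still matches, repairing any bad copies."""
        holders = [n for n in self.nodes if key in n]
        good = next((n[key] for n in holders if self.digest(n[key]) == key), None)
        if good is None:
            raise KeyError(key)  # no intact copy is left
        for n in holders:
            if self.digest(n[key]) != key:
                n[key] = good  # overwrite the corrupted copy with a valid one
        return good

    def delete(self, key):
        """Destroy every copy, as a retention policy would require."""
        for n in self.nodes:
            n.pop(key, None)


store = TinyCAS()
key = store.write(b"archived message body")
holder = next(n for n in store.nodes if key in n)
holder[key] = b"corrupted"  # simulate corruption on one node
print(store.read(key))      # served from the intact copy
```

There is deliberately no method for modifying stored data: writing different content produces a different key, which mirrors the write, read, and delete model described above.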
Tape Backup Systems
Today’s tape technology has much higher capacities and transfer rates than in the past, but tape devices still use linear storage, as opposed to the random write and seek access that hard drives provide. The main issue with tape backup, however, lies in the software. Tape backup software provides for different types of backups, including full backup and incremental backup.
With incremental backups, traditional tape backup systems use an “Archive” status bit. This is a file level attribute applied to a file when it is written or changed by the file system. When the backup software backs up the file, it resets the status bit so it knows the file was backed up. This means that the software must scan all files on the target system, and since file systems can contain millions of files, this can take a very long time. Because of time constraints, it may be impossible to obtain daily incremental backups.
Newer tape backup software packages can make use of the Windows file journaling feature. With file journaling, the file system itself uses a table to keep track of modified files instead of modifying the status bit on every file. The tape backup software therefore only has to read the table to obtain a list of modified files and can perform an incremental backup far more efficiently.
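The difference between the two incremental strategies comes down to how many entries the backup software has to inspect. The sketch below models both against an in-memory stand-in for a file system. `ToyFileSystem` and both functions are hypothetical and do not call any real operating system or backup product API.

```python
class ToyFileSystem:
    """In-memory stand-in for a file system; not a real OS or backup API."""

    def __init__(self):
        self.files = {}        # path -> contents
        self.archive_bit = {}  # path -> True if changed since the last backup
        self.journal = []      # paths of changed files, in order of change

    def write(self, path, data):
        self.files[path] = data
        self.archive_bit[path] = True  # set on every write or change
        self.journal.append(path)      # also recorded in the change journal


def incremental_by_archive_bit(fs):
    """Inspect every file; back up those with the bit set, then clear the bit."""
    inspected, backed_up = 0, []
    for path in fs.files:
        inspected += 1
        if fs.archive_bit[path]:
            backed_up.append(path)
            fs.archive_bit[path] = False
    return inspected, sorted(backed_up)


def incremental_by_journal(fs):
    """Read only the change journal, then clear it."""
    inspected = len(fs.journal)
    changed = sorted(set(fs.journal))
    fs.journal.clear()
    return inspected, changed


fs = ToyFileSystem()
for i in range(10_000):
    fs.write(f"msg{i:05d}.xml", b"body")
incremental_by_archive_bit(fs)  # the first run backs up everything
incremental_by_journal(fs)

for i in (7, 42, 9_000):
    fs.write(f"msg{i:05d}.xml", b"changed")

print("entries inspected by scan:   ", incremental_by_archive_bit(fs)[0])
print("entries inspected by journal:", incremental_by_journal(fs)[0])
```

After three changes, the status-bit scan still walks all 10,000 files, while the journal read handles three entries. With millions of archived files the same ratio decides whether a daily incremental backup fits in the backup window.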
One negative aspect of using tape backup is that this creates a separate copy of the data which survives archival destruction policies and could therefore be a potential discovery liability. With incremental backups, an original full backup must be maintained, presenting loopholes in the records destruction policy.
Disk to Disk Backup
Disk to disk backup is the current standard for larger organizations that have terabytes of data to back up. Because disk is a faster medium than tape, backups are much faster, reducing the backup window and the impact on production systems. Once the backups are on a secondary storage system, they can be moved to tape for off-site storage. As with tape backup, the critical component is the software. Products designed for this purpose, such as Commvault® and other enterprise software packages, provide for journal-based backup.
Again, the one cautionary note when conducting any backup of data is that it creates an external copy of the data, which may therefore be contrary to record destruction policies.
Note: If you cannot guarantee limited retention of the backed up items on disk or tape, this media should not be used.
SAN Replication and Snapshot
Many of today’s sophisticated storage area networks (SANs) have built-in fault tolerance and redundancy capabilities. While most SANs can be configured with advanced RAID (Redundant Array of Inexpensive Disks) configurations, RAID does not adequately protect data integrity or guard against data corruption. Replication has a similar limitation: since backups create a static copy while replication creates a dynamic copy, corruption in the dynamic copy can be replicated to the secondary device. In this case, one needs to use backup or, if the SAN architecture permits, snapshots. A snapshot is a backup function of the SAN that takes a block-level image of the storage area at a given point in time. If there is corruption, the system can be restored to that exact point in time. Snapshots are much more efficient than backups, can be performed very quickly, and can be performed on live data systems. The only issue with snapshots is the storage overhead they require. Snapshotting a SAN LUN generally takes more space than performing a snapshot on a file system, which is why snapshots on devices like NetApp Filer with a CIFS (Windows file sharing) module will take up less space than a snapshot of a SAN volume.
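The contrast between a dynamic replica and a point-in-time snapshot can be shown with a small model. `ToyVolume` is a hypothetical in-memory stand-in, not a SAN interface, and it copies the whole volume for each snapshot for simplicity, whereas real SAN snapshots generally track only the blocks that change.

```python
import copy


class ToyVolume:
    """In-memory stand-in for a storage volume: block number -> bytes."""

    def __init__(self):
        self.blocks = {}
        self.replica = {}    # kept in step with every write
        self.snapshots = {}  # label -> frozen copy of the blocks

    def write(self, block, data):
        self.blocks[block] = data
        self.replica[block] = data  # replication mirrors every write, good or bad

    def snapshot(self, label):
        self.snapshots[label] = copy.deepcopy(self.blocks)

    def restore(self, label):
        self.blocks = copy.deepcopy(self.snapshots[label])
        self.replica = copy.deepcopy(self.blocks)


vol = ToyVolume()
vol.write(0, b"good data")
vol.snapshot("monday")

vol.write(0, b"\x00garbage")  # corruption reaches the primary copy...
print(vol.replica[0])         # ...and the replica now holds it too

vol.restore("monday")
print(vol.blocks[0])          # back to the state captured by the snapshot
```

The replica protects against losing a device, but only the snapshot offers a way back from bad data, which is why the two are complementary.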
Organizations whose SAN architectures support snapshots should consider taking daily or weekly snapshots. Generally, data is archived when it is older than 7-15 days, so if a restore of data is required, the source data on the email system is still available to re-archive. Check your policies and requirements to determine the best frequency for taking snapshots. The frequency should correspond to the criticality of restoration and data accessibility.
Backup of archive data is critical not only because it includes irretrievable corporate information but also because organizations maintaining data under litigation hold can be severely penalized by the courts for not taking appropriate actions to protect the data. Organizations need to adopt the best and most effective methods for guaranteeing data integrity. In order of preference, Content Addressable Storage (Netmail Store) is the first choice for Archive data backup, with SAN Replication/Snapshot second, and disk/tape backup as a last choice. Not backing up the data is not an option. For indexes, most existing backup solutions will suffice as long as care is taken not to back up during system operations.