Monday, October 5, 2009

Exchange 2010 - Archiving and Compliance

There have been a number of discussions I've been in lately which surround the compliance and archiving requirements of customers as they pertain to email, specifically Exchange Server. Many organizations who require messages to be retained and searchable for legal reasons resort to 3rd party technologies such as Symantec's Enterprise Vault platform. These add-on applications provide enhancement to the base functionality offered by Microsoft Exchange (and other email software), but often do so at the expense of the end user....and the pocketbook of the company implementing them.

Let's talk about compliance first...

In previous versions of Exchange you were at the mercy of 3rd party tools to provide a half-baked solution to the legal and regulatory requirements for mail retention and ease of search. One of the most concerning "holes" in Exchange was the ability for a user to send/recieve a message then delete it (soft [deleted items] or hard [shift+delete]), then open the "Recover Deleted Items" feature (a.k.a. the "Dumpster") and purge the messages. If you performed this task before a backup, the message(s) were never retained. If the user's mailbox was moved, the Dumpster information is not retained.

If you were a compliance officer and needed to search a mailbox for content, the process wasn't easy. There were security issues, and knowledge of PowerShell was required. If you needed to search data which had been archived to tape the process could be cumbersome and very time consuming.

Along comes "Dumpster 2.0" in Exchange 2010. Now messages that are purged from the Dumpster end up in a folder called "Purged" which is hidden from view. The user thinks the message has been purged however it remains. A compliance officer can now search the mailbox and discover the data. Since the Dumpster (including the purged items) are part of the user's mailbox now, they move when the mailbox moves and data is always preserved.

Using the new multi-mailbox search tools in Outlook Web Access, a compliance officer can now be assigned a role for discovery (using RBAC) of data and use a familiar, easy to use interface, to perform searches.

Moving along to archiving...

This one has been a hot topic lately. Archiving in Exchange 2010 consists of enabling the "archive" mailbox for a user either using PowerShell or the EMC which resides on the same disk as the primary mailbox. I know when I was in Dallas for Exchange Ignite earlier in 2009 there were a lot of people who were disappointed to hear about the technical details of archiving in Exchange 2010 (including myself....initially). Over time though I've looked back at what our customers are looking for strictly from a requirements perspective and what problems we were attempting to solve. Unfortunately in the process you sometimes trade one headache for another which is commonly the case with 3rd party tools and add-ons.

Looking strictly at why we recommended these tools in the past, the reasons were primarily to overcome issues with performance, stability, data retention, and compliance. In other words it was more about providing a solution to several problems than simply "archiving" for the purpose of recovery. So let's look at these requirements in more detail:

Archive because of PERFORMANCE: Sure, back in Exchange 2003 (and possibly in 2007) it made sense to move mail off of "expensive" fibre channel SAN disk and onto less expensive SATA disk. This also helped with memory management in that Exchange was limited to a 4kb page size; less data to cache, better performance.

The Exchange 2010 answer...don't archive because of performance. You can now run your primary and archive mailboxes on the same disk with the improvements in database schema and disk I/O. Also, the long sync time for Outlook clients no longer becomes an issue since the locally cached copy of the data contains only the data from the primary mailbox and not the archive. You can still access the archive data while "online" through Outlook Anywhere or OWA as well. The biggest misconception here though was that if you removed the data, the problem was solved. Well perhaps it helped due to the page size limitations however it's more about "item count" than actual message size. A folder with 1000 items will suffer long delays when sorting and searching no matter how large each message is. When the count increases, so does the delay. These delays are caused by high IOPS against the ESE and disk.

Archive because of STABILITY: Again, back in previous versions of Exchange you could have users with very large mailboxes and with a very very large number of items per folder. You may have even run up against the 10,000 item count limit in Exchange 2007. Mailbox corruption was a concern and the idea of reducing the size gave the corporate world the sleepy pills to rest with ease.

The Exchange 2010 answer...don't archive because of stability. Mailbox corruption has pretty much been eliminated; and I say pretty much because I've never come across it with 2007 and I've never heard of it....but then again I'm not "all seeing and all knowing"....yet. Item count in Exchange 2010 has been increasaed to 100,000 and the underlying features to help with corruption extend to the various layers from the page entry to the item level. If you've implemented Exchange 2010's new Database Availability Group feature you have multiple levels of corruption protection and self healing.

Archive because of DATA RETENTION: Ah-hah! Now we're onto something. If you need to retain data for longer periods of time you might want to save this information for quite some time. For some organizations this data is an asset up to the first day past the seventh year, then it becomes a liability. You may have deployed an archiving solution to preserve (and purge) data according to your company's retention policy.

The Exchange 2010 answer...Not much different other than the message would be to keep as much mail in Exchange as you can or want to. Use the new retention tags/policies in Exchange 2010 to manage the mailbox size over time.

Archive because of COMPLIANCE: Due to the lack of any sane method of e-discovery in previous versions of Exchange, you would typically install a tool to enhance this portion of the product. These tools often give someone the ability to perform searches using a web interface and will return data in the database. Remember, you won't always get an accurate representation if the user purged the data.

The Exchange 2010 answer...Don't archive because of compliance. Exchange 2010 includes a web interface for e-discovery and using role based security, you can assign permissions to do these tasks.

The four concerns I have with 3rd party tools are as follows:

  1. The cost. These tools are not cheap. Maintenance alone can eat you alive and in times of cost cutting, you could easily win the employee of the month coupon for saving some dough here.

  2. Interoperability. Have you ever been blocked from upgrading to a service pack or new version because the integrated software isn't compatible? These types of blockers can be a real pain for the IT department and could put the organization at risk.

  3. The user experience. This one is the most concerning. You shouldn't implement a platform which changes the user's method for searching and dealing with messages which have been archived. You lose out on existing, and new features such as message preview, the reading pane, and conversation view. Even more annoying is the inability to search/retrieve messages on your mobile phone! Don't degrade the user experience!

  4. Administration. All IT staff are gods anyway so this one doesn't hold much water. The boundless nature of one's brain is truly astounding. Seriously though, another tool, more training....yadda yadda. Junk it!

So that's all for now. I'm tired. Need sleep.


No comments:

Post a Comment