Exchange 2007 Storage – Disk Sizing information

This post is based on a design I am currently undertaking. I hope it is useful to others who might be involved in something similar. If you have any questions about areas where I haven’t been clear, please get in touch!

Additional Info on the topic can be found in various locations including the Exchange 2007 help file and the MSExchange Team blog.

http://technet.microsoft.com/en-us/library/bb124518.aspx

http://msexchangeteam.com/archive/2007/01/15/432207.aspx


In particular the user profiles are shown below (key at the top):


User type (usage profile) | Sent/received per day (approx. 50KB message size) | Database cache per user | Estimated IOPS per user
Light                     | 5 sent/20 received                                | 2MB                     | 0.11
Average                   | 10 sent/40 received                               | 3.5MB                   | 0.18
Heavy                     | 20 sent/80 received                               | 5MB                     | 0.32
Very heavy                | 30 sent/120 received                              | 5MB                     | 0.48


Storage Requirement/Drive Setup

Databases LUN
IOPS Sizing

To calculate the number of IOPS required for the cluster I did the following:

The average user profile suggests 0.18 IOPS per user. I decided to raise this to 0.3 IOPS per user given the lack of sufficient traffic data.

I then calculated the number of potential users on the cluster as follows:

The number of assumed users as stated in the assumptions document is 1650. Multiplying this by 1.2, to factor in the 20% growth figure set out in the requirements document, gives a figure of 1980 users.

I then applied the concurrency rating of 60% to get the number of users who would be on the system at any one time: 1980 * 0.6 = 1188 users.

Therefore the number of IOPS for the database LUN is calculated as follows:

1188 users * 0.3 IOPS = 357 IOPS (rounded up)

Because the majority of mailbox access will be from clients in cached mode, once the normal stress of the caching period has passed, the database read/write ratio will settle down to 1:1. Given this, it might be assumed that the controller cache should be configured for 50%/50% Read/Write use. However, due to the random nature of Exchange data access, more benefit will be gained by setting the cache to 75% Write / 25% Read (unless the cache is absolutely huge, in which case 50%/50% would suffice).

One last area to consider when calculating IOPS is third-party solutions, in particular BlackBerry. BlackBerry is rated to create 3.64 IOPS per user. Therefore, given that 50 BlackBerry users will be housed on the cluster, I will add 182 IOPS (50 * 3.64) to the total calculated above.

The final statement for the DB LUN sizing, including a 20% safety factor and BlackBerry users, reads as follows:

The Exchange cluster will create 610 IOPS and requires a read time with latency less than 20ms.
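The steps above can be sketched in Python as a quick sanity check. The function and parameter names are illustrative only. Two details are assumptions inferred from the figures rather than stated in the text: 356.4 is rounded up to 357, and the 20% safety factor is applied to the mailbox IOPS but not to the BlackBerry IOPS, since (357 * 1.2) + 182 is what gives 610.

```python
import math

def db_lun_iops(users=1650, growth=0.20, concurrency=0.60,
                iops_per_user=0.3, blackberry_users=50,
                blackberry_iops_per_user=3.64, safety_factor=0.20):
    """Return (mailbox_iops, blackberry_iops, total_iops) for the database LUN."""
    # round() guards against floating-point noise such as 1979.9999999
    projected_users = round(users * (1 + growth))            # 1650 -> 1980
    concurrent_users = round(projected_users * concurrency)  # 1980 -> 1188
    # Assumption: 356.4 is rounded up to 357, as in the text
    mailbox_iops = math.ceil(concurrent_users * iops_per_user)
    blackberry_iops = round(blackberry_users * blackberry_iops_per_user)  # 50 * 3.64 = 182
    # Assumption: the 20% safety factor covers the mailbox IOPS only
    total_iops = round(mailbox_iops * (1 + safety_factor) + blackberry_iops)
    return mailbox_iops, blackberry_iops, total_iops

print(db_lun_iops())  # (357, 182, 610)
```

Changing `iops_per_user` back to the 0.18 from the average profile shows how much headroom the 0.3 figure adds.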

Capacity Sizing

To calculate the database LUN sizing information I used the following steps:

1650 users + 20% = 1980 users.

1980 users * 200MB = 396GB

5% Content Indexing = 396GB * 0.05 = 20GB (rounded)

20% Safety factor = 396 * 0.2 = 80GB (rounded)

10% Free Space = 396 * 0.1 = 40GB (rounded)

30 days deleted items retention = amount of mail per day (4950MB) * 30 = 150GB (rounded)

Therefore, the total DB LUN size is 396 + 20 + 80 + 40 + 150 = 690GB (rounded)

In order to provide space to restore a single storage group I recommend a recovery LUN is configured with space to hold 1.5 times the size of a single SG. This works out at 120GB (rounded).
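For anyone wanting to rerun the capacity figures with their own numbers, here is a minimal Python sketch of the steps above. The names are illustrative. It assumes 1GB = 1000MB and rounding to the nearest 10GB, which is what reproduces the rounded figures in the text. The 80GB storage group size in the last line is back-calculated from the 120GB recovery LUN (120 / 1.5) and is not a figure stated anywhere in the design.

```python
def nearest_10(value):
    """Round to the nearest 10GB, matching the '(rounded)' figures in the text."""
    return int(round(value / 10.0)) * 10

def db_lun_capacity_gb(users=1980, mailbox_mb=200, mail_per_day_mb=4950,
                       retention_days=30):
    """Return each component of the database LUN size in GB (1GB = 1000MB)."""
    mailboxes = users * mailbox_mb / 1000                  # 396GB
    parts = {
        "mailboxes": round(mailboxes),
        "content_indexing": nearest_10(mailboxes * 0.05),  # 19.8 -> 20
        "safety_factor": nearest_10(mailboxes * 0.20),     # 79.2 -> 80
        "free_space": nearest_10(mailboxes * 0.10),        # 39.6 -> 40
        "deleted_items": nearest_10(mail_per_day_mb * retention_days / 1000),  # 148.5 -> 150
    }
    parts["total"] = nearest_10(sum(parts.values()))       # 686 -> 690
    return parts

def recovery_lun_gb(storage_group_gb, factor=1.5):
    """Recovery LUN sized at 1.5 times a single storage group."""
    return storage_group_gb * factor

print(db_lun_capacity_gb()["total"])  # 690
print(recovery_lun_gb(80))            # 120.0 (80GB per SG is an inferred value)
```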

Logs LUN
IOPS Sizing

To calculate the number of IOPS for the Logs LUN I used the widely accepted figure of 20% of the Database LUN IOPS figure which gives a result of 122 IOPS (rounded)

Because the log drive simply performs sequential writes and no data is read, the controller cache should be set to 100% write caching.

The final statement for the Logs LUN sizing, including a 20% safety factor, reads as follows:

The Exchange cluster will create 146 IOPS and requires a write time with latency less than 10ms.
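The same calculation as a small Python sketch, with illustrative names. It takes the 610 IOPS database figure as input and applies the 20% ratio and then the 20% safety factor, rounding to whole IOPS at each step as the text does.

```python
def log_lun_iops(db_lun_iops=610, log_ratio=0.20, safety_factor=0.20):
    """Return (base_iops, iops_with_safety) for the Logs LUN."""
    base = round(db_lun_iops * log_ratio)            # 610 * 0.2 = 122
    with_safety = round(base * (1 + safety_factor))  # 146.4 -> 146
    return base, with_safety

print(log_lun_iops())  # (122, 146)
```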

Capacity Sizing

To calculate the log LUN sizing information I used the following steps:

Users in the average usage profile send and receive a total of approximately 50 emails a day. The average size of an email is taken as 50KB, which equates to 2.5MB of email per user per day. I then multiplied 2.5MB by the number of users (1980) to estimate that there will be 4950MB of email/log data per day. To allow for backup failures I have allowed for 4 times the required daily log space (approximately 20GB).

On top of the figures above, because we will migrate up to 100 users per day, we should allow another 20GB for migration transaction logs, again multiplied by 4 to allow for backup failures (80GB).

The total space requirement for the logs LUN is 100GB (rounded)
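A short Python sketch of the log capacity arithmetic follows, with illustrative names and 1GB taken as 1000MB. It assumes the 100GB total is made up of 4 days of normal logs plus 4 days of migration logs at 20GB per day, which is the reading that matches the rounded figure.

```python
def log_lun_capacity_gb(users=1980, messages_per_day=50, message_kb=50,
                        backup_failure_multiplier=4, migration_log_gb=20):
    """Return (daily_log_mb, total_gb) for the Logs LUN, before rounding."""
    per_user_mb = messages_per_day * message_kb / 1000   # 2.5MB per user per day
    daily_log_mb = per_user_mb * users                   # 4950MB per day
    normal_gb = daily_log_mb / 1000 * backup_failure_multiplier   # 19.8GB
    migration_gb = migration_log_gb * backup_failure_multiplier   # 80GB
    return daily_log_mb, round(normal_gb + migration_gb, 1)

daily_mb, total_gb = log_lun_capacity_gb()
print(daily_mb, total_gb)  # 4950.0 99.8
```

The unrounded 99.8GB is what the text rounds to 100GB.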

Other Sizing Factors

There are a few other factors which play a part in the actual requirements for the storage. These are outlined below.

Depending on the RAID level used, there is a penalty factor which must be considered when specifying the number of disks required. The IOPS figures stated in the previous sections are what the Exchange cluster servers generate. However, the RAID level affects the actual number of IOPS the controller and disks have to cope with.

RAID10

When using RAID10, the calculations are as follows:

DB IOPS = 610

1:1 Read/Write Ratio

Therefore 305 IOPS are Reads and 305 IOPS are Writes (rounded)

When reading data from the disks, this is done only once. However, when writing, because of the mirroring, the write must happen twice. RAID10 therefore has a 2:1 Write/Read penalty.

Therefore when we put in the figures above, we now have the following IOPS requirements at the SAN level:

Writes (305 * 2) + Reads 305 = 915

RAID5

When using RAID5, the calculations are as follows:

DB IOPS = 610

1:1 Read/Write Ratio

Therefore 305 IOPS are Reads and 305 IOPS are Writes (rounded)

When reading data from the disks, this is done only once. However, when writing, the calculation of parity means the following happens:

First a disk block read occurs, then a parity read occurs, next a disk block is written and then a parity write occurs. RAID5 therefore has a 4:1 Write/Read penalty.

Therefore when we put in the figures above, we now have the following IOPS requirements at the SAN level:

Writes (305 * 4) + Reads 305 = 1525

In view of the above, it is much more efficient to use RAID10 than RAID5.
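The RAID comparison can be reproduced with a few lines of Python. This is only a sketch with illustrative names: it splits the 610 host IOPS using the 1:1 read/write ratio and multiplies the write half by the penalty for each RAID level.

```python
def backend_iops(host_iops, write_penalty, write_fraction=0.5):
    """IOPS the disks must service once the RAID write penalty is applied."""
    writes = host_iops * write_fraction
    reads = host_iops - writes
    return round(reads + writes * write_penalty)

# Write penalties as described above: 2 for RAID10, 4 for RAID5
RAID_WRITE_PENALTY = {"RAID10": 2, "RAID5": 4}

for level, penalty in RAID_WRITE_PENALTY.items():
    print(level, backend_iops(610, penalty))
# RAID10 915
# RAID5 1525
```

Adjusting `write_fraction` shows how quickly the gap between the two RAID levels widens as the workload becomes more write-heavy.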

Disk Layout

One final consideration is the layout of the physical disks providing the storage. Exchange log files and database files are accessed in entirely different ways. The log files receive continuous sequential writes, whereas the database files are randomly written to and read from. Therefore, to optimise the drives underlying the LUNs, it is important to separate the disks which provide log storage from those which provide database storage.

I therefore propose two sets of disks, both configured as RAID10. The Logs set of disks would have a single LUN and the Database set of disks would be configured with two LUNs, one for databases and one for recovery.

Note: This configuration of using a single LUN with all the DBs in subfolders is only an option when using API-based streaming backup. If VSS backup were used then each DB would need to be snapped separately and therefore separate LUNs should be configured.

Summary

The Exchange cluster will create 610 IOPS on the DB LUN and requires a read time with latency less than 20ms.

The Exchange cluster will create 146 IOPS on the Logs LUN and requires a write time with latency less than 10ms.

The total space requirement for the logs LUN is 100GB (rounded)

The total space requirement for the DB + RSG LUNs is 810GB (rounded)

Two physical sets of disks are required, both configured as RAID10. The Logs set of disks would have a single LUN and the Database set of disks would be configured with two LUNs, one for databases and one for recovery.

Drives should be configured as in the table below:

Drive Letter | Physical Disk Spindle Set | LUN | Purpose  | Disk Size | RAID level        | Cache
L            | Set 1                     | 1   | Logs     | 100GB     | RAID10            | 100% Write
S            | Set 2                     | 2   | DBs      | 690GB     | RAID10            | 75% Write / 25% Read
R            | Set 2                     | 3   | Recovery | 120GB     | RAID5 (or RAID10) | 75% Write / 25% Read

Finally, once the build has been completed you should run tests using Jetstress and Loadgen to ensure that the solution will perform adequately.

Hope this helps someone!

Cheers

Nathan