Author |
Topic |
davidbenoit
Starting Member
4 Posts |
Posted - 2009-04-08 : 17:34:22
|
We are experimenting with the following configuration:
- SQL Server 2008 64-bit RTM
- Windows Server 2008 64-bit
- VMware ESX dedicated host with 4 quad-core processors
- 32 GB of memory, 29 GB given to the DB instance
- SAN storage on dedicated disks (a less-than-optimal drive configuration, but see below)

What we have done is move a single database from our SQL 2005 / 32-bit Windows 2003 non-VM configuration, for which we had solid performance metrics, to the configuration described above, to see what performance we could get. The drive configuration is a 4-disk RAID 5 array. I know this is less than optimal and will not be what we have in production; the point I want to be clear on, though, is that this is the same volume we were using on the 32-bit system. With this database on the new platform we are seeing less write throughput to disk than we saw on 32-bit, and interestingly we are not getting the high Avg. Disk Queue Lengths that we got on the 32-bit platform (which would normally be good IF we were also getting better write throughput). This actually makes me suspect a CPU bottleneck. To add some information: we tried row and page compression on a couple of the tables we bulk insert into, thinking we could get improved insert times, but we actually saw a decrease in write performance. We made a change to the ESX host configuration and got some improved performance, but only when the tables were not compressed, and still not at the level we were seeing on 32-bit. The fact that we had to uncompress the tables to see the better performance after the ESX configuration change again makes me think we are CPU bottlenecked. Lastly, the signal wait times on the new box are high (~19-20%), again pointing to CPU, yet overall processor utilization looks fine.
I'm not sure whether the signal waits are high simply because this is a VM; that's another question for those with experience in this type of architecture. So, I am hoping that someone has seen something similar on VMware. I know we have introduced a lot of variables into the equation, which makes troubleshooting hard, but any thoughts you have would be GREATLY appreciated. Thanks in advance. David |
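For reference, the signal-wait percentage mentioned above is usually derived from SQL Server's sys.dm_os_wait_stats by dividing total signal wait time by total wait time. A minimal Python sketch of that arithmetic (the DMV and its columns are real; the sample numbers here are hypothetical, picked to land near the ~19-20% figure described):

```python
def signal_wait_pct(wait_stats):
    """wait_stats: iterable of (wait_time_ms, signal_wait_time_ms) pairs,
    e.g. pulled from sys.dm_os_wait_stats. Signal wait is time a thread
    spends on the runnable queue waiting for a scheduler, so a high
    ratio suggests CPU pressure rather than I/O waits."""
    total = sum(w for w, _ in wait_stats)
    signal = sum(s for _, s in wait_stats)
    return 100.0 * signal / total if total else 0.0

# hypothetical sample rows: (wait_time_ms, signal_wait_time_ms)
sample = [(120_000, 30_000), (80_000, 9_000)]
# signal_wait_pct(sample) -> 19.5
```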
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2009-04-08 : 22:33:00
|
Do you have any actual numbers for the disk metrics (MB/s, reads/s, writes/s)? On the SAN hardware, are you using fiber? Is the VMware host using the same controller configuration as the original SQL Server? Cache settings the same? Fiber/duplex settings the same? Check these first for any differences. Is the SAN shared with another server or other disk activity, compared to the original server? Assuming the hardware settings are the same, find out whether VMware has its own caching subsystem on top of the SAN hardware cache. I would imagine there's an extra layer somewhere that might flush to disk more frequently, which would likely explain the reduction in your queue lengths but also reduce total throughput per unit time. Or a different balance of read vs. write cache. The SQL Server I/O subsystem uses a number of close-to-the-metal features like scatter/gather that may be hindered by the VM layer. I'm pretty sure VMware has configuration settings that should allow these to pass through more directly to the hardware. |
 |
|
davidbenoit
Starting Member
4 Posts |
Posted - 2009-04-09 : 07:36:16
|
Thanks Rob! It is the same SAN and the same drives (volume), and the disk is not shared, so that makes it a bit easier: I know the only load on these disks is from this server. It is different controllers, so I will have those configurations checked out and do some investigation into whether ESX Server uses another caching layer. The recommendations are truly appreciated, so if you have other thoughts that would be great! David |
 |
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2009-04-09 : 08:04:53
|
I'm not well versed in VMware ESX; our corporate IT knows it better. I'm just assuming that VMware has its own performance settings that may be the cause; they may not be. I do find a lot of cases where SQL Server doesn't play well with other components, and I know it's written with the expectation that it will be king of the hill on the server. |
 |
|
davidbenoit
Starting Member
4 Posts |
Posted - 2009-04-09 : 23:15:12
|
Thanks Rob! I will look at this and let you know what I find. David |
 |
|
nwalter
Starting Member
39 Posts |
Posted - 2009-04-24 : 11:50:54
|
Are the LUNs presented as raw disks to the guest SQL Server? Or are the LUNs presented to VMware, which formats them for VMFS and then creates VMDKs that are presented to the guest SQL Server? If the latter, there's pretty much nothing you can do about the performance side of things: writing a SQL file into another file to finally get it onto the disk is going to add overhead and latency no matter how it's configured. One thing you might want to check to make it more bearable is that the VMFS, VMDK, and NTFS partitions are all using the same allocation size. If, for instance, the NTFS partition uses 64K blocks but the LUN has been formatted with 32K blocks, then each block the OS and SQL think they are writing has to be broken into two for the disk to actually write the data, which doubles the number of writes to disk. If at all possible you want to avoid putting your SQL databases inside VMDK disks and use raw LUNs instead. Of course, then your VMware administrators will whine about not being able to use VMotion, so the business (NOT the VMware admins) needs to decide whether the cost savings are worth the performance trade-off. Remember, despite the hype and VMware marketing, not everything is appropriate to virtualize, and database servers top the list of inappropriate VMs. |
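The allocation-size point can be made concrete with a little arithmetic. A minimal sketch, illustrative only (real storage stacks and controller caches may coalesce or split writes differently):

```python
import math

def physical_block_writes(write_bytes, fs_block=64 * 1024, lun_block=32 * 1024):
    """Estimate physical-block writes for one logical write when the
    filesystem allocation unit (fs_block) differs from the block size
    the LUN was formatted with (lun_block). Illustrative arithmetic
    only; real controllers may coalesce writes."""
    fs_blocks = math.ceil(write_bytes / fs_block)   # logical blocks touched
    split = math.ceil(fs_block / lun_block)         # splits per logical block
    return fs_blocks * split

# 64K NTFS blocks over a 32K-formatted LUN: every write is split in two,
# doubling the physical write count, exactly as described above.
```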
 |
|
davidbenoit
Starting Member
4 Posts |
Posted - 2009-04-25 : 03:06:43
|
Thanks for that input. Since we can use VMotion, my guess is that we are not using the optimal configuration you recommend. The volume that was presented had previously been presented to a standard Windows server, with no reconfiguration to support the VM setup. At this point we have moved to a physical server for some side-by-side testing. I will update the post once we get some concrete information. Thanks again. David |
 |
|
BrianKl
Starting Member
3 Posts |
Posted - 2009-09-28 : 08:20:33
|
quote: Originally posted by davidbenoit Thanks for that input. Being that we can use VMotion my guess is that we are not using the optimal configuration you recommend. The volume that was presented was previously presented to a standard windows server with no reconfiguration to support the VM configuration. At this point we have moved to a physical server for some side by side testing. I will update the post once we get some concrete information. Thanks again.David
Hi. Any news on this? I've done some testing with a client of ours, and it would seem that Windows Server 2008 is the culprit. We have a very simple test that creates a table, inserts 100,000 rows, updates just under 1,300 rows, and then reads all the rows. We have compared:
SQL 2005 / Server 2003
SQL 2008 / Server 2003
SQL 2005 / Server 2008
SQL 2008 / Server 2008
We've tried physical boxes, ESX virtual machines, VMware Workstation virtual machines(!), etc. The result is clear: whenever Server 2008 is involved, performance for insert operations drops dramatically. My 512 MB VMware Workstation virtual machine running SQL 2008 on Server 2003, off my laptop's internal drive, was more than twice as fast as a physical machine with data files on a SAN. Go figure... So far I haven't been able to figure anything out. Any clues? Kind regards, Brian Klausen |
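The shape of that test (create, bulk insert, partial update, full read) can be sketched so others can reproduce something comparable. This uses the stdlib sqlite3 module purely as a stand-in engine so the sketch runs anywhere; the row counts match the description above, but timings against real SQL Server obviously require the real engine and storage path:

```python
import sqlite3
import time

def run_benchmark(rows=100_000, updates=1_300):
    """Create a table, bulk insert `rows` rows, update the first
    `updates` rows, then read everything back. Returns the number of
    rows read and the elapsed time in seconds."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
    start = time.perf_counter()
    cur.executemany(
        "INSERT INTO t (id, payload) VALUES (?, ?)",
        ((i, f"row-{i}") for i in range(rows)),
    )
    cur.execute("UPDATE t SET payload = 'updated' WHERE id < ?", (updates,))
    n_read = len(cur.execute("SELECT * FROM t").fetchall())
    conn.commit()
    conn.close()
    return n_read, time.perf_counter() - start
```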
 |
|
tripodal
Constraint Violating Yak Guru
259 Posts |
Posted - 2009-09-29 : 12:55:53
|
Is SQL using the entire available physical IO? I really like disk idle time and transfers per second in Performance Monitor for tracking this down. |
 |
|
BrianKl
Starting Member
3 Posts |
Posted - 2009-09-30 : 01:54:54
|
quote: Originally posted by tripodal Is SQL using the entire available physical IO?I really like disk idle time and transfers per second in performance monitor to track this down.
Well - I haven't measured those. But I would find it safe to say that when my virtual machine (Server 2003 / SQL 2008), sitting on the 7200 rpm drive of my Windows 7 laptop, can do it in 45-50 seconds, and the quad-core server with six 15,000 rpm drives (Server 2008 / SQL 2005) does it in just under 7 minutes, then raw IOs aren't the problem as such. I actually did measure disk queuing on the Server 2008 box, and only the log drive showed any activity (which makes sense). However, it rose from 0 to 1, with a very occasional spike to 2. I have 2003/2005 combos where I can get over 20 on disk queuing... Thanks for the input, though - but it would seem to me that Server 2008 has some serious performance issues with files. Through my research I have found other people having issues using Server 2008 for file servers. |
 |
|
YellowBug
Aged Yak Warrior
616 Posts |
Posted - 2009-09-30 : 14:59:34
|
quote: then raw IO's arent the problem as such.
I'm currently on the vSphere Fast Track course, and the instructor advises VMDKs for SQL Server, with RDM only for clusters using MSCS. Here's the theory: "An RDM is recommended when a virtual machine must interact with a real disk on the SAN. An RDM is usually used when you want to cluster a virtual machine with a physical machine using Microsoft Cluster Service." |
 |
|
BrianKl
Starting Member
3 Posts |
Posted - 2009-10-06 : 06:44:02
|
A slight update on my part here. I was with the customer and had the opportunity to run SQLIO against the same SAN through various hardware and virtual machines, and to compare with the fast 2003/2005 combo they had. Long story short, it wasn't exactly Server 2008 that performed badly. Regardless of whether it was physical or virtual, it had about the same write performance against the SAN, and it was about equal to similar 2003 servers. The reason for the perceived bad performance was that the 2003/2005 combo had internal storage and a more than efficient storage controller with a large amount of cache. The amount of test data was less than the cache, so we were actually testing cache performance instead of disk performance. Performance on the 2008/2008 servers was right on par with what the disk system could deliver :-). |
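That pitfall (benchmarking the controller cache instead of the spindles) is easy to sanity-check up front. A small sketch; the 4x sizing factor is a rule-of-thumb assumption of mine, not from any SQLIO or vendor documentation:

```python
def fits_in_cache(working_set_bytes, cache_bytes):
    """True when the benchmark's working set fits inside the storage
    controller's cache - in which case the timings mostly reflect
    cache speed, not disk speed."""
    return working_set_bytes <= cache_bytes

def min_test_file_bytes(cache_bytes, factor=4):
    """Size the test file several times larger than the controller
    cache so sustained results reflect the disks. The factor is an
    assumption, not a vendor recommendation."""
    return cache_bytes * factor

GB = 1024 ** 3
# e.g. a 500 MB test set against a 1 GB controller cache mostly
# measures the cache, which matches what was observed above.
```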
 |
|
|