Author |
Topic |
influent
Constraint Violating Yak Guru
367 Posts |
Posted - 2010-11-22 : 18:49:05
|
All of a sudden yesterday my SQL 2008 (10.0.2531) instance started having query timeout issues and I have no idea what changed. I monitored the server with Perfmon for a while and didn't see anything unusual. I've been watching Activity Monitor all day today and randomly (but very often now) a few processes will show up and be listed multiple times (e.g. Session ID 58 shows 6 times with the same info), all with a state of SUSPENDED, all blocking themselves, not each other (i.e. SPID 58 is Blocked By SPID 58). I've tried using Profiler but haven't learned anything from it. This is a major problem, can anybody help me? |
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2010-11-22 : 18:53:54
|
Ignore the blocking when it's to themselves. It isn't a problem.Run a trace and capture those that have a duration over a few seconds under your application's query timeout value. For instance, if your timeout value is 30 seconds, add a filter to the trace to look for Duration > 28000 (duration is in milliseconds in Profiler, it's in microseconds on the backend though).Once you've capture the queries, check the execution plans. Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/Subscribe to my blog |
 |
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
influent
Constraint Violating Yak Guru
367 Posts |
Posted - 2010-11-22 : 19:44:31
|
Great suggestions, but of the 5 queries I ended up with, all of them run in less than 2 seconds on their own, even when the suspended processes are showing up. I'm so confused. Is it somehow possible that we've always had processes showing up as SUSPENDED in Activity Monitor and I just never noticed, and that the problem is with our web servers? That seems very unlikely since I've used Activity Monitor several times with this server... |
 |
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2010-11-22 : 19:49:56
|
Check the wait stats. If the query is running, suspended just indicates it's waiting on something. Running the queries in SSMS is not the same thing as running it from the application. You'd have to specify the exact connection settings in order to duplicate it. Due to this, I use the showplan xml event in Profiler to grab the actual execution plans. You can't use a duration filter though. You can also get the execution plans from the DMVs if you are quick enough.Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/Subscribe to my blog |
 |
|
influent
Constraint Violating Yak Guru
367 Posts |
Posted - 2010-11-23 : 20:23:23
|
I think I may have made an important discovery. The drive that holds the backups filled up 5 days ago, so we weren't getting backups, including transaction log backups. The log file is up to 57 GB, even though the mdf and log are usually around 3 GB each. I just created a transaction log backup with "truncate the transaction log" selected but the log is still 56 GB. ??? Is there a good/useful way to analyze the log? |
 |
|
chadmat
The Chadinator
1974 Posts |
Posted - 2010-11-23 : 20:32:27
|
That is irrelevant to your performance problems. On another note, if this is a production system, you should NOT be running profiler on it. You can run a server side trace, or better yet, you can get the info you are looking for by querying sys.dm_exec_query_stats and cross applying it with sys.dm_exec_query_plan(plan_handle).Truncating the Log doesn not shrink the log, you have to do a shrink file to reclaim that space.-Chad |
 |
|
influent
Constraint Violating Yak Guru
367 Posts |
|
chadmat
The Chadinator
1974 Posts |
Posted - 2010-11-23 : 21:50:09
|
Absolutely, you should shrink the file. However, the size has no bearing on the performance of your queries whether you believe it or not. The log growths may have impacted performance during the time when the growths were actually happening, but a very large T-Log file would have no ill effects on the performance of the server. -Chad |
 |
|
influent
Constraint Violating Yak Guru
367 Posts |
Posted - 2010-11-24 : 12:20:15
|
Replication was the problem with the log file size. Using sp_repldone cleared out some stuck transactions, which got the log from 57 GB down to 30 GB. It still said 87% of the space was free in the log, and then it shrunk down to 9 GB using shrinkfile. Once again it said 87% free, so I disabled replication. Then it shrunk to 8 GB with shrinkfile. Then I tried again, just for kicks, and it shrank down to nothing. ?? Note that I was always creating transaction log backups prior to shrinkfile. Anyway, replication is enabled again and has nothing to do with the SUSPENDED queries I'm still seeing in Activity Monitor. But nobody is experiencing timeouts anymore... |
 |
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
influent
Constraint Violating Yak Guru
367 Posts |
Posted - 2010-11-24 : 19:40:00
|
I don't understand how to read the wait stats, but if I add OPTION(MAXDOP 1) to the end of the sproc that seems to always show up as suspended, even though it runs more slowly (1500ms instead of 300ms), it doesn't show up in Activity Monitor as suspended. What are the drawbacks to MAXDOP? |
 |
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
Kristen
Test
22859 Posts |
Posted - 2010-11-25 : 14:41:21
|
WE have a similar issue when our server goes over some tipping point. I can;t put my finger on what is causing it, but the net effect is that there are a queue of people outside waiting for web pages, and impatiently pressing REFRESH which makes things worse for SQL.The tipping point for us is 500 TCP connections, but I have no idea what this translates to in the associated real world but it does seem to be a metric that impacts us. |
 |
|
chadmat
The Chadinator
1974 Posts |
Posted - 2010-11-27 : 04:15:27
|
Is the Wait_Type CXPacket? -Chad |
 |
|
influent
Constraint Violating Yak Guru
367 Posts |
Posted - 2010-11-29 : 15:21:41
|
Not sure who you're asking Chad but my wait types are CXPACKET. FWIW, we haven't had any timeout issues since the log file went back down to a normal size. |
 |
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
chadmat
The Chadinator
1974 Posts |
Posted - 2010-11-29 : 15:28:05
|
CXPacket isn't necessarily a bad thing, it just means you have a parallel query running, and the threads are waiting to sync. -Chad |
 |
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
|