I am seeing a very troubling situation on our QA server. Every couple of days, our QA WebLogic cluster will fail with every thread blocked in DefaultScriptSessionManager. I have 50 threads, and here's how it seems to wind up:
1 thread is blocked here:
"[STUCK] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock java.lang.Object@164b23e BLOCKED org.directwebremoting.impl.DefaultScriptSessionManager.invalidate(DefaultScriptSessionManager.java:125)
1 thread is blocked here (blocked, I believe, on the thread above):
"[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock java.lang.Object@70207c BLOCKED
And 48 threads are blocked here:
"[STUCK] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock java.lang.Object@164b23e BLOCKED
As you can see, both the invalidate() call and the checkTimeouts() call are blocking on the same Java object. Looking at the code, both checkTimeouts() and invalidate() synchronize on the same sessionLock member variable. This is highly problematic because the call to invalidate() happens after the synchronized block in checkTimeouts(). Here's what happens (by the way, I think this is only a problem if you have multiple in-flight DWR requests for the same user):
Thread A calls checkTimeouts(), makes it through the synchronized block, and calls invalidate() on a session. That grabs the invalidLock on the session. Thread A then gets preempted.
Thread B calls checkTimeouts(), takes the sessionLock, and then calls isInvalidated() on the same session that Thread A was working on. Thread B tries to acquire the invalidLock but can't, since Thread A holds it.
Thread A resumes and runs the second line of invalidate(), which is manager.invalidate(). That method tries to grab the sessionLock but cannot, since Thread B is holding it while waiting for the invalidLock held by Thread A.
And deadlock ensues.
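The lock-order inversion described above can be sketched in isolation. This is only an illustration, not DWR code: LockOrderDemo, attempt() and reproducesCycle() are hypothetical names, and ReentrantLock with a timed tryLock() stands in for the synchronized blocks so that the demo reports the cycle instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class, not DWR code.
class LockOrderDemo {

    // Takes `first`, waits until the other thread holds its own first lock,
    // then tries `second` with a timeout. Returns true only if `second` was acquired.
    static boolean attempt(ReentrantLock first, ReentrantLock second,
                           CountDownLatch bothHoldFirst, CountDownLatch bothTried) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean got = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (got) {
                second.unlock();
            }
            bothTried.countDown();
            bothTried.await(); // keep holding `first` until both attempts are finished
            return got;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    // Returns true if neither thread could get its second lock, i.e. the
    // same cycle that blocks forever when plain synchronized blocks are used.
    static boolean reproducesCycle() {
        ReentrantLock sessionLock = new ReentrantLock(); // stand-in for the manager's lock
        ReentrantLock invalidLock = new ReentrantLock(); // stand-in for the session's lock
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] gotSecond = new boolean[2];

        // Thread A: invalidLock first, then sessionLock (the invalidate() path).
        Thread a = new Thread(() ->
                gotSecond[0] = attempt(invalidLock, sessionLock, bothHoldFirst, bothTried));
        // Thread B: sessionLock first, then invalidLock (the checkTimeouts() path).
        Thread b = new Thread(() ->
                gotSecond[1] = attempt(sessionLock, invalidLock, bothHoldFirst, bothTried));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !gotSecond[0] && !gotSecond[1];
    }

    public static void main(String[] args) {
        System.out.println("cycle reproduced: " + reproducesCycle());
    }
}
```

With synchronized blocks there is no timeout, so the two threads would wait on each other indefinitely, which matches the two differently-locked BLOCKED threads in the dump.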
The easiest thing I can see to do is to move the for loop at the end of checkTimeouts() inside the synchronized block. That way checkTimeouts() would always take the sessionLock before any invalidLock, so it'd be guaranteed not to deadlock.
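A minimal sketch of that change, assuming a much-simplified manager: SketchSessionManager, its nested Session class, and the create()/size() helpers are hypothetical and only mirror the sessionLock/invalidLock relationship described in this report, not the real DefaultScriptSessionManager.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the session manager; not DWR code.
class SketchSessionManager {
    private final Object sessionLock = new Object();
    private final Map<String, Session> sessions = new HashMap<>();
    private final long timeoutMillis;

    SketchSessionManager(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    Session create(String id, long now) {
        synchronized (sessionLock) {
            Session s = new Session(id, now, this);
            sessions.put(id, s);
            return s;
        }
    }

    void checkTimeouts(long now) {
        synchronized (sessionLock) {
            List<Session> expired = new ArrayList<>();
            for (Session s : sessions.values()) {
                if (!s.isInvalidated() && now - s.lastAccessed > timeoutMillis) {
                    expired.add(s);
                }
            }
            // The invalidation loop now runs INSIDE the synchronized block, so this
            // path always takes sessionLock first and invalidLock second.
            for (Session s : expired) {
                s.invalidate();
            }
        }
    }

    // Called back from Session.invalidate(). Java monitors are reentrant, so
    // re-entering sessionLock from checkTimeouts() on the same thread is fine.
    void invalidate(Session s) {
        synchronized (sessionLock) {
            sessions.remove(s.id);
        }
    }

    int size() {
        synchronized (sessionLock) {
            return sessions.size();
        }
    }

    static class Session {
        final String id;
        final long lastAccessed;
        private final SketchSessionManager manager;
        private final Object invalidLock = new Object();
        private boolean invalidated;

        Session(String id, long lastAccessed, SketchSessionManager manager) {
            this.id = id;
            this.lastAccessed = lastAccessed;
            this.manager = manager;
        }

        boolean isInvalidated() {
            synchronized (invalidLock) {
                return invalidated;
            }
        }

        void invalidate() {
            synchronized (invalidLock) {
                invalidated = true;
                manager.invalidate(this); // second step, still holding invalidLock
            }
        }
    }
}
```

One caveat with this sketch: it only fixes the ordering on the checkTimeouts() path. Any other caller that invokes invalidate() on a session without already holding the sessionLock would still take the two locks in the opposite order.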
Thanks, Rob. In 2.x we cannot use ConcurrentMaps. For 3.x we require more recent versions of Java, and we have already refactored the class in question to take advantage of the Java 1.5+ concurrency features.
We can fix this for you in 2.x, though. Can you create a separate issue for us? Thanks.
I am aware of that library. The problem is that adding it requires all pre-1.5 users to have it in their runtime, and we have always tried to run with as few dependencies as possible.
Per David's request I've created DWR-536.