Suspend/resume not working as expected


Adrian Miron
This issue was discovered while doing some work with Hibernate, but it boils down to the following scenario:

1. start a transaction
2. create a prepared statement (and keep a reference to it for later use)
3. execute the statement
4. suspend the transaction
5. resume the transaction
6. execute the prepared statement a second time

While debugging I saw that suspend calls XAResource.end(TMSUCCESS) for all resources in the transaction, but resume does not start them again (probably because they are lazily started with TMJOIN later, when they are next used via JdbcConnectionHandle).

But in this case, the second execution of that prepared statement is not “intercepted” by Bitronix, so it executes on a connection that was previously ended. From my observations, results vary: sometimes the statement execution just blocks, other times it succeeds but runs in a new transaction.
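To make the failure mode concrete, here is a toy model in plain Java. It is not Bitronix code: `ToyResource`, `ToyConnectionHandle` and `ToyStatement` are hypothetical stand-ins, and the only assumption carried over from the description above is that enlistment is triggered by the connection handle and not by statements already created from it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an XA resource: it only tracks whether it is started. */
class ToyResource {
    boolean started = false;
    final List<String> log = new ArrayList<>();

    void start() { started = true; log.add("start"); }
    void end()   { started = false; log.add("end"); }
}

/** Hypothetical statement: talks to the resource directly, like a raw driver statement. */
class ToyStatement {
    private final ToyResource resource;
    ToyStatement(ToyResource resource) { this.resource = resource; }

    /** Returns true if the work ran while the resource was started (enlisted). */
    boolean execute() {
        resource.log.add(resource.started ? "execute in tx" : "execute OUTSIDE tx");
        return resource.started;
    }
}

/** Hypothetical connection handle: the only place where lazy enlistment happens. */
class ToyConnectionHandle {
    private final ToyResource resource;
    ToyConnectionHandle(ToyResource resource) { this.resource = resource; }

    ToyStatement prepareStatement() {
        if (!resource.started) {
            resource.start(); // lazy enlistment on first use of the handle
        }
        return new ToyStatement(resource);
    }
}

class LazyEnlistmentDemo {
    /** Runs steps 1-6 of the scenario and returns the outcome of both executions. */
    static boolean[] run(ToyResource resource) {
        ToyConnectionHandle handle = new ToyConnectionHandle(resource);
        ToyStatement statement = handle.prepareStatement(); // steps 1-2
        boolean first = statement.execute();                // step 3
        resource.end();                                     // step 4: suspend ends the resource
        // step 5: in this model, resume does not restart the resource
        boolean second = statement.execute();               // step 6: bypasses the handle
        return new boolean[] { first, second };
    }

    public static void main(String[] args) {
        ToyResource resource = new ToyResource();
        run(resource);
        System.out.println(resource.log);
    }
}
```

In this model the second `execute()` runs while the resource is not started, because nothing between the cached statement and the resource gets a chance to re-enlist it.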

I took a wild guess :) and added the following line:
xaResourceHolderState.start(XAResource.TMJOIN);
in XAResourceManager#resume(), right before the end of the loop, and things seem to work, but I really don’t know if it’s the right thing to do.


I'm using:
Bitronix 1.3.2
MSSQL 2005
MSSQL JDBC driver v2.0

Re: Suspend/resume not working as expected

Ludovic Orban
Administrator
Hi,

Strictly speaking this isn't a bug but a limitation of the current connection pool, but I agree the result is the same: it does not work as you expect it to.

Your patch will only work in this exact situation, and only if the underlying database properly supports transaction joining. I've searched for the best way to cope with that problem, and the only proper way is to wrap the driver's Statement / PreparedStatement / CallableStatement objects and trigger enlistment from there, which is quite a serious change.
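As a rough illustration of the wrapping idea (not the actual Bitronix implementation), a JDK dynamic proxy can sit in front of a driver's PreparedStatement and run an enlistment callback before every `execute*` call. The class `StatementWrapping`, the `enlist` callback and the stub driver statement below are all hypothetical; a real callback would have to be safe to call repeatedly.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class StatementWrapping {
    /** Wraps a statement so that every execute* call first runs the enlistment callback. */
    static PreparedStatement wrap(PreparedStatement delegate, Runnable enlist) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().startsWith("execute")) {
                enlist.run(); // hypothetical hook: (re-)enlist the resource if needed
            }
            try {
                return method.invoke(delegate, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the driver's own exception
            }
        };
        return (PreparedStatement) Proxy.newProxyInstance(
                StatementWrapping.class.getClassLoader(),
                new Class<?>[] { PreparedStatement.class },
                handler);
    }

    /** Stub standing in for a driver statement; only boolean-returning calls are supported. */
    static PreparedStatement stubStatement(List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add("driver:" + method.getName());
            return method.getReturnType() == boolean.class ? Boolean.TRUE : null;
        };
        return (PreparedStatement) Proxy.newProxyInstance(
                StatementWrapping.class.getClassLoader(),
                new Class<?>[] { PreparedStatement.class },
                handler);
    }

    /** Executes a wrapped stub twice and returns the recorded call order. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        PreparedStatement statement = wrap(stubStatement(log), () -> log.add("enlist"));
        try {
            statement.execute();
            statement.execute(); // a cached statement still triggers the callback
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the design is that a statement kept across suspend/resume can no longer reach the driver without passing through the enlistment hook. The cost is that every Statement, PreparedStatement and CallableStatement handed out by the pool has to be wrapped.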

A simpler (but far less elegant) fix would be for XAResourceManager#resume() to re-enlist the resource. This isn't as easy as it sounds, as some logic to check whether TMJOIN is disabled must be added and some internal safeguards must be relaxed.

I'll try to build a patched version with those changes over the weekend. I'd be glad if you could open an issue in JIRA in the meantime.

Thanks for the report,
Ludovic

Re: Suspend/resume not working as expected

Adrian Miron
Thank you for your response.

I’ve added the following JIRA issue: http://jira.codehaus.org/browse/BTM-49


Thanks,
Adrian Miron

Re: Suspend/resume not working as expected

Ludovic Orban
Administrator
I've finally found the time to work on this issue. I've committed the fix in the SVN trunk and prepared a snapshot build (see link in the JIRA issue).

Could you please give it a try and let me know if it helped?

Thanks,
Ludovic

Re: Suspend/resume not working as expected

Adrian Miron
Sorry for my delayed response; I was finally able to try the fix.

I did find a problem:

The XAResourceManager#resume method iterates over resources (of type Scheduler), and at the same time the enlist method removes objects from and adds objects to the resources collection (a case of concurrent modification).

In my particular case, I had two resources to be resumed and only one of them got re-enlisted. The iterator in the resume method returned the same object twice.
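The symptom can be reproduced with a toy model. This sketch uses a plain `ArrayList` walked by position instead of Bitronix's Scheduler, and a hypothetical `enlist` that removes and re-adds the resource, so it only illustrates the pattern and does not reflect the real code. One way to avoid the problem is to collect the resources first and enlist them outside the loop over the live collection.

```java
import java.util.ArrayList;
import java.util.List;

class ResumeLoopDemo {
    /** Toy enlist: removes and re-adds the resource, as described for the real enlist. */
    static void enlist(List<String> resources, String resource, List<String> enlisted) {
        resources.remove(resource);
        resources.add(resource);
        enlisted.add(resource);
    }

    /** Problematic variant: enlists while walking the live collection. */
    static List<String> resumeInsideLoop(List<String> resources) {
        List<String> enlisted = new ArrayList<>();
        for (int i = 0; i < resources.size(); i++) {
            enlist(resources, resources.get(i), enlisted);
        }
        return enlisted;
    }

    /** Safer variant: take a snapshot first, then enlist outside the live iteration. */
    static List<String> resumeOutsideLoop(List<String> resources) {
        List<String> toEnlist = new ArrayList<>(resources);
        List<String> enlisted = new ArrayList<>();
        for (String resource : toEnlist) {
            enlist(resources, resource, enlisted);
        }
        return enlisted;
    }

    public static void main(String[] args) {
        List<String> first = new ArrayList<>(List.of("A", "B"));
        System.out.println(resumeInsideLoop(first));   // prints [A, A]: B is never enlisted
        List<String> second = new ArrayList<>(List.of("A", "B"));
        System.out.println(resumeOutsideLoop(second)); // prints [A, B]
    }
}
```

With two resources, the first variant visits the same element twice because the element it just processed has been moved to the end of the list, which matches the behaviour reported above.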

Re: Suspend/resume not working as expected

Ludovic Orban
Administrator
Thanks for the feedback.

Would it be possible for you to collect debug logs while reproducing the problems and send me the file?

This would help tremendously in figuring out exactly what you did and what went wrong.

Thanks,
Ludovic

Re: Suspend/resume not working as expected

Ludovic Orban
Administrator
I found a problem in the code for the case where the resource supports join but does not want to join at this time. Maybe this is the problem you hit?

I've prepared a snapshot build with the fix: http://snapshots.repository.codehaus.org/org/codehaus/btm/btm/1.3.3-20090829/

Please try it and let me know if it helped. If not, please send me the debug logs you collected with that snapshot version (not the RC1).

Thanks,
Ludovic

Re: Suspend/resume not working as expected

Adrian Miron
Yes, that was it. Moving the re-enlistment out of the loop did the trick.

Thanks,
Adrian Miron