Initializing JMS PoolingConnectionFactory and multi-threading

polly.c.chang
Hi,

First, I just want to say that I've only used Bitronix for a week, but I already love it.  The code is really clean, well-documented, tested on different databases (!), and it pretty much works right out of the box.  I used Geronimo+Jencks+TranQL+XAPool before, and they were a royal pain!  So I just wanted to say thank you for creating such a nice library.

Now, I discovered something interesting in Bitronix that I thought I should share.  I am using Bitronix to manage XA transactions between a database and JMS.  We integrate everything with Spring and use Spring's JDBC and JMS templates to write data and send messages.  Our code is multi-threaded because speed is very important.

When I integrated Bitronix, I unknowingly left the JMS connection pool with its default behavior, which is to initialize itself on the first request for a connection.  So when multiple threads requested a connection for the first time, they raced in the PoolingConnectionFactory.buildXAPool() method.  Two threads would sometimes enter that method at the same time, and the ResourceRegistrar.register() method would throw an IllegalArgumentException because the other thread had already registered the resource.  My unit tests would then fail because things were left in an inconsistent state.
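
The check-then-act race described above can be sketched in isolation. This is a minimal illustration and not BTM's actual code: `Registry` and `LazyPool` are hypothetical stand-ins for the registrar and the pooling connection factory, and the `synchronized` guard is just one possible way to make lazy initialization safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a resource registry: rejects duplicate names.
class Registry {
    private final Set<String> names = ConcurrentHashMap.newKeySet();

    void register(String uniqueName) {
        if (!names.add(uniqueName)) {
            throw new IllegalArgumentException("already registered: " + uniqueName);
        }
    }
}

// Hypothetical lazily initialized pool. Because getConnection() is
// synchronized, "check the flag, then build" is one atomic step.
// Without the keyword, two threads could both see built == false
// and both call register().
class LazyPool {
    private final Registry registry;
    private final String uniqueName;
    private boolean built = false;

    LazyPool(Registry registry, String uniqueName) {
        this.registry = registry;
        this.uniqueName = uniqueName;
    }

    synchronized String getConnection() {
        if (!built) {
            registry.register(uniqueName); // runs at most once
            built = true;
        }
        return "connection from " + uniqueName;
    }
}

class LazyInitDemo {
    // Releases all callers at the same moment and returns how many failed.
    static int countFailures(int threads) {
        LazyPool pool = new LazyPool(new Registry(), "jms-pool");
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger failures = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                try {
                    start.await();
                    pool.getConnection();
                } catch (InterruptedException | IllegalArgumentException e) {
                    failures.incrementAndGet();
                }
            });
            workers.add(t);
            t.start();
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("failed callers: " + countFailures(8));
    }
}
```

Initializing the pool once while the application is still single-threaded avoids the race without relying on any locking inside the pool.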

Once I figured out what was happening, I read the documentation a little more closely, and I realized that I could use Spring to initialize the JMS PoolingConnectionFactory just like your example for the JDBC PoolingDataSource.  So I configured it this way:

        <bean id="xaJmsConnectionFactory"
                class="bitronix.tm.resource.jms.PoolingConnectionFactory"
                init-method="init" destroy-method="close">
                ...
        </bean>

This does indeed initialize the PoolingConnectionFactory when the Spring context is starting up, and it solves the race condition.  However, I now get a strange exception during startup:

19:00:57,328 ERROR [Recoverer] unable to rollback aborted in-doubt branch on resource ibus2 - error=XAER_NOTA. Forgotten heuristic ?
javax.transaction.xa.XAException
        at progress.message.jimpl.xa.XAResource.rollback(Unknown Source)
        at bitronix.tm.recovery.Recoverer.rollback(Recoverer.java:464)
        at bitronix.tm.recovery.Recoverer.rollbackAbortedBranchesOfResource(Recoverer.java:443)
        at bitronix.tm.recovery.Recoverer.rollbackAbortedTransactions(Recoverer.java:415)
        at bitronix.tm.recovery.Recoverer.run(Recoverer.java:113)
        at bitronix.tm.BitronixTransactionManager.<init>(BitronixTransactionManager.java:47)
        at bitronix.tm.TransactionManagerServices.getTransactionManager(TransactionManagerServices.java:41)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:115)
        at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:387)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:936)
... (stack trace pruned for space)

I've tested this by deleting the transaction logs, so there should be no unfinished transactions that need recovering.  However, Bitronix always seems to think that it needs to do recovery if Spring initializes it.  Bitronix did not do this when I let it initialize upon the first request for a connection.  The good thing is that this error doesn't actually seem to hurt anything.  Bitronix ignores this error and continues on just fine, so it's not bothering me too much.  But I thought that I should let you know and see if there's something else that I'm misconfiguring.

Please let me know what you think.  

Thanks!
--Polly

Re: Initializing JMS PoolingConnectionFactory and multi-threading

Ludovic Orban
Administrator
Hi,

Thanks for your feedback. I'm glad you appreciate the hard work I've put into BTM.

You are actually right, there is a race condition in the PoolingConnectionFactory: it might get initialized multiple times if you don't call init() manually. I'll open a bug report and get it fixed for the next release. BTW, PoolingDataSource suffers from the exact same bug. I'll fix it as well. Thanks for the bug report.

The recovery error you're getting is really strange but as you noticed, deleting the recovery logs won't help. This is something you should never do: recovery logs only contain the answer to this question:

 if resource XYZ has an in-doubt transaction, should that transaction be committed or rolled back?

It is the resource itself that reports having an in-doubt transaction. According to your stack trace, BTM tries to resolve it with a rollback but the resource answers back with a 'this transaction does not exist' error. Strange, isn't it?
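
To make that division of labour concrete, here is a minimal sketch of a recovery pass. It is not BTM's Recoverer: the `Resource` interface and `ResourceException` are hypothetical simplifications of the XA types, transaction ids are plain strings, and treating "not recorded as committed in the log" as "roll back" is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RecoverySketch {
    // Same numeric value as javax.transaction.xa.XAException.XAER_NOTA.
    static final int XAER_NOTA = -4;

    // Hypothetical, simplified exception carrying an XA-style error code.
    static class ResourceException extends Exception {
        final int errorCode;

        ResourceException(int errorCode) {
            super("errorCode=" + errorCode);
            this.errorCode = errorCode;
        }
    }

    // Hypothetical, simplified view of a resource (not the real XAResource).
    interface Resource {
        List<String> recover(); // ids the resource itself reports as in-doubt
        void commit(String id) throws ResourceException;
        void rollback(String id) throws ResourceException;
    }

    // The resource supplies the in-doubt ids; the log only decides
    // between commit and rollback for each of them.
    static List<String> recover(Resource resource, Set<String> committedInLog) {
        List<String> outcome = new ArrayList<>();
        for (String id : resource.recover()) {
            try {
                if (committedInLog.contains(id)) {
                    resource.commit(id);
                    outcome.add(id + ": committed");
                } else {
                    resource.rollback(id);
                    outcome.add(id + ": rolled back");
                }
            } catch (ResourceException e) {
                String reason = e.errorCode == XAER_NOTA ? "XAER_NOTA" : "code " + e.errorCode;
                outcome.add(id + ": failed (" + reason + ")");
            }
        }
        return outcome;
    }

    // A fake resource that reports tx-2 as in-doubt yet rejects its rollback,
    // mimicking the contradictory behaviour seen in the stack trace.
    static Resource inconsistentResource() {
        return new Resource() {
            public List<String> recover() {
                return Arrays.asList("tx-1", "tx-2");
            }

            public void commit(String id) {
            }

            public void rollback(String id) throws ResourceException {
                if (id.equals("tx-2")) {
                    throw new ResourceException(XAER_NOTA);
                }
            }
        };
    }

    public static void main(String[] args) {
        Set<String> log = new HashSet<>(Arrays.asList("tx-1"));
        for (String line : recover(inconsistentResource(), log)) {
            System.out.println(line);
        }
    }
}
```

In this sketch an empty log changes nothing about which transactions are reported: with no entries, every in-doubt id is simply rolled back.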

If I'm not mistaken, you're using SonicMQ. Have you tried to manually force an in-doubt transaction to terminate? I have never tried SonicMQ, so I don't know about its capabilities, its limitations, or its interpretation of the XA spec.

Sending me the BTM debug logs, the exact version of your JMS bus, and a brief description of what you've tried to do might help me find an answer to this question. This will allow me to trace back exactly what is happening and try to reproduce the issue.

Thanks,
Ludovic

Re: Initializing JMS PoolingConnectionFactory and multi-threading

polly.c.chang
Ludovic Orban wrote:
> You are actually right, there is a race condition in the PoolingConnectionFactory: it might get initialized multiple times if you don't call init() manually. I'll open a bug report and get it fixed for the next release. BTW, PoolingDataSource suffers from the exact same bug. I'll fix it as well. Thanks for the bug report.

Great!  Thanks for fixing the race condition for the next release.

Ludovic Orban wrote:
> The recovery error you're getting is really strange but as you noticed, deleting the recovery logs won't help. This is something you should never do: recovery logs only contain the answer to this question:
>
>  if resource XYZ has an in-doubt transaction, should that transaction be committed or rolled back?

Ah, OK.  I won't try to delete the logs anymore.  :)

Ludovic Orban wrote:
> It is the resource itself that reports having an in-doubt transaction. According to your stack trace, BTM tries to resolve it with a rollback but the resource answers back with a 'this transaction does not exist' error. Strange, isn't it?

Yes, especially because this only happens when I use Spring to do the initialization like this:

        <bean id="xaJmsConnectionFactory"
                class="bitronix.tm.resource.jms.PoolingConnectionFactory"
                init-method="init" destroy-method="close">
                ...
        </bean>

As soon as I remove the init-method and destroy-method settings to allow Bitronix to initialize the pool upon the first request, the rollback error does not happen anymore.

Ludovic Orban wrote:
> If I'm not mistaken, you're using SonicMQ. Have you tried to manually force an in-doubt transaction to terminate? I have never tried SonicMQ, so I don't know about its capabilities, its limitations, or its interpretation of the XA spec.
>
> Sending me the BTM debug logs, the exact version of your JMS bus, and a brief description of what you've tried to do might help me find an answer to this question. This will allow me to trace back exactly what is happening and try to reproduce the issue.

Yes, that is correct.  We are using SonicMQ 7.0.1 Build 184.

A basic description is just this:  due to the race condition in init(), I need to use Spring to call init() while my application is starting up because it's single-threaded at that point.  After the application starts up, our application spawns multiple threads to read data from the database and send messages using Spring's JmsTemplate.  The problem is that when I configure Spring to call init(), I get the rollback error.  I will send you via email my debug log files and the Spring configuration file.  Hopefully that will help.

Thanks for looking into this!

--Polly

Re: Initializing JMS PoolingConnectionFactory and multi-threading

Ludovic Orban
Administrator
Hi,

I looked into your logs and it seems my assumption was right: SonicMQ reports an in-doubt transaction during recovery but chokes when instructed to roll it back.

I've found a relevant chapter in the documentation:
 http://www.sonicsoftware.com/products/documentation/docs/mq_config_manage.pdf
 Chapter 16, page 538: XA transactions.

Follow those instructions to check whether you can see a transaction in that table; you should have one. Then force it to be rolled back via that console, and that should fix the problem.

Now I wonder how this transaction got stuck. Maybe you killed the JVM while it was busy sending messages? Then why does SonicMQ not allow the TM to fix it? Maybe there's a timeout on the transaction's lifespan, but that's forbidden by the XA spec.

I don't know for sure what's going on here but I think I've provided you with a way to get rid of that error (I hope so) and I know enough to reproduce it here and figure out exactly what's going on.

Let me know whether you managed to get rid of this error.

Ludovic

Re: Initializing JMS PoolingConnectionFactory and multi-threading

polly.c.chang
Hi Ludovic,

You are right!  The transaction must have been left hanging from when I killed a unit test.  I found the transaction and rolled it back just like you instructed, and the error went away.

Thanks for all your help.  :)

--Polly

