High cpu usage on commit (during delist)

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

High cpu usage on commit (during delist)

Amit
Hello,
        I have been facing a high cpu usage on our application server that uses bitronix transaction manager. Our database server is SQL Server 2012 X64 SP1. The cpu shoots up to 100%. The thread dumps indicate that the application has more than 20 threads stuck at the commit operation (performing the delist operation)

   java.lang.Thread.State: RUNNABLE
        at bitronix.tm.internal.XAResourceManager.findXAResourceHolderState(XAResourceManager.java:173)
        at bitronix.tm.internal.XAResourceManager.delist(XAResourceManager.java:110)
        at bitronix.tm.BitronixTransaction.delistResource(BitronixTransaction.java:104)
        at bitronix.tm.BitronixTransaction.delistUnclosedResources(BitronixTransaction.java:340)
        at bitronix.tm.BitronixTransaction.commit(BitronixTransaction.java:164)
        at bitronix.tm.BitronixTransactionManager.commit(BitronixTransactionManager.java:97)

We are using bitronix version 1.3.3. I understand that its quite a old version but since its a legacy application we do not plan to upgrade the bitronix version and retest it.

I am looking for pointers of where to look into for troubleshooting this further. Any help would be appreciated.
Reply | Threaded
Open this post in threaded view
|

Re: High cpu usage on commit (during delist)

chrisjr
I would seriously consider retesting with 3.0.0-SNAPSHOT from github, as BTM's concurrency has been completely rearchitected since 1.3.3:

http://docs.codehaus.org/display/BTM/BTM-2.2

If you have 20 threads all "stuck" then my guess would be that you have one slow thread, and 19 threads waiting for your slow thread to release a monitor. So if you really must stay with BTM 1.3.3 then you will need to find your slow thread and discover what it is waiting for.

Cheers,
Chris
Reply | Threaded
Open this post in threaded view
|

Re: High cpu usage on commit (during delist)

Amit
Thanks for the reply.

The method (findXAResourceHolderState) from the stacktrace where all the threads seemed to be stuck (the thread status is RUNNABLE and not BLOCKED) does not have synchronized code.

        public XAResourceHolderState findXAResourceHolderState(XAResource xaResource)
                throws BitronixSystemException
          {
167 Iterator it = this.resources.iterator();
168 while (it.hasNext()) {
169  XAResourceHolderState xaResourceHolderState = (XAResourceHolderState)it.next();
170
171  if (xaResourceHolderState.getXAResource() == xaResource) {
172 return xaResourceHolderState;
173  }
174 }
175 return null;
          }

All 20 threads seem to be executing line number 173. Any idea what could be going?

Thanks!
Reply | Threaded
Open this post in threaded view
|

Re: High cpu usage on commit (during delist)

Ludovic Orban-2
XAResourceManager.java:173 in BTM 1.3.3 points to a closing curly brace in resume(). If you stack trace is correct, that means you're not using BTM 1.3.3, or at least not the official one.

The line-numbered code sample you posted also doesn't match any official release source code, and it doesn't seem to match your stack trace either as line 173 points again to a closing curly brace.

That's quite puzzling, isn't it?

I highly recommend you to try to reproduce the problem in a test environment and collect debug logs, this is the only guaranteed way to get a proper answer to what's really going on. If you cannot do that, you'll have to solve mysteries one by one: first what exact version of BTM you're using, then what exact source code line is the head of your stack pointing to.


About the problem itself: findXAResourceHolderState() is a linear search method: its timing is proportional to the amount of enlisted resources, so when you have only 2 or 3 resources it's extremely cheap, so if you see your VM spending time in there that suggests that you're enlisting dozens of resources. Is that expected from your app?

I also noticed that the root cause of this method call is BitronixTransaction.delistUnclosedResources(). That method is called when you did not close your SQL connection before calling commit() on the TM. A simple way to avoid this call is to make sure you close all your connections before calling commit().






On Thu, Apr 10, 2014 at 10:42 AM, Amit <[hidden email]> wrote:
Thanks for the reply.

The method (findXAResourceHolderState) from the stacktrace where all the
threads seemed to be stuck (the thread status is RUNNABLE and not BLOCKED)
does not have synchronized code.

        public XAResourceHolderState findXAResourceHolderState(XAResource
xaResource)
                throws BitronixSystemException
          {
167             Iterator it = this.resources.iterator();
168             while (it.hasNext()) {
169               XAResourceHolderState xaResourceHolderState =
(XAResourceHolderState)it.next();
170
171               if (xaResourceHolderState.getXAResource() == xaResource) {
172                     return xaResourceHolderState;
173               }
174             }
175             return null;
          }

All 20 threads seem to be executing line number 173. Any idea what could be
going?

Thanks!



--
View this message in context: http://bitronix-transaction-manager.10986.n7.nabble.com/High-cpu-usage-on-commit-during-delist-tp1636p1638.html
Sent from the Bitronix Transaction Manager mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email



Reply | Threaded
Open this post in threaded view
|

Re: High cpu usage on commit (during delist)

Amit
Yes. I realized that we had modified bitronix back then for performance. Hence the line number don't match.

About the point on resources, I enabled debug logs for XAResourceManager class and ran a load test. This verified that we have only 1 resource getting enlisted. Since the threads seem to be stuck in runnable state at line number 173 probably indicate some issue in the iterator. What do you think?

I also upgraded bitronix to the latest stable version 2.1.4 and that seems have resolved the issue. I had to change my bitronix configuration source code since retryUnrecoverableResourcesRegistrationInterval() is no more present in 2.1.4.
1. Are there any other changes I should be making to have a correct/valid bitronix configuration?
2. Do you suggest any specific testing that I should be doing from the bitronix standpoint to make sure my legacy application works as expected?
Reply | Threaded
Open this post in threaded view
|

Re: High cpu usage on commit (during delist)

Ludovic Orban-2
There might have been a bug in the iterator that got fixed long ago, or maybe you introduced that with your changes, it's hard to say without a deep investigation.

If you can upgrade to 2.1.4, by all means do it. Dozens of bugs have been fixed and the core transaction manager's engine has received some attention to make it much more performant since support for JDK 1.4 was dropped. Backward compatibility has been maintained almost 100% with BTM 1.0 so if everything seems to work after a quick test, the chances are very high that everything will actually do work as they should.

retryUnrecoverableResourcesRegistrationInterval was removed in 1.3.3 and replaced with a more sophisticated mechanism that is more reliable and fully automated. That was documented in the release notes: http://svn.codehaus.org/btm/tags/2.0.1/release-notes/release-notes-1.3.3-RC1.txt

My advice about configuration is always the same: leave everything to its default value unless you have a specific need to change some setting. Don't try to mess around with them at random or you may get unexpected and nasty side effects.




On Sun, Apr 13, 2014 at 6:28 AM, Amit <[hidden email]> wrote:
Yes. I realized that we had modified bitronix back then for performance.
Hence the line number don't match.

About the point on resources, I enabled debug logs for XAResourceManager
class and ran a load test. This verified that we have only 1 resource
getting enlisted. Since the threads seem to be stuck in runnable state at
line number 173 probably indicate some issue in the iterator. What do you
think?

I also upgraded bitronix to the latest stable version 2.1.4 and that seems
have resolved the issue. I had to change my bitronix configuration source
code since retryUnrecoverableResourcesRegistrationInterval() is no more
present in 2.1.4.
1. Are there any other changes I should be making to have a correct/valid
bitronix configuration?
2. Do you suggest any specific testing that I should be doing from the
bitronix standpoint to make sure my legacy application works as expected?



--
View this message in context: http://bitronix-transaction-manager.10986.n7.nabble.com/High-cpu-usage-on-commit-during-delist-tp1636p1640.html
Sent from the Bitronix Transaction Manager mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email