BTM.getCurrentTransaction() a bottleneck at load

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

BTM.getCurrentTransaction() a bottleneck at load

dukehoops
I profiled our app running a load test where 40 users where reading same data in a loop, each at 50 ms think time. I noticed non-linear perf deterioration as number of users increased: avg request latency 100ms @ 10 users; 600ms @ 40. Profiling the app I discovered that it spend 83% of all blocked time in BitronixTransactionManager.getCurrentTransaction(). Our app spent 23,816 seconds blocked in that method while the second worst hot spot managed only 888s. The *entire app's* time spent in runnable thread status was about 1600 seconds.

Looking at getCurrentTransaction() I see that a SynchronizedMap is used to store current Transaction - which explains these poor results. Why not use a ThreadLocal?

I am pasting Hot Spot methods in CVS below, the entire JProfiler5 snapshot is available here: BTM_CacheReadsSnapshot.jps

App stack:
Tomcat 6 NIO
JVM 1.6
BTM 1.3.2
Spring 2.5.6
Hibernate 3.3.1.GA
JbossCache3 3.0.3.GA (transactional using JTA; MVCC)

"Hot spot","Inherent time (microseconds)","Average Time","Invocations"
"bitronix.tm.BitronixTransactionManager.getCurrentTransaction",23816420527,"n/a","n/a"
"com.doppelganger.domain.User.setInvitationSingleton",888787270,"n/a","n/a"
"bitronix.tm.BitronixTransactionManager.getCurrentContext",630230561,"n/a","n/a"
"com.doppelganger.domain.CustomSpaceProperties$$EnhancerByCGLIB$$1da498e6.setCustomSpace",442470036,"n/a","n/a"
"com.doppelganger.domain.User$$EnhancerByCGLIB$$a316f4f7.getAllTimeSocialStatus",427503627,"n/a","n/a"
"com.doppelganger.service.impl.relations.UserRelationsServiceImpl.getHomeLocation",415180132,"n/a","n/a"
"com.doppelganger.dao.impl.AbstractHibernateDAO2.createQuery",319004722,"n/a","n/a"
"com.doppelganger.service.impl.relations.UserRelationsServiceImpl.doGetFriends2",283248117,"n/a","n/a"
"com.doppelganger.domain.AllTimeUserRanking$$EnhancerByCGLIB$$9959d419.getProfileViewCountRanking",272748512,"n/a","n/a"
"com.doppelganger.domain.AllTimeUserSocialStatus$$EnhancerByCGLIB$$d5aafc56.getRanking",221201141,"n/a","n/a"
"com.doppelganger.domain.UserProfile$$EnhancerByCGLIB$$119cedec.isLocationVisible",187341810,"n/a","n/a"
"com.doppelganger.domain.User.setUserSessions",172422912,"n/a","n/a"
"bitronix.tm.BitronixTransactionManager.setCurrentContext",105150283,"n/a","n/a"
"com.doppelganger.dao.impl.HibernateUserDAO.findByName",67883123,"n/a","n/a"
"com.doppelganger.log.Log4JNDCFilter.getUserContext",43085437,"n/a","n/a"
"java.lang.Thread.run",29181483,"n/a","n/a"
"com.doppelganger.service.tx.TxRetryAspect.doRetryIfNeeded",27617824,"n/a","n/a"
Reply | Threaded
Open this post in threaded view
|

Re: BTM.getCurrentTransaction() a bottleneck at load

Ludovic Orban
Administrator
Pre-1.3.x versions used to use a ThreadLocal but this changed because of http://jira.codehaus.org/browse/BTM-16

I agree that performance could be improved but not by dropping a useful feature.

Maybe the synchronized map could be replaced with a ConcurrentHashMap in case JDK 1.5 or the 1.4 backport concurrent package is detected in the classpath.

Do you have other suggestions ?
Reply | Threaded
Open this post in threaded view
|

Re: BTM.getCurrentTransaction() a bottleneck at load

dukehoops
Thanks for a quick response, Ludovic.


Yes, ConcurrentHashMap should result in a huge improvement. Why? Because ConcurrentHashMap doesn't block on reads and its' Iterators do not throw concurrent mod exception. Specifically in:
1. in BitronixTransactionManager.contexts
2. in ResourceRegistrar.resources (I was going to bring that up next, but enlist/delistResource are bottlenecks in transactions that do enlist XA resources) - at static synchronized ResourceRegistrar.findXAResourceHolder method.


I've checked out trunk and am implementing #1 right now.

BTW, do any of the tests in BTM test suite simulate load? If not, I will report back with our app-level load tests.

Let me think about BTM-16 and get back to you.

-nikita


Ludovic Orban wrote
Pre-1.3.x versions used to use a ThreadLocal but this changed because of http://jira.codehaus.org/browse/BTM-16

I agree that performance could be improved but not by dropping a useful feature.

Maybe the synchronized map could be replaced with a ConcurrentHashMap in case JDK 1.5 or the 1.4 backport concurrent package is detected in the classpath.

Do you have other suggestions ?
Reply | Threaded
Open this post in threaded view
|

Re: BTM.getCurrentTransaction() a bottleneck at load

dukehoops
I've implemented change #1 (see below) like so in BitrnonixTransactionManager.java:

    //private final Map contexts = Collections.synchronizedMap(new HashMap());
    private final Map contexts = new ConcurrentHashMap();
    //private final Map inFlightTransactions = Collections.synchronizedMap(new HashMap());
    private final Map inFlightTransactions = new ConcurrentHashMap();

All btm tests still passed; re-testing our app under load yielded these results:
- read throughput up 47%
- avg request latency down 30%
- BitronixTransactionManager.getCurrentTransaction() off Hot Spots Methods list completely

Onto #2 - synchronized methods in ResourceRegistar - particularly findXAResourceHolder. Am I correct in understanding that the reason methods in ResourceRegistrar are synchronized is to maintain consistent view of ResourceRegistrar.resources?

dukehoops wrote
Thanks for a quick response, Ludovic.


Yes, ConcurrentHashMap should result in a huge improvement. Why? Because ConcurrentHashMap doesn't block on reads and its' Iterators do not throw concurrent mod exception. Specifically in:
1. in BitronixTransactionManager.contexts
2. in ResourceRegistrar.resources (I was going to bring that up next, but enlist/delistResource are bottlenecks in transactions that do enlist XA resources) - at static synchronized ResourceRegistrar.findXAResourceHolder method.
Reply | Threaded
Open this post in threaded view
|

Re: BTM.getCurrentTransaction() a bottleneck at load

Ludovic Orban
Administrator
I've carefully read the ConcurrentHashMap javadoc and I think it cannot be used safely here.

The reason is that the background recoverer needs a fully consistent view of in-flight transactions whenever it runs or it might terminate one of them while it's running. Since the ConcurrentHashMap is a 'mostly correct' view of the actual data I'm afraid using it would open the door to serious race conditions.

I have to agree that the chance of such race condition to happen is virtually zero but that doesn't make me any more comfortable with it.

I wonder if there is anything else that can be done or if a safe way to use the ConcurrentHashMap with the recoverer could be found but I don't know yet so I have to think about it.

Other suggestions are welcome.
Reply | Threaded
Open this post in threaded view
|

Re: BTM.getCurrentTransaction() a bottleneck at load

dukehoops

Ludovic Orban wrote
I've carefully read the ConcurrentHashMap javadoc and I think it cannot be used safely here.
do you mean in #1 or #2. If #1 do you mean inFlightTransactions only or that and contexts?

Ludovic Orban wrote
 I'm afraid using it would open the door to serious race conditions.
Do you have a test that simulates such conditions? If not, how would you construct one? (maybe i can help)
Reply | Threaded
Open this post in threaded view
|

Re: BTM.getCurrentTransaction() a bottleneck at load

Dennis Brakhane-2
In reply to this post by Ludovic Orban
On Fri, Mar 27, 2009 at 9:58 PM, Ludovic Orban <[hidden email]> wrote:
> I wonder if there is anything else that can be done

If keySet is not needed, HashMap() could be wrapped using an
ReadWriteLock, this way, gets only block if a put is in progress.

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email


Reply | Threaded
Open this post in threaded view
|

Re: BTM.getCurrentTransaction() a bottleneck at load

James House

Those aren't available in Java 1.4

Dennis Brakhane wrote:

> On Fri, Mar 27, 2009 at 9:58 PM, Ludovic Orban <[hidden email]> wrote:
>  
>> I wonder if there is anything else that can be done
>>    
>
> If keySet is not needed, HashMap() could be wrapped using an
> ReadWriteLock, this way, gets only block if a put is in progress.
>
> ---------------------------------------------------------------------
> To unsubscribe from this list, please visit:
>
>     http://xircles.codehaus.org/manage_email
>
>
>  


---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email


Reply | Threaded
Open this post in threaded view
|

Re: BTM.getCurrentTransaction() a bottleneck at load

Dennis Brakhane-2
On Sat, Mar 28, 2009 at 12:18 AM, James House <[hidden email]> wrote:
>
> Those aren't available in Java 1.4

I'd propose to look for backport-util-concurrent when running on a VM
< 1.5 and use it if it's available. If not, BTM could fall back to
synchronizedMap and issue a warning that performance degrades unless
concurrent is available.

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email


Reply | Threaded
Open this post in threaded view
|

Re: BTM.getCurrentTransaction() a bottleneck at load

Ludovic Orban
Administrator
In reply to this post by dukehoops

dukehoops wrote
do you mean in #1 or #2. If #1 do you mean inFlightTransactions only or that and contexts?
Only the in-flight transactions map. A bit unrelated but still important: I noticed the inFlightTransactions actually is a duplicate of the thread contexts map.

I think this calls for a refactoring of this specific functionality. The context management should be moved out of the BitronixTransactionManager to a new internal class. I'm sure this would make the picture a lot cleaner.


dukehoops wrote
Do you have a test that simulates such conditions? If not, how would you construct one? (maybe i can help)
Sorry but I don't and I don't think one can be written without ad-hoc modifications. Here's the idea:

The background recoverer can be triggered at any time. When this happens, it builds a list of dangling records in the journal and run recover() on all resources. By comparing these two lists the recoverer can take the decision to either commit or rollback dangling transactions.

The delicate part is that the recoverer must ignore transactions that are still in-flight and managed or it could interfere with the TM's work and terminate a transaction that the TM is busy with. This is why the recoverer must have an accurate list of in-flight transactions before it start its job: to make sure it will ignore in-flight transactions.

To trigger a race condition here you should have a new transaction created, have the recoverer collect the list of in-flight transactions before this new transactions gets registered in the inFlightTransactions map, have the transaction run until the prepare phase then resume the recoverer.

I hope this description helps.
Reply | Threaded
Open this post in threaded view
|

Re: BTM.getCurrentTransaction() a bottleneck at load

Ludovic Orban
Administrator
I think I've found a safe way to get that performance improvement implemented and more:

 - use concurrent hashmap when available (JDK 1.5 and 1.4 + backport concurrent)
 - no more possibility of race conditions in the recoverer
 - riddance of the slow/dodgy BitronixTransactionManager.getInFlightTransactions() method

This change isn't small but quite feasible. It involves rewriting the BTM-39 fix in a totally different way, some serious refactoring of the thread context management code and... a clever idea.

The core of this idea is that the recoverer must always ignore in-flight transactions no matter how. The current implementation I made for BTM-39 locks the in-flight transactions map while its collecting in-flight XIDs avoiding any new transaction from starting while it's collecting. In-flight XIDs are then filtered out of the recovered XIDs by the recoverer.

Instead of collecting an exact list of in-flight XIDs, we could instead collect the start date of the oldest running transaction then assume that any XID newer than that is still in-flight. This can be done as the Bitronix XID implementation contains an encoded timestamp. This way the approximate behavior of the ConcurrentHashMap isn't relevant anymore.


Is there anything I overlooked ? Any comments ?

Thanks,
Ludovic
Reply | Threaded
Open this post in threaded view
|

Re: BTM.getCurrentTransaction() a bottleneck at load

Dennis Brakhane-2
On Tue, Mar 31, 2009 at 9:17 PM, Ludovic Orban <[hidden email]> wrote:
> Instead of collecting an exact list of in-flight XIDs, we could instead
> collect the start date of the oldest running transaction then assume that
> any XID newer than that is still in-flight. This can be done as the Bitronix
> XID implementation contains an encoded timestamp. This way the approximate
> behavior of the ConcurrentHashMap isn't relevant anymore.
>
> Is there anything I overlooked ? Any comments ?

Maybe I'm too tired, but I don't see why this will work. If
transaction A is long running, and TX B and C start afterwards, B can
end while A and C are still active. B is younger than A, but still
must not be ignored by the recoverer.

How/when is the oldest running XID determined? At the time the
recoverer starts? If it ends while the recoverer is running, a new
oldest XID must be determined.

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email


Reply | Threaded
Open this post in threaded view
|

Re: BTM.getCurrentTransaction() a bottleneck at load

Ludovic Orban
Administrator
Dennis,

You're right but you missed the point of the idea.

Taking back your example: if A is long running and B and C start afterwards and end while A is still running then yes with this algorithm B and C won't be recovered if the recoverer kicks it at this exact time.

What you have to keep in mind is that long running is very relative in XA: a transaction is hardly going to run longer than a few dozen seconds while the recoverer is launched at regular intervals. I think we can safely skip transactions that could be recovered (they're younger than another long running one) because the long running transaction will eventually terminate and the recoverer will end up being able to recover those skipped transactions.

Or is your point that it's not a good idea to skip in-doubt transactions' recovery because of an older one still being in-flight ? And am I clear enough ?

Thanks,
Ludovic

2009/3/31 Dennis Brakhane <[hidden email]>
On Tue, Mar 31, 2009 at 9:17 PM, Ludovic Orban <[hidden email]> wrote:
> Instead of collecting an exact list of in-flight XIDs, we could instead
> collect the start date of the oldest running transaction then assume that
> any XID newer than that is still in-flight. This can be done as the Bitronix
> XID implementation contains an encoded timestamp. This way the approximate
> behavior of the ConcurrentHashMap isn't relevant anymore.
>
> Is there anything I overlooked ? Any comments ?

Maybe I'm too tired, but I don't see why this will work. If
transaction A is long running, and TX B and C start afterwards, B can
end while A and C are still active. B is younger than A, but still
must not be ignored by the recoverer.

How/when is the oldest running XID determined? At the time the
recoverer starts? If it ends while the recoverer is running, a new
oldest XID must be determined.

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

   http://xircles.codehaus.org/manage_email



Reply | Threaded
Open this post in threaded view
|

Re: BTM.getCurrentTransaction() a bottleneck at load

Dennis Brakhane-2
On Tue, Mar 31, 2009 at 10:35 PM, Ludovic Orban <[hidden email]> wrote:
> What you have to keep in mind is that long running is very relative in XA: a
> transaction is hardly going to run longer than a few dozen seconds while the
> recoverer is launched at regular intervals. I think we can safely skip
> transactions that could be recovered (they're younger than another long
> running one) because the long running transaction will eventually terminate
> and the recoverer will end up being able to recover those skipped
> transactions.

I admit that I do not yet grok the whole recovery logic, but to me it
feels dangerous.

What would happen in this contrieved example:

Transaction A is taking an insane amount of time (let's assume we
communicate with a slow as hell webservice), furthermore, let's assume
that transaction log wrap around happens rather quickly (size is set
to 1 MB, and a huge number of transactions with long resource names
take place).

Now, say TX B is written to log 1 as prepared, but now a DB decides to
go on holiday, so no commit can occur. (Or would it occur but written
as commited with the resource attached to the broken DB connection
missing?) Now the log file rotates to log 2. After a little time, log
2 is full as well, so BTM switches back to 1. Transaction A still
isn't done (and hasn't been rolled back as the timeout was raised
higher than 60s, knowing that some processes take ages). Now the
recoverer will ignore B, and the log entry of B gets overwritten. Now
the application crashes.

I wouldn't consider this a real issue, but if this would lead to data
loss, maybe it's possible to construct a more realistic example from
this one.

Greetings,
  Dennis

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email


Reply | Threaded
Open this post in threaded view
|

Re: BTM.getCurrentTransaction() a bottleneck at load

Ludovic Orban
Administrator
Dennis,

BTM does not support web services transactions (even if it could in the long future). Even then it's highly unrecommended to have XA transactions running for too long (they hold hard locks in the resources). What you are describing here is a poor use case for XA transactions.

But let's say someone nevertheless decides to do that anyway. When BTM rotates logs from fragment 1 to fragment 2 all the pending transactions of fragment 1 are copied to the 2nd one such as there is no way to loose any pending transaction's log.

Please keep in mind that even if that problem existed it would be a journal problem, not a recoverer one. But thank you for pointing this out as it helps reviewing the design.

Ludovic

2009/3/31 Dennis Brakhane <[hidden email]>
On Tue, Mar 31, 2009 at 10:35 PM, Ludovic Orban <[hidden email]> wrote:
> What you have to keep in mind is that long running is very relative in XA: a
> transaction is hardly going to run longer than a few dozen seconds while the
> recoverer is launched at regular intervals. I think we can safely skip
> transactions that could be recovered (they're younger than another long
> running one) because the long running transaction will eventually terminate
> and the recoverer will end up being able to recover those skipped
> transactions.

I admit that I do not yet grok the whole recovery logic, but to me it
feels dangerous.

What would happen in this contrieved example:

Transaction A is taking an insane amount of time (let's assume we
communicate with a slow as hell webservice), furthermore, let's assume
that transaction log wrap around happens rather quickly (size is set
to 1 MB, and a huge number of transactions with long resource names
take place).

Now, say TX B is written to log 1 as prepared, but now a DB decides to
go on holiday, so no commit can occur. (Or would it occur but written
as commited with the resource attached to the broken DB connection
missing?) Now the log file rotates to log 2. After a little time, log
2 is full as well, so BTM switches back to 1. Transaction A still
isn't done (and hasn't been rolled back as the timeout was raised
higher than 60s, knowing that some processes take ages). Now the
recoverer will ignore B, and the log entry of B gets overwritten. Now
the application crashes.

I wouldn't consider this a real issue, but if this would lead to data
loss, maybe it's possible to construct a more realistic example from
this one.

Greetings,
 Dennis

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

   http://xircles.codehaus.org/manage_email