DiskJournal lock


DiskJournal lock

Gérald Quintana
Hi all,

I am facing strong contention on the DiskJournal class lock (not XAPool as in the other mail thread): sometimes I get more than 40 threads (out of 200) waiting for this lock. Is this normal? What could explain this symptom?
We tried enlarging the disk journal size, but it didn't help. What can we do to alleviate such contention?

Using BTM 2.1.1 / Java 6U10 / Windows Server 2003

Thanks for your help,
Gérald

Re: DiskJournal lock

Brett Wooldridge-2
Hi Gerald,

The disk journal, by its nature, is a sequential log of transactions, and is going to be bound largely by the performance of your disk I/O. As implemented, it is written sequentially, with a high-level lock guarding the write; hence the pile-up on the lock. I haven't looked at how, or whether, concurrency could be increased; however, you might try these tips:

1. If you really have only one DataSource, rather than distributed transactions, disable the disk journal. In that scenario, the database's own transaction log is probably sufficient for recovery. (Though if there is something I'm overlooking, maybe Ludovic can weigh in.) A configuration sketch follows this list.

2. If there are other things performing heavy I/O on the same disk (possibly the database), put the disk journal on another physical disk.

3. Sometimes decreasing concurrency can actually speed things up. 40 threads piled up on the lock could also be an indication of over-concurrency. Given that your CPU doesn't have 40 cores, and more importantly your I/O subsystem doesn't have 40 spindles, consider a structural change in your code. This could improve performance whether a disk journal is involved or not. For example, can there be one (or a few) writers performing SQL, while the other 40+ threads dump work into a queue that the writers pull from? (A sketch of that pattern follows at the end of this message.) At my company, doing this in a few places removed a lot of contention at the *database* level (never mind Bitronix) in terms of row locks, page locks, table locks, etc.
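A minimal sketch of what tips 1 and 2 look like in code, assuming BTM's programmatic Configuration API (TransactionManagerServices.getConfiguration() and the setJournal/setLogPart1Filename/setLogPart2Filename setters of the 2.1.x line -- verify the names against your version); the file paths are hypothetical:

import bitronix.tm.Configuration;
import bitronix.tm.TransactionManagerServices;

public class JournalConfigSketch {

    // Must run before the TransactionManager is first obtained, because the
    // configuration is effectively frozen once the TM has started.
    static void configureJournal(boolean singleResourceOnly) {
        Configuration conf = TransactionManagerServices.getConfiguration();

        if (singleResourceOnly) {
            // Tip 1: with a single XA resource, the "null" journal skips disk
            // logging entirely (there is then no BTM-side recovery log).
            conf.setJournal("null");
        } else {
            // Tip 2: keep the "disk" journal, but place both journal fragments
            // on a disk that the database and other heavy I/O do not share.
            conf.setJournal("disk");
            conf.setLogPart1Filename("D:/btm-journal/btm1.tlog");  // hypothetical path
            conf.setLogPart2Filename("D:/btm-journal/btm2.tlog");  // hypothetical path
        }
    }
}

The same settings can normally also be supplied through the bitronix-default-config.properties file instead of code.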

Just some thoughts. I will try to take a look at the disk journal (TransactionLogAppender) to see if there are any easy wins there, but it may take me a few days to get around to it.

Regards,
Brett
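
Regarding tip 3, here is a minimal, BTM-agnostic sketch of the work-queue idea using a plain ThreadPoolExecutor; the pool size and queue capacity are made-up numbers, only there to illustrate the shape of the change:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class WriterQueueSketch {

    // A small, fixed pool of writer threads performs the transactional SQL work,
    // instead of every JMS/HTTP thread starting its own transaction.
    private final ExecutorService writers = new ThreadPoolExecutor(
            4, 4, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<Runnable>(1000),      // bounded work queue
            new ThreadPoolExecutor.CallerRunsPolicy());  // back-pressure when the queue is full

    // The 200 request/listener threads just enqueue a unit of work and return.
    public void submit(Runnable transactionalUnitOfWork) {
        writers.execute(transactionalUnitOfWork);
    }
}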


Re: DiskJournal lock

Gérald Quintana
Hi Brett,

Thanks for your answer, very interesting.

1. I don't understand your point: if I use Bitronix, it's precisely because I have two resources (namely an Oracle DB and Sonic MQ) to coordinate.

2. We're working on I/O monitoring.

3. All 40 threads are either JMS listeners or HTTP request handlers (web UI). The Oracle DB is not particularly stressed, and we see no correlation between the thread locks and peaks in database activity. As a result, limiting activity at a higher level (JMS or HTTP) seems a pity to me.

Gérald



Re: DiskJournal lock

Brett Wooldridge-2
Hi Gerald,

My point (obviously not applicable to you) was that some people (including me) use Bitronix in a "container managed transaction" environment -- where the application container controls the demarcation of transactions whether you are using one resource or ten. Many users use Bitronix with only one resource (a single DB).

I/O monitoring data would be interesting.  Clearly there is disk I/O contention.  There is locking at several levels in Bitronix, so it would be really useful if you could provide a thread dump containing the blocked threads.  It would help identify the locks involved and where the "choke point" is.
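
For reference, one way to capture such a dump from inside the JVM is the standard java.lang.management API (the same information can be obtained externally with jstack, or with Ctrl+Break in a Windows console):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSketch {

    // Prints each thread's state, the monitors/locks it owns or is waiting on,
    // and a (truncated) stack trace.
    public static void dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            System.out.print(info);
        }
    }
}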

Regards,
Brett




Re: DiskJournal lock

Brett Wooldridge-2
Hi Gerald,

You might also try setting forcedWriteEnabled=false. Doing so may increase performance, but it opens a (narrow) window in which transactions interrupted by a catastrophic failure may not be rolled forward properly.
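
Programmatically, that looks roughly like the following (a sketch, assuming the Configuration setter that corresponds to the forcedWriteEnabled property):

import bitronix.tm.TransactionManagerServices;

public class ForcedWriteSketch {

    static void disableForcedWrites() {
        // Skip the force/fsync on each journal write: faster, but a crash inside the
        // narrow window described above can leave transactions that cannot be rolled forward.
        TransactionManagerServices.getConfiguration().setForcedWriteEnabled(false);
    }
}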

Brett



Re: DiskJournal lock

Gérald Quintana
Hi Brett

Thanks for your help and support!

Most threads are blocked with the same stack trace (read it bottom-up); this one is a transaction begin:
bitronix.tm.journal.DiskJournal.log(DiskJournal.java:95)
bitronix.tm.BitronixTransaction.setStatus(BitronixTransaction.java:334)
bitronix.tm.BitronixTransaction.setStatus(BitronixTransaction.java:323)
bitronix.tm.BitronixTransaction.setActive(BitronixTransaction.java:314)
bitronix.tm.BitronixTransactionManager.begin(BitronixTransactionManager.java:103)
org.springframework.transaction.jta.JtaTransactionManager.doJtaBegin(JtaTransactionManager.java:863)
org.springframework.transaction.jta.JtaTransactionManager.doBegin(JtaTransactionManager.java:820)
org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:371)
org.springframework.transaction.interceptor.TransactionAspectSupport.createTransactionIfNecessary(TransactionAspectSupport.java:335)
org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:105)

One of them is a transaction commit:
bitronix.tm.journal.DiskJournal.log(DiskJournal.java:95)
bitronix.tm.BitronixTransaction.setStatus(BitronixTransaction.java:334)
bitronix.tm.BitronixTransaction.setStatus(BitronixTransaction.java:323)
bitronix.tm.twopc.Preparer.prepare(Preparer.java:64)
bitronix.tm.BitronixTransaction.commit(BitronixTransaction.java:225)
bitronix.tm.BitronixTransactionManager.commit(BitronixTransactionManager.java:120)
org.springframework.transaction.jta.JtaTransactionManager.doCommit(JtaTransactionManager.java:1009)
org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManager.java:754)
org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:723)
org.springframework.transaction.interceptor.TransactionAspectSupport.commitTransactionAfterReturning(TransactionAspectSupport.java:393)
org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:120)

Other symptoms occur frequently (though not systematically) at exactly the same moment, but I can't tell whether they are a cause or a consequence:
- JDBC connections suddenly jumping from an average of 50 to the maximum of 100: 50 connections allocated within the same second
- Full GC

Gérald



Re: DiskJournal lock

Ludovic Orban-2
You can try enabling the 'filterLogStatus' config property. It roughly cuts the disk journal's usage in half, so it might help you.

The disk journal is a well-known bottleneck that is very hard to avoid, as it is mandatory for the TM to be able to log to permanent storage. I've considered optimizing the existing code, using alternative implementations, and using alternative logging policies, but so far I have found nothing satisfying.

--
Ludovic
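
As a sketch, enabling that property programmatically looks something like this (assuming the Configuration setter matching filterLogStatus; the equivalent bitronix-default-config.properties entry should work as well):

import bitronix.tm.TransactionManagerServices;

public class FilterLogStatusSketch {

    static void enableStatusFiltering() {
        // Journal only the transaction statuses needed for recovery,
        // roughly halving the number of disk journal writes.
        TransactionManagerServices.getConfiguration().setFilterLogStatus(true);
    }
}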




Re: DiskJournal lock

Gérald Quintana
Obviously, changes have been made to BTM since v2.1.1. Are the improvements introduced in 2.1.2 and 2.1.3 relevant to my problem? Should I upgrade?

Gérald





Re: DiskJournal lock

Brett Wooldridge-2
Hi Gerald,

I think the changes in 2.1.2 and 2.1.3 are unlikely to have an effect on your specific problem.  Having said that, it's always a good idea to keep current with versions, as overall performance generally improves.

Ludovic is right that what you are seeing is basically disk journal related.

I'll touch on several things here. First, your comment about your connections jumping up: if you are not already, I suggest setting shareTransactionConnections=true. This increases connection re-use and could possibly help reduce the spikes you are seeing.
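
As a sketch, this is a per-datasource property on the pool definition (the setter name is assumed to mirror the documented shareTransactionConnections property; the unique name and class name below are only illustrative):

import bitronix.tm.resource.jdbc.PoolingDataSource;

public class SharedConnectionsSketch {

    static PoolingDataSource buildDataSource() {
        PoolingDataSource ds = new PoolingDataSource();
        ds.setUniqueName("oracle-ds");                                // illustrative name
        ds.setClassName("oracle.jdbc.xa.client.OracleXADataSource");  // plus URL/user/password driver properties
        ds.setMaxPoolSize(100);
        // Hand the same underlying connection back for every getConnection() call
        // made inside one transaction, instead of enlisting a new one each time.
        ds.setShareTransactionConnections(true);
        ds.init();
        return ds;
    }
}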

I spent today analysing the Disk Journal code and trying to improve performance.  There are two basic bottlenecks here:

1. Insertions into the log are sequential (serialized), and given that it is disk intensive, you may be running into an I/O bottleneck.

2. There is a condition that occurs when the current transaction journal fills up. In a nutshell, Bitronix has two journals that are alternated. When the current journal file reaches capacity, it is scanned from beginning to end, and incomplete (in-flight) transaction logs are copied into the alternate journal. This is done inside the disk journal lock, blocking all writer threads, and on top of that it is relatively time consuming (up to several seconds, depending on the maximum disk journal file size). Once this process is complete, the alternate journal becomes the current journal, and the old current journal becomes the alternate.

I made several improvements today in a separately maintained fork here.  If you are brave, you can give it a try.  It passes the test suite, and I added some additional tests.

Here is a summary of the improvements:

As I mentioned, the current implementation serializes all journal writes, so multiple threads running looks something like this (where T is time):

T1: (log 1) Thread 1
T2: (log 2) Thread 2
T3: (log 3) Thread 3
T4: (log 4) Thread 1

The new implementation is "lock-free" (see note below) and allows multiple threads to write to the journal at the same time:

T1: (log 1) Thread 1
      (log 2) Thread 2
      (log 3) Thread 3
T2: (log 4) Thread 1

Note: ordering is still enforced. I put "lock-free" in quotes because there is still one condition under which threads are blocked: when the transaction journal rolls over (reaches capacity), writes are blocked while in-flight transaction logs are moved from the current journal to the alternate. My gut feeling is that, in general, performance will be greatly improved in a heavily multithreaded environment -- up to a point. The I/O subsystem may still come into play as a bottleneck; however, since the concurrent writes to the log generally occur on the same disk page, the OS (or the disk controller itself) is likely to combine the write operations.

Now, about that expensive transaction log copy operation, which still blocks writers. I changed the code that performs the copy to use Java NIO to read the data as 4K pages from disk, rather than peppering the I/O subsystem with thousands of individual read() calls. The result of this change is that the cutover time from the current journal to the alternate journal is nearly cut in half.
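
A sketch of that read pattern in plain java.nio (not the actual BTM code; a real implementation also has to handle records that straddle a page boundary):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class PagedJournalScanSketch {

    // Reads the journal in 4K chunks instead of issuing thousands of tiny read() calls.
    static void scan(FileChannel journal) throws IOException {
        ByteBuffer page = ByteBuffer.allocate(4096);
        journal.position(0);
        while (journal.read(page) > 0) {
            page.flip();
            // ... decode the log records in this page, keeping only those that
            //     belong to in-flight transactions ...
            page.clear();
        }
    }
}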

I have not been able to test the journal changes in a heavily multithreaded environment yet, but as I said above, I would expect performance to be greatly improved.

The current design of the disk journal has the merit of being fairly straightforward code-wise; however, it should be revisited. There are existing patterns out there for the kind of "garbage collection" that needs to take place in a transaction log, and they should be examined.

It is possible that, even with the improvements above, you will still see some bottlenecking and possibly connection spikes when the cutover from one journal to the other occurs. If you are indeed brave enough to try the experimental improvements, I would love to hear the results.

Best Regards,
Brett


Re: DiskJournal lock

Gérald Quintana
Hi Brett,

Thank you very much for your work and your replies... I really appreciate it.

About shareTransactionConnections=true, correct me if I am wrong, but I think that using Hibernate (and Spring) on top of the BTM datasource is enough to avoid creating multiple connections for a single transaction.

About your work on v2.2: we don't know how to reproduce our problem, so I can't test your patch easily, and I don't dare put it in production immediately. That said, if the 2.2 release is ready by the end of the year, we might include it in our next major release. For the time being, I'll stay on the 2.1.x releases.

My first change (attempt?) will be to enable filterLogStatus (Ludovic's suggestion) and some GC logging/tuning options. I'm just wondering why filterLogStatus/shareTransactionConnections is not enabled by default. Right now I have many things to test: reducing the thread pool size, upgrading to BTM 2.1.3, moving the journal to another disk, disabling journal forced writes, and so on. I'll keep you informed of the results.

About the disk journal swapping: I could enable logging for the DiskJournal and see whether there is a link between the thread locks and the journal file swap.

Thanks again,
Gérald




Re: DiskJournal lock

Brett Wooldridge-2
On Fri, Oct 5, 2012 at 9:54 PM, Gérald Quintana <[hidden email]> wrote:

> Hi Brett,
>
> Thank you very much for your work and your replies... I really appreciate it.
>
> About shareTransactionConnections=true, correct me if I am wrong, but I think that using Hibernate (and Spring) on top of the BTM datasource is enough to avoid creating multiple connections for a single transaction.

I'm not sure either of those alone is enough.  I seem to remember getting multiple connections per transaction when using Hibernate.  I could be wrong.  In general, *without* connection sharing enabled, the following code:

tx.begin();

Connection conn1 = ds.getConnection();
Statement stmt = conn1.createStatement();  // causes transaction enlistment

Connection conn2 = ds.getConnection();     // without sharing, this is a second physical connection

tx.commit();

will result in two connections.

> My first change (attempt?) will be to enable filterLogStatus (Ludovic's suggestion) and some GC logging/tuning options. I'm just wondering why filterLogStatus/shareTransactionConnections is not enabled by default. Right now I have many things to test: reducing the thread pool size, upgrading to BTM 2.1.3, moving the journal to another disk, disabling journal forced writes, and so on. I'll keep you informed of the results.

Basically, the approach has been not to surprise existing users when a new feature is added (like connection sharing in the past). Because prior versions did not share connections, the default in the new version was not to share connections. I think it is useful enough that it should become the default. Maybe we should have a policy where, in the release after a new feature is added, it defaults to disabled, but in the release after that it becomes the default (or something). Anyway, I'd turn it on; it cut our connection pool from growing to hundreds of connections under load down to several dozen.

Brett


Re: DiskJournal lock

Brett Wooldridge-2
Gerald, Ludovic, et al.,

I've spent the weekend and the last few days going over the disk journal code, profiling, testing, and profiling some more. I think the performance improvements are notable. Even in the absence of these changes, Gerald, I think you will see 'smoother' behavior from the existing (2.1.x) code base if you set your disk journal to the minimum size (1MB). I strongly suspect that the connection spikes you are seeing are caused when the disk journal ("transaction log") rolls over. In the 2.1.x code base, on my machine, a 1MB journal takes about 600ms to roll over. In a loaded server, this is enough to cause a back-up of connection requests and commit requests. If the journal is set to a larger size, say 8MB or larger, the pauses will be longer.
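
As a sketch (same assumed Configuration API as elsewhere in this thread):

import bitronix.tm.TransactionManagerServices;

public class JournalSizeSketch {

    static void useSmallestJournal() {
        // A 1MB journal rolls over more often, but each rollover pause is much shorter.
        TransactionManagerServices.getConfiguration().setMaxLogSizeInMb(1);
    }
}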

In the code checked into my 2.2 branch, a "roll" of a 1MB transaction log now takes ~28ms, compared to the former 600ms.

There are quite a few other throughput improvements, and I have put together a technical analysis/description here:


Currently these changes are in my "private" Bitronix repo, but I am in the process of merging them into the 2.2 branch.

I have added additional unit tests, and over the last few days I've run several dozen million transactions through the log code, so my confidence is fairly high that it is correct.

Best Regards,
Brett


Re: DiskJournal lock

Ludovic Orban-2
Brett,

That's an impressive improvement you made to the disk journal, and all without touching its on-disk format or its general architecture! That's just awesome; congratulations and many thanks for this contribution!

I didn't look at the code, but the detailed explanations clearly prove you've thought about all the important use cases, and I believe this new disk journal should provide the same guarantees as the old one. I'm very confident about the quality of your new journal.

This definitely needs to be included in the official version, as well as your XAPool concurrency changes. The detailed improvement pages you created for both should be added to the official documentation as well, IMO.

Actually, the only bit of the 2.2 branch I don't like is the new separate JDBC 3/4 modules, but we can certainly come to an agreement on that subject. I believe we should seriously discuss merging the 2.1 and 2.2 branches together and prepare for the next release. Looking at the amount of improvements, 2.5 should be the new version number.

Thanks,
Ludovic




Re: DiskJournal lock

Brett Wooldridge-2
Ludovic,

It seems more prudent to look at increasing the existing journal's throughput and eliminating whatever bottlenecks exist before considering an entirely new journal structure. Something like a tree-based, multi-segment journal, as some databases use, seemed like a huge amount of work for unknown gains. I too am quite pleased with the performance we could squeeze out of the existing code with relatively small changes.

With respect to the JDBC4 changes, there are two ways to go. One is, as you suggested, dropping Java 5 support. This simplifies the project structure and the build. However, following your earlier objections to the new JDBC4 changes, I modified the Maven build (several months back) to merge the JDBC4 delegate classes into the main Bitronix jar. So, while there is minor build-time complexity (in jar creation), the deployment artifacts are the same as in 2.1.x. If you wish to keep this structure, that's fine too. If you want to stop Java 5 support at 2.1.x and support Java 6+ going forward, I can make the attendant build changes and move the JDBC4 classes into the main btm module. I'm fine with either.

I have been keeping the codehaus-hosted 2.2 branch up to date, merging in any changes made to 'master' over the past year, with the exception of these latest log changes. So I think a merge of 2.2 into master should be straightforward and mostly, if not entirely, conflict-free.

Regards,
Brett





Re: DiskJournal lock

Gérald Quintana
Hi Brett, Ludovic and all,

This is very good news; many thanks to Brett for his amazing work on the DiskJournal. I am eager to try it. About JDBC: will you support Java 7 / JDBC 4.1?

We tested the filterLogStatus option in production and it doesn't help: all threads are now blocked on commit (instead of begin). From what I can see (and understand), here is what happens:
  1. The JDBC pool contains roughly 40 connections under normal activity
  2. 20 threads become blocked on DiskJournal.log
  3. Tomcat starts 10 new threads
  4. ... which open 10 new connections (since the others are blocked)
  5. These 10 new threads become blocked on DiskJournal as well
  6. Go back to step 3 until 100 connections are open (the pool max size) or the DiskJournal lock is freed
Gérald




Reply | Threaded
Open this post in threaded view
|

Re: DiskJournal lock

Brett Wooldridge-2
Gerald,

One thing I found with the disk journal is that "rollover" is relatively time-consuming.  If the journal file is specified to be large (several megabytes), this rollover may be causing the backlog.  I'm hoping you've set your journal to be large, so that you can now shrink it. :-)  I would recommend setting the journal to the minimum (1MB).  This will roll over more frequently, but each rollover should take less time -- which might "smooth out" the connection spikes you're seeing.  But if your journal size is already set to 1MB, unfortunately I don't currently have any other ideas.
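
For reference, a minimal sketch of shrinking the journal (or, for single-resource setups, disabling it) through the Configuration API; the setter names below are assumptions based on the 2.1.x Configuration class and should be checked against the javadoc before use:

import bitronix.tm.Configuration;
import bitronix.tm.TransactionManagerServices;

public class JournalTuning {
    // Must run before the transaction manager is first obtained.
    static void shrinkJournal() {
        Configuration conf = TransactionManagerServices.getConfiguration();
        conf.setMaxLogSizeInMb(1);      // smallest journal: rollovers happen more often, but each one is cheaper
        conf.setFilterLogStatus(true);  // only log COMMITTING/COMMITTED records, reducing journal writes
        // conf.setJournal("null");     // only if a single resource is enlisted and log-based recovery is not needed
    }
}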

Regarding JDBC 4.1, I haven't looked at it so I can't really say when or if we will add support for it.

Regards,
Brett


Reply | Threaded
Open this post in threaded view
|

Re: DiskJournal lock

Ludovic Orban-2
Brett,

Regarding the rollover performance of the improved journal: I think we
could keep the state of the active transactions in a map and iterate
over that map rather than re-reading the journal from the disk when a
rollover is needed. That should boost the rollover perf a bit more,
shouldn't it?

WDYT?

Thanks,
Ludovic

Reply | Threaded
Open this post in threaded view
|

Re: DiskJournal lock

Brett Wooldridge-2
Ludovic,

I had thoughts along similar lines, particularly because, in my tests, the number of in-flight transactions at rollover is typically small (single or double digits).  However, I had two concerns.

First, cases where the number of in-flight transactions is large, either through some actual use-case or a defect in the user's code.  In that case, the map of in-flight transactions might grow quite large.  One solution might be to set a cap on the number of tracked in-flight transactions (say 1000); if the map has grown to that size at the time of rollover, we would fall back to the current scheme.

Second, I was unsure of the true value of the "win".  Basically, we're trading decreased rollover time for additional log-time overhead.  We would have to keep a map of in-flight transactions, and every time a COMMITTED log comes in, we would have to reconcile the map.  For example, borrowing from the unit test, if this COMMITTING log comes in:

journal.log(Status.STATUS_COMMITTING, gtrid, csvToSet("name1,name2,name3"));

We're tracking 3 resources.  The COMMITTED could come in one-by-one, or in groups:

journal.log(Status.STATUS_COMMITTED, gtrid, csvToSet("name1"));
journal.log(Status.STATUS_COMMITTED, gtrid, csvToSet("name2,name3"));

The code would probably end up looking similar to the code in DiskJournal.collectDanglingRecords()...

Set<String> recUniqueNames = new HashSet<String>(rec.getUniqueNames());
recUniqueNames.removeAll(tlog.getUniqueNames());
if (recUniqueNames.isEmpty()) {
   inFlight.remove(tlog.getGtrid());
}

In the end, the overhead of this code, performed for every new STATUS_COMMITTED or STATUS_ROLLBACK log, could very well offset the performance gain in the rollover.
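
To make the bookkeeping concrete, here is a rough sketch of the kind of in-flight map this implies; the class and method names (InFlightTracker, committing, committed) are illustrative only, not the actual TransactionLogAppender code, and the GTRID is shown as a plain String for simplicity:

import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: tracks which resources of each GTRID still await a COMMITTED record.
// Assumes calls are made under the journal's existing write lock.
class InFlightTracker {
    private final Map<String, Set<String>> inFlight = new ConcurrentHashMap<String, Set<String>>();

    // A COMMITTING record was logged: remember the enlisted resource names.
    void committing(String gtrid, Set<String> uniqueNames) {
        inFlight.put(gtrid, new HashSet<String>(uniqueNames));
    }

    // A COMMITTED (or rollback) record was logged: reconcile the outstanding names.
    void committed(String gtrid, Set<String> uniqueNames) {
        Set<String> outstanding = inFlight.get(gtrid);
        if (outstanding == null) {
            return;
        }
        outstanding.removeAll(uniqueNames);
        if (outstanding.isEmpty()) {
            inFlight.remove(gtrid);  // transaction fully resolved; nothing to carry over at rollover
        }
    }

    // At rollover, these are the dangling records that must survive into the new log file.
    Map<String, Set<String>> danglingRecords() {
        return inFlight;
    }
}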

Do you think the tradeoff of logging performance is worth the gain in rollover performance?

Brett

Reply | Threaded
Open this post in threaded view
|

Re: DiskJournal lock

Ludovic Orban-2
Hi Brett,

I wouldn't care too much about problem (1). Both cases you describe
would cause an OOME, and are fixable in a trivial way: fix your code or
increase the max heap size.

I thought about problem (2) too and I'm fairly confident the
performance drop caused by maintaining the list of in-flight
transactions would be negligible. OTOH the performance benefit of not
re-reading the disk in case of a rollover might be negligible. I
suppose performing measurements would be necessary to answer this
question.

Here's what I propose: either we don't bother implementing this
optimization, or we implement it behind a config option that is
disabled by default. We could point users facing rollover performance
problems to this option and let them figure out whether it helps in
their environment.

--
Ludovic


Reply | Threaded
Open this post in threaded view
|

Re: DiskJournal lock

Brett Wooldridge-2
Ludovic,

I went ahead and implemented your suggestion of having the transaction log appender maintain a list of in-flight transactions so that rollover does not require a scan of the log.  As expected, the rollover time basically dropped to 0ms.  The overhead of maintaining the in-flight map is small, < 10%.  Given that the new journal code has 2-3x the throughput of the previous code, shaving 10% off of that is a small price to pay.  Basically, where rollover could take from 30ms to 1sec (in the improved journal), and 600ms to several seconds (in the original journal), the new and improved journal rollover is essentially seamless and should cause no upstream blocking of any kind.
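
In rough terms, rollover now amounts to copying the dangling COMMITTING records out of the in-memory map into the fresh log file instead of scanning the old one; a minimal sketch of the idea (names are illustrative, not the actual TransactionLogAppender code):

import java.util.Map;
import java.util.Set;

// Illustrative only: rollover driven by an in-flight map, assuming the caller holds the journal's write lock.
class RolloverSketch {
    interface LogFile {
        void writeCommittingRecord(String gtrid, Set<String> uniqueNames);
    }

    void rollover(Map<String, Set<String>> inFlight, LogFile newLogFile) {
        // No disk scan of the old file: every transaction still awaiting COMMITTED
        // is carried over from memory into the fresh log file.
        for (Map.Entry<String, Set<String>> entry : inFlight.entrySet()) {
            newLogFile.writeCommittingRecord(entry.getKey(), entry.getValue());
        }
        // The old file can then be swapped out; recovery only ever needs the carried-over records.
    }
}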

Brett


