Recovering from bad connections in pool

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Recovering from bad connections in pool

cwong15
I was testing BTM's ability to recover from bad connections, using LrcXADataSource (because of MySQL's XA limitations). I let the app run for a bit, then restarted the MySQL server, which killed all connections in BTM's pool. When the app next tried to access the DB, it seemed to be permanently stuck, unable to clear out the bad connections in the pool. I don't manage the connections myself, since Hibernate takes care of those details. Is BTM supposed to discard bad connections somehow? I know it's possible to make sure connections work by setting the testQuery property, but that seems expensive.

Thanks.

Chris
Reply | Threaded
Open this post in threaded view
|

Re: Recovering from bad connections in pool

Ludovic Orban
Administrator
Hi,

Alas, the only mechanism to clear out invalid connections is to set the testQuery property. I must say I did not try any harder than that to make the pool recoverable in case of a database crash / restart but when enabled, this mechanism works fine.

The fact that it is expensive is relative, how much overhead would the test query add to your requests ?

Anyway, if you have suggestions on how to improve this mechanism I'm open.

Ludovic
Reply | Threaded
Open this post in threaded view
|

Re: Recovering from bad connections in pool

cwong15
Thanks for responding. It looks like I should always have a testQuery set for a production BTM environment. The alternative would be to keep the connection idle time short so connections never live long in the pool.

As to suggestions, I have a few thoughts. While I don't have a sense of how much overhead an additional query would add to every DB connection request, it does not seem elegant to add  overhead every single time to guard against a problem that almost never happens. At the same time, I cannot afford to have a production server become unusable and require a restart on those rare occasions -- probably when I am asleep -- when connections *might* be lost.

One solution might be optimistic connection usage: assume your pool has good connections, but make sure that when a connection error happens, that connection does not get returned to the pool. The tricky part is how to catch that error. For example, Apache's DBCP always calls Connection.setAutoCommit() when activating a connection, and with MySQL this triggers an exception if the connection is bad. That has the effect of testing a connection every time, but without additional overhead because that call is always made anyway. At least for MySQL, a bad connection is never returned to the user in such a situation, and thus is never returned to the pool.

Another possibility is to check the pool periodically, like c3p0. Intead of testing a connection at every request, it tests every connection in the pool periodically. I'm not sure that this buys me much over simply setting a low maxIdleTime parameter, and it still leaves open a window of vulnerability between pool cleanings.

I think it's great that BTM has JMX support for pool monitoring. Expanding that support to add an operation like "clear all connections in all pools" might be useful for those rare DB server restart situations.

Keep up the good work.



Reply | Threaded
Open this post in threaded view
|

Re: Recovering from bad connections in pool

Ludovic Orban
Administrator
Relying on JDBC calls like Connection.setAutoCommit() might sound like a good idea but in practice it isn't for at least those reasons:

 - some databases do not natively support autocommit so the driver emulates that by sending a commit call after each statement. This is the case with Firebird and even Oracle I think. This means setting autocommit just sets a variable in the JVM and does not guarantee the connection is still working for those DBs.

 - for databases which do support autocommit like MySQL or Sybase ASE, calling this method means a round-trip to the database. This won't perform any better than a test query.

Checking connections asynchronously like c3p0 does could help but it requires you to tune the connection check settings to get good results. This isn't trivial and if it's poorly configured, the mechanism isn't effective.

Regarding your JMX question, this sounds like a good idea to be able to reset connection pools. Please open an enhancement request in JIRA and I'll try to get that done for the next release.

Ludovic