Failed to recover SQL Server Restart


Failed to recover SQL Server Restart

samyem
I am running BTM 2.0.1 with SQL Server 2005 and 2008. When I restart SQL Server, the BTM connections do not seem to recover as expected. I get the following exception, and the only way for me to regain a connection is to restart my application. Any help?


Caused by: javax.transaction.xa.XAException: The function RECOVER: has failed. The status is: -3. Error: "*** SQLJDBC_XA DTC_ERROR Context: xa_recover, state=1, StatusCode:-3 (0xFFFFFFFD) ***"
        at com.microsoft.sqlserver.jdbc.SQLServerXAResource.DTC_XA_Interface(SQLServerXAResource.java:545)
        at com.microsoft.sqlserver.jdbc.SQLServerXAResource.recover(SQLServerXAResource.java:723)
        at bitronix.tm.recovery.RecoveryHelper.recover(RecoveryHelper.java:74)
        at bitronix.tm.recovery.RecoveryHelper.recover(RecoveryHelper.java:39)
        at bitronix.tm.recovery.IncrementalRecoverer.recover(IncrementalRecoverer.java:45)
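
For readers decoding the status code: in the standard javax.transaction.xa API, -3 is the value of XAException.XAER_RMERR (a resource manager error). The sketch below maps the numeric code to its symbolic name for logging. It assumes the driver passes the XA status through as XAException.errorCode; the class and method names (XaErrorCodes, describe) are illustrative and not part of BTM or the JDBC driver.

```java
import javax.transaction.xa.XAException;

public class XaErrorCodes {
    /** Maps a numeric XAException.errorCode to its symbolic XAER_* name. */
    public static String describe(int errorCode) {
        switch (errorCode) {
            case XAException.XAER_RMERR:  return "XAER_RMERR";   // -3: resource manager error
            case XAException.XAER_NOTA:   return "XAER_NOTA";    // -4: unknown XID
            case XAException.XAER_INVAL:  return "XAER_INVAL";   // -5: invalid arguments
            case XAException.XAER_PROTO:  return "XAER_PROTO";   // -6: protocol error
            case XAException.XAER_RMFAIL: return "XAER_RMFAIL";  // -7: resource manager unavailable
            default:                      return "other (" + errorCode + ")";
        }
    }

    public static void main(String[] args) {
        // Simulate the failure from the stack trace with the same status code.
        XAException e = new XAException(XAException.XAER_RMERR);
        System.out.println(e.errorCode + " = " + describe(e.errorCode));
    }
}
```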


Thanks,

Re: Failed to recover SQL Server Restart

samyem
Since I had to get this working, I worked around the problem by bypassing the recovery phase altogether, modifying the "recover" method in bitronix.tm.recovery.RecoveryHelper as follows:


Index: src/bitronix/tm/recovery/RecoveryHelper.java
===================================================================
--- src/bitronix/tm/recovery/RecoveryHelper.java (revision 487)
+++ src/bitronix/tm/recovery/RecoveryHelper.java (working copy)
@@ -71,7 +71,13 @@
      * @throws javax.transaction.xa.XAException if {@link XAResource#recover(int)} call fails.
      */
     private static int recover(XAResourceHolderState resourceHolderState, Set alreadyRecoveredXids, int flags) throws XAException {
-        Xid[] xids = resourceHolderState.getXAResource().recover(flags);
+        Xid[] xids = null;
+        try {
+            xids = resourceHolderState.getXAResource().recover(flags);
+        } catch (Throwable t) {
+            log.error("Cannot recover", t);
+        }
+
         if (xids == null)
             return 0;
 


-----------

Is there a better, proper solution here?
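
For comparison, here is a narrower variant of the same idea as a standalone sketch: it catches only XAException instead of Throwable and treats a failed recover() as "nothing to recover". This is not BTM code; LenientRecover, RecoverCall and recoverOrEmpty are hypothetical names, and the same caveat applies as for the patch above: swallowing the error means any in-doubt transactions on that resource are left unresolved.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

public class LenientRecover {
    /** Shape of XAResource.recover(int), so a method reference like xaResource::recover fits. */
    @FunctionalInterface
    public interface RecoverCall {
        Xid[] recover(int flags) throws XAException;
    }

    /** Returns the recovered XIDs, or an empty array if the resource fails to recover. */
    public static Xid[] recoverOrEmpty(RecoverCall call, int flags) {
        try {
            Xid[] xids = call.recover(flags);
            return xids != null ? xids : new Xid[0];
        } catch (XAException e) {
            // Only XA failures are swallowed; other errors still propagate.
            System.err.println("Cannot recover, XA error code " + e.errorCode);
            return new Xid[0];
        }
    }

    public static void main(String[] args) {
        int flags = XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN;
        // Simulated resource that fails the way SQL Server does in the trace above.
        Xid[] xids = recoverOrEmpty(f -> { throw new XAException(XAException.XAER_RMERR); }, flags);
        System.out.println(xids.length);
    }
}
```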



Re: Failed to recover SQL Server Restart

Ludovic Orban
Administrator
Your problem looks a lot like a bug in either SQL Server or its JDBC driver.

I'd recommend all the classic things to do in this case, such as making sure you're running the latest versions with all patches / service packs applied, and contacting Microsoft's support to get their advice.

Do you also get the problem with earlier versions of BTM? I think you should, as the recovery code barely changed between 1.3 and 2.0.

Re: Failed to recover SQL Server Restart

samyem
Yes, I am running the latest version of the SQL Server JDBC driver (3.0), and I have tried SQL Server 2005 and 2008 at various patch levels with all the hotfixes I could find. I've also tried previous versions of BTM as well as the trunk version.

The interesting thing is that JBoss apparently had a similar issue with SQL Server at one point, which they have since fixed:
http://community.jboss.org/thread/153726

So I was wondering if this is a bug in BTM itself. BTM assumes that the underlying database will always be able to recover from lost connections, but that is not the case when SQL Server restarts. Shouldn't BTM be able to just create a new connection if it cannot recover? The code flow in BTM is such that if it cannot recover, it stays in an invalid state that is never cleaned up, leaving the app in limbo. The small patch I pasted gets around that by ignoring the recovery failure and fetching fresh connections.


|

Re: Failed to recover SQL Server Restart

Ludovic Orban-2
I found the exact same discussion in the JBoss TX forum, but it wasn't clear to me that JBoss actually fixed anything.

What's for sure is that SQL Server is supposed to support recover() calls happening at any time, and apparently it does not. I could possibly build a workaround if we could pinpoint what exactly makes it fail and whether BTM can do something about it. Unfortunately I don't know how I can help you with that; you're pretty much on your own, which is why I recommended that you try to get help from Microsoft.

BTM is extremely strict about the integrity of the resources it works with: if it has any doubt that a resource is sane, it just won't allow you to use it, to avoid working on corrupt data or worsening the situation. When a datasource fails recovery and is marked as failed, any new getConnection() call should redo recovery and mark the resource as sane if it works. If you observed a different behavior, you may have spotted a bug.
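
The intended behavior described above can be pictured with a small standalone sketch. This is not BTM's actual code: FailedResourceSketch and its methods are hypothetical stand-ins, and recovery is reduced to a callback that reports success or failure.

```java
import java.util.function.BooleanSupplier;

public class FailedResourceSketch {
    private final BooleanSupplier recovery; // returns true when recovery succeeds
    private boolean failed;

    public FailedResourceSketch(BooleanSupplier recovery) {
        this.recovery = recovery;
    }

    public void markFailed() { failed = true; }

    public boolean isFailed() { return failed; }

    /** Stand-in for getConnection(): a failed resource must pass recovery before being used again. */
    public String getConnection() {
        if (failed) {
            if (!recovery.getAsBoolean()) {
                throw new IllegalStateException("resource still marked as failed: recovery did not succeed");
            }
            failed = false; // recovery worked, so the resource is considered sane again
        }
        return "connection";
    }

    public static void main(String[] args) {
        boolean[] databaseUp = {false};
        FailedResourceSketch ds = new FailedResourceSketch(() -> databaseUp[0]);
        ds.markFailed();
        try {
            ds.getConnection();
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
        databaseUp[0] = true;                    // the database comes back
        System.out.println(ds.getConnection());  // recovery is retried and succeeds
        System.out.println(ds.isFailed());
    }
}
```

If an application sees the first branch forever even after the database is reachable again, the cause is either recovery itself still failing (as in the stack trace in this thread) or a bug in the retry path.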


Re: Failed to recover SQL Server Restart

samyem
Since I am only working with SQL Server, I was wondering whether recovery works properly with the other databases used with BTM. I am having a hard time finding out whether others have had a similar issue using BTM with SQL Server.

|

Re: Failed to recover SQL Server Restart

Ludovic Orban-2
Recovery works flawlessly with all supported databases: Oracle, DB2, Informix, PostgreSQL, Derby and Sybase ASE.

It also used to work with SQL Server in the limited testing I did some time ago: http://docs.codehaus.org/display/BTM/JdbcXaSupportEvaluation#JdbcXaSupportEvaluation-MicrosoftSQLServer

2010/9/15 samyem <[hidden email]>

Since I am just working on top of SQL Server, I was wondering if the recovery
works properly in other databases out there using BTM? I am having a hard
time finding out that others didn't have similar issue using BTM with SQL
Server.


--
View this message in context: http://old.nabble.com/Failed-to-recover-SQL-Server-Restart-tp29710541p29722420.html
Sent from the Bitronix Transaction Manager mailing list archive at Nabble.com.



Re: Failed to recover SQL Server Restart

samyem
My tests were on SQL Server 2005 and SQL Server Express 2008, on XP and Windows Server 2008. In all cases, when SQL Server is stopped and then started again after about 5 minutes, BTM cannot recover. However, it CAN recover if the server is started within a few seconds of stopping, for example when it is just a "restart". Does BTM have any plans to support cases like mine, where such a recovery cannot be guaranteed?


Re: Failed to recover SQL Server Restart

Ludovic Orban-2
What could BTM do when the database server refuses to recover? As far as I can tell, this is a serious flaw in SQL Server's support of the XA protocol, and I can't think of anything a transaction manager could do to work around that problem without putting data at risk.

I'd recommend having a look at the Last Resource Commit optimization (http://docs.codehaus.org/display/BTM/LastResourceCommit13) if you can't get this SQL Server issue solved.
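
As a rough illustration of the general Last Resource Commit idea (not BTM's implementation): all XA resources are prepared first, then the single non-XA resource is committed in one phase, and its outcome decides whether the prepared XA resources commit or roll back. The interfaces and names below (TwoPhase, OnePhase, LastResourceCommitSketch) are hypothetical. The trade-off is that a crash while the last resource is committing can leave the outcome unknown, so this is weaker than full XA.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LastResourceCommitSketch {
    /** Hypothetical two-phase participant, standing in for an XA resource. */
    public interface TwoPhase {
        boolean prepare();
        void commit();
        void rollback();
    }

    /** Hypothetical one-phase participant, standing in for a plain non-XA connection. */
    public interface OnePhase {
        boolean commit(); // true on success
    }

    public static boolean commit(List<TwoPhase> xaResources, OnePhase lastResource) {
        List<TwoPhase> prepared = new ArrayList<>();
        for (TwoPhase r : xaResources) {
            if (!r.prepare()) {          // a "no" vote aborts the whole transaction
                rollbackAll(prepared);
                return false;
            }
            prepared.add(r);
        }
        if (!lastResource.commit()) {    // the last resource's commit is the decision point
            rollbackAll(prepared);
            return false;
        }
        for (TwoPhase r : prepared) {
            r.commit();
        }
        return true;
    }

    private static void rollbackAll(List<TwoPhase> resources) {
        for (TwoPhase r : resources) {
            r.rollback();
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        TwoPhase otherXa = new TwoPhase() {
            public boolean prepare() { log.add("xa.prepare"); return true; }
            public void commit() { log.add("xa.commit"); }
            public void rollback() { log.add("xa.rollback"); }
        };
        OnePhase sqlServer = () -> { log.add("last.commit"); return true; };
        boolean ok = commit(Arrays.asList(otherXa), sqlServer);
        System.out.println(ok + " " + log); // prints "true [xa.prepare, last.commit, xa.commit]"
    }
}
```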
