ORA-01309: invalid session in dataguard logical standby

We ran across a fun one today with dataguard and a logical standby.  Second time we have seen it with the same fix. After making changes such as moving redo logs around or increasing TEMP (with log apply off) the redo/arch transport will correctly start and begin shipping data from the primary to the logical standby.  However, the log miner is not working and gives the error ORA-01309: invalid session in the alert log.  Stopping and restarting log apply do not work.  The error does not always happen with all configuration changes, only sometimes.  The error looks like this and happens immediately after the log apply setup:

LOGSTDBY status: ORA-16111: log mining and apply setting up
LOGSTDBY status: ORA-01309: invalid session

This metalink article gives us a clue.  Although it refers to Streams, it seems likely this is the same memory bug causing this issue, since the error is the same and the recommended fix – a database restart on the logical standby – works every time.

Another clue is that we do indeed see a parallel query process hung up, as the article says we will.  This is alerted on in the alert log (at least it’s easy to get the pid when it’s in the alert log) and we have to kill -9 it at the OS level to get the database to shutdown, even with shutdown immediate.  It looks like this in the alert log:

Active call for process 6244 user 'oracle' program 'oracle@dvpsor03 (P000)'
SHUTDOWN: waiting for active calls to complete.

The bug they list as the possible root cause does not sound like our culprit, but that doesn’t negate the similarities.

Once we got the database to go down cleanly, and restarted and turned on log apply, everything worked like a charm.

3/11/2010 edit: Experienced a 2nd set of symptoms with the same fix and likely the same cause today. A logical standby threw this error in the alert log:

LSP0: can’t recover from rollback of multi-chunk txn, aborting..

Research showed this means it encountered a deadlock in log apply. (Interesting note: a second logical standby hanging off the same primary did not encounter the same problem.) When this happens, it makes a “note to self” that it needs to do X then Y next time, then restarts log apply. However, the parallel process, P000, did not shut down properly. When it came back up, it was just reading the same block forever (over 2 hours). While P000 was hung, vmstat showed that memory was being slowly but steadily consumed. I had to “alter database abort logical standby apply” to get apply to stop. Then when I shut the database down, it hung. I killed P000 at the OS level, and shutdown completed immediately. Database and log apply both started up fine and the logical standby caught up.

04/30/2010 edit: we have now confirmed that if you do not shut down the database, but you instead shut down log apply, then wait, then kill -9 the P000 process, then start log apply, you can get away with not doing a database restart and things continue on fine.  We’ve used this at least 3 times now and it is consistent.  With a logical standby often users are accessing the database for reporting even if log apply is temporarily stopped, so it is handy that this works.

Advertisements

5 thoughts on “ORA-01309: invalid session in dataguard logical standby

  1. Jen Castillo says:

    brilliant! This info on kill -9 of P000 really saved me tonight. Instead of having to call an SA to freeze a VCS cluster so I could restart our logical standby I followed your most excellent advice. Thanks so much for posting this info! Super-helpful!

  2. yanggq says:

    Applies to:
    Oracle Server – Enterprise Edition – Version: 9.2.0 to 11.1.0
    This problem can occur on any platform.

    Symptoms
    You face Logical Apply suddenly aborting. Logical Standby ALERT.LOG shows:

    LSP0: rolling back apply server 4
    LSP0: apply server 4 rolled back
    LSP0: can’t recover from rollback of multi-chunk txn, aborting..
    LOGSTDBY Apply process P004 pid=25 OS id=176 stopped
    LOGSTDBY Apply process P005 pid=26 OS id=179 stopped
    LOGSTDBY Apply process P007 pid=27 OS id=183 stopped
    LOGSTDBY Analyzer process P003 pid=24 OS id=174 stopped
    LOGSTDBY Apply process P008 pid=28 OS id=185 stopped
    LOGSTDBY Apply process P006 pid=12 OS id=181 stopped

    Cause
    In order to reduce Memory Consumption on the Logical Standby Database, large Transactions to be applied are splitted into smaller Pieces (Chunks). Those Chunks can be applied to the Logical Standby Database before the Commit/Rollback has been reached by the LogMiner in the Redo. Once a Chunk is applied, the LCR’s belonging to this Chunk get deleted and so the Memory is freed up.
    However, sometimes it is possible that such a Transaction is blocking another Transaction. So in this Case, the splitted large Transaction is rolled back in order to let the other Transaction apply.
    After that we are missing the LCR’s from the already applied, then rolled back Chunks. In order to make the LogMiner being able to read those again, the Logical Apply needs to be aborted and restarted so that the LogMiner can read the large Transaction again (typically starting from the READ_SCN).

    Solution
    This is not a Bug or Problem. It’s just as per the Design of the Logical Apply (see description above).

    • tylerv says:

      This is good info, thanks! However this wasn’t the root cause … the problem was that when it did this restart of apply, it didn’t restart P000 and apply hung for hours with no progress. See above: “However, the parallel process, P000, did not shut down properly. When it came back up, it was just reading the same block forever (over 2 hours). While P000 was hung, vmstat showed that memory was being slowly but steadily consumed. I had to “alter database abort logical standby apply” to get apply to stop. Then when I shut the database down, it hung. I killed P000 at the OS level, and shutdown completed immediately. Database and log apply both started up fine and the logical standby caught up.”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s