Lots of small databases on a big Solaris server, part 3

This is turning into a series; I have already written about issues on large boxes hosting a lot of databases: once on the Listener handling incoming requests for tons of databases that all use connection pooling, and once on the danger of letting Oracle pick a parallel degree all by itself during an 11g database upgrade.

The next (and hopefully last) challenge we faced turned out to be related to default Solaris settings that are too low for how we use the box.  The box in question is a Sun M5000 with 256 GB of memory and 8 CPUs, 4 cores each, 2 threads per core, running Solaris 10; its global zone runs 3 non-global zones, which are home to 42 non-production databases that see intermittent, light usage.  We have other similar boxes hosting other non-production databases, but this one in particular was showing signs of distress.  It also had the most databases.  At seemingly random times, or sometimes correlated with heavy backup and restore activity, the entire global zone would become partially unresponsive, enough to disrupt traffic to the databases for most of the applications.

We spent some time troubleshooting the TCP/IP stack as part of the aforementioned listener issue, but this time we just weren’t seeing the symptoms of it being exhausted.  We spent time looking at i/o, and in particular the settings on the NFS mount we write our rman backups to, dialed them back a bit, etc., but it didn’t help.  Finally we started getting fork errors, such as “fork: Resource temporarily unavailable”, which immediately sent our unix admin looking at processes.  We were exhausting LWPs on the box.

A bit about LWPs, otherwise known as light weight processes.  The basics are here and here.  The crux of our issue: “From Solaris 8 software onwards, the 64-bit kernel allows segkp to be sized from 512 MB to 24 GB, with the default being two gigabytes (sufficient for more than 87,000 kernel thread stacks/LWPs)” (Multithreading in the Solaris™ Operating Environment).  Another resource said there is a theoretical limit of 88,000 LWPs.  Theoretical … meaning you may see system performance degrade while using fewer LWPs than the limit.

Oracle decides how many db writers (DBWn) to spin up based on – you guessed it – the number of CPUs in the box.  The rule of thumb listed in the Oracle Performance docs is at least one DBWn per 8 CPUs.  Oracle sees our M5000s as having 64 CPUs (they are 8 CPUs, 4 cores each, 2 threads per core).  You can see the CPU count Oracle uses for its calculations with this query:

select STAT_NAME, VALUE from V$OSSTAT where stat_name = 'NUM_CPUS';
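
If you want to confirm what a given instance actually chose, you can query the parameter directly in sqlplus, and at the OS level you can simply count all the DBWn background processes on the box; this is just a quick sanity check, and the grep pattern assumes the standard ora_dbwN_<SID> background process naming:

select NAME, VALUE from V$PARAMETER where name = 'db_writer_processes';

ps -ef | grep ora_dbw | grep -v grep | wc -l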

Our bad that we hadn’t set DB_WRITER_PROCESSES explicitly, but we didn’t know it would try to go so high.  So every single one of our 42 databases spun up 8 db writers.  From what we can see, each of those db writer processes eventually uses up 258 LWPs.  They do not use that many immediately after the database is started, but after some usage they climb to 258 and stop there.  So 42 * 8 * 258 = 86,688 … more than enough to cause problems, given that every other activity on the box grabs at least one LWP as well.

You can see how many LWPs are being used with ps: ps -efL | wc -l

You can see the number of LWPs by pid using ps with these flags: ps -ef -o "pid nlwp"

Make sure to run these commands on the global zone to ensure you see all LWPs.
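
Putting those together, a rough one-liner (again run from the global zone, and assuming the standard ora_dbwN_<SID> process naming) sums the LWPs held just by the db writer processes, which is a handy way to check the arithmetic above against what the box is actually doing:

ps -e -o comm= -o nlwp= | grep ora_dbw | awk '{ total += $2 } END { print total }'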

To fix the issue, you assign more memory to segkp.  The default is 2 GB; our unix admin quadrupled ours to 8 GB.

To see your segkp currently: kstat -n segkp
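
For reference, the tunable behind this is segkpsize in /etc/system, which only takes effect after a reboot.  Here is a minimal sketch of an 8 GB setting, assuming the value is given in 8-KB pages as described in the Solaris Tunable Parameters Reference Manual (so treat the exact number as illustrative and verify against the manual for your release):

* /etc/system: enlarge kernel pageable memory (segkp), used for LWP thread stacks
* value is in 8-KB pages: 0x100000 pages * 8 KB = 8 GB
set segkpsize=0x100000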

This wasn’t an easy one to diagnose, since it took a while for the behavior to produce ‘real’ errors that pointed to LWPs as the issue.  If you have a similar setup and are having random performance issues, it certainly doesn’t hurt to check your LWP usage.

Errors that pointed us toward forking, and thereby LWPs:

  • ORA-12518: TNS:listener could not hand off client connection (DBD ERROR: OCIServerAttach).
  • ORA-12537: TNS:connection closed (DBD ERROR: OCIServerAttach).
  • Agent is Unreachable (REASON = IOException in sending Request :: Socket is closed) but the host is reachable.

These look like a listener problem, but they are actually the listener being unable to hand the connection off to the OS by forking a process to handle it.  Enterprise Manager repeatedly sent critical alerts about this error and then cleared them, since the issue was intermittent rather than a complete outage.

At the apex of the issue, we were intermittently unable to log in to the box at all.  When we did get in, we saw the message "fork: Resource temporarily unavailable" on and off when trying to run routine commands.  Additionally, after reaching this level of LWP exhaustion, when we later restarted the box overnight to add the memory to segkp, I had to shut down the listeners and the Enterprise Manager agents just to free up enough resources for a 'shutdown immediate' to complete.  Even so, about 30% of the databases would not shut down (they just sat there) and I had to use 'shutdown abort.'  I would guess that all processes were responding sluggishly when Oracle tried to do its normal shutdown cleanup.

Once segkp was increased and the box was restarted, the databases all started with no problems, and we have not had any further problems with this host.  We had recently had issues with multiple rman backups running at once on this box, so to test our change to segkp we ran a stress test with all the databases online: two 8-channel rman backups writing to NFS and a 12-channel rman restore writing to the SAN (fibre channel), all running at once.  While the heavy i/o drove load up, everything stayed performant.

As a precaution we have also added the initialization parameter DB_WRITER_PROCESSES (set to 2) to our standard database template (we script our new database builds), as well as to all of our non-production databases.  We are also evaluating what we need to set it to for each production database.
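
For completeness, DB_WRITER_PROCESSES is a static parameter, so it has to be set in the spfile and only takes effect at the next instance restart; a minimal example of the non-production setting mentioned above:

-- static parameter: only takes effect after the instance is bounced
alter system set db_writer_processes = 2 scope=spfile;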