Lots of small databases on a big Solaris server, part 3

This is turning into a series; I have already written about issues on large boxes hosting a lot of databases, once on the Listener handling incoming requests for tons of databases all using connection pooling, and once on the danger of letting Oracle pick a parallel degree all by itself during an 11g database upgrade.

The next (and hopefully last) challenge we faced turned out to be related to default Solaris settings that are too low for how we are using the box.  The box in question is a Sun M5000 with 256G of memory and 8 CPUs (4 cores each, 2 threads per core) running Solaris 10; its global zone hosts 3 non-global zones, which are home to 42 non-production databases that see intermittent, light usage.  We have other similar boxes hosting other non-production databases, but this one in particular was showing signs of distress.  It also had the most databases.  At seemingly random times, or sometimes correlated with heavy backup and restore activity, the entire global zone would become partially unresponsive, enough to disrupt traffic to the databases for most of the applications.

We spent some time troubleshooting the TCP/IP stack as part of the aforementioned listener issue, but this time we just weren't seeing the symptoms of it being exhausted.  We spent time looking at I/O, in particular the settings on the NFS mount we write our RMAN backups to, dialed them back a bit, etc., but it didn't help.  Finally we started getting fork errors, such as "fork: Resource temporarily unavailable", which immediately sent our unix admin looking at processes.  We were exhausting LWPs on the box.

A bit about LWPs, otherwise known as light weight processes.  The basics are here and here.  The crux of our issue: “From Solaris 8 software onwards, the 64-bit kernel allows segkp to be sized from 512 MB to 24 GB, with the default being two gigabytes (sufficient for more than 87,000 kernel thread stacks/LWPs),” (Multithreading in the Solaris™ Operating Environment).  Another resource said there is a theoretical limit of 88,000 LWPs.  Theoretical … meaning you may see system performance degrade while using fewer LWPs than the limit.

Oracle decides how many db writers (DBWn) to spin up based on – you guessed it – the number of CPUs in the box.  The rule of thumb listed in the Oracle Performance docs is at least 1 DBWR per 8 CPUs.  Oracle sees our M5000s as having 64 CPUs (they are 8 CPUs, 4 cores each, 2 threads per core).  You can see the CPU count Oracle uses for its calculations with this query:

select STAT_NAME, VALUE from V$OSSTAT where stat_name = 'NUM_CPUS';
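
To see how many db writers a given instance actually started, a quick check like this works from SQL*Plus (a sketch; V$BGPROCESS lists background processes, and the usual idiom of a non-zero PADDR indicates a running one):

select name
  from v$bgprocess
 where name like 'DBW%'
   and paddr <> hextoraw('00');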

Our bad that we hadn’t set DB_WRITER_PROCESSES explicitly, but we didn’t know it would try to go so high.  So every single one of our 42 databases spun up 8 db writers.  From what we can see, each of those db writer processes eventually climbs to 258 LWPs and stops there; they do not use that many immediately after the database is started, but they get there after they’ve had some usage.  So 42 * 8 * 258 = 86,688 … more than enough to cause problems, given that every other activity on the box grabs at least one LWP as well.

You can see how many LWPs are being used with ps: ps -efL | wc -l

You can see the number of LWPs by pid using ps with these flags: ps -ef -o "pid nlwp"

Make sure to run these commands on the global zone to ensure you see all LWPs.
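
To see how much of the total is coming from the db writers specifically, a rough sketch like this (adjust the pattern for your environment; it just sums the nlwp column for ora_dbw* processes) can be run from the global zone:

ps -e -o nlwp,args | grep '[o]ra_dbw' | awk '{ total += $1 } END { print total " LWPs held by DBWn processes" }'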

To fix the issue, you assign more memory to segkp.  The default is 2 GB; our unix admin quadrupled ours to 8 GB.
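
In our case the change went into /etc/system followed by a reboot.  A sketch of what the entry might look like, assuming the documented convention that segkpsize is expressed in 8 KB pages on SPARC (roughly 1048576 pages for 8 GB; verify the units against the Solaris Tunable Parameters Reference Manual for your release):

* /etc/system: enlarge kernel pageable memory (segkp) to ~8 GB
set segkpsize=1048576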

To see your segkp currently: kstat -n segkp
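
To keep an eye on how close you are to the limit, the vmem kstat for segkp can be filtered down to its in-use and total byte counters (field names may vary slightly by release):

kstat -p -n segkp | egrep 'mem_inuse|mem_total'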

This wasn't an easy one to diagnose since the behavior took a while to produce 'real' errors that pointed to LWPs as the issue.  If you have a similar setup and are having random performance issues, it certainly doesn't hurt to check your LWP usage.

Errors that pointed us toward forking, and from there to LWPs:

  • ORA-12518: TNS:listener could not hand off client connection (DBD ERROR: OCIServerAttach).
  • ORA-12537: TNS:connection closed (DBD ERROR: OCIServerAttach).
  • Agent is Unreachable (REASON = IOException in sending Request :: Socket is closed) but the host is reachable.

These look like a listener problem, but are actually the inability of the listener to hand off to the OS by forking a process to handle the connection. Enterprise Manager repeatedly sent critical alerts about this error and then cleared them, since the issue was intermittent rather than a complete outage.

At the apex of the issue, we were intermittently unable to log in to the box at all.  When we did get in, we saw the message “fork: Resource temporarily unavailable” on and off when trying to run routine commands. Additionally, after reaching this level of LWP exhaustion, when we later restarted the box overnight to add the memory to segkp, I had to shut down the listeners and the Enterprise Manager agents just to free up enough resources for a ‘shutdown immediate’ to complete.  Despite this, about 30% of the databases would not shut down (they just sat there) and I had to use ‘shutdown abort.’  I would guess that all processes were responding sluggishly when Oracle tried to do its normal shutdown cleanup.

Once segkp was increased and the box was restarted, the databases all started with no problems, and we have not had any further problems with this host.  We had had issues recently with multiple RMAN backups running at once on this box, so to test our change to segkp we ran a stress test with all the databases online, two 8-channel RMAN backups writing to NFS, and a 12-channel RMAN restore writing to the SAN (fibre channel), all running at once.  While the heavy I/O drove load up, everything stayed responsive.

As a precaution we have also added the initialization parameter DB_WRITER_PROCESSES to our standard database template (we script our new database builds), as well as to all of our non-production databases (set to 2).  We are also evaluating what to set it to for each Production database.
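
For an existing database the parameter is static, so the change looks something like this (the value of 2 is just what we chose for non-production; it takes effect after a restart):

alter system set DB_WRITER_PROCESSES=2 scope=spfile;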

Upgrading small databases on big servers (ORA-04031 / 11g)

Today I received an ORA-04031 error when upgrading to 11.2.0.2, with the memory allocation error happening during the final utlrp.sql recompilation step and referring consistently to the large pool.

ERROR at line 1:
ORA-12801: error signaled in parallel query server P237
ORA-12853: insufficient memory for PX buffers: current 84352K, max needed 13280400K
ORA-04031: unable to allocate 65560 bytes of shared memory ("large pool","unknown object","large pool","PX msg pool")
ORA-06512: at “SYS.UTL_RECOMP”, line 804
ORA-06512: at line 4

The first thing that stands out is P237, not to mention that it claims it needs 13G of memory to perform this task without erroring.  That's a heck of a lot of memory and parallel threads for a tiny database with not much data in it.  It turned out to be because I was working with a small database that lives in a Solaris 10 zone on a large server (M5000) with tons of CPUs available.  Of course, we have this problem because we use zones set up a certain way, rather than containers or other configurations that keep the logical host from seeing all of the physical host's CPUs, but we like the benefits we get from flexible resource sharing.

The database has a memory_target of 2G.  It turns out that utlrp.sql kicks off the recompilation with no limiting degree of parallelism (it passes 0, which leaves it to Oracle to choose the number of threads), and on a large server that can be bad.  It ran out of memory trying to handle all those threads and died.

----- utlrp.sql passes in a 0 to utlprp.sql (and this is all it does):

Rem ===========================================================================
Rem BEGIN utlrp.sql
Rem ===========================================================================

@@utlprp.sql 0

----- utlprp.sql then uses that 0 as the number of parallel threads:

DECLARE
   threads pls_integer := &&1;
BEGIN
   utl_recomp.recomp_parallel(threads);
END;
/

So you can modify utlrp.sql, issue @@utlprp.sql X directly to pass in a reasonable number of threads, or set a cap on the number of parallel servers anything can spin up (a good idea anyway):

alter system set PARALLEL_MAX_SERVERS=5 scope=both;
alter system set PARALLEL_SERVERS_TARGET=2 scope=both;
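
Alternatively, skip the wrapper and pass an explicit thread count straight to utlprp.sql from SQL*Plus (4 here is just an illustrative value; pick something your memory_target can support):

@?/rdbms/admin/utlprp.sql 4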

You'll have to experiment to find the right number.  I found that sometimes 5 worked and sometimes the recompilation still failed.  And when I did get an error, the next attempt might throw a memory error against a different part of memory (shared pool instead of large pool).  Repeatedly running the script eventually left the instance unable to shut down cleanly due to memory allocation errors on the shared pool, and I had to shutdown abort.

With this low SGA I also had to increase java_pool_size manually so that memory_target would treat that value as a minimum; otherwise catupgrd.sql failed with a java pool memory allocation error on Oracle Multimedia.  It's an error you can even see in the example in Oracle's docs; go here and search for ORA-04031.  Rerunning catupgrd.sql with more java pool memory allowed it to complete successfully.  I then set java_pool_size back to 0 so it could be managed by memory_target again.

alter system set java_pool_size=256M scope=spfile;  (restart)
alter system set java_pool_size=0 scope=spfile;  (restart)
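
To confirm the java pool actually grew before rerunning catupgrd.sql, a quick look at v$sgastat (the same view the asktom query below rolls up) does the trick:

select pool, round(sum(bytes)/1024/1024) mb
  from v$sgastat
 where pool = 'java pool'
 group by pool;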

Great finds along the way:

See memory allocation in 11g with memory_target, from asktom:

with data
as
(
  select decode( grouping( pool ), 1, 'total:', pool ) "Pool",
         sum(bytes) bytes
    from (select nvl(pool, '*'||name) pool, bytes from v$sgastat )
   group by rollup (pool)
)
select "Pool", bytes, round(bytes/1024/1024) mbytes
  from data
union all
select 'PGA target', v-bytes, round((v-bytes)/1024/1024)
  from data, (select to_number(value) v
                from v$parameter
               where name = 'memory_target')
 where "Pool" = 'total:'
/

With small-memory databases, especially on virtualized servers that can still "see" all the CPUs, you need to set these parameters / change these defaults (a quick check of their current values follows the list):

PARALLEL_MAX_SERVERS
PARALLEL_SERVERS_TARGET: (note that it is plural, the documentation writes it incorrectly a few times)
JOB_QUEUE_PROCESSES
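
A quick way to see where these currently stand on any given instance, and whether they are still at their defaults:

select name, value, isdefault
  from v$parameter
 where name in ('parallel_max_servers', 'parallel_servers_target', 'job_queue_processes');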

I also posted in the OTN forums about this here.

Listener queuesize and Solaris tcpip

11/18/2010 edit: after more troubleshooting on this box, we learned more about what was happening, and while the post below stands as valid items to look at, our issues were perhaps more complex and may have been related to a different issue.  See this post as well as all three posts on some gotchas we have run into while running lots of databases on Solaris zones.

——–

I just made my first post to forums.oracle.com (finally saw something go across my RSS feed that I felt I could respond to) and I thought I would reprise it here.

We use Sun M5000s with 256G of memory for our non-production servers, so we load them up with a lot of databases.  Some are separated out into Solaris zones, but some zones still have a lot of smaller databases running on a single listener.  Our app admins reported that at times they could not restart the application; Glassfish would not connect, even from an admin console test. Here are the two things we found and addressed to fix these issues.

First thing to check for …

By default, the queue the listener has for incoming requests is very small. It is enough for most database servers, so no one ever talks about it or changes it from the default, but when you run this many databases AND the app is using connection pooling, then the startup of the app (with multiple clustered app servers all starting up the default number of threads in each pool all at once) tends to exhaust this listener queue. From Oracle’s doc:

10.3.2 Handling Large Volumes of Concurrent Connection Requests

If you expect the listener to handle large volumes of concurrent connection requests, then you can specify a listener queue size for its TCP/IP or IPC listening endpoints. To specify the listener queue size, specify the QUEUESIZE parameter at the end of the protocol address with its value set to the expected number of concurrent requests, similar to Example 10-2.

Example 10-2 listener.ora File with Queue Size

LISTENER=
  (DESCRIPTION=
    (ADDRESS=(PROTOCOL=tcp)(HOST=sales-server)(PORT=1521)(QUEUESIZE=20)))

Note:
The default number of concurrent connection requests is operating system-specific. The defaults for TCP/IP on the Solaris Operating System and Windows follow:

* Solaris Operating System: 5
* Windows XP Professional SP2: 10
* Windows 2003 Server Enterprise Edition: 200

In conjunction with this, it can also exhaust the TCP/IP stack, so there are some settings you should have the server admin increase (at least we experienced this on Solaris).  Here's an article that explains how to test for this and, if you see evidence of it, what settings to change (for Solaris).

We set our queuesize to 512 (for all database hosts) and our Solaris tcp_conn_req_max_q to 1024. That resolved the "connection storm" issue caused by app-side connection pooling.
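
For illustration, our change amounts to the same pattern as Oracle's Example 10-2 with a bigger queue (host and port here are placeholders), plus the Solaris kernel setting, which can be changed on the fly with ndd (and should also be made persistent per your admin's standards):

LISTENER=
  (DESCRIPTION=
    (ADDRESS=(PROTOCOL=tcp)(HOST=your-db-host)(PORT=1521)(QUEUESIZE=512)))

# as root on the Solaris host
ndd -set /dev/tcp tcp_conn_req_max_q 1024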

Second thing to check for …

We thought the first problem had recurred after we fixed it with queuesize/tcp_conn_req_max_q, but it just had the same symptoms; here's what it actually was …

We use SAN/Fibre Channel disk for our database storage. However, we have an NFS mount which we use solely for RMAN backups and restores. Our unix admins are awesome and they have tuned the NFS within an inch of its life, and we found it was too fast.  (Meaning, we had to stop being stupid and not run 3T multi-channel restores and backups at the same time.) NFS sits on top of the TCP/IP stack, so if you share the pipe in and out of the box (10G for us, no bottleneck there) with NFS and you really hit it hard, you can exhaust the TCP/IP stack again. That manifests as listener drop errors without any "listenDrops" showing in netstat (see article above) and without any errors making it to your listener log, because the connection never gets that far.
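
If you want to check whether it is the listener queue or the TCP backlog that is overflowing, the Solaris TCP counters that article refers to can be pulled with something like this (counter names can vary slightly by release):

netstat -s -P tcp | egrep -i 'listendrop'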