Upgrading small databases on big servers (ORA-04031 / 11g)

Today I received an ORA-04031 error when upgrading to 11.2.0.2, with the memory allocation error happening during the final utlrp.sql recompilation step and referring consistently to the large pool.

ERROR at line 1:
ORA-12801: error signaled in parallel query server P237
ORA-12853: insufficient memory for PX buffers: current 84352K, max needed 13280400K
ORA-04031: unable to allocate 65560 bytes of shared memory (“large pool”,”unknown object”,”large pool”,”PX msg pool”)
ORA-06512: at “SYS.UTL_RECOMP”, line 804
ORA-06512: at line 4

First thing that stands out is P237, not to mention that it claims it needs 13G of memory to perform this task without erroring.  That’s a heck of a lot of memory and parallel threads for a tiny database with not much data in it.  It turned out to be because I was working with a small database that lives on a Solaris 10 zone that lives on large server (M5000) with tons of cpus available.  Of course, we have this problem because we use zones set up a certain way, and not containers or other configurations that do not allow the logical host to see all the physical host’s cpus, but we like the benefits we get from flexible resource sharing.

The database has a memory_target of 2G.  It turns out that utlrp.sql issues a PARALLEL with no limiting number, and on a large server that can be bad.  It ran out of memory to handle all these threads and died.

—– utlrp.sql passes in a 0 to utlprp.sql (and this is all it does):

Rem ===========================================================================
Rem BEGIN utlrp.sql
Rem ===========================================================================

@@utlprp.sql 0

—– utlprp.sql then uses that 0 as the number of parallel threads:

DECLARE
threads pls_integer := &&1;
BEGIN
utl_recomp.recomp_parallel(threads);
END;

So you can either modify utlrp.sql or issue @@utlprp.sql X to pass in a reasonable number of threads, or you can set a max on the number of parallel threads anything can spin up (a good idea anyway):

alter system set PARALLEL_MAX_SERVERS=5 scope=both;
alter system set PARALLEL_SERVERS_TARGET=2 scope=both;

You’ll have to experiment to find the right number.  I found that sometimes 5 worked, sometimes the recompilation still failed.  And if I received an error, then the next time I might get a memory error on another part of memory (shared pool vs large pool).  Repeatedly running the script eventually resulted in having to shutdown abort, couldn’t shutdown cleanly due to memory allocation errors on the shared pool.

With this low sga I also had to increase the java_pool_size manually so that memory_target would give it that as a minimum; otherwise catupgrd.sql failed with a java pool memory allocation error on Oracle Multimedia.  An error you can even see in the example in Oracle’s docs; go here and search for ORA-04031.  Rerunning catupgrd.sql with more java pool memory allowed it to complete successfully.  I then set java_pool_size back to 0 so it could be managed by memory_target again.

alter system set java_pool_size=256M scope=spfile;  (restart)
alter system set java_pool_size=0 scope=spfile;  (restart)

Great finds along the way:

See memory allocation in 11g with memory_target, from asktom:

with data
as
(
select decode( grouping( pool ), 1, ‘total:’, pool ) “Pool”,
sum(bytes) bytes
from (select nvl(pool,’*’||name) pool, bytes from v$sgastat )
group by rollup (pool)
)
select “Pool”, bytes, round(bytes/1024/1024) mbytes
from data
union all
select ‘PGA target’, v-bytes, round((v-bytes)/1024/1024)
from data, (select to_number(value) v
from v$parameter
where name = ‘memory_target’)
where “Pool” = ‘total:’
/

With small memory databases, especially on virtualized servers that can still “see” all the cpus, need to set these parameters / change these defaults:

PARALLEL_MAX_SERVERS
PARALLEL_SERVERS_TARGET: (note that it is plural, the documentation writes it incorrectly a few times)
JOB_QUEUE_PROCESS

I also posted in the OTN forums about this here.

Advertisements