Peoplesoft and 12c Adaptive Features

2016-11-29 EDIT: Oracle has announced that in 12.2 the features that drove this overuse of the result cache (dynamic stats triggered by SPDs) will default to off, and provided patches to apply to 12.1 as well.  More info here.

I had been collecting examples and mentally writing something about this but the king of Peoplesoft DBAs, David Kurtz, beat me to it.  So I’ll just link to his analysis: Peoplesoft on Oracle 12c.  Read on for some of my own experience tuning around the issues with 12c Adaptive Features.  Apologies that it isn’t complete or full of examples, I wanted to at least get something out here before more months pass me by.  I may improve it over time.  Also, the experiences that led to this post were on 12.1.0.2.0.  We have patched to 160119, the January bundle patch, but there hasn’t been enough time to assess if any of the fixes to SQL Plan Management and Adaptive features apply to this situation.

The short story is that Peoplesoft has a lot of trouble with the 12c adaptive features.  Peoplesoft still has a lot of queries that do not use binds.  Especially with Peoplesoft jobs that run thousands of these queries that are normally < 1 second, there can be relatively massive increases in runtime when you move to 12c.  If you look at trace files (10046 and 10053), you see that what is happening is that it is spending a ton of time in the parse phase, doing “test” queries.  This is of course poor application design, but it is a vendor app and it certainly isn’t the only one like it.

Unfortunately the issues with applications that are not perfectly designed for Oracle were not apparent until we spent some quality time with our Peoplesoft databases running on 12c in production.  So at this point it would take a pretty large performance testing effort to turn off adaptive features at the database level.  So far, I have been working with individual performance issues by:

  • Having the developers add session variable statements to turn off adaptive features at the beginning of the job;
  • If that’s not possible, or we only want to fix one query, having them add an OPT_PARAM(‘OPTIMIZER_ADAPTIVE_FEATURES’,’false’) hint;
  • If they can’t/won’t change the code (delivered code, for instance) then I create a one-hint SQL Profile with force matching on.

I’ve seen a number of similar-but-different issues with performance and 12c adaptive features, but one recurring pattern that is resolvable in the same way each time goes as follows.

  • Cursor pin s wait on x waits
  • Also library cache pin
  • Spending inordinate time in parse

In 11g, the “fix prod fast” solution for an execution plan gone rogue was to look at the history of various plans, find a good one, and create a SQL Plan Baseline (we’ll ignore differing plans for differing binds for now for the sake of brevity).  Then you can go back and take some time to look at stats.  One of the most annoying things about 12c Adaptive Features is that SQL Plan Directives cause the Optimizer to ignore your SQL Plan Baselines.  It thinks it knows better, or at least it wants to try and find out.  More aspects of this particular situation:

  • SQL Plan Baseline(s) are ignored/not used, even if Accepted, even if Fixed (maddening).  Either a baseline you create to try to fix the problem, or one that was already there – same result.  You can see them being considered and discarded in an Optimizer trace.  Even if the plan was used by the same query moments ago, the Optimizer says they fail to validate as workable plans.
  • There are tons of SQL Plan Baselines, sometimes 100+.
  • Often, at least one of the tables involved in the query has an inordinate number of SQL Plan Directives, even after running stats.

To fix this situation I have taken the following steps:

  • Created SQL Profile disabling 12c Adaptive Features (thanks to Kerry Osborne for his single-hint sql profile script) … see hint above.
  • Deleted all SQL Baselines.  Flush the individual statement from cache if necessary.
  • At this point, parse related waits dropped off.
  • The query was now using a new plan but it was not a great plan, so it was still somewhat slow (but not waiting on parse / pin S / library cache etc).
  • Ran stats on all tables involved, a new plan was derived by the Optimizer. (Note: stats were not stale and had also been run manually per-table beforehand to attempt a fix.)
  • Created new baseline with best plan.

From that point on, performance was excellent, parse time was minimal, and the plan was stable.

Bug in tablespace size/usage monitoring in 12.1.0.2 OEM (DBA_TABLESPACE_USAGE_METRICS)

Despite rarely posting these days, this one is worth trying to get the word out.  We just ran across a 12.1.0.2 bug which is not fixed in the January or April bundle patches/PSUs.  It is also not mentioned in the 12c “you should install these one-off patches” page.  Yet it has the ability to take down your production environment.

The view DBA_TABLESPACE_USAGE_METRICS is used by OEM to drive its alerts on tablespace storage.  It turns out that due to a bug in the code behind the view, it leaves out a number of tablespaces entirely.  You can easily see the problem by comparing DBA_TABLESPACES and DBA_TABLESPACE_USAGE_METRICS.  This means that if any of the missing tablespaces runs out of space (hits the max autoextend) you won’t get any of your percentage-free alerts before it does.  We found anywhere from one to over 70 missing tablespaces, depending on the database.  Every database was missing at least one tablespace from this view.

You can easily see the problem by comparing DBA_TABLESPACES and DBA_TABLESPACE_USAGE_METRICS:

select tablespace_name from dba_tablespaces 
where tablespace_name not in (
    select tablespace_name from DBA_TABLESPACE_USAGE_METRICS
);

We thought we were smart by having another backup monitor running from cron, in case OEM missed sending a notification or we missed it.  Except we used the same view in our monitor, so it misses these tablespaces as well.  Just goes to show that it’s always tough to judge how paranoid to get when trying to eliminate dependencies in redundant systems.

I’m pretty miffed that this has not been deemed important enough issue to be included in a bundle patch.  It’s a big deal, but apparently Oracle doesn’t think so.  If they’d even just referenced it on their page about important one-offs to apply to 12c, we would have already had it in production as of our most recent patching round.

Edit: Oracle Support recommended the 12.1.0.2.4 version of the patch to apply on our 12.1.0.2.160119 Oracle home, and it worked, the missing regular tablespaces now show up in the view.  However, the Oracle doc on this says that “DBA_TABLESPACE_USAGE_METRICS describes tablespace usage metrics for all types of tablespaces, including permanent, temporary, and undo tablespaces.”

After applying the patch, all of the normal tablespaces are now included in the view:

select distinct tablespace_name from dba_data_files minus select tablespace_name from dba_tablespace_usage_metrics;

no rows selected

HOWEVER the TEMP tablespaces are still missing – is this a new bug?  Updated SR to find out, will update here when I know.

select distinct tablespace_name from dba_temp_files minus select tablespace_name from dba_tablespace_usage_metrics;

TABLESPACE_NAME
——————
PSGTT01
PSTEMP