Latch free waits, Dynamic Statistics, and the Result Cache

2016-11-29 EDIT: Oracle has announced that in 12.2 the features that drove this overuse of the result cache (dynamic stats triggered by SPDs) will default to off, and provided patches to apply to 12.1 as well.  More info here.  Also, some investigation by Mauro Pagano around the impact of the new OPTIMIZER_ADAPTIVE_STATISTICS parameter turned up an interesting minor fix related to this post: the result cache hint is no longer used in dynamic stats queries. So once you apply the patches/upgrade to 12.2, it should resolve the issue discussed in this post even if you turn SPD-triggered dynamic statistics back on.

2015-06-03 EDIT: looks like earlier in May Oracle published a brief doc based on my ticket(had discussed the need for a doc with engineer): High “Latch Free” Waits with Contention on ‘Result Cache: RC Latch’ when RESULT_CACHE_MODE = MANUAL on Oracle 12c (Doc ID 2002089.1)

2015-05-27 EDIT: Some great discussion in the comments, and based on that I added some info at the end.  So if you’re doing the TL;DR thing, don’t skip the very end.


Recently migrated Peoplesoft Campus Solutions database from 11.2.0.3 Solaris non-RAC non-ASM to 12.1.0.2 Linux ASM RAC (yes really, all at once). This is a mixed workload database, OLTP and Peoplesoft batch jobs. Seeing a lot of “latch free” waits during job runs and more than you would expect even during low times when no jobs are running. This is odd, usually there is something after “latch free: ???” since 10g and also we have never seen latching issues before on 11g with the same workload. It is also clear that it is having a significant performance impact because certain jobs are taking many times longer to run when these waits are high. The jobs (like many delivered peoplesoft jobs) cause a fair amount of blocking which is acceptable to the business side when they last 11 minutes but not when they last an hour — so much blocking occurs with the job having latch free waits at the root that eventually there are multiple-participant deadlocks and the job fails. And of course regular users are being blocked for a long time so web usage is heavily impacted.

AWR showed that the latch free wait was actually 100% “Result Cache: RC Latch.”

> select name, gets, misses, sleeps, immediate_gets, wait_time/1000 wait_time_ms
     from v$latch
     where name like 'Result Cache%'
/
NAME                                 GETS     MISSES     SLEEPS IMMEDIATE_GETS WAIT_TIME_MS
------------------------------ ---------- ---------- ---------- -------------- ------------
Result Cache: RC Latch           97101269    1106885     995211              0    128375216
Result Cache: SO Latch             124798         15          0              0            0
Result Cache: MB Latch                 93          0          0              0            0

That’s ugly. Ok, so we’re having problems latching on the result cache. The question is, why are we using the result cache at all?

> show parameter result
NAME                                 TYPE                              VALUE
------------------------------------ --------------------------------- ------------------------------
client_result_cache_lag              big integer                       3000
client_result_cache_size             big integer                       0
result_cache_max_result              integer                           5
result_cache_max_size                big integer                       330304K
result_cache_mode                    string                            MANUAL
result_cache_remote_expiration       integer                           0

result_cache_mode is set to MANUAL so (according to 12c docs) only the results from queries that use the RESULT_CACHE hint should be stored in the result cache.

I verified that none of the user or job queries were using the result_cache hint. A search of gv$sql looking for ‘%result_cache%’ in sql_text was illuminating — 400K (fluctuates) sql statements that contained the result_cache hint. They turned out to all be from dynamic sampling/statistics, as can be determined by the presence of the DS_SVC hint — here’s an example:

2015-04-11/14:30:23                                       a8gdt9zqk2b3m
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring optimizer_features_enable(default) no_parallel result_cache(snapshot=3600) */ SUM(C1) FROM (SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "PS_SSR_TSRSLT_STAT")       */ 1 AS C1 FROM "PS_SSR_TSRSLT_STAT" SAMPLE BLOCK(1.37542, 8) SEED(1
)  "PS_SSR_TSRSLT_STAT" WHERE ("PS_SSR_TSRSLT_STAT"."REPORT_REQUEST_NBR"='003125267') AND ("PS_SSR_TSRSLT_STAT"."STRM"='1137') AND ("PS_SSR_TSRSLT_STAT"."REQUEST_SEQ_NBR"=1) AND ("PS_SSR_TSRSLT_STAT"."REPORT_NUMBER"=1)) innerQuery

There were ZERO queries in gv$sql that used the result_cache hint and were NOT from dynamic sampling.

OK, why are we using dynamic sampling so much, I thought things were at the defaults. I know 12c has a new level of 11 or AUTO which mean “the optimizer will use dynamic statistics to verify cardinality estimates for *all* SQL operators, and it will determine an internal time limit to spend verifying the estimates.” (http://docs.oracle.com/database/121/REFRN/refrn10140.htm#REFRN10140) But I did not set my database to 11/AUTO.

> show parameter optimizer_dynamic_sampling
NAME                                 TYPE                              VALUE
------------------------------------ --------------------------------- ------------------------------
optimizer_dynamic_sampling           integer                           2

optimizer_dynamic_sampling is set to 2. From 12c docs: 2 means “Use dynamic statistics if at least one table in the statement has no statistics. This is the default setting.” So, I picked out lots of tables from the dynamic sampling sql statements and made sure they had stats, re-ran to be sure, and then watched for them to be dynamically sampled again (should not be). Dynamic sampling statements for these tables continued to show up in gv$sql with “LAST_LOAD_TIME” equal to the present time, so dynamic sampling was continuing to run AND store its sampling statements in the result cache.

At this point I requested a full rolling restart of the Peoplesoft application tier to make sure we did not have some persistent “dirty” sessions where something/someone had issued a session-level dynamic sampling hint at level . This would recreate all the sessions. This did not get rid of the issue.

Next step, I decided to test turning off dynamic sampling to prove the theory beyond a doubt. Before I did this, I ran a query that looked at that last 60 seconds of gv$sql LAST_LOAD_TIMEs. There were hundreds of results. I had also run this query on and off for the past hour and it always had lots of results, clearly there was a ton of dynamic sampling going on. Then I ran this:

alter system set optimizer_dynamic_sampling=0 scope=both sid='*';

After a few minutes, the query consistently returned zero rows for the past 60 seconds. This stayed the same until I turned dynamic sampling back on at level 2:

alter system set optimizer_dynamic_sampling=2 scope=both sid='*';

Instantly there were over 1000 dynamic sampling queries in gv$sql with a last_load_time in the past 60 seconds.

This doesn’t make sense based on what the docs say about level 2, but ok, let’s assume *for now that this is a good thing and take a look at the result cache / tune it.

> SET SERVEROUTPUT ON
> EXECUTE DBMS_RESULT_CACHE.MEMORY_REPORT

R e s u l t   C a c h e   M e m o r y   R e p o r t
[Parameters]
Block Size          = 1K bytes
Maximum Cache Size  = 330304K bytes (330304 blocks)
Maximum Result Size = 16515K bytes (16515 blocks)
[Memory]
Total Memory = 339190216 bytes [2.055% of the Shared Pool]

... Fixed Memory = 108416 bytes [0.001% of the Shared Pool]
... Dynamic Memory = 339081800 bytes [2.054% of the Shared Pool]
....... Overhead = 850504 bytes
....... Cache Memory = 330304K bytes (330304 blocks)
........... Unused Memory = 0 blocks
........... Used Memory = 330304 blocks
............... Dependencies = 2 blocks (2 count)
............... Results = 330302 blocks
................... SQL     = 39452 blocks (39452 count)
................... Invalid = 290850 blocks (290850 count)

Ok, that’s completely full / has used its max allowed memory. Perhaps it needs more (and I did find significant CANCELLED requests where the shared pool wanted to GROW and the buffer cache said NO from the time period where these latch free waits were the worst). However, before we assume that let’s see what we can see about the efficiency of the use of the result cache.

> select trunc(to_date(creation_timestamp, 'MM/DD/YYYY HH24:MI:SS'), 'MI') MIN, count(*)
from V$RESULT_CACHE_OBJECTS
where creation_timestamp > sysdate-1/24
group by trunc(to_date(creation_timestamp, 'MM/DD/YYYY HH24:MI:SS'), 'MI')
order by trunc(to_date(creation_timestamp, 'MM/DD/YYYY HH24:MI:SS'), 'MI')
/
...
04/11/2015 15:17:00        212
04/11/2015 15:18:00        188
04/11/2015 15:19:00        213
04/11/2015 15:20:00        168
04/11/2015 15:21:00        109
04/11/2015 15:22:00        199
04/11/2015 15:23:00        194
04/11/2015 15:24:00        153
04/11/2015 15:25:00        149
...

That’s quite a bit of churn for something that’s meant to be reused. Let’s zoom out:

> select trunc(to_date(creation_timestamp, 'MM/DD/YYYY HH24:MI:SS'), 'HH24') HOUR, count(*)
from gV$RESULT_CACHE_OBJECTS
group by trunc(to_date(creation_timestamp, 'MM/DD/YYYY HH24:MI:SS'), 'HH24')
order by trunc(to_date(creation_timestamp, 'MM/DD/YYYY HH24:MI:SS'), 'HH24')
/
HOUR                  COUNT(*)
------------------- ----------
03/27/2015 03:00:00          2
03/28/2015 03:00:00          2
03/30/2015 03:00:00          2
04/10/2015 15:00:00      26881
04/10/2015 16:00:00      88926
04/10/2015 17:00:00      23122
04/10/2015 18:00:00      15636
04/10/2015 19:00:00      20075
04/10/2015 20:00:00       9660
04/10/2015 21:00:00       6549
04/10/2015 22:00:00      14950
04/10/2015 23:00:00      18235
04/11/2015 00:00:00       6855
04/11/2015 01:00:00      17711
04/11/2015 02:00:00       3656
04/11/2015 03:00:00      48318
04/11/2015 04:00:00      54292
04/11/2015 05:00:00     124391
04/11/2015 06:00:00      76643
04/11/2015 07:00:00      81864
04/11/2015 08:00:00      97948
04/11/2015 09:00:00      56916
04/11/2015 10:00:00      22606
04/11/2015 11:00:00      31104
04/11/2015 12:00:00      61384
04/11/2015 13:00:00      21226
04/11/2015 14:00:00      31671
04/11/2015 15:00:00      30287

In a word, wow. With that kind of creation rate, no wonder we are getting waits, since the first run of result cached queries are often slower than without since they have to write to the result cache. We are probably burning through the result cache’s LRU really fast, making it very inefficient — but let’s prove that:

> select count(*) from gv$result_cache_objects;
  COUNT(*)
----------
    990912

Result cache is specific to node. I did verify that the objects were evenly split across nodes (about 300K each).

> SELECT scan_count, count(*)
FROM gv$result_cache_objects
group by scan_count
ORDER BY 2,1
/
SCAN_COUNT   COUNT(*)
---------- ----------
...
         7       6970
         2       8765
         3      64562
         1     161017
         0     725298

Ouch. So of nearly a million objects in the result cache across the cluster, 725K were never used (scanned). I ran this over time a few times to make sure I wasn’t seeing a burst of adds and then reuse after that. It didn’t fluctuate. (The evenness of the per-minute object creation rate backs this up.) Of the other 270K or so, 161K were scanned once. Even though we have creation timestamps going back to 15:00 yesterday, we haven’t seen much use of the cached results.

This view also lets us see the LRU position (LRU_NUMBER) of any object in the result cache, let’s see how spread out it is:

> SELECT lru_number, count(*)
FROM gv$result_cache_objects
group by lru_number
ORDER BY 2
/
...
  73968536          1
  73968524          1
  73968520          1
         0     946702

So almost every object is at the start of the LRU. This backs up the creation timestamp info above showing that they are cycling through very quickly.

At this point I went back to the docs because I don’t want to have to entirely turn off dynamic sampling, and it is not working the way I would expect (set to level 2 but acting like it’s on the new 12c AUTO). That’s when I found this text (http://docs.oracle.com/database/121/REFRN/refrn10140.htm#REFRN10140):

When this parameter is set to 11, the optimizer will use dynamic statistics to verify cardinality estimates for all SQL operators, and it will determine an internal time limit to spend verifying the estimates.

There are cases where the optimizer will automatically decide to use 11, for example:

The query will run in parallel.

The query was executed before and its history is available (from the cursor cache, Automatic Workload Repository, or the SQL Management Base).

So what this basically says is that even if you set dynamic sampling to the value you want, the Oracle optimizer *can* choose to ignore it if it has ever been run before. Given the lack of reuse of the objects in the result cache, it seems like this should not apply (probably has not seen these queries before) but this is close enough to what is happening for it to be plausible that it is ignoring my level 2 setting and using dynamic sampling. Whether that is an optimizer bug or by design remains to be seen.

The 12c doc on OPTIMIZER_DYNAMIC_SAMPLING linked over to the Oracle Database SQL Tuning Guide so I went to see what it had to say on OPTIMIZER_DYNAMIC_SAMPLING (http://docs.oracle.com/database/121/TGSQL/tgsql_astat.htm#TGSQL455). This was illuminating, missed it before when re-reading the level definitions:

13.1.3 Disabling Dynamic Statistics
In general, the best practice is not to incur the cost of dynamic statistics for queries whose compile times must be as fast as possible, for example, unrepeated OLTP queries.

… and goes on to tell you how to turn off dynamic statistics entirely: ALTER SESSION SET OPTIMIZER_DYNAMIC_SAMPLING=0;

So we don’t have a size problem. We have a revolving door LRU problem, as well as the wrong type of query being put into the result cache. Things are put in and pushed out so fast they’re never used, and additionally we are putting in dynamic sampling queries about queries with literals that will almost never be repeated. Oracle’s tuning guide basically says if you have a lot of OLTP / small queries running then turn dynamic statistics off entirely. This is unfortunate in a mixed-workload database like Peoplesoft, because our jobs can benefit from being able to sample tables on the fly. Especially with Peoplesoft where jobs are constantly deleting and reloading “temporary” (real) tables and statistics are often incorrect mid-job.

I opened an SR and worked with Oracle Support on this issue. They confirmed my tests and said that use of dynamic statistics in spite of the OPTIMIZER_DYNAMIC_SAMPLING setting is expected behavior in 12c. They pointed me to two unpublished bug fixes released in 11.2.0.4 (you can find the bug number in the release notes but there’s no info on them, 8823383 and 12914055) which enabled dynamic statistics in a number of cases. The example they gave was if the query runs in parallel and there table is sufficiently large, based on a threshold value, but that is not the only case.

When I asked for clarification on the cases that might trigger the use of dynamic statistics, the explanation was that dynamic statistics is now part of AS (Adaptic Statistics) which is comprised of Adaptive Dynamic Statistics (ADS), Automatic Reoptimization, and SQL Plan Directives, and they all interact. One example was that if an adaptive plan is used, dynamic statistics might also be used. ADS might also be triggered when Automatic Reoptimization is used. It makes sense that these other functions would want to use dynamic statistics, it is just confusing to see Oracle appear to ignore your setting.

Suggested fixes for this issue were to hint dynamic sampling off, or to turn it off entirely. Setting OPTIMIZER_DYNAMIC_SAMPLING=0 does still disable ADS entirely. You can also turn the result cache off. But there is a much more targeted solution that Oracle Support provided — you can set “_optimizer_ads_use_result_cache” = FALSE which will stop only dynamic statistics from using the result cache! That’s exactly what I was looking for.

alter system set "_optimizer_ads_use_result_cache" = FALSE;

With this issue, that amounts to a magic bullet. I’ll update as we roll it out if there are any issues, but I think at least for Peoplesoft databases and any other applications that use a lot of literals and have an at least partial OLTP workload, when upgrading to 11.2.0.4 or 12c this parameter is worth serious consideration.

Note: I did try turning on AUTO/11 for a short time under the same workload I was monitoring while researching, and it ran ADS on far more queries than with ADS set to 2. So while the optimizer is using ADS on more than you would expect with ADS set to 2, it is not the same as setting it to AUTO/11.

2015-05-27 EDIT: There’s some great back and forth in the comments that I strongly encourage you to read.  Christian Antognini, an expert in all the new or updated Adaptive features in 12c (among other things), had some significant insights into what is and is not happening with dynamic statistics.  Thanks Chris!  Here’s a summary that I wrote based on his comments (but do read through the comments):

There are dual paths by which dynamic statistics (DS) is triggered. On its own / by itself — DS behaves as expected and documented. But because of SQL Plan Directives (SPD), DS can be kicked off at any time if a SPD exists. Effectively “dynamic sampling/statistics is running more often in 12c” is a correct statement but the important part is that it is not triggered by DS, but by SPD. 12c dynamic statistics/sampling on its own does not trigger any more often than it would have in 11g for a given value of  OPTIMIZER_DYNAMIC_SAMPLING.

 

Advertisements

19 thoughts on “Latch free waits, Dynamic Statistics, and the Result Cache

  1. Hi

    > One example was that if an adaptive plan is used, dynamic statistics might also be used.

    I’m surprised to read this. I never saw it and, honestly, I think that it makes no sense whatsoever to use ADS in case of adaptive plans.

    > ADS might also be triggered when Automatic Reoptimization is used.

    ADS is not directly triggered by automatic reoptimization. In case of automatic reoptimization the query optimizer creates an SPD. Then, it’s the SPD that triggers ADS (also at level 11, since 12.1.0.2).

    So, AFAIK, ADS is only triggers through the “obvious reasons”, in case PX is used, and when SPD instructs the query optimizer to do so.

    Just my 2 cents,
    Chris

    • tylerv says:

      Christian — obviously your reputation is very solid and well-known and I don’t feel I can argue on the exact chain of events. Thank you for sharing your knowledge! What I stated for how ADS is triggered was directly from an Oracle SR, but we have all experienced that expertise and/or writing skills from that source vary widely.

      If I take a step back to what I actually observed, it boils down to that however it gets triggered, without setting OPTIMIZER_DYNAMIC_SAMPLING to 11 (with it set to 2), what used to be dynamic sampling *seems to be* kicking off much more often than it used to before 11.2.0.4 at that same setting. While not having a strictly apples-to-apples test to look at, this is a database that I was intimately familiar with when it ran on 11g (frequent tuning / observation) and I think I would have noticed the number of dynamic sampling queries and/or the waits caused by the result cache being heavily used.

      Furthermore, when looking at the queries (1000-2000 per minute) that ADS was putting in the result cache in 12c in the upgraded version of this same database, I observed that none of them were parallel, and that I could not prevent dynamic statistics from firing by making sure statistics were up-to-date on all participating tables/indexes. This is what led to my observation that it seemed that there were an oddly high number of queries being dynamically sampled given: 1) what I have been told by Oracle, and 2) what is documented by Oracle about when ADS will fire off even when OPTIMIZER_DYNAMIC_SAMPLING is set to lower numbers (ie. 2) in 12c. I definitely recognize that this is not a 100% proven fact or completely solid test case, just my observation.

      I wanted to share this because it involved a number of things I had not personally dug until before and the dig was interesting (to me anyway), and the latch free waits seemed like something that others might encounter. Regardless of why things are happening exactly as they are — the _optimizer_ads_use_result_cache setting recommended by Oracle did prevent the heavy use of the result cache, and it seems to have eliminated the latch free waits we were previously experiencing.

      Thanks again for commenting!

      • Hi

        > what used to be dynamic sampling *seems to be* kicking off much
        > more often than it used to before 11.2.0.4 at that same setting.

        Because of SPD, this is definitively the case.

        > I think I would have noticed the number of dynamic sampling queries
        > and/or the waits caused by the result cache being heavily used.

        Excluding SPD from the discussion, it’s important to point out that up to and including 12.1.0.1, PX only triggers DS (2 < level I observed that none of them were parallel

        In that case is due to SPDs.

        > I could not prevent dynamic statistics from firing by making sure
        > statistics were up-to-date on all participating tables/indexes.

        One of the main differences between DS and ADS is that ADS can always kick in. Yesterday I saw it for a query as simple as “SELECT 1 FROM dual” 😦

        > Regardless of why things are happening exactly as they are — the
        > _optimizer_ads_use_result_cache setting recommended by Oracle did
        > prevent the heavy use of the result cache, and it seems to have
        > eliminated the latch free waits we were previously experiencing.

        Yeap, the parameter does what is expected to do…

        Best,
        Chris

      • tylerv says:

        Chris — now believe I see the points you were making. I appreciate your continued attention. I would like to add an edit in my post with this information to avoid spreading any misinformation / incorrect guesses. I will try to restate it here in my own words and if I get it wrong, if you have time, please let me know.

        There are dual paths by which dynamic statistics is triggered. On its own / by itself — DS behaves as expected and documented. But because of SPD, (A)DS can be kicked off at any time if a SPD exists (probably should be called ADS in this case, but I think it is the same action, ie. statistics gathering on the fly?) Effectively “dynamic sampling/statistics is running more often in 11.2.0.4+” is a correct statement but the important part is that it is not triggered by DS, but by SPD. On its own, DS in 11.2.0.4/12c does not *trigger* any more often than it would have < 11.2.0.4 (unless you set it to 11).

        Is the previous paragraph correct? Thanks again!

      • > Is the previous paragraph correct?

        The version specific information, probably because of my last (misleading reply), isn’t accurate. In fact SPD is only available in 12c. Also note that three (11.2.0.4, 12.1.0.1 and 12.1.0.2) slightly different implementation of ADS exist. This fact doesn’t make things easier.

        Let me comment the different sentences…

        > But because of SPD, (A)DS can be kicked off at any time if a SPD
        > exists (probably should be called ADS in this case, but I think it is
        > the same action, ie. statistics gathering on the fly?)

        11.2.0.4: no SPD exist
        12.1.0.1: DS at the level configured through optimizer_dynamic_sampling is used
        12.1.0.2: ADS (level=11) is used

        > Effectively “dynamic sampling/statistics is running more often in
        > 11.2.0.4+” is a correct statement but the important part is that it
        > is not triggered by DS, but by SPD.

        I would replace “11.2.0.4+” with “12.1.0.1+” (since SPD don’t exist in 11.2.0.4).

        HTH
        Chris

      • tylerv says:

        First off, thank you again Chris!

        On 11.2.0.4 – Oracle Support told me that there were:

        > two unpublished bug fixes released in 11.2.0.4 (you can find the bug number
        > in the release notes but there’s no info on them, 8823383 and 12914055) which
        > enabled dynamic statistics in a number of cases. The example they gave was if
        > the query runs in parallel and there table is sufficiently large, based on a threshold value

        This was even with OPTIMIZER_DYNAMIC_SAMPLING set to 2. Given that there is no SPD in 11.2.0.4, this would indicate that there are also times that SPD is not involved and DS could kick off more often — unless Oracle Support was incorrect about these two bug fixes? Unfortunately there’s no info available to me on MOS about them.

        Second, on 12.1.0.2 — I am running on this version. I tested during this issue to see what would happen if I set OPTIMIZER_DYNAMIC_SAMPLING to 11. It instantly quadrupled the number of queries that were being dynamically sampled. When I set OPTIMIZER_DYNAMIC_SAMPLING back to 2, the numbers went back down. This would be the other part of the dual path, correct, the non-SPD part — your list of versions and “is used” refers to “is used by SPD” … I think. So in 12.1.0.2, SPD uses ADS OPTIMIZER_DYNAMIC_SAMPLING = 11, no matter what NON-SPD DS is set to.

        I agree with your edit, for SPD / 11.2.0.4. I’m just curious if, given what Oracle Support said, there is likely *also* a change in non-SPD-triggered DS, in 11.2.0.4, that is worth mentioning. As long as it is clear this has nothing to do with SPD. Or if you think or know from your own testing that Oracle Support is incorrect.

        I think I have enough info to make a relatively good edit, if you’re tired of flogging this horse. 🙂 Thanks again for your help!

      • Hi

        > unless Oracle Support was incorrect about these two bug fixes?

        I can’t comment about 8823383. But, AFAIK, 12914055 controls whether level 11 is used instead of a level lower than 11. So, to me, it doesn’t do what they pretend it does.

        > So in 12.1.0.2, SPD uses ADS OPTIMIZER_DYNAMIC_SAMPLING = 11,
        > no matter what NON-SPD DS is set to.

        Yes!

        > given what Oracle Support said, there is likely *also* a change in
        > non-SPD-triggered DS, in 11.2.0.4, that is worth mentioning

        As I already wrote, I’m not aware that such a case exists. So, until somebody shows me a test case that proves it, I don’t believe that it exists.

        Best,
        Chris

  2. this is great info
    i was helping a large Univ in the southeast move from Peoplesoft Campus Solutions on 11.2.x non-RAC DBs to 12.1.0.2 on Linux, staying off RAC this time,

    • tylerv says:

      Thanks for your comment! It’s a fun challenge, isn’t it? I am using incremental cross-endian rman backups to datafile with transportable tablespaces, as well as Goldengate when needed to reduce downtime for the more critical databases. It works really well. The most time consuming part was creating the “prep” script that analyzes an 11g database for all possible angles that we need to cover in our migration.

      How did you do your migration?

    • tylerv says:

      Thanks for the idea, Franck! Are you suggesting looking for a few specific SPD that might be causing frequent ADS? Or disabling SPD completely? I haven’t done a deep look into these yet, just the standard reading about new features. Definitely need to do more reading and testing.

      Edit: have been reading more and read the column definitions of DBA_SQL_PLAN_DIRECTIVES and I see what you’re saying now.

      In this case the frequency of ADS wasn’t really the problem, or at least I don’t *know* that it’s a problem — it could be great that it’s triggering frequently. It’s only ADS’s heavy and mostly useless use of the result cache that was directly causing issues, the latch free waits, because of our workload type. The heavy use of the result cache could also be just fine if you have good use of binds / lots of query reuse.

  3. Thanks Tylerv ,

    I am not sure why I couldn’t find this article when I searched during an issue two months ago . I found this today when I am going through the issue again. I have opened a discussion in OTN as well https://community.oracle.com/thread/3952017

    We were also seeing a lot of latch free waits, in our case, optimizer_dynamic_sampling was increased to 7 after we had few queries not using dynamic sampling and going for sub optimal plans, still we have seen Result_cache latch waits.
    I eventually turned off Adaptive optimization features to avoid all the recursive sql , if I could have read this article prior would have tried disabling “_optimizer_ads_use_result_cache”.

    Thanks
    Rakesh

    • tylerv says:

      Too bad you didn’t find it earlier, but kudos for solving on your own! 🙂 You might want to check out the changes coming in 12.2 and the back-ported patches to bring those same changes to 12.1 … they will adjust the Adaptive parameters to try to alleviate some of the performance issues people have been having with Adaptive features causing excessive dynamic sampling (https://dban00b.wordpress.com/2016/11/25/oracle-back-port-patches-for-adaptive-performance-issues-released-for-12-1-dynamic-stats/).

      Also, you mention “a few” queries that you wanted to fix … you might want to look into creating SQL Profiles for individual queries (or just hint them if you can edit the app that calls them) with a hint to trigger dynamic sampling, if it’s only a few queries, rather than change the setting for the entire database. But depends on your entire load and how many small OLTP-ish and non-bind queries you have. That type of query does not do well with more dynamic sampling, since the DS time tends to be longer than the query runtime itself, which can lead to a huge combined slowdown (going from 1ms to 7ms but called a million times a day, for instance).

      In any case thanks for the comment and best of luck!

  4. Thanks Tylerv ,

    I encountered latch free waits same as in above case in another database ,disabling “_optimizer_ads_use_result_cache” eliminated all waits, must thank you for the above post 🙂

    Best Regards
    Rakesh

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s