在11.2.0.2的数据库上,我们遇到了oracle 11g的Bug 10082277 Excessive allocation in PCUR or KGLH0 heap of “kkscsAddChildNo” (ORA-4031)
Bug 10082277 Excessive allocation in PCUR or KGLH0 heap of “kkscsAddChildNo” (ORA-4031)
This note gives a brief overview of bug 10082277.
The content was last updated on: 26-AUG-2011
Click here for details of each of the sections below.
Affects:
Range of versions believed to be affected Versions BELOW 12.1
Versions confirmed as being affected
11.2.0.2
11.2.0.1
Platforms affected Generic (all / most platforms affected)
Fixed:
This issue is fixed in
12.1 (Future Release)
11.2.0.3 (Future Patch Set)
11.2.0.2.3 Patch Set Update
11.2.0.2 Bundle Patch 4 for Exadata Database
11.2.0.2 Patch 2 on Windows Platforms
11.2.0.1 Patch 11 on Windows Platforms
Description
Under certain circumstances the “perm” space in PCUR subheaps (11.2.0.1)
or KGLH0 subheaps (11.2.0.2) for cursors in the shared pool may continue
to grow over time with continual additions of memory of the type
“kkscsAddChildNo”.
Over time this can use excessive shared pool memory , even leading
to ORA-4031 errors.
Rediscovery Notes:
A heapdump of the SGA will show one or more very large
subheaps with a name of the form “PCUR^xxxx” or “KGLH0^xxxx”.
Within this subheap there is lots of “perm” space allocated.
If CPRM tracing is enable (event 10235 level 65536) then
the subheap dump shows the perm space as due to “kkscsAddChildNo”
allocations of memory.
Workaround
This issue requires that there is a lack of sharing of the
parent cursor. Hence a workaround, where possible, is to
address the reason for not sharing the parent cursor.
(eg: Check V$SQL_SHARED_CURSOR for why the parent is not
being shared).
该问题在我们的系统上表现为
1. 单个SQL占用了大量的shared pool memory, 这个例子中单个SQL就占用了1.7GB的shared pool memory
select VERSION_COUNT,SHARABLE_MEM from v$sqlarea where
hash_value=2038009379;
VERSION_COUNT SHARABLE_MEM
------------- ------------
96 1888704961
而在10.2.0.4的数据库上,同样的SQL只占用了4M左右大小的内存
select VERSION_COUNT,SHARABLE_MEM from v$sqlarea where
hash_value=2038009379;
VERSION_COUNT SHARABLE_MEM
------------- ------------
214 4216097
2. 大部分的内存都被parent cursor (cursor id为65535)占用了
通过Tanel的脚本curheaps.sql 可以查看各个child cursor的大小
SQL> @curheaps 2038009379 65535
old 20: KGLNAHSH in (&1)
new 20: KGLNAHSH in (2038009379)
old 21: and KGLOBT09 like ('&2')
new 21: and KGLOBT09 like ('65535')
KGLNAHSH KGLHDPAR CHILD# KGLHDADR
KGLOBHD0 SIZE0 SIZE1 SIZE2 SIZE3
---------- ---------------- ---------- ---------------- ----------------
---------------------- -------- -------- --------
KGLOBHD4 SIZE4 SIZE5 KGLOBHD6 SIZE6 SIZE7
STATUS
---------------- -------- -------- ---------------- -------- --------
----------
2038009379 0000000F3BC53E78 65535 0000000F3BC53E78
0000000F5BF1E648 *1883443712 *0 0 0
00 0 0 00 0
0 1
old 10: KSMCHDS = hextoraw('&v_curheaps_kglobhd0')
new 10: KSMCHDS = hextoraw('0000000F5BF1E648')
HEAP CLASS ALLOC_COMMENT BYTES CHUNKS
----- -------- ---------------- ---------- ----------
HEAP0 perm permanent memor *1898642464 *474659
HEAP0 free free memory 26531224 473772
HEAP0 freeabl kksfbc:hash1 4872 96
HEAP0 freeabl kgltbtab 912 6
old 10: KSMCHDS = hextoraw('&v_curheaps_kglobhd4')
new 10: KSMCHDS = hextoraw('00')
no rows selected
old 10: KSMCHDS = hextoraw('&v_curheaps_kglobhd6')
new 10: KSMCHDS = hextoraw('00')
no rows selected
另外该问题只发生在client的jdbc driver升级到11g以后,jdbc driver为10g的时候没有这个问题,估计和shared cursor sharing有关系。
SQL的 parent cursor不断增长一方面会使得shared pool的内存耗尽,另外如果发生hard parse耗时非常严重,可能会导致大量的和parse相关的等待时间,例如“cursor: mutex S”。
Oracle有相关的patch可以下载,打上patch后问题解决。
Eagle Fan oracle 11g, oracle bugs