Wednesday, May 24, 2006

Solution to Dreaded ORA-07445 error

Finally got a response from Oracle, turned out they had put it with their US office who don't start work until 13:00 our time.

The eventual response was:
23-MAY-06 14:49:43 GMT

DATA COLLECTED
===============
On alert.log we can see some:
ORA-07445: exception encountered: core dump [00000001009B0B04] [SIGSEGV] [Address not mapped to obj
ect] [0x000000000] [] []

Trace file uploaded shows:
/opt/oracle/app/oracle/admin/dchr/udump/dchr_ora_5662.trc
Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production
JServer Release 9.2.0.6.0 - Production

*** SESSION ID:(37.1317) 2006-05-18 15:34:04.409
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x0, PC: [0x1009b0b04, 00000001009B0B04]
*** 2006-05-18 15:34:04.410
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [00000001009B0B04] [SIGSEGV] [Address not mapped to object] [0x000000000
] [] []
Current SQL statement for this session:
select all count(dm_type.r_object_id) "the_count" from dm_type_sp dm_type where (dm_type.name=:"SYS_B_0" and ex
ists (select r_object_id from dm_type_r where dm_type.r_object_id = r_object_id
and attr_name=:"SYS_B_1"))
----- Call Stack Trace -----
ksedmp ssexhd sigacthandler evaopn2 qerixGetKey qerixStart qertbStart qerjoFetch qerstFetch
qergsFetch qerstFetch opifch2 opiall0 kpoal8 opiodr ttcpip opitsk opiino opiodr opidrv sou2o
main


******************** Cursor Dump ************************
Current cursor: 1, pgadep: 0
pgactx: 39b95a998 ctxcbk: 0 ctxqbc: 0 ctxrws: 396c5ba70
Explain plan:
Plan Table
--------
-------------------------------------------------------------------------------------------------------------------------
| Operation | Name | Rows | Bytes | Cost | TQ |
IN-OUT| PQ Distrib |Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
| SELECT STATEMENT | | 0 | 0 | 2 |
| | | | |
| SORT AGGREGATE | | 1 | 47 | 0 | | | | |
|
| NESTED LOOPS SEMI | | 1 | 47 | 2 | | | | | |
| INDEX UNIQUE SCAN | D_1F01487D80000002 | 1 | 15 | 1 | | | | |
|
| TABLE ACCESS BY INDEX ROWID | DM_TYPE_R | 62 | 1984 | 1 | | | | | |
| INDEX RANGE SCAN | D_1F01487D80000009 | 109 | 0 | 1 | | | |
| |
-------------------------------------------------------------------------------------------------------------------------
Cursor Dump:
----------------------------------------
Cursor 1 (ffffffff7cd60418): CURFETCH curiob: ffffffff7cc465d0
curflg: 46 curpar: 0 curusr: 0 curses 392779298
cursor name: select all count(dm_type.r_object_id) "the_count" from dm_type_sp dm_type where (dm_typ
e.name=:"SYS_B_0" and exists (select r_object_id from dm_type_r where dm_type.r
_object_id = r_object_id and attr_name=:"SYS_B_1"))
child pin: 3992242b8, child lock: 397e020c8, parent lock: 3a21f62f0
xscflg: 80110476, parent handle: 39d351938, xscfl2: 6a00409
Dumping Literal Information
Bind Pos: 0, Bind Len: 12, Bind Val:
Bind Pos: 1, Bind Len: 9, Bind Val:
nxt: 4.0x00000738 nxt: 3.0x00000710 nxt: 2.0x00000220 nxt: 1.0x00000720
Cursor frame allocation dump:
frm: -------- Comment -------- Size Seg Off
bhp size: 160/600
whp size: 8688/12440

bind 0: dty=1 mxl=32(12) mal=00 scl=00 pre=00 oacflg=10 oacfl2=100 size=32 offset=0
bfp=ffffffff7cc469b0 bln=32 avl=12 flg=09
value="dmi_workitem"
bind 1: dty=1 mxl=32(09) mal=00 scl=00 pre=00 oacflg=10 oacfl2=100 size=32 offset=0
bfp=ffffffff7cc46960 bln=32 avl=09 flg=09
value="a_wq_name"

.
eos (end of section)
.
ISSUE VERIFICATION
===================
Verified on alert.log and trace file provided

.
eos (end of section)
.
RESEARCH
=========
Found bug
Bug 4098853
Abstract: Dump (qerixGetKey->evaopn2) using COUNT(NOT_NULL_COLUMN)
with a view
Fixed-Releases: 9207 A105 A201
Tags: CBO DUMP R9206 REGRESSION
Details:
This problem is not a true regression but is a problem which
is exposed by the fix for Bug 3440097 in 9.2.0.6.
SQL selecting COUNT(NOT_NULL_COLUMN) from a view may dump
under qerixGetKey->evaopn2 (or qerhjSplit->evaopn2)
Workaround:
Set Event 10122 and ensure the SQL is reparsed with this event set.

Another Workaround:
query_rewrite_enabled=TRUE


.
eos (end of section)
.
CAUSE DETERMINATION
====================
Bug 4098853
.
eos (end of section)
.
CAUSE JUSTIFICATION
====================
Stack trace and behaviour matches this bug

.
eos (end of section)
.
POTENTIAL SOLUTION(S)
======================
We may apply 9.2.0.7 patchset where bug is already fixed, or try workarounds provided
Workaround: Set Event 10122 and ensure the SQL is reparsed with this event set.

Another Workaround:
query_rewrite_enabled=TRUE

.
eos (end of section)
.
POTENTIAL SOLUTION JUSTIFICATION(S)
====================================
As per bug fix

.
eos (end of section)
.
SOLUTION / ACTION PLAN
=======================
Researching about this issue I was able to identify the Bug 4098853. The fix will be included in patchset 92070.
As workaround, you can set the Event 10122 or query_rewrite_enabled=TRUE and ensure the SQL is reparsed with this event set.

1. alter session set query_rewrite_enabled=TRUE or

2. alter session set events '10122 trace name context forever'

3. alter system flush shared_pool;

3. Try to reproduce the issue.
.
eos (end of section)
.

.
KNOWLEDGE CONTENT
==================
Platform / Port Specific? = NO

Knowledge content not created because the following note already addresses this issue: Bug 4098853


By this time we'd already found a work around which was to apply a patch to the application that avoided the error by doing the query a different way. The bug appears (if I'm reading the response correctly) when you try to do a count on a not null column.

We've decided to move up our plans to go to 10g2, which has the bug fix, to get around it.

2 comments:

Anonymous said...

Hi Stephen,

Thank you for posting your experiences for others :) May I ask you if you found the offending query and what the before-and-after versions looked like?

Stephen Booth said...

So far as I am aware there were a few similar queries that we causing this error, one example was:
select all count(dm_type.r_object_id) "the_count" from dm_type_sp dm_type where (dm_typ
e.name=:"SYS_B_0" and exists (select r_object_id from dm_type_r where dm_type.r
_object_id = r_object_id and attr_name=:"SYS_B_1"))


The error is caused when you try to use the count() function on a not null column (dm_type.r_object_id in this case).