Thursday, November 30, 2006

du and ls report very different file sizes for Oracle Tempfiles on NFS v3 mounted volume on NetApp filer under Solaris 9

We have a NetApp FAS filer attached via Gigabit Ethernet to a Sun V440 running Oracle databases the files for which are stored on the filer The NetApp volume is mounted under NFS version 3 at $ORACLE_BASE/oradata. Under that directory are 7 directories .snapshot, dctm, dchr, dctm_backup, dchr_backup, dctm_nobackup, dchr_nobackup.

.snapshot is where the NetApp filer stores snapshots of the volume.

The server runs two databases, dctm and dchr. The dctm and dchr directories hold the datafiles (and a small tempfile) for the appropriate database, *_backup holds the RMAN and cold backup sets for the appropriate databases. *_nobackup holds the bulk of the tempfiles (temporary tablespace, used for sorting) for the appropriate database (when we do cold backups we don't bother to backup the tempfiles).

For dctm, dchr and *_backup both du and ls report the same file sizes for both data and tempfiles. For *_nobackup ls reports the correct sizes (as specified in the Oracle data dictionary) but du reports a much smaller size for both the individual file size and the total for the directory.

v$sort_segment.max_blocks * db_block_size approximates to the size reported by the total size reported by du for the tempfiles in *_nobackup.

We know that ls gets file sizes by reading what is recorded in the inodes, or at least that's what the documentation says. The documentation for du doesn't specify how it gets it's space measurement but we've found pages on the web that indicate that it does this by counting allocated blocks. Normally these two ways of getting the file size should return the same result, the only exceptions being hard links (multiple directory entries point to the same file on the same file system) and sparse files. Neither directory contains any hard links and the tempfiles do not conform to our understanding of sparse files due to being formatted at creation.

Our current hypothesis is that the formattig does not write to every block and because of the way the NetApp filer handles block level writes it only bothers to allocate a block if that block is actually written to but still records the requested filesize in the inode, blocks that have not yet been written to still exist but only in a virtual state (ghost blocks?). With a NetApp filer blocks within the same file are not necessarily stored contiguously and a bloc that is read, changed and rewritten will probably not be written back to the same place they were originally read from. The Oracle buffer cache is likely to contain blocks from various segments read from various tablespaces and many different places on the disks. When Oracle flushes out dirty blocks the NetApp filer stores them in it's own cache. When the NetApp flushes it's own cache it will write as many blocks as possible into the first free space it can find, regardless of where they came from, so you get blocks from various segments in various files written next to each other on the disk, almost certainly somewhere other than where they were read from. The original blocks are then freed. Or at least that's how they explained it to me when I asked.

We are trying to verify this with NetApp but, what we believe has happened here is that when the files were created the NetApp wrote the inodes with the correct size but only allocated blocks on disk for the blocks that have actually been used (formatting of the file and the temp segments that have been written out between file creation and now), the remaining blocks are logically allocated but the physical blocks are still unallocated.

Whilst this no doubt improves speed of creation the file it would probably cause a performance hit on sorts when more sort space needs to be allocated as when blocks within the file are allocated to extents by Oracle the filer has to allocate them at filesystem level. We haven't noticed anything so far so presumably any hit is masked by the caching, it may become relevant if we get to the point where the cache becomes a bottleneck.

We await a response from NetApp.

Incidentally, the reason we don't backup the tempfiles is that the OS/Storage team were complaining about how long the backups took and said we were at risk of blowing the backup window (we weren't). After investigating I found that someone had changed the second tempfile in each database to autoextend and they had grown to 64GB each. Quite suprising as each database had under 5Gb of data (including system stuff, i.e. data dictionaty &c). After some investigation I found a lot of SQL was being run that was very unselective and returned very large datasets (over 10% of the total data) which would be sorted then sent to the application for display. When you have hundreds of users doing this concurrently you rapidly reach the point where many copies of the same data are being sorted at the same time, as Oracle cannot share sorts between sessions each copy is in a different sort segment, especially when many of the queries use joins that require sorts. We removed the large tampfiles and replaced them with 16*2Gb tempfiles in a separate directory for each database. These are not backed up. In the event of a recovery situation, and we cannot use the RMAN backups we take each night, we drop the missing tempfiles then recreate them.

The tempfiles in dctm and dchr are the originals and will have been filled before the second tempfile extended hence all blocks have been allocated at the fielsystem level.

Thursday, November 23, 2006

How people find this blog

I was just looking at my stat counter analysis for this blog, it's usually a good way to find out if anyone has linked to me. From the results it looks like the vast majority of people (around 97%) find this blog through a search engine, virtually all (over 99%) through Google. Most common search strings are or include "ORA-07445", "opmnctl failed to start process" and "ORA-01031".

Tuesday, November 21, 2006

Never volunteer

During his presentation on "Inside RAC" Julian Dyke appeared to suffer laptop difficulties so asked for 4 volunteers to assist in a demonstration. Being the helpful soul (i.e. idiot) that I am, I volunteered. Mr Dyke did seem remarkably wellprepared in terms of props, almost as if he had been expecting his laptop to have problems :-)!

Photographic evidence (courtesy of Connor McDonald) can be found here and here. I'm the fat, ugly, balding, bearded one on the far left, aka Node 1. Node 2 was Babbette Turner-Underwood, Node 3 (aka Master) was Murali Vallath and Node 4 was Carel-Jan Engel.

Sunday, November 19, 2006

Histogram problems

A question just appeared on the Oracle-L mailing list about whether there are any issues with using histograms. I've seen a number of presentations over the last couple of years that have warned about issues where you have histograms on highly skewed data and use bind variable (or have cursor_sharing=force, which synthesises bind variable from literals in queries), so I responded:
On 19/11/06, A Joshi wrote:
> Hi,
> About use of histograms : I think histograms are useful for indexes on
> columns that are not very selective. However I came across note 1031826.6 on
> metalink. About maintenance and space cost. I think space cost is negligible
> and can be ignored. About maintenance : does it mean statistics need to be
> gather often? Or does it mean some other cost.
>
> Question : Is there any other overhead or any other negative impact of using
> histograms?

One downside I've been hearing a lot about over the last couple of
years results from Bind Variable Peeking (link to the relevant Oracle Documentation). When a where clause contains bind variables (either explicitly declared or synthesised due to cursor_sharing=force) the optimizer can see what they are and use the information to build an execution plan
during the hard parse phase. If the application executes the same query again whilst the plan is still cached then it will reuse the same plan even if the bind variables have different values (no need to reparse so no peeking).


Suppose you have a table (t) where a field (thefield) has a number of possible values, one of which (say 'X') appears in 50% of the records whilst the others are uniformly distributed over the remaining 50%. Very highly skewed and obviously you would expect to get a different plan for the high frequency value (probably a full table scan) than you would for a low frequency one say 'Y' (probably an index read anbd access by row id). You have an index on thefield and have collected stats with histograms on that table and thefield.

If you don't use bind variables then a queries like:

select [field_list]
from t
where thefield='X';


and

select [field_list]
from t
where thefield='Y';


will both be hard parsed get different execution plans appropriate to the values used in the where clause.

Now suppose you rewrite the query replaciong the constants with a bind variable. The first time the query is parsed the optimizer will use the value of the bind variable and the histograms to work out the best execution plan. The next time it will reuse the same plan. This is fine if the values used in the bind variable are of similar
distribution but if they are not then you get issues.

Say the first time through the value is 'X' (appears in 50% of rows) the optimizer will peek the bind variable and probably go for a full table scan as that is the most efficient way to pull back that proportion of the rows thanks to the wonders of multiblock reads and read ahead caching in the filesystem/SAN. The second time through the value is 'Y' (appears in very few rows), because there is already an
execution plan cached the optimizer won't do the hard parse and so won't peek the bind variable, it will just use the existing plan (a full table scan). A full table scan is a really inefficient way to read a small proportion of the rows in a table.

Reverse the order of the queries ('Y' first then 'X') and you have the database doing an index read then access by rowid to retrieve 50% of the rows in a table. Hitting at least 50% of the index and at least 50% of the table blocks (probably as single block reads), a really inefficient way to a large proportion of the rows in a table.



>
> Is it advisable to use histograms just for some tables and some specific
> columns or is it OK to just set database wide?
>

It depends on your app and your data. If the data is highly skewed and the app uses bind variables (or you have cursor_sharing=force) then possibly not.


Hopefully that's OK.

Tuesday, November 14, 2006

UKOUG 2006 - Day 1

Today was the first day of the United Kingdom Oracle User Group 2006 conference.

The day itself was pretty good, lots of good talks (more on that later), but I ran into a couple of issues in the evening. The first issue was when I went into the Handmake Hamburger just accross the canal from the ICC, I was alone and the waitress/Maitre d' said they didn't have any free tables suitable for one person (looking around they had several free tables, just they were for 2, 4 or more people) so after expressing my disatisfaction and feeling about the poor quality of service I went accross Broad Street to Walkabout and had a Kangaroo Burger. The second issues was that I had hoped to go to the bloggers' meetup at All Bar One. Although I blog about Oracle I'm not sure if I qualify as a true Oracle Blogger so when everyone disapeared upstairs to a private party (UKOUG had laid on some sort of 'Thank You' event for the volunteers) I didn't follow. I know I'm probably worrying over nothing but didn't want to risk my biggest memory of this year's conference being getting thrown out of an event for gate crashing. Hopefully by next year I'll have sorted out what the criteria are for access to such gatherings, and will fulfil the criteria.

Getting back to the day, the talks.

A number of the talks I attended dealt with bitmap indexes to a greater or lesser degree. A bitmap index (for those who don't yet know and don't want to wade through the guides) is one where there is a series of bits (the bitmap) for each possible value for a field with a bit for each record in the table, so there are 4 possible values and 10 records there will be 4 bitmaps of 10 bits each (apparently bitmap indexes compress repeating identical values so the bitmaps would be smaller than that but I've only heard that from one source and haven't been able to confirm it yet, anyhow from a logical perspective there's still a bit for each records in each bit map).

For example suppose you have a people table which contains a gender column. The gender column can have 4 possible values (actually, depending on your (political/social) environment, it can have many more than that (at least 30 in some systems I know of) but that's a blog entry for another time): "Don't Know", "Refused to divulge", "Female" and "Male". The difference between "Don't Know" and "Refused to Divulge" is that in the case of "Don't Know" we haven't asked and for "Refused to Divulge" we asked but they wouldn't say, due to the nature of data privacy laws in the UK, except in very specific circumstances, we cannot compel an answer or put in what we believe the answer to be. I'll use a 1 letter code for each one in this example (D R F M) the letter at the start of the line is the field value, the string of digits is the value of the bits in the bitmap:
D 00000000000000000100
R 00001000001000000001
F 11100001100011111111
M 00010110010100000000


As you can see from this we have 1 record with "Don't Know", 3 with "Refused to Divulge", 13 with "Female" and 5 with "Male" in the indexed field.

An interesting point from the talks was that each presenter seemed to have a slightly different view on the criteria for when you use a bitmap index. They mostly agreed but there were slight variations in the detail and not all of them listed all the same criteria. Collating the criteria given ('bitwise and' :-)) gives something like:
  • Small number of possible values. The key question here is what constitutes a small number. 4 is a small number but if you only have a 5 records you might question if the overhead of the bitmap is worth it, you might even question if it's worth indexing the table at all (it might be, I recall attending a talk (by Connor McDonald IIRC) on the advantages of indexing very small tables). 100 isn't a particularly small number but if you've got 10,000 records then it is in comparison. Assuming the compression of repeating identical values does happen then, given that statistically for each of those 100 bitmaps 99% of the bits will be zero, you should get a reasonable compression rate as worst case scenario is you have sequences of a 1 followed by 9 zeros. If the distribution is highly skewed (say there's 100 different possible values but a few values account for most of the records) then your compression rate might be far better. Like many things it's probably a case of test it and see, I'll hold off judgement till I've had a chance to test it.
  • The field doesn't tend to get updated or deleted. This applies to any type of index really, if you can you want to avoid indexing fields that get updated (i.e. changed) or the records deleted as this will cause extra IO (updating the index when the field is updated or the record is deleted) and your index to hold 'dead keys' (entries that used to point to a record but don't any more because the indexed field was updated or the record deleted) which waste space and thus can slow down IO on the index (Oracle has to read blocks which contain dead space). Sometime I really must look into how a bitmap index handles dead keys. If anyone else does please comment here with a link to your results.
  • Field tends to be used a lot on where clauses. Again applies to any sort of index. Indexes generally slow down updates/inserts/deletes (something else to update) and speed up queries that use the indexed field in their where clause. If the field is rarely or never used in a where clause then the index is not used and is just wasting resources.
  • Field tends to be used in where clauses in combination with other bitmap indexable fields. Oracle can do bitwise operations (AND, OR &c) on the bitmaps, bitwise operations tend to be very fast (they're implemented at a very low level in the microcode in the processor). Suppose your people table in the example above also has fields for marital status, employment status and ethnicity. If you run queries with where clauses such as "gender='F' and marital_status='Married' and ethnicity='White British' and employment_status='Full Time Permanent'" (useful for Equal Opportunities monitoring) then Oracle can just load the appropriate bitmaps and do a bitwise AND to quickly identify all records that satisfy that clause.
I'm sure I'll have forgotten some but once the presentations are up on the web I'll read them over again along with my notes and fill in the blanks.

One suprise during the day was I ran into John Ryland, a DBA from Capita currently seconded to Service Birmingham (where I work); I hadn't known he was going to be there. Service Birmingham is a joint venture company between Birmingham City Council and Capita group in April 2006 which provides ICT and Business Transformation services to Birmingham City Council. The majority of the staff are secondees from Birmingham City Council (mostly from the Business Solutions and IT department) with some secondees from Capita and TUPE transferees from the council and other bodies. John hadn't been aware that Service Birmingham had a membership or that Birmingham City Council had had a membership, apparently when Capita asked the UKOUG (as part of their due dilligence checking) the UKOUG and denied we had a membership.

Of the presentations that I attended today the one that probably grabbed my attention and soparked my interest the most was "Accountability for System Performance (introducing Six Sigma quality in Oracle software performance)" from Cary Millsap. What I picked up from it was that everything should be instrumented, in every layer and every transation (that is from when the user clicks a button or link to when they get the result, not just a database transaction) should measure and record how long it took. When I get back to work next week I think I might propose that we put into our standards for applications (that we procure externally (most of them) and the ones we develop internally (a few)) that the application must be instrumented to record time based stats at every level. Of course the 'Holy Grail' would be to tag every transaction with a unique ID that would be resolvable at every level (a lot of our apps are browser-apps server-database-SAN/NAS or client-apps server-database-SAN/NAS) and have each level record start/end times and as many other time based statistics as possible (if the application does three things before it contacts the database and 2 things with the results it gets back then I want to know how long each of the 5 things took and what it did at each step). Over the years the vast majority of the performance problems I've seen have been the result of slow steps outside the database (e.g. a user sees a transaction taking a minute but only half a second of that is in the database, the slow step is in the client, apps server or simply network delay and invisible to the database instrumentation, you can pick up some storage related information from the database instrumentation but having instrumentation in that level would be much better). If the application had been properly instrumented then solving the problem would have been a lot quicker (if only because it would have saved a lot of time arguing over where the problem was, saying it's not in the database is one thing but being able to say where it is, with hard evidence to back it up, is much better). After the session I spoke with Cary about tacking instrumentation on to uninstrumented apps and he suggested a few UNIX tools that would be useful. I'm going to be in Tom Kyte's session on "Instrumentation 101" tommorrow, maybe I'll pick up more then.

I had intended to talk more about the other sessions but it's after midnight and I have to be up in a little over 5 hours so they'll just have to wait.

Thursday, September 07, 2006

SIG Meeting

I've just gotten the verbal OK from my line manager to attend the Management and Infrastructure SIG meeting on 4th October. Most of the opresentations look like they will be both interesting and useful. My employer is looking to introduce ITIL so the "Can it help me? What are the pitfalls?" presentation could be useful, similarly the DBA/Database ratio presentation is highly relevant to my current environment. The VLDB and Datawarehouse talks should be useful as some of ourdatabases are growing to be pretty huge, partly just datagrowth but also because we're getting situations where two or more separate applications (each with their own database) are being collapsed into one, total data volume has probably fallen but the data is now all in one place rather than being spread over two or more servers.

There's currently a big discussion going on about where the DBAs should sit in the organisation. In the background to the discussion are questions about what a DBA is and what a DBA does. There are a number of people who call themselves DBAs but are really application support people who happen to work mostly in the database, often doing little more than running prepackaged scripts and stopping/starting the database using OEM. They don't do any problem resoution (the problem resolution procedure is "Raise call with application vendor"), tuning or maintenence (other then running prepackaged scripts). Then there's a few people like myself who tend to do problem diagnosis and resolution, tune databases, set up servers, do proactive and preventative maintenence &c.

I was recently asked to give my thoughts on what a DBA does and landing up splitting it into three roles: Development DBA, Application Support DBA and Infrastructure DBA. here's my descriptions:
A Development DBA is one who does database side development as part of a development team, advises the front end/middleware developers (mostly about query tuning and fixing syntax/logical problems), create database objects and may administer (startup/shutdown, create users) the databases used in development. Tycally they will not be involved in installation of the Oracle software, creation of databases, setting up servers, server tuning, Oracle patching or anything outside of the development process. There is a good description of a development DBA here: http://www.oracledba.co.uk/tips/dba_evolution.htm

An Applications DBA is one who works as part of an applications support team to implement and support an application. They will be very focused on one particular application (or a small group of related applications) and typically will only deal with the schemas for that application. They tune queries, create and recreate objects as needed (mostly indexes) &c. Often they won't even have a login to the database server or have the sys/system passwords for the database. Typically they will not be involved in installation of the Oracle software, creation of databases, setting up servers, server tuning, Oracle patching (although they will patch the application) or anything outside of the application they support.

These first two are very much vertically oriented, stove piped even, they are only concerned with one application (either developing it or suppoorting it). Infrastrucxture DBAs, however, are very much horizontally oriented. They support the databases accross the organisation and are not restricted to just one application or small group of related applications. Infrastructure DBAs tend to be involved in pretty much everything to do with the database not covered by Application DBAs or Development DBAs, often they will advise or co-ordinate with Application DBAs.
Obviously there is a lot of grey area here, e.g. you might have an application DBA who patches the Oracle software but only on the machines their application(s) run on &c.

I'd describe myself as mostly Infrastructure with occasional flashes of Development.

Wednesday, September 06, 2006

My Agenda for UKOUG 2006

These are the sessions I'm planning on attending at this year's UKOUG conference:

08:00 - 17:45 Registration Open
09:15 - 09:30 Introduction by Ronan Miles, UKOUG Chairman
09:30 - 10:30 Opening Technical Keynote: "What's Coming Next?" by Tom Kyte, Oracle
10:30 - 11:00 Coffee Break
11:00 - 12:00 Accountability for System Performance (introducing Six Sigma quality in Oracle software performance)
Mr Cary Millsap
Hotsos Enterprises, Ltd.
Hall:
12:10 - 12:55 Moving lots of data quickly
Mr Connor McDonald
Independent
Hall:
12:55 - 13:50 Lunch
13:50 - 14:35 Understanding Logical I/O
Mr Daniel Fink
7S Consulting, Inc.
Hall:
14:45 - 15:30 All Oracle DBAs have to know about Unix Memory Monitoring
Mr Jurijs Velikanovs
"IT Alise" Ltd
Hall:
15:30 - 16:00 Coffee
16:00 - 16:45 Everything you wanted to know about indexes
Mr Ben Bor
Acuma Solutions
Hall:
16:55 - 17:40 An introduction to SQL*Trace, TKPROF and Execution Plans
Mr David Kurtz
Go-Faster Consultancy
Hall:
17:50 - 18:35 Wait Events and the Geeks Who Love Them
Kyle Hailey
Independent Consultant
Hall:
Wednesday

08:00 - 19:30 Registration
09:15 - 09:30 Welcome & Introduction by Ronan Miles, UKOUG Chairman
09:30 - 10:30 Keynote Presentations by Ian Smith, Oracle & Jesper Andersen, Oracle
10:30 - 11:00 Coffee
11:00 - 12:00 Practical database performance monitoring
Mr Peter Homes
EDS UK
Hall:
12:10 - 12:55 Instrumentation 101
Mr Thomas Kyte
Oracle
Hall:
12:55 - 14:10 Lunch
14:10 - 14:55 An introduction to the Cost Based Optimizer
Mr Jonathan Lewis
JL Computer Consultancy
Hall:
15:05 - 15:50 Advanced Research Techniques in Oracle - Part I
Mr Tanel Poder
SmartChaps.com
Hall:
15:50 - 16:25 Coffee
16:25 - 17:25 Advanced Research Techniques in Oracle - Part II
Mr Tanel Poder
SmartChaps.com
Hall:
17:35 - 18:35 Average Active Sessions - the magic metric?
Mr John Beresniewicz
Oracle USA
Hall:
18:40 - 19:30 Exhibition Drinks
19:30 - 23:00 Event Dinner

Thursday

08:00 - 19:00 Registration
09:00 - 09:45 Converting from Rule-based to Cost-based.
Mr Jonathan Lewis
JL Computer Consultancy
Hall:
09:55 - 10:40 Storage Internals for the Oracle Architect
Mr James Morle
Scale Abilities Ltd
Hall:
11:15 - 12:15 Tuning with SQL Profiles
Mr Joze Senegacnik
Senegacnik Joze - Raziskovanje v naravoslovju
Hall:
12:15 - 13:30 Lunch
14:25 - 15:10 Inside RAC
Mr Julian Dyke
JULIANDYKE.COM LImited
Hall:
15:10 - 15:40 Coffee
15:40 - 16:25 A Problem for every Solution
Mr Carel-Jan Engel
DBA!ert
Hall:
16:35 - 17:35 CBO – A Configuration Roadmap
Mr Christian Antognini
Trivadis
Hall:
17:45 - 18:45 Guest Speaker (Simon Weston)
18:45 - 23:00 Focus Pubs & Disco till late
Friday

08:00 - 14:30 Registration Open
09:00 - 09:45 Performance and Scalability Enhancements in Oracle 10g and 10gR2
Mr Tanel Poder
SmartChaps.com
Hall:
09:45 - 10:15 Coffee
10:15 - 11:00 Oracle Database 10g Release 2: Performance Diagnosis Update
Mr Graham Wood
Oracle
Hall:
11:10 - 11:55 Once around the block
Mr Connor McDonald
Independent
Hall:
11:55 - 13:10 Lunch
13:10 - 15:05 Creating and Interpreting Basic Block Dumps
Mr Joel Goodman
Oracle University UK
Hall:



Looks like it's going to be a good conference.

Monday, August 28, 2006

Random training related thought

A thought just crossed my mind. I wonder if there is a market for short (one day) task oriented training for the sort of things that aren't normally covered in general courses.

For example I'm aware that there are training courses that mention standby databases and certainly there are courses that cover RMAN, at least for basic backup and recovery. I'm not aware of any that actually take you through setting up a standby database, with a hands on practicum, or that covers things like cloning a database with RMAN. These are things that DBAs might be required to do. Sure you can read up in the documentation and, if you are lucky enough to have the kit, try out but might there be a market for, say, a one day course in how to use RMAN to create a clone database and then convert it to a standby for the database you've cloned? Or something similar with other such tasks ("Your first RAC cluster")?

I'm sure that there have been user group and technology day type presentations on those sorts of things but these lack the hands on aspect.

Monday, August 07, 2006

Search Terms

Doug Burns recently posted in his journal a list of common search terms that people have used where his journal appeared in the results. I figured I'd post mine (according to Statcounter.com). Fortunately (or unfortunately) mine are not anywhere near as exciting as his.

Perc. Search Term
5.80% ora-07445
4.35% orapwd invalid username password
2.90% exception signal: 11 oracle
2.90% alter user sys
2.90% move tempfile oracle 8
2.90% ora-01031 sysdba
1.45% oracle password file orapwd
1.45% htmldb_public_user change password
1.45% htmldb startup
1.45% oracle 10 bind parameters ora-07445
1.45% address not mapped to object ora-07445
1.45% failed to start a managed process after the maximum retry limit
1.45% failed to start a managed process htmldb
1.45% decode(true in oracle
1.45% 9.2.0.7 patchset filename
1.45% opmnctl startall htmldb
1.45% sql number exists oracle htmldb
1.45% opmnctl: starting opmn managed processes
1.45% 3440097 oracle
1.45% shutdown immediate oracle insufficient
1.45% oracle allow normal sys login
1.45% ora-01031: insufficient privileges when startup oracle
1.45% oracle startup as sysdba
1.45% passwordfile rac
1.45% http_server retry limit
1.45% starting opmn and all managed processes
1.45% use of the orapwd file
1.45% orapwd entries
1.45% oracle datafile how to now autoextend
1.45% sqlplus nolog oracle
1.45% oracle is started but my instance isn't
1.45% sqlplus sysdba insufficient priviledge
1.45% english_united kingdom.al32utf8
1.45% oracle sql user sysdba login
1.45% ora-01017 sys as sysdba
1.45% error opw-00005
1.45% orapwd password
1.45% oracle value as column heading
1.45% sysdba
1.45% ldap 32gb
1.45% opmnctl startall ias-component
1.45% 9.2.0.7 patch set - list of bug fixes by problem type
1.45% opmn shutdown failed
1.45% setup oracle ldap
1.45% pfile processes double sessions transactions oracle
1.45% why use orapwd password file
1.45% oracle users select sysdba
1.45% http_server failed to start a managed process after the maximum retry limit
1.45% ora 07445
1.45% htmldb authentication error
1.45% orapwd password file
1.45% change oracle instance
1.45% oracle import guaranteed to work in later versions
1.45% oracle select no column headings
1.45% exception signal: 11 (sigsegv), code: 1 (address not mapped to object), addr:
1.45% oracle shutdown immediate insufficient privileges
1.45% changing remote_login_passwordfile
1.45% how to change password for sysdba on oracle
1.45% oracle opmnctl fail start solaris
1.45% oracle shutdown as sysdba


I hope the people found the answers they needed.

I wonder if, given sufficient results from a variety of Oracle blogs (and editing out the ones that are obviously spurious) during a specific time period, it might be possible to identify what sort of things people are searching for? An "Oracle Zeitgeist" perhaps?

In other news one of our major apps is getting a patching (the app, not Oracle) tommorrow. I'm nervous, always am before a patching.

Tuesday, July 18, 2006

Distributed Computing

This morning someone passed me a link to "The Eight Fallacies of Distributed Computing". Looks pretty accurate. I'd say that many of them are doubly true of RAC.

Monday, July 17, 2006

Oracle Certified What?

Just recieved the following email via one of the Oracle DBA lists I'm on:
Hello,

I am XXX from Singapore.I am also an Oracle Certified Professional
(OCP).Could I know size of any database and what's the size of SGA for
that Database and how long it take to complete Logical Backup and
Physical Backup.I will be awiting for ur reply.

Thanks in Advance
Regards
XXXX
Names have been removed to protect the guilty.

Now, I have no particular problems with answering beginner questions (I've asked a fair few myself), even though the list this email appeared on is suposedly an experienced DBA list, not a newbie list. I'm quite happy to ignore the poor standard of English, the questioner probably doesn't speak English as their first language (although in Eric S Raymond's guide to Newbies, on how to ask questions, he does state that they should be asked in clear, grammatical, correctly-spelled language). The thing is the questioner is an OCP, or at least claims to be, and those questions are definite newbie type questions. They're also questions that a little thought and looking at the manual should answer if basic common sense doesn't.

Taking the last one first: "how long it take to complete Logical Backup and
Physical Backup"

It depends. Size is just one factor. Machine disk speed, competing jobs, speed of target media &c have a much bigger affect. Try it and see. As you get more experienced you'll be able to estimate how long it will take by taking all the factors into consideration. There's probably a formula somewhere that will produce a precise, supportable, validatable and utterly wrong answer.

Moving back up the mail we get to: "what's the size of SGA for
that Database"

Querying v$sga would be a start. If it's an older version of the database then they'll have to dig into v$parameter and add up the various pools.

And finally: "Could I know size of any database"

I am kinda assuming throughout that the questioner wants to be able to find the size and other factors of a database they manage, rather than for people to pick a database at random and pass on the information on that.

This one is quite hard to answer, not that the information is hard to get to but because there are so many possibe measures of the size of a database. Files on disk? Data volume? Transactions per second? Number of concurrent users?

One of the first two is most likely (given the later questions about backups). For files on disk just query dba_data_files and dba_temp_files and add up the BYTES column then query v$log and sum the BYTES column. If you want you can look at the controlfiles, pfile, spfile and archived logs, it depends how you're measuring (controlfile, pfile and spfile will constitute a tiny proportion of the size of any non-trivial database and space taken by archived logs is likely to vary according to transaction load and your management procedures).

Data volume can be obtained from querying dba_extents and summing the bytes column for all table extents. It won't give a spot on answer due to partially filled extents but should be close enough. No doubt there are scripts out there to measure down to the individual block, frankly I've never needed to be that precise.

Does make on wonder why the OCP is thought of as a desirable qualification if mails like that are flying around.

Thursday, July 06, 2006

Another SYSDBA password entry

Simon Kelsey commeted on one of my earlier entries on this subject that x$kzsrt may be involved:
SQL> select * from v$fixed_view_definition where view_name='GV$PWFILE_USERS';

VIEW_NAME
------------------------------
VIEW_DEFINITION
--------------------------------------------------------------------------------
GV$PWFILE_USERS
select inst_id,username,decode(sysdba,1,'TRUE','FALSE'), decode(sysoper,1,'TRUE
','FALSE') from x$kzsrt where valid=1 and username != 'INTERNAL'


I then got to thinking, does the 'alter user...;' just alter the copy of the password in memory or does it update the password file as well?

I constructed a little test case. First I opened two sessions on the server as the Oracle software owner user (i.e. 'oracle'). In the first I connected to the database as 'sys' using the existing SYSDBA password and then changed the password using the 'alter user...;' method:
$ sqlplus /nolog

SQL*Plus: Release 9.2.0.1.0 - Production on Thu Jul 6 14:30:23 2006

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

SQL> conn sys/sbtest@sbtest as sysdba
Connected.
SQL> alter user sys identified by foobar;

User altered.


In the second session I attempted to connect to the database with the old password and with the new:
$ sqlplus /nolog

SQL*Plus: Release 9.2.0.1.0 - Production on Thu Jul 6 14:31:21 2006

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

SQL> conn sys/sbtest@sbtest as sysdba
ERROR:
ORA-01017: invalid username/password; logon denied


SQL> conn sys/foobar@sbtest as sysdba
Connected.
This confirms that 'alter user...;' does indeed change the password. I then quit SQL*Plus in that session and back in the first session shutdown and attempted to restart the database:
SQL> shutdown abort
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
ORA-01017: invalid username/password; logon denied
Note that the instance appears to be started (i.e. the database is in no mount state) before the credentials of the user are checked.

I then attempted to shutdown the instance and, when that failed, I tried to start it up (to see if I'd get a insufficient priviledge error or a cannot start already-running ORACLE error):
ORA-01031: insufficient privileges
SQL> startup
ORA-01031: insufficient privileges
. In the second session I recreated the password file with the new password:
$ cp orapwsbtest orapwsbtest.20060706.bak
$ rm orapwsbtest
$ orapwd file=orapwsbtest password=foobar entries=16
and back in the first session tried to connect with the new password and start up the database:
SQL> conn sys/foobar@sbtest as sysdba
Connected.
SQL> startup
ORA-01081: cannot start already-running ORACLE - shut it down first
Note that this time I get the "cannot start already-running ORACLE" error, demonstrating that the instance is at least partly up. I shutdown the instrance then restarted it and changed the SYSDBA password back to the original value using the 'alter user...;' method:
SQL> shutdown abort
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 555451816 bytes
Fixed Size 730536 bytes
Variable Size 285212672 bytes
Database Buffers 268435456 bytes
Redo Buffers 1073152 bytes
Database mounted.
Database opened.
SQL> alter user sys identified by sbtest;

User altered.
I then bounced the instance again (using shutdown abort to avoid any possibility of a normal shutdown updating the password file, reconnecting with the changed password to start up). I then went to the second session and tried to connect with the old password, this failed but connecting with the new password worked.

From this it does look like changing the sys password with 'alter user...;' does indeed update the password file.

Tuesday, July 04, 2006

Oracle answers on the SYSDBA password question

QUESTION
=========
1) Does Oracle cache the SYSDBA password?
2) Is there a way to force it to re-read the password file to pick up a changed password without havi
ng to shutdown the
database?




RESEARCH
=========
Note.1029539.6 - UNIX How to Set up the Oracle Password File
Note 185703.1 - How to Avoid Common Flaws and Errors Using Passwordfile
Note 114384.1 - WIN: Checklist for Resolving CONNECT AS SYSDBA (INTERNAL) Issues





ANSWER
=======
1) Yes. this is cached. The passwordfile is read only at the beginning, when the instance is started.
2) There is no way to force Oracle read the passwordfile, but one can use :

alter user sys identified by ;

After this the "cache" should be changed to the new password.


Guess that answers that question then. I've double checked the course materials from the Oracle DBA courses I've been on and they all say the way to change the SYSDBA password is to use orapwd, none of them mention using 'alter user...'.

Saturday, July 01, 2006

SYSDBA password

A couple of days ago I posted an entry about an issue I'd run into reseting the SYSDBA password. Today I noticed my hits had jumped, checking my refer log I noticed a lot from Peter Finnigan's blog, he'd posted an entry linking to the post.

In the entry he says that, if true, it was interesting. I don't know about interesting, but it's definately a true record of what happened. I'd be suprised if I'm the first person to run into this issue.

I did recieve a comment to the entry (from an anonymous user) saying that you have to shutdown the database before recreating the password file. I don't recall ever reading that or hearing it mentioned on a course, could be true or it could be one of the many Oracle myths going around. I found an example on orafaq.com that does include shutting down the database (after recreating the password file) but also talks about changing REMOTE_LOGIN_PASSWORDFILE in the init.ora file, which would require a database restart.

It is troubling that changing the password in the password file isn't immediately reflected in a running instance. If you're changing the password then, unless it's a scheduled prophylactic change, it's likely because you know or believe someone has the password who shouldn't have or you need to set a throwaway password to allow a consultant short term access then change back when they're done. In either case you want the password to be changed immediately and not requiring a database restart. In my experience getting the business to agree a database restart on a production system is like trying to get a straight answer from Oracle Marketing.

Friday, June 30, 2006

HTMLDB - PlsqlNlsLanguage takes one argument, NLS Language

I installed HTMLDB (off the 10.2 Companion disk) on Solaris today. Install seemed to run OK and at the end it told me that the HTMLDB instance had started and could be accessed at:
http://wintermute:7778
When I tried to access it, no joy. Said it couldn't connect. I tried using opmnctl to check the status and found that opmn itself was up but none of the managed services were. Stopping and starting opmn and all it's managed processes produced:
$ opmnctl stopall
opmnctl: stopping opmn and all managed processes...
$ opmnctl startall
opmnctl: starting opmn and all managed processes...
================================================================================
opmn id=wintermute:6200
0 of 1 processes started.

ias-instance id=IAS-X-wintermute.6299
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ias-component/process-type/process-set:
HTTP_Server/HTTP_Server/HTTP_Server

Error
--> Process (pid=24282)
failed to start a managed process after the maximum retry limit
Log:
/u03/oraracle/app/oracle/htmldb/10.2/opmn/logs/HTTP_Server~1
Looking in the log file (see the updates to the wikipedia article on Error Hiding that I made for my views on error mesages and log files) reveled many instances of a more detailed error message:
--------
06/06/30 12:20:00 Start process
--------
/u03/oraracle/app/oracle/htmldb/10.2/Apache/Apache/bin/apachectl start: execing httpd
Syntax error on line 9 of /u03/oraracle/app/oracle/htmldb/10.2/Apache/modplsql/conf/marvel.conf:
PlsqlNlsLanguage takes one argument, NLS Language
leading me to the settings file marvel.conf. The contents of this file were:
<Location /pls/htmldb>
Order deny,allow
PlsqlDocumentPath docs
AllowOverride None
PlsqlDocumentProcedure wwv_flow_file_manager.process_download
PlsqlDatabaseConnectString wintermute:1527:emrep10 ServiceNameFormat
PlsqlNLSLanguage ENGLISH_UNITED KINGDOM.AL32UTF8
PlsqlAuthenticationMode Basic
SetHandler pls_handler
PlsqlDocumentTablename wwv_flow_file_objects$
PlsqlDatabaseUsername HTMLDB_PUBLIC_USER
PlsqlDefaultPage htmldb
PlsqlDatabasePassword @BdtmwbdTRq8ljkMXfRPmbpw=
Allow from all
</Location>
Looking at the indicated line, line 9:
PlsqlNLSLanguage ENGLISH_UNITED KINGDOM.AL32UTF8
the only thing that struck me was the space in the middle of the NLSLanguage identifier, maybe this was being intrepreted as a field seperator?

I put sigle quotes around the NLSLanguage identfier:
PlsqlNLSLanguage 'ENGLISH_UNITED KINGDOM.AL32UTF8'
and tried again:
$ opmnctl stopall
opmnctl: stopping opmn and all managed processes...
$ opmnctl startall
opmnctl: starting opmn and all managed processes...
$
Success!

Thursday, June 29, 2006

Does Oracle cache the SYSDBA password?

I was just having a play around on one of my test/sandpit databases ("small database on a low power machine that I use for testing out scripts &c, no live data and if the database gets destroyed I just have to wipe out the files and create a new one" type of thing). I'd forgotten the SYSDBA password (for sandpit systems I use the SID as the SYSDBA password so don't bother to write it down, for some reason this time I didn't) but needed it (I'm installing HTMLDB to try it out on that machine) so wanted to change it. I had been getting1:
SQL> conn sys/sbtest@sbtest as sysdba
ERROR:
ORA-01017: invalid username/password; logon denied
as you might expect. I used orapwd to change the password
$ cp orapwsbtest orapwsbtest.20060629.bak
$ orapwd file=orapwsbtest password=sbtest entries=16

OPW-00005: File with same name exists - please delete or rename
$ rm orapwsbtest
$ orapwd file=orapwsbtest password=sbtest entries=16
and thought I would be good to go. So, I try logging in as sys from another window:
SQL> conn sys/sbtest@sbtest as sysdba
ERROR:
ORA-01031: insufficient privileges
Not so good to go! I looked up the error message:
$ oerr ora 01031
01031, 00000, "insufficient privileges"
// *Cause: An attempt was made to change the current username or password
// without the appropriate privilege. This error also occurs if
// attempting to install a database without the necessary operating
// system privileges.
// When Trusted Oracle is configure in DBMS MAC, this error may occur
// if the user was granted the necessary privilege at a higher label
// than the current login.
// *Action: Ask the database administrator to perform the operation or grant
// the required privileges.
// For Trusted Oracle users getting this error although granted the
// the appropriate privilege at a higher label, ask the database
// administrator to regrant the privilege at the appropriate label.
not very useful. I was getting the same error message if I attempted from the Oracle software owner user, a different user in the the dba UNIX group on the same machine or a remote SQL*Plus session on my desktop. Wondering if maybe Oracle was caching the password file, and therefore the SYSDBA password, I logged in as SYSDBA using OS authentication and forced a restart of the database:
SQL> conn / as sysdba
Connected.

SQL> startup force
ORACLE instance started.

Total System Global Area 555451816 bytes
Fixed Size 730536 bytes
Variable Size 285212672 bytes
Database Buffers 268435456 bytes
Redo Buffers 1073152 bytes
Database mounted.
Database opened.
SQL> conn sys/sbtest@sbtest as sysdba
Connected.
SQL>
Success! The password now worked from the Oracle software owner account, the other account in the dba UNIX group and the remote SQL*Plus session.

It's a good thing you can still use OS authentication to log in as SYSDBA, I'd have been really sunk if I hadn't had that.



1I've made it habit to always run SQL*Plus with the /nolog option then use connect so that the password won't appear in the process list if someone does a ps. Making good/safe practice habit can save a lot of trouble.

Wednesday, May 24, 2006

Getting information about the database

One of the tasks I've recently been given is to collate information about our Oracle and SQLServer databases. Stuff like sizes, init.ora parameters &c. I've developed a script that gets a lot of the stuff we need out of Oracle.

Here it is:
set echo off
set verify off
set feedback off
set linesize 132
set pagesize 60
set heading off
/* This script must be run as SYS (SYSDBA) */


Prompt "Audit Information"
select 'Run on: '||to_char(sysdate, 'HH24:MI DD-MON-YYYY') from dual
/
prompt
prompt

set heading on
column name heading "Database Name"
select name from v$database
/
prompt
column banner heading "Database Version Information"
select banner from v$version
/
prompt
prompt

column log_mode format a16 heading "Archivelog mode"
select log_mode from v$database
/
prompt
prompt


column option_param format a70 heading "Option"
prompt "Options Installed"
select parameter option_param
from v$option
where value='TRUE'
order by parameter
/
prompt
prompt
prompt

prompt "Parameter file parameters"
Prompt "Visible Parameters"
column name format a34 wrap heading "Parameter"
column value format a70 wrap heading "Value"
column isdefault format a9 heading "Default?"
set pagesize 300
select name, value, decode(isdefault, 'TRUE', 'Y', 'FALSE', 'N', '?') isdefault
from v$parameter
where value is not null
order by name
/
prompt
prompt "Hidden Paramteters that are non-default"
select
ksppinm NAME,
ksppstvl VALUE,
ksppdesc description,
ksppstcmnt update_comment
from sys.x$ksppi x,
sys.x$ksppcv y
where (x.indx = y.indx)
and
translate(ksppinm,'_','#') like '#%'
and
ksppstdf='FALSE'
/
prompt
prompt
prompt


Prompt "Tablespace and File Information"
column bigness format 999999.99 heading "Size (Gb)"
column tablespace_name format a20 heading "Tablespace Name"
column nofiles format 99999 heading "No. Files"
column file_name format a60 heading "Filename and Path"
column extent_management format a11 heading "Extent|Management"
prompt "Datafiles"
select a.tablespace_name tablespace_name, count(a.file_name) nofiles, sum(a.bytes)/(1024*1024*1024) bigness, b.extent_management
from dba_data_files a, dba_tablespaces b
where a.tablespace_name=b.tablespace_name
group by a.tablespace_name, b.extent_management
/
column autoextensible heading "Auto|Extensible" format a10
select tablespace_name, file_name, bytes/(1024*1024*1024) bigness, autoextensible
from dba_data_files
order by tablespace_name, file_id
/
prompt
prompt

Prompt "Tempfiles"
select a.tablespace_name, count(a.file_name) nofiles, sum(a.bytes)/(1024*1024*1024) bigness, b.extent_management
from dba_temp_files a, dba_tablespaces b
where a.tablespace_name=b.tablespace_name
group by a.tablespace_name, b.extent_management
/
select tablespace_name, file_name, bytes/(1024*1024*1024) bigness, autoextensible
from dba_temp_files
order by tablespace_name, file_id
/
prompt
prompt


Prompt "Online Redo Logs"
column group# format 999 heading "Group No."
column status format a10
column type format a10
column member format a100 heading "Filename and Path"
select group#, status, type, member
from v$logfile
order by group#, member
/
prompt

column redo_size format 99999.99 heading "File Size Mb"
select group#, bytes/(1024*1024) redo_size
from v$log
order by group#
/
prompt
prompt
prompt


prompt "System/Shared Global Area"
prompt "Summary"
column sga_size format 9999.99 heading "Size (Mb)"
select name parameter, value/(1024*1024) sga_size
from v$sga
/
prompt

prompt "Block Caches"
column cache format a30 heading "Cache Name"
select name cache, value/(1024*1024) sga_size
from v$parameter
where name like 'db_%ache_size'
/
prompt

prompt "Pools"
column pool heading "Pool|Name"
select pool, sum(bytes)/(1024*1024) sga_size
from v$sgastat
where pool is not null
group by pool
/
prompt
prompt
prompt


prompt "Users and Schemas"

select count(username) "No. Users" from dba_users
/
prompt "Users and Objects"
column no_objects format 99999999 heading "No.|Objects"
select owner, object_type, count(object_type) no_objects
from dba_objects
group by owner, object_type
/
prompt
prompt

Prompt "End Of Report


Usual disclaimers apply. I have tested this script under 9.2.0 but if you use it and it causes problems they're your problems. Test it yourself first, there's no warrantees or promises that it won't damage your systems.

It's been suggested that we use Remote Diagnostic Agent (RDA) [link require Metalink login]. I've been looking at RDA but it seems a bit clunky for what we need it for. Where I can see us using it, other than to collect information for Service Requests with Oracle Support, is maybe if we ran it for each database when it's created and each time it's patched or has some other major change (perhaps once every 6 months) and store the resulting .zip file sort of document management system indexed by the database name and date. I think that as a means of documenting the database like that it would be useful but a single report of the 'highlights' such as the script is more useful right now, the information from RDA is rather too lengthy and 'overkill'.

Solution to Dreaded ORA-07445 error

Finally got a response from Oracle, turned out they had put it with their US office who don't start work until 13:00 our time.

The eventual response was:
23-MAY-06 14:49:43 GMT

DATA COLLECTED
===============
On alert.log we can see some:
ORA-07445: exception encountered: core dump [00000001009B0B04] [SIGSEGV] [Address not mapped to obj
ect] [0x000000000] [] []

Trace file uploaded shows:
/opt/oracle/app/oracle/admin/dchr/udump/dchr_ora_5662.trc
Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production
JServer Release 9.2.0.6.0 - Production

*** SESSION ID:(37.1317) 2006-05-18 15:34:04.409
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x0, PC: [0x1009b0b04, 00000001009B0B04]
*** 2006-05-18 15:34:04.410
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [00000001009B0B04] [SIGSEGV] [Address not mapped to object] [0x000000000
] [] []
Current SQL statement for this session:
select all count(dm_type.r_object_id) "the_count" from dm_type_sp dm_type where (dm_type.name=:"SYS_B_0" and ex
ists (select r_object_id from dm_type_r where dm_type.r_object_id = r_object_id
and attr_name=:"SYS_B_1"))
----- Call Stack Trace -----
ksedmp ssexhd sigacthandler evaopn2 qerixGetKey qerixStart qertbStart qerjoFetch qerstFetch
qergsFetch qerstFetch opifch2 opiall0 kpoal8 opiodr ttcpip opitsk opiino opiodr opidrv sou2o
main


******************** Cursor Dump ************************
Current cursor: 1, pgadep: 0
pgactx: 39b95a998 ctxcbk: 0 ctxqbc: 0 ctxrws: 396c5ba70
Explain plan:
Plan Table
--------
-------------------------------------------------------------------------------------------------------------------------
| Operation | Name | Rows | Bytes | Cost | TQ |
IN-OUT| PQ Distrib |Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
| SELECT STATEMENT | | 0 | 0 | 2 |
| | | | |
| SORT AGGREGATE | | 1 | 47 | 0 | | | | |
|
| NESTED LOOPS SEMI | | 1 | 47 | 2 | | | | | |
| INDEX UNIQUE SCAN | D_1F01487D80000002 | 1 | 15 | 1 | | | | |
|
| TABLE ACCESS BY INDEX ROWID | DM_TYPE_R | 62 | 1984 | 1 | | | | | |
| INDEX RANGE SCAN | D_1F01487D80000009 | 109 | 0 | 1 | | | |
| |
-------------------------------------------------------------------------------------------------------------------------
Cursor Dump:
----------------------------------------
Cursor 1 (ffffffff7cd60418): CURFETCH curiob: ffffffff7cc465d0
curflg: 46 curpar: 0 curusr: 0 curses 392779298
cursor name: select all count(dm_type.r_object_id) "the_count" from dm_type_sp dm_type where (dm_typ
e.name=:"SYS_B_0" and exists (select r_object_id from dm_type_r where dm_type.r
_object_id = r_object_id and attr_name=:"SYS_B_1"))
child pin: 3992242b8, child lock: 397e020c8, parent lock: 3a21f62f0
xscflg: 80110476, parent handle: 39d351938, xscfl2: 6a00409
Dumping Literal Information
Bind Pos: 0, Bind Len: 12, Bind Val:
Bind Pos: 1, Bind Len: 9, Bind Val:
nxt: 4.0x00000738 nxt: 3.0x00000710 nxt: 2.0x00000220 nxt: 1.0x00000720
Cursor frame allocation dump:
frm: -------- Comment -------- Size Seg Off
bhp size: 160/600
whp size: 8688/12440

bind 0: dty=1 mxl=32(12) mal=00 scl=00 pre=00 oacflg=10 oacfl2=100 size=32 offset=0
bfp=ffffffff7cc469b0 bln=32 avl=12 flg=09
value="dmi_workitem"
bind 1: dty=1 mxl=32(09) mal=00 scl=00 pre=00 oacflg=10 oacfl2=100 size=32 offset=0
bfp=ffffffff7cc46960 bln=32 avl=09 flg=09
value="a_wq_name"

.
eos (end of section)
.
ISSUE VERIFICATION
===================
Verified on alert.log and trace file provided

.
eos (end of section)
.
RESEARCH
=========
Found bug
Bug 4098853
Abstract: Dump (qerixGetKey->evaopn2) using COUNT(NOT_NULL_COLUMN)
with a view
Fixed-Releases: 9207 A105 A201
Tags: CBO DUMP R9206 REGRESSION
Details:
This problem is not a true regression but is a problem which
is exposed by the fix for Bug 3440097 in 9.2.0.6.
SQL selecting COUNT(NOT_NULL_COLUMN) from a view may dump
under qerixGetKey->evaopn2 (or qerhjSplit->evaopn2)
Workaround:
Set Event 10122 and ensure the SQL is reparsed with this event set.

Another Workaround:
query_rewrite_enabled=TRUE


.
eos (end of section)
.
CAUSE DETERMINATION
====================
Bug 4098853
.
eos (end of section)
.
CAUSE JUSTIFICATION
====================
Stack trace and behaviour matches this bug

.
eos (end of section)
.
POTENTIAL SOLUTION(S)
======================
We may apply 9.2.0.7 patchset where bug is already fixed, or try workarounds provided
Workaround: Set Event 10122 and ensure the SQL is reparsed with this event set.

Another Workaround:
query_rewrite_enabled=TRUE

.
eos (end of section)
.
POTENTIAL SOLUTION JUSTIFICATION(S)
====================================
As per bug fix

.
eos (end of section)
.
SOLUTION / ACTION PLAN
=======================
Researching about this issue I was able to identify the Bug 4098853. The fix will be included in patchset 92070.
As workaround, you can set the Event 10122 or query_rewrite_enabled=TRUE and ensure the SQL is reparsed with this event set.

1. alter session set query_rewrite_enabled=TRUE or

2. alter session set events '10122 trace name context forever'

3. alter system flush shared_pool;

3. Try to reproduce the issue.
.
eos (end of section)
.

.
KNOWLEDGE CONTENT
==================
Platform / Port Specific? = NO

Knowledge content not created because the following note already addresses this issue: Bug 4098853


By this time we'd already found a work around which was to apply a patch to the application that avoided the error by doing the query a different way. The bug appears (if I'm reading the response correctly) when you try to do a count on a not null column.

We've decided to move up our plans to go to 10g2, which has the bug fix, to get around it.

Friday, May 19, 2006

Dreaded ORA-07445 error

Late yesterday one of our projects passed me information abvout an ORA-07445 error they'd had earlier in the day, apparently this is stopping one of the modules in their application to stop working. Why they couldn't have told me about it earlier I don't know, I would have had time to raise a Service Request with Oracle during our maintenence hours (the why we don't have 24x7 cover is a rant for another time).

I raised a call with Oracle before I went home but there's been no updates on it as of yet, other than a note saying they will update it when they have some information (fair enough I suppose). This long silence is worrying, maybe it's something serious?

Here's the header of the trace file:
*** 2006-05-19 10:57:39.159
*** SESSION ID:(21.4255) 2006-05-19 10:57:39.158
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x0, PC: [0x1009b0b04, 00000001009B0B04]
*** 2006-05-19 10:57:39.159
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [00000001009B0B04] [SIGSEGV] [Address not mapped to object] [0x000000000] [] []
Current SQL statement for this session:
select all count(dm_type.r_object_id) "the_count" from dm_type_sp dm_type where (dm_type.name=:"SYS_B_0" and exists (select r_object_id from dm_type_r where dm_type.r_object_id = r_object_id and attr_name=:"SYS_B_1"))
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+328 CALL ksedst() 00000000B ? 000000000 ?
000000000 ? 00000004A ?
FFFFFFFF7FFF3C78 ?
102F5E0F8 ?
ssexhd()+676 CALL ksedmp() 000103348 ? 103348000 ?
103348CA8 ? 10334D000 ?
000102800 ? 000000000 ?
sigacthandler()+44 PTR_CALL 0000000000000000 000103350 ?
FFFFFFFF7FFFAD10 ?
103350000 ? 10334DE60 ?
000000000 ? 103350DB8 ?
evaopn2()+196 PTR_CALL 0000000000000000 00000000B ?
FFFFFFFF7FFFAD10 ?
FFFFFFFF7FFFAA30 ?
00010334B ? 000000000 ?
000000000 ?
qerixGetKey()+1420 CALL evaopn2() 3966E1C20 ? 000010000 ?
103350000 ? 000000001 ?
080000002 ?
FFFFFFFF7CB20288 ?
qerixStart()+956 CALL qerixGetKey() 000000000 ? 3966E1DE0 ?
FFFFFFFFFFFFFFFF ?
FFFFFFFF7CB31C98 ?
000103000 ?
FFFFFFFF7CB316C0 ?
qertbStart()+1044 PTR_CALL 0000000000000000 396780958 ? 0001A2C91 ?
000000000 ?
FFFFFFFF7CB325EC ?
FFFFFFFF7CB32528 ?
1033503A8 ?
qerjoFetch()+396 PTR_CALL 0000000000000000 102AE1000 ? 000000001 ?
10334B208 ? 000000018 ?
000103000 ? 3950B5A90 ?
qerstFetch()+212 PTR_CALL 0000000000000000 3966E24A8 ? 1016276A0 ?
FFFFFFFF7FFFB558 ?
000007FFF ? 0000005F8 ?
FFFFFFFF7CB31488 ?
qergsFetch()+2268 PTR_CALL 0000000000000000 3966E2440 ? 000000014 ?
000000005 ? 00000000F ?
0000000F0 ? 000007FFF ?
qerstFetch()+212 PTR_CALL 0000000000000000 000101400 ?
FFFFFFFF7CB31820 ?
000000000 ? 000000028 ?
1016276A0 ? 3966E23A8 ?
opifch2()+1724 PTR_CALL 0000000000000000 3966E2340 ? 00000010C ?
000000043 ? 0000000C9 ?
000000C90 ? 000000028 ?
opiall0()+3860 CALL opifch2() 100FD2000 ? 102AE1398 ?
100FA6640 ? 000000012 ?
FFFFFFFF7FFFBB20 ?
FFFFFFFF7FFFC14C ?
kpoal8()+1040 CALL opiall0() 000000000 ? 00000005E ?
FFFFFFFF7FFFC3D8 ?
103349048 ?
FFFFFFFF7CB206B0 ?
FFFFFFFF7FFFC768 ?
opiodr()+1688 PTR_CALL 0000000000000000 000000000 ? 000000028 ?
FFFFFFFF7FFFEC20 ?
000000024 ? 000000000 ?
0000022B0 ?
ttcpip()+1556 PTR_CALL 0000000000000000 000103000 ? 100F8D200 ?
103351048 ? 103349048 ?
10334B580 ?
FFFFFFFF7FFFCDC0 ?
opitsk()+984 CALL ttcpip() 103351040 ? 000000014 ?
FFFFFFFF7FFFEC20 ?
000000000 ? 000000000 ?
FFFFFFFF7FFFDF0C ?
opiino()+1572 CALL opitsk() 000000000 ? 000000000 ?
000000000 ? 000000000 ?
10334B568 ?
FFFFFFFF7FFFED74 ?
opiodr()+1688 PTR_CALL 0000000000000000 000380007 ? 10334FE98 ?
103411DE8 ?
FFFFFFFF7FFFFAB0 ?
000000000 ? 392ACB4C8 ?
opidrv()+736 CALL opiodr() 000103000 ? 100FDD5C0 ?
103351048 ? 103349048 ?
10334B580 ?
FFFFFFFF7FFFF5D0 ?
sou2o()+16 CALL opidrv() 000000000 ? 000000004 ?
103348A2C ? 00000003C ?
103348F08 ? 000103000 ?
main()+184 CALL sou2o() FFFFFFFF7FFFFAD0 ?
00000003C ? 000000004 ?
FFFFFFFF7FFFFAB0 ?
00002E708 ? 000000000 ?
_start()+380 CALL main() 000000002 ?
FFFFFFFF7FFFFC18 ?
FFFFFFFF7FFFFC30 ?
000000000 ? 000000000 ?
100000000 ?

Hope they get back to me soon!

[Edited to Add: See the solution to ORA-07445: exception encountered: core dump [00000001009B0B04] [SIGSEGV] [Address not mapped to object] [0x000000000] [] [].]

Monday, April 24, 2006

Job Offers

Just had a call from a recruitment consultancy offering me an interview with Barclay's bank for a DBA role looking after their share trading systems. Unfortunately it was a production support only role (aka DataBase Baby Sitting), which would bore me silly, so I had to turn it down.

The consultant said he'd got my CV off monster.co.uk. I only put my CV up there last Thursday and this is the third recruitment consultant to phone me having seen it there.

Fortunately I'm in the position right now where I have a job I can stand so I can afford to be picky about which jobs I go for, unfortunately all the jobs that seem to be coming through are ones that I either couldn't do (I can't drive so any jobs I can't access via public transport are out) or wouldn't want to do (e.g. the Barclay's one above).

Sunday, April 02, 2006

Linked-In

Anyone else use Linked-In?

It's kinda a social networking type of site but business/job seeking oriented. The idea being that you might not know anyone who has the skills you need for a job you have on offer or who can offer a job that would suit your skills but someone that you know might know someone or someone you know might know someone who knows someone &c. Of course it does rely on you actually knowing people and knowing that they have a profile on Linked-In.

My profile is at :http://www.linkedin.com/pub/0/131/a3b

A few people I know are on there but it just seems so crass to invite someone to link to you.

Tuesday, January 10, 2006

To OBE- or not OBE-, what's OBE- is the question

Had a problem last week. About 20 past 4 in the afternnon (I officially finish at 4pm) on Thursday (5th Jan)I had an email telling me that users couldn't run reports on a particular database. This database sits behind an OLTP app written in Oracle Forms 6i and connects, via a database link, to another database which holds metadata for an instance of Documentum (a document management system from EMC). Users login as themselves identified from their network logon and have permissions on the application schema according to their post.

It turned out that the reports that wouldn't run were ones that accessed the remote database. The database link logs onto the remote database as a user which has permissions on the Documentum schema, which is owned by a different user. I had, that afternoon, done some work on the remote database to give the username used by the database link select permission on some views that had been created since the link was set up.

After much working of the problem and raising a call with Oracle support I found the problem (actually I found the problem about 6 hours before Oracle Support even looked at the call). Turned out that someone had decided to recreate the database link but to create it as a normal link rather than a public one so the users (who logged in as themseleves) couldn't see it.

The more intersting aspect of the problem was that when they tried to run the reports from Query Builder they were getting a message along the lines of "Cannot resolve data table scott.emp@orcl. Select a new data table?" (obviously the table names have been changed to protect the guilty). Even more interesting (in a "May you live in interesting times." sort of way) was that if they tried to edit the query then they would get an error message such as "OBE-15087: Invalid Location: orcl.world". I searched metalink, technet and finally the whole web for that error message, nothuing. The only references to OBE I could find was that it was used as an abbreviation for "Oracle By Example".

Eventually I got feedback from Oracle Support that OBE is an undocumented change to Query Builder, versions 6.0 and above, and replaces the BRW error prefix. It's documented in note 108528.1 [MetaLink login required].

The key problems here seemed to be the "OBE-15087: Invalid Location ..." error isn't well documented and an unfamiliarity with database links on our part. The problem with database links, in my experience, is that they're the sort of thing that you probably create once then forget about, they just work, until they don't. Very few of the systems I've worked with have used database links, those that have it's been a create and forget sort of deal.