Monday, December 26, 2005

(Almost) All users are idiots

No, I haven't turned into BOFH.

A common comment I've heard in every IT department I've worked in has been about the idiocy of users, or how one particular user has done something really stupid, because they don't understand IT. But why should they? They're users, they do a job which isn't IT based but just happens to use IT. They're finance people, or HR people or social services people, or planning people, or something else people. They have knowledge and skills (we hope) in other areas, areas that we as IT people almost certainly don't have. Why should they know how the network works? Why should they know how the application they're using works under the hood? Why should they know how the database, the application they use connects to, works?

There are a couple of groups I do have a problem with the idiocy (although idiocy might be too strong a word) of, IT (especially application) trainers and IT administrators.

An example of why can be seen in a problem that was recently (Friday afternoon, i.e. the day before the long break for Christmas) brought to my attention. We have a document management system that uses an Oracle database to hold the metadata. The data part of the data base is only supposed to be a few gig, it's up to about 4Gb so far, so that's very small by our standards. I don't manage it directly myself, like most of our apps it's managed by the departmental IT team with input from the project team that put it in (it's not fully live yet). For a few months the person responsible for making sure it's backed up has been worried about the amount of time it's taking to complete the backup (it's currently backed up cold each night, at some point we'll be moving to hot backups). On Friday afternoon this was brought to my attention.

I first went into the database and queried dba_data_files and v$datafile to see if any tablespaces had grown. They hadn't, although I did note that a number of datafiles that had been non-autoextend were now autoextend. I then went to the command prompt and did an ls -l to check that the files on disk were different to what I was expecting. I found that the file for the temp tablespace was 32Gb. A useful demonstration that since the introduction of tempfiles you can't just rely on dba_data_files or even v$datafile for your tablespace size information (which I admit I'd forgotten, fortunately my natural inclination to "Measure twice, cut once" meant I caught it before it caught me out), from the documentation:
Tempfile information is shown in the dictionary view DBA_TEMP_FILES and the dynamic performance view V$TEMPFILE, but not in DBA_DATA_FILES or the V$DATAFILE view.
A quick check back in the database showed that this file had also been changed to autoextend.

Another important piece of information about this system was that earlier this year I had noticed a lot of queries were doing a large amount of sorting and had where clauses where a lot of the conditions were something like "table4.field7 like '%'", which seemed to imply that the form that had generated the query had a space for the contents of that field but the user hadn't entered anything so it had been wild carded.

From this I surmised that what has probably been happening is that a lot of users have been running very unselective searches which have returned a large number of rows which have been sorted (as part of the joins and to put the results in the order the user wanted) so blowing the PGA sort area and going to disk-based sorts. When the tempfile wasn't autoextend they were probably getting a lot of searches failing due to running out of space in the temp tablespace ("Cannot extend temporary segment ...") so the administrator had, without thinking of the consequences, changed the tempfile to autoextend. At some point the number of users sorting large datasets reached the point where a 32Gb tempfile was required to hold all the temp segments.

And so we get onto the trainers and the administrators.

IT application trainers (and those who write the courses they deliver) are supposed to train users on how to use the application. I'm sure they do that, but do they train the users on the best ways of using the app? In the example above it probably didn't come up in the training environment (with the small number of users and small data volume) that unselective searches might result in huge data sets. Unfortunately in the live environment I'm sure that the users not only ran into failed searches (due to not being able to extend the temporary segments) but also found that the searches that did complete were slow and produced large outputs which they then had to read through to find the results they wanted. Perhaps if they had been trained to make their searches as selective as possible from the start then the problem would not have arisen?

I know that trainers are often beholden to budgets and other constraints (and may even be asked to deliver a course in a far shorter period than they would like) but...it's still frustrating to be the one who has to pick up the pieces. Users shouldn't have to understand the underlying systems but I don't think it's unreasonable that the trainers should have a reasonable grasp of the implications of how the application works.

And then there's the administrators. I've been working with Oracle for about nine and a third years now (it'll be 10 next August). I can probably reasonably claim to have been a DBA for about 7 to 8 of those years, my first formal training in Oracle being a course in 1999 taught by Doug Burns (one of the best Oracle trainers I've had the pleasure of being trained by), I'm also the only Oracle DBA in the entire organisation (about 50,000 employees with 480 IT staff). Most of the other people, in the organisation, who manage Oracle databases don't come close to that length (or depth) of experience. Additionally a number of them have a tendency to just slap on whatever patch, change or whatever comes through the door or from a Google search with no real understanding of or thought about the implications, possibly due to years of working with Microsoft systems.

Even with those years of experience I still, as evidenced by the datafiles/tempfiles above, find details I didn't know or have forgotten. Still, one of the things that my experience has taught me is that no matter how loud management are screaming, no matter how simple a change seems, no matter how right a change seems when you first think of it, always think of the implications.

Now I just have to communicate that to the other 'administrators'.

No comments: