Friday, February 16, 2007

Where should data be validated?

This just came up on the mailing list for my local Linux Users Group following last night's meeting (which I didn't attend). The original mail and my response are below:


> In the pub, there was an interesting conversation going on regarding
> validation of data in databases.
>
> Excuse the omissions, as I said, it was overheard
>
> Someone brought up the point that in databaseX if, say, you have a
> varchar field set to a limit of 10, and put 26 chars of data into it
> databaseX silently truncates it.
>
> So my question is, in your opinion, should it be up to the front end or
> the database to do this kind of data validation?
>

I'm a Database Administrator, mostly working with Oracle. The reverse of this problem (data validated in the client but not in the database) is something I come across a lot. Most RDBMS/ODBMS/ORDBMS, certainly any that can claim to be enterprise class, have functionality to implement data validation (key constraints, check constraints, strong datatyping, triggers &c). Unfortunately the majority of software vendors, in my experience, seem unwilling to use this functionality in their products. The most common excuse is 'database independence', which in reality means that their app, instead of working well with one *DBMS, will work badly with three or four.

They want all the data validation to happen in their application so that if the business rules change you have to pay them to update the application, rather than just getting your DBA to change the rules in the database. I've met a couple of vendors who insisted that lists of values (where the user picks a value from a list) had to be hard-coded in the application rather than generated from a lookup on a table, so every time you need a new value you have to pay them to write a patch.
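
As a rough sketch of what I mean (the table and column names here are made up purely for illustration), the 'list of values' case can be a lookup table plus a foreign key, and a simple business rule can be a check constraint, all enforced by the database itself:

    -- Hypothetical lookup table: the permitted values live in data, not in application code
    CREATE TABLE order_status (
        status_code  VARCHAR2(10)  PRIMARY KEY,
        description  VARCHAR2(100) NOT NULL
    );

    CREATE TABLE orders (
        order_id     NUMBER        PRIMARY KEY,
        status_code  VARCHAR2(10)  NOT NULL
                     REFERENCES order_status (status_code),  -- value must exist in the lookup table
        quantity     NUMBER        NOT NULL
                     CHECK (quantity > 0)                     -- simple business rule held in the database
    );

    -- Adding a new permitted value is a data change, not an application patch
    INSERT INTO order_status (status_code, description)
    VALUES ('SHIPPED', 'Order has been shipped');

If an application (or a user with SQL*Plus) tries to insert a status that isn't in the lookup table, the database rejects it, whatever the front end says.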

To get back to the original question: I'd say that validation _must_ be done in the database layer, _may_ be done in the application layer and _could_ be done in the client layer (I differentiate between the application layer and the client layer as, in my world of work, N-Tier is very common). When the database gets invalid data it should raise an error/exception which it then propagates to the application layer for handling. The database may carry out some action as a result of that error/exception (writing it out to a log file, the error plus the data that caused it, is often a good idea) but it should pass it back to the application. What the application then does with it is up to the application; usually you'd want it passed back to the user, translated to a more human-readable message if necessary.
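
To take the example from the original mail: in Oracle, rather than silently truncating, the database raises an error which the application layer can then catch and handle (again, a rough sketch with an invented table name):

    CREATE TABLE example (short_text VARCHAR2(10));

    INSERT INTO example (short_text) VALUES ('twenty-six characters long');
    -- Fails with something like:
    -- ORA-12899: value too large for column "EXAMPLE"."SHORT_TEXT" (actual: 26, maximum: 10)

The application catches that exception, logs it if it wants to, and tells the user in plainer language; at no point does the database quietly mangle the data.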


I remember attending a presentation at a UKOUG Conference a few years ago about implementing business rules via constraints in the database; a very useful presentation.

2 comments:

Tonguç said...

Who can guarantee that there will be only one application using your data over time?

I think you can only be sure of data integrity if you centralize validation in the database, but if needed, checks can also be done at the client or mid tier of course :)

Stephen Booth said...

Tonguc,

That's what I was trying to get at, validation must happen in the database layer.

It's not just that multiple applications might use the same database schema; in theory you could possibly keep them all in sync with rule changes. The problem is that users have access to tools like SQL*Plus, MS Access, Toad, SQL Developer, JDeveloper &c that let them get at the data without having to go through your application logic. Even if they don't intend to cause damage to the data it's possible, probable even, that they might cause damage accidentally or through not understanding the data and how it interrelates. Data validation alone can't stop them changing data (although there are things you can do to stop that) but at least it can reduce the damage.