March 14, 2007

Techniques for working with Temporal Databases.

I've been working recently with a temporal database and have gone back over my blog as a result to see what I've written on the subject.

All that I've written so far is a simple diatribe on temporal modelling.

I think it's about time I shared a hard-won lesson about working with temporal databases in multi-user environments: it is alarming how frequently concurrency gets messed up.

The problem begins with the fact that classical key-based uniqueness checks just don't work. In a temporal database you must be able to support many rows referring to the same entity, each row covering a period in the entity's life. This means that any time you wish to retrieve a view of an entity you either retrieve the set of rows defining the whole life of the entity, or you choose a point in time and use a BETWEEN ... AND clause to select the single row describing the state of the entity at that point. Effectively, any key has a temporal element: the full key is the identity of the entity plus a start date and an end date for the period for which the attached information was/is/will be correct.
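To make that concrete, here is a minimal sketch of a temporal table and a point-in-time query. The table and column names (an employee's salary history) are my own illustration, not anything from a real schema; SQLite stands in for whatever database you actually use.

```python
import sqlite3

# Illustrative schema: one entity (an employee) whose salary varies over time.
# Each row covers an inclusive period [start_date, end_date].
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee_salary (
        employee_id INTEGER NOT NULL,
        start_date  TEXT NOT NULL,   -- inclusive
        end_date    TEXT NOT NULL,   -- inclusive
        salary      INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO employee_salary VALUES (?, ?, ?, ?)",
    [
        (1, "2005-01-01", "2005-12-31", 40000),
        (1, "2006-01-01", "9999-12-31", 45000),  # open-ended "current" row
    ],
)

def salary_at(conn, employee_id, as_of):
    """Point-in-time view: the single row valid on the `as_of` date."""
    row = conn.execute(
        """
        SELECT salary FROM employee_salary
        WHERE employee_id = ?
          AND ? BETWEEN start_date AND end_date
        """,
        (employee_id, as_of),
    ).fetchone()
    return row[0] if row else None

print(salary_at(conn, 1, "2005-06-15"))  # -> 40000
print(salary_at(conn, 1, "2007-03-14"))  # -> 45000
```

ISO-8601 date strings sort lexicographically, which is why plain text comparison works here; a real schema would use the database's native date type.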

So far so good; however, databases are just not set up for checking this kind of key. They can trivially check the uniqueness of a key - the fact that no two rows representing an entity have the same combination of identity key, start date and end date. Can you spot the flaw, though? What is needed is the ability to check that the row does not have start and end dates that OVERLAP with those of another row for the same identity key. If they do overlap then we end up with two valid sets of data for an entity on a given day - obviously no good.
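The overlap test itself is a standard interval predicate: two inclusive periods overlap exactly when each one starts on or before the other's end. A sketch, reusing the same made-up salary-history schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee_salary (
        employee_id INTEGER NOT NULL,
        start_date  TEXT NOT NULL,
        end_date    TEXT NOT NULL,
        salary      INTEGER NOT NULL
    )
""")
conn.execute(
    "INSERT INTO employee_salary VALUES (1, '2005-01-01', '2005-12-31', 40000)"
)

def overlaps_existing(conn, employee_id, start_date, end_date):
    """True if [start_date, end_date] overlaps any stored period for the entity.
    Two inclusive periods overlap when each starts on or before the other ends."""
    row = conn.execute(
        """
        SELECT 1 FROM employee_salary
        WHERE employee_id = ?
          AND start_date <= ?   -- existing row starts before the new row ends
          AND end_date   >= ?   -- existing row ends after the new row starts
        """,
        (employee_id, end_date, start_date),
    ).fetchone()
    return row is not None

print(overlaps_existing(conn, 1, "2005-06-01", "2006-06-30"))  # True: clashes
print(overlaps_existing(conn, 1, "2006-01-01", "2006-12-31"))  # False: adjacent
```

Note that this is exactly the check a plain UNIQUE constraint cannot express, which is why it has to live in application or trigger code (unless your database offers range exclusion constraints).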

Keeping up? Then on with the complexity! The first-pass attempt at fixing this is to add application- or database-level programmatic checks for an overlapping row. That seems fine as far as it goes, but if two users are performing an update at the same time then standard 'transaction isolation' gets in the way. In most databases the default behaviour is that a user can only see committed work done by other users. This means that if both users are updating the same entity then the programmatic checks will not see the changes being made by the other user until they are committed, and overlapping rows can still occur.
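The naive check-then-insert pattern looks like this (again with my illustrative salary-history schema). It behaves perfectly for a single session; the comment marks the window where, under read-committed isolation, a second session's uncommitted overlapping row is invisible and both sessions sail through the check:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee_salary (
        employee_id INTEGER NOT NULL,
        start_date  TEXT NOT NULL,
        end_date    TEXT NOT NULL,
        salary      INTEGER NOT NULL
    )
""")

def insert_period_naive(conn, employee_id, start_date, end_date, salary):
    """Check-then-insert. Correct for one user, but under read-committed
    isolation two sessions can both pass the check before either commits,
    leaving two overlapping rows for the same entity."""
    clash = conn.execute(
        """
        SELECT 1 FROM employee_salary
        WHERE employee_id = ? AND start_date <= ? AND end_date >= ?
        """,
        (employee_id, end_date, start_date),
    ).fetchone()
    if clash:
        raise ValueError("period overlaps an existing row")
    # <-- race window: another session's uncommitted overlapping row
    #     is invisible here, so its check passes too
    conn.execute(
        "INSERT INTO employee_salary VALUES (?, ?, ?, ?)",
        (employee_id, start_date, end_date, salary),
    )
    conn.commit()

insert_period_naive(conn, 1, "2005-01-01", "2005-12-31", 40000)
# In a single session the overlap is correctly rejected:
try:
    insert_period_naive(conn, 1, "2005-06-01", "2006-05-31", 42000)
except ValueError as e:
    print(e)  # period overlaps an existing row
```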

I've seen a number of bizarre solutions that attempt to solve this. Two of the least satisfactory were to relax transaction isolation so that each user can see what the other is working on (doesn't work in the end), or to have processes that find overlaps and manually resolve them (oh the work!).

The proven solution is a sensible locking strategy, used to ensure that the programmatic checks run correctly even with multiple users maintaining the data. You can't use the temporal rows themselves for this locking. Typically I would analyse the data model and either identify non-temporal entities that can be used to lock sets of temporal entities, or create my own non-temporal entities for the locking. This kind of non-temporal entity used to lock temporal entities can be referred to as a lifeline/timeline lock: when the lock is obtained you lock the entire lifeline of the entity being maintained, preventing other users/processes from working on it simultaneously.
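A sketch of the lifeline-lock pattern, under the same illustrative schema. In a row-locking database the lock would be `SELECT ... FOR UPDATE` on the non-temporal parent row; SQLite has no row locks, so this sketch uses an explicit `BEGIN IMMEDIATE` (a writer lock) to stand in, and touches the parent row to show where the real lock would be taken:

```python
import sqlite3

# isolation_level=None: manage transactions explicitly rather than letting
# the Python sqlite3 module auto-begin them.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    -- Non-temporal parent: one row per entity, used as the lifeline-lock anchor.
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY);
    CREATE TABLE employee_salary (
        employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
        start_date  TEXT NOT NULL,
        end_date    TEXT NOT NULL,
        salary      INTEGER NOT NULL
    );
    INSERT INTO employee VALUES (1);
""")

def insert_period_locked(conn, employee_id, start_date, end_date, salary):
    """Take the lifeline lock first, then run the overlap check inside it, so
    concurrent maintainers of the same entity are serialised and each check
    sees the previous writer's committed rows."""
    conn.execute("BEGIN IMMEDIATE")  # stand-in for a row lock in SQLite
    try:
        # Touch the non-temporal parent row: in a row-locking database this
        # would be SELECT ... FOR UPDATE, and it IS the lifeline lock.
        conn.execute(
            "UPDATE employee SET employee_id = employee_id WHERE employee_id = ?",
            (employee_id,),
        )
        clash = conn.execute(
            """
            SELECT 1 FROM employee_salary
            WHERE employee_id = ? AND start_date <= ? AND end_date >= ?
            """,
            (employee_id, end_date, start_date),
        ).fetchone()
        if clash:
            raise ValueError("period overlaps an existing row")
        conn.execute(
            "INSERT INTO employee_salary VALUES (?, ?, ?, ?)",
            (employee_id, start_date, end_date, salary),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

insert_period_locked(conn, 1, "2005-01-01", "2005-12-31", 40000)
insert_period_locked(conn, 1, "2006-01-01", "9999-12-31", 45000)
```

The design point is that the lock anchor is non-temporal: one row per entity for its whole life, so every maintainer of that entity contends on the same row regardless of which period they are editing.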
