Saturday, May 12, 2007

scheduling & transaction proxy wrappers

Probably one of the most confusing parts of the last two applications I have written has been the scheduling area. I think I could be doing something better, but I'm just not sure how...

The complexity comes into play when scheduling is mixed with transactions - not so much that it's hard to enter a transaction, more that it's confusing to update a "status" flag on a domain object in a new, discrete transaction to indicate that the object is running. I've toyed with the idea of holding the status in a memory-based object, but that seems to smell even worse.

I guess further complexity is added by some databases holding locks for the duration of what can be quite a long-running scheduled batch job. Most databases are sensible about their locks, but some are quite heavy-handed, even when transactions are read-only. Luckily I'm currently testing against PostgreSQL and Oracle which, in my past experience, have been quite smart in their locking, so I probably won't have to cater for the unnamed bad database. Whatever solution I talk about here, once the locking problem comes in too, things just get plain horrible.

So what's my ideal process for the transaction algorithm?
1.) Scheduled job starts, calls performSchedule on the intended business manager
2.) Manager called, creating a new transaction
3.) New "tiny" transaction which commits quickly to indicate the domain object is running; this small transaction should also check that the domain object isn't already running
4.) The main body of batch work goes on
5.) The main body of batch work is committed; we also indicate that the domain object is no longer running

Should a runtime exception be thrown, we should abort the main block of code, but in a catch block, we should also create another "tiny" transaction to indicate that the job is also no longer running and possibly do any logging/notification regarding the failed job.
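The flow described above can be sketched roughly as follows. This is only a minimal illustration, not working Spring code: `TinyTx`, `Job` and `ScheduledJobRunner` are hypothetical names I've made up here, `TinyTx` stands in for whatever mechanism opens the short, separately committed transaction, and the main body's own transaction is not shown.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for "run this in a short transaction that commits on its own". */
interface TinyTx {
    <T> T run(Supplier<T> work);
}

/** Hypothetical domain object carrying the persistent "running" status flag. */
class Job {
    boolean running;
}

class ScheduledJobRunner {
    private final TinyTx tinyTx;

    ScheduledJobRunner(TinyTx tinyTx) {
        this.tinyTx = tinyTx;
    }

    /** Returns false if the job was already running and was therefore skipped. */
    boolean performSchedule(Job job, Runnable mainBody) {
        // Tiny transaction: check the flag and set it in one short unit of work.
        boolean claimed = tinyTx.run(() -> {
            if (job.running) {
                return false;
            }
            job.running = true;
            return true;
        });
        if (!claimed) {
            return false;
        }
        try {
            // Main body of batch work; the flag is cleared as part of the same commit.
            mainBody.run();
            job.running = false;
            return true;
        } catch (RuntimeException e) {
            // The main work is abandoned; a second tiny transaction clears the flag.
            // Logging or notification of the failure would also go here.
            tinyTx.run(() -> {
                job.running = false;
                return null;
            });
            throw e;
        }
    }
}
```

The point of the shape is that the flag changes happen in their own short units of work, so the long-running body never decides whether other schedulers can see the status.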

Where this gets a little interesting is when transaction proxy wrapping mechanisms are thrown into the mix. Whilst not the worst code in the world by any means, we now have to employ a special mechanism to perform the two "tiny" transactions. There are a few ways to do this, some talked about here, but even those solutions do not handle changing the existing transaction demarcation around the proxy object, which is what we need: we have to call a retrieve and an update, both normally PROPAGATION_REQUIRED, inside a very small transaction of their own.

Say we are using business managers in our application as the target for the transaction proxies. Assuming that the business manager for the domain object is also running the scheduling (and here could be my problem), then we need to make the manager call another manager, itself wrapped by a transaction proxy, so that the normally PROPAGATION_REQUIRED retrieve and update methods run inside a new transaction.

Trying to do this with Spring (at least with TransactionProxyFactoryBean methods), we get circular references which cause the context load to fail. Maybe this isn't the case anymore, but with Spring 2.0.2, this was still happening.

Another option here is to use something similar to the DAO template method, where we use transaction callbacks to do a segment of work inside a transaction of our choosing. Hang on, I hear you say, shouldn't the code be unaware of transactions, since that was the whole point of transaction demarcation? Well yes, you're correct, but this isn't exactly a very normal situation - and in this case the situation requires a different approach, since the algorithm is implicitly aware of the transaction boundaries.

So in this solution, we create a transaction callback object, which would be a really basic interface with something like "TransactionCallback#execute()". This callback would then be passed to a TransactionManager... which would be wrapped by a transaction proxy. The TransactionManager would just have available methods like "executePropagationNew" and so forth.
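A rough sketch of what those two types might look like is below. The names `TransactionCallback`, `execute` and `executePropagationNew` come from the description above; `executePropagationRequired` and `SimpleTransactionManager` are hypothetical additions for illustration. These are the application's own types, not Spring's `TransactionCallback` or `PlatformTransactionManager`, and the propagation settings in the comments are assumed to be configured on the proxy rather than in code.

```java
/** A unit of work to run inside a transaction chosen by the caller. */
interface TransactionCallback {
    void execute();
}

/** Each method is assumed to get its own propagation setting on the transaction proxy. */
interface TransactionManager {
    // Assumed proxy setting: PROPAGATION_REQUIRES_NEW
    void executePropagationNew(TransactionCallback callback);

    // Assumed proxy setting: PROPAGATION_REQUIRED (hypothetical extra method)
    void executePropagationRequired(TransactionCallback callback);
}

/** Proxy target: contains no transaction code, it only runs the callback. */
class SimpleTransactionManager implements TransactionManager {
    public void executePropagationNew(TransactionCallback callback) {
        callback.execute();
    }

    public void executePropagationRequired(TransactionCallback callback) {
        callback.execute();
    }
}
```

The business manager would then be handed the proxied TransactionManager and wrap each "tiny" status update in a call to executePropagationNew, leaving its own methods with their usual demarcation.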

This is by no means a perfect solution, but should be one of the best available considering.

Another possible solution would be to "break out" the scheduled methods from the relevant manager to a new manager. Still, even using this approach, the new manager would require the same implicit awareness of transaction boundaries and the resulting algorithm would be largely the same.

I hope that all made some sense - if I've confused you horribly and something isn't clear, feel free to ask questions and I'll try to fix this up.
