Tuesday, October 2, 2007

DocBook + Images != Fun

I've been using DocBook for a little while now, and I have to say, I wish there were a better alternative (something that provides a nice base format from which to generate multiple output formats). My latest niggle is working with images. Like most things with DocBook, the solution starts off only a little difficult, then gets a fair bit more complicated.

Before I go on, I realise a big part of my complaint is really about XSLT: at the end of the day, XSLT is just about transforming the XML, not about building finished documents. And yet I can't help but feel that whatever solution a person uses for document creation should encompass the whole shebang, from the core documents to the images they contain.

Back to the topic at hand: DocBook and images. The initial block of XML needed for images isn't hard to write, and here it is:


<mediaobject id="RegistryTypes">
  <!-- intended for the FO (PDF) output -->
  <imageobject role="fo">
    <imagedata format="SVG" fileref="RegistryTypes.svg"/>
  </imageobject>
  <!-- intended for the HTML output -->
  <imageobject role="html">
    <imagedata format="PNG" fileref="RegistryTypes.png"/>
  </imageobject>
</mediaobject>


You'll notice that I've specified two formats, one for the PDF generation and one for the HTML generation. This is because browser support for Scalable Vector Graphics (SVG) is still not as widespread as we'd like, so now we've got to create multiple images to support two formats. I could have just used a raster image for the PDF, but due to the need for higher print quality (and the fact that Magic Draw doesn't provide configurable DPI on its rasters) I chose to produce both formats.
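To make the role-based selection concrete, here is a simplified Java sketch that picks the imagedata for a given output format by matching the imageobject's role attribute. It is only an illustration under that assumption: the real DocBook XSL stylesheets have their own selection rules and parameters, and the MediaObjectSelector class is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Simplified model of choosing one imageobject per output format by its role attribute. */
class MediaObjectSelector {

    /** Returns the fileref of the first imageobject whose role matches, or null if none does. */
    static String filerefForRole(String mediaObjectXml, String role) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(mediaObjectXml)));
            NodeList imageObjects = doc.getElementsByTagName("imageobject");
            for (int i = 0; i < imageObjects.getLength(); i++) {
                Element imageObject = (Element) imageObjects.item(i);
                if (role.equals(imageObject.getAttribute("role"))) {
                    // Take the first imagedata inside the matching imageobject.
                    Element imageData = (Element) imageObject.getElementsByTagName("imagedata").item(0);
                    return imageData == null ? null : imageData.getAttribute("fileref");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse mediaobject", e);
        }
    }
}
```

Fed the mediaobject above as a string, filerefForRole(xml, "fo") yields RegistryTypes.svg and filerefForRole(xml, "html") yields RegistryTypes.png.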

Now comes the trick: the 'fileref' is not resolved by either the HTML XSLT transformation or the FO XSLT transformation. It only matters relative to the final location of the generated HTML or, as I would deem incorrectly, relative to the base directory of the fop Ant task.

For the HTML, this means you have to define appropriate tasks to copy the images into a directory the final HTML output can reach. Not too difficult, but worth remembering.
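In Ant this would normally be a copy task; to show what the step has to achieve independently of any build tool, here is a minimal Java sketch that mirrors an image source directory into the HTML output tree, preserving relative paths. The ImageCopier class and whatever directories you pass in are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/** Copies every file under an image source directory into the generated HTML's output tree. */
class ImageCopier {

    /** Copies sourceDir recursively into targetDir, keeping relative paths; returns the number of files copied. */
    static int copyImages(Path sourceDir, Path targetDir) {
        try (Stream<Path> paths = Files.walk(sourceDir)) {
            int copied = 0;
            // Files.walk visits a directory before its contents, so parents exist before files land in them.
            for (Path source : (Iterable<Path>) paths::iterator) {
                Path target = targetDir.resolve(sourceDir.relativize(source).toString());
                if (Files.isDirectory(source)) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
            return copied;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Run it as a post-processing step after the HTML transformation, pointing targetDir at wherever the HTML's relative filerefs will be looked up.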

For the PDF generation, you need to ensure that the images are relative to the base path where the Ant script is run, so the above declaration should probably be changed to something like "images/RegistryTypes.svg" so that the "fop" task can find the required images.
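A cheap safety net before running fop is to resolve every fileref against the directory it will actually be resolved from and report the ones that don't exist. The Java sketch below assumes, as described above, that relative filerefs resolve against the Ant base directory; FilerefCheck is a hypothetical helper, not part of DocBook or FOP.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

/** Checks relative filerefs against the base directory they will be resolved from. */
class FilerefCheck {

    /** Resolves a relative fileref against a base directory (not against the XML file's location). */
    static Path resolve(Path baseDir, String fileref) {
        return baseDir.resolve(fileref).normalize();
    }

    /** Returns the filerefs that would not be found when resolved against baseDir. */
    static List<String> missing(Path baseDir, List<String> filerefs) {
        return filerefs.stream()
                .filter(ref -> !Files.exists(resolve(baseDir, ref)))
                .collect(Collectors.toList());
    }
}
```

With only images/RegistryTypes.svg on disk under the base directory, checking both "RegistryTypes.svg" and "images/RegistryTypes.svg" reports just the bare RegistryTypes.svg as missing.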

Thursday, August 16, 2007

AONS, Rails and Other Integration Frameworks

I think I'm one of the last people to try out Ruby on Rails. Well, I will be last if I ever do.

Rather than state my opinions on RoR (how quickly one picks up another acronym) and risk the ire of the agile community, I think I'll just examine the technology stack within the context of creating AONS.

First off, I don't know Ruby on Rails very well. I think I understand the principle, though:
- Add Ruby, a development language which is geared towards rapid development
- Integrate (or create) any Ruby middleware technologies in a best-of-breed approach, essentially abstracting the developer away from this choice (saves time on tasks like integrating with one of many persistence implementations and learning its quirks)
- Create good facades for any commonly used subsystems (authentication, notification?)

In principle, I admire RoR for integrating all those technologies, because integrating technologies on a per-project basis is a pain (and, as Martin Fowler says, it's a great domain specific language for writing web applications). As an example, on AONS it took about four weeks to integrate and de-quirk Hibernate JPA (and find a nice flavour), Compass/Lucene, Spring, JFreeChart, Quartz, Castor, ROME, a REST exposure methodology and Spring Web Flow (mainly in choosing one of many, many view technologies). Most of this was spent familiarizing myself with JPA and trying a few different implementations, so realistically it could be cut down to one or two weeks if I had to do it again. But the point here is: if it is a common task, why do it at all? So, on this point, I do admire Ruby on Rails.

Where I disagree with Ruby on Rails in the long term is that learning and integrating new technologies is a skill in its own right. Whilst it is nice not to have to constantly spend effort on these kinds of tasks, learning various approaches to wiring in different technology stacks is a must-have skill for all but the most trivial projects. So I think the bonus Ruby on Rails gives should only be that it removes most of the integration tasks from a project; ultimately, a good developer should always be willing and able to integrate should the existing technologies not prove sufficient. I believe part of the attraction of Ruby on Rails is that, at the end of the day, we're lazy creatures. That's not necessarily a bad thing: having 80% of the integration job done means you'll get a quicker return on investment when doing the last part. But the ability to integrate is something which needs to be constantly honed, otherwise it atrophies. I also believe Ruby on Rails offered a nice foot in the door to a lot of developers with minimal integration experience. Suddenly they could compete with the Java gurus who wore their hard-won knowledge of integration like a badge. But here's where I believe the crux is: nothing can make up for the experience of learning to wire in some misbegotten technology. Every successful developer will have to do it, and time will tell whether those who took up RoR learned how to take steps beyond the initial RoR framework... Did they keep learning the "art" of integration, or are they left waiting for the next RoR (SEAM?).

So, integrating the technologies is a go; now for creating subsystems. I do agree that for a given domain specific language, you need a set of domain specific subsystems. I think RoR's success is a testament to the lack of commonly used functionality within the J2EE/EJB core (again, back to the idea Martin Fowler talks about, with Java being a domain independent language). Yes, in terms of "enterprisey" features they (J2EE/EJB) are fantastic, but until recently most of those features were hard to implement and understand. People need facades for the domain they're working in: the job should be easy if you're doing a task in a common way. It may be necessary to slip past the facade should a job be more complicated, but the option to do something simply should be there. The main success of projects like SEAM/RoR is to say, "Hey, Java developers, use us and accelerate your development". And again, I'm forced to tip my hat in their direction.
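A facade with an escape hatch, in the sense above, might look like the following Java sketch: the common case is a single call with sensible defaults, while the underlying subsystem stays reachable for unusual jobs. MailTransport, Notifier and all of their parameters are hypothetical stand-ins, not an API from Rails, SEAM or AONS.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical low-level subsystem with lots of knobs; it just records what it was asked to send. */
class MailTransport {
    final List<String> sent = new ArrayList<>();

    void send(String from, String to, String subject, String body, int priority, boolean html) {
        sent.add(priority + "|" + (html ? "html" : "text") + "|" + from + "->" + to + "|" + subject);
    }
}

/** Facade: the common task is one call; the raw subsystem is still available when needed. */
class Notifier {
    private final MailTransport transport;
    private final String defaultSender;

    Notifier(MailTransport transport, String defaultSender) {
        this.transport = transport;
        this.defaultSender = defaultSender;
    }

    /** The easy path: plain text, normal priority (3), default sender. */
    void sendNotice(String to, String subject, String body) {
        transport.send(defaultSender, to, subject, body, 3, false);
    }

    /** The escape hatch for jobs the facade doesn't cover. */
    MailTransport transport() {
        return transport;
    }
}
```

Most callers only ever see sendNotice; the rare caller needing HTML or a different priority goes through transport() instead of the facade growing a parameter for every case.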

Just one point before I finish. The work I did for AONS could quite easily be stripped down into an agile integration core offering similar features to something like SEAM or Ruby on Rails. Where it would fall down (without proper resourcing) would be the many, many hours it takes to write good examples and documentation so that using the framework became easy and intuitive.

So, that's my thinking on Ruby on Rails (and a touch on SEAM). I'm not sure how helpful this will be to anyone... if nothing else, maybe it will resonate with thoughts going through other developers' minds about the current successes in the small to medium application development world.

Cheers,

Dave

Tuesday, August 7, 2007

Got 2 phase deployment done

I've now successfully added a two-phase option to the AONS build. This is a handy mechanism to get the build completed up to the point where it would require client-specific information (database connection info), and then stop the build there. What this means is that someone downloading a "release" sees a much more lightweight project structure than someone working with the full development structure. Another benefit is that we don't need to ask the deployer to compile the code against any required server libraries which can't be included in the release (due to more restrictive licensing).
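The gate between the two phases can be sketched in Java as a check that refuses to start phase two until the deployer has supplied the client-specific settings. The property keys (db.url and friends) and the DeploymentConfig class are assumptions for illustration, not the actual AONS build configuration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Phase one ships without client settings; phase two checks they have been filled in first. */
class DeploymentConfig {
    // Hypothetical keys for the client-specific database connection info.
    static final String[] REQUIRED_KEYS = {"db.url", "db.username", "db.password", "db.driver"};

    /** Loads deployer-supplied properties and returns the required keys that are missing or blank. */
    static List<String> missingKeys(Reader deployerProperties) {
        Properties props = new Properties();
        try {
            props.load(deployerProperties);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

Phase two would call missingKeys on the deployer's properties file and stop with the list of settings that still need filling in.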

This release also contains the generated DocBook documentation (PDF, single-page HTML and split HTML), produced from the DocBook XML source included in the main project. So far I've got a basic Install Guide written and am working on the As Built Specification.

I know there's a lot of room for improvement, but between writing new features, fixing bugs and finishing off the existing documentation it's still a bit overwhelming. If anyone reading this downloads the release build from SourceForge, please let me know about any issues encountered or any "usability" ickiness which needs improvement. Tweaking the included file structure is actually pretty easy to do; it really just needs a bit of perspective from someone who hasn't been working right at the coal face for the past few months.

Well, for what it's worth, the build release is over here. Remember, we're still in Beta so this build should be taken with a hefty grain of salt.

Wednesday, August 1, 2007

hit with the documentation stick

Getting towards the end of this iteration of AONS, I'm starting to shift gear and work more on documentation. In an ideal world, I'd get the following done:

  1. As Built Specification (design/implementation focused document)
    1. How AONS works (methodologies, design patterns etc)
    2. Developing AONS (code conventions, source control etc)
    3. Building AONS (development build and two phase distribution build)
    4. Installation Guide (may also be an independent document)
  2. User/Admin Guide
  3. REST Interface Guide
  4. Service Usage Model (SUM)
  5. Future Extension Guide (anything we'd really like to have implemented in AONS; there's a lot of it, but we just plain ran out of time)
  6. Fix up any incomplete Javadoc
As always, time is heavily against us... and documenting this stuff with the effort it deserves is a little daunting. Ah well, if I just spend a few hours every day until the end of the project, I should be able to knock over most of it.

From a technical perspective, one facet of this activity I am enjoying is that I'm using DocBook as the source format for my documentation. This required a little learning to get started, but the idea of modular documents really suits my way of thinking. There's nothing worse than having to repeat sections within documentation and knowing that they'll get out of sync with the first update.

One (minor) criticism of DocBook is that the WYSIWYG tools are a little clunky. Not that I'm really using them much now that I've got my nice handful of tags under my belt. DocBook seems to be one of those things whose breadth of features keeps surprising you, but most tasks only require a very small subset.

Another (minor) concern with DocBook is its use of a DTD. I was (and am) a little confused as to whether there is an XML Schema available for DocBook. The main reason I'd personally much prefer one is that the Eclipse XML editor's content assist seems to only handle XML Schema, not DTDs. I guess it comes down to DocBook being closer to SGML than XML, so maybe they can't use one... but it's kind of bucking the trend, with most tools preferring XML Schema. Ah well, if it ain't broke...

Okay, seriously, last concern with DocBook: XML entities. They really are outdated and there are many better ways to reference other objects; it kind of feels like a poor man's Dependency Injection. Ant used to use them, but there were all sorts of issues... there must be a better way to reference other documents. My preference would be to leave the actual dependencies undefined within the file which requests them (if they are resolved locally) and wait until all documents are composed together by a top-level import document.

[Update: Okay, just found the documentation for DocBook with XIncludes... looks like XML Entities were thrown out.]
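For reference, XInclude composition can be exercised straight from Java: JAXP's DocumentBuilderFactory has an XInclude-aware mode, so a top-level document that xi:includes its chapters is assembled at parse time. This is a minimal sketch; the ModularDocs class and the file names are hypothetical, and whichever XSLT processor you run DocBook through needs XInclude processing switched on separately.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Parses a top-level document and lets the parser pull in the xi:include'd parts. */
class ModularDocs {

    static Document compose(File masterFile) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // XInclude processing requires namespace awareness
            factory.setXIncludeAware(true);  // replace xi:include elements with the referenced content
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Relative hrefs in xi:include are resolved against the master file's location.
            return builder.parse(masterFile);
        } catch (Exception e) {
            throw new IllegalStateException("could not compose " + masterFile, e);
        }
    }
}
```

A master book.xml containing only <xi:include href="install.xml"/> (with the XInclude namespace declared) comes back with the install chapter in place of the include element.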

Wednesday, July 11, 2007

file identification funny

Occasionally one stumbles upon a code comment which both accurately states what the program is doing and brings a smile to the reader's face. Whilst reading through the source code for the "file" utility, in the part where it attempts to identify a file, I came across this one which I thought was quite good:

/*
 * Try compression stuff
 */
if (!zflag || __lf_zmagic(buf, nb) != 1)
    /*
     * try tests in /etc/magic (or surrogate magic file)
     */
    if (__lf_softmagic(buf, nb) != 1)
        /*
         * try known keywords, check for ascii-ness too.
         */
        if (__lf_ascmagic(buf, nb) != 1)
            /*
             * abandon hope, all ye who remain here
             */
            __lf_form("data");
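The cascade in that comment chain (compression check, then magic tests, then an ASCII test, then give up and call it "data") can be sketched in Java. This is a deliberately tiny illustration: the magic table holds just two entries, the compression step only recognises the gzip signature rather than decompressing and inspecting the contents as file's -z option does, and FileSniffer is not part of file(1).

```java
import java.nio.charset.StandardCharsets;

/** Simplified illustration of file(1)'s fallback cascade. */
class FileSniffer {

    static String identify(byte[] buf) {
        // 1. "Try compression stuff": gzip streams start with the bytes 0x1f 0x8b.
        if (buf.length >= 2 && (buf[0] & 0xff) == 0x1f && (buf[1] & 0xff) == 0x8b) {
            return "gzip compressed data";
        }
        // 2. Magic-number tests: a tiny stand-in for /etc/magic.
        if (startsWith(buf, "%PDF-")) {
            return "PDF document";
        }
        if (startsWith(buf, "<?xml")) {
            return "XML document text";
        }
        // 3. "Check for ascii-ness": printable ASCII plus common whitespace.
        if (isAscii(buf)) {
            return "ASCII text";
        }
        // 4. "Abandon hope, all ye who remain here".
        return "data";
    }

    private static boolean startsWith(byte[] buf, String magic) {
        byte[] m = magic.getBytes(StandardCharsets.US_ASCII);
        if (buf.length < m.length) return false;
        for (int i = 0; i < m.length; i++) {
            if (buf[i] != m[i]) return false;
        }
        return true;
    }

    private static boolean isAscii(byte[] buf) {
        for (byte b : buf) {
            int c = b & 0xff;
            boolean printable = c >= 0x20 && c < 0x7f;
            if (!printable && c != '\n' && c != '\r' && c != '\t') return false;
        }
        return buf.length > 0;
    }
}
```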

Thursday, July 5, 2007

People Forget

After reading this about Spring I got a little bit annoyed. The number of people who forget, or don't even know, how bad configuration management used to be before Spring gave it one hell of a kick in the pants is staggering. Martin Fowler and the other IoC containers also deserve thanks for starting the push.

Just yesterday I found a lovely refresher on 'ye olde days of bizarre configuration' in Lucene's SegmentReader, which is configured via a global system property, a mechanism still present in its current release. But nasty things like that don't happen much anymore, do they? Now that the bar has been raised, isn't everyone quick to bemoan nasty things like well-structured XML configuration?

Oh geez, and then there's the major complaint: "how would we live without Spring?". Continuing on from that, why would you? I'd be willing to bet that if you're using Spring in a deployable application and not a piece of middleware, you're going to need a lot more than a Spring jar to run your application, and guess what, changing any of them is going to be anywhere from a small to a large job. That's the world we live in, as integrators first and developers second. If developers don't like it, they can try building without any middleware and see how long it takes.

Next, complaining about the startup time: again... is this really relevant? Not many people hold weekly 'Friday server startup races'. Also, an empty Spring container will take a little while to start up, but once done, those load operations are concentrated in one tiny bit of the application's life, meaning that you don't keep repeating them for every operation, which saves time. See that last part there? That was the key: do the things which have to get done once, and then they don't have to be done every time you want to perform the action. That logic holds as long as you don't do any unnecessary load operations during the Spring load... but then I guess you can also use lazy loading if you want. See, options... it's all about options.
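The "do it once, then reuse it" point can be shown without Spring at all. The Java sketch below memoises an expensive construction so it happens exactly once, on first use (lazy loading); ExpensiveService and its construction counter are hypothetical stand-ins for whatever a real container would load.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Runs the factory at most once, on first get(), then hands back the same instance. */
class Lazy<T> implements Supplier<T> {
    private final Supplier<T> factory; // must not return null, or it would be re-run
    private volatile T value;

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public T get() {
        T result = value;
        if (result == null) {
            synchronized (this) { // double-checked locking; safe because value is volatile
                result = value;
                if (result == null) {
                    result = factory.get();
                    value = result;
                }
            }
        }
        return result;
    }
}

/** Hypothetical service whose construction stands in for parsing config, opening pools, etc. */
class ExpensiveService {
    static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

    ExpensiveService() {
        CONSTRUCTIONS.incrementAndGet();
    }

    String handle(String request) {
        return "handled " + request;
    }
}
```

However many requests go through the Lazy wrapper, the service is built once. Assigning new ExpensiveService() to a field at startup instead gives the eager behaviour; in Spring's XML configuration the same choice is made per bean with the lazy-init attribute.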

And the Spring people are nice too... Yatzaa!!

Wednesday, July 4, 2007

lucene issue

I feel sorry for Shay: he's written a beautiful object indexing layer, Compass, on top of Lucene (the equivalent of writing the first good Object Relational mapping tool but for searching), yet is let down by the underlying interface to Lucene. Don't get me wrong, Lucene is great, but it does have some quirks.

Take this one for example: setting a custom SegmentReader. Compass uses a custom SegmentReader (unsurprisingly called CompassSegmentReader), which in and of itself isn't a huge modification of Lucene. The quirk comes into play when we deploy multiple applications into the same Servlet container. It comes back to how a custom SegmentReader is set in Lucene: not by changing a property on a factory or some other configuration parameter... no, Lucene relies on a JVM-wide system property. Yuck. This means that when Compass sets its custom SegmentReader, it sets it for everything in that JVM. So App1 using Compass is okay, but App2 using pure Lucene then falls over unless the Compass jar file is included. Here's a thread on the topic over at the Compass forums.
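To see why one application's setting leaks into another, here is a Java sketch of the same lookup pattern: the implementation class is taken from a JVM-wide system property, whose name is copied from the Lucene code quoted below. GlobalPropertyLookup is a hypothetical class, and unlike Lucene it does the lookup in a method rather than a static initialiser.

```java
/** Mimics a class lookup driven by a JVM-wide system property. */
class GlobalPropertyLookup {
    static final String PROPERTY = "org.apache.lucene.SegmentReader.class";

    /** Loads whichever class the JVM-wide property names, falling back to the given default. */
    static Class<?> implementationClass(String defaultClassName) {
        // System properties are shared by every application in the JVM, whatever its classloader.
        String name = System.getProperty(PROPERTY, defaultClassName);
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            throw new RuntimeException("cannot load SegmentReader class: " + e, e);
        }
    }
}
```

If App1 calls System.setProperty with a class only it ships (say a hypothetical com.example.compass.CompassSegmentReader), every later implementationClass call in that JVM throws, which is App2's failure in miniature.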

In my case, I found this out when deploying DSpace on the same application server as AONS. I'm not sure how I'll get around it: I could ask the DSpace developers to include Compass in their distribution, or I'll probably just put in a README note about it. Either solution is less than ideal, but probably the path of least resistance by far. Maybe the Lucene developers would be willing to change this... but the original post about this problem in the Compass forums was in 2005, so I have my doubts.

Well, here's the offending code from org.apache.lucene.index.SegmentReader (valid as of Lucene 2.2.0):

static {...
  try {
    String name = System.getProperty("org.apache.lucene.SegmentReader.class", SegmentReader.class.getName());
    IMPL = Class.forName(name);
  } catch (ClassNotFoundException e) {
    throw new RuntimeException("cannot load SegmentReader class: " + e, e);
  } catch (SecurityException se) {
    try {
      IMPL = Class.forName(SegmentReader.class.getName());
    } catch (ClassNotFoundException e) {
      throw new RuntimeException("cannot load default SegmentReader class: " + e, e);
    }
  }
}

This kind of code is exactly why Inversion of Control (aka Dependency Injection) was so good when it came along: it got rid of static calls and global system properties like this and made property injection obvious and intuitive. I'm sure when this segment of code was first written, it was completely acceptable to do things like this... but nowadays configuration and anticipated customisation should be a little bit more elegant.
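Here is a minimal Java sketch of the injected alternative: each IndexOpener is handed its own reader factory through the constructor, so two applications in the same JVM can choose different implementations without touching global state. IndexReaderLike, DefaultReader, CustomReader and IndexOpener are all hypothetical names, not Lucene or Compass API.

```java
import java.util.function.Supplier;

/** Hypothetical reader abstraction standing in for SegmentReader. */
interface IndexReaderLike {
    String describe();
}

class DefaultReader implements IndexReaderLike {
    public String describe() { return "default"; }
}

class CustomReader implements IndexReaderLike {
    public String describe() { return "custom"; }
}

/** The implementation choice is injected per instance: no static lookup, no system property. */
class IndexOpener {
    private final Supplier<? extends IndexReaderLike> readerFactory;

    IndexOpener(Supplier<? extends IndexReaderLike> readerFactory) {
        this.readerFactory = readerFactory;
    }

    /** Convenience constructor for the common case. */
    IndexOpener() {
        this(DefaultReader::new);
    }

    IndexReaderLike open() {
        return readerFactory.get();
    }
}
```

new IndexOpener(CustomReader::new) for the Compass-style application and new IndexOpener() for the plain one can live side by side, and a container such as Spring would simply do that wiring from configuration.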