"I have a mind like a steel... uh... thingy." Patrick Logan's weblog.

Wednesday, September 19, 2007

Atom and RelaxNG

(Update: Links now work, thanks to Sjoerd Visscher.)

I'm fairly new to the details of Atom format and brand-spanking new to the details of RelaxNG. So here's what I did, and if you have any comments, then add them here or send an email to patrickdlogan at gmail dot com. I also did not find any explicit examples like this on the internets, so this may help someone else -- it's not complex, and that makes me feel good about atom.rnc and relaxng in general.

My objective was to use relaxng to validate an atom entry where the content is inlined xml. The validation should cover the atom stuff per se, as well as the specific xml content. A second objective was to apply the relaxng schema for that specific xml to that content separate from the atom stuff. In other words, the xml content should be able to stand alone or be inlined in an entry, and one schema definition should be usable for both cases.

I grabbed atom.rnc, the compact relaxng definition for atom. I grabbed the jing validator. Applying just this schema worked. Good. Good.

Then I spent some time looking at that schema and reading through various relaxng combination mechanisms. The combination takes place at the definition of atomInlineOtherContent in atom.rnc. This is where the inline'd xml content goes.
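
For reference, this is how that pattern reads in the schema published in Appendix B of RFC 4287. It is an excerpt rather than a standalone grammar (the names it refers to are defined elsewhere in atom.rnc), and it is worth checking against the copy you downloaded.

```rnc
# Excerpt from atom.rnc: the pattern for inline non-text, non-xhtml content.
atomInlineOtherContent =
   element atom:content {
      atomCommonAttributes,
      attribute type { atomMediaType }?,
      (text|anyElement)*
   }
```

The last line is the part to replace: it accepts any mix of text and arbitrary elements.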

An interleaved combination did not work, as I found out, because that definition and the definition of my own xml both include "text", which causes relaxng trouble. I don't fully understand why this should cause trouble when my schema is the more specific of the two, though it appears to come from a relaxng rule that "text" may not occur on both sides of an interleave.

The combination mechanism that did work is to redefine atomInlineOtherContent in my own schema...

namespace atom = "http://www.w3.org/2005/Atom"

# Usage:
# java -jar jing.jar -c atom_policy.rnc example_insurance.atom

# Note: this includes an unmodified atom.rnc grammar from the ietf
# standard. This grammar here redefines the inline content pattern to
# specify that such content should include appropriate attributes from
# the atom spec, e.g. type="application/xml", but the content itself
# should be a tree of elements.

include "policy.rnc"

include "atom.rnc"
{
  atomInlineOtherContent =
     element atom:content {
        atomCommonAttributes,
        attribute type { atomMediaType }?,
        amitPolicy
     }
}

This looks just like the original definition except the actual content has to match the definition of amitPolicy as defined in policy.rnc. (I should probably redefine the attributes to require a specific type of "application/xml" or something, but this is close enough for now.)
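
The stricter attribute from that parenthetical could be sketched as follows. This is an untested variation, assuming the same atom.rnc and policy.rnc; the only change is that type becomes a required attribute whose value must be the literal string application/xml.

```rnc
namespace atom = "http://www.w3.org/2005/Atom"

include "policy.rnc"

include "atom.rnc"
{
  atomInlineOtherContent =
     element atom:content {
        atomCommonAttributes,
        # Assumption: policy content is always served as application/xml.
        # The attribute is now required and must have exactly this value.
        attribute type { "application/xml" },
        amitPolicy
     }
}
```

With this variant, an entry whose content element omits type, or uses a different media type, should be rejected.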

The example entry with policy content validates using this more specific grammar. More goodness.

I met the second objective with just a little bit of cruft. I'd like to use policy.rnc to validate just some xml that starts with amitPolicy and has no atom elements at all.
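
policy.rnc itself is not reproduced here. A hypothetical stand-in with the same shape might look like the sketch below. The element and attribute names are invented for illustration; the only things taken from this post are the pattern name amitPolicy and the fact that the content includes "text".

```rnc
# policy.rnc (hypothetical stand-in; element and attribute names are invented)
# Defines the amitPolicy pattern and nothing else, so that other grammars
# can include it and refer to amitPolicy by name.

amitPolicy =
   element policy {
      attribute id { text },
      element holder { text },
      element coverage {
         attribute kind { text },
         text
      }+
   }
```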

I could not apply policy.rnc directly because I'd have to add a "start" to that grammar, and the validator then chokes when that grammar is used together with atom: there are two starting points, and they are not combined correctly.

Instead I wrote another small schema to define the starting point for a standalone policy...

include "policy.rnc"

# This currently exists to support validating policy xml content
# without being part of an atom entry. There may be another way to
# avoid having this separate little grammar.

start = amitPolicy

There may be another way to do this without defining another schema like this.

About Me

Portland, Oregon, United States
I'm usually writing from my favorite location on the planet, the pacific northwest of the u.s. I write for myself only and unless otherwise specified my posts here should not be taken as representing an official position of my employer. Contact me at my gee mail account, username patrickdlogan.