Tuesday, February 9, 2010

RDFa Change Sets

With so many sophisticated applications on the Web, the key/value model of the HTML form seems overly simplistic. The browser is increasingly used to manipulate complex resources, and an increasingly popular technique for encoding sophisticated data in HTML is RDFa.

RDFa defines a method of encoding data within the DOM of an HTML page using attributes. This allows complex data resources to be connected to the visual aspects that are used to represent them. RDFa provides a standard way to convert an HTML DOM structure into RDF data for further processing.
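To make the attribute-to-data mapping concrete, here is a minimal sketch in Python (the project described below used rdfquery in JavaScript). It is not a conforming RDFa processor. It handles only a tiny, assumed subset: @about sets the subject, @property names the predicate, and the literal comes from @content or the element's text. CURIE prefixes such as dc: are left unexpanded. ToyRDFaExtractor and extract are hypothetical names.

```python
from html.parser import HTMLParser


class ToyRDFaExtractor(HTMLParser):
    """Extracts (subject, property, literal) triples from a tiny RDFa subset.

    Assumption: only @about, @property and @content are handled; @rel, @rev,
    @typeof and @resource are ignored, CURIEs are not expanded, and every
    element in the input is explicitly closed (XHTML style).
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        # One entry per open element: [subject, pending property, text chunks]
        self.stack = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        inherited = self.stack[-1][0] if self.stack else None
        subject = a.get("about", inherited)  # @about overrides the inherited subject
        prop = a.get("property")
        if prop is not None and "content" in a:
            # @content supplies the literal directly, so no text is collected
            self.triples.append((subject, prop, a["content"]))
            prop = None
        self.stack.append([subject, prop, []])

    def handle_endtag(self, tag):
        if not self.stack:
            return
        subject, prop, chunks = self.stack.pop()
        text = "".join(chunks)
        if prop is not None:
            self.triples.append((subject, prop, text.strip()))
        if self.stack:
            # A child's text also counts toward its parent's text content
            self.stack[-1][2].append(text)

    def handle_data(self, data):
        if self.stack:
            self.stack[-1][2].append(data)


def extract(html):
    parser = ToyRDFaExtractor()
    parser.feed(html)
    parser.close()
    return parser.triples
```

For example, extract('<div about="http://example.org/book"><h1 property="dc:title">RDFa <em>Primer</em></h1></div>') yields the single triple ("http://example.org/book", "dc:title", "RDFa Primer"). The visible heading and the data are the same DOM node, which is the connection the paragraph above describes.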

Instead of encoding your data in a key/value form, encode your data in RDFa and use DHTML and AJAX to manipulate the DOM structure and in turn manipulate the data. The conversion from HTML to data can be done on the server or client using existing libraries.

There are a few ways that RDFa can help with communication to the server. The simplest would be to send the entire HTML DOM back to the server for RDFa parsing. However, an HTML page may carry a lot of markup that has nothing to do with the data, so this is not appropriate as a general solution. Instead, an RDFa parser can run on the client and send only the resulting RDF data to the server. This reduces network traffic and moves some of the processing to the client.
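A minimal Python sketch of the "send only the data" option: serialize the extracted triples as N-Triples and wrap them in a POST request. The IRI marker class, to_ntriples, build_request, the endpoint URL, and the choice of the application/n-triples media type are all illustrative assumptions, not part of the original project. The request is only built here, never sent.

```python
import urllib.request


class IRI(str):
    """Marks a term as an IRI rather than a plain literal (illustrative helper)."""


def _term(t):
    if isinstance(t, IRI):
        return f"<{t}>"
    # Escape characters that N-Triples does not allow unescaped inside a literal
    escaped = (t.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject IRI, predicate IRI, object) triples, one per line."""
    return "".join(f"<{s}> <{p}> {_term(o)} .\n" for s, p, o in sorted(triples))


def build_request(url, triples):
    """Build (but do not send) a POST carrying only the RDF data, not the page."""
    body = to_ntriples(triples).encode("utf-8")
    # Assumption: the server accepts N-Triples under this media type
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/n-triples"})
```

The payload grows with the number of triples rather than with the size of the page's markup, which is the point of parsing on the client.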

In a recent project, we went further and used rdfquery on the client to parse before-and-after snapshots of the page and prepare a change-set for submission to the server. In JavaScript, the client prepared one RDF graph of removed relationships and properties and another RDF graph of added relationships and properties; together, these two graphs represent a change-set. Using change-sets throughout the stack made enforcing authorization rules and tracking provenance much more straightforward. Change-sets also gave us more control over the transaction isolation level by making it possible to merge non-conflicting change-sets. Creating change-sets at the source (on the client) eliminated the need to load and compare all properties on the server, making the process more efficient and less fragile.
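The change-set idea can be sketched with plain set operations, here in Python rather than the JavaScript/rdfquery code the project used. Graphs are modeled as sets of (subject, predicate, object) string tuples. ChangeSet, diff, apply_changes, conflicts and merge are hypothetical names, and the conflict rule (one change-set adds a triple the other removes) is an assumption made for illustration.

```python
from typing import FrozenSet, NamedTuple, Set, Tuple

# A triple is (subject, predicate, object); plain strings keep the sketch small
Triple = Tuple[str, str, str]


class ChangeSet(NamedTuple):
    removed: FrozenSet[Triple]  # graph of removed relationships and properties
    added: FrozenSet[Triple]    # graph of added relationships and properties


def diff(before: Set[Triple], after: Set[Triple]) -> ChangeSet:
    """Build a change-set from before-and-after snapshots of the same page."""
    return ChangeSet(frozenset(before - after), frozenset(after - before))


def apply_changes(graph: Set[Triple], cs: ChangeSet) -> Set[Triple]:
    """Apply a change-set to a stored graph: drop removals, then add additions."""
    return (set(graph) - cs.removed) | cs.added


def conflicts(a: ChangeSet, b: ChangeSet) -> bool:
    """Assumed rule: two change-sets conflict if one adds a triple the other removes."""
    return bool(a.added & b.removed) or bool(a.removed & b.added)


def merge(a: ChangeSet, b: ChangeSet) -> ChangeSet:
    """Combine two non-conflicting change-sets into a single change-set."""
    if conflicts(a, b):
        raise ValueError("change-sets conflict and cannot be merged")
    return ChangeSet(a.removed | b.removed, a.added | b.added)
```

Because a non-conflicting pair never adds what the other removes, applying the merged change-set gives the same graph as applying the two in either order. That property is what makes merging a useful alternative to strict serialization. The server only needs the two small graphs, not a fresh load of every property to compare against.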

Parsing RDFa on the client and submitting change-sets can streamline data processing and manipulation and avoid much of the boilerplate code associated with mapping data from one format to another.


11 comments:

  1. Awesome stuff.

    I've been thinking about this sort of thing for synchronization of large graphs between servers. I want to tweak the HTTP Object Server so that a client can do something like:

    GET http://foo.com/foo.rdf
    Accept: application/rdf-diff;...
    If-Modified-Since: 2010-02-15:00:00:00

    And receive a document with only the changes since the time specified.

  2. I have been thinking about the same issue myself: how to propagate changes. I have been wanting to use SPARQL Update 1.1 as the changeset format.

    I am still pondering what the protocol would look like, but I was thinking of a multi-step solution. The first request would list the changesets since a particular time or version, and each changeset could then be requested individually before being applied to the target store.

    What do you think?

  3. This comment has been removed by the author.

  4. How about using the Talis ChangeSet protocol?

    http://n2.talis.com/wiki/Changeset_Protocol

    Or if you are just looking to maintain multiple copies of the same data then something like RDFSync?

    http://semedia.deit.univpm.it/papers/RDFSyncISWC2007.pdf

  5. Thanks for the link. Once SPARQL 1.1 is out, I'd like to use it to describe all triple operations (like a changeset). Of course, the problem is that it lacks metadata about the reason for a change and its original date, but that could always be embedded in the INSERT (perhaps using this CS schema).

    The use of URIs is not clear to me. Is the URI pattern fixed? Can multiple RDF stores exist on the same authority?

  6. Interesting. Would you have a different changeset for every transaction that has been done on the store?

    For my use case I'm less concerned with the individual changesets; I'm more interested in the server generating a single changeset with all the changes since a particular timestamp/version.

  7. My use case is for store synchronization. Whether we use a diff or a list of changesets depends on the frequency of store synchronization. If the stores synchronize regularly (say within an hour), a list of changesets is best because it allows the changesets to propagate among a cluster of stores in a non-linear way.

    What would you use the diff of changes for?

  8. Could you please point me to some useful links about how to implement a Java RDFa parser? I want to create a scraper that can extract information from RDFa statements. This scraper must be generic.
    In addition, what is the relationship between the ontology and RDFa? All the information relevant to the ontology must be extracted from the site. (An ontology is made available for the ... domain.)
    Please reply as soon as possible.

  9. In the above post, I used the rdfquery parser. There are a few parsers that use XSLT (many of which can be run from Java). A popular XSLT stylesheet for RDFa is by Fabien Gandon. Jena's GRDDL parser can parse RDFa, but I haven't used it myself.

    In a project related to the above post, we implemented our own RDFa parser in pure Java (we needed more than just triples). In doing so, I found the RDFa specification to be light on details, and the test suite contained at least one contradiction. I would caution others to make sure they understand what they are getting into before implementing their own parsers. This pure Java RDFa parser will be released under an open source license next month. Watch this blog for details.

    RDFa must use an RDF vocabulary in the typeof, property, rel, and rev attributes. An ontology defines the meaning of the terms in an RDF vocabulary.

  10. Hi,
    Could you please give some examples of how to create individuals for ontology classes (in Java)? The ontology classes are generic, not specific to particular sites.

    NB: We are having trouble getting the individual name and individual class needed to create the individual with createIndividual() for any website, not just a specific site.

    Best regards,
    Andi

  11. Not sure what you are asking, Andi. Try expanding your question with more examples on the Sesame mailing list.

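The thread discusses two ways to propagate changes between stores: a single diff of everything since a timestamp (comments 1 and 6) and SPARQL Update as the changeset format (comments 2 and 5). The Python sketch below illustrates both under stated assumptions. A log of timestamped change-sets is folded into one equivalent change-set, which is then rendered as SPARQL 1.1 Update DELETE DATA / INSERT DATA operations. ChangeSet, compose, changes_since and to_sparql_update are hypothetical names, and objects are restricted to IRIs to keep the rendering short.

```python
from datetime import datetime
from typing import FrozenSet, List, NamedTuple, Tuple

# Assumption: every term, including the object, is an IRI (no literals here)
Triple = Tuple[str, str, str]


class ChangeSet(NamedTuple):
    removed: FrozenSet[Triple]
    added: FrozenSet[Triple]


def compose(first: ChangeSet, then: ChangeSet) -> ChangeSet:
    """One change-set equivalent to applying `first` and then `then`."""
    added = (first.added - then.removed) | then.added
    removed = (first.removed | then.removed) - added
    return ChangeSet(frozenset(removed), frozenset(added))


def changes_since(log: List[Tuple[datetime, ChangeSet]], since: datetime) -> ChangeSet:
    """Fold every change-set recorded strictly after `since` into a single diff."""
    result = ChangeSet(frozenset(), frozenset())
    for stamp, cs in sorted(log, key=lambda entry: entry[0]):
        if stamp > since:
            result = compose(result, cs)
    return result


def to_sparql_update(cs: ChangeSet) -> str:
    """Render a change-set as SPARQL 1.1 Update operations separated by ';'."""
    def block(triples):
        return "\n".join(f"  <{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))

    ops = []
    if cs.removed:
        ops.append("DELETE DATA {\n" + block(cs.removed) + "\n}")
    if cs.added:
        ops.append("INSERT DATA {\n" + block(cs.added) + "\n}")
    return " ;\n".join(ops)
```

The design choice raised in comment 7 shows up here. Returning the log entries individually lets changesets propagate among a cluster of stores in any order. Folding them with changes_since trades that flexibility for a single, smaller diff that cancels out intermediate edits (a value added and later removed never appears).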