Sunday, November 28, 2010

Status Code 200 vs 303

The public LOD mailing list has been dominated by discussions about using a 303 response to a GET request to distinguish between the identifier of the requested resource and the identifier of a document that describes it.

Some resources can be represented completely on the Web. For these resources, any of their URLs can be used to identify them. This blog page, for example, can be identified by the URL in a browser's address bar. However, some resources cannot be completely represented on the Web; they can only be described there.

The W3C recommends responding with a 200 status code for GET requests of a URL that identifies a resource which can be completely represented on the Web (an information resource). They also recommend responding with a 303 for GET requests of a URL that identifies a resource that cannot be completely represented on the Web.

Popular Web servers today don't have much support for resources that can't be represented on the Web. This creates a problem for deploying (non-document) resource servers, as it can be very difficult to set up resources for 303 responses. The public LOD mailing list has been discussing an alternative: using the more common 200 response for any resource.

The problem with always responding to a GET request with a 200 is the risk of using the same URL to identify both a resource and a document describing it. This breaks a fundamental Web constraint that says URIs identify a single resource, and causes URI collisions.

It is impossible to be completely free of all ambiguity when it comes to URI allocation. However, any ambiguity can impose a cost in communication due to the effort required to resolve it. Therefore, within reason, we should strive to avoid it. This is particularly true for Web recommendation standards.

URI collision is perhaps the most common ambiguity in URI allocation. Consider a URL that refers to the movie The Sting and also identifies a description document about the movie. This collision creates confusion about what the URL identifies. If one wanted to talk about the creator of the resource identified by the URL, it would be unclear whether this meant "the creator of the movie" or "the editor of the description." Such ambiguity can be avoided by answering a GET of the movie URL with a 303 that redirects to the description URL, which then responds with a 200.
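As a rough sketch (the host and paths here are made up for illustration), the exchange for the movie URL might look like this:

GET /id/the-sting HTTP/1.1
Host: example.org

HTTP/1.1 303 See Other
Location: http://example.org/doc/the-sting

The client then follows the Location header, and the description document responds normally:

GET /doc/the-sting HTTP/1.1
Host: example.org

HTTP/1.1 200 OK
Content-Type: text/html

The URL that ends up in the address bar is the description's URL, not the movie's, so bookmarks and links keep pointing at the document.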

As Tim Berners-Lee points out in an email, even including a Content-Location in a 200 response (to indicate a description of the requested resource) "leaves the web not working", because such techniques are already used to associate different representations (and different URLs) to the same resource, and not the other way around.

Using a 200-series status code for representations that merely describe a resource (and don't completely represent it) causes ambiguity, because Web browsers today interpret every 200-series response to a GET request as containing a complete representation of the resource identified by the request URL.

Every day, people bookmark and send links of documents they are viewing in a Web browser. It is essential that any document viewed in a Web browser has a URL identifier in the browser's address bar. Web browsers today don't look at the Content-Location header to get the document URL (nor should they). For Linked Data to work with today's Web, it must keep requests for resources separate from requests for description documents.

The community has voiced legitimate concerns about the complexity of URI allocation and of deploying 303s with today's software. The LOD community has jumped in with a few alternatives; however, we must consider how the Web works today and be realistic about what we can expect of Web clients. The established 303 technique works with today's Web browsers. A 303 redirect may be complicated to set up in a document server, but let's give Linked Data servers a chance to mature.
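For what it's worth, the redirect itself need not be exotic. In Apache, for example, a mod_rewrite rule along these lines (the paths are hypothetical) answers resource URLs with a 303 pointing at their description documents:

RewriteEngine On
RewriteRule ^/id/(.*)$ http://example.org/doc/$1 [R=303,L]

The harder part in practice is doing this systematically for every non-document resource a server exposes, which is exactly where Linked Data servers still need to mature.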

Monday, September 13, 2010

HTML-Oriented Development

The heart of every Web application is its user interface (UI) design; this is what users interact with. As any consultant knows, clients are more satisfied with a well-designed UI and mediocre business logic than with a poorly designed UI offering minimal transparency, even if the business rules are fully automated.

What is surprising (when you think about it) is that most Web application frameworks are oriented around the business model and treat the HTML like a second-class citizen. The conceptual model may be important, but even more important is the representation of that model in HTML. Good UIs give the user full transparency into the state and operations of the underlying model. It doesn't matter how good the model is; if the HTML is too confusing or too obscure, users will avoid it.

The HTML of a Web application is surprisingly rich with domain concepts. Most well-designed UIs contain all the classes, relationships, and attributes found in the underlying model and present them to the user in a language everyone involved can understand. A number of emerging standards, such as RDFa, microformats, and microdata, can help turn this human-readable data in HTML into machine-readable data.

Recently, David Wood and I started the Callimachus project, which takes a different approach to Web application design and development. Callimachus reads the domain model from your HTML templates! With Callimachus there is no need to maintain multiple models: no SQL schema, no query language, no object-relational mapping; it's all embedded in HTML using RDFa.

RDFa allows your HTML to include resource identifiers, their relationships, and their properties using additional attributes such as about, rel, and property. Consider the following HTML snippet. With RDFa the data is readable by humans and machines alike. It says that James Leigh knows David Wood, using the relationship "foaf:knows" and the property "foaf:name".

<div about="james">
<span property="foaf:name">James Leigh</span>
knows
<div rel="foaf:knows" resource="david">
<span property="foaf:name">David Wood</span>
</div>
</div>
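Run through an RDFa parser (and assuming a base URI of http://example.com/ plus the usual FOAF namespace declaration, which the fragment above omits for brevity), this markup yields triples roughly like the following:

<http://example.com/james> foaf:name "James Leigh" .
<http://example.com/james> foaf:knows <http://example.com/david> .
<http://example.com/david> foaf:name "David Wood" .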

Written as a Callimachus HTML template, it might look like the snippet below. This is an embedded query asking who knows "david" and what their names are.

<div about="?who">
<span property="foaf:name" />
knows
<div rel="foaf:knows" resource="david">
<span property="foaf:name" />
</div>
</div>
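Conceptually, the template above corresponds to a SPARQL query along these lines (the full URI for "david" is an assumption here, and the query Callimachus actually generates may differ):

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?who ?who_name ?david_name
WHERE {
  ?who foaf:knows <http://example.com/david> ;
       foaf:name ?who_name .
  <http://example.com/david> foaf:name ?david_name .
}

The results are then poured back into the template, presumably one copy of the outer div per matching ?who.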

Callimachus provides the framework necessary to create HTML templates to query, view, edit, and delete resources. This technique allows Web developers to save time and maintenance costs by applying the DRY principle (Don't Repeat Yourself) to Web application development.

For more information about Callimachus see http://callimachusproject.org or tune into my live Webcast on Wednesday at http://www.wilshireconferences.com/semtech2010/email/email-webcast-091510.html

Thursday, May 27, 2010

The Future of RDF

At the end of June, immediately after SemTech, I'll be attending the W3C RDF Next Step Workshop. This workshop has been set up with the goal of gathering feedback from the Semantic Web community to determine if (and how) RDF should evolve in the future. I'll be presenting two papers with David Wood, which I hope will generate good discussion. (To review the papers or for more information on the workshop, go to NextStepWorkshop.)

The first paper I'm presenting will show a new RESTful RDF Store API supporting named queries and change isolation. (I blogged about this earlier this year.) The proposed API would combine basic CRUD operations over RDF constructs (graphs, services, and queries) and mandate RDF descriptions of services. With the ability to modify an RDF store's state in SPARQL 1.1 comes the challenge of managing store versions, and their differences, over HTTP.

The other paper proposes an alternative handling of rdf:List in SPARQL. The way we currently deal with ordered collections in RDF, whether through tools or in SPARQL, is so difficult that it limits the adoption of RDF. Much of the data retrieval on the Web today, currently dominated by XML, involves ordered collections; RDF must align its representation with the conceptual notion of an ordered collection if it is to have any chance of making inroads into already established networks.

Where do you think RDF needs to go in the future? Does it need to change if it is going to stay viable?

Monday, March 8, 2010

Reinventing RDF Lists

Last month the SW interest group discussed alternatives to containers and collections as part of a discussion around what the next generation of RDF might look like. Below is my opinion on the matter.

RDF's minimalist approach makes it possible to encode most data structures, both simple and complex. The challenge for people coming to RDF from other Web formats is the lack of a basic ordered collection (a concept common in XML). In RDF you are forced into a linked-list structure just to preserve resource order. That linked-list structure, rdf:List, is difficult to work with and highly inefficient within modern RDF stores.

Most RDF formats provide syntactic sugar to make it easier to write rdf:Lists. In Turtle this is done using round brackets (parentheses); in RDF/XML this is done using the rdf:parseType="Collection" attribute. However, because rdf:List is not a fundamental concept in RDF, no RDF store implementation preserves lists as such; instead they store the fundamental triple form, a linked list.
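For example, the Turtle shorthand below (prefix declarations omitted, vocabulary made up) hides a chain of rdf:first/rdf:rest triples:

:doc :authors ( :alice :bob ) .

Once parsed, a store holds only the expanded linked list:

:doc :authors _:l1 .
_:l1 rdf:first :alice ;
     rdf:rest  _:l2 .
_:l2 rdf:first :bob ;
     rdf:rest  rdf:nil .

Querying or updating the second form, especially for order-sensitive operations like appending, is what makes rdf:List so awkward.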

RDF is made of the following fundamental concepts: URI, Literal, and Blank Node. A fundamental list concept should be added to make it easier and more efficient to work with ordered collections. This would not have a significant effect on RDF formats, as their syntax would not change, but would have a significant impact on the mindset of RDF implementers.

With this change, RDF implementers would strive to ensure that lists are implemented efficiently and provide convenient operations on them, just as they do for the other fundamental RDF concepts. The triple (linked-list) form should be kept for compatibility with RDF systems that don't preserve lists, but the goal would be that RDF systems are no longer obligated to provide the triple linked-list form, which has proven to be ineffective.

Making lists a fundamental RDF concept requires no change for RDF libraries to remain compatible with existing standards. Most libraries and systems may already understand the list shorthand, and some may also preserve it.

Monday, March 1, 2010

Improving RDF/XML Interoperability

All the permitted variations in RDF/XML make working with it using XML tools difficult at best. Most of the time, assumptions are made about the structure of an RDF/XML document, based on the behavior of particular RDF/XML implementations. However, no standard spec says what this simplified structure should be. The next generation of RDF specs should correct this by defining a subset of RDF/XML suited to XML tools.

A good place to start is with the document Leo Sauermann has started. I like the design, but feel the rules could be improved, based on my experience.

The design of SimpleRdfXml, as proposed by Leo, is:
1. be compatible with RDF/XML
2. but only a subset
3. restrict to simplicity

The rules I try to follow when serializing RDF/XML for use with XSLT are:

1. No nested elements (references to resources must be done via rdf:resource or rdf:nodeID).
2. No property attributes.
3. All blank nodes identified by rdf:nodeID.
4. Only full URIs, no relative URIs.
5. No typed node elements; always use an rdf:type element.
6. The same rdf:about may be repeated on multiple nodes.
7. Never use rdf:ID.
8. Always use rdf:li when possible.
9. Always use rdf:parseType="Collection" when possible.
10. All rdf:XMLLiterals written as rdf:parseType="Literal".
11. Never use rdf:parseType="Resource".
12. White-space is preserved within literal tags.
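Putting those rules together, a small document describing two people might be serialized like this (the namespaces and URIs are illustrative):

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.com/james">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>James Leigh</foaf:name>
    <foaf:knows rdf:nodeID="david"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="david">
    <foaf:name>David Wood</foaf:name>
  </rdf:Description>
</rdf:RDF>

Every node is a flat rdf:Description, every reference goes through rdf:resource or rdf:nodeID, and every URI is absolute, so an XSLT stylesheet can match on predictable element and attribute names.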

By standardizing on these (or another RDF/XML subset), interoperability between XML and RDF tools becomes possible. This allows existing shops to reuse their current XML skills to work with RDF, easing their transition.

Wednesday, February 17, 2010

Beyond the SPARQL Protocol

The SPARQL Protocol has done a lot to bring different RDF stores together and make interoperability possible. However, the SPARQL Protocol does not encompass all the operations that are typical of an RDF store. Below are some ideas that would extend the protocol enough for it to become a general protocol for RDF store interoperability.

One common complaint is the lack of direct support for graphs. This is partly addressed in the upcoming SPARQL 1.1, which includes support for GET/PUT/POST/DELETE on named graphs. However, it is still missing the ability to manage these graphs. What is still needed is a way to assign a graph name to a set of triples, as well as a vocabulary to search and describe the available graphs. To support graph creation, the service could accept a POST request of triples and respond with the created named graph. The graph metadata could live in a separate service or as part of the existing SPARQL service, made available via SPARQL queries.
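As a sketch of what that could look like (the endpoint layout and response here are invented for illustration):

POST /rdf/graphs HTTP/1.1
Host: example.org
Content-Type: text/turtle

<http://example.org/james> <http://xmlns.com/foaf/0.1/knows> <http://example.org/david> .

HTTP/1.1 201 Created
Location: http://example.org/rdf/graphs/42

The Location header hands back the newly assigned graph name, and a SPARQL query against the service's metadata could then list and describe the graphs that exist.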

The use of POST in SPARQL ensures serializability of client operations. However, it prevents HTTP caching, even for reasonably sized queries, and caching is necessary for Web scalability. This can be rectified by introducing standard named-query support. By giving the client the ability to create and manage server-side queries (with variable bindings), many common operations can become cacheable. These named queries could be described in their own service or as part of the existing SPARQL service. The named-query metadata would include optional variable bindings and cache-control settings. The queries could then be evaluated by an HTTP GET to the URI of the query name, using the configured cache control, enabling Web scalability.
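A hypothetical interaction (the URL layout, parameter passing, and headers are all made up to illustrate the idea): the client PUTs a parameterized query under a name, then GETs it with bindings, and the response carries the configured cache control:

PUT /rdf/queries/friends HTTP/1.1
Host: example.org
Content-Type: application/sparql-query

SELECT ?friend WHERE { $person <http://xmlns.com/foaf/0.1/knows> ?friend }

GET /rdf/queries/friends?person=http%3A%2F%2Fexample.org%2Fjames HTTP/1.1
Host: example.org

HTTP/1.1 200 OK
Cache-Control: max-age=3600
Content-Type: application/sparql-results+xml

Because the evaluation is now an ordinary GET, intermediaries can cache the results just as they would any other Web resource.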

Another requirement for broad RDF store deployments is the ability to isolate changes. Many changes are directly dependent on a particular state of the store and cannot be represented in an update statement. Although SPARQL 1.1 allows update statements to be dependent on a graph pattern, many changes have only an indirect relationship to the store state and cannot be related directly within a WHERE clause.

To accommodate this form of isolation, separate service endpoints are needed to track the observed store state and the triples inserted/deleted. Metadata about the various available endpoints could be discoverable within each service (or through a dedicated service). This metadata could include such information as the parent service (if applicable) and the isolation level used within the endpoint.

To support serializable isolation, each endpoint would need to watch for Content-Location headers, which would indicate the source of the update statement in POST requests. When such an update occurs, the service must validate that the observed store state in the source endpoint is the same as the store state in the target endpoint before proceeding.
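A hypothetical exchange (the endpoint names and paths are invented) might look like this, where the client tells the target endpoint which endpoint's state it observed when composing the update:

POST /rdf/endpoints/7/statements HTTP/1.1
Host: example.org
Content-Type: application/sparql-update
Content-Location: http://example.org/rdf/endpoints/3

INSERT DATA { <http://example.org/james> <http://xmlns.com/foaf/0.1/knows> <http://example.org/david> }

If endpoint 3 and endpoint 7 no longer agree on the observed store state, the service would refuse the request rather than silently applying a stale change.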

By standardizing graph, query, and isolation vocabularies within the SPARQL protocol, RDF stores would be much more appealing to a broader market.


Tuesday, February 9, 2010

RDFa Change Sets

With so many sophisticated applications on the Web, the key/value HTML form seems overly simplistic for today's Web applications. The browser is increasingly being used to manipulate complex resources, and a popular technique for encoding that sophisticated data in HTML is RDFa.

RDFa defines a method of encoding data within the DOM of an HTML page using attributes. This allows complex data resources to be connected to the visual aspects that are used to represent them. RDFa provides a standard way to convert an HTML DOM structure into RDF data for further processing.

Instead of encoding your data in a key/value form, encode your data in RDFa and use DHTML and AJAX to manipulate the DOM structure and in turn manipulate the data. The conversion from HTML to data can be done on the server or client using existing libraries.

There are a few ways that RDFa can help with communication to the server. The simplest would be to send back the entire HTML DOM for RDFa parsing on the server. However, an HTML page might contain an excessive amount of bulk and therefore this would not be appropriate as a general solution. Instead, using an RDFa parser on the client, the resulting RDF data can be sent to the server, ensuring only the data is transmitted back. This would reduce excessive network traffic and move some of the processing to the client.

In a recent project, we went further and used rdfquery to parse before and after snapshots on the client to prepare a change-set for submission back to the server. In JavaScript, the client prepared an RDF graph of removed relationships and properties and an RDF graph of added relationships and properties. These two graphs represent a change-set. By using change-sets throughout the stack, enforcing authorization rules and tracking provenance became much more straightforward. Change-sets also gave more control over the transaction isolation level, by enabling the possibility of merging (non-conflicting) change-sets. Creating change-sets at the source (on the client) eliminated the need to load and compare all properties on the server, making the process more efficient and less fragile.
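To make that concrete, here is a sketch of what such a change-set might contain (the URIs and values are illustrative). If the user edits a name in the page, the before/after RDFa snapshots differ by one property value, so the removed graph and the added graph are each a single triple:

Removed:
<http://example.com/james> foaf:name "Jim Leigh" .

Added:
<http://example.com/james> foaf:name "James Leigh" .

The server only has to verify and apply these two small graphs, rather than re-reading every property of the resource.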

RDFa on the client and submitting change-sets can help streamline data processing and manipulation and avoid much of the boilerplate code associated with mapping data from one format to another.
