Over the past couple of days there have been a number of queries as to the technical reasons why we need a new syndication format and API, rather than using RSS and the metaWeblog and/or Blogger API. Below I've outlined what I see as the primary technical ways that Echo can improve upon the foundation laid by RSS and the current weblog APIs (metaWeblog and Blogger). Where appropriate, I've tried to put these into context to show how they can benefit users and developers.
We've pledged our support for the format. We want to see it succeed for the technical reasons discussed below, and because a truly interoperable format for syndication, archiving, and communication benefits all of the existing tools and any tools that are written in the future.
The post below refers to the format as Echo, and the API as the EchoAPI. Because the name Echo is taken by another project, a new name will be chosen at some point. For now, it's easiest just to refer to it as Echo.
1. The RSS spec does not say how to encode content.
Double-encoding entities? Using content:encoded with CDATA? Using xhtml:body?
Content is the most important part of a syndicated feed. As such, the feed specification should be perfectly clear on how to represent that content. This is probably the toughest part of defining the format and is in the process of being hashed out right now.
Benefits to users and developers: With a clear idea of what content is, and how it is encoded, Echo feeds and APIs can handle more than just textual content. We can combine the functionality that we currently have in metaWeblog.newPost and metaWeblog.newMediaObject, for example, into a "newItem" method that specifies the encoding and MIME type of the content item being created. After all, it's just content either way.
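To make the ambiguity concrete, here is a minimal sketch using Python's standard XML parser. The sample items are invented for illustration, and the final `<content type="...">` element is a hypothetical placeholder, not a decided part of Echo. The point is that entity-escaped markup and CDATA-wrapped markup parse to the same string, and nothing in the item tells a consumer whether that string is HTML or plain text.

```python
import xml.etree.ElementTree as ET

# Two invented RSS items carrying the same HTML, encoded two different ways.
escaped = "<item><description>&lt;b&gt;AT&amp;amp;T&lt;/b&gt; news</description></item>"
cdata = "<item><description><![CDATA[<b>AT&amp;T</b> news]]></description></item>"
# A third item whose author meant the angle brackets as literal plain text.
plain = "<item><description>Use &lt;b&gt; for bold</description></item>"

def description(xml_text):
    """Return the parsed text of the <description> element."""
    return ET.fromstring(xml_text).findtext("description")

print(description(escaped))  # <b>AT&amp;T</b> news
print(description(cdata))    # <b>AT&amp;T</b> news
print(description(plain))    # Use <b> for bold

# Hypothetical: an explicit type attribute removes the guesswork.
declared = ET.fromstring('<content type="text/plain">Use &lt;b&gt; for bold</content>')
print(declared.get("type"), "->", declared.text)
```

After parsing, the first two items are indistinguishable, and the third looks like markup even though it was meant as text. A consumer has to guess unless the format states the content type and encoding explicitly.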
2. XML-RPC is severely lacking in internationalization (I18N) support.
The specification says that all strings are ASCII-encoded, which is an artificial limitation on the type of content that can be passed around via XML-RPC (there is no such limitation in XML itself). Treating content as UTF-8 technically breaks the spec, and since the spec is frozen, there's no way to change this.
Benefits to users: Because any application that treats text as anything other than US-ASCII is breaking the spec, XML-RPC technically supports only posts that fit in ASCII, which in practice means English-language posts. An API that takes internationalization into account will not have this limitation, and will allow posting in any language, using any encoding.
Non-English weblogs are no longer the minority (if they ever were), as the NITLE Weblog Census shows. Out of 536,935 likely weblogs, only 263,577--less than half--are in English. We need an API whose spec can support non-English weblogs (and we need a way to identify the language in the feed).
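As a small illustration of what an ASCII-only string type rules out, the sketch below checks a few invented post titles against ASCII and UTF-8. It models the restriction as described above and does not call any XML-RPC library; the helper name `fits_in_ascii` is made up for this example.

```python
# Invented sample titles in three languages.
post_titles = ["Hello, world", "Grüße aus Köln", "こんにちは"]

def fits_in_ascii(text):
    """Return True if every character in text is representable in US-ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

for title in post_titles:
    # UTF-8 can always round-trip the title; ASCII only sometimes can.
    round_trips = title.encode("utf-8").decode("utf-8") == title
    print(f"{title!r}: ascii={fits_in_ascii(title)}, utf-8 round trip={round_trips}")
```

Only the first title survives an ASCII-only rule, while all three round-trip cleanly through UTF-8.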
3. Content is represented differently in an API than it is in a syndicated feed.
This is another artificial distinction. It's the same content either way--in some cases, it's been marked up or treated by a post-processor on a content management system. But in the end there are only two forms of content: untreated and treated.
An API should leverage the data model used in the syndication format; once a tool supports Echo, adding support for the EchoAPI becomes much easier. Granted, the metaWeblog API's content struct has gone part of the way towards normalizing the representation of content between a feed and an API, but it uses only some of the RSS elements, and the similarities apply only in the data model. We can take this to its logical conclusion by using both the data model and the serialization in the feed and the API.
Benefits to developers: using the same data model and serialization for syndication, archiving, and editing simplifies the development of tools to work with (produce and consume) these formats, for obvious reasons: code written to produce an item in an Echo feed, for example, can also be used for producing data sent in an API request or packaged up for archiving.
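Here is a rough sketch of that reuse. The element names (`feed`, `entry`, `title`, `content`) and the function names are hypothetical placeholders, since the actual format is still being worked out; the idea is only that one serializer produces the item, and both the feed and the API request are built from it.

```python
import xml.etree.ElementTree as ET

def build_entry(title, content, content_type="text/plain"):
    """Serialize one item. Element and attribute names are placeholders."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "title").text = title
    body = ET.SubElement(entry, "content", {"type": content_type})
    body.text = content
    return entry

def build_feed(entries):
    """Wrap already-serialized entries in a feed document."""
    feed = ET.Element("feed")
    feed.extend(entries)
    return ET.tostring(feed, encoding="unicode")

def build_post_request(entry):
    """Body of a hypothetical 'create item' API request: just the entry."""
    return ET.tostring(entry, encoding="unicode")

entry = build_entry("First post", "Hello")
feed_xml = build_feed([entry])
request_xml = build_post_request(entry)

# The API payload is byte-for-byte the same fragment that appears in the feed.
print(request_xml in feed_xml)  # True
```

An archive export could reuse `build_entry` in the same way, so the item-level code is written and tested once.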
4. Confusion over elements.
We need to eliminate the confusion over what elements mean, and which elements should be used. For example, <link> is not clearly defined. Some tools treat it as a permanent link to the content item, and some treat it as a link to the referenced item (for example, commentary on a news story).
We also have problems with namespaced versus native elements. For example, <dc:date> vs <pubDate> and <dc:creator> vs <author>. In both of these cases, the Dublin Core elements are technically superior: dc:date is specified in ISO-8601 format (easier to parse and sort than RFC 822), and dc:creator does not have the restriction that it be a valid email address (which causes spam/privacy concerns). But because they are part of an extension namespace and not native elements, there has been confusion over which is the proper element to use.
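The parsing and sorting difference is easy to see in a short sketch. The timestamps below are invented, and both lists describe the same two moments. Note the assumption that all ISO-8601 values use the same layout and UTC offset; plain string sorting is only chronological under that condition.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

# The same two moments, written in each format (invented sample data).
rfc822_dates = ["Fri, 04 Jul 2003 09:00:00 +0000", "Mon, 30 Jun 2003 22:30:00 +0000"]
iso8601_dates = ["2003-07-04T09:00:00+00:00", "2003-06-30T22:30:00+00:00"]

# RFC 822 strings must be parsed before sorting: as plain text, "Fri" sorts
# ahead of "Mon" even though the Friday is the later date.
print(sorted(rfc822_dates))
print(sorted(rfc822_dates, key=parsedate_to_datetime))

# ISO-8601 strings with a shared offset sort chronologically as plain text.
print(sorted(iso8601_dates))
print(sorted(iso8601_dates, key=datetime.fromisoformat))
```

With RFC 822 the two sorts disagree; with ISO-8601 they agree, so a tool can order items without parsing the dates at all.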
Benefits to users and developers: a well-supported set of core elements simplifies tool development, and could enrich the experience of using aggregators and other tools that consume content. Currently, most (all?) of the fields in an RSS <item> are optional. This forces aggregators to tailor the user experience for the lowest-common-denominator feed, one that has only a link (and a headline, maybe). Elements like date and author are important to the reading experience, and a fresh start at a format gives us the chance to make them required.
5. No universally-supported and -defined extensions.
By their nature, extensions need not all be well-defined--the purpose of putting an extension mechanism in place is for applications to add new functionality without having to modify the core specification.
That said, once we have defined a core set of elements for the new data model, we should define some extensions that can be agreed upon by all tools that support the concepts therein. (In other words, if we have a comment extension, tools that don't support comments would obviously not need to support the comment extension; but any tool that supports comments should support the comment extension, so that it can interoperate with other tools.) Some possible extensions are in discussion on Sam's wiki.
Benefits to users and developers: similar to #4. If all tools supported the same extensions for including comment data, for example, aggregators might be more willing to add support for the extra metadata.
So, those are what I see as the main technical improvements that Echo can bring. The other question that has been asked is, can't we just make these improvements to RSS?
And that's the problem: we really can't. Setting aside any of the political issues--because, for this initiative to be accepted, it needs to be done for technical reasons--the RSS specification is frozen, and even if it were to be changed, changes would need to be backwards-compatible. This is fully acceptable and understandable, but I believe that the time has come to shed backwards-compatibility and make a fresh start. We've all learned a lot about how we're using RSS, and we can apply this knowledge to creating a new format built by the community without any of the baggage--political or technical--that has been built up over the years.