Six Apart News & Events

A Proposal: RSS for Weblogs

On Scripting News, Dave writes:

We could establish a profile of RSS 2.0 and implement strict compliance with that profile in the major blogging tools.

This has been followed by a discussion of the issues on Sam's weblog, much of it over whether the core profile should be based on RSS 1.0 or 2.0, or whether it's really necessary at all.

What we need is a profile of RSS specific to weblogs: "RSS for Weblogs".

RSS 1.0 and 2.0 are designed for extensibility, and can be used to represent non-weblog data. Currently they're really only being used for weblogs/news feeds, and Dave has said in the past that RSS is intended only as a news/syndication format. But the point of making RSS extensible is so that new features can easily be added, and new types of data can be represented.

But the extensibility has come at a price, thus far. There are multiple ways of representing dates (<pubDate>, <dc:date>), subjects/categories, post bodies, etc. And the biggest divisions about RSS have come between people fighting for it to be a generalized RDF format and those who want to make it a simplified weblog/news syndication format. By defining this core subset as "RSS for Weblogs", we can refocus the energy spent on divisive semantic issues (no pun intended) on implementation.
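
To make the cost of the two date elements concrete, here is a minimal Python sketch of what a consumer has to do today just to read an item's date. The function name item_date is made up for illustration, and the sketch assumes <pubDate> carries an RFC 822 date and <dc:date> carries an ISO 8601 (W3CDTF) date; real feeds may be messier.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET

# Dublin Core namespace used by <dc:date>.
DC_NS = "http://purl.org/dc/elements/1.1/"


def item_date(item):
    """Return an item's date as a datetime, or None if no date element is found.

    Hypothetical helper: tries <pubDate> (RFC 822) first, then <dc:date>
    (ISO 8601). It makes no attempt to handle malformed dates.
    """
    pub = item.findtext("pubDate")
    if pub and pub.strip():
        return parsedate_to_datetime(pub.strip())

    dc = item.findtext(f"{{{DC_NS}}}date")
    if dc and dc.strip():
        # fromisoformat() in older Python versions rejects a trailing "Z".
        return datetime.fromisoformat(dc.strip().replace("Z", "+00:00"))

    return None


with_pubdate = ET.fromstring(
    "<item><pubDate>Sat, 10 May 2003 18:01:00 GMT</pubDate></item>"
)
with_dcdate = ET.fromstring(
    '<item xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:date>2003-05-10T18:01:00Z</dc:date></item>"
)

# Two spellings of the same instant, two code paths to read it.
same_instant = item_date(with_pubdate) == item_date(with_dcdate)
```

A profile that names one date element would let every consumer drop one of these branches.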

So establishing a profile is the right idea. Despite the hundreds of RSS modules, the lowest common denominator today is truly only <link> and <title>. We need to raise the bar; we can do a lot better than that.

But as soon as we start discussing a core profile for RSS, we need to define the context in which that profile applies. If we're going to talk about a set of namespaces and elements that define an RSS profile, we need to narrow down the scope of discussion--if we don't, we'll end up with the same lowest common denominator that we have now, and that's useful for no one.

So we define this "RSS for Weblogs" (other names welcome): a profile of RSS that weblog tools--tools that both produce and consume weblogs--must support. We need to choose a standard set of elements and modules and define what they mean for weblogs--it doesn't matter what they mean when they are used in other contexts. For example, relative URLs in RSS are currently broken, because the <link> tag is not necessarily the base URL. Well, it may not be in general RSS, but it is in a weblog context. If we set it as a standard, a weblog tool that doesn't support it is broken.
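
As a rough illustration of the relative-URL rule, the following Python sketch resolves a relative URL against the channel's <link>, on the assumption (which the profile would have to state explicitly) that the channel <link> is the base URL in a weblog feed. The function name resolve_in_channel and the example.com URLs are invented for the example.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


def resolve_in_channel(channel, url):
    """Resolve a possibly relative URL against the channel's <link>.

    Assumes the weblog-profile rule sketched above: the channel <link>
    is the base URL. Absolute URLs pass through unchanged.
    """
    base = (channel.findtext("link") or "").strip()
    return urljoin(base, url)


channel = ET.fromstring(
    "<channel>"
    "<title>Example Weblog</title>"
    "<link>http://example.com/blog/</link>"
    "</channel>"
)

relative = resolve_in_channel(channel, "archives/000123.html")
absolute = resolve_in_channel(channel, "http://other.example/page.html")
```

Note that urljoin treats the last path segment of the base as a file name unless the base ends in a slash, so a channel <link> of http://example.com/blog (no trailing slash) would resolve the same relative URL under http://example.com/ instead. A profile would need to say how such links are to be read.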

Here's what we see as the basic elements of RSS for Weblogs:

Weblog Data
Information at the <channel> level: the basics (link, title, description), plus language, creator, date, syndication instructions, and copyright.
Weblog Posts
A no-brainer, obviously, but defining it as a weblog post and not just a general "item" allows us to make the following part of the profile:
  • Comments
  • Categories/Distributed Taxonomies
  • TrackBack
In addition, we need the basics (title, link, description), a GUID, date, creator, and an agreed-upon method of representing the full post content: <content:encoded> using CDATA-encoding.
Identity
We need a way to represent federated identity. Obviously, this is a can of worms, but we could standardize on FOAF being used to represent identity, and embed <foaf:Person> into RSS.
GUIDs
We have them now, but we should make them a requirement.
Encoding/Internationalization
Tools that produce and consume RSS must be more I18N-aware--currently, weblogging tools (Movable Type included) don't handle character encoding very well. We'll be getting this right in TypePad and Movable Type Pro.
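
To give a feel for what checking a post against such a profile could look like, here is a small Python sketch. The element choices in REQUIRED_ITEM are purely an assumption made for the example (which namespaces and elements to use is exactly what still needs discussion), and missing_fields is a made-up helper name.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
CONTENT = "http://purl.org/rss/1.0/modules/content/"

# Hypothetical mapping from profile field to acceptable elements.
# The real profile would pick exactly one element per field.
REQUIRED_ITEM = {
    "title": ["title"],
    "link": ["link"],
    "description": ["description"],
    "guid": ["guid"],
    "date": ["pubDate", f"{{{DC}}}date"],
    "creator": [f"{{{DC}}}creator", "author"],
    "full content": [f"{{{CONTENT}}}encoded"],
}


def missing_fields(item):
    """Return the profile fields for which the item has no non-empty element."""
    missing = []
    for field, tags in REQUIRED_ITEM.items():
        if not any((item.findtext(tag) or "").strip() for tag in tags):
            missing.append(field)
    return missing


SAMPLE = """<rss version="2.0"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>Example Weblog</title>
<link>http://example.com/blog/</link>
<description>An example feed</description>
<item>
  <title>First post</title>
  <link>http://example.com/blog/archives/000001.html</link>
  <description>Short excerpt</description>
  <guid>http://example.com/blog/archives/000001.html</guid>
  <dc:date>2003-05-10T18:01:00Z</dc:date>
  <dc:creator>Example Author</dc:creator>
  <content:encoded><![CDATA[<p>Full <em>post</em> body.</p>]]></content:encoded>
</item>
<item>
  <title>Headline only</title>
  <link>http://example.com/blog/archives/000002.html</link>
</item>
</channel></rss>"""

items = list(ET.fromstring(SAMPLE).iter("item"))
full_post_gaps = missing_fields(items[0])
headline_only_gaps = missing_fields(items[1])
```

The second item is the kind of title-and-link entry that passes for valid today; under a weblog profile a tool could flag it rather than silently accept it.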

Now, obviously, this isn't the whole answer. Each of these elements requires discussion over which namespaces/elements to use, etc. But by defining this profile as applying to a weblog context only, we can actually define a standard that goes beyond links and headlines.

21 Comments
May 10, 2003 6:01 PM

Whoa, I hope I never said that RSS 2.0 was only for weblogs. I don't believe it.

Look at the NY Times feeds. It's not a weblog (obviously), and it makes use of several important 2.0-only features.

Here's an example.

May 10, 2003 6:06 PM

Otherwise, based on a quick read, this looks right-on. I esp like that you agree that guid is necessary. It's an immediate upgrade for aggregators (perhaps subject to user preference). Getting rid of gratuitous differences is obviously a good thing, making aggregators simpler, and helping retire a lot of the needless controversy in RSS space. I hope after this process is complete, that people no longer think of RSS as an embattled space, that it can be seen as something settled and the no-brainer it should be. I'll have more comments in the morning. Thanks, Ben, for participating. Only good will come from this.

May 10, 2003 6:08 PM

I absolutely agree that the profile should be defined as RSS For Weblogs, and not RSS as a whole. RSS is being used for many things right now, despite original intentions, such as search engine queries, learning objects, weather information, GeoURL's near web pages, etc, etc.

Actually, I think I agree with everything else in this post.

dc:subject can't hold all the data that ENT can. Subject support is important, imho.

Ben said:
May 10, 2003 6:22 PM

Dave--You're right, it was poorly-worded. I've fixed it above.

May 10, 2003 6:50 PM

Another advantage of a weblog-specific RSS format will be to search engine crawlers, since it will serve as a good positive ID for weblogs. As Dave points out, many non-blog sites use RSS feeds and it can be a headache to disambiguate between blogs and other kinds of sites programmatically.

I would suggest that the "language" attribute be capable of holding multiple values. I've noticed that there is a significant minority of bilingual bloggers, and we don't want to do anything to discourage them! A great example is Emmanuelle Ricard, who writes her posts in both French and English. Bloggers like her should be able to indicate that they blog in multiple languages in their RSS feed.

Mean Dean said:
May 10, 2003 9:34 PM

Sure, why not? Isn't the entire XML thing supposed to be about transformation? Isn't that to some degree what's going on with BlogWorks XML?

It would sure make aggregation a snap. No need to dive into autodiscovery.

The question becomes, can those who aren't techbloggers make the leap, both in faith and technology? Perhaps if TypePad renders content in such a format, it would prove it doable at the lowest common denominator and thereby drive the RSS bus for the rest of us.

Then again, I've been wrong before.

Mean Dean said:
May 10, 2003 9:36 PM

Oh, in case I lost you, what I'm driving at is why not just render our weblogs as RSS and instead build client apps to present them as we prefer?

Kevin Fox said:
May 10, 2003 9:38 PM

I completely agree Ben. Personally though, I think the real changes will come when RSS aggregators are incorporated into the browser, or will otherwise share cookies with them.

As dynamic as we pretend RSS feeds are, for many legitimate purposes they could still do with more personalization, without having to resort to cut-n-pasting custom parameterized URLs into your aggregator.

Bo said:
May 10, 2003 10:01 PM

This is a bad idea on several levels.

First, if you really want to drive the adoption of RSS then you should make it as simple as possible. The purpose of an RSS profile should be to define an extremely simple, strict subset that ALL RSS feeds must implement. Based on this profile it should be possible to write an aggregator in a day. This profile shouldn't even need a specification--developers should be able to pick it up at a glance. This profile should define the bare minimum of what it means to syndicate data, and when developers think 'Hey, I wanna syndicate this data' then their next thought should be to whip out an RSS feed. When you begin introducing things like RDF you kill this dream.

Second, things like FOAF reek of politicking. These extensions to RSS shouldn't be imposed from above; they should be decided by the marketplace. The simple truth is there are a lot of limits to what FOAF can do, it's definitely not the only identity game in town (Liberty, Passport, PingId) and an argument can be made that what it does it doesn't do well, and so dictating the adoption of FOAF seems particularly unfair and simply seems like a way to stifle competition. That being said, it's clear that if the Big Three (Movable Type, Blogger, and Radio) were to adopt FOAF then it'd pretty much become a de-facto standard.

Lastly, i18n may seem like a good idea but again let me reiterate the point that it's very wrongheaded to assume that RSS only applies to weblogs and will only be consumed by desktop news aggregators. So let RSS be i18nized--and it definitely should be and in fact it'd be pretty trivial to do it with the xml:lang attribute--but don't complicate the format. Producing and consuming RSS needs to be as absolutely easy as possible.
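
For illustration, here is roughly what reading xml:lang looks like on the consuming side, as a minimal Python sketch. The function name item_languages is invented, and the fallback order (item xml:lang, then channel xml:lang, then the channel's <language> element) is an assumption for the example rather than anything a spec mandates; the sketch also ignores xml:lang set on elements inside an item.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def item_languages(rss_text, default=None):
    """Return (title, language) pairs for every item in an RSS 2.0 feed.

    Hypothetical fallback order: the item's own xml:lang, then the
    channel's xml:lang, then the channel's <language>, then `default`.
    """
    channel = ET.fromstring(rss_text).find("channel")
    channel_lang = channel.get(XML_LANG) or channel.findtext("language") or default
    return [
        (item.findtext("title"), item.get(XML_LANG, channel_lang))
        for item in channel.iter("item")
    ]


FEED = """<rss version="2.0"><channel>
<title>Bilingual weblog</title>
<link>http://example.com/</link>
<language>en</language>
<item><title>Hello</title></item>
<item xml:lang="fr"><title>Bonjour</title></item>
</channel></rss>"""

languages = item_languages(FEED)
```

With the sample feed, the first item falls back to the channel language and the second is marked French by its own attribute, which is one way a weblog that mixes languages per post could be represented.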

Ben said:
May 10, 2003 10:11 PM

Bo:

1) Anything that we come up with will be simpler than what we have now, what with 2-3 different formats, 2 different ways of representing dates, 3 (4?) different ways of representing full weblog posts, etc. The point of this is to standardize on a subset of what we are currently using, but to do so in a way that we can actually raise the lowest common denominator of what is expected in an RSS feed. The simplest possible spec would be just headline and link, but I don't think anyone wants to stick with that.

2) Yes, FOAF was just an example. But we have to pick *something*--the TMTOWTDI approach may work well for Perl modules, but when defining a format, it just creates a complete mess.

3) Re: "it's very wrongheaded to assume that RSS only applies to weblogs" -- of course RSS is not just for weblogs, but the point of the proposal above is that we define a profile (tentatively called "RSS for Weblogs") that *does* just apply to weblogs, because that's the only way we'll ever get beyond headline and link.

As for I18N, I'm talking about using the correct character encoding, making it possible to write and syndicate a multilingual weblog, etc. Sure, producing and consuming RSS needs to be simple, but it also needs to be *correct*, and not just for English speakers.

Mean Dean said:
May 10, 2003 10:26 PM

Look, I'd be all for strict compliance. But think about it, we can't even get users to upgrade their free browsers. Nor can we get web sites to give up deprecated HTML. Now we're going to get everyone to sit up straight and play right with RSS?

Which is why I suggest rendering blogs as RSS by default, providing HTML as a secondary interface for those still nursing a browser dependency. It would be a lot easier then to say "this is the RSS for blogging, and this is how you're going to render it if you want to play."

Bo said:
May 10, 2003 10:27 PM

Ben:

I agree, I think a weblog profile makes sense. But I'd request that you first consider a more fundamental profile that would take into consideration scenarios like displaying RSS on extremely limited devices (mobiles, pagers) and the existence of feeds which would both be consumed and produced by machines--humans play no part of the process. When we understand this core profile, then we can create a weblog profile which would be a strict superset. (This core profile doesn't have to be more than item, pubDate, plaintext description, author and guid with the isPermaLink attribute).
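
As a small illustration of the guid part of such a core profile, this Python sketch picks an item's permalink. It relies on the RSS 2.0 rule that isPermaLink is treated as true when the attribute is absent; the helper name permalink and the fallback to <link> are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def permalink(item):
    """Return the item's permalink, or None if it has none.

    Uses <guid> unless isPermaLink="false" (RSS 2.0 treats a missing
    attribute as true), and otherwise falls back to <link>, which is a
    choice made for this sketch.
    """
    guid = item.find("guid")
    if guid is not None and guid.text and guid.text.strip():
        if guid.get("isPermaLink", "true").strip().lower() != "false":
            return guid.text.strip()
    link = item.findtext("link")
    return link.strip() if link else None


url_guid = ET.fromstring(
    "<item><guid>http://example.com/archives/000001.html</guid></item>"
)
opaque_guid = ET.fromstring(
    '<item><guid isPermaLink="false">post-1@example.com</guid>'
    "<link>http://example.com/archives/000001.html</link></item>"
)

from_guid = permalink(url_guid)
from_link = permalink(opaque_guid)
```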

As for FOAF, I don't think you really have to pick something. People will use what they want to use. I don't think FOAF is a particularly bad technology; I just don't think it should be standardized in any sort of profile. That being said, if/when enough people use it, it'll be a de-facto standard--still I feel it's poor form to require producers to use such a technology. Identity is one of those things that's still so up in the air it's simply too early to be standardizing on it. This is why I say the market should decide. Until then, we can do with the author's email address.

As for i18n, I'm all for it, I'm just worried that it'd be too invasive. i18n is hard which is why you often find it implemented so poorly or even not at all. Again, you'd have to be careful not to make people jump through hoops to work with RSS or even to require tools to produce it correctly.

May 10, 2003 10:28 PM

Hey Ben - this sounds great.

I agree that this is the simplest model, while providing a clear "Weblog RSS". And OF COURSE there'll be other uses for RSS, but let's at least get the weblog stuff right.

FOAF is fine - for now - to at least signify what the blogger's name, info is. FOAF is not Passport, Liberty or PingID (which is an implementation of Liberty.) All FOAF is - is a way to say "I am me". Nothing more. The Liberty Alliance's standard will enable that "person" to control what aspects of themselves - gets shared. Passport is something different as well. Including FOAF is a no-brainer - thank you.

May 10, 2003 10:53 PM

Just sending words of encouragement. If a profile can act as a safeguard against anti-competitive dynamics, it's very worthwhile.

May 11, 2003 2:41 AM

Certainly 'RSS for WebLogs' could ringfence a profile that producers/consumers could say they support (or not) without interfering with any other RSS applications.

An assumption here (by Bo especially) is that using the RDF model somehow has to make things more complex. Nothing could be farther from the truth. The RSS 2.0 syntax could be used for a common profile, but if it's defined with the entities and relationships (item, title etc) mapped to the RDF model then it gains an unambiguous definition from RDF and the simplicity from RSS 2.0.

There is no problem of using FOAF as an extension, as long as any use within the syntax was unambiguously mapped to the RDF model.
I've written up a systematic approach to defining the mappings here : http://purl.org/stuff/ssr

I personally think that done as in RSS 2.0 is a very bad idea. If the item is unique (on the web), then it should have a URI. This is already well-defined as the value of the rdf:about attribute. If there is a requirement to track small modifications to a post, then something like including a previousVersion attribute would IMHO be more appropriate.

May 11, 2003 2:42 AM

That last paragraph should read "I personally think that GUIDs done..." - my markup got eaten.

May 11, 2003 3:24 AM
"Sure, producing and consuming RSS needs to be simple, but it also needs to be *correct*, and not just for English speakers."

Ben, I'd like to express my unconditional appreciation for your position. I have been suffering from incorrect implementations for foreign languages for many years. I can't tell you how many days I have spent fixing problems that came from this. I know this is a »pain in the ***« for developers. But while it is one for users too, it's only the developers that could and *should* do something about it.

May 11, 2003 10:01 AM

Ben: You might consider writing up a short summation to go with this, because in reading some of the Trackbacks, it's clear that there are quite a few folks out there who think you're agitating for a reworking of RSS, rather than an industry-specific definition of what our blog tools are expected to produce.

Given all the rancor over RSS, I suppose I can understand the confusion. Maybe further clarification would keep everyone focused on the simple thing you'd like to achieve.

jazer said:
May 11, 2003 12:43 PM

Seeing as most blogs are generated by one of a few tools (I won't attempt to list them, cuz I'd miss one or three), it shouldn't be too much trouble to get even beginning bloggers to adopt an RSS standard for blogs (whatever it is).

May 13, 2003 9:33 AM

I hope I'm not too late to join this party...

I'm all in favour of a standard RSS profile. Following up on what Michael Fagan and Marc Canter have been saying, I'm also interested in how taxonomy and topics fit into the picture.

Obviously we would be happy to see ENT become a part of the profile for handling this part of the aggregation process.

For our k-collector application we needed a way to transport topic information via RSS. We documented our approach and published it as an open standard to offer a way for others to collaborate in developing similar applications (or, at least, applications that use similar data).

However if the decision is to use another standard we will gladly support that also.

Regards,

Matt

Mentata said:
May 28, 2003 11:53 AM

Good article and discussion.

Color me an outsider, but as someone who is trying to write a single handler to parse these feeds, I'm faced with the ugly truth: not likely. Aggregators may earn their bread by working out the inconsistencies, but I'm inclined to think it could be easier in a more cooperative space.

Just as web services are more than WS-I, weblogs are more than Radio and Movable Type. They represent a (small) set of requirements destined to be implemented with *many* architectures. Ditto for news feeds. The two are similar enough that they could both use something like RSS for convergence, if only RSS stood for one thing.

You have two customer sets: those who produce feeds and those who consume them. A producer wants it to be easy, a consumer wants it to be rich and structured. Both can be satisfied, but IMHO the status quo falls short.
