Shuffling Atom with XML::Atom::Filter
Atom is a standardized format for describing weblog content. With a standard format, you can exchange weblog content not only between web services but between command line tools--that's the idea behind some Java programs called "atomflow." To the same end as the atomflow project, I'm delighted to present a Perl module for creating command line Atom tools. I'm using it to post Flickr photos to a TypePad sidebar, but I'm sure you'll have many more ideas for it.
XML::Atom::Filter
The Atom format is being designed as a lingua franca of weblog content. Recognizing that a standard format makes meaningful integration between software programs easier, Diego Doval, Matt Webb, and Ben Hammersley invented atomflow at EuroFoo last year. The core idea in atomflow is using Atom not only for aggregation and posting to weblogs, but to exchange web content between command line tools; that is, wrapping the UNIX and Web ends of “small pieces, loosely joined” around to meet again.
To help myself build atomflow-style tools, I wrote XML::Atom::Filter, a Perl module for building command line Atom processing tools around the XML::Atom library.
Other ingredients
As existing Atom content will probably be on the web, you'll find yourself downloading feeds from the command line. While you could use a simple program like the standard GET command to download Atom feeds, you should use a tool that supports conditional HTTP GET, so that unchanged feeds aren't downloaded again on every run. Examples are Diego Doval's atomflow-spider and DJ Adams' Perl script to honor ETags with wget.
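As a rough illustration of what conditional GET involves, here is a minimal sketch using Perl's core HTTP::Tiny module. It is not how atomflow-spider or the wget script work internally; the `conditional_get` function, its return structure, and the optional `$http` argument are hypothetical names made up for this example.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: fetch $url, sending the ETag saved from the last
# fetch (if any) so the server can answer 304 Not Modified.
sub conditional_get {
    my ($url, $etag, $http) = @_;
    $http ||= HTTP::Tiny->new;

    my %headers;
    $headers{'If-None-Match'} = $etag if defined $etag;

    my $res = $http->get($url, { headers => \%headers });

    # 304 means the feed is unchanged: keep the old ETag, skip the body.
    return { changed => 0, etag => $etag } if $res->{status} == 304;

    die "GET $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};

    # HTTP::Tiny lowercases response header names.
    return {
        changed => 1,
        etag    => $res->{headers}{etag},
        body    => $res->{content},
    };
}
```

A wrapper script would save the returned `etag` between runs and print `body` to standard output only when `changed` is true, so the rest of the pipeline sees nothing when the feed has not changed.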
For some uses, such as posting to a blog or sidebar, you'll also want to prevent reposts. For my use (and in the example atompost.pl in the XML::Atom::Filter package), I used Perl's Storable module to keep a list of which Atom entry IDs were previously posted to which Atom endpoint URIs. That's a fairly brute force solution; the atomflow storage system takes duplicates in stride, so you might store the entries in atomflow, then retrieve the new ones by creation date.
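The Storable approach might look something like the following. This is a sketch under assumptions, not the code from atompost.pl: the helper names and the data layout (a hash of endpoint URIs, each holding a hash of entry IDs) are invented for illustration.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Assumed layout: { endpoint URI => { entry ID => 1 } }

# Load the record of past posts, or start empty on the first run.
sub load_posted {
    my ($file) = @_;
    return -e $file ? retrieve($file) : {};
}

# True if this entry ID was already posted to this endpoint.
sub already_posted {
    my ($posted, $endpoint, $id) = @_;
    return exists $posted->{$endpoint}
        && exists $posted->{$endpoint}{$id};
}

# Record a successful post and rewrite the whole file.
sub mark_posted {
    my ($posted, $endpoint, $id, $file) = @_;
    $posted->{$endpoint}{$id} = 1;
    store($posted, $file);
}
```

A posting script would call `already_posted` before each post and `mark_posted` after each success. Rewriting the file after every post is wasteful but keeps the record accurate if the script dies partway through a feed.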
When writing an XML::Atom::Filter script, keep in mind that Atom entry IDs are supposed to be “universally unique.” Transformative processes like flickr-nice and atom2linkTL should modify the IDs too, to show that the content is different. In my scripts, I prepended a tag: URI prefix, so the IDs are not only unique but also come out the same each time the process is performed:
## Keep the ID unique from the unprocessed entry.
## ($atomNS is assumed to hold the Atom namespace, defined earlier in the script.)
my $id = $e->get($atomNS, 'id');
$e->set($atomNS, 'id', 'tag:markpasc.org,2004,atom2linkTL:' . $id);
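To see why a fixed prefix gives IDs that are both distinct and repeatable, the same idea can be pulled out into a plain function. The `derived_id` helper here is hypothetical and only mirrors the prefix used in the snippet above.

```perl
use strict;
use warnings;

# Hypothetical helper: derive a new ID for an entry transformed by the
# filter named $process. The same input always yields the same output.
sub derived_id {
    my ($process, $id) = @_;
    return 'tag:markpasc.org,2004,' . $process . ':' . $id;
}
```

Because the result depends only on the process name and the original ID, running a filter twice over the same feed produces the same IDs, which is what lets a later step recognize entries it has already posted.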
Also keep in mind that, if you're fetching and posting web resources as automated processes, you'll probably be saving passwords for web sites in your scripts. If you're running your scripts on a shared server, make sure you set the permissions on the scripts so that only you can read them. For posting to Movable Type blogs, you might also set up a second account that can only post to that blog, so you don't even have to keep the password to your full account there.
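One way to make the permissions advice hard to forget is to have a script check its own file mode before doing anything. The following is a sketch assuming Unix-style permissions; `accessible_by_others` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the group or anyone else has any
# permission bits set on $file, 0 if only the owner does.
sub accessible_by_others {
    my ($file) = @_;
    my @st = stat($file) or die "Cannot stat $file: $!\n";
    return ($st[2] & 077) ? 1 : 0;
}
```

A script holding a password could start with `die "Fix permissions on $0\n" if accessible_by_others($0);`, and `chmod 0700, $file` (or `chmod 700` from the shell) restricts a script to its owner.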
Example: A feed TypeList
The main itch I was scratching with XML::Atom::Filter was to post Flickr photos tagged nintendo ds to the sidebar of my Nintendo DS blog. Flickr provides the former as an Atom feed, and you can post to TypeLists with Atom, so gluing the two together was quick work. There were four parts to the task:
- Get the photos as an Atom feed from Flickr.
- Clean up the Flickr content for sidebar use. Flickr's Atom feeds have descriptions and rather large thumbnails, so I decided to convert the feed to just titles and thumbnails, and shrink the thumbnails' display a bit with HTML.
- Fill in the Link TypeList extension fields. Because TypePad supports special fields for Link TypeList entries, I wanted to fill those fields in with the content from the Flickr feed for better display on the blog.
- Post the new entries to my TypeList.
As noted above, there are several ways to grab an Atom feed from the command line. I had to implement the other steps as Perl scripts with XML::Atom::Filter. (The last step, posting entries over Atom, is provided in the XML::Atom::Filter package as an example script.) Now that I have, I can recombine them in any order or context to perform that specific task.
To post nintendo ds photos to my DS blog's sidebar, that order is:
wget.pl 'http://flickr.com/services/feeds/photos_public.gne?tags=nintendods&format=atom_03' | ./flickr-nice.pl | ./atom2linkTL.pl | ./atompost.pl 'http://www.typepad.com/t/atom/lists/list_id=150635' markpasc password
If instead I wanted to post the Flickr photos to a Movable Type weblog (and didn't want to use one of these plugins), I could set up a second weblog to display as a sidebar to the first, then write:
wget.pl 'http://flickr.com/services/feeds/photos_public.gne?tags=nintendods&format=atom_03' | ./flickr-nice.pl | ./atompost.pl 'http://example.org/mt/mt-atom.cgi/weblog/blog_id=12' markpasc 'atom authentication token'
Example: personal aggregation
You can also use XML::Atom::Filter to build a personal profile page from many web services. (Services like Foopad promise to make this kind of thing easier, but you can DIY with a system like this.)
I set up one blog with a category for each blog and service I wanted to syndicate, then wrote an XML::Atom::Filter script, categorize.pl, that adds a category given on the command line to each entry in the feed. Once the newly categorized entries are posted to the Movable Type blog, the templates can put them in the right place.
wget.pl 'http://www.livejournal.com/users/markpasc/data/atom' | ./categorize.pl 'lj' | ./atompost.pl $POSTURL profileposter $POSTPW
wget.pl "http://rss.netflix.com/TrackingRSS?id=$NETFLIXFEEDID" | ./rss2atom.pl | ./netflix-nice.pl | ./categorize.pl 'netflix' | ./atompost.pl $POSTURL profileposter $POSTPW
You can then use the category attribute on the <MTEntries> tag to select which feeds go where in your templates. For example, you might put your regular weblog posts in the content column, but your del.icio.us links in a sidebar. (I used the category descriptions for site URLs.)
The only problem I had was keeping the URL of the original link. Movable Type entries don't yet have a native field for an external link, so I had to abuse the excerpt field for that, copying the entries' links into the Atom summary field (which ends up as the entry excerpt) before posting them.
If you do the same, then template code like:
<div class="sidebar">
<MTEntries category="delicious OR netflix">
<div class="<MTEntryCategory>">
<h2><a href="<MTEntryExcerpt>"><MTEntryTitle></a></h2>
<p><MTEntryBody></p>
</div>
</MTEntries>
</div>
can produce a page that syndicates all your content in one place, for an easy glance at (almost) all your web output.
A tool for you
The atomflow idea is a useful one: Atom is merely a tool for transferring weblog entry data between programs, and that goes for single purpose command line tools as much as it does web applications. I hope this example shows how powerful the interoperability of reading and writing tools speaking the same extensible format can be.


