Six Apart News & Events

hCard Hacking in Perl


This morning one of our interns wrapped up a hackathon project: a Perl library that creates new hCards and parses existing ones from around the web. (Every Wednesday is hack day, when our coders spend the whole day on projects they find interesting, valuable, or just plain cool. And yes, we are hiring.) The hCard microformat describes how to represent people, companies, organizations, and places in HTML, using a 1:1 mapping of vCard properties and values. While developing this library, we focused on two goals. First, it should work with hCards in the wild, including those that don't fully follow the specification. Second, you shouldn't need to already know how hCard works in order to use it.

Some more information from the library's README:

This module handles three existing specifications from Microformats.org:

  1. hCard -- http://microformats.org/wiki/hcard
  2. adr -- http://microformats.org/wiki/adr
  3. geo -- http://microformats.org/wiki/geo

Each of them can be used on its own. hCard uses adr and geo to parse addresses and geolocations, but needs them only when the corresponding elements appear in an hCard, while adr and geo depend on nothing else. In practice, though, adr and geo appear "in the wild" mostly as subparts of hCards.
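To illustrate that geo needs nothing beyond its own class names, here is a minimal Perl sketch that pulls latitude and longitude out of a standalone geo snippet. The parse_geo function is hypothetical and not part of Data::Microformat. It uses regular expressions only because the sample markup is tidy and known in advance; real pages call for a proper HTML parser.

```perl
use strict;
use warnings;

# Hypothetical stand-alone geo extraction (not the module's API).
# Looks for the "latitude" and "longitude" class names defined by the
# geo microformat. A regex is only adequate for tidy, known markup.
sub parse_geo {
    my ($html) = @_;
    my ($lat) = $html =~ /class="latitude"[^>]*>\s*([-+]?[\d.]+)/;
    my ($lon) = $html =~ /class="longitude"[^>]*>\s*([-+]?[\d.]+)/;
    return unless defined $lat && defined $lon;    # incomplete geo
    return { latitude => $lat + 0, longitude => $lon + 0 };
}

my $geo = parse_geo(
    '<div class="geo"><span class="latitude">37.386013</span>, '
  . '<span class="longitude">-122.082932</span></div>'
);
printf "%.6f, %.6f\n", $geo->{latitude}, $geo->{longitude};
```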

This module exists both to parse existing hCards from web pages and to create new hCards for publishing on the web.

To use it to parse an existing hCard (or hCards), simply give it the content of the page containing them (there is no need to first eliminate extraneous content, as the module will handle that itself):

use Data::Microformat::hCard;
my $card = Data::Microformat::hCard->parse($content);

If you would like all the hCards on the page, call parse in list context by assigning the result to an array:

my @cards = Data::Microformat::hCard->parse($content);
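Whether a Perl call returns one value or many depends on its context: assigning to a scalar calls it in scalar context, and assigning to an array calls it in list context. A sub can check which with the built-in wantarray. The sketch below is a hypothetical stand-in that shows the pattern, returning the first match in scalar context and every match in list context. find_names is not the module's API, and its regex is no substitute for real hCard parsing.

```perl
use strict;
use warnings;

# Hypothetical helper: collects the text of each class="fn" element.
# wantarray is true when the caller wants a list, false for a scalar.
sub find_names {
    my ($content) = @_;
    my @names = $content =~ /class="fn"[^>]*>([^<]+)</g;
    return wantarray ? @names : $names[0];
}

my $html = '<div class="vcard"><span class="fn">Ada Lovelace</span></div>'
         . '<div class="vcard"><span class="fn">Charles Babbage</span></div>';

my $first = find_names($html);    # scalar context: first name only
my @all   = find_names($html);    # list context: every name
print "$first\n", join(', ', @all), "\n";
```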

The module respects nested hCards using the parsing rules defined in the spec, so if one hCard contains another, it will return one hCard with the other held in the relevant subpart, rather than two top-level hCards.

To create a new hCard, first create the new object:

my $card = Data::Microformat::hCard->new;

Then use the helper methods to add any data you would like. When you're ready to output the hCard, simply write:

my $output = $card->to_hcard;

$output will then hold an hCard representation of the data, marked up using only tags that carry the relevant hCard class names.
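For a sense of what class-name-only markup looks like, here is a minimal sketch that renders a few flat hCard properties (fn, org, title, and email, all defined by the hCard specification) inside a vcard root element. The hcard_markup helper is hypothetical and is not how to_hcard is implemented; it also ignores structured properties such as n and adr.

```perl
use strict;
use warnings;

# Minimal HTML escaping for text content.
sub esc {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical helper: the meaning of each value is carried entirely
# by its class name; the tags themselves are plain div/span.
sub hcard_markup {
    my (%prop) = @_;
    my $html = qq{<div class="vcard">\n};
    for my $name (qw(fn org title email)) {
        next unless defined $prop{$name};    # skip absent properties
        $html .= sprintf qq{  <span class="%s">%s</span>\n},
            $name, esc($prop{$name});
    }
    return $html . "</div>\n";
}

my $output = hcard_markup(fn => 'Ada Lovelace', org => 'Engines & Co.');
print $output;
```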

If you would like the parser to determine the representative hCard for a page, pass the page's URL as an additional parameter to the parse or from_tree methods; the representative hCard will then be identified, if it can be determined.
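The post does not spell out how the representative hCard is chosen. As an assumed rule for illustration only, the sketch below picks the first card whose url matches the page URL, treating a trailing slash as insignificant. Cards are plain hashrefs here, and representative_card is a hypothetical helper, not part of Data::Microformat.

```perl
use strict;
use warnings;

# Hypothetical selection rule (an assumption, not the module's logic):
# the first card whose url equals the page URL, ignoring trailing "/".
sub representative_card {
    my ($page_url, @cards) = @_;
    my $norm = sub { my $u = shift; $u =~ s{/+\z}{}; return $u };
    my $target = $norm->($page_url);
    for my $card (@cards) {
        next unless defined $card->{url};
        return $card if $norm->($card->{url}) eq $target;
    }
    return;    # no card qualifies
}

my @cards = (
    { fn => 'Someone Else', url => 'http://example.com/other' },
    { fn => 'Page Owner',   url => 'http://example.com/' },
);
my $rep = representative_card('http://example.com', @cards);
print $rep->{fn}, "\n";
```

A real implementation would likely need more careful URL normalization (scheme, host case, default ports) than stripping a trailing slash.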

Check it out on CPAN and let us know what you think.
