Six Apart News & Events

hCard Hacking in Perl


This morning one of our interns finished one of his hackathon projects: a Perl library that creates new hCards and parses existing ones from around the web. (Every Wednesday is hack day, when our coders spend all day working on projects that they think are interesting, valuable, or just plain cool. And yes, we are hiring.) The hCard microformat describes how to represent people, companies, organizations, and places by using a 1:1 representation of vCard properties and values in HTML. When developing this library, we focused on two things: making sure that it works with hCards in the wild, including those that might not fully follow the specification, and making sure that you don't need to already know how hCard works to make use of the library.

Some more information from the library's README:

This module handles three existing specifications from Microformats.org:

  1. hCard -- http://microformats.org/wiki/hcard
  2. adr -- http://microformats.org/wiki/adr
  3. geo -- http://microformats.org/wiki/geo

Each of them can be used on its own, though the primary appearance of adr and geo "in the wild" is as subparts of hCards. hCard uses adr and geo to parse addresses and geolocations, but it needs them only when the corresponding elements appear in an hCard. adr and geo themselves have no dependencies on any of the others.

This module exists both to parse existing hCards from web pages, and to create new hCards so that they can be put onto the Internet.

To use it to parse an existing hCard (or hCards), simply give it the content of the page containing them (there is no need to first eliminate extraneous content, as the module will handle that itself):

my $card = Data::Microformat::hCard->parse($content);

If you would like to get all the hCards on the webpage, simply call parse in list context by assigning the result to an array:

my @cards = Data::Microformat::hCard->parse($content);
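To make the two calling styles concrete, here is a minimal sketch that parses a tiny hand-written page. The sample markup, the name "Jane Example", and the nickname are made up for illustration, and the fn accessor is assumed to return the formatted name as a string; check the module's documentation on CPAN before relying on it.

```perl
use strict;
use warnings;
use Data::Microformat::hCard;

# A small page with one hCard plus unrelated text. All values are invented.
my $content = <<'HTML';
<html><body>
<p>Unrelated text that does not need to be stripped first.</p>
<div class="vcard">
  <span class="fn">Jane Example</span>
  (<span class="nickname">jex</span>)
</div>
</body></html>
HTML

# Scalar context: ask for a single card.
my $card = Data::Microformat::hCard->parse($content);
if ($card) {
    my $name = $card->fn;    # assumed accessor for the "fn" property
    print "Name: $name\n" if defined $name;
}

# List context: ask for every top-level card on the page.
my @cards = Data::Microformat::hCard->parse($content);
print "Found ", scalar(@cards), " card(s)\n";
```

The only difference between the two calls is the context on the left-hand side of the assignment, which is why the choice between a scalar and an array matters here.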

The module respects nested hCards using the parsing rules defined in the spec, so if one hCard contains another, it will return one hCard with the other held in the relevant subpart, rather than two top-level hCards.

To create a new hCard, first create the new object:

my $card = Data::Microformat::hCard->new;

Then use the helper methods to add any data you would like. When you're ready to output the hCard, simply write:

my $output = $card->to_hcard;

And $output will be filled with an hCard representation, using tags exclusively with the relevant class names.
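Put together, the creation steps might look like the sketch below. The fn and nickname setters are assumed to be among the helper methods mentioned above, and the values are made up, so treat this as an illustration rather than a reference.

```perl
use strict;
use warnings;
use Data::Microformat::hCard;

# Start with an empty card.
my $card = Data::Microformat::hCard->new;

# Fill in a couple of properties (assumed setters; invented values).
$card->fn('Jane Example');
$card->nickname('jex');

# Serialize the card to hCard markup and print it.
my $output = $card->to_hcard;
print $output, "\n";
```

The printed markup can then be pasted into any page where you want the card to appear.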

If you would like to have the parser determine the representative hCard for a page, simply pass the page's URL as an additional parameter to the parse or from_tree methods, and the appropriate property will be found if it can be determined.

Check it out on CPAN and let us know what you think.

1 Comment
December 6, 2008 1:25 PM

My name is Etienne Taylor and I am the CEO of a small nonprofit, Clinical Trial Semantics Incorporated. We have been incubated at the American Cancer Society facility here in Oakland, CA and have (over the last eight years) been entirely focused on the problem of clinical trial access in the United States.

We have developed a clinical trial web service that we have been trying to deploy within Movable Type.

Recently, it has become clear that cloning or XML mapping of over 25,000 blogs might be better accomplished through OpenSocial templating techniques, deploying MT within the OpenSocial gadget wrapper. (We have third-hand news through bug reporting from SpikeSource that cloning 25,000+ blogs isn't practical. We would love to know if this is incorrect.)

Our friends at SpikeSource were good enough to arrange a meeting a few months ago with Chris Alden and Ed Anuff, and although we came away excited about the prospects of our two companies working together, we have had some trouble getting back in touch at the executive level. (We understand just how busy things have been.)

We are fortunate enough to have a core group of developers that includes the original data architect for HL7 CDA (the electronic medical record document format) and one of the most respected Perl developers in the country.

Therefore, in an effort to solve in meatspace a problem we have solved in the lab (the problem of clinical trial access), we would like to deploy our own clinical trial dial tone web service as a social application that takes advantage of Six Apart's social networking domain expertise, in an OpenSocial gadget running in a Perl version of the Shindig OpenSocial container that we would develop and/or co-develop.

We intend to contribute the results of this effort back to the community, and would be very interested in speaking to someone at Six Apart about how we might work together, not just as Six Apart customers but in common cause with our friends and neighbors, in deploying the solution to a problem that costs more lives each year in our country than car accidents do.

This problem does not need to continue, rather the solution needs to be deployed.

If you have got this far reading my post, we are asking for your help.

Respectfully,

Etienne Taylor (etaylor at ctrx dot org)
CEO
Clinical Trial Semantics Incorporated
American Cancer Society
California Clinical Trials Collaboration
500 12th Street suite 320
Oakland, CA 94607
