Just days after Treasury Board President Stockwell Day vowed the federal government would make more data freely available through its new portal, data.gc.ca, along comes an interesting data set from the office of the federal ethics commissioner, Mary Dawson.
Her office released its annual list of sponsored travel by Members of Parliament. It shows which organizations paid to send MPs, their staff and their spouses on overseas junkets. Taiwan and Israel are typically the big spenders. Trips to Taipei and Jerusalem are a long-standing perk for Hill people.
Journalists would like to know which organizations spent the most this year, and which MPs took the most travel. That’s exactly the sort of thing open data is supposed to let us do. And it’s easy if we have the data in a workable format.
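To show how little work that would be, here is a minimal sketch in Python. It assumes the list were released as a CSV file with columns named mp, sponsor, destination and cost. Those column names and the sample rows are invented for illustration and are not taken from the commissioner's list.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows: the MPs, sponsors and amounts are placeholders,
# not figures from the commissioner's list.
SAMPLE_CSV = """mp,sponsor,destination,cost
MP A,Sponsor X,Taipei,5000.00
MP B,Sponsor X,Taipei,4500.50
MP A,Sponsor Y,Jerusalem,7000.00
MP C,Sponsor Y,Jerusalem,6500.00
"""


def summarize(csv_file):
    """Return (total spending per sponsor, trip count per MP) for a CSV file object."""
    spending = Counter()
    trips = Counter()
    for row in csv.DictReader(csv_file):
        spending[row["sponsor"]] += float(row["cost"])
        trips[row["mp"]] += 1
    return spending, trips


spending, trips = summarize(io.StringIO(SAMPLE_CSV))

# Sponsors ranked by total spending, then MPs ranked by number of trips.
for sponsor, total in spending.most_common():
    print(f"{sponsor}: ${total:,.2f}")
for mp, count in trips.most_common():
    print(f"{mp}: {count} trip(s)")
```

With a real file, the same function could be pointed at open("travel.csv", newline="") instead of the sample string; the file name here is also just a placeholder.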
Unfortunately, the Ethics Commissioner releases the lengthy list as a PDF, a format that is utterly impossible to analyze. As anyone who works with data regularly will tell you, PDFs are an enormous pain in the ass, especially when they’re formatted in the manner of the sponsored travel list.
Some specialized software can sometimes decode PDFs and turn them into text files or Excel files, but the results are rarely perfect. It’s an ongoing headache for data journalists. (I recently had a department release a 7-million-record data set in a series of PDFs, even though I had previously requested the same data in text format.)
I’ve asked the Ethics Commissioner’s media staff for an Excel version but I’m not holding my breath. I’ll post it here if I get it.
UPDATE: Ethics Commissioner spokesperson Jocelyne Brisbois has responded to my request, repeating a familiar excuse governments use to deny open data requests. The concern is the integrity of the document, as if it could be altered or tampered with if released as a spreadsheet:
Unfortunately at this time we cannot assist you with your request. We make electronic data accessible by making it available in PDF format on our website. Documents are not available in Word or Excel format, particularly when there are risks involved in maintaining the integrity of the document. We are always trying to ensure information is quickly and readily accessible; PDF allows for accessibility while ensuring the integrity of an official document. The public registry is searchable and you can access information by name should you wish to do so.
We are not participating in the Government’s Open Data Pilot Project to which you refer since we are a Parliamentary entity. The pilot project is for participating departments in the executive branch. However, we are monitoring the project with interest.
I hope that this information is of assistance to you.