Pity the poor stenographers charged with transcribing the ongoing debate in the House of Commons over the Harper government’s back-to-work legislation.
With the NDP filibustering the legislation that would force locked-out postal workers to accept an imposed settlement, the debate is now into its 20th hour. The official transcript of the debate, Hansard, won’t be available until it concludes. But the “Blues” drafts that are circulated to press gallery members show that the stenographers have been up just as long as the MPs. Their work has given us not only a permanent record of the debate, but also a massive data set to play with.
The last available version of the Blues runs until about 6 am Friday. I pulled this block of text into the open-source MySQL database manager to analyze.
A quickie word count shows that about 97,000 words were spoken during the debate between 3 pm Thursday and 6 am Friday.
Some members, of course, spoke more than others. Here are the top 20 speakers during the debate, ranked by words spoken:
Hon. Jack Layton (NDP) | 7,114 |
Hon. Steven Fletcher (CPC) | 6,899 |
M. Yvon Godin (NDP) | 5,187 |
Mr. Peter Julian (NDP) | 4,906 |
M. Claude Patry (NDP) | 4,531 |
Mr. Kevin Lamoureux (LPC) | 4,023 |
Ms. Chris Charlton (NDP) | 3,965 |
Mr. Don Davies (NDP) | 3,918 |
Mr. Rodger Cuzner (LPC) | 3,761 |
Mr. Brad Butt (CPC) | 3,524 |
Mr. Robert Chisholm (NDP) | 3,455 |
Mr. Matthew Kellway (NDP) | 3,149 |
M. Philip Toone (NDP) | 3,050 |
Mr. Phil McColeman (CPC) | 2,672 |
Mr. Pat Martin (NDP) | 2,366 |
M. Alexandre Boulerice (NDP) | 2,323 |
Mr. Larry Miller (CPC) | 2,154 |
Mr. Jack Harris (NDP) | 2,123 |
Mr. Charlie Angus (NDP) | 2,076 |
Mr. Devinder Shory (CPC) | 2,052 |
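For readers who want to try a similar tally without a database, here is a minimal Python sketch of a per-speaker word count. It is not the MySQL approach used for the numbers above. It assumes a hypothetical transcript layout in which each turn starts with "Name (PARTY): text"; the real Blues format may differ, so the pattern would need adjusting.

```python
import re
from collections import Counter

# Hypothetical turn format: "Speaker Name (PARTY): speech text".
# The real Blues layout may differ, so adjust the pattern accordingly.
TURN = re.compile(r"^(?P<speaker>[^:()]+\([A-Z]+\)):\s*(?P<speech>.*)$")


def words_by_speaker(lines):
    """Tally whitespace-separated words per speaker across all turns."""
    totals = Counter()
    current = None
    for line in lines:
        match = TURN.match(line)
        if match:
            current = match.group("speaker").strip()
            text = match.group("speech")
        else:
            # A line without a speaker prefix continues the previous turn.
            text = line
        if current is not None:
            totals[current] += len(text.split())
    return totals


# Made-up sample lines, for illustration only.
sample = [
    "Mr. Peter Julian (NDP): Mr. Speaker, I rise today to speak.",
    "This is a continuation of the same speech.",
    "Mr. Brad Butt (CPC): Mr. Speaker, I disagree.",
]
ranking = words_by_speaker(sample).most_common(20)
```

Splitting on whitespace is a crude definition of a word, so totals from a sketch like this will not match a database count exactly.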
I was also curious about how the MPs were managing to keep talking so long into the night on a rather narrow topic. Rather than read the entire transcript, I turned to an online text analysis tool called OpenCalais, which I learned about at a data journalism conference in the U.S. earlier this year.
OpenCalais will perform a task called “entity extraction,” which pulls out names of people, places, organizations and companies. A nice feature of OpenCalais for data journalists is that it can be accessed by an API and there’s a Python wrapper for it, so you can write a little script to upload thousands of lines of text individually to the service and it will capture the results of the analysis.
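The line-by-line upload loop might look roughly like the sketch below. It does not use the actual OpenCalais wrapper: `extract` is a hypothetical callable standing in for whatever client you use, and `fake_extract` is a toy replacement so the example runs without a network connection or API key.

```python
def extract_all(lines, extract):
    """Run `extract` on each non-empty line and collect (line_no, entity) pairs.

    `extract` is a hypothetical callable standing in for the real service
    client: it takes a string and returns an iterable of entity names.
    """
    results = []
    for line_no, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # skip blank lines rather than spend a request on them
        for entity in extract(text):
            results.append((line_no, entity))
    return results


def fake_extract(text):
    """Toy stand-in: treat a few known names as entities. Not the OpenCalais API."""
    known = ["FedEx", "Hans Island", "Canada Post"]
    return [name for name in known if name in text]


# Made-up sample lines, for illustration only.
pairs = extract_all(
    ["Canada Post competes with FedEx.", "", "Hans Island came up."],
    fake_extract,
)
```

Keeping the line number alongside each entity makes it easy to go back to the transcript and see who raised it.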
These are some of the entities extracted from the Canada Post debate that were referenced only once during the debate. I’m not sure what it means, other than to illustrate the need to digress to keep a filibuster going, but I found it mildly amusing.
Denmark |
Sweden |
Alcan |
An Act Respecting Queen’s University |
Billy Graham Evangelistic Association of Canada |
Copenhagen |
Falconbridge |
FedEx |
first Postal Clerks Association |
gas price gouging |
Haiti |
Hans Island |
Heart and Stroke Foundation |
Le monde |
Leeds |
Libya |
McDonald’s restaurant |
New Carlisle |
Norway |
Sackville |
Shippagan |
Tragualishalow |
Wag The Dog |
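Picking out the one-off mentions is a simple counting exercise once the entities are in hand. A minimal Python sketch, assuming the extraction step has already produced a flat list of entity names (the sample list here is made up for illustration):

```python
from collections import Counter


def mentioned_once(entities):
    """Return, in alphabetical order, the entity names that appear exactly once."""
    counts = Counter(entities)
    return sorted(name for name, n in counts.items() if n == 1)


# Made-up sample of extracted entity names, for illustration only.
sample = ["Canada Post", "FedEx", "Canada Post", "Hans Island", "Denmark", "Denmark"]
singles = mentioned_once(sample)
```

This treats names as exact strings, so variant spellings of the same entity would be counted separately.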
Isn’t it a bit premature to post this while they’re still in session? Good read, though.
Thanks for a fascinating analysis of the filibuster in real time. Also for mentioning Open Calais, which I am going to check out.
I hope when all is said and done that you’ll give us a final analysis.
Daly