Baby Steps in Data Journalism

Starting from zero, this Tumblr provides tools, links and how-to information for people just beginning to explore data journalism.

p. 31 DOES NOT WORK:

urllib2.urlopen("www.wunderground.com/history/airport/KBUF/2009/1/1/DailyHistory.html")

DOES WORK:

urllib2.urlopen("http://www.wunderground.com/history/airport/KBUF/2009/1/1/DailyHistory.html")
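
The only difference between the two lines above is the http:// at the front, which is easy to overlook. Here is a minimal sketch of one way to guard against leaving it off. The helper name ensure_scheme is my own invention (it is not from the book), and the try/except import is there only because the urlparse function lives in a different module in Python 2 and Python 3.

```python
# urlparse moved between Python versions, so try both locations.
try:
    from urlparse import urlparse        # Python 2
except ImportError:
    from urllib.parse import urlparse    # Python 3


def ensure_scheme(url, default="http"):
    """Hypothetical helper: add 'http://' if the URL has no scheme."""
    if urlparse(url).scheme:
        return url                       # already starts with http://, https://, etc.
    return default + "://" + url


bad = "www.wunderground.com/history/airport/KBUF/2009/1/1/DailyHistory.html"
good = ensure_scheme(bad)
print(good)
```

You could then pass the result (good) to urlopen instead of the bare address.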

p. 32 DOES NOT WORK:

from BeautifulSoup import BeautifulSoup

DOES WORK:

from bs4 import BeautifulSoup

p. 33 FURTHER EXPLANATION:

After you have found that the value you want (the maximum temperature, which is 26°F) is enclosed in span tags with class="nobr", you still need to work out WHICH of the class="nobr" spans you will be scraping. Nathan tells you it’s nobrs[5] … but how can you find that number (5) for yourself? (I will assume you know how arrays work.)

  1. View Source on the HTML page you want to scrape.
  2. Command-F (Ctrl-F on Windows) to find text in the source.
  3. Type (in this case) the class you’re seeking: nobr
  4. Find repeatedly, counting each match, until you reach the maximum temperature value (26°F).* On the example page, you will have counted to 6. Why then does Nathan tell us to use 5? Because items in an array are numbered starting at 0. So the first item in the array named nobrs is nobrs[0], and the sixth item is nobrs[5].
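
To see the counting-from-zero idea in action, here is a small sketch using an ordinary Python list. The list contents are made up for illustration: only the "26" (the maximum temperature) comes from the example above, and the other items are placeholders standing in for whatever the first five class="nobr" spans contain on the real page.

```python
# Placeholder stand-ins for the text inside each class="nobr" span, in page order.
# ASSUMPTION: only "26" is from the example page; the rest are invented labels.
nobrs = ["first", "second", "third", "fourth", "fifth", "26", "seventh"]

# Print each item next to its index. Note that the numbering starts at 0.
for index, text in enumerate(nobrs):
    print("%d: %s" % (index, text))

# You counted to 6 in the page source, so subtract 1 to get the index.
count_in_source = 6
max_temp = nobrs[count_in_source - 1]    # same as nobrs[5]
print(max_temp)
```

The item you reached on your sixth count sits at index 5, which is why the book uses nobrs[5].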

Memo to self: Journalism students are not likely to understand what an array is and how it works.

*You may see the temperatures in °C, depending on which country you’re in.