Baby Steps in Data Journalism

Starting from zero, this Tumblr provides tools, links and how-to information for people just beginning to explore data journalism.
Posts tagged "excel"

In a graduate course about tools for making journalism, I gave the students these two assignments:

Data 1: Use a CSV file and Excel to make a chart

Data 2: Maps and Google Fusion Tables

Both assignments include links to handouts and other resources.

The idea was to introduce the students to how data are used in journalism, and what can be done with big data sets. For each exercise, each student was assigned a different data set to use in a structured task.

Lisa Williams notes the difficulties of working with a CSV dataset in Excel when there are too many records and Excel keeps crashing. And she found a solution!


Over the past couple of days, I’ve been playing with data from the Hubway Data Challenge. Hubway is a bikeshare program in Boston, Massachusetts, and they have data on every trip ever taken with their rental bicycles, which are stationed at 65 automated bike racks around the city.

In the year…

To try it out for yourself, just download the worksheet, open it with Excel 2010, and click on any of the colored pills in the section to the lower right under the “Global Filters” header (you can hold down Ctrl to select multiple attributes and press Alt + C or click the little icon in the upper right of the pill grouping to clear the filter).

From Ben Jones’s blog, DataRemixed.

Last weekend I was looking for ways to extract Twitter search data in a structured, easily manageable format. The two APIs I was using (Twitter Search and Backtweets) were giving good results – but as a non-developer I couldn’t do much with the raw data they returned. Instead, I needed to get the data into a format like CSV or XLS.

Some extensive Googling led me to this extremely useful post on Labnol, where I learnt about how to use the ImportXML function in Google Spreadsheets. Before too long I’d cracked my problem. In this post I’m going to explain how you can do it too.

Click the link to learn how!

One night on Twitter … a few dozen people joined me in a conversation about computer programming and its place in the journalism curriculum. Here are selected tweets …

This is a great (free!!!) site (from Shan Carter of The New York Times) for converting Excel data to other formats, such as XML.

I used this site last year in a journalism course in which I taught XML and ActionScript. You can download a PDF from my page showing examples of XML used in Adobe Flash. The PDF was used to walk students through the steps of taking an MS Word document into Excel, cleaning data, and then creating an XML file with Mr. Data Converter. After doing that, the students used AS3 to read the XML into Flash — but you don’t need to use Flash to understand the Word > Excel > XML conversion process.
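If you'd like to see the Excel > XML step spelled out in code, here is a minimal Python sketch. It is not how Mr. Data Converter works internally; it only illustrates the same idea, assuming the cleaned data has been saved from Excel as CSV with a header row. The function name, the `rows`/`row` tag names and the sample data are all made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="rows", row_tag="row"):
    """Convert CSV text (first line = column headers) into an XML string."""
    root = ET.Element(root_tag)
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = ET.SubElement(root, row_tag)
        for header, value in record.items():
            # Assumes each header is already a valid XML element name
            # (no spaces, does not start with a digit).
            cell = ET.SubElement(row, header)
            cell.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data, as it might look after cleaning in Excel
sample = "name,city\nAda,Gainesville\nBen,Boston\n"
print(csv_to_xml(sample))
```

Each spreadsheet row becomes one `<row>` element, and each column header becomes the tag that wraps that cell's value.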

Results from Data Scraping

Okay, this is even better than the first one. I modified Nathan’s script to scrape both the maximum and minimum temperatures for 365 days (meaning 365 Web pages!) and dumped them into one comma-delimited text file. Then I imported it into Excel to make this graph. I just used the Excel chart tools to make it (Excel for Mac 2011).

Python (partial):

      # Get temperatures from the page
      soup = BeautifulSoup(page)
      # Earlier lookup, commented out:
      # maxTemp = soup.body.nobr.b.string
      # Scrape the maximum, plus the added scrape for the lowest temperature
      maxTemp = soup.findAll(attrs={"class":"nobr"})[5].span.string
      minTemp = soup.findAll(attrs={"class":"nobr"})[8].span.string
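The snippet above is only the parsing step. The part that visits one page per day and dumps everything into a single comma-delimited file might look roughly like the sketch below. This is not Nathan's script: the function names, the column names and the `get_temps` placeholder (which stands in for "fetch that day's page and parse it") are assumptions for illustration, and the temperatures are made-up values.

```python
import csv
import io
from datetime import date, timedelta


def write_year_of_temps(year, get_temps, out_file):
    """Write one row per day of `year`: date, max temp, min temp."""
    writer = csv.writer(out_file)
    writer.writerow(["date", "max_temp", "min_temp"])
    day = date(year, 1, 1)
    while day.year == year:
        # get_temps stands in for fetching and parsing that day's page
        max_temp, min_temp = get_temps(day)
        writer.writerow([day.isoformat(), max_temp, min_temp])
        day += timedelta(days=1)


def fake_get_temps(day):
    """Placeholder: returns made-up values instead of scraping a page."""
    return 80, 60


# In real use you might pass open("temps.csv", "w", newline="") instead
buffer = io.StringIO()
write_year_of_temps(2011, fake_get_temps, buffer)
lines = buffer.getvalue().splitlines()
```

Keeping the page-parsing step as a separate function means the loop and the file writing can be tried out with fake values before any real Web pages are requested.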

Results from Data Scraping

So I’m pretty happy with today’s work: In a little less than 3 hours (including blogging about all this and looking up lots of related stuff), I was able to use Python to scrape 365 Web pages and export a comma-delimited file of the maximum recorded temperature for every day in 2011 for Gainesville, Florida.

I opened the file with Excel and used the built-in chart tools to create the graphic above, which is quite simple — but it’s showing all the data from that scrape! So cool!

You can view a Google Spreadsheets version here.