Baby Steps in Data Journalism

Starting from zero, this Tumblr provides tools, links and how-to information for people just beginning to explore data journalism.
Posts tagged "code"

I have to use a Windows 7 computer for a couple of months (I know, pity me, right?). I wanted to start using Python on it, and sure enough, Zed’s sweet instructions worked just fine.

The hard part was Zed’s step 5: “Run your Terminal program. It won’t look like much.” You are ALREADY IN THE TERMINAL PROGRAM when you open the PowerShell, which he already told you to do. Geez.

Next, I got a big fat error message after I typed python and pressed Enter. That was because Python was not installed yet. Zed has a link for that (in step 6.1). While installing, I carefully accepted all the defaults, changing NOTHING, because I know Windows is like that.

Before installing Python 2.7, I had to check whether my Windows computer was 64-bit or not (open the Control Panel, click System, and look for “System type”). After installing, I had to copy and paste the line Zed supplies in step 6.4. This sets the environment variable so that PowerShell can find Python.

Next, I had to exit and restart PowerShell. 

And then — voilà! I typed python at the command prompt, and I was in Python! 

Thanks, Zed!

Chapters 10 and 11 in Learning Python, 5th edition are the first in section 3, “Statements and Syntax.”

"[Y]ou can make a single statement span across multiple lines. To make this work, you simply have to enclose part of your statement in a bracketed pair—parentheses (()), square brackets ([]), or curly braces ({}). Any code enclosed in these constructs can cross multiple lines … Parentheses are the catchall device …" p. 32

"An older rule also allows for continuation lines when the prior line ends in a backslash … This alternative technique is dated, though, and is frowned on today because it’s difficult to notice and maintain the backslashes.” (my bold)

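A minimal sketch of both continuation styles, not taken from the book. The variable names are made up, and it is written in Python 3 syntax:

```python
# Parentheses let one statement span several lines.
total = (1 + 2 +
         3 + 4)

# A list literal in square brackets can be split the same way.
names = ['nudge',
         'wink']

# The older backslash form still works, but the trailing "\" is easy to miss.
old_style = 1 + 2 + \
            3 + 4

print(total, names, old_style)
```
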
Introducing the try statement:

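while True:
	reply = raw_input('Enter text: ')
	if reply == 'stop': break
	try:
		num = int(reply)       # fails if reply is not a number
	except:
		print 'Bad! ' * 8      # runs only when int() raised an error
	else:
		print num ** 2         # runs only when int() succeeded
print 'Bye'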

Chapter 10 is quite short! Just if, while, and try statements.

Assignments, expressions, and print are covered in chapter 11.

"Tuple assignment leads to a common coding trick in Python":

>>> nudge = 'NUDGE'
>>> wink = 'WINK'
>>> nudge, wink = wink, nudge
>>> nudge
'WINK'
>>> wink
'NUDGE'

"[W]e must generally have the same number of items on the right as we have variables on the left" — not surprising, but this is an example of how clearly this book explains most things.

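A quick sketch of what happens when the counts do not match. This is my own example rather than the book's, in Python 3 syntax:

```python
# Matching counts: three names on the left, three values on the right.
a, b, c = 1, 2, 3

# Mismatched counts raise ValueError instead of silently dropping a value.
try:
    x, y = 1, 2, 3
except ValueError as err:
    message = str(err)
    print('ValueError:', message)
```
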
This is funky — I had to think it out (p. 344):

>>> L = [1, 2, 3, 4]
>>> while L:
...     front, L = L[0], L[1:]
...     print(front, L)
...
1 [2, 3, 4]
2 [3, 4]
3 [4]
4 []

Reserved words:
"because module names in import statements become variables in your scripts, variable name constraints extend to your module filenames too."

Names that begin with a single underscore (_X) are NOT imported by a “from module import *” statement.

(Python filenames may begin with an underscore.)

Expression statements:
Expressions are commonly used as statements in two situations:

  • calls to functions and methods
    For functions and methods that do not return a value.
  • printing values at the interactive prompt
    i.e. in Terminal, we don’t need to write “print.”

"[A]lthough expressions can appear as statements in Python, statements cannot be used as expressions."

The rest of chapter 11 is about the print statement, with thorough coverage for both Python 3.x and 2.x (yay!). This is a simple thing:

print x, y   # adds space between
print x + y  # no space

"[W]hereas file write methods write strings to arbitrary files, print writes objects to the stdout stream by default, with some automatic formatting added." p. 358

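A small sketch of that difference, assuming Python 3, where print is a function and takes a file= argument. The io.StringIO buffer is only a stand-in for a real file:

```python
import io

buffer = io.StringIO()          # an in-memory stand-in for a real file

# write() accepts only a string and adds nothing to it.
buffer.write('spam' + ' ' + str(42) + '\n')

# print() converts each object to text, separates them with a space,
# and ends the line. file= sends the output somewhere other than stdout.
print('spam', 42, file=buffer)

lines = buffer.getvalue().splitlines()
print(lines)
```

Both calls leave the same line in the buffer, but only write() needed the conversion, the space, and the newline spelled out by hand.
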
Taking a break from Python to work on jQuery:

Here is the GitHub repo: jQuery Exercises. Enjoy!

Chapter 9 in Learning Python, 5th edition has a rather unintuitive title, “Tuples, Files, and Everything Else,” but it’s the end of the section about “Object Types and Operations,” so it’s a kind of catch-all. (Seems to me the tuples could have gone in the previous chapter with lists and dictionaries.)

Tuples: “They work exactly like lists, except that tuples can’t be changed in place (they’re immutable) and are usually written as a series of items in parentheses, not square brackets.” (p. 276)

You CAN reassign the variable name of a tuple, however — like with everything in Python. So you could have T = (1, 2, 3) and then later change it, with T = (4, 5, 6).

You can change a tuple to a list: L = list(T) 

Or even T = list(T)

"Tuples can also be used in places that lists cannot — for example, as dictionary keys …", which seems useful.

Files: Zed gave me a lot of practice with files (writing, reading) in LPTHW. Not much new on that score.

"Python also includes advanced standard library tools for handling generic object storage (the pickle module), for dealing with packed binary data in files (the struct module), and for processing special types of content such as JSON, XML, and CSV text.” (p. 284)

JSON - Python standard library module

Python XML and CSV modules:
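
A rough sketch that touches each of the standard library modules named in the quote. The sample data is made up, and this is only one way to call them (Python 3 syntax):

```python
import csv
import io
import json
import pickle
import struct
import xml.etree.ElementTree as ET

record = {'name': 'Bob', 'age': 40}      # made-up sample data

# pickle: Python object -> bytes -> an equal Python object
restored = pickle.loads(pickle.dumps(record))

# json: Python object -> JSON text -> Python object
from_json = json.loads(json.dumps(record))

# struct: pack two integers into 8 bytes, then unpack them
packed = struct.pack('>ii', 7, 8)        # big-endian, two 4-byte ints
unpacked = struct.unpack('>ii', packed)

# csv: parse comma-separated text from a file-like object
rows = list(csv.reader(io.StringIO('name,age\nBob,40\n')))

# xml: parse a tiny document and read one attribute
element = ET.fromstring('<person name="Bob" age="40"/>')

print(restored, from_json, unpacked, rows, element.get('age'))
```

Note that csv and XML hand everything back as strings, so the age arrives as '40' and not 40.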

Then there’s a set of review questions (pp. 313-315). Good review. I keep forgetting how index works:

L.index(4) means “get the index number of the number 4 in the list L.”

>>> L = ['apple', 'peach', 'pear', 'strawberry']
>>> L.index('pear')
2
>>> L[2]
'pear'

New files added to my python_notes repo:

I didn’t write a new file for demonstrating pickle, because I already did that in (3 weeks ago).

Newly updated with brand-new exercises to help beginners:

This is the PowerPoint I use in my journalism “advanced online” course after students have completed the first three units in this free Code School course.

Here is the GitHub repo for exercises.

The certificate doesn’t mean much — you get it just for watching all 10 videos in the course. However, I learned a lot from Game Development Fundamentals with Python, and I logged it all in a GitHub repo.

Chapter 8 in Learning Python, 5th edition covers these useful objects. I think every programming language I’ve ever looked at has lists, but the dictionaries in Python are new to me. And they are so flexible!

Both lists and dictionaries can contain a mix of strings, numbers, tuples, lists, and dictionaries. Yes, lists can contain lists and dictionaries, and dictionaries can contain lists and dictionaries. Here’s a list that contains three lists:

>>> matrix = [['x', 'o', 'x'],
...     ['x', 'x', 'o'],
...     ['o', 'x', 'x']]
>>> matrix[2][2] = 'M'
>>> matrix
[['x', 'o', 'x'], ['x', 'x', 'o'], ['o', 'x', 'M']]

Both lists and dictionaries are mutable and can be expanded or shortened. You can stuff almost anything into them and extract anything that they contain.

Indexing and slicing both work on lists because a list is a sequence.

A dictionary is not a sequence, so you can’t index it by position or slice it. Generally you access the contents of a dictionary via keys:

>>> mydict = {'rumah': 'house', 'duabelas': 12, 'kucing': 'cat'}
>>> mydict['kucing']
'cat'

Any immutable object can be used as a key. Numbers. Tuples!!
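
A small sketch of tuples as keys, using a made-up grid of (row, column) positions, in Python 3 syntax. It also shows a list being rejected in the same role:

```python
# (row, column) tuples as keys for a sparse grid; the data is made up.
grid = {(0, 0): 'x', (1, 2): 'o'}
grid[(2, 2)] = 'M'
print(grid[(1, 2)])

# A list cannot be a key, because lists are mutable (unhashable).
try:
    grid[[0, 1]] = 'x'
except TypeError as err:
    list_key_error = str(err)
    print('TypeError:', list_key_error)
```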

We can use four different syntactic structures to add data to a dictionary:

{'name': 'Bob', 'age': 40}

D = {}
D['name'] = 'Bob' 
D['age'] = 40

dict(name='Bob', age=40) # requires all keys to be strings

dict([('name', 'Bob'), ('age', 40)])

I thought that was pretty cool. This is VERY cool:

>>> list1 = ['a', 'b', 'c']
>>> list2 = ['apple', 'banana', 'cherry']
>>> dict1 = dict(zip(list1, list2))
>>> dict1
{'a': 'apple', 'c': 'cherry', 'b': 'banana'}

Finally, there are list comprehensions and dictionary comprehensions. Here’s an example of a list comprehension:

>>> mylist = ['east','west','north','south']
>>> compass = [s.upper() for s in mylist]
>>> compass
['EAST', 'WEST', 'NORTH', 'SOUTH']

And here is a dictionary comprehension:

>>> dict2 = {c: 'my'+ c.upper() for c in list2}
>>> dict2
{'cherry': 'myCHERRY', 'apple': 'myAPPLE', 'banana': 'myBANANA'}

This was a long chapter.

I love Mike Rugnetta and PBS Idea Channel! 

Some more skimming was possible, as chapter 5 covers numbers and chapter 7 covers strings, and I have been working with them for a while now. Of course, I also discovered new things in Lutz’s very thorough chapters — even though he admits he is not going over everything we can do with numbers or strings in Python.

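>>> import math
>>> math.floor(-2.5)    # rounds down, toward negative infinity: -3 (-3.0 in Python 2.x)
>>> math.trunc(-2.5)    # drops the fractional part, toward zero: -2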

I also learned more about sets than I had ever known before. They can be useful for comparisons and for getting rid of duplicate items in a list.
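
A short sketch of both uses, with made-up city lists (Python 3 syntax):

```python
cities = ['Miami', 'Tampa', 'Miami', 'Orlando', 'Tampa']   # made-up data

unique = set(cities)                 # duplicates disappear
print(sorted(unique))

last_year = {'Miami', 'Tampa', 'Ocala'}
both = unique & last_year            # intersection: in both sets
only_new = unique - last_year        # difference: in unique only
print(sorted(both), sorted(only_new))
```

Sets are unordered, which is why the sketch sorts them before printing.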

Chapter 6 sings the praises of dynamic typing in Python, which gives us lots of flexibility — the type of a variable is not fixed or set or even declared. The variable name points to an object, and the object has a type — NOT the variable. Many Python texts refer to this property, and yet I have never before seen such a sensible and clear explanation of what it really is, how it really works.
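
A tiny sketch of the idea, my own and not Lutz's, in Python 3 syntax. The type belongs to the object, and a name can be pointed at a different object at any time:

```python
x = 42
first = type(x).__name__      # the object 42 is an int
x = 'forty-two'
second = type(x).__name__     # the same name now points to a str

# Two names can point at the same object.
a = [1, 2, 3]
b = a
b.append(4)                   # changes the one shared list
print(first, second, a, a is b)
```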

You can’t CHANGE a string (it is immutable). To achieve a similar effect, you create a new string — “by concatenating, slicing, running formatting expressions, or using a method call like replace — and then assigning the result back to the original variable name.” (The difference is not really detectable in most cases!)

Immutables: numbers, strings, tuples, frozensets
Mutables: lists, dictionaries, sets, bytearray
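
A brief sketch contrasting an immutable string with a mutable list, using made-up values in Python 3 syntax:

```python
s = 'beam'
try:
    s[0] = 't'                 # strings do not support item assignment
except TypeError as err:
    print('TypeError:', err)

original = s
s = 't' + s[1:]                # builds a NEW string and rebinds the name s
print(s, original)

letters = list('beam')         # a list, by contrast, changes in place
letters[0] = 't'
print(''.join(letters))
```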

Lutz spends an amazingly LONG time talking about format expressions and format method calls and the difference between them. Thanks to Zed Shaw (LPTHW), I am very comfy with format strings (expressions). I was able to follow Lutz’s explication, but it was LONG! (Did I say that already?)

If you are familiar with the format strings (%s, %d, %r, etc.), suffice to say there is a rather different way to do essentially all the same stuff, and it involves use of curly braces — {} — instead.
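
A side-by-side sketch of the two styles, with made-up values. It runs in Python 3, and the same expressions also work in 2.7:

```python
name, count = 'eggs', 3

old = '%s: %d' % (name, count)              # format expression
new = '{}: {}'.format(name, count)          # format method call

# Keyword fields and a format spec (zero-padded to 3 digits)
by_name = '{item}: {n:03d}'.format(item=name, n=count)

# %r and {!r} both insert the repr() of the value
old_repr = '%r' % name
new_repr = '{!r}'.format(name)

print(old, new, by_name, old_repr, new_repr)
```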

>>> s = "beam"
>>> s.replace('ea', 'oo')    # returns a new string; s is unchanged
'boom'
>>> s = s + " crash "
>>> s
'beam crash '
>>> s.rstrip()
'beam crash'
>>> s[5:]    # slice
'crash '
>>> s.upper()
'BEAM CRASH '
>>> s.endswith('ash ')
True
>>> s = "0123456789"
>>> s[::2] # two-step slice
'02468'
>>> s = "bumblebee"
>>> s[::-1] # reverse order slice
'eebelbmub'


So … these are not surprising things, but I do love Python!

Oh, and: object.method(arguments) … s.find()

>>> breakfast = "We love to eat green eggs and ham."
>>> result = breakfast.find('eggs')
>>> result
21    # position of 'eggs' in breakfast


Love, love Python.

Book: Learning Python, 5th edition 

This chapter (pp. 93-132) gives a really good overview of what Python can do, and how it does it. I think if you are familiar with some other programming language(s), like C or C++, you would pretty much think Python is amazing after you read this, and you would want to learn more. On the other hand, if you are a relative newcomer to programming, you would probably be very confused and starting to think this is not a good book for you. (Note: The author says up front that this is not a “how to learn programming” book.)

I fall in the middle of those two types, although decidedly more toward the beginner. Yet I have been working on learning Python for more than a year, so I was not daunted by chapter 4. Also, I learned some new things that made me feel even more excited about Python.

In particular, nesting within both lists and dictionaries. OMG! You can really travel all over a table format, go in and out of JSON or other tabular data formats, change or find anything.
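
A small sketch of that kind of travel through nested data. The JSON text and its field names are made up for illustration (Python 3 syntax):

```python
import json

# Made-up JSON: a dictionary holding a list of dictionaries that hold lists.
text = ('{"staff": [{"name": "Bob", "jobs": ["dev", "mgr"]},'
        ' {"name": "Sue", "jobs": ["dev"]}]}')

data = json.loads(text)

second_job = data['staff'][0]['jobs'][1]         # drill down: key, index, key, index
data['staff'][1]['jobs'].append('editor')        # change something deep inside
names = [person['name'] for person in data['staff']]

print(second_job, names, data['staff'][1]['jobs'])
```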

I learned that these two are the same, which I thought was pretty cool:


squares = [x ** 2 for x in [1, 2, 3, 4, 5]]


squares = []
for x in [1, 2, 3, 4, 5]:
    squares.append(x ** 2)

Book: Learning Python, 5th edition

I was able to do a lot of skimming here, so I read the first 100 pages in one day. Definitely NOT for people who are new to programming. The author, Mark Lutz, takes a very thorough approach to Python, very systematic. In chapter 4 we are finally getting our hands on some Python, but mixed in with introductory material is what seems to me to be some rather advanced material.

I’m torn between loving the quality (and detail and organization) and hating the stuff that makes me think, “Really? Do I have to know this right now?”

Still, it’s great to read explanations from someone who seems super-comfortable with Python and how it compares to other languages.

Book: Learning Python, 5th edition

Today I started reading:


See the book’s Web page.

Author’s Web page.

Yes, there are literally HUNDREDS of books about Python. I’m a pretty selective book buyer when it comes to code books. I love O’Reilly’s DRM-free digital books. I’m reading this one in the PDF format (I just like it better than the e-book formats, and it’s easy to hop over to Terminal and try some code when everything’s on one screen with a keyboard).

This book has 41 chapters and 1,420 pages, NOT counting the appendices. I’m impressed by the author’s long experience with using and training people to use Python (20 years!). I like the numerous code samples and the thoroughness of the book.

I also like it very much that it covers both Python 2.7 and 3.3, pointing out differences.