Baby Steps in Data Journalism

Starting from zero, this Tumblr provides tools, links and how-to information for people just beginning to explore data journalism.

I installed GitHub for Windows on a university desktop computer, which is on a university’s hard-wired Internet/LAN. In many cases, including mine, that means there is a proxy server between my computer and the real Internet. (Download GitHub for Windows, free.)

That turned out to be the cause of my problem: When I tried to clone a repo using GitHub for Windows, or using the GitHub website, it would not work. When trying to clone with GitHub for Windows, the process always stopped at 9% of download and eventually gave me this message:

Failed to clone the repository …
Please check your internet connection.

What’s needed: You tell Git the address of the proxy server, and your username and password in the university system.

Here’s how:

First, find the URL (address) and port number of the proxy server.

How to find information about your proxy address (on Windows)

  1. Open Control Panel.
  2. Select Internet Options.
  3. Click the Connections tab.
  4. Click the LAN settings button.
  5. Copy and paste these two items into a plain text file:
    Address
    Port

Using your own login (username) and password (passw) for that network or computer, fill in the following:

http://username:passw@www.secdomain.domain:8080

Then open PowerShell (Windows) and paste in the following (with your own info) ALL ON ONE LINE:

git config --global http.proxy http://username:passw@www.secdomain.domain:8080

NOTE 1: Make sure you have:

config [space] [hyphen] [hyphen] global

There is NO space between the hyphens and global.

NOTE 2: The stuff after the @ is the two items you copied from LAN settings in Internet Options (above).
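
One wrinkle the steps above do not cover: if your password contains characters such as @, : or /, they generally need to be percent-encoded before they go into the proxy URL. Here is a minimal Python 3 sketch of that. The function name build_proxy_url and the sample password are made up for illustration; the host and port are the placeholders used above.

```python
from urllib.parse import quote


def build_proxy_url(username, password, host, port):
    """Return an http:// proxy URL with the credentials percent-encoded."""
    # safe="" makes quote() encode every reserved character, including @ : /
    user = quote(username, safe="")
    pw = quote(password, safe="")
    return "http://{}:{}@{}:{}".format(user, pw, host, port)


# Hypothetical credentials, for illustration only.
url = build_proxy_url("username", "p@ss:w/rd", "www.secdomain.domain", 8080)
print(url)
```

The printed string is what would follow http.proxy in the git config command.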

NOTE 3: After I had installed GitHub for Windows on this Windows computer, I also installed the Git SCM. Get it here:

msysgit.github.io

Just download the installer .exe file and run it! 

I have to use a Windows 7 computer for a couple of months (I know, pity me, right?). I wanted to start using Python on it, and sure enough, Zed’s sweet instructions worked just fine.

The hard part was Zed’s step 5: “Run your Terminal program. It won’t look like much.” You are ALREADY IN THE TERMINAL PROGRAM when you open the PowerShell, which he already told you to do. Geez.

Next, I got a big fat error message after I typed python and pressed Enter. That is because Python is not installed. Zed has a link for that (in step 6.1). While installing, I carefully accepted all the defaults, changing NOTHING, because I know Windows is like that.

After installing Python 2.7 (oh yes — I had to check whether my Windows computer was 64-bit or not. Open the Control Panel, click System, and look for “System type” there), I had to copy and paste the line Zed supplies in step 6.4. This sets the environment variable so the PowerShell can find Python.

Next, I had to exit and restart PowerShell. 

And then — voilà! I typed python at the command prompt, and I was in Python! 

Thanks, Zed!

Chapters 10 and 11 in Learning Python, 5th edition are the first in section 3, “Statements and Syntax.”

"[Y]ou can make a single statement span across multiple lines. To make this work, you simply have to enclose part of your statement in a bracketed pair—parentheses (()), square brackets ([]), or curly braces ({}). Any code enclosed in these constructs can cross multiple lines … Parentheses are the catchall device …" p. 32

"An older rule also allows for continuation lines when the prior line ends in a backslash … This alternative technique is dated, though, and is frowned on today because it’s difficult to notice and maintain the backslashes." (my bold)

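A quick sketch of the two continuation styles the quotes describe, side by side. The variable names are invented for this example.

```python
# Preferred: wrap the expression in parentheses and break wherever you like.
total_parens = (1 + 2 + 3 +
                4 + 5 + 6)

# Older style: a backslash must be the very last character on the line.
total_backslash = 1 + 2 + 3 + \
                  4 + 5 + 6

# Square brackets (and curly braces) allow continuation in the same way.
colors = ['red',
          'green',
          'blue']

print(total_parens, total_backslash, colors)
```
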
Introducing the try statement:

while True:
	reply = raw_input('Enter text: ')
	if reply == 'stop': break
	try:
		num = int(reply)
	except:
		print 'Bad! ' * 8
	else:
		print num ** 2
print 'Bye'
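
The listing above is Python 2. Here is the same try/except/else logic sketched for Python 3, pulled out of the interactive loop so it can be run without typing anything. The function name square_or_complain is invented for this sketch, and catching ValueError specifically (instead of a bare except) is a choice made here, not something taken from the book's listing.

```python
def square_or_complain(reply):
    """Return the square of reply as text, or a complaint if it is not an integer."""
    try:
        num = int(reply)
    except ValueError:       # the error int() raises for non-numeric text
        return 'Bad! ' * 8
    else:                    # runs only when the try block raised nothing
        return str(num ** 2)


for reply in ['12', 'spam', '-3']:
    print(square_or_complain(reply))
```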

Chapter 10 is quite short! Just if, while, and try statements.

Assignments, expressions, and print are covered in chapter 11.

"Tuple assignment leads to a common coding trick in Python":

>>> nudge = 'NUDGE'
>>> wink = 'WINK'
>>> nudge, wink = wink, nudge
>>> nudge
'WINK'
>>> wink
'NUDGE'

"[W]e must generally have the same number of items on the right as we have variables on the left" — not surprising, but this is an example of how clearly this book explains most things.

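A small sketch of what happens when the counts do not match: Python raises a ValueError instead of guessing. The variable names are made up for the example.

```python
try:
    a, b = 1, 2, 3           # three values on the right, two names on the left
except ValueError as err:
    message = str(err)       # Python explains the mismatch in the error text

print(message)

a, b = 1, 2                  # matching counts work fine
print(a, b)
```
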
This is funky — I had to think it out (p. 344):

>>> L = [1, 2, 3, 4]
>>> while L:
...     front, L = L[0], L[1:]
...     print(front, L)
...
1 [2, 3, 4]
2 [3, 4]
3 [4]
4 []
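
In Python 3 the same loop can be sketched with a starred name, which collects "everything else" into a list. This is an alternative spelling, not the listing from the book; the seen list is only there so the result can be checked.

```python
L = [1, 2, 3, 4]
seen = []
while L:
    front, *L = L            # first item goes to front, the rest back into L
    seen.append((front, list(L)))
    print(front, L)
```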

Reserved words:
"because module names in import statements become variables in your scripts, variable name constraints extend to your module filenames too."

Note:
Names that begin with a single underscore (_X) are NOT imported by a “from module import *” statement. (You can still import such a name by listing it explicitly.)

(Python filenames may begin with an underscore.)
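
A rough sketch of the underscore rule in action. The rule applies to the wildcard form, from module import *, while naming an underscore name explicitly still works. To keep this to a single file, the sketch builds a throwaway module in memory; demo_mod and its two attributes are made-up names.

```python
import sys
import types

# Build a throwaway module in memory and register it so import can find it.
demo = types.ModuleType("demo_mod")
demo.visible = "picked up by *"
demo._hidden = "skipped by *"
sys.modules["demo_mod"] = demo

# Run each import in its own empty namespace so we can inspect the result.
star_ns = {}
exec("from demo_mod import *", star_ns)

explicit_ns = {}
exec("from demo_mod import _hidden", explicit_ns)

print("visible" in star_ns, "_hidden" in star_ns, "_hidden" in explicit_ns)
```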

Expression statements:
Expressions are commonly used as statements in two situations:

  • calls to functions and methods
    For functions and methods that do not return a value.
  • printing values at the interactive prompt
    i.e. in Terminal, we don’t need to write “print.”

"[A]lthough expressions can appear as statements in Python, statements cannot be used as expressions."

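A short sketch of both halves of that idea: a method call used as a statement for its side effect, and an assignment statement refusing to act as an expression. The names are invented for the example.

```python
L = [1, 2, 3]
L.append(4)                  # expression statement: called for its side effect
result = L.append(5)         # append changes L in place and returns None
print(L, result)

# An assignment is a statement, so it cannot be nested inside an expression.
try:
    compile("x = (y = 1)", "<demo>", "exec")
    assignment_allowed = True
except SyntaxError:
    assignment_allowed = False
print(assignment_allowed)
```

So writing L = L.append(5) is a classic trap: it throws the list away and leaves L set to None.
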
The rest of chapter 11 is about the print statement, with thorough coverage for both Python 3.x and 2.x (yay!). This is a simple thing:

print x, y   # adds space between
print x + y  # no space

"[W]hereas file write methods write strings to arbitrary files, print writes objects to the stdout stream by default, with some automatic formatting added." p. 358

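To see the "automatic formatting" the quote mentions, here is a Python 3 sketch that sends print output and a plain write call to the same in-memory file. io.StringIO stands in for a real file; the variable names are made up.

```python
import io

buffer = io.StringIO()
x, y = 'spam', 'eggs'

print(x, y, file=buffer)            # print adds a space between and a newline after
print(x, y, sep='', file=buffer)    # sep='' removes the space
print(x + y, file=buffer)           # concatenate first, then print
buffer.write(x + y)                 # write adds nothing: no space, no newline

text = buffer.getvalue()
print(repr(text))
```
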
Taking a break from Python to work on jQuery:


Here is the GitHub repo: jQuery Exercises. Enjoy!

Chapter 9 in Learning Python, 5th edition has the rather unintuitive title “Tuples, Files, and Everything Else,” but it’s the end of the section about “Object Types and Operations,” so it’s a kind of catch-all. (Seems to me the tuples could have gone in the previous chapter with lists and dictionaries.)

Tuples: “They work exactly like lists, except that tuples can’t be changed in place (they’re immutable) and are usually written as a series of items in parentheses, not square brackets.” (p. 276)

You CAN reassign the variable name of a tuple, however — like with everything in Python. So you could have T = (1, 2, 3) and then later change it, with T = (4, 5, 6).

You can change a tuple to a list: L = list(T) 

Or even T = list(T)

"Tuples can also be used in places that lists cannot — for example, as dictionary keys …", which seems useful.

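A tiny sketch of tuples as dictionary keys, here as (row, column) positions on a made-up game board. Trying the same thing with a list fails because lists are mutable and therefore not hashable.

```python
grid = {(0, 0): 'x', (0, 1): 'o'}
grid[(1, 1)] = 'x'           # a tuple works as a key
print(grid[(1, 1)])

try:
    grid[[2, 2]] = 'o'       # a list does not
except TypeError as err:
    list_key_error = str(err)
print(list_key_error)
```
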
Files: Zed gave me a lot of practice with files (writing, reading) in LPTHW. Not much new on that score.

"Python also includes advanced standard library tools for handling generic object storage (the pickle module), for dealing with packed binary data in files (the struct module), and for processing special types of content such as JSON, XML, and CSV text." (p. 284)

JSON - Python standard library module
http://docs.python.org/2.7/library/json.html

Python XML and CSV modules:
http://docs.python.org/2/library/xml.html
http://docs.python.org/2/library/csv.html
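
For a feel of three of those modules, here is a Python 3 sketch that sends one small record through json, pickle, and csv and back again. The record itself is made-up sample data, and io.StringIO stands in for a file on disk.

```python
import csv
import io
import json
import pickle

record = {'name': 'Bob', 'age': 40, 'tags': ['a', 'b']}

# JSON: a text format that other languages can read too.
from_json = json.loads(json.dumps(record))

# pickle: a Python-specific byte format for nearly any object.
from_pickle = pickle.loads(pickle.dumps(record))

# CSV: rows of text; numbers come back as strings.
out = io.StringIO(newline='')
writer = csv.writer(out)
writer.writerow(['name', 'age'])
writer.writerow(['Bob', 40])
rows = list(csv.reader(io.StringIO(out.getvalue(), newline='')))

print(from_json == record, from_pickle == record, rows)
```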

Then there’s a set of review questions (pp. 313-315). Good review. I keep forgetting how index works:

L.index(4) means “get the index number of the number 4 in the list L.”

>>> L = ['apple', 'peach', 'pear', 'strawberry']
>>> L.index('pear')
2
>>> L[2]
'pear'
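
One more thing worth remembering about index: it raises a ValueError when the item is not in the list. A minimal sketch, using 'mango' as a made-up missing item:

```python
L = ['apple', 'peach', 'pear', 'strawberry']

position = L.index('pear')

# Checking with "in" first avoids the exception.
mango_position = L.index('mango') if 'mango' in L else None

try:
    L.index('mango')
    raised = False
except ValueError:
    raised = True

print(position, mango_position, raised)
```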

New files added to my python_notes repo:

I didn’t write a new file for demonstrating pickle, because I already did that in useful_modules.py (3 weeks ago).

Newly updated with brand-new exercises to help beginners:

This is the PowerPoint I use in my journalism “advanced online” course after students have completed the first three units in this free Code School course.

Here is the GitHub repo for exercises.

The certificate doesn’t mean much — you get it just for watching all 10 videos in the course. However, I learned a lot from Game Development Fundamentals with Python, and I logged it all in a GitHub repo.

Chapter 8 in Learning Python, 5th edition covers lists and dictionaries, two very useful objects. I think every programming language I’ve ever looked at has lists, but the dictionaries in Python are new to me. And they are so flexible!

Both lists and dictionaries can contain a mix of strings, numbers, tuples, lists, and dictionaries. Yes, lists can contain lists and dictionaries, and dictionaries can contain lists and dictionaries. Here’s a list that contains three lists:

>>> matrix = [['x', 'o', 'x'],
...     ['x', 'x', 'o'],
...     ['o', 'x', 'x']]
>>> matrix[2][2] = 'M'
>>> matrix
[['x', 'o', 'x'], ['x', 'x', 'o'], ['o', 'x', 'M']]

Both lists and dictionaries are mutable and can be expanded or shortened. You can stuff almost anything into them and extract anything that they contain.

Indexing and slicing both work on lists because a list is a sequence.

A dictionary is not a sequence, so you can’t index or slice. Generally you access the contents of a dictionary via keys:

>>> mydict = {'rumah': 'house', 'duabelas': 12, 'kucing': 'cat'}
>>> mydict['kucing']
'cat'
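
Asking for a key that is not there raises a KeyError. A short sketch of the usual ways around that, using 'anjing' as a key that is deliberately missing from the dictionary above:

```python
mydict = {'rumah': 'house', 'duabelas': 12, 'kucing': 'cat'}

has_cat = 'kucing' in mydict                 # membership test on keys
missing = mydict.get('anjing')               # get returns None when the key is absent
fallback = mydict.get('anjing', 'unknown')   # or a default you supply

try:
    mydict['anjing']
    raised = False
except KeyError:
    raised = True

print(has_cat, missing, fallback, raised)
```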

Any immutable object can be used as a key. Numbers. Tuples!!

We can use four different syntactic structures to add data to a dictionary:

{'name': 'Bob', 'age': 40}

D = {}
D['name'] = 'Bob' 
D['age'] = 40

dict(name='Bob', age=40) # requires all keys to be strings

dict([('name', 'Bob'), ('age', 40)])
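
A quick check that all four forms really do build the same dictionary (the names d1 through d4 are just labels for this sketch):

```python
d1 = {'name': 'Bob', 'age': 40}

d2 = {}
d2['name'] = 'Bob'
d2['age'] = 40

d3 = dict(name='Bob', age=40)

d4 = dict([('name', 'Bob'), ('age', 40)])

print(d1 == d2 == d3 == d4)
```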


I thought that was pretty cool. This is VERY cool:

>>> list1 = ['a', 'b', 'c']
>>> list2 = ['apple', 'banana', 'cherry']
>>> dict1 = dict(zip(list1, list2))
>>> dict1
{'a': 'apple', 'c': 'cherry', 'b': 'banana'}


Finally, there are list comprehensions and dictionary comprehensions. Here’s an example of a list comprehension:

>>> mylist = ['east','west','north','south']
>>> compass = [s.upper() for s in mylist]
>>> compass
['EAST', 'WEST', 'NORTH', 'SOUTH']

And here is a dictionary comprehension:

>>> dict2 = {c: 'my'+ c.upper() for c in list2}
>>> dict2
{'cherry': 'myCHERRY', 'apple': 'myAPPLE', 'banana': 'myBANANA'}
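
Comprehensions can also filter with an if clause. A small sketch that reuses list2 from above; the names long_names and lengths are invented here.

```python
list2 = ['apple', 'banana', 'cherry']

# Keep only the strings longer than five characters, uppercased.
long_names = [s.upper() for s in list2 if len(s) > 5]

# Map each string to its length.
lengths = {s: len(s) for s in list2}

print(long_names, lengths)
```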

This was a long chapter.

I love Mike Rugnetta and PBS Idea Channel! 

Some more skimming was possible, as chapter 5 covers numbers and chapter 7 covers strings, and I have been working with them for a while now. Of course, I also discovered new things in Lutz’s very thorough chapters — even though he admits he is not going over everything we can do with numbers or strings in Python.

>>> import math
>>> math.floor(-2.5)   # Python 3; Python 2.7 returns -3.0
-3
>>> math.trunc(-2.5)
-2
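
The difference only shows up with negative numbers: floor rounds toward negative infinity, while trunc (and int) drop the fraction and move toward zero. A Python 3 sketch with made-up variable names:

```python
import math

for v in (-2.5, 2.5):
    print(v, math.floor(v), math.trunc(v), int(v))

floor_neg = math.floor(-2.5)   # rounds down, away from zero here
trunc_neg = math.trunc(-2.5)   # drops the fraction, toward zero
int_neg = int(-2.5)            # same direction as trunc
div_neg = -5 // 2              # floor division rounds down as well
```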
    

I also learned more about sets than I had ever known before. They can be useful for comparisons and for getting rid of duplicate items in a list.
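
A minimal sketch of both uses, with made-up sample data: removing duplicates from a list, and comparing two collections.

```python
names = ['ann', 'bob', 'ann', 'cat', 'bob']
unique = sorted(set(names))      # a set drops duplicates; sorted gives a stable order

a = {1, 2, 3, 4}
b = {3, 4, 5}
in_both = a & b                  # intersection
in_either = a | b                # union
only_in_a = a - b                # difference

print(unique, in_both, in_either, only_in_a)
```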

Chapter 6 sings the praises of dynamic typing in Python, which gives us lots of flexibility — the type of a variable is not fixed or set or even declared. The variable name points to an object, and the object has a type — NOT the variable. Many Python texts refer to this property, and yet I have never before seen such a sensible and clear explanation of what it really is, how it really works.
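
A short sketch of what "the object has a type, not the variable" means in practice. The same name can point at an int and then a string, and two names can point at one and the same list. All names here are made up.

```python
a = 3
first_type = type(a).__name__    # the type belongs to the object 3
a = 'spam'
second_type = type(a).__name__   # same name, now pointing at a string

x = [1, 2, 3]
y = x                            # y is a second name for the same list
y.append(4)
same_object = x is y

print(first_type, second_type, x, same_object)
```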

You can’t CHANGE a string (it is immutable). To achieve a similar effect, you create a new string — “by concatenating, slicing, running formatting expressions, or using a method call like replace — and then assigning the result back to the original variable name.” (The difference is not really detectable in most cases!)
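
Here is that idea as a small sketch: assigning to one character fails, so you build a new string and point the old name at it.

```python
s = 'spam'

try:
    s[0] = 'z'               # strings do not support item assignment
    changed_in_place = True
except TypeError:
    changed_in_place = False

s = 'z' + s[1:]              # make a new string, reuse the name
print(changed_in_place, s)
```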

Immutables: numbers, strings, tuples, frozensets
Mutables: lists, dictionaries, sets, bytearray

Lutz spends an amazingly LONG time talking about format expressions and format method calls and the difference between them. Thanks to Zed Shaw (LPTHW), I am very comfy with format strings (expressions). I was able to follow Lutz’s explication, but it was LONG! (Did I say that already?)

If you are familiar with the format strings (%s, %d, %r, etc.), suffice to say there is a rather different way to do essentially all the same stuff, and it involves use of curly braces — {} — instead.
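
A side-by-side sketch of the two styles producing the same text. The sample values are made up.

```python
name, count = 'eggs', 3

# Format expression: % codes in the string, values in a tuple.
old_style = 'I ate %d %s (%r)' % (count, name, name)

# Format method: curly braces in the string, values as arguments.
new_style = 'I ate {} {} ({!r})'.format(count, name, name)

print(old_style)
print(old_style == new_style)
```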

>>> s = "beam"
>>> s.replace('ea', 'oo')
'boom'
>>> s = s + " crash "
>>> s
'beam crash '
>>> s.rstrip()
'beam crash'
>>> s[5:]    # slice
'crash '
>>> s.upper()
'BEAM CRASH '
>>> s.endswith('ash ')
True
>>> s = "0123456789"
>>> s[::2] # two-step slice
'02468'
>>> s = "bumblebee"
>>> s[::-1] # reverse order slice
'eebelbmub'

    

So … these are not surprising things, but I do love Python!

Oh, and: object.method(arguments) … s.find()

>>> breakfast = "We love to eat green eggs and ham."
>>> result = breakfast.find('eggs')
>>> result
21    # position of 'eggs' in breakfast
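
Two follow-ups on find, as a sketch: the number it returns can be used directly in a slice, and it returns -1 (not an error) when the text is not found. 'toast' is just a made-up search term.

```python
breakfast = "We love to eat green eggs and ham."

start = breakfast.find('eggs')
word = breakfast[start:start + len('eggs')]   # slice the match back out

not_found = breakfast.find('toast')           # -1 means "not there"

print(start, word, not_found)
```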

    

Love, love Python.

Book: Learning Python, 5th edition 

Chapter 4 (pp. 93-132) gives a really good overview of what Python can do, and how it does it. I think if you are familiar with some other programming language(s), like C or C++, you would pretty much think Python is amazing after you read this, and you would want to learn more. On the other hand, if you are a relative newcomer to programming, you would probably be very confused and starting to think this is not a good book for you. (Note: The author says up front that this is not a “how to learn programming” book.)

I fall in the middle of those two types, although decidedly more toward the beginner. Yet I have been working on learning Python for more than a year, so I was not daunted by chapter 4. Also, I learned some new things that made me feel even more excited about Python.

In particular, nesting within both lists and dictionaries. OMG! You can really travel all over a table format, go in and out of JSON or other tabular data formats, change or find anything.
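
A rough sketch of what that looks like: a list of dictionaries, each holding a list, which you can reach into, change, and send through JSON and back. The cities and numbers are made-up sample data.

```python
import json

table = [
    {'city': 'Springfield', 'temps': [70, 75, 72]},
    {'city': 'Shelbyville', 'temps': [60, 65, 63]},
]

first_temp = table[1]['temps'][0]    # row 1, then the key, then the index
table[0]['temps'][-1] = 99           # change a value deep inside

text = json.dumps(table)             # nested lists and dicts become JSON text
restored = json.loads(text)          # and come back as lists and dicts

print(first_temp, restored == table)
```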

I learned that these two are the same, which I thought was pretty cool:

(1)

squares = [x ** 2 for x in [1, 2, 3, 4, 5]]


(2)

squares = []
for x in [1, 2, 3, 4, 5]:
    squares.append(x ** 2)


Book: Learning Python, 5th edition