2.5.4. More useful Python constructs

To demonstrate some of these constructs, let’s start by loading some data to work on. As before, let’s work on the 10,000 quadratic equations we defined before, and load this data from JSON.

data = json.load(open('/dev/stdin', 'r'))

We read the JSON data from stdin, meaning that we can define the input file on the command line like this:

$ python2 useful.py < test.json

In the following, we assume the data is stored in a list of lists, i.e.:

[
   [
      6.44887349885013,
      -6.80408794830961,
      5.87205564819696
   ],
   [
      0.839489029433732,
      -8.86306739443021,
      -1.69246328305772
...

2.5.4.1. List comprehensions

List comprehensions are a shorthand notation for doing things with lists. They spare you at times from having to write tedious for loops.

bs = [l[1] for l in data]

This line fetches the second column of our data, i.e. all the ‘b’ values for our quadratic equations (hence the variable name, ‘b’s). What it means is:

“For each data point (l) in our input list (data), take the second element.”
print "Number of b's: " + str(len(bs)) # prints 10,000

The variable bs is a list of 10,000 numbers. With “len(bs)” we can obtain the length of the list. (With “str()” we can convert data to a string for output purposes.)

We can now calculate the average of all ‘b’s:

b_avg = sum(bs) / float(len(bs))

(We need to convert the denominator to float to ensure the result is also float.)

In other words, while for delimiter separated data we could write a one-liner in awk to calculate the average, for JSON we’ll need to write about three lines of Python. On the other hand JSON is in general more powerful than delimiter separated data.

2.5.4.2. Slices

first_hundred = data[:100] # len(first_hundred) is 100

Here, with “[:100]” we say “slice the list such that only the first hundred elements are included”.

You can also slice the end of the list by using negative index.

2.5.4.3. Sorting

Python has sorting built in:

print sorted(data, reverse=True)[:3]

This line introduces a few concepts:

  • By calling the function sorted(data), we can obtain a sorted copy of the data.
  • sorted() takes an optional parameter “reverse” which we set to True. This means that the returned list will be sorted in descending order instead of ascending.

2.5.4.4. Useful built in functions

“sum” sums up all the numbers in a list:

sum_of_all_numbers = sum([sum(l) for l in data])

“all” returns True if all values are True in a list, and False otherwise. (“any” return True if any element in a list is True.)

def all_numbers_in_list_are_above_zero(l):
    return all([n > 0 for n in l])

List comprehensions also support filtering, i.e. applying a function to only include certain elements in the resulting list:

above_zero_equations = [f for f in data if all_numbers_in_list_above_zero(f)]

Exercise: For your data set, find the quadratic equations where the sum of its values (a, b and c) is the highest.

Exercise: For your data set, find the quadratic equations where the sum of its values (a, b and c) is the highest and all values a, b and c are negative.

2.5.4.5. Tuples

Tuples in Python are quite similar to lists, but the size of a tuple is fixed. This means that while you can remove or add elements to a list in Python, this isn’t possible with tuples, making tuples a nice way to implicitly document a fixed nature of some of your data.

Tuples can be created using parentheses:

my_tuple = ('a', 1)

They can be accessed like you access lists:

my_tuple[0] # returns 'a'
my_tuple[1] # returns 1

They can be used like any other variable, e.g. added in lists:

my_list = [('a', 1), ('b', 2)]
print my_list[1][0] # first element of the second element in the list, i.e. 'b'

A fun function is “zip” - which zips two given lists to one list of tuples:

list1 = ['a', 'b', 'c']
list2 = [1, 2, 3]

zipped_list = zip(list1, list2)
print zipped_list # prints [('a', 1), ('b', 2), ('c', 3)]