Reinforcement Learning for Atari using PyTorch


I have recently been playing around with reinforcement learning, training agents to play Atari games like Pong. The agent observes the environment using only the pixels on the screen, similar to the way a human player would observe the game, and takes actions, which are button presses. The agent also receives rewards; for Pong, the reward is +1 when the agent scores a point, and -1 when the opponent scores a point.

Initially, the agent moves randomly and loses almost every time. But occasionally that random movement hits the ball, which either delays the negative reward or even scores a point for a positive reward, reinforcing the preceding actions.

This poses some interesting challenges: the rewards are intermittent, and often come long after the relevant actions. For example, if the agent hits the ball in a good direction, where the opponent cannot respond, it will not receive a reward until many frames later, after taking many other actions.


This was implemented in Python, using OpenAI Gym to simulate the Atari console and handle the agent's observations, actions and rewards. The agent is based on Asynchronous Advantage Actor-Critic (A3C), a state-of-the-art reinforcement learning algorithm. It uses a deep neural network trained to produce two outputs: the actor, a probability distribution from which the next action is sampled, and the critic, an estimate of future reward. An accurate critic allows the actor's performance to be scored after only a short interval of time: perhaps we have not yet scored a point, but the critic can predict that we will score soon. Both outputs share the same underlying neural network, with different final layers. This sharing helps training, because features useful for the actor are also likely to be useful for the critic, and vice versa.

I decided to use PyTorch instead of TensorFlow for this project. TensorFlow is great for the sort of static neural networks used for things like image categorization, but the time dimension makes this problem more dynamic, which PyTorch handles elegantly.

The code is available on GitHub.

Gravity Simulation in Haskell+TensorFlow

After playing around with TensorFlow for machine learning in a recent competition, I got curious about how well TensorFlow might work for other highly parallel problems. I also found out that TensorFlow has Haskell bindings (although they are not nearly as polished or complete as the Python bindings). To test out both of these, I decided to write a simple N-body gravity simulation in Haskell using TensorFlow. Given the initial positions, velocities and masses of several objects, it simulates their movements and interactions.
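The core update is easy to express even without TensorFlow. Here is a minimal plain-Haskell sketch of one Euler integration step; the real project builds the same arithmetic out of TensorFlow ops, and all names here (Body, step, the helpers) are illustrative assumptions, not the project's code.

```haskell
-- One Euler step of an N-body gravity simulation (illustrative sketch).
data Body = Body
  { pos  :: (Double, Double, Double)
  , vel  :: (Double, Double, Double)
  , mass :: Double
  }

gravG :: Double
gravG = 6.674e-11  -- gravitational constant, SI units

-- Acceleration on body b due to all other bodies.
accel :: [Body] -> Body -> (Double, Double, Double)
accel bodies b = foldr addV (0, 0, 0)
  [ scaleV (gravG * mass o / d3) (subV (pos o) (pos b))
  | o <- bodies
  , let d = distV (pos o) (pos b)
  , d > 0                       -- skip the body itself (zero distance)
  , let d3 = d * d * d
  ]

-- Advance every body by time step dt.
step :: Double -> [Body] -> [Body]
step dt bodies =
  [ b { pos = addV (pos b) (scaleV dt v'), vel = v' }
  | b <- bodies
  , let v' = addV (vel b) (scaleV dt (accel bodies b))
  ]

-- Small vector helpers.
addV, subV :: (Double, Double, Double) -> (Double, Double, Double)
           -> (Double, Double, Double)
addV (a, b, c) (x, y, z) = (a + x, b + y, c + z)
subV (a, b, c) (x, y, z) = (a - x, b - y, c - z)

scaleV :: Double -> (Double, Double, Double) -> (Double, Double, Double)
scaleV k (x, y, z) = (k * x, k * y, k * z)

distV :: (Double, Double, Double) -> (Double, Double, Double) -> Double
distV p q = let (x, y, z) = subV p q in sqrt (x * x + y * y + z * z)
```

Expressing this on tensors instead of lists is what makes TensorFlow attractive here: the pairwise accelerations become one batched matrix computation.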

Using data from NASA on the current positions and velocities of various planets and other solar system bodies, I was able to run approximate simulations centuries into the future, and produce plots like the following:

A plot of the orbits of the planets together with Pluto, some asteroids, and a comet over a period of 100 years.

The code is available on GitHub.

Counting Sea Lions

I decided to dive into neural networks by tackling a Kaggle competition. The problem is, given a large aerial image, to count the number of sea lions and, more specifically, the number of adult males, sub-adult males, adult females, juveniles and pups. Pups were especially challenging (try to find the 7 pups in this image):

One small piece of a training image, showing a large adult male in the centre, surrounded by adult females and pups, which are almost invisible against the gray rocks.

The basic approach was to train a deep convolutional neural network to approximate a blurred density function for each of the classes of sea lions. The predicted count for each class is then just the sum of its density function over the entire image.

This was implemented in Python using Keras, with some raw TensorFlow to deal with the density function. The neural network used a pre-trained image categorization network, with the final layer removed (Xception seemed to work the best).

This project was a collaboration with Victor Veitch and Geoffrey Scott. Code is available on GitHub.

3D Printers and Pliers

I have been slowly building a 3D printer based on this design. A friend has a printer of the same design, which I have been using to print the plastic parts for my printer. 

While printing the parts, I ran into a bit of an issue. The plastic filament is pushed through a tube to the print head, where it is melted and deposited to build up the object. The plastic has only a tiny hole to come out of at the print head, so considerable pressure can build up in the tube, and this eventually broke the mechanism that is supposed to secure the tube in place. So, in the middle of a print, the tube comes loose; the motor just keeps spooling out filament, but instead of being pushed through the print head, it coils around until someone notices it. The original mechanism that holds the tube is not 3D printed, and has to be ordered from China, which could take weeks.

The first thing I tried was, of course, duct tape, but the persistent pressure was just too much.

But that's OK, surely I can design a 3D printed part to hold the tube in place! I found a nut that was just the right diameter to securely thread onto the outside of the tube.


Then I designed an arm to hold the nut in place using OpenSCAD, a fun little programming language for designing 3D models.

So I can 3D print a fix for the 3D printer! Which requires a working printer... But I still only had the one, and it wasn't working. Luckily this piece is small, and I was able to get the print time down to about 1 hour. So I sat down with a podcast and some locking pliers to hold it in place, while I printed the brace.


And it worked! Here is the final piece:


And here it is, holding the tube in place.


The STL model and OpenSCAD source (so you can adjust the dimensions) are available on Thingiverse.

PhD Thesis

My PhD thesis, Connectivity, tree-decompositions and unavoidable-minors, was in the field of Graph Theory. I proved that any very large graph is either made by attaching smaller graphs in a particular way, or contains a certain kind of substructure, illustrated here.

For those who are curious, I've included the abstract, and a link to the full thesis.


The results in this thesis are steps toward bridging the gap between the handful of
exact structure theorems known for minor-closed classes of graphs, and the very general,
yet wildly qualitative, Graph Minors Structure Theorem.

This thesis introduces a refinement of the notion of tree-width. Tree-width is a measure
of how “tree-like” a graph is. Essentially, a graph is tree-like if it can be decomposed across
a collection of non-crossing vertex-separations into small pieces. In our variant, which we
call k-tree-width, we require that the vertex-separations each have order at most k.

Tree-width and branch-width are related parameters in a graph, and we introduce a
branch-width-like variant for k-tree-width. We find a dual notion, in terms of tangles, for
our branch-width parameter, and we prove a generalization of Robertson and Seymour’s
Grid Theorem.

Download thesis (PDF)

Terra Mystica box organizers

Terra Mystica has been one of my favorite board games for many years. It has a lot of pieces (a "tree in a box" as Shut Up & Sit Down put it), so set up and clean up are a pain. So I decided it really needed some organizers.

This is built from foam core, cut and glued by hand (so there are, unfortunately, no useful plans to download).

Multi-Output Conduits?

In my last two posts I've been looking at ways to extend the conduit library to create conduits with 2 inputs and conduits with more than 2 inputs. One might wonder, then, about conduits with multiple outputs. I have found that multi-output conduits are much less interesting than multi-input conduits, and I want to explain why.

For multi-input conduits, we have await, lift await, and so on to take input from one input stream and leave the others undisturbed. This lets you say

I need an i1, not an i2, before I go any further.

Closely related to this is a 1-input conduit which takes an Either i1 i2 input. This lets you say

I need either an i1 or an i2, before I go any further.

Now let's consider these two strategies applied to the output side. We can have a nested, two-output conduit and use yield and lift yield. Then yield says

I have an o1, but no o2. You have to take it before I go any further.

If instead we have a 1-output conduit which gives values of type Either o1 o2, we can use yield . Left and yield . Right. Then yield . Left says the same thing again!

I have an o1, but no o2. You have to take it before I go any further.

What's going on here? When we have multiple input streams, we are assuming the streams are independent. If the input streams were coupled together in some way, then taking 1000 values from the first input without taking anything from the second input might cause memory leaks or worse. When we have two different output "streams", they are most certainly not independent. Anything trying to use these values had better take them from the right stream at the right time, or something bad will happen.

If 2-output conduits are allowed, one might naturally want to connect the two outputs of A to the two inputs of B. But this can lead to nonsensical situations, like A yielding on output 1 while B is awaiting on input 2:


So, multiple inputs can be very helpful, and are genuinely more powerful than zipped or Either tagged inputs, but multiple outputs are not worth the trouble. If you just want to send output to multiple sinks, you should use zipSinks.
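For the send-to-several-sinks case, zipSinks runs both sinks against the same stream and pairs their results. A small sketch of its use (note the module housing zipSinks has moved between conduit versions, so the import is an assumption):

```haskell
import Data.Conduit
import qualified Data.Conduit.List as CL
import Data.Conduit.Util (zipSinks)  -- location varies by conduit version

-- Feed one stream to two sinks at once: compute the sum and
-- simultaneously collect the full list of elements.
sumAndCollect :: Monad m => m (Int, [Int])
sumAndCollect = CL.sourceList [1 .. 10] $$ zipSinks (CL.fold (+) 0) CL.consume
```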

(>2)-Input Conduits

Last time I described how to create a conduit that can read from either one of two input streams. Today I want to look at conduits with more than two input streams.

The key idea in my last post was to use "stateful monad morphisms", which can be hoisted into some monad transformers (specifically, the ConduitM i o transformer).

This works great for fusing to the inner input of a 2-input conduit. When I tried a 3-input conduit, however, I realized I had gotten the abstraction a little wrong. I start off with fuseStateful on the inner input, then statefulHoist that to the second layer. But then I'm left with a plain old function, which I can't statefulHoist again to the outer layer. Using plain old transPipe/hoist to get back to the outer conduit is no good, for the same reason it doesn't work for the 2-input conduit: all my careful stateful fusing and hoisting gets reset whenever there is an action in the outer conduit.

Luckily this is easy to fix: I just need fuseStateful to return another StatefulMorph rather than a function:
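A sketch of the corrected shape, reconstructed from the description above (the names follow the previous post, but the exact signatures are my guesses):

```haskell
-- fuseStateful produces a StatefulMorph between the two conduit
-- monads, not a bare function...
fuseStateful :: Monad m
             => Conduit a m b
             -> StatefulMorph (ConduitM b o m) (ConduitM a o m)

-- ...and statefulHoist lifts a StatefulMorph to a StatefulMorph one
-- conduit layer up, so the lifting can be repeated for deeper nestings.
statefulHoist :: (Monad m, Monad n)
              => StatefulMorph m n
              -> StatefulMorph (ConduitM i o m) (ConduitM i o n)
```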

Now I can fuse to conduits of arbitrary depth:
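For example, fusing onto the innermost input of a 3-input conduit might look like the following sketch. The helper applyStatefulMorph, which runs a StatefulMorph once and discards the final state, is a name of my own invention:

```haskell
-- Run a StatefulMorph on a single action, discarding the updated state.
-- (The whole conduit counts as one action here, so this is fine at the
-- top level.)
applyStatefulMorph :: Functor n => StatefulMorph m n -> m a -> n a
applyStatefulMorph sm = fmap snd . runStatefulMorph sm

-- Fuse `left` onto the innermost input of a 3-input conduit.
fuseInner3 :: Monad m
           => Conduit a m b
           -> ConduitM i1 o1 (ConduitM i2 o2 (ConduitM b o3 m)) r
           -> ConduitM i1 o1 (ConduitM i2 o2 (ConduitM a o3 m)) r
fuseInner3 left =
  applyStatefulMorph (statefulHoist (statefulHoist (fuseStateful left)))
```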

I have updated the repository with these changes.

Multi-Input Conduits

I've been working with the Haskell conduit library lately. It is great for building pipelines to process streams of data. The Conduit monad allows you to await for an input value, yield an output value, or perform any action in an underlying monad (e.g. IO actions like writing to a file).

I did, however, run into difficulties when I tried to handle multiple inputs.

First, what does a "conduit with multiple inputs" mean? Well, a conduit with one input can await on that input whenever it needs a value, so a conduit with two inputs should have two await operations: await1 and await2, which respectively get an input value from the first and second input stream.

The solution, inspired by this section of the pipes manual, is to write a conduit whose inner monad is another conduit, e.g.
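The type might look like the following sketch (the type variable names are my own):

```haskell
import Data.Conduit

-- A 2-input conduit: a conduit whose underlying monad is itself another
-- conduit. Operations on the outer level see the i1 stream; lifted
-- operations see the i2 stream.
type TwoInputConduit i1 i2 o o' m r = ConduitM i1 o (ConduitM i2 o' m) r
```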

Then await1 and await2 become
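Following the description above, these are just await on the outer conduit and a lifted await on the inner one (a sketch; the type signatures are my reading of the nesting):

```haskell
import Control.Monad.Trans.Class (lift)
import Data.Conduit

-- Read from the first (outer) input stream.
await1 :: Monad m => ConduitM i1 o (ConduitM i2 o' m) (Maybe i1)
await1 = await

-- Read from the second (inner, lifted) input stream.
await2 :: Monad m => ConduitM i1 o (ConduitM i2 o' m) (Maybe i2)
await2 = lift await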

This can be connected to two sources and a sink like this:
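Concretely, the wiring might look like this sketch, in which merged acts as a sink at its outer level (all of its output is yielded through the inner conduit with lift . yield); the function name and this Sink specialization are assumptions:

```haskell
import Data.Conduit

-- Connect source1 to merged's outer input; the result is itself a
-- conduit in the inner monad, which source2 and sink then connect to.
run :: Monad m
    => Source m i1                    -- first input stream
    -> Source m i2                    -- second input stream
    -> Sink i1 (ConduitM i2 o m) ()   -- the 2-input conduit
    -> Sink o m r                     -- consumes the merged output
    -> m r
run source1 source2 merged sink =
  source2 $$ (source1 $$ merged) =$ sink
```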

This works because source1 $$ merged returns a value in the underlying monad, which is ConduitM i2 o m, which the outer ($$) and (=$) can then connect to.

This is a good start. But I also wanted to transform the incoming streams by fusing (=$=), and this is where things got complicated.

Fusing the outer stream is easy: plain old (=$=) works. But how do I fuse the inner stream? First let's figure out the type of the function I want. It should take a conduit transforming values of some type i2' into i2 (so they can be consumed by the inner input), together with a 2-input conduit as described above. It should produce a 2-input conduit with the same outer input type (i1) but a new inner input type (i2'). So I want a function fuseInner with the following signature:
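Spelled out as a sketch (primes marking the new inner input type):

```haskell
-- Fuse `left` onto the inner input: same outer input i1, new inner
-- input i2'.
fuseInner :: Monad m
          => Conduit i2' m i2
          -> ConduitM i1 o (ConduitM i2  o' m) r
          -> ConduitM i1 o (ConduitM i2' o' m) r
```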

This signature shows why it's hard to write this function: the underlying monad of the outer conduit is changing. As far as I could find, there is only one function in the conduit library that changes the underlying monad: transPipe. transPipe takes a monad morphism from the underlying monad of the conduit to some other monad, and a conduit, and produces a new conduit in the new monad. And (left =$=) appears to give the monad morphism that we need:
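The attempted definition is then a one-liner; this is the version that type-checks but, as explained next, misbehaves:

```haskell
-- (left =$=) :: ConduitM i2 o' m a -> ConduitM i2' o' m a
-- looks like a monad morphism between the two inner conduit monads.
fuseInner left = transPipe (left =$=)
```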

This compiles just fine, but it doesn't do what we want. Every time the old conduit performs an action in the underlying monad, transPipe applies our monad morphism (left =$=) to get an action in the new underlying monad. So left is fused separately to each action in the inner conduit. If left is stateful (e.g. isolate), this fails completely, since the state is reset on every lift await. You might hope for some "better" implementation of transPipe that avoids this, but if you think about it, there isn't really any other option: there are multiple, separate actions in the first monad that you want to turn into multiple, separate actions in the second monad, so each one needs to be transformed separately.

Morally, we want (left =$=) to be a kind of "stateful transformation", in which each call fuses to the part of left that was remaining after the last call. This clearly isn't a (pure) function, but we can represent it with the following data structure
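One possible rendering of that data structure (the field name is my choice):

```haskell
{-# LANGUAGE RankNTypes #-}

-- A stateful monad morphism: transforming an action in m returns the
-- result in n, together with the morphism to use for the next action.
newtype StatefulMorph m n = StatefulMorph
  { runStatefulMorph :: forall a. m a -> n (StatefulMorph m n, a) }
```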

To use such a stateful transformation, we replace transPipe (a special case of hoist) with the following:
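A sketch of that replacement, written as a class so it can be defined per transformer:

```haskell
-- Like hoist/transPipe, but threading a StatefulMorph through the
-- transformer so that state carries over from one action to the next.
class StatefulHoist t where
  statefulHoist :: (Monad m, Monad n)
                => StatefulMorph m n -> t m r -> t n r
```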

It is possible to write the instance StatefulHoist (ConduitM i o) by only slightly changing the code for transPipe. A replacement for (=$=) which uses a StatefulMorph can be written by slightly adjusting the code for the internal function pipeL. Its signature is
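That signature, as a sketch (the implementation, adapted from pipeL, carries the not-yet-consumed remainder of the left conduit along as the morphism's state):

```haskell
-- A stateful (=$=), packaged as a StatefulMorph between the two
-- conduit monads.
fuseStateful :: Monad m
             => Conduit a m b
             -> StatefulMorph (ConduitM b o m) (ConduitM a o m)
```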

Putting it all together, we can now write fuseInner
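Under the assumptions above, fuseInner is then just the composition of the two pieces:

```haskell
-- Assumes the StatefulHoist instance for ConduitM i o described above.
fuseInner :: Monad m
          => Conduit i2' m i2
          -> ConduitM i1 o (ConduitM i2  o' m) r
          -> ConduitM i1 o (ConduitM i2' o' m) r
fuseInner left = statefulHoist (fuseStateful left)
```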

I've posted the full implementation here, along with a test to demonstrate it really does what I say it does.

I have searched far and wide for a better way to do this, and would be very interested in any suggestions.

Another aspect of this approach that I haven't touched on is multiple outputs. When we have nested conduits, we not only have two possible input actions (await and lift await) we also have two possible output actions: yield and lift yield. What possible fun could be had with conduits with multiple inputs and multiple outputs? I will take a look at that in a later post.