art taylor

 
« Back to blog

Adventures in Haskell optimization

Phil-1
After getting some grief about the tools used in my last foray into graphing philosophical heritage[1] I decided to re-implement the influence graph in Haskell.  How hard could it be?  I've written a lot of "scripts" with Haskell, along with about half of my Project Euler problems, although I haven't touched those in a couple years.

OMG.  Seven lines of admittedly contortionist Mathematica turned into about 170 lines of contortionist Haskell, but at this point I could cut it in half (at least) by writing two monads.  Compared to Mathematica, Haskell is a bit slower for the same operations, but the "implicit" threading enabled by the type system helps even the score, snarfing up to six cores at once.

I did a simple pull of a network of philosophers (no .dot generation at that point, just displaying it with show) and Haskell's laziness hit me harder than it ever has before.  I had a huge number of thunks carrying around copies of large Maps and Wikipedia pages, and it was leaking memory like a thing that has a lot of holes in it and cannot hold water, but instead of water, it holds memory.  But not very well at all.  

Because ghc still can't build 64-bit executables on Snow Leopard, I ran up to about 2-3GB and then the program would barf.  I tried to find a way to get ghc to emit x86_64 code on 10.6, but I was stymied.  I had to make do in the cramped quarters of 32-bit addressing.  This was one area where Mathematica spanked it, and while 32- vs 64-bitness doesn't completely justify the cost difference between the "home" and "professional" versions of Mathematica 7, it does make me happy to have my aging copy of Mathematica 6.  Not gc'ing 97% of the time makes a big difference.

Because of ghc's memoization of let bindings and thunks, It became faster for me to retrieve a Wikipedia page for every piece of information I needed rather than to bind it to a variable (which I then persisted in a local file for subsequent requests).  This is because my hackish approach was to grab the name, "influences", and "influenced by" chunks from the page, and then recur up and down to grab the same information for the afferent and efferent influence lists until I hit bottom.  I recurred with proper tail calls, but I'm not sure if ghc eliminates them -- I haven't looked at the core output yet.  I wasn't about to code an iterative version.

OK, that's exactly what I ended up doing, but I used map and lazy node evaluation, so it doesn't really count.  Map is still functional!

Some minor issues with UTF-8 string formatting (thank you Codec.Binary.UTF8.String and decodeString) later, I could generate a monochrome version of the graph that I was able to export from Mathematica.  It doesn't import well into OmniGraffle Pro because OGP doesn't support penwidth for edges in dotfiles (*sigh*), but the layout is pretty fast.  I also don't have the REPL ability to get shortest paths as I could with Mathematica, but I suspect I could have done it if I could stomach using ghci.  Then again, calling ghci a REPL is stretching the definition, at least on the Mac.

When it came to emitting the dotfile, I ended up being lazy about scaling nodes based on relative influence.  I had planned an exponential moving value based on the upstream and downstream influence, so someone with no influences but who influenced a lot of people, who influenced a lot of people. who influenced a lot of people, until reaching bottom. (see where those thunks came from?) would be really large, requiring logarithmic scaling.  I wimped out at midnight due to some cycles in the directed graph, and just did a simple ratio of immediate influences and influenzas with a telomeric recursion.  If I dig this up on another night, I might finish it up, but it's already exceeding my attention span.  I figure I'll put it aside and stumble upon it in six months and wonder what I was smoking.

My last step was to apply coloring to the graph to help trace the lines.  I did a simple CLUT based on the "net clout" mentioned above.

Anyway, enjoy the ugly picture.  Mathematica was in no way harmed in the production.

Click here to download:
phil-1.pdf (1.23 MB)
(download)

NB: To be fair, I am the worst Haskell programmer in the world.  At least, the worst Haskell programmer who can get anything to compile.  The nice thing about Haskell is that if it compiles, it will come close to doing what the programmer intended, delivering on the promise of "if it builds, ship it!"  That doesn't mean it's right for me to perpetrate disasters like this:

producePhilosopherDescriptions :: Philosopher -> [String]
producePhilosopherDescriptions phil = map (\k -> "\"" ++ k ++ "\" [label = \"" ++ k ++ "\" height=" ++ (show $ philosopherInfluenceScore phil 1 k) ++ " width=" ++ (show $ philosopherInfluenceScore phil 1 k) ++  " fontsize=" ++ (show $ 10 * (philosopherInfluenceScore phil 1 k)) ++ "];") $ Map.keys $ nameToURLMap phil

[1] "Who has a copy of Mathematica lying around‽"

Comments (2)

Sep 11, 2010
applicative said...
Are you using the build of ghc (6.12 at the moment) that comes with the Haskell Platform OS X binary installer? Unlike it, the MacPorts ghc (6.10 at the moment) is 64 bit (I haven't been using it myself, since I wanted some feature of 6.12, I can't remember which.) You should post the module, it sounds like fun.
Sep 11, 2010
Chris K said...
No. GHC on OS X cannot create a 64-bit application.

Leave a comment...