Tea & Tech (🍵)

Clojure on Metal

December 21, 2019

Once a year, when December rolls around, we dust off our Clojure skills for some fun Advent of Code programming challenges. For a week or so, anyway 🙂

For me, though, solving the problems is just an excuse to become a better engineer. Sure, you get that dopamine hit when you solve the problem, but the real rush comes when you get your correct solution running FAST.

It’s not uncommon for me to spend twice as long “cleaning up” my solution as it took to solve the problem. The results of these efforts are almost always blog-worthy, and I have some fun stories to share with you this year!

The first such adventure happened on the very first problem: Advent of Code 2019: Day 1.

If you have any interest in solving Advent of Code problems yourself, please be advised that this post contains spoilers! But only for day 1. Which I’m sure you could solve very quickly. Go on, then! I promise this post will be here when you get back.

The Problem: Advent of Code 2019: Day 1

I solved this problem using the map-reduce paradigm:

  • Map over the input to calculate the fuel requirements
  • Reduce all the requirements into a single sum

In Clojure, the function to calculate the fuel requirements looks like:

(defn calc-fuel [mass]
  (-> mass (/ 3) (Math/floor) (int) (- 2)))

Take the mass, divide it by three, round down, cast it to an int, and subtract 2. And really this was just an excuse to test out the syntax highlighting in the blog.
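Here is a minimal sketch of how the map and reduce steps might fit together with calc-fuel. The body of part1 below is illustrative and may differ from the version on GitHub; the sample masses are the ones given in the puzzle statement.

```clojure
(defn calc-fuel [mass]
  (-> mass (/ 3) (Math/floor) (int) (- 2)))

;; Illustrative only: map each mass to its fuel requirement,
;; then reduce the requirements into a single sum.
(defn part1 [masses]
  (reduce + (map calc-fuel masses)))

;; Sample masses from the puzzle statement: 2 + 2 + 654 + 33583
(part1 [12 14 1969 100756]) ; => 34241
```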

The full solution can be found up on GitHub.

Running

Initially, I used the lein run command to run the solution:

$ lein trampoline run -m advent-2019.day01

The trampoline task lets Leiningen’s own JVM exit before the program starts, which trims some overhead. From the lein docs:

For long-running lein run processes, you may wish to save memory with the higher-order trampoline task, which allows the Leiningen JVM process to exit before launching your project’s JVM.

On my computer, day01 takes about 1.5s to run:

$ time lein trampoline run -m advent-2019.day01
Day 01, Part 1: 3337766
Day 01, Part 2: 5003788

real    0m1.571s
...

This is pretty slow for what the program does, but the above command takes the time to compile our code before running it. If we compile our code ahead of time, we don’t have to sit through the compilation!

Running Faster

Compiling our solution to a JAR means that the time it takes to run no longer includes the time to compile the code:

$ lein with-profile day01 uberjar

Compiling advent-2019.core
Compiling advent-2019.day01
Created /[...]/advent-2019/target/advent2019-0.1.0-SNAPSHOT.jar
Created /[...]/advent-2019/target/advent2019-day01.jar
$ time java -jar ./target/advent2019-day01.jar
Day 01, Part 1: 3337766
Day 01, Part 2: 5003788

real    0m0.519s
...

You can see we’ve shaved about 1 second off, reducing our execution time by about 67%.
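For reference, a Leiningen profile along these lines could produce a JAR named like the one above. This is a hypothetical sketch, not the project’s actual project.clj: the Clojure version and the exact set of keys are assumptions, and only the project name, version, namespace, and JAR name are taken from the output shown.

```clojure
;; Hypothetical project.clj sketch; the real file may differ.
(defproject advent2019 "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.1"]] ; assumed version
  :profiles
  {:day01 {:main advent-2019.day01               ; namespace that defines -main and uses (:gen-class)
           :aot :all                             ; compile namespaces ahead of time
           :uberjar-name "advent2019-day01.jar"}})
```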

But we can do better. We need more speed.

Running ON METAL

Having followed GraalVM’s development on and off for a couple of years, I really wanted to see what it could do for Clojure code. I’ve always been of the mindset that, in Clojure, you have to sacrifice runtime speed in exchange for ease of development and high-level thinking. I was hopeful that, by compiling Java bytecode down to machine code, we’d be able to see substantial gains in both execution time and memory consumption.

So I set out to see if we could run this program on “the metal.” If we could go fast.

For those of you following along at home:

First, follow BrunoBonacci’s excellent instructions to get GraalVM installed and on your path to replace your default Java installation.

If GraalVM is installed correctly, you should see:

$ java -version
openjdk version "11.0.5" 2019-10-15
OpenJDK Runtime Environment (build 11.0.5+10-jvmci-19.3-b05-LTS)
OpenJDK 64-Bit GraalVM CE 19.3.0 (build 11.0.5+10-jvmci-19.3-b05-LTS, mixed mode, sharing)

Then, we’ll rebuild our JAR using GraalVM, and compile the new JAR down to machine code:

$ lein do clean, with-profile day01 uberjar
$ native-image --report-unsupported-elements-at-runtime --initialize-at-build-time -jar ./target/advent2019-day01.jar -H:Name=./target/day01

We’re on the metal now! Our executable is over twice as large as the original JAR:

$ du -sh ./target/*
4.9M ./target/advent2019-day01.jar
10M  ./target/day01

…but it contains all of the Java and Clojure we need to run our program (in addition to the program itself) without use of the JVM. Large binaries are the tradeoff we make for superior performance and memory utilization.

It is time. We measure the speed:

$ time ./target/day01
Day 01, Part 1: 3337766
Day 01, Part 2: 5003788

real    0m0.004s
...

Our program is now two whole orders of magnitude faster than running a JAR on the JVM (about 130 times as fast). Compared with running our program using lein run, it is about 393 times as fast, or roughly 39,000% faster.
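The arithmetic behind these comparisons is easy to check at the REPL. A quick sketch, using the "real" timings reported above (speedup and percent-reduction are helper names made up for this illustration):

```clojure
(defn speedup
  "How many times as fast the `fast` run is compared to the `slow` run."
  [slow fast]
  (/ slow fast))

(defn percent-reduction
  "Percentage of the original run time that was eliminated."
  [before after]
  (* 100.0 (/ (- before after) before)))

(speedup 0.519 0.004)           ; JAR vs. native image: roughly 130x
(speedup 1.571 0.004)           ; lein run vs. native image: roughly 393x
(percent-reduction 1.571 0.519) ; lein run vs. JAR: roughly 67%
```

Keep in mind that these are single runs, and 0.004s is close to the resolution of the measurement, so the ratios are ballpark figures.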

Memory Utilization

Instead of making vague hand-wavy claims about memory utilization, let’s run the numbers using /usr/bin/time -v, which is a different program from Bash’s built-in time. We are interested in the “Maximum resident set size”, but I have included the other timing again because it’s just so interesting.

# JAVA
$ /usr/bin/time -v java -jar ./target/advent2019-day01.jar
    ...
    Command being timed: "java -jar ./target/advent2019-day01.jar"
    User time (seconds): 1.41
    System time (seconds): 0.21
    Percent of CPU this job got: 263%
    ...
    Maximum resident set size (kbytes): 319272
  ...

# BARE METAL
$ /usr/bin/time -v ./target/day01
    ...
    Command being timed: "./target/day01"
    User time (seconds): 0.00
    System time (seconds): 0.00
    Percent of CPU this job got: 33%
    ...
    Maximum resident set size (kbytes): 9916
    ...

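Looks like we have a one-two-three punch when it comes to bare metal performance:

  • Roughly one OOM (order of magnitude) better CPU utilization: 263% vs. 33%
  • Between one and two OOM better memory utilization: 319,272 KB vs. 9,916 KB, about 32x
  • Two OOM better real time elapsed: 0.519s vs. 0.004s, about 130x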
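To sanity-check those bullets, the number of orders of magnitude can be computed as the base-10 logarithm of each ratio. A quick sketch using the measurements above (orders-of-magnitude is a helper name made up for this illustration):

```clojure
(defn orders-of-magnitude
  "Base-10 logarithm of the ratio between a worse and a better measurement."
  [worse better]
  (Math/log10 (/ (double worse) better)))

(orders-of-magnitude 263 33)      ; CPU utilization: about 0.9
(orders-of-magnitude 319272 9916) ; max resident set size: about 1.5
(orders-of-magnitude 0.519 0.004) ; real time, JAR vs. native: about 2.1
```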

Gotta go Fast

I cannot take credit for this image.

OK, but what does Clojure think?

Let’s be honest: we’re mostly measuring things around program execution, rather than the time the computer spends actually performing calculations. Most, if not all, of the speedups we measured above come from the fact that we’re not using the JVM anymore, and don’t need to wait for it to warm up, read bytecode, and spit out an answer.

So let’s see what Clojure thinks.

Clojure provides a time macro for us. It measures how long it takes to evaluate an expression and prints the elapsed time. It’s pretty easy to use:

Before

(defn -main []
  (let [input (get-input "day01.txt" true)]
    (println "Day 01, Part 1:" (part1 input))
    (println "Day 01, Part 2:" (part2 input))))

After

(defn -main []
  (let [input (get-input "day01.txt" true)]
    (time (println "Day 01, Part 1:" (part1 input)))
    (time (println "Day 01, Part 2:" (part2 input)))))
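Under the hood, time records System/nanoTime before and after evaluating the expression and prints the difference in milliseconds. A rough sketch of the same idea as a plain function, in case you want the number back as data instead of printed (timed is a hypothetical helper, not part of clojure.core):

```clojure
(defn timed
  "Calls the zero-argument function f and returns its result together
  with the elapsed wall-clock time in milliseconds."
  [f]
  (let [start  (System/nanoTime)
        result (f)
        msecs  (/ (double (- (System/nanoTime) start)) 1000000.0)]
    {:result result :msecs msecs}))

;; Returns a map such as {:result 499500, :msecs ...}; the timing varies per run.
(timed #(reduce + (range 1000)))
```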

When we recompile our program and run it with Java, we get:

$ java -jar ./target/advent2019-day01.jar
Day 01, Part 1: 3337766
"Elapsed time: 5.3791 msecs"
Day 01, Part 2: 5003788
"Elapsed time: 9.0699 msecs"

When we run it on bare metal, we get:

$ ./target/day01
Day 01, Part 1: 3337766
"Elapsed time: 0.4455 msecs"
Day 01, Part 2: 5003788
"Elapsed time: 0.5274 msecs"

Oh. So it’s still an order of magnitude faster than Java, even forsaking all the advantages we get for not using the JVM.

Nice.

Conclusion

It took a bit of time and research getting compilation working, but I was excited to finally see if it made a difference for running Clojure. Man oh man, what a difference it made.

As a result of running these experiments, I’ve had to change my mindset about Clojure. There are still many compelling arguments online for using a language like Rust if you need ultra-fast performance on the metal, but for the vast majority of applications, I feel that Clojure no longer requires that one compromise performance in exchange for all its other amazing benefits.

Basically:

  • It is faster to develop software using Clojure
  • It is faster and more memory-efficient to run (compiled) Clojure than it is to run JARs on the JVM
  • Clojure suffers from far less entropy and churn than the JavaScript and Python ecosystems, which further reduces development costs
  • But, like our large binary sizes, there’s a high upfront cost to learning Clojure

As engineers, learning is part of the job. I’m of the opinion that taking the time to learn while off the clock is a much more efficient way to become a better developer, because you’re in charge of how you spend your time.

Learning Clojure is probably one of the best things you can do to become a better developer, even if you don’t end up using it in your day job. Once you’ve made the decision to learn it, there really aren’t any downsides left! 😄


Andrew J. Pierce collects Yixing teapots and lives in Virginia with his wife, son, and Ziggy the cat. You can follow him on Twitter.

© 2020 Andrew J. Pierce