Tea & Tech (🍵)

Reflecting on Reflection

December 21, 2019

The other day I was working on Day 4 of Advent of Code. I was pretty delighted with myself for solving this one in a handful of minutes when days 2 and 3 took significantly longer. Once I had my code returning the correct value for all the sample inputs, I went ahead and ran it on the main input.

“Elapsed time: 9002.5386 msecs”

Nine seconds. My computer crunched numbers for 9 whole seconds before it spit out the answer.

I didn’t like that. I’ll be the first to admit that I’m no Clojure expert, but my solution wasn’t inelegant.

(defn get-digits
  "Transforms a string or a number into a list of individual digits"
  [number] (map #(Character/digit % 10) (str number)))

(defn valid-number? [number]
  (let [digits (get-digits number)]
    (and (not (apply distinct? digits))
         (apply <= digits))))

(defn part1 [[start end]]
  (count (filter valid-number? (range start (inc end)))))

(defn -main []
  (time (println "Day 04, part 1:" (part1 [402328 864247]))))

Nice and short! And yet, when run, my computer ground its gears for nine whole seconds (on an Intel i7-8700K, no less).

Well, at least I had access to part 2 of the problem now, and it looks like part 2 needed access to the same range of integers. Maybe generating that giant list of integers was adding a lot of time? I decided to exclude it from the timing calculations:

(defn part1 [numbers]
  (count (filter valid-number? numbers)))

(defn -main []
  (let [[start end] [402328 864247]
        numbers (range start (inc end))]
    (time (println "Day 04, part 1:" (part1 numbers)))))

“Elapsed time: 9057.4244 msecs”

Wait, it’s slower? But… oh yeah, it’s not actually generating all those numbers at once. From the range page on clojuredocs.org:

Returns a lazy seq of nums from start (inclusive) to end

Because range is lazy, it’s not actually building a list of numbers before we start timing the solution to our problem.

At this point, I was curious to see how much faster it would be to run our code on metal rather than on the JVM, so I compiled the code using the GraalVM native-image tool, and ran it again:

$ ./target/day04
Exception in thread "main" java.lang.ClassNotFoundException: java.lang.Character
        at com.oracle.svm.core.hub.ClassForNameSupport.forName(ClassForNameSupport.java:60)
        at java.lang.Class.forName(DynamicHub.java:1197)
        at clojure.lang.RT.classForName(RT.java:2211)
        at clojure.lang.RT.classForName(RT.java:2220)
        at advent_2019.day99$get_digits$fn__475.invoke(day99.clj:6)
        at clojure.core$map$fn__5866.invoke(core.clj:2755)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:51)
        at clojure.lang.RT.seq(RT.java:535)
        ...

A stack trace? At this point, I knew something in my code was to blame, and I took to the internet to see if I could find a clue.

Many of the search results turning up with Clojure and native-image troubleshooting tips mentioned adding a check for warn-on-reflection, so I thought it would be a good idea to turn on and try to compile my code again.

Reflection warning, advent_2019/day04.clj:6:18 - call to static method digit on java.lang.Character can’t be resolved (argument types: unknown, long).

Whoa! Now we’re on to something! Apparently my code was relying on reflection to determine function argument types.

But wait a minute — Clojure is dynamically typed. How else are you going to determine types? Is there a way to not rely on reflection in Clojure?

As a matter of fact, there is! It turns out that Clojure supports type hinting, which was news to me! From the official Clojure reference:

Clojure supports the use of type hints to assist the compiler in avoiding reflection in performance-critical areas of code.

Immediately followed by a very helpful example:

(defn len [x]
  (.length x))

(defn len2 [^String x]
  (.length x))

user=> (time (reduce + (map len (repeat 1000000 "asdf"))))
"Elapsed time: 3007.198 msecs"
4000000
user=> (time (reduce + (map len2 (repeat 1000000 "asdf"))))
"Elapsed time: 308.045 msecs"
4000000

Wait a minute… an order of magnitude faster?? And all I need to do is add some type hints?

Let’s refactor our code a little bit:

Before

(defn get-digits
  "Transforms a string or a number into a list of individual digits"
  [number] (map #(Character/digit % 10) (str number)))

After

(defn get-digit [^Character x] (Character/digit x 10))

(defn get-digits
  "Transforms a string or a number into a list of individual digits"
  [^Integer number] (map get-digit (str number)))

Having made the change, let’s run our code again:

“Elapsed time: 517.1315 msecs”

Holy smokes! From 9.05s down to 0.52s! That’s over 17x faster!

Turns out that, in this case, the reflection was really expensive. The get-digits function is critical to our solution, and is called thousands of times. By “avoiding reflection in performance-critical areas of our code,” we saw a pretty significant improvement in run time!

Since making this discovery, I have added the following to my project.clj file:

:global-vars {*warn-on-reflection* true}

Now, the compiler (and my REPL of choice as well!) will bark at me every time I make ambiguous typing choices. Doing so allowed me to find type ambiguities in my Day 02 and 03 solutions as well!

In the end, with a little type hinting, it turns out my solution wasn’t so bad after all 😊

…but we can still make it faster. Stay tuned for next time!


Andrew J. Pierce collects Yixing teapots and lives in Virginia with his wife, son, and Ziggy the cat. You can follow him on Twitter.