Reflecting on Reflection

December 21, 2019

The other day I was working on Day 4 of Advent of Code. I was pretty delighted with myself for solving this one in a handful of minutes when days 2 and 3 took significantly longer. Once I had my code returning the correct value for all the sample inputs, I went ahead and ran it on the main input.

“Elapsed time: 9002.5386 msecs”

Nine seconds. My computer crunched numbers for 9 whole seconds before it spit out the answer.

I didn’t like that. I’ll be the first to admit that I’m no Clojure expert, but my solution wasn’t inelegant.

(defn get-digits
  "Transforms a string or a number into a list of individual digits"
  [number] (map #(Character/digit % 10) (str number)))

(defn valid-number? [number]
  (let [digits (get-digits number)]
    (and (not (apply distinct? digits))
         (apply <= digits))))

(defn part1 [[start end]]
  (count (filter valid-number? (range start (inc end)))))

(defn -main []
  (time (println "Day 04, part 1:" (part1 [402328 864247]))))

Nice and short! And yet, when run, my computer ground its gears for nine whole seconds (on an Intel i7-8700K, no less).

Well, at least I had access to part 2 of the problem now, and it looks like part 2 needed access to the same range of integers. Maybe generating that giant list of integers was adding a lot of time? I decided to exclude it from the timing calculations:

(defn part1 [numbers]
  (count (filter valid-number? numbers)))

(defn -main []
  (let [[start end] [402328 864247]
        numbers (range start (inc end))]
    (time (println "Day 04, part 1:" (part1 numbers)))))

“Elapsed time: 9057.4244 msecs”

Wait, it’s slower? But… oh yeah, it’s not actually generating all those numbers at once. From the range page on clojuredocs.org:

Returns a lazy seq of nums from start (inclusive) to end

Because range is lazy, it’s not actually building a list of numbers before we start timing the solution to our problem.

At this point, I was curious to see how much faster it would be to run our code on metal rather than on the JVM, so I compiled the code using the GraalVM native-image tool, and ran it again:

$ ./target/day04
Exception in thread "main" java.lang.ClassNotFoundException: java.lang.Character
        at com.oracle.svm.core.hub.ClassForNameSupport.forName(ClassForNameSupport.java:60)
        at java.lang.Class.forName(DynamicHub.java:1197)
        at clojure.lang.RT.classForName(RT.java:2211)
        at clojure.lang.RT.classForName(RT.java:2220)
        at advent_2019.day99$get_digits$fn__475.invoke(day99.clj:6)
        at clojure.core$map$fn__5866.invoke(core.clj:2755)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:51)
        at clojure.lang.RT.seq(RT.java:535)
        ...

A stack trace? At this point, I knew something in my code was to blame, and I took to the internet to see if I could find a clue.

Many of the search results turning up with Clojure and native-image troubleshooting tips mentioned adding a check for warn-on-reflection, so I thought it would be a good idea to turn on and try to compile my code again.

Reflection warning, advent_2019/day04.clj:6:18 - call to static method digit on java.lang.Character can’t be resolved (argument types: unknown, long).

Whoa! Now we’re on to something! Apparently my code was relying on reflection to determine function argument types.

But wait a minute — Clojure is dynamically typed. How else are you going to determine types? Is there a way to not rely on reflection in Clojure?

As a matter of fact, there is! It turns out that Clojure supports type hinting, which was news to me! From the official Clojure reference:

Clojure supports the use of type hints to assist the compiler in avoiding reflection in performance-critical areas of code.

Immediately followed by a very helpful example:

(defn len [x]
  (.length x))

(defn len2 [^String x]
  (.length x))

user=> (time (reduce + (map len (repeat 1000000 "asdf"))))
"Elapsed time: 3007.198 msecs"
4000000
user=> (time (reduce + (map len2 (repeat 1000000 "asdf"))))
"Elapsed time: 308.045 msecs"
4000000

Wait a minute… an order of magnitude faster?? And all I need to do is add some type hints?

Let’s refactor our code a little bit:

Before

(defn get-digits
  "Transforms a string or a number into a list of individual digits"
  [number] (map #(Character/digit % 10) (str number)))

After

(defn get-digit [^Character x] (Character/digit x 10))

(defn get-digits
  "Transforms a string or a number into a list of individual digits"
  [^Integer number] (map get-digit (str number)))

Having made the change, let’s run our code again:

“Elapsed time: 517.1315 msecs”

Holy smokes! From 9.05s down to 0.52s! That’s over 17x faster!

Turns out that, in this case, the reflection was really expensive. The get-digits function is critical to our solution, and is called thousands of times. By “avoiding reflection in performance-critical areas of our code,” we saw a pretty significant improvement in run time!

Since making this discovery, I have added the following to my project.clj file:

:global-vars {*warn-on-reflection* true}

Now, the compiler (and my REPL of choice as well!) will bark at me every time I make ambiguous typing choices. Doing so allowed me to find type ambiguities in my Day 02 and 03 solutions as well!

In the end, with a little type hinting, it turns out my solution wasn’t so bad after all 😊

…but we can still make it faster. Stay tuned for next time!