wtf is Clojure inlining anyway


If you spend time digging around clojure.core, as I find myself doing from time to time, you might come across an interesting metadata keyword called :inline. I have never seen :inline in any non-core library in my career, so I've been digging around to figure out the intent behind it.

The term is reminiscent of inline functions in C and C++. In short, inline is a hint suggesting that the compiler place the machine code of the inline function directly in the calling code. The immediate benefit is that the work of setting up and tearing down a stack frame for the call isn't necessary, which was very important in the days when hackers only had thousands of CPU cycles to work with. Not so important today, or is it?

What it looks like

Clojure's inline functions serve a similar purpose: they tell the Clojure compiler to use a bit of code directly at the call site instead of calling the function. Let's take a look at one:

(defn neg?
  "Returns true if num is less than zero, else false"
  {
   :inline (fn [num] `(. clojure.lang.Numbers (isNeg ~num)))
   :added "1.0"}
  [num] (. clojure.lang.Numbers (isNeg num)))
neg? from clojure.core
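Because defn copies that attribute map into the var's metadata, the template is an ordinary value you can poke at from a REPL. A quick sketch, assuming Clojure 1.2 or later, with `neg-inline` and `plus-arities` as throwaway names of my own: neg?'s template takes exactly one argument form, while a variadic function such as clojure.core's + also carries an :inline-arities predicate that tells the compiler which call arities it may inline.

```clojure
;; The :inline template lives in the var's metadata next to :added, :doc, etc.
(def neg-inline (:inline (meta #'neg?)))

(fn? neg-inline)         ;; true: the template is an ordinary function
(:added (meta #'neg?))   ;; "1.0"

;; + is variadic, so clojure.core limits inlining with :inline-arities.
(def plus-arities (:inline-arities (meta #'+)))

(plus-arities 1)         ;; false: (+ x) is compiled as a regular call
(plus-arities 2)         ;; true: (+ x y) may be inlined
```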

The :inline key's value is a function (for neg?, a single-arity one) that returns a syntax-quoted form, much like a macro, or a "function template", yet neg? still has a normal function body as well. When neg? is called directly, the Clojure compiler uses the inline version and skips looking up the var, and calling through it, entirely. However, when the function is used as a value, say as a callback, the compiler does the lookup. Let's look at a simple Clojure example¹:

(ns inline-fun.core
  (:gen-class))

(defn -main
  "A very complicated function"
  [& args]
  (neg? -1)
  (filter neg? [1 -1 2]))

Looking at the invokeStatic method of the decompiled Java class, we can see that the Clojure compiler did not look up the neg? var for the direct call, while it did look up the vars for filter and for filter's predicate².

package inline_fun;

import clojure.lang.*;

public final class core$_main extends RestFn
{
    public static final Var const__0;
    public static final Var const__2;
    public static final AFn const__5;

    public static Object invokeStatic(final ISeq args) {
        Numbers.isNeg(-1L); //our inline function
        return ((IFn)core$_main.const__2.getRawRoot()).invoke(core$_main.const__0.getRawRoot(), core$_main.const__5); //our regular function
    }

    public Object doInvoke(final Object o) {
        return invokeStatic((ISeq)o);
    }

    @Override
    public int getRequiredArity() {
        return 0;
    }

    static {
        const__0 = RT.var("clojure.core", "neg?");
        const__2 = RT.var("clojure.core", "filter");
        const__5 = (AFn)Tuple.create(1L, -1L, 2L);
    }
}
A good reminder of why we write Clojure
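Reading decompiled bytecode is one way to see which path the compiler took; another is to make the two paths disagree on purpose. The sketch below is deliberately dishonest, and `which-path` is a hypothetical name: its :inline template expands to :inlined while its body returns :called. A real template must behave exactly like the body; this one only exists to show that the template wins in call position and the body runs when the function is passed around as a value.

```clojure
(defn which-path
  "Hypothetical demo fn. The :inline template ignores its argument form and
  expands to the keyword :inlined; the regular body returns :called."
  {:inline (fn [_x] :inlined)}
  [_x]
  :called)

;; Call position: the compiler applies the :inline template at compile time.
(which-path 1)          ;; :inlined

;; Value position: mapv receives the fn object and runs the regular body.
(mapv which-path [1 2]) ;; [:called :called]
```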

Since its function body is the same as its inline template, neg? is a bad example, but it does show why the inline template uses Java interop syntax. The compiler applies the :inline function to the arguments (the rest of the s-expression) whenever neg? appears in call position, so a template that expanded into another call to neg? would just be expanded again, over and over, until the compiler ran out of stack; emitting the static method call directly also sidesteps the var lookup. Intuitively, inline functions feel like they should be performant, though I'm a bit skeptical about the performance payoff here.
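If you want this outside clojure.core without writing the body twice, clojure.core ships a definline macro (its docstring flags it as experimental) that generates both the function body and the :inline template from a single expansion. Here's a minimal sketch with a hypothetical `square`: like core's templates it emits interop, and it binds the argument once with let, because a template, like a macro, would evaluate an argument form twice if it spliced ~x twice.

```clojure
(definline square
  "Hypothetical example: x * x, inlined when called directly."
  [x]
  ;; x# is an auto-gensym, so the argument form is evaluated exactly once.
  `(let [x# ~x]
     (clojure.lang.Numbers/multiply x# x#)))

(square 3)            ;; 9, via the inline expansion
(mapv square [1 2 3]) ;; [1 4 9], via the generated function body
```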

Performance

I decided to do some benchmarking to see if it's worth the effort, and I switched to inc for this test:

(defn inc
  "Returns a number one greater than num. Does not auto-promote
  longs, will throw on overflow. See also: inc'"
  {:inline (fn [x] `(. clojure.lang.Numbers (~(if *unchecked-math* 'unchecked_inc 'inc) ~x)))
   :added "1.2"}
  [x] (. clojure.lang.Numbers (inc x)))
inc from clojure.core
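Since the template is just a function sitting in the var's metadata, you can call it yourself to see the form the compiler would splice in. A small sketch using a hypothetical helper, `inc-expansion`: *unchecked-math* is read when the template runs, which for real call sites means at compile time, so the unchecked variant only shows up when that var is bound to true.

```clojure
(defn inc-expansion
  "Hypothetical helper: the form inc's :inline template produces for the
  argument form `arg`, with *unchecked-math* bound to `unchecked?`."
  [unchecked? arg]
  (binding [*unchecked-math* unchecked?]
    ((:inline (meta #'inc)) arg)))

(inc-expansion false 'n) ;; (. clojure.lang.Numbers (inc n))
(inc-expansion true 'n)  ;; (. clojure.lang.Numbers (unchecked_inc n))
```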
user> (bench (mapv #(inc %) (vec (range 10000)))) ;;inlined-ish
Evaluation count : 99660 in 60 samples of 1661 calls.
Execution time mean : 622.482492 µs
Execution time std-deviation : 13.981715 µs
Execution time lower quantile : 600.841206 µs ( 2.5%)
Execution time upper quantile : 647.772449 µs (97.5%)
Overhead used : 15.506680 ns
Found 1 outliers in 60 samples (1.6667 %)
low-severe	 1 (1.6667 %)
Variance from outliers : 10.9641 % Variance is moderately inflated by outliers

user> (bench (mapv inc (vec (range 10000)))) ;;not inlined
Evaluation count : 100740 in 60 samples of 1679 calls.
Execution time mean : 626.033642 µs
Execution time std-deviation : 12.133069 µs
Execution time lower quantile : 605.053052 µs ( 2.5%)
Execution time upper quantile : 647.344216 µs (97.5%)
Overhead used : 15.506680 ns
Found 1 outliers in 60 samples (1.6667 %)
low-severe	 1 (1.6667 %)
Variance from outliers : 7.8519 % Variance is slightly inflated by outliers

user> (bench (mapv #(inc %) (vec (range 100000)))) ;;inlined-ish
Evaluation count : 10020 in 60 samples of 167 calls.
Execution time mean : 6.547364 ms
Execution time std-deviation : 262.119623 µs
Execution time lower quantile : 6.230810 ms ( 2.5%)
Execution time upper quantile : 7.195998 ms (97.5%)
Overhead used : 15.506680 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe	 2 (3.3333 %)
Variance from outliers : 27.0290 % Variance is moderately inflated by outliers

user> (bench (mapv inc (vec (range 100000)))) ;;not inlined
Evaluation count : 9180 in 60 samples of 153 calls.
Execution time mean : 6.819663 ms
Execution time std-deviation : 238.618839 µs
Execution time lower quantile : 6.546443 ms ( 2.5%)
Execution time upper quantile : 7.355244 ms (97.5%)
Overhead used : 15.506680 ns
Found 4 outliers in 60 samples (6.6667 %)
low-severe	 3 (5.0000 %)
low-mild	 1 (1.6667 %)
Variance from outliers : 22.1753 % Variance is moderately inflated by outliers

user> (bench (mapv #(inc %) (vec (range 1000000)))) ;;inline-ish
Evaluation count : 1020 in 60 samples of 17 calls.
Execution time mean : 73.953864 ms
Execution time std-deviation : 12.304344 ms
Execution time lower quantile : 62.346597 ms ( 2.5%)
Execution time upper quantile : 102.226796 ms (97.5%)
Overhead used : 15.506680 ns
Found 5 outliers in 60 samples (8.3333 %)
low-severe	 4 (6.6667 %)
low-mild	 1 (1.6667 %)
Variance from outliers : 87.5993 % Variance is severely inflated by outliers

user> (bench (mapv inc (vec (range 1000000)))) ;; not inline
Evaluation count : 960 in 60 samples of 16 calls.
Execution time mean : 74.353661 ms
Execution time std-deviation : 9.884900 ms
Execution time lower quantile : 67.074004 ms ( 2.5%)
Execution time upper quantile : 103.941296 ms (97.5%)
Overhead used : 15.506680 ns
Found 9 outliers in 60 samples (15.0000 %)
low-severe	 6 (10.0000 %)
low-mild	 3 (5.0000 %)
Variance from outliers : 80.6840 % Variance is severely inflated by outliers
A very scientific test.

There are a few caveats with this test. First, to get a somewhat fair comparison, I wrapped inc in a lambda, #(inc %): inc sits in call position inside the lambda's body, so the compiler inlines it there, whereas passing inc straight to mapv hands over the regular function. Second, the sample size could be better. Lastly, the inline template switches to the faster unchecked_inc when *unchecked-math* is true at compile time (it is false by default); if it was on, that would skew the results toward the inline function, but it's good insight into how an inline function should (or shouldn't) be written, hacky fast bits included. All that said, I still think we got some results to inform our decision-making here.

Based on Criterium's bench output, the inlined (lambda) version comes out slightly ahead at every size: roughly 3.6 µs faster at 10,000 elements, 270 µs (about 4%) at 100,000, and 400 µs (about 0.5%) at 1,000,000. Only the 100,000-element gap is about as large as one standard deviation; the other two sit well inside the noise. Not exactly a huge jump, but it might be worth the effort when dealing with incredibly large input (though you shouldn't be using eager collections with such input anyways).
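To put a number on "marginal", here is a small sketch that recomputes the gap between the two variants from the mean times in the Criterium output (copied by hand and converted to µs; `means-us`, `gap`, and `gaps` are hypothetical names). It only does arithmetic on the reported means, so it says nothing about statistical significance; the standard deviations printed next to each mean are the better guide there.

```clojure
;; Mean execution times in µs, copied by hand from the Criterium runs above.
;; :lambda is (mapv #(inc %) ...), :direct is (mapv inc ...).
(def means-us
  {10000   {:lambda 622.482492 :direct 626.033642}
   100000  {:lambda 6547.364   :direct 6819.663}
   1000000 {:lambda 73953.864  :direct 74353.661}})

(defn gap
  "Absolute gap in µs, and relative gap as a % of the :direct mean."
  [{:keys [lambda direct]}]
  (let [d (- direct lambda)]
    {:gap-us  d
     :gap-pct (* 100.0 (/ d direct))}))

(defn gaps
  "Gap for every input size, keyed and sorted by input size."
  []
  (into (sorted-map)
        (for [[n means] means-us]
          [n (gap means)])))

(gaps)
```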

When to use it

After all the benchmarking, you might be itching to rewrite the slow library drawing your ire, but think twice before you do. Remember that the compiler expands an inline template when it analyzes the call, not during macroexpansion, so macroexpand-1 won't save your ass here. If you're writing a Clojure(Script) library³, keep in mind that ClojureScript does not support inlining, and its compiler will ignore the metadata keyword. If you use it at all, my suggestion would be to save it for time-critical applications with large datasets, like an ad exchange bidder, though it may be more advantageous to look at GraalVM's native-image capabilities instead.
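A quick way to see this at the REPL: macroexpand-1 hands a call to neg? back untouched, but calling the :inline function from the var's metadata yourself shows what the compiler splices in. `inline-expand-1` below is a hypothetical helper and a simplification of what the compiler does (for instance, it doesn't check whether the operator is shadowed by a local), though it does honor :inline-arities.

```clojure
(defn inline-expand-1
  "Hypothetical helper. If form is a call to a var with :inline metadata
  (and :inline-arities, when present, accepts the call's arity), returns
  the template's expansion; otherwise returns form unchanged."
  [form]
  (let [op   (when (seq? form) (first form))
        v    (when (symbol? op) (resolve op))
        {inline-fn :inline arities :inline-arities} (when (var? v) (meta v))
        args (when (seq? form) (rest form))]
    (if (and inline-fn (or (nil? arities) (arities (count args))))
      (apply inline-fn args)
      form)))

(macroexpand-1 '(neg? x))   ;; (neg? x), untouched
(inline-expand-1 '(neg? x)) ;; an interop form starting with (. clojure.lang.Numbers ...
(inline-expand-1 '(+ x))    ;; (+ x), since :inline-arities rejects arity 1
```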

Conclusion

There you have it: :inline metadata on a Clojure def tells the Clojure compiler to substitute the provided function template directly for calls to that function, and it is best suited to eliminating repeated var lookups in places like large reducers or time-critical applications.

If you read this far, you'll enjoy the stuff I plan to write in the future. Stay tuned for the next post. I'm thinking it will be a post on a few things to make your Clojure code faster.

1. This example is by and large a recreation of the example Alex Yakushev uses, and his post serves as the technical inspiration for this one. You can find his post here.
2. Technically, these vars would have been created if they didn't already exist; see interning for more details.
3. That is, a library supporting both Clojure and ClojureScript, typically one with files ending in .cljc.

Subscribe to Janet A. Carr

Sign up now to get access to the library of members-only issues.
Jamie Larson
Subscribe