Jul 6, 2022

How Clojure works for you

Dear Janet, I hope this email finds you well. I know I'm just a fictional character in a fictional email to set up the hook for this blog post, but I was wondering if you could help me speed up our slow library. We use the best algorithms money can buy, but the code still chugs along. Thanks, Jimmy James, Co-founder and CTO of SooperReel.

I receive fictional emails like this all the time from my thousands of fictional readers. First and foremost, we both know how much fun we can have, and how fast we can get things done, when we choose Clojure. Some Clojure features might be worth exploring to get that little extra 'oomph' (or not). Chances are you won't really have to think about these things too much, but it may be helpful to know about them when making development choices.

Hashed Collections

I often lean on hashed collections. When we say hashed collections, we're really talking about two types in Clojure, the Map and the Set, {} and #{} respectively. The map is one of the most popular data structures ever, so it's not exactly a huge revelation that Maps (hash maps, not ArrayMaps) have effectively constant lookup time. What might seem a bit less intuitive is the hash set. A Clojure hash set uses a map as the underlying implementation, setting the key and respective value as the same object¹.
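Here's a small REPL-style sketch of the distinction above, using only clojure.core. The names small, big, and tags are just made up for illustration.

```clojure
;; Small map literals are array maps; larger maps become hash maps.
(def small {:a 1 :b 2})
(def big (into {} (map (fn [i] [i (* i i)]) (range 100))))

(type small) ;; clojure.lang.PersistentArrayMap
(type big)   ;; clojure.lang.PersistentHashMap

;; Lookup by key on a map, and membership on a set.
(get big 7)            ;; 49
(def tags #{:fast :fun})
(contains? tags :fast) ;; true
(tags :slow)           ;; nil, sets are functions of their members
```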

Chunked sequences

Even though Clojure vectors aren't hashed collections, they are still performant collections. Vectors wrap Java Object arrays, allocated in sizes of 32. I like to think of the vector being allocated in 'chunks'. A vector isn't a chunked sequence itself; instead, it gets wrapped in a ChunkedSeq (chunked sequence) when seq is called on it.

user> (type (seq [1 2 3]))
;; clojure.lang.PersistentVector$ChunkedSeq

Calling seq on a vector will give us chunking. Sequence functions like map and filter call seq on their collection argument.

A chunked sequence² allows a lazy sequence (lazy-seq) to realize some of the next nodes into memory ahead of time, reducing the number of separate realizations, which increases efficiency for lazy workloads³. As you can imagine, it wreaks havoc if you happen to use the lazy-seq with side-effects⁴ (which you shouldn't be doing anyway).

user> (take 1 (map println [1 2 3 4 5]))
;; 1
;; 2
;; 3
;; 4
;; 5
;; (nil)
user> (take 1 (map println '(1 2 3 4 5)))
;; 1
;; (nil)

Why doseq exists
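If you do need side-effects, doseq runs them eagerly, once per item, with no laziness or chunking in the way. A minimal sketch, where log and record! are hypothetical names used only for this example:

```clojure
(def log (atom []))

(defn record!
  "Side-effecting function: appends x to the log atom."
  [x]
  (swap! log conj x)
  nil)

;; Lazy: map only describes the work, nothing has run yet.
(def pending (map record! [1 2 3]))
@log ;; []

;; Eager: doseq walks the collection immediately, purely for side-effects.
(doseq [x [1 2 3]]
  (record! x))
@log ;; [1 2 3]
```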

A lot of Clojure types use the default chunk size of 32 items. If you want a different chunking size, you can create your own chunked type by reifying clojure.lang.IChunkedSeq, but I wouldn't recommend it. Even though the Clojure chunking API is available to developers, the documentation does not cover it as well as other functions in clojure.core.
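You can watch the 32-item chunk happen by counting how many items get realized when only the first one is asked for. This is a rough sketch; realized and counting are names I made up for the example.

```clojure
(def realized (atom 0))

(defn counting
  "Returns x unchanged, bumping the realized counter as a side-effect."
  [x]
  (swap! realized inc)
  x)

(chunked-seq? (seq (vec (range 100)))) ;; true
(chunked-seq? (seq (list 1 2 3)))      ;; false

;; We only ask for one item, but map realizes the whole first chunk.
(first (map counting (vec (range 100)))) ;; 0
@realized ;; 32
```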

Transients

Unlike chunking, Clojure transients are well documented. Transients are stateful duplicates of a few core Clojure data structures that are intended to be encapsulated in a limited scope (say, a function). A function could take a vector, create a transient from it, and perform a number of conj! calls, returning a persistent data structure after all the stateful stuff.

(defn mapv
  "Returns a vector consisting of the result of applying f to the
  set of first items of each coll, followed by applying f to the set
  of second items in each coll, until any one of the colls is
  exhausted.  Any remaining items in other colls are ignored. Function
  f should accept number-of-colls arguments."
  {:added "1.4"
   :static true}
  ([f coll]
     (-> (reduce (fn [v o] (conj! v (f o))) (transient []) coll)
         persistent!))
  ([f c1 c2]
     (into [] (map f c1 c2)))
  ([f c1 c2 c3]
     (into [] (map f c1 c2 c3)))
  ([f c1 c2 c3 & colls]
     (into [] (apply map f c1 c2 c3 colls))))

mapv uses transients for speed. into also uses transients. Both functions are from clojure.core.

As such, transients are fast because an intermediate persistent collection doesn't have to be created each time conj! evaluates. Clojure leans on transients under the hood, for example, when creating a vector from an ISeq.

static public PersistentVector create(ISeq items){
    Object[] arr = new Object[32];
    int i = 0;
    for(;items != null && i < 32; items = items.next())
        arr[i++] = items.first();

    if(items != null) {  // >32, construct with array directly
        PersistentVector start = new PersistentVector(32, 5, EMPTY_NODE, arr);
        TransientVector ret = start.asTransient();
        for (; items != null; items = items.next())
            ret = ret.conj(items.first());
        return ret.persistent();
    } else if(i == 32) {   // exactly 32, skip copy
        return new PersistentVector(32, 5, EMPTY_NODE, arr);
    } else {  // <32, copy to minimum array and construct
        Object[] arr2 = new Object[i];
        System.arraycopy(arr, 0, arr2, 0, i);
        return new PersistentVector(i, 5, EMPTY_NODE, arr2);
    }
}

PersistentVector.create() method from clojure.lang.PersistentVector. Maybe this is why I can't seem to benchmark these things properly

Clojure's own collection code does this often, so there's usually no need to worry about calling transient yourself. Transients have some caveats though: they are only available for vectors, hash maps, hash sets, and array maps, and they only support a limited set of 'parallel operations': conj!, assoc!, dissoc!, pop!, and disj!. If you want to create your own transient for more functionality, nothing stops you from implementing clojure.lang.IEditableCollection, clojure.lang.ITransientCollection, and clojure.lang.IPersistentCollection for smooth transient integration, though it strikes me as a bit of a faux pas.
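For the cases where you do reach for a transient yourself, the usual shape is: make it, mutate it inside one function, and hand back a persistent value. A minimal sketch, where index-by is a hypothetical helper and not something from clojure.core:

```clojure
(defn index-by
  "Builds a map from (f item) to item, using a transient internally."
  [f coll]
  (persistent!
    (reduce (fn [m item] (assoc! m (f item) item))
            (transient {})
            coll)))

(index-by :id [{:id 1 :name "a"} {:id 2 :name "b"}])
;; {1 {:id 1, :name "a"}, 2 {:id 2, :name "b"}}

;; The other bang operations. Always keep using the value each call returns.
(-> (transient [1 2 3]) (conj! 4) pop! persistent!)   ;; [1 2 3]
(-> (transient #{1 2 3}) (disj! 2) persistent!)       ;; #{1 3}
(-> (transient {:a 1 :b 2}) (dissoc! :a) persistent!) ;; {:b 2}
```

The transient never escapes index-by, so callers only ever see ordinary immutable maps.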

Type Hinting

If you want the challenge of creating your own transient, you'll want to know about type hinting in Clojure. Type hints use metadata ^tags consumed by the compiler to direct the type of return values and function arguments.

(deftype MyChunk []
  clojure.lang.IChunk
  (dropFirst ^clojure.lang.IChunk [_]
    (chunk (chunk-buffer 32)))
  (reduce ^Object [_ ^clojure.lang.IFn f ^Object start]
    (f start)))

Return-type tags follow the function symbol, but argument tags come before the argument name.

As the documentation explains, type hinting exists to avoid reflective calls when doing Java interop, and it advises against using type hints until performance becomes a problem⁵. A primitive hint such as ^long goes further and changes the compiled argument type, so passing the wrong kind of argument fails with a cast error, whereas a non-primitive hint such as ^String only guides interop calls and isn't checked when the function is called. This means Clojure can take on some properties of statically typed languages (like complaining a lot).

user> (defn hints [^long x] (println x))
;; #'user/hints
user> (hints 1)
;; 1
;; nil
user> (hints "hi")
Execution error (ClassCastException) at user/eval69193 (REPL:3).
class java.lang.String cannot be cast to class java.lang.Number (java.lang.String and java.lang.Number are in module java.base of loader 'bootstrap')

user> (defn hints [^String x] (println x))
;; #'user/hints
user> (hints 1)
;; 1
;; nil
user> (hints "hi")
;; hi
;; nil

It still tries to coerce though
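The reflection case the documentation talks about looks roughly like this. It's a sketch with made-up function names (len-reflective and len-hinted); the only real knobs here are *warn-on-reflection* and the ^String tag.

```clojure
;; Ask the compiler to report interop calls it cannot resolve at compile time.
(set! *warn-on-reflection* true)

;; No hint: the compiler doesn't know what s is, so .length is looked up
;; reflectively at runtime (and a reflection warning is printed on compile).
(defn len-reflective [s]
  (.length s))

;; Hinted: the compiler knows s is a String and emits a direct method call.
(defn len-hinted [^String s]
  (.length s))

(len-reflective "hello") ;; 5
(len-hinted "hello")     ;; 5
```

Both return the same answer; the hinted version just skips the reflective lookup on every call.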

As an early Clojure compiler experiment to get static typing, Clojure metadata accepted a :static keyword, but it has been a no-op for some time⁶. The keyword can still be found all over functions in clojure.core.

Inline functions

While you're digging around clojure.core for :static metadata, you might come across another metadata keyword called :inline. :inline has the value of a fn reminiscent of a macro, or a "function template".

(defn neg?
  "Returns true if num is less than zero, else false"
  {
   :inline (fn [num] `(. clojure.lang.Numbers (isNeg ~num)))
   :added "1.0"}
  [num] (. clojure.lang.Numbers (isNeg num)))

neg? from clojure.core

If you're familiar with inlining from other programming languages, you can surmise that Clojure inlining allows us to inject the body of the function template into the calling code instead of searching for the function in the symbol table and evaluating it. As Alex Miller tells me, the real benefit of inlining in Clojure comes from using inlining with local primitives. You can learn more about inlining from my mediocre post on Clojure inlining from not too long ago.
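Nothing stops you from putting :inline on your own functions the same way clojure.core does. A minimal sketch, assuming a hypothetical square function (whether it actually buys you anything is something to measure, not assume):

```clojure
(defn square
  "Returns x multiplied by itself."
  ;; The template receives the argument form and returns the code to emit.
  ;; Binding it with let keeps the argument from being evaluated twice.
  {:inline (fn [x] `(let [x# ~x] (* x# x#)))}
  [x]
  (* x x))

(square 4)           ;; 16, a direct call is expanded from the :inline template
(map square [1 2 3]) ;; (1 4 9), passed as a value, the ordinary fn body is used
```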

If you read this far, you'll enjoy the stuff I plan to write in the future. My next post might be on all of the software design patterns you'll never need when using Clojure. Subscribe for the next post or follow me on twitter @janetacarr, or don't ¯\_(ツ)_/¯. You can also join the discussion about this post on twitter, hackernews, or reddit if you think I'm wrong.

1. Honestly, I put this one together before seeing the code in clojure.lang.PersistentHashSet, keep it in mind for your next interview.
2. Sequences in Clojure are chunked by default after Clojure 1.1.
3. Largely paraphrasing a sentence from this Clojure-doc article section.
4. This lazy side-effects example was inspired by this post from TX.
5. You can find the official documentation here.
6. I found this from a Clojure Google Groups discussion here.
