How Clojure works for you
Dear Janet, I hope this email finds you well. I know I'm just a fictional character in a fictional email to set up the hook for this blog post, but I was wondering if you could help me speed up our slow library. We use the best algorithms money can buy, but the code still chugs along. Thanks, Jimmy James, Co-founder and CTO of SooperReel.
I receive fictional emails like this all the time from my thousands of fictional readers. First and foremost, we both know how much fun we can have, and how fast we can get it done, when we choose Clojure. Some Clojure features might be worth exploring to leverage that little 'oomph' (or not). Chances are you won't really have to think about these things too much, but it may be helpful to know about them when making development choices.
Hashed Collections
I often lean on hashed collections. When we say hashed collections, we're really talking about two types in Clojure, the Map and the Set, written {} and #{}, respectively. Since the map is one of the most popular data structures ever, it's not exactly a huge revelation that Maps (hash maps, not ArrayMaps) have an effectively constant lookup time. What might seem a bit less intuitive is the hash set. A Clojure hash set uses a map as the underlying implementation, setting the key and its respective value to the same object [1].
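To make the map/set distinction concrete, here is a minimal REPL-style sketch. The names small-map, hashed-map, and tags are made up for illustration; everything else is plain clojure.core.

```clojure
;; A small literal map is an array map; hash-map asks for the hashed variant.
(def small-map {:a 1 :b 2})
(def hashed-map (hash-map :a 1 :b 2))
(def tags #{:clojure :jvm})

(class small-map)   ;; => clojure.lang.PersistentArrayMap
(class hashed-map)  ;; => clojure.lang.PersistentHashMap
(class tags)        ;; => clojure.lang.PersistentHashSet

;; Lookups go through the hash, not a linear scan.
(get hashed-map :a)        ;; => 1
(contains? tags :clojure)  ;; => true

;; A set lookup hands back the stored element itself (or nil),
;; which fits the "key and value are the same object" picture.
(get tags :jvm)            ;; => :jvm
(tags :python)             ;; => nil
```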
Chunked sequences
Even though Clojure vectors aren't hashed collections, they are still performant collections. Vectors wrap Java Object arrays, allocated in sizes of 32. I like to think of the vector as being allocated in 'chunks'. It doesn't use Clojure's chunking directly, instead opting to wrap the vector in a ChunkedSeq, or chunked sequence, when seq is called.
A chunked sequence [2] allows a lazy sequence (lazy-seq) to realize a batch of the next nodes into memory at once, reducing the number of realizations over time, which increases efficiency for lazy workloads [3]. As you can imagine, it wreaks havoc if you happen to use the lazy-seq with side effects [4] (which you shouldn't be doing anyway).
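If you want to watch chunking bite, here is a small sketch of the side-effect problem. The names calls and doubled are hypothetical; the counter stands in for whatever side effect you were hoping to run lazily.

```clojure
;; Counts how many times the mapping function actually runs.
(def calls (atom 0))

;; map over a vector's chunked seq; nothing is realized yet.
(def doubled
  (map (fn [x] (swap! calls inc) (* 2 x))
       (vec (range 100))))

;; Asking for one element realizes the whole first chunk.
(first doubled) ;; => 0
@calls          ;; => 32
```

We asked for one element and paid for thirty-two, which is great for throughput and not so great if the function launches missiles.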
A lot of Clojure types use the default chunk size of 32 items. If you want a different chunk size, you can create your own chunked type by reifying clojure.lang.IChunkedSeq, but I wouldn't recommend it. Even though the Clojure chunking API is available to developers, the documentation does not cover it as well as other functions in clojure.core.
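For the curious, a gentler route than reifying IChunkedSeq is to lean on the chunk functions that already live in clojure.core (chunk-buffer, chunk-append, chunk, and chunk-cons). The rechunk helper below is a hypothetical sketch of that idea, not something Clojure ships, and seen is just a counter for the demo.

```clojure
(defn rechunk
  "Hypothetical helper: a lazy seq over coll that is chunked n items at a time."
  [n coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (let [buf (chunk-buffer n)]
        ;; Fill one chunk with up to n items.
        (doseq [x (take n s)]
          (chunk-append buf x))
        ;; Cons the finished chunk onto the lazily rechunked remainder.
        (chunk-cons (chunk buf) (rechunk n (drop n s)))))))

(def seen (atom 0))

;; With chunks of 4, asking for the first element only realizes 4 items.
(first (map (fn [x] (swap! seen inc) x)
            (rechunk 4 (range 100)))) ;; => 0
@seen ;; => 4
```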
Transients
Unlike chunking, Clojure transients are well documented. Transients are mutable counterparts of a few core Clojure data structures, intended to be encapsulated in a limited scope (say, a function). A function could take a vector, create a transient from it, perform a number of conj! calls, and return a persistent data structure after all the stateful stuff.
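The function described above might look something like this minimal sketch. append-squares is a made-up name, and the squares are only there to give conj! something to do.

```clojure
(defn append-squares
  "Hypothetical example: conj the squares of 0..n-1 onto vector v."
  [v n]
  (loop [t (transient v) ; mutable working copy, never leaves this function
         i 0]
    (if (< i n)
      ;; Always use the return value of conj!, just like conj.
      (recur (conj! t (* i i)) (inc i))
      ;; Hand a regular persistent vector back to the caller.
      (persistent! t))))

(append-squares [:start] 4) ;; => [:start 0 1 4 9]
```

The caller passes in a persistent vector and gets a persistent vector back, so the mutation stays an implementation detail.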
As such, transients are fast because an intermediate collection doesn't have to be created each time conj! evaluates. Clojure leans on transients under the hood, for example, when creating a vector from an ISeq. The core functions do this often, so there's usually no need to worry about calling transient yourself.
Transients have some caveats though. They are only available for vectors, hash maps, hash sets, and array maps, and they only support a limited set of 'parallel operations': conj!, assoc!, dissoc!, pop!, and disj!. If you want to create your own transient for more functionality, nothing stops you from reifying clojure.lang.IEditableCollection, clojure.lang.ITransientCollection, and clojure.lang.IPersistentCollection for smooth transient integration, though it strikes me as a bit of a faux pas.
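Here is a quick tour of the bang operations, as a sketch. count-words is a hypothetical helper in the spirit of clojure.core/frequencies; the one-liners underneath just exercise the remaining operations.

```clojure
(defn count-words
  "Hypothetical helper: tallies words using a transient map."
  [words]
  (persistent!
    (reduce (fn [counts w]
              ;; assoc! returns the transient to keep threading through reduce.
              (assoc! counts w (inc (get counts w 0))))
            (transient {})
            words)))

(count-words ["a" "b" "a"]) ;; => {"a" 2, "b" 1}

;; The other operations, each the twin of its persistent cousin.
(persistent! (dissoc! (transient {:a 1 :b 2}) :a)) ;; => {:b 2}
(persistent! (pop! (transient [1 2 3])))           ;; => [1 2]
(persistent! (disj! (transient #{1 2}) 1))         ;; => #{2}
```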
Type Hinting
If you want the challenge of creating your own transient, you'll want to know about type hinting in Clojure. Type hints use metadata ^tags consumed by the compiler to direct the type of return values and function arguments.
As the documentation suggests, type hinting exists to avoid reflection calls when doing Java interop, and the documentation advises against using type hints until performance becomes a problem [5]. As you might expect, the Clojure compiler will treat the value as whatever type the hint says, including primitives. This means Clojure can take on some properties of statically typed languages (like complaining a lot).
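A small sketch of what the hints look like in practice. The function names are hypothetical; the ^String and ^long tags are the standard hint syntax.

```clojure
;; At a REPL, (set! *warn-on-reflection* true) makes the compiler
;; print a warning when it has to fall back to reflection.

;; No hint: the compiler can't know what s is, so .length is looked up
;; reflectively at runtime.
(defn len-reflective [s]
  (.length s))

;; Hinted argument: the compiler emits a direct String.length() call.
(defn len-hinted [^String s]
  (.length s))

;; Primitive hints on the arguments and on the return value
;; (the return hint goes before the argument vector).
(defn add-longs ^long [^long a ^long b]
  (+ a b))

(len-reflective "hello") ;; => 5
(len-hinted "hello")     ;; => 5
(add-longs 1 2)          ;; => 3
```

Both length functions return the same answer; the hint only changes how the call is compiled, which is why it's a performance tool rather than a correctness tool.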
As an early Clojure compiler experiment to get static typing, Clojure metadata accepted a :static keyword, but it has been a no-op for some time [6]. The keyword can still be found all over functions in clojure.core.
Inline functions
While you're digging around clojure.core for :static metadata, you might come across another metadata keyword called :inline. The value of :inline is a fn reminiscent of a macro, or a "function template".
If you're familiar with inlining from other programming languages, you can surmise that Clojure inlining allows us to inject the body of the function template into the calling code instead of searching for the function in the symbol table and evaluating it. As Alex Miller tells me, the real benefit of inlining in Clojure comes from using inlining with local primitives. You can learn more about inlining from my mediocre post on Clojure inlining from not too long ago.
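As a rough sketch of the shape this takes, here is a made-up add-one function carrying its own :inline template, modeled on how clojure.core attaches :inline metadata to its own functions. Treat it as an illustration, not a recommendation.

```clojure
(defn add-one
  "Hypothetical example: behaves like inc, with an inline template attached."
  ;; The :inline fn receives the argument *forms* and returns the code
  ;; to splice in at the call site, much like a macro would.
  {:inline (fn [x] `(+ ~x 1))}
  [x]
  (+ x 1))

;; A direct call can be expanded from the template at compile time.
(add-one 41) ;; => 42

;; Used as a value, it's still an ordinary function.
(map add-one [1 2 3]) ;; => (2 3 4)

;; The template lives in the var's metadata.
(fn? (:inline (meta #'add-one))) ;; => true
```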
If you read this far, you'll enjoy the stuff I plan to write in the future. My next post might be on all of the software design patterns you'll never need when using Clojure. Subscribe for the next post or follow me on twitter @janetacarr, or don't ¯\_(ツ)_/¯. You can also join the discussion about this post on twitter, hackernews, or reddit if you think I'm wrong.
1. Honestly, I put this one together before seeing the code in clojure.lang.PersistentHashSet. Keep it in mind for your next interview.
2. Sequences in Clojure are chunked by default after Clojure 1.1.
3. Largely paraphrasing this Clojure-doc article section sentence.
4. This lazy side-effects example inspired by this post from TX.
5. You can find the official documentation here.
6. I found this from a Clojure Google Groups discussion here.