This is the API documentation of the mimalloc allocator (pronounced "me-malloc"), a general-purpose allocator with excellent performance characteristics. It was initially developed by Daan Leijen for the run-time systems of the Koka and Lean languages.
It is a drop-in replacement for malloc and can be used in other programs without code changes; for example, on Unix you can use it as:
> LD_PRELOAD=/usr/bin/libmimalloc.so myprogram
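Alternatively, the mi_ prefixed API can be called explicitly from C or C++ code. A minimal sketch using mi_malloc, mi_realloc, and mi_free from mimalloc.h:

```c
#include <stdio.h>
#include <string.h>
#include <mimalloc.h>

int main(void) {
  // allocate 64 bytes from the current thread's default heap
  char* buf = (char*)mi_malloc(64);
  if (buf == NULL) return 1;
  strcpy(buf, "hello mimalloc");
  printf("%s\n", buf);

  // grow the allocation, then release it again
  buf = (char*)mi_realloc(buf, 256);
  mi_free(buf);
  return 0;
}
```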
Notable aspects of the design include:
- small and consistent: the library is about 8k LOC using simple and consistent data structures. This makes it very suitable to integrate and adapt in other projects. For runtime systems it provides hooks for a monotonic heartbeat and deferred freeing (for bounded worst-case times with reference counting); a sketch of the deferred-free hook follows this list. Partly due to its simplicity, mimalloc has been ported to many systems (Windows, macOS, Linux, WASM, various BSDs, Haiku, MUSL, etc.) and has excellent support for dynamic overriding. At the same time, it is an industrial-strength allocator that runs (very) large scale distributed services on thousands of machines with excellent worst-case latencies.
- free list sharding: instead of one big free list (per size class) we have many smaller lists per "mimalloc page" which reduces fragmentation and increases locality – things that are allocated close in time get allocated close in memory. (A mimalloc page contains blocks of one size class and is usually 64KiB on a 64-bit system).
- free list multi-sharding: the big idea! Not only do we shard the free list per mimalloc page, but for each page we have multiple free lists. In particular, there is one list for thread-local free operations, and another one for concurrent free operations. Freeing from another thread can now be a single CAS without needing sophisticated coordination between threads. Since there will be thousands of separate free lists, contention is naturally distributed over the heap, and the chance of contending on a single location will be low. This is quite similar to randomized algorithms like skip lists, where adding a random oracle removes the need for a more complex algorithm.
- eager page purging: when a "page" becomes empty (with increased chance due to free list sharding) the memory is marked to the OS as unused (reset or decommitted) reducing (real) memory pressure and fragmentation, especially in long running programs.
- secure: mimalloc can be built in secure mode, adding guard pages, randomized allocation, encrypted free lists, etc. to protect against various heap vulnerabilities. The performance penalty is usually around 10% on average over our benchmarks.
- first-class heaps: efficiently create and use multiple heaps to allocate across different regions. A heap can be destroyed at once instead of deallocating each object separately (see the heap sketch after this list).
- bounded: it does not suffer from blowup [1], has bounded worst-case allocation times (up to OS primitives), bounded space overhead (~0.2% metadata, with low internal fragmentation), and, using only atomic operations, has no internal points of contention.
- fast: In our benchmarks (see below), mimalloc outperforms other leading allocators (jemalloc, tcmalloc, Hoard, etc), and often uses less memory. A nice property is that it does consistently well over a wide range of benchmarks. There is also good huge OS page support for larger server programs.
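As a sketch of the runtime-system hooks mentioned in the first bullet: mi_register_deferred_free from mimalloc.h installs a callback that mimalloc invokes regularly together with a monotonic heartbeat count, so a runtime can free a bounded amount of pending work per call. The exact callback signature shown here is an assumption based on recent mimalloc.h versions:

```c
#include <stdbool.h>
#include <stdio.h>
#include <mimalloc.h>

// Invoked regularly by mimalloc (the "heartbeat"); a runtime system can use
// this to free a bounded number of reference-counted objects per call so
// that worst-case pause times stay bounded.
static void my_deferred_free(bool force, unsigned long long heartbeat, void* arg) {
  (void)arg;
  // on `force`, free everything that is pending; otherwise free a small batch
  printf("deferred free: heartbeat=%llu force=%d\n", heartbeat, (int)force);
}

int main(void) {
  mi_register_deferred_free(&my_deferred_free, NULL);
  // ... allocations by the runtime system go here ...
  void* p = mi_malloc(128);
  mi_free(p);
  return 0;
}
```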
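And a minimal sketch of first-class heaps using mi_heap_new, mi_heap_malloc, and mi_heap_destroy from the heap API:

```c
#include <mimalloc.h>

int main(void) {
  // create a fresh heap; its allocations are independent of the default heap
  mi_heap_t* heap = mi_heap_new();
  if (heap == NULL) return 1;

  for (int i = 0; i < 100; i++) {
    int* p = (int*)mi_heap_malloc(heap, sizeof(int));
    if (p != NULL) *p = i;   // no individual mi_free needed
  }

  // destroy the heap and free all of its outstanding allocations at once
  mi_heap_destroy(heap);
  return 0;
}
```

Destroying the heap releases all of its allocations in one operation, which is what makes region-style deallocation cheap.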
You can read more on the design of mimalloc in the technical report, which also has detailed benchmark results.
Further information: