Skip to content
girving edited this page Sep 13, 2010 · 3 revisions

This page describes the low level memory layout of the Duck runtime.

Garbage collection

Duck uses conservative garbage collection in order to avoid constraints on how we lay out memory and to ease interaction with LLVM and other low level representations.

Primitive types

Primitive types such as Int and Chr are unboxed and identical to their C analogs. The current C mapping is

  1. Int = int
  2. Chr = char

Structure packing

Boxed representations are arranged with C alignment rules, except that zero-size entries actually have zero size. For example, the tuple type ((),Int,(),Int) has identical memory layout to the tuple (Int,Int).

Algebraic datatypes

The general algebraic datatype has the form

data T a0 ... of
  S0
  ...
  C0 t0 ...
  ...

where

  1. a0 … are type variables
  2. S0 … are nullary type constructors with no argument
  3. C0 … are type constructors with arguments t1 …

The memory layout of such a type is given by the following rules:

  1. Zero size constructor arguments are treated as if they didn’t exist for purposes of these rules. In particular, constructors with only zero size arguments are treated as nullary.
  2. Nullary type constructors are represented as unboxed, word-size integer tags counting from 0. I.e., S0 is 0, S1 is 1, etc.
  3. A non-nullary constructor Ci t0 ... is stored as a pointer to a struct of the form { i32, t0, ... }, where the first entry is a word-size integer tag word. C0 has tag 0, C1 has tag 1, etc. Note that the tag space of boxed constructors is completely separate from the space of unboxed constructors.
  4. If there is only one non-nullary constructor, the tag word is omitted from the boxed representation.
  5. If there is only one non-nullary constructor, no nullary constructors, and the single constructor has only one nonzero-size argument, that argument is stored unboxed.

There are several optimizations that we are not currently performing but may add in future:

  1. Smaller unboxed tags: If all constructors are nullary, tags could be stored in a single byte.
  2. Smaller boxed tags: In some cases, smaller tags can be used for boxed constructors, but this is harder due to alignment.
  3. Tag/pointer compression: Due to alignment restrictions on pointers, it is possible to compress a small number of tags into the low bits of the pointer.
  4. Maybe optimization: If there is a single boxed constructor with a single argument, and that argument is always a pointer, the constructor could be unboxed even if there are other nullary constructors. This optimization avoids the extra level of boxing for Maybe (Int,Int) and similar types.

Additional notes:

  1. The zero size argument removal rules mean that many types that should be isomorphic in memory actually are. For example, Maybe Int and Either () Int have identical representations.
  2. If we implement the Maybe optimization, we will force the “nullability” of a type to become a covariant property so that Maybe continues to be a covariant type constructor.
  3. In order to unambiguously distinguish between boxed and unboxed values, we restrict the number of unboxed type constructors per type to 256 and assume that pointers never point into the first 256 bytes of address space. This assumption is not always reasonable; it can fail on severely memory constrained architectures (like Anton). We’ll deal with that when it comes up.
Clone this wiki locally