Memory

We know that a program is a sequence of commands manipulating variables. The values of the variables are stored in the memory. The memory consists these kinds:

  • register (for fast access)
  • the stack
  • a data segment (for static allocation region)
  • the heap(for dynamic allocation region)

Only the stack and the dynamic allocation region can change in size during the execution of a program.

Memory allocation in OCaml

C provides function malloc for explicit allocation. User can ask for a block of memory of certain size.

Object-oriented languages use new to construct class instances. new invocates a constructor for the class. In C++, we define the constructor function in the definition of class, in functional languages, constructor functions are called in places where a structural value (tuple, list, record, vector, or closure) is defined.

The principal dangers of explicit memory reclamation are:

  • dangling pointers: a memory region has been freed while there are still pointers pointing at it. If the region is reused, access to the region by these pointers risks being incoherent.
  • Inaccessible memory regions (a memory ``leak’’): a memory region is still allocated, but no longer referenced by any pointer. There is no longer any possibility of freeing the region. There is a clear loss of memory.

Automatic garbage collection

We classify automatic memory reclamation algorithms into two classes:

  • reference counters: each allocated region knows how many references there are to it. When this number becomes zero, the region is freed.
  • sweep algorithms: starting from a set of roots, the collection of all accessible values is traversed in a way similar to the traversal of a directed graph.

Sweep algorithms are more commonly used in programming languages. In effect, reference counting garbage collectors increase the processing costs (through counter updating) even when there is no need to reclaim anything.

1. Reference Counting

Each allocated region (object) is given a counter. This counter indicates the number of pointers to the object. It is incremented each time a reference to the object is shared. It is decremented whenever a pointer to the object disappears. When the counter becomes zero, the object is garbage collected.

The advantage of such a system comes from the immediate freeing of regions that are no longer used. Aside from the systematic slowdown of computations, reference counting garbage collectors suffer from another disadvantage: they do not know how to process circular objects. Suppose that Objective CAML had such a mechanism. The following example constructs a temporary value l, a list of characters of where the last element points to the cell containing ‘c’.

let rec l = 'c'::'a'::'m'::l in List.hd l ;;

Memory representation of a circular list.

At the end of the calculation of this expression each element of the list l has a counter equal to one (even the first element, for the tail points to the head). This value is no longer accessible and yet cannot be reclaimed because its reference counter is not zero. In languages equipped with memory reclamation via reference counting—such as Python—and which allow the construction of circular values, it is necessary to add a memory sweep algorithm.

2. Sweep Algorithms

1
2
3
4
5
6
let u = let l = ['c'; 'a'; 'm'] in List.tl l ;; 
let v =
let r = ( ['z'] , u ) in
match r with
p -> (fst p) @ (snd p)
;;

Similar to JAVA, OCaml provides a garbage collector. Thus, we do not need to write free( ) to release the allocated memory.

OCAML uses a mechanism called garbage collection (GC) to perform automatic memory management. Memory is allocated at value construction (i.e., when a constructor is applied) and it is freed implicitly. Most programs do not have to deal with the garbage collector directly, since it works transparently behind the scenes. However, garbage collection can have an effect on efficiency for allocation-intensive programs. In such cases, it is useful to control the GC parameters, or even to invoke the collector explicitly. Moreover, in order to interface OCAML with other languages, it is necessary to understand what constraints the garbage collector imposes on data representations.

OCaml’s GC is faster than explicit free in C

Calling operation free, in fact, costs a lot.

minor and major heap

OCaml’s garbage collector has two heaps, the minor heap and the major heap. This recognises a general principle: Most objects are small and allocated frequently and then immediately freed. These objects go into the minor heap first, which is GCed frequently. Only some objects are long lasting. These objects get promoted from the minor heap to the major heap after some time, and the major heap is only collected infrequently.

The OCaml GC is synchronous. It doesn’t run in a separate thread, and it can only get called during an allocation request.