Off-heap memory in Java

07 April 2021 / 5 min read

The heap is one of the most important parts of the JVM architecture, since it stores all the objects created in a JVM instance. However, there are cases where it is convenient to keep data outside of it. In this post we will see how this can be achieved and look at a library that does it.

Introduction

First, let's take a quick look at the JVM architecture.

As we can see, the heap is part of the Runtime Data Area, which contains the memory areas used during the execution of a program. Some of them are per thread, while others, such as the heap, are unique per JVM instance. The garbage collector must also be taken into account, because it is key to understanding how memory is managed in the heap.

A formal definition of the heap area is:

"The Java Virtual Machine has a heap that is shared among all Java Virtual Machine threads. The heap is the run-time data area from which memory for all class instances and arrays is allocated. The heap is created on virtual machine start-up. Heap storage for objects is reclaimed by an automatic storage management system (known as a garbage collector); objects are never explicitly deallocated. The Java Virtual Machine assumes no particular type of automatic storage management system, and the storage management technique may be chosen according to the implementor’s system requirements. The heap may be of a fixed size or may be expanded as required by the computation and may be contracted if a larger heap becomes unnecessary. The memory for the heap does not need to be contiguous."
Source: JVM Specification

Since all the data stored in the heap is subject to garbage collection, the time consumed by the garbage collector tends to grow as the application data grows. Why does this affect us? Each garbage collector has its own method of cleaning up the heap, but they all have something in common: the Stop-The-World mechanism, which means that at some point all application threads are suspended while the collector does its work. How long those pauses last depends on the collector and on how much live data it has to process.

That said, garbage collection algorithms generally do a great job of cleaning up quickly. But when we are dealing with near real-time applications that cannot afford these pauses, or when the data needed exceeds the available physical memory, moving that data off the heap becomes an option.
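To get a rough feel for how much time the collector spends on a given workload, the JVM exposes per-collector statistics through the standard java.lang.management API. The following is a minimal sketch, not a benchmark: the class name GcStatsSketch and the workload of one million boxed numbers are assumptions made only for illustration, and the reported value is accumulated collection time, which for concurrent collectors is not necessarily the same as pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, not part of the original test project.
class GcStatsSketch {

    /** Sums the accumulated collection time (ms) reported by every collector. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() returns -1 when the collector does not report it
            total += Math.max(0L, gc.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcTimeMillis();

        // Illustrative workload: fill the heap with boxed Long objects
        Set<Long> numbers = new HashSet<>();
        for (long i = 0; i < 1_000_000; i++) {
            numbers.add(i);
        }

        long after = totalGcTimeMillis();
        System.out.println("Entries on heap: " + numbers.size());
        System.out.println("GC time during workload (ms): " + (after - before));
    }
}
```

The absolute numbers depend on the collector in use and the heap settings, so they are only meaningful when comparing runs on the same JVM configuration.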

Off-heap memory

Off-heap memory refers to memory allocated directly from the operating system, outside the garbage-collected heap. It can be backed by physical memory and/or by disk, as with memory-mapped files. Because the data lives outside the heap, objects must be serialized to be written and deserialized to be read, so performance depends on the buffer, the serialization process and the disk speed (if applicable).
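As a minimal sketch of the idea, the standard java.nio.ByteBuffer API can allocate a direct buffer whose contents live outside the Java heap. The class and method names below (DirectBufferSketch, sumOffHeap) are made up for illustration. Each long is "serialized" as 8 raw bytes when written and read back the same way.

```java
import java.nio.ByteBuffer;

// Hypothetical example class, not part of the original test project.
class DirectBufferSketch {

    /** Copies the values into off-heap memory, then reads them back and sums them. */
    static long sumOffHeap(long[] values) {
        // allocateDirect reserves memory outside the garbage-collected heap
        int sizeInBytes = Math.multiplyExact(values.length, Long.BYTES);
        ByteBuffer buffer = ByteBuffer.allocateDirect(sizeInBytes);

        // Write: each long is stored as 8 bytes
        for (long value : values) {
            buffer.putLong(value);
        }

        // Switch the buffer from writing to reading
        buffer.flip();

        // Read: decode 8 bytes at a time back into a long
        long sum = 0;
        while (buffer.hasRemaining()) {
            sum += buffer.getLong();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOffHeap(new long[] {1, 2, 3, 4})); // prints 10
    }
}
```

Only the small ByteBuffer wrapper object lives on the heap. For richer objects than a long, the encoding and decoding steps are where the serialization cost mentioned above comes from.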

Benefits

  • Reduction of garbage collection pressure.
  • Large memory size, depending on the implementation.
  • Memory that can be shared among JVMs running on the same OS.

Considerations

  • Serialization impacts performance.
  • Manual memory management is hard and error-prone (ask C devs 😅).

Usage

How to use off-heap memory depends on the developers and the business case: either write your own implementation using the Java NIO API, which allows us to allocate memory manually, or use one of the implementations already available.
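For the do-it-yourself route, a memory-mapped file is one of the building blocks the NIO API offers. The sketch below is an assumption-laden illustration, not a production design: the class MappedFileSketch, its methods and the temporary file name are hypothetical. It writes a long through one mapping and reads it back through a second one, which is roughly what another JVM mapping the same file would do.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example class, not part of the original test project.
class MappedFileSketch {

    /** Writes the value through one mapping and reads it back through another. */
    static long writeThenRead(Path file, long value) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // A READ_WRITE mapping grows the file to the requested size if needed
            MappedByteBuffer writer = channel.map(FileChannel.MapMode.READ_WRITE, 0, Long.BYTES);
            writer.putLong(0, value);
            writer.force(); // flush the change to the underlying file
        }
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer reader = channel.map(FileChannel.MapMode.READ_ONLY, 0, Long.BYTES);
            return reader.getLong(0);
        }
    }

    /** Convenience wrapper that uses a temporary file. */
    static long roundTrip(long value) {
        try {
            Path file = Files.createTempFile("off-heap-sketch", ".bin");
            file.toFile().deleteOnExit();
            return writeThenRead(file, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42L)); // prints 42
    }
}
```

A real implementation would also have to decide how to lay out records in the file, how to coordinate concurrent writers and when to unmap, which is where the "hard and error-prone" part comes in.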

In this post we will take a general look at a library that implements some of the most common data structures used in Java.

  • Chronicle-Map: Chronicle Map is an in-memory, key-value store, designed for low-latency, and/or multi-process applications.

Let's do a simple test: an application that generates 30 million random numbers, puts them in a Set and sums them up afterwards.

  • Project repo: GitHub - off-heap-tests
  • Max heap size: 2 GB
  • JDK: OpenJDK 64-Bit Server VM Microsoft-18724
  • Physical memory: 16 GB

// `random` is assumed to be a field of Main (for example, a java.util.Random).
long sumNumbers(Set<Long> numbers) throws InterruptedException {
    for (int i = 0; i < 30_000_000; i++) {
        numbers.add(random.nextLong());
        // Pause every million insertions to have time to check jconsole
        if (i % 1_000_000 == 0) Thread.sleep(1000);
    }
    return numbers.stream().reduce(0L, Long::sum);
}

public static void main(String[] args) throws InterruptedException {
    var start = Instant.now();
    new Main().executeTest();
    var end = Instant.now();
    var timeMilli = end.toEpochMilli() - start.toEpochMilli();
    System.out.println("Time to get finished in ms: " + timeMilli);
}

HashSet Java implementation

The first test uses a plain HashSet.

void executeTest() throws InterruptedException {
    final Set<Long> set = new HashSet<>();
    System.out.println(sumNumbers(set));
}

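It took about 58,000 to 59,000 milliseconds to finish. Note that this figure includes roughly 30 seconds of deliberate pauses (30 calls to Thread.sleep of 1 second each).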

ChronicleSet implementation

ChronicleSet provides a builder that needs the key type and the maximum number of entries in order to allocate memory accordingly.

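void executeTest() throws InterruptedException {
    // The entry count lets Chronicle size the off-heap memory up front
    final var set = ChronicleSetBuilder.of(Long.class)
        .entries(30_000_000)
        .create();
    // Print the sum, as in the HashSet test (a void method cannot return a value)
    System.out.println(sumNumbers(set));
}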

It took about 52,000 to 53,000 milliseconds to finish. Note also the change in heap memory usage and the reduced GC impact.

Conclusions

Off-heap memory is a good option when the data stored in the heap is huge and we need to reduce the time consumed by the garbage collector, or when an application needs more memory than is physically available and using disk space is acceptable. However, it is worth remembering that managing memory directly is not an easy task and can introduce issues that are hard to deal with.

In what other cases do you think off-heap memory can be used?