If you are familiar with Java objects, you may already know that while they provide automatic memory management, there is a cost associated with it.
In this article, we are going to take a look at the memory layout of Java objects, find out what the overhead is and show what we can do about it.
When you are first introduced to Java, you find out that each Java primitive type has its associated boxed version and that there is a difference between the two. To illustrate, consider the following arrays:
float[] arrayPrim = new float[1000];
Float[] arrayBoxed = new Float[1000];
Both arrays can store one thousand floats. Let’s calculate how much memory they use. A float takes four bytes, so one thousand floats take 4000 bytes. Since ‘arrayPrim’ is an array object, we add 16 bytes for the array object header, which gives 4016 bytes for ‘arrayPrim’. The boxed type Float is an object, so each instance carries a 12-byte object header on top of its four bytes of data and takes 16 bytes. One thousand of these add up to 16000 bytes. The array ‘arrayBoxed’ itself is an array of references: at four bytes per reference, one thousand references plus the array object header take 4016 bytes. Once every slot holds its own Float instance, the memory usage of ‘arrayBoxed’ comes to 20016 bytes.
For this calculation, we assumed 4 bytes per object reference, 12 bytes per object header, and 16 bytes per array object header. These are typical values for a 64-bit JVM running with the default -XX:+UseCompressedOops.
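The arithmetic above can be reproduced with a small estimator. This is only a sketch: the class and method names are hypothetical, and the constants are the sizes assumed in this article, not values read from a running JVM.

```java
// Rough heap estimates using the sizes assumed in the text
// (64-bit JVM with compressed oops). Real layouts can differ by JVM and flags.
final class ArrayFootprint {
    static final int ARRAY_HEADER_BYTES = 16;
    static final int OBJECT_HEADER_BYTES = 12;
    static final int REFERENCE_BYTES = 4;
    static final int ALIGNMENT_BYTES = 8;

    // Round a size up to the next multiple of the object alignment.
    static long align(long bytes) {
        return (bytes + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    }

    // float[n]: array header plus n four-byte values.
    static long primitiveFloatArrayBytes(int n) {
        return align(ARRAY_HEADER_BYTES + (long) n * Float.BYTES);
    }

    // One boxed Float: object header plus one four-byte field.
    static long boxedFloatBytes() {
        return align(OBJECT_HEADER_BYTES + Float.BYTES);
    }

    // Float[n] with every slot pointing at its own Float instance.
    static long boxedFloatArrayBytes(int n) {
        long referenceArray = align(ARRAY_HEADER_BYTES + (long) n * REFERENCE_BYTES);
        return referenceArray + (long) n * boxedFloatBytes();
    }

    public static void main(String[] args) {
        System.out.println("float[1000]: " + primitiveFloatArrayBytes(1000) + " bytes");
        System.out.println("Float[1000]: " + boxedFloatArrayBytes(1000) + " bytes");
    }
}
```

Changing the constants (for example, an 8-byte reference when compressed oops are disabled) shows how sensitive the boxed figure is to JVM settings.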
We have seen that boxed arrays take up far more memory than their primitive counterparts, and that they store references to objects rather than the values themselves, which introduces an extra level of indirection. While programming in Java, we often find ourselves using simple objects that represent points, vectors, matrices and similar. Let’s say we would like to store one thousand 2D points:
public class Point { public float x, y;}
Point[] points = new Point[1000];
As with the previous example, Point is an object, and the array ‘points’ is an array of references. Each point has two floats and thus stores 8 bytes of data. Adding the 12-byte object header gives 20 bytes. We are not done yet: 20 bytes is not a multiple of 8, and objects are aligned to an 8-byte boundary, so 4 bytes of padding are added. Each point therefore takes 24 bytes. In total, once all one thousand points are allocated, the array ‘points’ takes 28016 bytes (24000 bytes for the points plus 4016 bytes for the array of references), while the actual data fits in 8000 bytes. That is roughly 3.5 times the size of the data.
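The padding step is easy to get wrong by hand, so here is a minimal sketch of the same calculation. The names are hypothetical and the header, reference, and alignment sizes are the ones assumed in this article.

```java
// Estimates the footprint of an array of small objects, including the
// padding needed to reach the assumed 8-byte object alignment.
final class PointFootprint {
    static final int ARRAY_HEADER_BYTES = 16;
    static final int OBJECT_HEADER_BYTES = 12;
    static final int REFERENCE_BYTES = 4;
    static final int ALIGNMENT_BYTES = 8;

    // Round a size up to the next multiple of the alignment.
    static long alignUp(long bytes) {
        return (bytes + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    }

    // Size of one object whose fields occupy fieldBytes bytes.
    static long objectBytes(int fieldBytes) {
        return alignUp(OBJECT_HEADER_BYTES + fieldBytes);
    }

    // Reference array plus n separately allocated objects.
    static long objectArrayBytes(int n, int fieldBytes) {
        long referenceArray = alignUp(ARRAY_HEADER_BYTES + (long) n * REFERENCE_BYTES);
        return referenceArray + (long) n * objectBytes(fieldBytes);
    }

    public static void main(String[] args) {
        int pointFields = 2 * Float.BYTES;            // x and y
        long total = objectArrayBytes(1000, pointFields);
        long payload = 1000L * pointFields;           // bytes of actual data
        System.out.println("one Point: " + objectBytes(pointFields) + " bytes");
        System.out.println("Point[1000]: " + total + " bytes");
        System.out.println("overhead factor: " + (double) total / payload);
    }
}
```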
If performance is our top priority, we can store the one thousand points in a float array such as:
float[] pointsXY = new float[2000];
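To make this concrete, the sketch below wraps such an array and does the index arithmetic in one place. The class name, method names, and the interleaved layout (x0, y0, x1, y1, ...) are assumptions made for illustration; the article does not prescribe a particular layout.

```java
// Stores n 2D points in a single float[] of length 2n.
// Assumed layout: x of point i at index 2*i, y at index 2*i + 1.
final class FlatPoints {
    private final float[] xy;

    FlatPoints(int count) {
        xy = new float[2 * count];
    }

    float getX(int i) {
        return xy[2 * i];
    }

    float getY(int i) {
        return xy[2 * i + 1];
    }

    void set(int i, float x, float y) {
        xy[2 * i] = x;
        xy[2 * i + 1] = y;
    }

    // Number of points, not number of floats.
    int size() {
        return xy.length / 2;
    }

    public static void main(String[] args) {
        FlatPoints points = new FlatPoints(1000);
        points.set(7, 10f, 20f);
        System.out.println(points.getX(7) + ", " + points.getY(7));
    }
}
```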
However, by doing so we hurt code readability and maintainability, and we can no longer pass an individual point to a function without also passing the array that stores it. What if we are presented with the following point definition?
public class PointId { public int id; public float x, y;}
Storing one thousand such points in primitive arrays would require two arrays: one of ints for the ids and another of floats for the coordinates.
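A minimal sketch of that two-array approach follows. The class and method names are hypothetical, and the coordinates are assumed to be interleaved in the float array.

```java
// Parallel primitive arrays for n points that each carry an int id
// and two float coordinates. Point i is spread across both arrays.
final class PointIdColumns {
    private final int[] ids;    // ids[i] is the id of point i
    private final float[] xy;   // xy[2*i] is x, xy[2*i + 1] is y

    PointIdColumns(int count) {
        ids = new int[count];
        xy = new float[2 * count];
    }

    void set(int i, int id, float x, float y) {
        ids[i] = id;
        xy[2 * i] = x;
        xy[2 * i + 1] = y;
    }

    int id(int i) {
        return ids[i];
    }

    float x(int i) {
        return xy[2 * i];
    }

    float y(int i) {
        return xy[2 * i + 1];
    }

    public static void main(String[] args) {
        PointIdColumns points = new PointIdColumns(1000);
        points.set(7, 42, 10f, 20f);
        System.out.println(points.id(7) + ": " + points.x(7) + ", " + points.y(7));
    }
}
```

The data is compact, but every access has to keep the two arrays in sync, which is exactly the bookkeeping a class-like syntax would hide.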
Can we store data in Java as compactly as with primitive types yet with class-like syntax?
Project Valhalla
Project Valhalla is an OpenJDK project which aims to bring value types to Java. While Project Valhalla would solve the above-mentioned issue for small objects, it has not been released yet, and its release date is still unknown.
Project JUnion
With Project JUnion, an open source library, we can define PointId as:
@Struct
public class PointId { public int id; public float x, y;}
By adding the @Struct annotation, PointId is no longer an object but a struct. We can use structs in much the same way as Java objects:
PointId[] points = new PointId[1000];
points[7].x = 10;
PointId p7 = points[7];
An array of one thousand points with ids takes ~12040 bytes, since PointId is no longer an object but a struct. There are some differences between structs and objects; for example, structs do not use constructors. If PointId were an object, the above code would throw a NullPointerException, because the elements of a newly created object array are null until each one is allocated.
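For comparison, the sketch below shows how the same access pattern behaves with an ordinary class. It uses plain Java only and no JUnion API; the class name PlainPointId and its helper methods are hypothetical.

```java
// An ordinary class with the same fields as PointId, without @Struct.
final class PlainPointId {
    public int id;
    public float x, y;

    // Writing through a freshly created object array fails, because
    // every element of the array is still null.
    static boolean writeThrowsNpe() {
        PlainPointId[] points = new PlainPointId[1000];
        try {
            points[7].x = 10;
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // With objects, each element has to be allocated before use.
    static float writeAfterAllocation() {
        PlainPointId[] points = new PlainPointId[1000];
        for (int i = 0; i < points.length; i++) {
            points[i] = new PlainPointId();
        }
        points[7].x = 10;
        return points[7].x;
    }

    public static void main(String[] args) {
        System.out.println("NPE without allocation: " + writeThrowsNpe());
        System.out.println("x after allocation: " + writeAfterAllocation());
    }
}
```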
Source: JAXenter