Sunday, December 12, 2010

Why ConcurrentHashMap does not support null values

While doing some performance benchmarking, I figured out that I could replace a lot of my HashMaps with ConcurrentHashMaps. Now ConcurrentHashMap does not support null values but HashMap does, so I had to add some extra checks for handling null values, but that was pretty easy in my application.
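One simple way to handle this is to map null to a sentinel object before storing it. Below is a minimal sketch of that idea (not the actual code from my application; the class and method names are made up for illustration):

import java.util.concurrent.ConcurrentHashMap;

// Hypothetical wrapper that lets callers store and read null values,
// even though the underlying ConcurrentHashMap rejects them.
public class NullSafeMap<K, V> {
    private static final Object NULL = new Object();            // sentinel standing in for null
    private final ConcurrentHashMap<K, Object> map = new ConcurrentHashMap<K, Object>();

    public void put(K key, V value) {
        map.put(key, value == null ? NULL : value);             // never hand null to the CHM
    }

    @SuppressWarnings("unchecked")
    public V get(K key) {
        Object v = map.get(key);
        return (v == null || v == NULL) ? null : (V) v;         // translate the sentinel back to null
    }
}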

I was wondering why ConcurrentHashMap did not support null. A quick look at the code showed that the put method in ConcurrentHashMap throws an explicit NullPointerException after a null check on the value.

Listing 1
1. public V put(K key, V value) {
2.   if (value == null)
3.     throw new NullPointerException();
4.   int hash = hash(key.hashCode());
5.   return segmentFor(hash).put(key, hash, value, false);
6. }


Well, that probably meant that the passed value could be accessed inside the ConcurrentHashMap without expecting it to be null at any point during the code flow through the ConcurrentHashMap. I checked the usage of the passed-in value in ConcurrentHashMap. There was no evidence showing a possibility of a NullPointerException occurring, and hence I was tempted to investigate further.

So I tweeted 'Why can't Concurrent Hash Maps take null values ?', hoping that someone would reply with an answer. I have got none to date.

A quick Google search did not give me any reliable answers, so I decided to check it out myself. I created a new class ConcurrentHashMap in my test project (using Eclipse), under the package java.util.concurrent. I commented out the null check in put (removed lines 2 and 3 in Listing 1). I wrote a small multi-threaded test program to do some reads and writes, and ran it, prepending my custom java.util.concurrent.ConcurrentHashMap class to the bootstrap class loader using the VM argument -Xbootclasspath/p:. Surprisingly I did not get any errors and everything was fine.

I decided to check the performance at this point.

I wrote a sample program with two threads in an infinite loop. One thread was reading (Listing 2) and the other one was writing (Listing 3) to the same ConcurrentHashMap.

Listing 2. Reader
1.  int count = 0;
2.  Date startDate = new Date();
3.  while (true) {
4.    ++count;
5.    chm.get("test");
6.    if (count % 1000000 == 0) {
7.      Date endDate = new Date();
8.      System.out.println("Get : " + (endDate.getTime() - startDate.getTime()));
9.      startDate = endDate;
10.   }
11. }

Listing 3. Writer
1.  count = 0;
2.  startDate = new Date();
3.  while (true) {
4.    ++count;
5.    chm.put("test", null);
6.    if (count % 1000000 == 0) {
7.      Date endDate = new Date();
8.      System.out.println("Put : " + (endDate.getTime() - startDate.getTime()));
9.      startDate = endDate;
10.   }
11. }
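For completeness, the two loops were driven from two plain threads sharing the same map. A minimal sketch of the harness (the chm field and the class name are assumptions; the loop bodies are exactly Listings 2 and 3):

import java.util.concurrent.ConcurrentHashMap;

public class ChmBenchmark {
    // shared between the reader thread and the writer thread
    static final ConcurrentHashMap<String, String> chm = new ConcurrentHashMap<String, String>();

    public static void main(String[] args) {
        new Thread(new Runnable() {
            public void run() { /* reader loop from Listing 2 */ }
        }).start();
        new Thread(new Runnable() {
            public void run() { /* writer loop from Listing 3 */ }
        }).start();
    }
}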

This test program was run under two scenarios :

case 1 : With the modified ConcurrentHashMap (with the null check on value removed) and Line 5 in Listing 3 being chm.put("test", null);

case 2 : With the unmodified ConcurrentHashMap (in rt.jar) and Line 5 in Listing 3 changed to chm.put("test", "test");

The numbers below are the time taken in milliseconds for 1 million iterations of each loop shown in Listings 2 and 3.

Case 2 (original)    Case 1 (null check removed)

Get : 47      Put : 313
Get : 62      Get : 344
Put : 140     Put : 297
Get : 47      Get : 328
Get : 47      Put : 312
Get : 62      Get : 328
Put : 125     Put : 281
Get : 47      Get : 313
Get : 47      Put : 313
Put : 125     Get : 343
Get : 47      Put : 312
Get : 47      Get : 344
Get : 47      Put : 313
Put : 125     Get : 219
Get : 47      Get : 125
Get : 46      Put : 468
Get : 47      Get : 265
Put : 125     Put : 282
Get : 47      Get : 344
Get : 47      Put : 281
Put : 125     Get : 344
Get : 47      Put : 266
Get : 47      Get : 359
Get : 47      Put : 281
Put : 125     Get : 297
Get : 47      Put : 297
Get : 46      Get : 328
Get : 47      Put : 297
Put : 125     Put : 281
Get : 32      Get : 328
Get : 46      Put : 281
Put : 110     Get : 329
Get : 47      Put : 297
Get : 47      Get : 312
Get : 47      Put : 281
Put : 140     Get : 328
Get : 47      Put : 297
Get : 47      Get : 328
Get : 47      Put : 313

The results were interesting and clearly showed that, in the worst case, gets would be 5 to 10 times slower with a ConcurrentHashMap supporting null than with the original one. The puts were 3 to 5 times slower. Hence it is a no-go for ConcurrentHashMap to support null values in its current state.

Reasoning :

The get method in the Segment implementation of ConcurrentHashMap takes a lock on the segment itself if the value of a HashEntry turns out to be null. This is illustrated in Listing 4.

Listing 4.
1.  V get(Object key, int hash) {
2.    if (count != 0) { // read-volatile
3.      HashEntry e = getFirst(hash);
4.      while (e != null) {
5.        if (e.hash == hash && key.equals(e.key)) {
6.          V v = e.value;
7.          if (v != null)
8.            return v;
9.          return readValueUnderLock(e); // recheck
10.       }
11.       e = e.next;
12.     }
13.   }
14.   return null;
15. }
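The recheck on line 9 delegates to readValueUnderLock, which (paraphrasing the JDK 6 sources) simply acquires the segment lock to re-read the value:

V readValueUnderLock(HashEntry e) {
    lock();                // full segment lock, blocks concurrent writers
    try {
        return e.value;    // re-read the value once the writer is guaranteed to be done
    } finally {
        unlock();
    }
}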
Now if ConcurrentHashMap does not allow null values, then how can a HashEntry ever have a null value (value is marked volatile in HashEntry)? Consider a scenario where a thread is putting a new value into the map (lines 21 and 22 in Listing 5) in the put method of ConcurrentHashMap. The HashEntry object has been created and linked into the table, but from a reader's point of view it is not yet fully initialized, so the value attribute of the HashEntry does not yet reflect its actual value and instead reads as null. At this point a reader gets hold of that HashEntry, reads a null for its value attribute, and therefore has to recheck under a lock (line 9 in Listing 4).


Listing 5.
1.  V put(K key, int hash, V value, boolean onlyIfAbsent) {
2.    lock();
3.    try {
4.      int c = count;
5.      if (c++ > threshold) // ensure capacity
6.        rehash();
7.      HashEntry[] tab = table;
8.      int index = hash & (tab.length - 1);
9.      HashEntry first = tab[index];
10.     HashEntry e = first;
11.     while (e != null && (e.hash != hash || !key.equals(e.key)))
12.       e = e.next;
13.     V oldValue;
14.     if (e != null) {
15.       oldValue = e.value;
16.       if (!onlyIfAbsent)
17.         e.value = value;
18.     } else {
19.       oldValue = null;
20.       ++modCount;
21.       tab[index] = new HashEntry(key, hash, first, value);
22.       count = c; // write-volatile
23.     }
24.     return oldValue;
25.   } finally {
26.     unlock();
27.   }
28. }

This extra check in line 9 of Listing 4 is very costly (as we already saw in the test results above) and is avoided whenever a non-null value is read from the HashEntry. If null values were allowed, this lock would have to be acquired every time a null value is read, making the ConcurrentHashMap slower.

Friday, December 10, 2010

ArrayList contains() and LinkedList get() get slower

It is very important to realize when to use what kind of data structure, and to understand how similar data structures differ while performing the same operations. For example, the List interface is implemented by both LinkedList and ArrayList, but the two differ widely in their implementation. Different practical scenarios therefore often call for different implementations of the same interface, and one should be very careful in choosing the right implementation for the use at hand.


While debugging a problem in a real system, I realized that the system was much slower in one particular use case.


The characteristics of the application were as below :


1. The application created, say, 1 million objects sequentially, one after another, per day.

2. A linked list was used as a window to access (say the top n of) this huge number of objects.

3. The linked list window was of fixed size and represented the most recently produced objects.


The use case was to access this list and compute some numbers in real time for the application's use. This is how the list was used.

int sum = 0;
for (int i = 0; i < list.size(); ++i) {
  Object obj = list.get(i);
  sum = sum + obj.value();
}
// sum was used further by the application.


Initially, this may look like a simple piece of code that could not cause any problem. But it is a big performance problem with LinkedList. This may also seem a very trivial issue to many, but it is a very natural programming error to make.

This problem may not be visible if the list is short, or if the list is an ArrayList, but here it was a LinkedList, and the problem became apparent only with a larger set of data. Below is the code listing from java.util.LinkedList.

Listing 1.
public E get(int index) {
    return entry(index).element;
}

private Entry entry(int index) {
    if (index < 0 || index >= size)
        throw new IndexOutOfBoundsException("Index: " + index +
                                            ", Size: " + size);
    Entry e = header;
    if (index < (size >> 1)) {
        for (int i = 0; i <= index; i++)
            e = e.next;
    } else {
        for (int i = size; i > index; i--)
            e = e.previous;
    }
    return e;
}

As you can see, in order to get an element at index i, the LinkedList has to walk its links (from the head or the tail, whichever is closer) until it reaches the element at location i.

With ArrayList, the implementation is different. ArrayList maintains an object array to store its contents; whenever the object at a given index is requested, it just reads it from this array. This operation is fast and takes constant time, independent of the number of objects in the list.
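For comparison, ArrayList.get is essentially a bounds check followed by a direct array access (paraphrased from the JDK 6 sources; details differ slightly across JDK versions):

public E get(int index) {
    rangeCheck(index);              // just a bounds check
    return (E) elementData[index];  // direct index into the backing array, constant time
}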

The problem with LinkedList grows with the number of objects and depends on the position of the object in the list. Below is a test program which I wrote just to get some approximate figures for this slowness.

Listing 2.
int marker = 1000000;
List list = new LinkedList();
Date startDate = new Date();
for (int i = 0; i < 6000000; ++i) {
    list.add(i);
    if (i % marker == 0) {
        Date endDate = new Date();
        System.out.println("put " + (i / marker) + "  " + (endDate.getTime() - startDate.getTime()));
        startDate = endDate;
    }
}
startDate = new Date();
marker = marker / 100;
for (int i = 0; i < 6000000; ++i) {
    list.get(i);
    if (i % marker == 0) {
        Date endDate = new Date();
        System.out.println("get " + (i / marker) + "  " + (endDate.getTime() - startDate.getTime()));
        startDate = endDate;
    }
}

I ran the program and plotted the number of get calls against the time taken as shown below :



Here you can see that initially the curve is steep and then it settles to a slope of almost 1, which is kind of expected. The idea to take away is that the time taken to reach an element in the linked list, in the worst case, increases linearly with the number of elements present in the list.
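If switching away from LinkedList is not an option, the index-based loop itself can be fixed: traversing with the list's iterator (or the enhanced for loop) walks the links exactly once instead of re-walking them from the head for every index. A sketch, using the same raw list of Integers as in the test program above:

// Same traversal as the get() loop in Listing 2, but via the iterator,
// so the whole pass over the LinkedList stays linear in the list size.
long sum = 0;
for (Object o : list) {
    sum += ((Integer) o).intValue();
}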

When the list implementation is replaced with an ArrayList, the get calls are not even noticeable and the time difference you get in the loops is 0 milliseconds, against the huge numbers we get in the case of the LinkedList.

However, a similar problem occurs with ArrayList when its contains() method is used. The contains() method has an implementation similar to LinkedList's get() method: a linear scan. Hence it is very important and useful to know the detailed implementation of a data structure before using it.
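For reference, ArrayList.contains delegates to indexOf, which is a linear scan over the backing array (paraphrased from the JDK 6 sources):

public boolean contains(Object o) {
    return indexOf(o) >= 0;
}

public int indexOf(Object o) {
    if (o == null) {
        for (int i = 0; i < size; i++)
            if (elementData[i] == null)
                return i;
    } else {
        for (int i = 0; i < size; i++)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}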

Sunday, November 21, 2010

Hibernate : Pre Database Operation Event Listeners



Hibernate provides two (logical) ways of interfering with the operations performed on entities: interceptors and the event handling mechanism. Just as a note, despite how it may initially sound, the event handling mechanism of Hibernate is not asynchronous; it is synchronous, just like interceptors. Interceptors are internally invoked by the default event listeners; the event listener framework is the more modern, basic infrastructure of Hibernate on which the interceptor mechanism is built.

The differences are not big, and most common needs can be fulfilled by using either one of them or a combination of both interceptors and event listeners. It is not actually quite clear when one should use which mechanism. I would personally suggest sticking with event listeners as they are much more flexible and robust. This is a very broad area of discussion, and here we focus only on some specific types of event listeners (pre database operations).

Hibernate provides an event handling framework for the various kinds of methods on the Session interface. Whenever an operation is performed through the session, an event is generated and a corresponding listener is invoked to take some action on the generated event. These can be events on an entity, like on-load, pre-load, pre-insert, post-insert etc. (look at the subclasses of the AbstractEvent class to get a complete list of all events). These events can be divided into two basic categories: pre database operations and post database operations.

For me, the pre database operation events were of major interest because I needed a general mechanism to stamp insert and update user and time on all my persistent objects.

All the events are subclasses of org.hibernate.event.AbstractEvent. The parent of all pre database operation events is the class AbstractPreDatabaseOperationEvent. It has three children: PreUpdateEvent, PreInsertEvent and PreDeleteEvent. From here on, we will focus our discussion on the usage of the PreUpdate and PreInsert events.
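Before looking at the listener code, note that custom listeners have to be registered with the session factory configuration. Below is a minimal sketch, assuming Hibernate 3.x's Configuration.setListener API and the listener classes defined later in this post:

Configuration cfg = new Configuration().configure();
// replace the default listeners for the pre-insert and pre-update events
cfg.setListener("pre-insert", new MyPreInsertEventListener());
cfg.setListener("pre-update", new MyPreUpdateEventListener());
SessionFactory sessionFactory = cfg.buildSessionFactory();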

Given : The problem was to add insert and update user and timestamp information to all of my entities.
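The listeners below assume an Auditable interface implemented by all audited entities; roughly something like this (the exact method names are assumptions, chosen to match the examples):

public interface Auditable {
    String getInsertUser();
    void setInsertUser(String user);
    Date getInsertDateTime();
    void setInsertDateTime(Date time);
    String getUpdateUser();
    void setUpdateUser(String user);
    Date getUpdateDateTime();
    void setUpdateDateTime(Date time);
}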

Approach 1 :

____________________________________________________________________


public class MyPreInsertEventListener implements PreInsertEventListener {
     @Override
     public boolean onPreInsert(PreInsertEvent event) {
          Object object = event.getEntity();
          if (object instanceof Auditable) {
               Auditable entity = (Auditable) object;
               if (entity.getInsertUser() == null) {
                    String currentUser = UtilityClass.currentUser();
                    Date currentTime = new Date();
                    // inserts
                    entity.setInsertUser(currentUser);
                    entity.setInsertDateTime(currentTime);
               }
          }
          return false;
     }
}

public class MyPreUpdateEventListener implements PreUpdateEventListener {
     @Override
     public boolean onPreUpdate(PreUpdateEvent event) {
          Object object = event.getEntity();
          if (object instanceof Auditable) {
               Auditable entity = (Auditable) object;
               String currentUser = UtilityClass.currentUser();
               Date currentTime = new Date();
               // updates
               entity.setUpdateUser(currentUser);
               entity.setUpdateDateTime(currentTime);
          }
          return false;
     }
}

____________________________________________________________________

Now when the entity object inside the event is updated in the pre listener, the expectation is that the updated information is persisted to the database too. But this does not happen, and for a given entity you would see these fields being null (or unchanged) in the database. So where is the problem ?

The idea of the pre database operation event listeners was probably not to change the entities themselves, but to perform some operations around the state of the entity just prior to its insert or update into the database. For example, you could look at the class of an entity, decide whether the given user may update it, and throw an exception on an illegal access. The idea is to save the state of the entity as it was at the time of the commit and not take into account any changes made to this entity object after that (for example, the insert and update information we set on the entity in our listeners).

These pre events contain a variety of information about the entity. The pieces we are interested in are listed below.

1. The pre events have an object array called 'state'. These are the values of all the attributes of the given entity at the time commit was called.
2. The entity object itself.
3. The entity persister used by Hibernate to perform operations on the given entity.
4. The event source (which is the associated session in the current context).

At this stage, the entity object associated with the event can be thought of as a detached object. Any changes made to this object will never be reflected in the database. The state that will finally be reflected in the database after the commit is contained in the 'state' object array.

In order to persist any changes to the entity object, you could get hold of the session object and use it to save the changes to the database. This could be done as follows :

Approach 2.
____________________________________________________________________


public class MyPreUpdateEventListener implements PreUpdateEventListener {
     @Override
     public boolean onPreUpdate(PreUpdateEvent event) {
          Object object = event.getEntity();
          if (object instanceof Auditable) {
               Auditable entity = (Auditable) object;
               String currentUser = UtilityClass.currentUser();
               Date currentTime = new Date();
               // updates
               entity.setUpdateUser(currentUser);
               entity.setUpdateDateTime(currentTime);
               Transaction txn = event.getSession().beginTransaction();
               event.getSession().save(entity);
               txn.commit();

          }
          return false;
     }
}

____________________________________________________________________

I read this solution in some thread on the Hibernate forum. This approach may work in many cases, but it may fail in just as many (or more). It can also result in recursive calls to this onPreUpdate method and thus a StackOverflowError. I won't be discussing that issue here. This is a solution that may work, but it is not a clean and standard one. Personally, I would not recommend this approach unless I knew all the possible side effects it could cause in the different scenarios.

In order to make changes to the entity, the right approach is to make the changes in the object array 'state' present in the associated event. The 'state' is an object array containing the values of the attributes of an entity, so at first glance it is difficult to know which value to replace. Fortunately these values are in a particular order, and this order does not change.

The entity persister has an entity metamodel which contains a lot of information about the entity. For example, it contains an array called propertyNames. This array holds the property names in the same order as the values of the properties present in the 'state' array of the event. Hence our modified code would look like :

Approach 3.
____________________________________________________________________


public class MyPreInsertEventListener implements PreInsertEventListener {
     @Override
     public boolean onPreInsert(PreInsertEvent event) {
          Object object = event.getEntity();
          if (object instanceof Auditable) {
               Auditable entity = (Auditable) object;
               if (entity.getInsertUser() == null) {
                    String currentUser = UtilityClass.currentUser();
                    Date currentTime = new Date();
                    String[] propertyNames = event.getPersister().getEntityMetamodel().getPropertyNames();
                    Object[] state = event.getState();

                    // inserts
                    setValue(state, propertyNames, "insertUser", currentUser, entity);
                    setValue(state, propertyNames, "insertTime", currentTime, entity);
               }
          }
          return false;
     }
}

public class MyPreUpdateEventListener implements PreUpdateEventListener {
     @Override
     public boolean onPreUpdate(PreUpdateEvent event) {
          Object object = event.getEntity();
          if (object instanceof Auditable) {
               Auditable entity = (Auditable) object;
               String currentUser = UtilityClass.currentUser();
               Date currentTime = new Date();
               String[] propertyNames = event.getPersister().getEntityMetamodel().getPropertyNames();
               Object[] state = event.getState();

               // updates
               setValue(state, propertyNames, "updateUser", currentUser, entity);
               setValue(state, propertyNames, "updateTime", currentTime, entity);
          }
     return false;
     }
}

A common method in both the classes.

void setValue(Object[] currentState, String[] propertyNames, String propertyToSet, Object value, Object entity) {
     int index = ArrayUtils.indexOf(propertyNames, propertyToSet);
     if (index >= 0) {
          currentState[index] = value;
     } else {
          Log.error("Field '" + propertyToSet + "' not found on entity '" + entity.getClass().getName() + "'.");
     }
}

____________________________________________________________________

This solution will work in most cases; however, there is still one case left out which will fail. The tricky part is described below :

Hibernate generates a prepared statement and fills in the parameters from the 'state' array present in the event. Hence any changes made to this 'state' array are reflected in the SQL statement generated by Hibernate, and finally in the database. The insert and update events each have their own copy of this 'state' array.

The pre insert listener is called before the pre update listener (if both an insert and an update happen). This happens when an entity is created, persisted and then modified within the same transaction. It results in two separate SQL statements on the same entity: first an insert statement and then an update statement. With the insert statement, we set only the insertUser and insertTime in our PreInsertEventListener, and not updateUser and updateTime. The generated statement will look like

insert into entity (id, .... , insert_user, insert_time, update_user, update_time) values (1, .... 'test', '21.11.2010 16:10:00', null, null)

with the PreUpdateEventListener, the generated update SQL will be like

update entity set id=1 .... , insert_user=null, insert_time=null, update_user='test', update_time='21.11.2010 16:10:00'

These two SQL statements are generated in the same transaction, one after the other. The effect is that the update SQL overrides the values from the insert SQL, and hence insert user and time will always end up null in such cases. In order to avoid this, I modified the code as follows :

Approach 4.
____________________________________________________________________


public class MyPreInsertEventListener implements PreInsertEventListener {
     @Override
     public boolean onPreInsert(PreInsertEvent event) {
          Object object = event.getEntity();
          if (object instanceof Auditable) {
               Auditable entity = (Auditable) object;
               if (entity.getInsertUser() == null) {
                    String currentUser = UtilityClass.currentUser();
                    Date currentTime = new Date();
                    String[] propertyNames = event.getPersister().getEntityMetamodel().getPropertyNames();
                    Object[] state = event.getState();
                    // inserts
                    entity.setInsertUser(currentUser);
                     entity.setInsertDateTime(currentTime);
                    setValue(state, propertyNames, "insertUser", currentUser, entity);
                    setValue(state, propertyNames, "insertTime", currentTime, entity);
               }
          }
          return false;
     }
}

public class MyPreUpdateEventListener implements PreUpdateEventListener {
     @Override
     public boolean onPreUpdate(PreUpdateEvent event) {
          Object object = event.getEntity();
          if (object instanceof Auditable) {
               Auditable entity = (Auditable) object;
               String currentUser = UtilityClass.currentUser();
               Date currentTime = new Date();
               String[] propertyNames = event.getPersister().getEntityMetamodel().getPropertyNames();
               Object[] state = event.getState();
               // carry forward the insert information set by the insert listener
               setValue(state, propertyNames, "insertUser", entity.getInsertUser(), entity);
               setValue(state, propertyNames, "insertTime", entity.getInsertDateTime(), entity);

               // updates
               entity.setUpdateUser(currentUser);
               entity.setUpdateDateTime(currentTime);
               setValue(state, propertyNames, "updateUser", currentUser, entity);
               setValue(state, propertyNames, "updateTime", currentTime, entity);
          }
          return false;
     }
}

____________________________________________________________________

The entity object is common and shared between the two events (insert and update); however, each event has its own copy of the 'state' array. Here I use this fact in my favour to pass the insert information between the two event listeners through the entity object itself. Hence in the update event listener I set the insert information (passed on from the insert event listener) into the 'state' array again, so the SQL generated by the two listeners looks like :

insert into entity (id, .... , insert_user, insert_time, update_user, update_time) values (1, .... 'test', '21.11.2010 16:10:00', null, null)

update entity set id=1 .... , insert_user='test', insert_time='21.11.2010 16:10:00', update_user='test', update_time='21.11.2010 16:10:00'

This solution works fine and I haven't yet seen any problem or case which could not be handled. However, I would like to conclude by suggesting not to use event listeners for custom auditing at all. Envers comes bundled with Hibernate 3.5.6 and beyond; Envers is an excellent framework for auditing and I have been using it successfully.

Tuesday, November 02, 2010

Excessive Full Garbage Collection : Reflection

Excessive Full Garbage Collection without any memory pressure, could it be a Reflection problem ?

I recently noticed on a production deployment that there were frequent full garbage collection cycles with only around 50% of the heap utilized. This can happen for several reasons, and here I will explain the reason behind this particular observation along with some of my findings in this area. I had to dig into some of the Sun classes; while googling, I found them on docjar. Hence I am not really sure of the version, nor whether these observations will remain valid in the future. It is all tested and verified on the latest Java version, 1.6.0 update 22.

Object Accessors

An object is accessed using reflection in three ways :

1. An object is instantiated by calling the newInstance method on the Class object. This needs a Constructor Accessor.
2. An object is accessed for its Methods, which involves a Method Accessor.
3. An object is accessed for its Fields, which involves a Field Accessor.

The field of an object is accessed using an instance of the Field class, which internally involves a Field Accessor. The Field Accessor uses the Unsafe object, which in turn uses atomic APIs to access (get/set) the fields of an object. These are kind of direct memory operations through the virtual machine. We won't be discussing the Field Accessors in detail.

Different types of accessors

The method accessors and constructor accessors are either native or generated. This means that we use either NativeMethodAccessorImpl or a GeneratedMethodAccessor for methods, and either NativeConstructorAccessorImpl or a GeneratedConstructorAccessor for constructors.

Whether the accessor is native or generated is controlled and decided by two system properties.

1. -Dsun.reflect.noInflation=false (default value is false)
2. -Dsun.reflect.inflationThreshold=15 (default value is 15)

When sun.reflect.noInflation is set to true, the accessor used will always be a generated one, and the system property sun.reflect.inflationThreshold has no meaning. When sun.reflect.noInflation is false and sun.reflect.inflationThreshold is set to 15 (that is the default behaviour if not specified), it means that for the first 15 accesses to a constructor (or method) a native accessor is used, and thereafter a generated accessor is supplied (by the ReflectionFactory) for use.
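This switch is easy to observe: a reflective call goes through the accessor, so its class name shows up in the stack trace of the invoked method. A small sketch (the frame names are specific to the Sun/Oracle JDK, so treat this as illustrative only):

import java.lang.reflect.Method;

public class InflationDemo {
    public static void target() {
        // print the accessor frame that reflection used for this call
        for (StackTraceElement frame : new Throwable().getStackTrace()) {
            if (frame.getClassName().contains("MethodAccessor")) {
                System.out.println(frame.getClassName());
                break;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Method m = InflationDemo.class.getMethod("target");
        for (int i = 0; i < 20; i++) {
            m.invoke(null);  // NativeMethodAccessorImpl at first, a GeneratedMethodAccessor after the threshold
        }
    }
}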

The native accessor uses native calls to access the information, whereas the generated accessor is pure bytecode and hence very fast. On the other hand, a generated accessor needs time to be generated and loaded (it basically 'inflates', hence the system properties controlling it have 'inflation' in their names). The following is an extract which I found in a source file on docjar :

"Inflation" mechanism. Loading bytecodes to implement Method.invoke() and Constructor.newInstance() currently costs 3-4x more than an invocation via native code for the first invocation (though subsequent invocations have been benchmarked to be over 20x faster). Unfortunately this cost increases startup time for certain applications that use reflection intensively (but only once per class) to bootstrap themselves. To avoid this penalty we reuse the existing JVM entry points for the first few invocations of Methods and Constructors and then switch to the bytecode-based implementations."

Generated Accessors


Generated accessors for constructors and methods behave the same way and are treated in a similar manner, hence from here on I will talk about constructor accessors, and everything about them applies to method accessors as well.

The generated accessors are objects of classes generated at runtime. The name of a generated accessor's class looks something like "sun.reflect.GeneratedConstructorAccessor11" or "sun.reflect.GeneratedSerializationConstructorAccessor12". These are kind of runtime proxies used by reflection to access object information stored in memory. These accessors are cached so that they don't have to be regenerated each time; for example, ConstructorAccessors are cached in a given Constructor of a Class object. Class objects are singletons per classloader, and hence we have only one instance of a generated accessor per classloader for a given Class. This means we end up with roughly the same number of accessors as the number of classes present in the deployment, and the same goes for the method accessors as well. All these generated classes reside in the permanent generation (perm gen), and hence we need to make sure that its size is large enough to accommodate them. These generated accessors are created lazily on demand, so an application using reflection heavily and widely will need a larger permanent generation.

If the perm gen size is small and our application uses a lot of reflection code, then over time the perm gen can fill up and one can get an OutOfMemoryError. Probably for this reason, the serialization/deserialization process uses an ObjectOutputStream which relies on ObjectStreamClass to get these accessors. There is one ObjectStreamClass instance per Class object present in the VM. These instances of ObjectStreamClass are cached inside this class itself via soft references. Hence when the perm gen reaches its limit, the garbage collector runs and clears all these soft references, thereby unloading these dynamically generated classes from the virtual machine. The classes are recreated and loaded back into the virtual machine when they are accessed again during the serialization and deserialization process.

This behaviour of loading and unloading dynamically generated classes is frequent and evident in applications involving heavy use of serialization and deserialization (an RMI client-server model, any kind of non-web client-server application, or a JMS based application). The unloading happens when the perm gen is under memory pressure, garbage collection runs, and the soft references to the ObjectStreamClass instances are cleared.

In a nutshell, in an application using reflection and serialization/deserialization widely (widely meaning across most of the classes, not just heavy usage in a few classes), we can get an OutOfMemoryError. This can be avoided by increasing the perm gen size, but in the serialization/deserialization case a frequent GC in the perm space can then be observed. This is due to the clearing of the soft references to the ObjectStreamClass objects because of the memory pressure on the perm space. With an optimum perm size you would not see frequent full garbage collection cycles without memory pressure, and hence would not see frequent unloading of classes.

Generally in such a scenario, the virtual machine spits out log messages such as :

Unloading class sun.reflect.GeneratedSerializationConstructorAccessor1
Unloading class sun.reflect.GeneratedSerializationConstructorAccessor2
Unloading class sun.reflect.GeneratedSerializationConstructorAccessor3
....

Of course this problem can be avoided by increasing the size of the perm gen, but you don't really know by how much, and that is learnt by regularly profiling the application until you no longer see frequent unloading of generated classes from the perm gen (I used the YourKit profiler to observe this behaviour).

I wrote a simple Java class using the compiled sources of the Sun classes I got from docjar. Basically I generated constructor accessors in a loop, which means continuously generating dynamic classes at runtime. I profiled this run and took a few screenshots from the YourKit profiler.

My Java code looked like the following :
____________________________________________________________________

public class Test implements Serializable {

     public static void main(String[] args) throws Exception {
          while (true) {
               ReflectionFactory1 reflectionFactory = ReflectionFactory1.getReflectionFactory();
               ConstructorAccessor constructorAccessor = reflectionFactory.newConstructorAccessorDummy(Test.class.getDeclaredConstructor(null));
               constructorAccessor.newInstance(null);
               System.out.println(constructorAccessor.getClass().toString());
          }
     }
}
____________________________________________________________________

ReflectionFactory1 has the same source code as ReflectionFactory, except for one new method :
____________________________________________________________________

public ConstructorAccessor newConstructorAccessorDummy(Constructor c) {
     return new MethodAccessorGenerator().generateConstructor(c.getDeclaringClass(), c.getParameterTypes(), c.getExceptionTypes(), c.getModifiers());
}

____________________________________________________________________

ReflectionFactory1 was created in the package sun.reflect (in order to compile it) and was appended to the bootstrap class path in order to run the Test app. The JVM parameters were as follows :

-Dsun.reflect.inflationThreshold=15
-Dsun.reflect.noInflation=true
-Xbootclasspath/a:./classes/java
-Xmx32m
-Xms32m
-XX:PermSize=6m
-XX:MaxPermSize=6m
-agentpath:"C:\Program Files\YourKit Java Profiler 9.0.5\bin\win32\yjpagent.dll"

Below is a snapshot from YourKit. The non-heap memory graph in the centre represents the permanent generation. As we can see, because of memory pressure on the perm gen, frequent full GC cycles occur. This leads to regular unloading of generated classes, as seen in the graph on the right.


The graph below shows that the time spent in full GC is regular but small. This is due to the fact that we don't have any application objects and the heap is very small. Imagine a real application with a large heap and lots of application objects: the frequent garbage collection would mean frequent scanning of a large, filled heap and hence would be costly in terms of time and CPU.



In the graph shown below, CPU utilization is about 50%. This is due to the infinite while loop. As discussed, in the case of a large heap with more application objects, this kind of high CPU utilization could also come from GC activity.




Finally, I would suggest not to ignore the perm gen, and to monitor it. As seen, frequent GCs without the heap being anywhere near full can result in high CPU utilization. This can also cause continuous jitters in an application.

Sunday, October 31, 2010

Classpath setting in an executable jar file


Recently, while working on a customer project, I found out that it is quite handy to package the product (application) as a single executable jar file. The jar file has a MANIFEST file and all the related application class files. In the manifest file I could add the location of all third party libraries and the main class as follows :

Manifest File :

Main-Class: com.x.y.z.MainServer
Class-Path: . ../config ../lib/activation.jar ../lib/activemq-core.jar ../lib/ojdbc.jar ......
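The jar is then packaged with this manifest using the jar tool, for example (assuming the manifest text is saved as MANIFEST.MF and the compiled classes live under classes/):

Command : "jar cvfm application.jar MANIFEST.MF -C classes ."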

This was perfect and I could start the application by firing the following command :

Command : "java -jar application.jar"

The application has classpath dependencies on the config folder (property and other configuration files for the application) and on the third party jars it uses. All of this was now specified in the manifest, and I had the cleanest possible deployment.

Now, this being a database independent product, there was a requirement not to add the driver jars (the default was ojdbc.jar) to the manifest and hence not to ship them with the product. The customers wanted to specify their own tested drivers for their specific databases.

Solution 1 :

The obvious solution which comes to mind is removing the driver jar (ojdbc.jar in this case) from the manifest and specifying it on the classpath like :

Manifest File :

Main-Class: com.x.y.z.MainServer
Class-Path: . ../config ../lib/activation.jar ../lib/activemq-core.jar ......

Command : "java -cp ../lib/ojdbc.jar -jar application.jar"
or
Command : "java -classpath ../lib/ojdbc.jar -jar application.jar"

The ojdbc.jar could then easily be replaced by a customer specific driver.

But this does not work, and you end up with an error such as a ClassNotFoundException for the driver class, because when you execute a jar file you cannot override or append to the classpath present in the manifest. In a nutshell, the '-cp' and '-classpath' options are ignored by the VM when the '-jar' option is used.

Solution 2 :

If you remember the basics of classloaders, you still have access to the bootstrap classloader and the extension classloader. These are consulted through your application classloader (they are its parents) when classes have to be resolved.

1. To add the additional jar to the extension classpath, you have to place it in the lib/ext folder under your Java installation (jre/lib/ext). This is not ideal for a production setup.

2. To add a jar file to the bootstrap classloader, you have to execute the application with the following command :

Command : "java --Xbootclasspath/a:../lib/ojdbc.jar -jar application.jar"

This will take into account the application classpath from the manifest inside application.jar and also add ojdbc.jar to the bootstrap classloader's classpath.

But again, this could potentially conflict with some third party libraries you use, which may in turn instrument your application classes at runtime. Personally, I would not recommend this as a healthy approach.

Solution 3 :

In this solution, the manifest file does not have a Main-Class defined, but just the Class-Path, as follows :

Manifest File :

Class-Path: . ../config ../lib/activation.jar ../lib/activemq-core.jar ......

The command used to run application would be :

Command : "java -cp ../lib/application.jar:../ojdbc.jar com.x.y.z.MainServer"

This works, and this is what I used. Not the cleanest of all, but possibly the best solution available.

The third party jars defined in the manifest of application.jar are still picked up by the VM: application.jar is on the classpath, and the dependencies of the application classes are also resolved through the classpath defined in its manifest.

In a nutshell, the classpath defined in the manifests of the jar files on the application classpath is appended to the application classpath.