Sunday, November 21, 2010

Hibernate : Pre Database Operation Event Listeners



Hibernate provides two (logical) ways of interfering with the operations performed on entities: interceptors and the event handling mechanism. As a side note, despite how it may initially sound, Hibernate's event handling mechanism is not asynchronous; it is synchronous, just like interceptors. The event listener mechanism is the newer, more fundamental infrastructure, and the default event listeners are the ones that internally invoke the registered interceptor.

The differences are not huge, and most common needs can be fulfilled with either one of them or a combination of both. It is not actually quite clear when one should use which mechanism. I would personally suggest sticking with event listeners, as they are more flexible and robust. This is a very broad area of discussion, and here we focus only on some specific types of event listeners (pre database operations).

Hibernate provides an event handling framework for the various kinds of operations on the Session interface. Whenever an operation is performed through the session, an event is generated and a corresponding listener is invoked to take some action on the generated event. These can be events on an entity, like on load, pre load, pre insert, post insert, etc. (look at the subclasses of the AbstractEvent class for a complete list). These events can be divided into two basic categories: pre database operations and post database operations.

For me, pre database operation events were of major interest because I needed a general mechanism to stamp insert and update user and time on all my persistent objects.

All the events are subclasses of org.hibernate.event.AbstractEvent. The parent of all pre database operation events is the class AbstractPreDatabaseOperationEvent. It has three children: PreUpdateEvent, PreInsertEvent and PreDeleteEvent. From here on, we will focus on the usage of the PreUpdate and PreInsert events.

Given : The problem was to add insert and update user and timestamp information to all of my entities.
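
The listeners shown below assume a small audit interface implemented by all stampable entities. A minimal sketch of what it could look like (its exact shape is only inferred from the getters and setters used in the listeners) is :

____________________________________________________________________


import java.util.Date;

// hypothetical audit contract; adapt the property names to your own mappings
public interface Auditable {
     String getInsertUser();
     void setInsertUser(String insertUser);
     Date getInsertDateTime();
     void setInsertDateTime(Date insertDateTime);
     String getUpdateUser();
     void setUpdateUser(String updateUser);
     Date getUpdateDateTime();
     void setUpdateDateTime(Date updateDateTime);
}

____________________________________________________________________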

Approach 1 :

____________________________________________________________________


public class MyPreInsertEventListener implements PreInsertEventListener {
     @Override
     public boolean onPreInsert(PreInsertEvent event) {
          Object object = event.getEntity();
          if (object instanceof Auditable) {
               Auditable entity = (Auditable) object;
               if (entity.getInsertUser() == null) {
                    String currentUser = UtilityClass.currentUser();
                    Date currentTime = new Date();
                    // inserts
                    entity.setInsertUser(currentUser);
                    entity.setInsertDateTime(currentTime);
               }
          }
          return false;
     }
}

public class MyPreUpdateEventListener implements PreUpdateEventListener {
     @Override
     public boolean onPreUpdate(PreUpdateEvent event) {
          Object object = event.getEntity();
          if (object instanceof Auditable) {
               Auditable entity = (Auditable) object;
               String currentUser = UtilityClass.currentUser();
               Date currentTime = new Date();
               // updates
               entity.setUpdateUser(currentUser);
               entity.setUpdateDateTime(currentTime);
          }
          return false;
     }
}

____________________________________________________________________
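
As a side note, listeners like these only take effect once they are registered with the SessionFactory. A minimal sketch using the Hibernate 3 Configuration API (the HibernateBootstrap class name is just an example; registration through hibernate.cfg.xml works as well, and chaining with the default listeners is omitted here) could look like :

____________________________________________________________________


import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;
import org.hibernate.event.PreInsertEventListener;
import org.hibernate.event.PreUpdateEventListener;

public class HibernateBootstrap {

     public static SessionFactory buildSessionFactory() {
          Configuration configuration = new Configuration().configure();
          // register our audit listeners for the pre-insert and pre-update events
          configuration.getEventListeners().setPreInsertEventListeners(
                    new PreInsertEventListener[] { new MyPreInsertEventListener() });
          configuration.getEventListeners().setPreUpdateEventListeners(
                    new PreUpdateEventListener[] { new MyPreUpdateEventListener() });
          return configuration.buildSessionFactory();
     }
}

____________________________________________________________________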

Now when the entity object inside the event is updated in the pre listener, the expectation is that the updated information is persisted in the database too. But this does not happen, and for a given entity you would see these fields being null (or unchanged) in the database. So where is the problem?

The idea of the pre database operation event listeners was probably not to change the entities themselves, but to perform some operations around the state of the entity just prior to its insert or update into the database. For example, you could look at the class of an entity, decide whether the given user is allowed to update it, and throw an exception on an illegal access. The idea is to save the state of the entity as it was at the time of the commit, and not to take into account any changes made to the entity object after that point (such as the insert and update information we set in our listeners).

These pre events contain a variety of information about the entity. The pieces we are interested in are listed below.

1. An object array called 'state'. These are the values of all the attributes of the given entity at the time the commit is called.
2. The entity object itself.
3. The entity persister used by Hibernate to perform operations on the given entity.
4. The event source (which is the associated session in the current context).

At this stage, the entity object associated with the event can be thought of as a detached object. Any changes made to this object will never be reflected in the database. The state that will finally end up in the database after the commit is contained in the 'state' object array.

In order to commit any changes to the entity object, you could get hold of the session object and use it to save changes to the database. This could be done as follows :

Approach 2.
____________________________________________________________________


public class MyPreUpdateEventListener implements PreUpdateEventListener {
     @Override
     public boolean onPreUpdate(PreUpdateEvent event) {
          Object object = event.getEntity();
          if (object instanceof Auditable) {
               Auditable entity = (Auditable) object;
               String currentUser = UtilityClass.currentUser();
               Date currentTime = new Date();
               // updates
               entity.setUpdateUser(currentUser);
               entity.setUpdateDateTime(currentTime);
               Transaction txn = event.getSession().beginTransaction();
               event.getSession().save(entity);
               txn.commit();

          }
          return false;
     }
}

____________________________________________________________________

I read this solution in a thread on the Hibernate forum. This approach may work in many cases, but it may fail in just as many (or more). It can also result in recursive calls to this onPreUpdate method and thus in a StackOverflowError; I won't be discussing that issue here. It is a working but not a clean and standard solution. Personally, I would not recommend this approach unless all the side effects it could cause in the different possible scenarios are known.

In order to make changes to entities, the right approach is to make the changes in the object array 'state' present in the associated event. Since 'state' is an object array containing the values of the attributes of an entity, it seems difficult at first to know which value to replace. Fortunately, these values are in a particular order, and this order does not change.

The entity persister has an entity metamodel (EntityMetamodel) which contains a lot of information about the entity. For example, it contains an array called propertyNames. This array holds the property names in the same order as the values of those properties appear in the 'state' array of the event. Hence our modified code would look like :

Approach 3.
____________________________________________________________________


public class MyPreInsertEventListener implements PreInsertEventListener {
     @Override
     public boolean onPreInsert(PreInsertEvent event) {
          Object object = event.getEntity();
          if (object instanceof Auditable) {
               Auditable entity = (Auditable) object;
               if (entity.getInsertUser() == null) {
                    String currentUser = UtilityClass.currentUser();
                    Date currentTime = new Date();
                     String[] propertyNames = event.getPersister().getEntityMetamodel().getPropertyNames();
                    Object[] state = event.getState();

                    // inserts
                    setValue(state, propertyNames, "insertUser", currentUser, entity);
                    setValue(state, propertyNames, "insertTime", currentTime, entity);
               }
          }
          return false;
     }
}

public class MyPreUpdateEventListener implements PreUpdateEventListener {
     @Override
     public boolean onPreUpdate(PreUpdateEvent event) {
          Object object = event.getEntity();
          if (object instanceof Auditable) {
               Auditable entity = (Auditable) object;
               String currentUser = UtilityClass.currentUser();
               Date currentTime = new Date();
               String[] propertyNames = event.getPersister().getEntityMetamodel().getPropertyNames();
               Object[] state = event.getState();

               // updates
               setValue(state, propertyNames, "updateUser", currentUser, entity);
               setValue(state, propertyNames, "updateTime", currentTime, entity);
          }
          return false;
     }
}

A common method used in both classes :

// propertyNames and currentState share the same ordering, so the index of a
// property name is also the index of its value in the state array.
// (ArrayUtils here is org.apache.commons.lang.ArrayUtils.)
void setValue(Object[] currentState, String[] propertyNames, String propertyToSet, Object value, Object entity) {
     int index = ArrayUtils.indexOf(propertyNames, propertyToSet);
     if (index >= 0) {
          currentState[index] = value;
     } else {
          Log.error("Field '" + propertyToSet + "' not found on entity '" + entity.getClass().getName() + "'.");
     }
}

____________________________________________________________________

This solution will work in most cases, but there is still one case left out in which it will fail. Here is the tricky part, described below :

Hibernate generates a prepared statement and fills in the parameters from the 'state' array present in the event. Hence any changes made to this 'state' array are reflected in the SQL statement generated by Hibernate and, finally, in the database. The insert and update events each hold their own copy of this 'state' array.

The pre insert listener is called before the pre update listener (when both an insert and an update happen). This occurs when an entity is created, persisted and then modified within the same transaction. It results in two separate SQL statements on the same entity: first an insert statement, then an update statement. Since in our PreInsertEventListener we only set insertUser and insertTime, and not updateUser and updateTime, the generated insert statement will look like

insert into entity (id, .... , insert_user, insert_time, update_user, update_time) values (1, .... 'test', '21.11.2010 16:10:00', null, null)

and with the PreUpdateEventListener, the generated update SQL will look like

update entity set id=1 .... , insert_user=null, insert_time=null, update_user='test', update_time='21.11.2010 16:10:00'

These two SQL statements are generated in the same transaction, one after the other. The effect is that the update statement overwrites the values written by the insert statement, so insert user and time will always end up null in such cases. In order to avoid this, I modified the code as follows :

Approach 4.
____________________________________________________________________


public class MyPreInsertEventListener implements PreInsertEventListener {
     @Override
     public boolean onPreInsert(PreInsertEvent event) {
          Object object = event.getEntity();
          if (object instanceof Auditable) {
               Auditable entity = (Auditable) object;
               if (entity.getInsertUser() == null) {
                    String currentUser = UtilityClass.currentUser();
                    Date currentTime = new Date();
                    String[] propertyNames = event.getPersister().getEntityMetamodel().getPropertyNames();
                    Object[] state = event.getState();
                    // inserts
                    entity.setInsertUser(currentUser);
                     entity.setInsertDateTime(currentTime);
                    setValue(state, propertyNames, "insertUser", currentUser, entity);
                    setValue(state, propertyNames, "insertTime", currentTime, entity);
               }
          }
          return false;
     }
}

public class MyPreUpdateEventListener implements PreUpdateEventListener {
     @Override
     public boolean onPreUpdate(PreUpdateEvent event) {
          Object object = event.getEntity();
          if (object instanceof Auditable) {
               Auditable entity = (Auditable) object;
               String currentUser = UtilityClass.currentUser();
               Date currentTime = new Date();
               String[] propertyNames = event.getPersister().getEntityMetamodel().getPropertyNames();
               Object[] state = event.getState();
               // carry over the insert information set by the insert listener
               setValue(state, propertyNames, "insertUser", entity.getInsertUser(), entity);
               setValue(state, propertyNames, "insertTime", entity.getInsertDateTime(), entity);

               // updates
               entity.setUpdateUser(currentUser);
               entity.setUpdateDateTime(currentTime);
               setValue(state, propertyNames, "updateUser", currentUser, entity);
               setValue(state, propertyNames, "updateTime", currentTime, entity);
          }
          return false;
     }
}

____________________________________________________________________

The entity object is common and shared between the two events (insert and update), whereas each event has its own copy of the 'state' array. Here I use this fact in my favour to pass the insert information between the two event listeners through the entity object itself. In the update event listener I then re-apply the insert information set by the insert event listener, so the SQL generated from the two listeners now looks like :

insert into entity (id, .... , insert_user, insert_time, update_user, update_time) values (1, .... 'test', '21.11.2010 16:10:00', null, null)

update entity set id=1 .... , insert_user='test', insert_time='21.11.2010 16:10:00', update_user='test', update_time='21.11.2010 16:10:00'

This solution works fine and I haven't yet seen any problem or case which could not be handled. However, I would like to conclude by suggesting not to use event listeners for custom auditing. Envers comes bundled with Hibernate 3.5.6 and beyond; it is an excellent framework for auditing and I have been using it successfully.
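
For reference, with Envers the auditing concern largely reduces to annotating the entities whose history should be kept. A minimal sketch (the Customer entity is just an example) looks like :

____________________________________________________________________


import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.envers.Audited;

@Entity
@Audited  // Envers records every insert, update and delete of this entity in audit tables
public class Customer {

     @Id
     private Long id;

     private String name;

     // getters and setters omitted
}

____________________________________________________________________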

Tuesday, November 02, 2010

Excessive Full Garbage Collection : Reflection

Excessive full garbage collection without any memory pressure: could it be a reflection problem?

I recently noticed on a production deployment that it had frequent full garbage collection cycles with only around 50% of the heap utilized. This can happen for several reasons, and here I will explain the reason behind this particular observation along with some of my findings in this area. I had to dig into some of the Sun classes; while googling, I found them at docjar. Hence I am not really sure of the version, nor whether these observations will stay valid in the future. It is all tested and verified on the latest Java version, 1.6.0 update 22.

Object Accessors

An object is accessed using reflection in three ways :

1. An object is instantiated by calling the newInstance method on the Class (or Constructor) object. This needs a Constructor Accessor.
2. An object is accessed through its Methods, which involves a Method Accessor.
3. An object is accessed through its Fields, which involves a Field Accessor.

The field of an object is accessed using an instance of the Field class, which internally involves a Field Accessor. The Field Accessor uses the Unsafe object, which in turn uses atomic APIs to access (get/set) the fields of an object. These are essentially direct memory operations through the virtual machine. We won't be discussing Field Accessors in detail.
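
Just to illustrate the path that ends up in a Field Accessor, a plain reflective field access (class and field names are only an example) looks like this :

____________________________________________________________________


import java.lang.reflect.Field;

public class FieldAccessDemo {

     private int counter = 42;

     public static void main(String[] args) throws Exception {
          FieldAccessDemo demo = new FieldAccessDemo();
          // Field.get()/set() delegate internally to a sun.reflect FieldAccessor
          Field field = FieldAccessDemo.class.getDeclaredField("counter");
          field.setAccessible(true);
          System.out.println(field.get(demo)); // prints 42
          field.set(demo, 7);
          System.out.println(demo.counter);    // prints 7
     }
}

____________________________________________________________________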

Different types of accessors

The method and constructor accessors are either native or generated. This means we use either NativeMethodAccessorImpl or GeneratedMethodAccessor for methods, and either NativeConstructorAccessorImpl or GeneratedConstructorAccessor for constructors.

Whether the accessor is native or generated is controlled and decided by two system properties.

1. -Dsun.reflect.noInflation=false (default value is false)
2. -Dsun.reflect.inflationThreshold=15 (default value is 15)

When sun.reflect.noInflation is set to true, the accessor used will always be generated and the system property sun.reflect.inflationThreshold has no meaning. When sun.reflect.noInflation is false and sun.reflect.inflationThreshold is set to 15 (the default behaviour if not specified), then for the first 15 accesses to a constructor (or method) a native accessor will be used, and thereafter a generated accessor will be supplied (by the ReflectionFactory) for use.
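
The switch from the native to the generated accessor is easy to observe. A small sketch like the one below, run with -verbose:class and the default inflation settings, should show a sun.reflect.GeneratedMethodAccessor class being loaded only after the threshold number of reflective invocations has been crossed :

____________________________________________________________________


import java.lang.reflect.Method;

public class InflationDemo {

     public String hello() {
          return "hello";
     }

     public static void main(String[] args) throws Exception {
          Method method = InflationDemo.class.getMethod("hello");
          InflationDemo target = new InflationDemo();
          // with sun.reflect.noInflation=false (the default), the first
          // sun.reflect.inflationThreshold invocations go through the native
          // accessor; after that a GeneratedMethodAccessor class is generated
          for (int i = 0; i < 20; i++) {
               method.invoke(target);
          }
     }
}

____________________________________________________________________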

The native accessor uses native calls to access information, whereas the generated accessor is pure bytecode and hence very fast. On the other hand, a generated accessor needs time to be instantiated and loaded (it basically 'inflates', hence the system properties controlling it carry the word 'inflation' in their names). The following is an extract I found in a source file at docjar :

"Inflation" mechanism. Loading bytecodes to implement Method.invoke() and Constructor.newInstance() currently costs 3-4x more than an invocation via native code for the first invocation (though subsequent invocations have been benchmarked to be over 20x faster). Unfortunately this cost increases startup time for certain applications that use reflection intensively (but only once per class) to bootstrap themselves. To avoid this penalty we reuse the existing JVM entry points for the first few invocations of Methods and Constructors and then switch to the bytecode-based implementations."

Generated Accessors


Generated accessors for constructors and methods behave the same and are treated in a similar manner, hence from here on I will talk about constructor accessors; everything about them is applicable to method accessors as well.

The generated accessors are objects of classes generated at runtime. The name of a generated accessor's class looks something like "sun.reflect.GeneratedConstructorAccessor11" or "sun.reflect.GeneratedSerializationConstructorAccessor12". These are a kind of runtime proxy used by reflection to access object information stored in memory. These accessors are cached so that they don't have to be regenerated each time; for example, ConstructorAccessors are cached in a given Constructor of a Class object. Class objects are singletons per classloader, and hence we have only one instance of a generated accessor per classloader for a given Class. This means we would have roughly as many accessors as there are classes present in the deployment, and the same goes for the method accessors. All these generated classes reside in the permanent generation (perm gen), and hence we need to make sure that its size is large enough to accommodate them. These generated accessors are created lazily on demand, so an application using reflection heavily and widely will need a larger permanent generation.

If the perm gen size is small and our application uses a lot of reflection code, then over time the perm gen can fill up and we can get an OutOfMemoryError. Probably for this reason, the serialization/deserialization process uses ObjectOutputStream, which relies on ObjectStreamClass to get these accessors. There is one ObjectStreamClass instance per Class object present in the VM, and these instances are cached inside ObjectStreamClass itself via soft references. Hence, when the perm gen reaches its limit, the garbage collector runs and clears these soft references, thereby unloading the dynamically generated classes from the virtual machine. The classes are recreated and loaded back into the virtual machine when they are accessed again during serialization and deserialization.
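
To make this concrete, even a trivial serialization round trip like the sketch below is enough to make ObjectStreamClass create such a generated accessor for the class involved. Run it with -verbose:class and, assuming my reading of the sources is correct, you should see a sun.reflect.GeneratedSerializationConstructorAccessor class being loaded :

____________________________________________________________________


import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationDemo implements Serializable {

     private String value = "some state";

     public static void main(String[] args) throws Exception {
          // serialize ...
          ByteArrayOutputStream bytes = new ByteArrayOutputStream();
          ObjectOutputStream out = new ObjectOutputStream(bytes);
          out.writeObject(new SerializationDemo());
          out.close();

          // ... and deserialize : ObjectStreamClass needs a (generated)
          // serialization constructor accessor to create the new instance
          ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
          SerializationDemo copy = (SerializationDemo) in.readObject();
          System.out.println(copy.value);
     }
}

____________________________________________________________________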

This loading and unloading of dynamically generated classes is frequent and evident in applications making heavy use of serialization and deserialization (an RMI client-server model, any kind of non-web client-server application, or a JMS-based application). The unloading happens when the perm gen is under memory pressure, garbage collection runs and the soft references to the ObjectStreamClass instances are cleared.

In a nutshell, in an application using reflection and serialization-deserialization widely (widely meaning across most of the classes, not just heavy usage in a few classes), we could get an OutOfMemoryError. This can be avoided by increasing the perm size, but in the latter case of serialization-deserialization, frequent GC in the perm space may still be observed. This is due to the clearing of the soft references to the ObjectStreamClass objects because of memory pressure on the perm space. With an optimum perm size you would not see frequent full garbage collection cycles without memory pressure, and hence would not see frequent unloading of classes.

Generally in such a scenario, virtual machine spits out log messages such as :

Unloading class sun.reflect.GeneratedSerializationConstructorAccessor1
Unloading class sun.reflect.GeneratedSerializationConstructorAccessor2
Unloading class sun.reflect.GeneratedSerializationConstructorAccessor3
....

Of course this problem could be avoided by increasing the size of the perm gen, but you don't really know by how much; that is learnt by regularly profiling the application until you no longer see frequent unloading of generated classes from the perm gen. (I used the YourKit profiler to observe this behaviour.)

I wrote a simple Java class using the compiled sources of the Sun classes I got from docjar. Basically I generated constructor accessors in a loop, which means continuously generating dynamic classes at runtime. I profiled this run and took a few screenshots from the YourKit profiler.

My Java code looked like the following :
____________________________________________________________________

public class Test implements Serializable {

     public static void main(String[] args) throws Exception {
          while (true) {
               ReflectionFactory1 reflectionFactory = ReflectionFactory1.getReflectionFactory();
               ConstructorAccessor constructorAccessor = reflectionFactory.newConstructorAccessorDummy(Test.class.getDeclaredConstructor(null));
               constructorAccessor.newInstance(null);
               System.out.println(constructorAccessor.getClass().toString());
          }
     }
}
____________________________________________________________________

ReflectionFactory1 has the same source code as ReflectionFactory except a new method :
____________________________________________________________________

public ConstructorAccessor newConstructorAccessorDummy(Constructor c) {
     return new MethodAccessorGenerator().generateConstructor(c.getDeclaringClass(), c.getParameterTypes(), c.getExceptionTypes(), c.getModifiers());
}

____________________________________________________________________

ReflectionFactory1 was created in the package sun.reflect (in order for it to compile) and was added to the bootstrap classpath in order to run the Test app. The JVM parameters were as follows :

-Dsun.reflect.inflationThreshold=15
-Dsun.reflect.noInflation=true
-Xbootclasspath/a:./classes/java
-Xmx32m
-Xms32m
-XX:PermSize=6m
-XX:MaxPermSize=6m
-agentpath:"C:\Program Files\YourKit Java Profiler 9.0.5\bin\win32\yjpagent.dll"

Below is a snapshot from YourKit. The non-heap memory graph in the centre represents the permanent generation. As we can see, because of memory pressure on the perm gen, frequent full GC cycles occur. This leads to regular unloading of generated classes, as seen in the graph on the right.


The graph below shows that the time spent in full GC is regular but small. This is due to the fact that we don't have any application objects and the heap is very small. Imagine a real application with a large heap and lots of application objects: the frequent garbage collection would mean frequent scanning of a large, filled heap and hence would be costly in terms of time and CPU.



In the graph shown below, CPU utilization is about 50%. This is due to the infinite while loop. As discussed, in the case of a large heap and more application objects, this kind of high CPU utilization could also come from GC activity.




Finally, I would suggest not ignoring the perm gen but monitoring it. As seen, frequent GCs without the heap being anywhere near full can result in high CPU utilization. This can also cause continuous jitters in an application.