Efficient Runtime Analysis of Hibernate
February 24, 2009
In a recent article claiming to “demystify” the Hibernate session context & cache a rather expensive (@runtime) and somewhat laborious (@analysis-time) approach was presented to show how the impact of certain operations in the Session interface have on the persistence context & cache components as well as database access patterns.
Note: The approach involved journaling (@runtime) each call along with some primitive (string) parameters and then presenting the entries as a sequenced call tree with the important measurements need for this rather small runtime analysis nested and scattered across different call branches.
But getting the answer to such questions as “Which entities are being added to the persistence context?” and “Which software activities trigger entity additions and database queries?” can be so much easier for both the user and the runtime once you have the right measurement model that is inherently extensible, uniform, and flexible – probes, meters and counters.
Lets start with the counters. Below is a table showing some of the counters maintained by our (@load-time) instrumentation extension libraries for Hibernate.
Note: The
hibernate.add.counttracks the entities added to the persistence context.

The counters are maintained per thread and can be aggregated at the process, host and cluster levels within the management console. We can also mark the counters to monitor the rate of change during a particular measurement interval.
In addition we can add other extension libraries to the class path to count at a contextual level (the class of the entity) as well.
Note: The class of the entity prefixes the
hibernate.*.countcounter.

Now that we know what is happening in terms of our data model lets find out why by simply mapping (via configuration) custom meters to some of the above counters and then enabling probes (activities) at the appropriate application level (Seam Bean).
With the following metering table it is very easy to determine which packages, classes and methods result in entities being added.

We can also perform a similar analysis across all meters mapped to counters.

If you want you can even map a meter to an individual entity.* counter and rename it within the metering model. I actually recently did this for a customer that helped resolve a significant issue with an application in production.
This approach can be applied to other persistence frameworks (a similar set of counters are supplied for JPA) as well as data caches such as Oracle’s Coherence.

The only skill required in effectively using our probes technology is in deciding which software execution points become counters (and possibly mapped resource meters) and which become probes (hierarchical cost/metered groups).
Annotated Java Thread Call Stacks
February 11, 2009
Following my last post in which I presented an alternative (more like complimentary) Java stack trace print I have received a few inquiries offline questioning my view that call stacks are not necessarily that important for runtime diagnostics.
To be honest I am somewhat surprised by this incorrect perception because I have always placed a significant value on the efficient collection of thread call stacks especially when only a small portion of the code base is instrumented. Having a call stack at particular instrumented points in a software’s execution is important which is why in 2002 I made it a core element within our own software & system execution models for both database transaction path analysis and component distributed tracing. This was prior to the introduction of StackTraceElement (@since 1.4) when amazingly some tools were actually using printStackTrace to obtain the calling context.
Unfortunately the benefit of the StackTraceElement class was greatly diminished by its initial class design (and possibly its name) which was probably heavily influenced by the usage of stack traces at the time and possibly the need to transmit them over the wire (RMI?).
Can you see the problem?

The important context fields are Strings. In a typical container environment with a proliferation of ClassLoaders it is practically impossible to accurately obtain access to the declaring java.lang.Class instance. Even if this was possible within a “hello world” like application we still cannot access the java.lang.reflect.Method as there is no parameter list only a line number into a file (allowing for overloaded method names).
When I designed our native profiling agent (initially JVMPI now JVMTI) I set out to ensure that for every frame on the stack we could obtain the Method reference and that the cost in obtaining the call stack from within the runtime itself would be significantly cheaper than any other means available especially for call stacks that appear repeatedly during execution.
What did the agent solution buy us? Well it allowed us to offer the ability to apply filtering to a collected call stack based on class meta data (what it implements, what it extends) and not just on the package/class/method name as offered by other tools. It also allowed use to add meta data to our software execution model which could be used to more effectively visualize call stacks. Here are some screenshots taken from within the product’s management console to illustrate this last point.
Icons: Using the meta data each Class within the model is given one or more classifications each mapped to an icon.
Note: Classifications have a ranking which determines which one is used for rendering when there are multiple classifications assigned.

Signatures: The meta data allows us to display a tooltip for a Method listing its modifiers, parameter types and return type.
Note: You might be surprised to see the
abstractkeyword. This is because forProxyclasses we substitute the actual interfaceClassandMethod. The * denotes this.

Annotations: The meta data allows us to present to the user the Annotations assigned to a Method.

Class & Classloader Hierarchies & CodeSources: The meta data allows us to display the Class & ClassLoader hierarchies for a Method’s declaring Class including the CodeSource of the Class.
Note: The
ejb:sessionclass classification is based on the presence of the @Statefulannotation.

Compressed Call Stack Rendering: With the meta data I can quickly process (visually that is) the following 26 call stack collections immediately identifying the sequence of calls between major classified components/classes.
[Servlet]->[JSF View]->[SessionBean]->[SeamBean]->[JPA Class]

Note: Collecting call stacks is important in creating a software & system execution model but one cannot ignore there is a relatively high cost to the collection even if you are using our agent which significantly out-performs all other available means and tools today. Call stacks should only be collected at coarse grain component boundaries i.e. database calls, out-bound remote calls, messaging endpoints, service/activity entry points, etc.
Better Java Thread Stack Trace Dumps
February 8, 2009
When performing JVM runtime diagnostics the standard (and need I say pretty crude) approach adopted by most working for companies with very little in the way of mature IT management processes and tooling is the thread (or exception) stack trace dump.
There are numerous problems with this approach including the lack of object & argument state information (already solved quite comprehensively) as well as timing & resource usage which I hope to address here.
Lets start with the sample code used to generate the stack trace print outs presented in this entry.
A simple deep recursive call chain execution ending with a wait followed by a stack trace print.

Here is a small extract of a single stack trace printed by the above code.

The problem with the output is that we have no inclination to how long each frame has been on the threads execution call stack and the resources it has consumed since added.
We cannot even determine whether two identical stack traces, printed by the same thread, represent the exact same execution call (frame) stack. Is this a new request at the same execution point or has the code stalled (waited) since the last print?
In addition the stack trace print contains frames that represent noise in our runtime analysis if we were focused on tracking down performance bottlenecks. This can also be the case when analyzing applications deployed to standard middleware containers.
Now lets rewrite the waiting method to use our recently announced stack probes provider extension.

Here is an extract of the stack trace printed.

A few observations. It is in XML format. Each fired and metered Probe (frame) has a unique id within the context of each thread. The wall clock time (in microseconds) is listed as a Meter for each frame on the stack.
Note: With the id we can compare across stack trace prints and performance call stack sampling with a much higher degree of accuracy (no second guessing whether the method indeed remained active during interval).
Did you notice that the intermediary stacks.Dumping.call method is absent in the above output? The reason is the call(int) method had been executed a sufficiently number of times (default=1000) prior to this particular stack print for the resource metering runtime to have already determined that it is not a hotspot (in terms of inherent cost) and does not need to be metered any further. It was fired but not metered.
Note: The runtime allows for different meters to be used in determining whether a particular named probe (method) is a hotspot or not.
Here is another stack trace printed later in the same run.

Now all the stacks.Dumping.$* methods have completely disappeared. We are left with probes fired and metered for methods marked as hotspots or still under analysis by the probes runtime metering strategies.
You have probably already guessed that if you enable additional resource meters within the metering runtime these will also be included in the probe stack prints.

Conventional call stack dumps quickly fill up a log file with a large amount of irrelevant and duplicated data but they are also extremely costly both in terms of wall clock time and CPU usage.

I am not advocating the complete elimination of thread stack trace dumps from logs though I am sure many of us would wish it to be so especially when staring at the same exception printed 20 times or more as it passes back up through the request processing pipeline.
Having the complete call frame stack can be helpful in a few cases but for the most part it is woefully inefficient (both at runtime and off-line analysis), wasteful, and severely lacking in important runtime diagnostic information – object state and resource usage.
There are solutions available today that make problem determination pretty efficient even for the most complex of systems. You just need to step outside of the runtime and to stop staring into the Sun (it can blind you).
Note: With the solution above probe stack prints can also cross multiple language boundaries – Java-to-Ruby.