Since we released JXInsight 5.7 with support for Java 6, I have been curious to test whether our probes resource metering technology, which is largely Java code apart from some important underlying native resource meters and counters, would benefit from the speed-ups indicated by Sun. I was also curious to see whether such improvements benefited other profiling and monitoring tools, especially those with a much larger ratio of native code to Java code.

The tests below are based on a micro-benchmark we designed and developed specifically to determine accurately the overhead of instrumentation and measurement imposed by various profiling tools, including our own. It consists of a series of calls through a chain of methods, with each method in the chain consisting of a switch statement that forwards the call onwards to the next method. The execution cost of this test is extremely low when it is neither instrumented (bytecode injected) nor measured (counters/clocks read), so nearly all of the time recorded under a profiler is overhead added by the tool.
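To make the shape of such a test concrete, here is a minimal sketch of a call chain in which every method is just a switch statement that forwards to the next link. It is not the actual benchmark code; the class name `CallChainBenchmark`, the chain depth of three and the iteration counts are all assumptions chosen for illustration.

```java
// Hypothetical sketch of a call-chain micro-benchmark; all names are illustrative.
final class CallChainBenchmark {

    // Each link does almost no work: a switch that forwards the call onwards.
    static int m1(int route) {
        switch (route) {
            case 0:  return m2(0) + 1;
            default: return m2(1) + 1;
        }
    }

    static int m2(int route) {
        switch (route) {
            case 0:  return m3(0) + 1;
            default: return m3(1) + 1;
        }
    }

    // Last link in the chain: nothing left to forward to.
    static int m3(int route) {
        switch (route) {
            case 0:  return 1;
            default: return 1;
        }
    }

    // Drives the chain and returns the total number of method invocations made.
    static long run(int iterations) {
        long calls = 0;
        for (int i = 0; i < iterations; i++) {
            calls += m1(i & 1); // alternate between the two switch branches
        }
        return calls;
    }

    public static void main(String[] args) {
        run(1000000); // warm-up so the JIT has compiled the chain before timing
        long start = System.nanoTime();
        long calls = run(10000000);
        long elapsedMillis = (System.nanoTime() - start) / 1000000;
        System.out.println(calls + " calls in " + elapsedMillis + " ms");
    }
}
```

Because the uninstrumented chain costs so little, running the same class with and without a profiling agent attached gives a rough reading of the per-call overhead the agent introduces.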

Below is a bar chart comparing the test execution time across products and runtimes. To get a more complete picture of the performance improvements possible, I disabled all the production-oriented features in the JXInsight test runs.

java5vsjava6-c1

Our product obtains a performance improvement of approximately 33%. The other product showed no noticeable timing differences, which probably indicates that a large amount of its measurement and data collection is performed in native code.

The following chart compares test runs with JXInsight’s dynamic overhead reduction optimizations enabled (which is the default). Again, there was a 33% performance improvement in our test execution times, though the scaling of the chart makes it harder to discern.

java5vsjava6-c2

A while back there was a Java EE performance management tool (eventually acquired by Compuware) that touted its extremely low overhead because it was written entirely in C/C++. The claim was never substantiated during my own performance investigations, and with the continued improvements in the runtime I doubt that, even if it was ever the case, it would still hold true today.

I am not advocating that all tooling in the Java management space be written entirely in Java, as such solutions tend to be simplistic (no native counters integration) and slow (access through clunky JMX interfaces). It is important, though, to know where best to perform the instrumentation and measurement, and then to design suitable and efficient interfaces between the native code and the Java code. That is easier said than done considering the current quality and performance of the tooling offered by the runtime vendors themselves.

UPDATE: A tale of two threads

Below is a chart comparing the execution times of both products running the same CPU-bound test concurrently across two threads (twice the amount of work) with a Java 6 runtime. To make this in some way comparable, I have again disabled all production-oriented features in our product. The result: an increase of less than 10% compared with a near 100% increase.

java5vsjava6-c3
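For readers who want to try this kind of comparison themselves, the following is a rough sketch of how a one-thread versus two-thread run could be timed and the percentage increase computed. It is not the harness used for the chart above; `TwoThreadComparison`, the stand-in `work` loop and the iteration count are hypothetical.

```java
// Hypothetical sketch: time the same CPU-bound work on one thread, then on two.
final class TwoThreadComparison {

    // Written to by the workers so the JIT cannot discard the computation.
    static volatile long sink;

    // Stand-in CPU-bound workload (an assumption, not the original test).
    static long work(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += (i ^ (acc >>> 3)) & 0xFF;
        }
        return acc;
    }

    // Runs work() on the given number of threads at once; returns elapsed wall-clock nanoseconds.
    static long timeConcurrent(int threads, final int iterations) {
        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            pool[t] = new Thread(new Runnable() {
                public void run() {
                    sink = work(iterations);
                }
            });
        }
        long start = System.nanoTime();
        for (Thread thread : pool) {
            thread.start();
        }
        try {
            for (Thread thread : pool) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return System.nanoTime() - start;
    }

    // Percentage by which the measured time exceeds the baseline time.
    static double percentIncrease(long baselineNanos, long measuredNanos) {
        return 100.0 * (measuredNanos - baselineNanos) / baselineNanos;
    }

    public static void main(String[] args) {
        int iterations = 200000000;
        timeConcurrent(1, iterations / 10); // warm-up
        long one = timeConcurrent(1, iterations);
        long two = timeConcurrent(2, iterations);
        System.out.printf("1 thread: %d ms, 2 threads: %d ms, increase: %.1f%%%n",
                one / 1000000, two / 1000000, percentIncrease(one, two));
    }
}
```

On a machine with at least two cores, a tool that scales well should show an increase close to 0% when the work is doubled across two threads, whereas a figure near 100% suggests the two threads are effectively being serialized.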

MacBookPro5,1 Intel Core 2 Duo 2.53 GHz

java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06-153)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_07-b06-57, mixed mode)

java version "1.5.0_16"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_16-b06-284)
Java HotSpot(TM) Client VM (build 1.5.0_16-133, mixed mode, sharing)

I was recently asked my opinion of Java stress and load test tools. I had considered providing a very detailed technical review of the various products and open source projects on the market, but as this was a performance workshop that was already running over its time schedule, I opted instead for a quick visualization tour that would bring home one of the biggest issues with a fundamental feature of such tools: inaccurate reporting.

To demonstrate this visually, I opened a timeline analysis snapshot in JXInsight of a load-test run targeting a Java EE application deployed in JBoss, with the remote procedure call traffic generated from an ITKO Lisa server process. Both processes were monitored with our distributed tracing technology and product/middleware related extensions.

The screenshot below shows the traces recorded across both processes, with the bottom two rows representing load-generating threads and the two rows above representing the corresponding traces in JBoss. In the screenshot I have selected a trace in the bottom row, which results in a rectangle overlay being drawn to include the corresponding distributed trace in the second process.

itko1

To fast-track to my observations in the field when using such tools, I simply added a highlight to the timeline: High GC.

itko2

Here is the same timeline graph as above, but with a highlight on only those traces that have a “High GC” symptom associated by way of an automated observation inspection. All the application server side traces have faded, leaving the spotlight on the threads executing the load scripts in the test server, with reported GC times between 20% and 40% of the wall clock time.

itko3
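If you want a quick first check of your own load generator, a coarse version of the same ratio can be computed with the standard `java.lang.management` API. The sketch below is an assumption-laden illustration rather than how JXInsight derives its per-trace figures: it measures GC time for the whole process, and the class name `GcShare` and the 20% threshold (borrowed from the lower end of the range reported above) are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch: process-wide GC time as a share of wall-clock time.
final class GcShare {

    // Accumulated collection time in milliseconds across all collectors.
    // A collector reports -1 when the value is undefined; those are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // GC time as a percentage of the wall-clock time of the same interval.
    static double gcPercent(long gcMillis, long wallMillis) {
        return wallMillis <= 0 ? 0.0 : 100.0 * gcMillis / wallMillis;
    }

    // The threshold is an assumption for illustration, not a product setting.
    static boolean isHighGc(double gcPercent, double thresholdPercent) {
        return gcPercent >= thresholdPercent;
    }

    public static void main(String[] args) {
        long startGc = totalGcMillis();
        long startWall = System.nanoTime();

        // Stand-in for a load script iteration: allocate short-lived garbage.
        long checksum = 0;
        for (int i = 0; i < 2000000; i++) {
            byte[] buffer = new byte[1024];
            buffer[0] = (byte) i;
            checksum += buffer[0];
        }

        long wallMillis = (System.nanoTime() - startWall) / 1000000;
        long gcMillis = totalGcMillis() - startGc;
        double percent = gcPercent(gcMillis, wallMillis);
        System.out.println("checksum=" + checksum + " gc=" + gcMillis + " ms, wall="
                + wallMillis + " ms, share=" + percent + "%, high=" + isHighGc(percent, 20.0));
    }
}
```

Sampling these two counters before and after each test iteration inside the load generator gives a rough indication of how much of each reported response time was spent collecting garbage in the test tool itself.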

What impact does this have on the load test results? Well, response times are overstated because they include the time the load generator itself spends in garbage collection, and throughput is reduced and therefore understated in reports.

It is possible that your tests might not be impacted to the same degree but can you be sure? Have you load and performance tested the test tools you are using today?