Since we released JXInsight 5.7 with support for Java 6, I have been curious to test whether our probes resource metering technology, which is largely Java code except for some important underlying native resource meters and counters, would benefit from the speed-ups indicated by Sun. I was also curious to see whether such improvements benefited other profiling/monitoring tools, especially those with a much larger ratio of native code to Java code.

The tests below are based on a micro-benchmark we have specifically designed and developed to accurately determine the overhead of instrumentation and measurement by various profiling tools including our own. It consists of a series of calls through a chain of methods with each method in the chain consisting of a switch statement that forwards the call onwards to the next method. The execution cost is extremely low for this test when not instrumented (bytecode injected) and measured (counters/clocks read).

Below is a bar chart comparing the test execution time across products and runtimes. To get a much more complete picture of the performance improvements possible I disabled all the production oriented features in the JXInsight test runs.

java5vsjava6-c1

Our product obtains a performance improvement of approximately 33%. The other product had no noticeable timing differences, which probably indicates that a large amount of its measurement and data collection is performed in native code.

The following chart compares test runs with JXInsight’s dynamic overhead reduction optimizations enabled (which is the default).  Again there was a 33% performance improvement in our test execution times though the scaling makes it harder to discern.

java5vsjava6-c2

A while back there was a Java EE performance management tool (eventually acquired by Compuware) that touted its extremely low overhead because it was written entirely in C/C++. The claim was never substantiated during my own performance investigations, and with the continued improvements in the runtime I doubt that, even if it was ever the case, it would still hold true today.

I am not advocating that all tooling in the Java management space be written entirely in Java, as such solutions tend to be simplistic (no native counter integration) and slow (access through clunky JMX interfaces). It is important, though, to know where best to perform the instrumentation and measurement, and then to design suitable and efficient interfaces between the native code and the Java code. That is easier said than done considering the current quality and performance of the tooling offered by the runtime vendors themselves.

UPDATE: A tale of two threads

Below is a chart comparing the execution times of both products running the same CPU-bound test concurrently across two threads (twice the amount of work) with a Java 6 runtime. To make this in some way comparable I have again disabled all production-oriented features in our product. Our product showed an increase of less than 10%, compared with a near 100% increase for the other product.

java5vsjava6-c3

MacBookPro5,1 Intel Core 2 Duo 2.53 GHz

java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06-153)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_07-b06-57, mixed mode)

java version "1.5.0_16"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_16-b06-284)
Java HotSpot(TM) Client VM (build 1.5.0_16-133, mixed mode, sharing)

3 Responses to “Java 6: Good Design -> Faster Performance”


  1. Hi William,
    Is this test code available for the public?
    Regards,
    Markus

  2. williamlouth Says:

    Hi Markus,

    I believe the source code was previously published on our company blog (http://blog.jinspired.com) prior to the transfer of all the content over to our xpe community site (http://xpe.jinspired.com).

    The method call chain executes extremely fast which is why it is called from within a loop 100 million times with a default call depth of 4.

    The template for each method, which we generate prior to the test and execute in our benchmark harness, looks like this:

    private static void call${N}(int depth) {
        switch (depth) {
            case ${N-1}: {
                call${N-1}(--depth);
                break;
            }
            default: {
                throw new IllegalArgumentException();
            }
        }
    }

    The entry point method “call” contains a large switch statement covering all possible starting depths as we also test profilers which are sample based (call stacks cost more the greater the depth).
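
    As a rough illustration, the template and entry point can be expanded by hand for a call depth of 4. This is only a sketch and not the original harness source: the class name, the leafCalls counter (there purely so the run can be checked), the way the last method ends the chain, and the reduced loop count are all assumptions.

```java
// Hand-expanded sketch of the template for a call depth of 4.
// The class name, leafCalls counter, terminal method and loop count are
// illustrative assumptions, not the original harness source.
final class CallChainSketch {

    // Only present so the sketch can verify that the chain ran to the end.
    static long leafCalls = 0;

    // Entry point: one case per supported starting depth.
    static void call(int depth) {
        switch (depth) {
            case 1: call1(--depth); break;
            case 2: call2(--depth); break;
            case 3: call3(--depth); break;
            case 4: call4(--depth); break;
            default: throw new IllegalArgumentException();
        }
    }

    private static void call4(int depth) {
        switch (depth) {
            case 3: call3(--depth); break;
            default: throw new IllegalArgumentException();
        }
    }

    private static void call3(int depth) {
        switch (depth) {
            case 2: call2(--depth); break;
            default: throw new IllegalArgumentException();
        }
    }

    private static void call2(int depth) {
        switch (depth) {
            case 1: call1(--depth); break;
            default: throw new IllegalArgumentException();
        }
    }

    // Assumed terminal method: the template does not show how the chain ends.
    private static void call1(int depth) {
        switch (depth) {
            case 0: leafCalls++; break;
            default: throw new IllegalArgumentException();
        }
    }

    public static void main(String[] args) {
        int iterations = 1000000; // the real test loops 100 million times
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            call(4);
        }
        long elapsed = System.nanoTime() - start;
        System.out.println(leafCalls + " chains in " + elapsed + " ns");
    }
}
```

    Each method does nothing but check its argument and forward the call, so nearly all of the time a profiler adds on top of an uninstrumented run can be attributed to its own instrumentation and measurement.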

    By default we only use one thread, but if we suspect a profiling solution is offloading work onto another thread then we test with a higher number of threads to max out all processing capacity, depending on the platform and runtime tested.
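
    A multi-threaded run of this kind might be driven along the following lines. This is a minimal sketch under my own assumptions: the ThreadedRunSketch class and runConcurrently method are hypothetical names, and the workload is passed in as a plain Runnable rather than taken from the real harness.

```java
// Sketch of running the same workload on several threads at once and
// timing the whole run. All names here are hypothetical.
final class ThreadedRunSketch {

    // Starts `threads` workers that each execute `work` once, waits for all
    // of them to finish, and returns the elapsed wall-clock time in nanoseconds.
    static long runConcurrently(int threads, Runnable work) {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(work, "bench-" + i);
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("benchmark run interrupted", e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Stand-in CPU-bound workload; the real test would invoke the call chain.
        Runnable work = new Runnable() {
            public void run() {
                long sum = 0;
                for (int i = 0; i < 10000000; i++) {
                    sum += i;
                }
                if (sum < 0) {
                    System.out.println(sum); // keeps the loop from being discarded
                }
            }
        };
        System.out.println("1 thread:  " + runConcurrently(1, work) + " ns");
        System.out.println("2 threads: " + runConcurrently(2, work) + " ns");
    }
}
```

    On a machine with at least two free cores, a tool that does its measurement on the calling thread should show roughly the same elapsed time for both runs, while one that funnels work through a shared thread or lock will show a much larger increase.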

    Remember this code is used to profile the profilers' own instrumentation and measurement. We adjust the reported times for the cost of executing the underlying system call to get the counter (twice per method measured/metered).
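
    The adjustment can be approximated as follows. This is only a sketch: it uses System.nanoTime() as a stand-in for whatever underlying counter a given tool reads, and the class and method names are hypothetical.

```java
// Sketch of adjusting a reported time for the cost of reading the clock.
// System.nanoTime() stands in for the underlying counter; all names are
// hypothetical.
final class ClockCostSketch {

    // Written once per estimate so the JIT cannot discard the timing loop.
    static volatile long sink;

    // Estimates the average cost, in nanoseconds, of one clock read.
    static double estimateClockCostNanos(int samples) {
        long acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < samples; i++) {
            acc += System.nanoTime();
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return (double) elapsed / samples;
    }

    // Removes two clock reads (entry and exit) per measured invocation
    // from the reported total.
    static double adjust(double reportedNanos, long measuredInvocations, double clockCostNanos) {
        return reportedNanos - 2.0 * measuredInvocations * clockCostNanos;
    }

    public static void main(String[] args) {
        estimateClockCostNanos(1000000); // warm-up pass
        double cost = estimateClockCostNanos(1000000);
        System.out.println("estimated cost per clock read: " + cost + " ns");
    }
}
```

    What remains after the subtraction is the overhead the tool adds beyond the unavoidable clock reads, which is the part a vendor can actually do something about.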

    William


  3. [...] I think the problem here is that the session speaker incorrectly assumes that other tools managing applications in production using some form of execution analysis (1) instrument every single method on the call stack, (2) measure every method invocation occurrence, and (3) have a relatively high overhead in the measuring invocations. This is certainly not the case today.  [...]

