To be in any way successful at managing & monitoring real-world Java enterprise applications in production (that excludes Spring “PetClinic”) one needs to assess the runtime impact of the various performance data collection techniques and their actual (tool) implementations. Unfortunately most programmers fail to understand this and instead develop/promote/deploy tools that are clearly inadequate for the task at hand – in a production context. They incorrectly assume ease of use equates to lightweight when in fact the only thing lightweight is the effort and due diligence performed in understanding the problem domain and managing the associated risks. It is for this reason that most operations teams immediately discount (without any evaluation) the use of monitoring tools a developer might have used in any way during localized tuning.

Recently this was made evident when it was falsely claimed (without any qualification other than wishful thinking) that a Java sample-based profiling tool could out-perform JXInsight – a dynamic strategy-based resource metering solution. I will save the publication of a benchmark comparison for a (near) future blog and instead show you a chart (taken from within the “visual” tool itself) depicting the performance impact of such a technique on an application with a realistic number of threads (100) running with call stacks of realistic sizes (100-200).

Note: The similarity of the 100 ms and 1,000 ms run sections indicates the tool was not able to keep up at the shorter of the two sampling intervals.

sampling.chart

Even at a 10 second sample interval the runtime impact is excessive, ignoring the fact that at this resolution the data would be completely meaningless unless the application was a dog in terms of performance, which raises the question “How did such a beast get into production in the first place?”.
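For readers who want a rough feel for what a single in-process call stack sample costs, the following is a minimal sketch and not the benchmark behind the chart. It parks a number of threads at a chosen stack depth and times `Thread.getAllStackTraces()`. The class name `StackSampleCost`, its methods, and the sample count of 50 are all hypothetical choices for illustration; the 100 threads and depth of 150 simply mirror the figures quoted above. It does not reproduce a remote JMX based sampler, which adds serialization and network costs on top.

```java
import java.util.Map;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: times in-process stack sampling of parked threads.
public class StackSampleCost {

    // Recurse to the requested depth, then block until released, so the
    // thread holds a deep call stack while it is being sampled.
    static void descend(int depth, CountDownLatch ready, CountDownLatch release) {
        if (depth > 0) {
            descend(depth - 1, ready, release);
            return;
        }
        ready.countDown();
        try {
            release.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Start the worker threads, take `samples` full stack samples and
    // return the mean wall-clock cost of one sample in nanoseconds.
    static long meanSampleNanos(int threads, int depth, int samples) {
        CountDownLatch ready = new CountDownLatch(threads);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> descend(depth, ready, release), "worker-" + i);
            t.setDaemon(true);
            t.start();
        }
        try {
            ready.await(); // wait until every worker has reached full depth
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while starting workers", e);
        }
        long frames = 0;
        long start = System.nanoTime();
        for (int s = 0; s < samples; s++) {
            Map<Thread, StackTraceElement[]> dump = Thread.getAllStackTraces();
            for (StackTraceElement[] stack : dump.values()) {
                frames += stack.length; // touch the data so the work is not skipped
            }
        }
        long elapsed = System.nanoTime() - start;
        release.countDown(); // let the workers unwind and exit
        System.out.println("frames collected per sample: " + frames / samples);
        return elapsed / samples;
    }

    public static void main(String[] args) {
        long nanos = meanSampleNanos(100, 150, 50);
        System.out.println("mean cost of one sample: " + nanos / 1_000 + " us");
    }
}
```

The number printed depends heavily on the JVM, the hardware and what the sampled threads are doing, so treat it as an order-of-magnitude indication only. Parked threads are close to a best case for the sampler, since the application is doing no useful work that the sample could disturb.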

Yes, the overhead of sampling is zero when not being performed, but what type of enterprise application is not monitored continuously in production? Instrumentation needs to be the primary source of resource metering & software execution metric performance data. Sampling should be considered (and only after discounting many other sources and techniques) when the current instrumentation coverage is not adequate at the time a problem appears, though I am doubtful of its effectiveness given the noise it would create when enabled, based on the charting above.
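To make the contrast with sampling concrete, here is a minimal sketch of what an instrumentation-style probe looks like in plain Java. It is an illustration only and says nothing about how JXInsight itself is implemented; the class `ProbeSketch`, the method `metered` and the probe name `order.lookup` are hypothetical. The point is that the cost is paid per metered call and is independent of how many other threads exist or how deep their call stacks are.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical sketch of a hand-written metering probe.
public class ProbeSketch {

    // Per-probe counters; LongAdder keeps contention low under many threads.
    static final class Stats {
        final LongAdder count = new LongAdder();
        final LongAdder totalNanos = new LongAdder();
    }

    private static final ConcurrentHashMap<String, Stats> METERS = new ConcurrentHashMap<>();

    // Run the work and record one invocation plus its wall-clock time,
    // even when the work throws.
    static <T> T metered(String name, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            Stats stats = METERS.computeIfAbsent(name, k -> new Stats());
            stats.count.increment();
            stats.totalNanos.add(System.nanoTime() - start);
        }
    }

    // Number of recorded invocations for a probe, or 0 if it never fired.
    static long count(String name) {
        Stats stats = METERS.get(name);
        return stats == null ? 0 : stats.count.sum();
    }

    // Accumulated wall-clock time for a probe in nanoseconds.
    static long totalNanos(String name) {
        Stats stats = METERS.get(name);
        return stats == null ? 0 : stats.totalNanos.sum();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            final int n = i;
            metered("order.lookup", () -> Integer.toString(n).length());
        }
        System.out.println("calls: " + count("order.lookup"));
        System.out.println("total: " + totalNanos("order.lookup") / 1_000 + " us");
    }
}
```

A real metering solution would add far more than this (dynamic enabling and disabling of probes, other resource meters, aggregation by call path), but even this sketch shows where the overhead comes from: two clock reads and two counter updates per call.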

10 Responses to “Java Call Stack Sampling in the Wild”

  1. Fiji Says:

    > what type of enterprise application is not monitored continuously in production

    You know, some of them just work. For months and years without developer’s intrusion. Just a matter of implementation quality.

    • williamlouth Says:

      Yes and we should all put our blind faith in the programmer gods to look over our creation whilst we vacation in Las Vegas.

      Seriously, you seem to be shortcutting the first activity in software performance engineering: “Risk Assessment”.

      Whether it works for months or years is beside the point (at least at the start when you do not know this to be the case). You need to be sure that when things go wrong you have a good insurance cover – you can react quickly and in a correct & directed manner.

  2. Jiri Sedlacek Says:

    Nice post and very informative, thanks! Just one clarification – most of the VisualVM (that’s the ugly tool being described here) users are using it to improve/fix the code during development, that’s where the real optimizations should be made.

    You’ve probably used just the CPU sampling, it would be interesting to see how VisualVM compares to JXInsight also when monitoring the memory usage. I’m sure you’ll add another article soon.

    BTW could you please share a link where it’s said that any sampling tool could out-perform JXInsight? Unfortunately I’ve missed that article.

    • williamlouth Says:

      Apparently the author and commentator cannot handle their beliefs (hopes/dreams) being openly questioned by someone who actually knows what he is talking about so my comments were deleted.

      I have no real intention of tackling heap memory analysis as I think it is largely a development activity with very little possibility of automation (at least in terms of analysis) until developers can communicate via meta data (@annotations) their intention and life expectancy (lease) of objects within the code itself & available to runtime tooling.

      • Jiri Sedlacek Says:

        So you think that slow memory leaks which eventually cause production server crashes are not worth monitoring? Based on feedback of our users this is a real pain, that’s why VisualVM supports heap inspection and now in 1.2 also lightweight memory sampling.

    • williamlouth Says:

      I used the Visual VM remote JMX based sampler plug-in which explains the large amount of GC activity and the delay in the client sampling at low intervals due to networking latency.

    • williamlouth Says:

      It used to be ugly but you have worked very hard on this and it looks much better.

      That said it [Visual VM] is likely to end up looking like another Eclipse frankenstein tool as you have focused wrongly (IMHO) on the console rather than the underlying data models. But that is easy to say with experience.

  3. williamlouth Says:

    Please read the following articles BEFORE posting a comment, otherwise I will delete comments ** already answered in these articles ** to ensure this discussion does not go over the same ground.

    http://williamlouth.wordpress.com/2009/01/07/profiling-sampling-versus-execution-part-1/

    http://williamlouth.wordpress.com/2009/01/16/profiling-sampling-versus-execution-part-2/

    http://williamlouth.wordpress.com/2009/07/27/one-billion-operations-per-second/

  4. williamlouth Says:

    No. I am saying that the approach used by the current crop of memory profiler tools is not at all suitable for analysis in a production monitoring context – the cure is as poisonous as the problem killing the patient (process).

    Detailed memory profiling should be used during development & testing and tools like Visual VM, Yourkit, Eclipse MAT, and NetBeans Profiler are good choices here. All these tools are more than adequate for focused code tuning. For both software and system execution behavior analysis under production workload levels there is currently only one viable solution, at least from my benchmarking and feature comparison – JXInsight.

  5. williamlouth Says:

    By the way, JXInsight does indeed offer capabilities related to memory analysis, but these are more focused on object allocation rates for the purpose of detecting working memory capacity problems rather than memory leaks, which are best addressed with expensive heap dumps and offline analysis.

