Profiling: Sampling versus Execution (Part 1)
January 7, 2009
At the recent Devoxx conference (which I did not manage to attend) it was claimed that sample-based profiling has much less overhead than execution profiling. The claim was made by someone who works on a not-yet-released product that supports only sample-based profiling, in a session that opened with the statement “Any performance tuning advice provided in this presentation….. will be wrong!”
This same person previously claimed that GC actually made applications faster. If that holds true for an application, then more than likely the application has serious resource bottlenecks elsewhere in the request processing pipeline, which the additional GC stop-the-world events alleviate by inadvertently throttling traffic and reducing resource contention (concurrency).
But before testing the validity of such claims (and ignoring the actual benefit of the data collection), let's consider the typical production workload context for enterprise Java applications:
- Large number (>50) of request processing threads
- Very deep call stacks (>200) with a high percentage of call frames non-application related (especially so when using frameworks such as Spring)
- High degree of database activity with high latency costs (>10 ms)
This means there is a high probability that when the sampling profiler executes a measurement cycle (every 1-5 ms?), a large number of threads will have very deep call stacks that are by and large of little value for application performance analysis: mostly non-application code, with no application context.
Obtaining the call stack for a thread is incredibly expensive (we know this from the cost of throwing exceptions), and it is typically done after the sampling profiler has temporarily suspended all threads (more cores, more waste). This expense does not even include the cost of comparing each thread's call stack with the one previously collected, recording timings and updating statistics, a cost that grows with each thread and each frame.
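To make the per-cycle work concrete, here is a minimal sketch of a single measurement cycle built on the standard `Thread.getAllStackTraces()` call. It is an illustration under my own assumptions, not the implementation of any particular profiler: the class and method names are hypothetical, and it only counts the top frame of each thread, leaving out the stack comparison and timing steps mentioned above. The helper parks one thread at a chosen call depth so you can see how many frames a single cycle has to walk.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of one sampling "measurement cycle"; all names are illustrative.
final class SamplingCycleSketch {

    // Hit counts keyed by the top-most frame ("Class.method") of each sampled thread.
    private final Map<String, Integer> hits = new HashMap<>();
    private long framesWalked;

    // Captures the stack of every live thread and updates the statistics.
    void sampleOnce() {
        for (StackTraceElement[] frames : Thread.getAllStackTraces().values()) {
            framesWalked += frames.length; // work grows with every thread and every frame
            if (frames.length > 0) {
                StackTraceElement top = frames[0];
                hits.merge(top.getClassName() + "." + top.getMethodName(), 1, Integer::sum);
            }
        }
    }

    long framesWalked() {
        return framesWalked;
    }

    // Parks a daemon thread at (at least) the given call depth, runs one cycle,
    // and returns the total number of frames captured across all threads.
    static long framesWalkedWithDeepThread(int depth) {
        CountDownLatch reached = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(() -> descend(depth, reached, release), "deep-worker");
        worker.setDaemon(true);
        worker.start();
        try {
            reached.await(); // wait until the worker sits at the bottom of the recursion
            SamplingCycleSketch sampler = new SamplingCycleSketch();
            sampler.sampleOnce();
            return sampler.framesWalked();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } finally {
            release.countDown(); // let the worker unwind and exit
        }
    }

    private static void descend(int remaining, CountDownLatch reached, CountDownLatch release) {
        if (remaining > 0) {
            descend(remaining - 1, reached, release);
            return;
        }
        reached.countDown();
        try {
            release.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("frames walked in one cycle: " + framesWalkedWithDeepThread(200));
    }
}
```

Multiply the frame count by 50 or more request threads and by several hundred cycles per second and the amount of stack data handled per second becomes easy to estimate for your own workload.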
I think the problem here is that the session speaker incorrectly assumes that other tools that manage applications in production using some form of execution analysis (1) instrument every single method on the call stack, (2) measure every method invocation, and (3) incur a relatively high overhead in measuring each invocation. This is certainly not the case today.
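As a rough illustration of point (2), a probe does not have to keep measuring forever. The sketch below is a hypothetical, single-threaded example of one possible strategy and is not taken from any product: it times the first few invocations of a method and switches itself off if the method turns out to be too cheap to be worth measuring. The class name, the warm-up count and the threshold are all assumptions, and the clock is injected so the behaviour can be checked deterministically.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: a probe that stops measuring methods that prove to be cheap.
// Not thread-safe; a real implementation would need per-thread or atomic state.
final class AdaptiveProbeSketch {

    private final LongSupplier clock;   // nanosecond clock, e.g. System::nanoTime
    private final int warmupCalls;      // invocations to measure before deciding
    private final long minAverageNanos; // below this average the probe disables itself

    private long measuredCalls;
    private long totalNanos;
    private boolean enabled = true;

    AdaptiveProbeSketch(LongSupplier clock, int warmupCalls, long minAverageNanos) {
        this.clock = clock;
        this.warmupCalls = warmupCalls;
        this.minAverageNanos = minAverageNanos;
    }

    // Runs the body, timing it only while the probe is still enabled.
    void run(Runnable body) {
        if (!enabled) {
            body.run(); // disabled probe: one boolean check of overhead
            return;
        }
        long start = clock.getAsLong();
        try {
            body.run();
        } finally {
            totalNanos += clock.getAsLong() - start;
            measuredCalls++;
            if (measuredCalls >= warmupCalls && totalNanos / measuredCalls < minAverageNanos) {
                enabled = false; // too cheap to matter: stop measuring this method
            }
        }
    }

    boolean isEnabled() {
        return enabled;
    }

    long measuredCalls() {
        return measuredCalls;
    }

    public static void main(String[] args) {
        AdaptiveProbeSketch probe = new AdaptiveProbeSketch(System::nanoTime, 100, 1_000_000);
        for (int i = 0; i < 10_000; i++) {
            probe.run(() -> Math.sqrt(42.0));
        }
        System.out.println("measured " + probe.measuredCalls() + " calls, enabled=" + probe.isEnabled());
    }
}
```

With a policy like this, measurement cost concentrates on the methods that actually consume time, such as the database calls in the workload described above, rather than being spread across every frame of a deep framework call stack.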
Unless we are talking about a “HelloWorld” application with only one main thread of execution being profiled, a dynamic, strategy-based execution profiling (metering) solution can indeed outperform simplistic sample-based profilers whilst collecting much more relevant data, discarding noise, and doing so with a much higher degree of accuracy. This will be demonstrated in part 2.
Aside: One of the reasons vendors offer sample-based profiling is that it simplifies the development work for the product team. There is no requirement to deliver technology-specific extensions, configuration options or an open API to allow custom extension. There is an overhead reduction, but it is at the vendor's development site!