The Fastest Python Profiler is a Java Profiler
September 29, 2009
Following on from our post “The Fastest Ruby Profiler is a Java Profiler”, I am pleased to announce that our upcoming probes-based resource metering extension for Jython is the fastest function call profiling solution for Python applications executed within a Java runtime. With our dynamic hotspot resource metering strategy, the overhead of our probes instrumentation can drop to approximately 8 nanoseconds for non-hotspot function calls.
Here is the Python micro-benchmark I used to determine this.

Here is a chart showing the average time (secs) reported by multiple executions of the run method without (baseline) and with our probes-based instrumentation. JXInsight’s instrumentation overhead is just over 2%, but that is 2% on a function, call, that effectively does nothing.
The overhead incurred in the instrumentation and measurement of fine-grained method executions, which dominate the execution (frequency) profile of most web applications, is a key performance indicator (KPI) for any enterprise application performance management and monitoring solution.

Below is a screenshot of the resulting resource metering in our application management console. Note the strike-through on the call:12 function, indicating that the associated instrumentation was disabled after 1000 probe firings, once the hotspot metering strategy had determined that the named probe, benchmark.call.12, is not a hotspot.
Note: Line numbers can be excluded from the probe name assigned to a function. All same-named functions within a module are then marked as hot or cold spots together, depending on the combined resource metering behavior of all implementations.
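To make the idea concrete, here is a minimal Python 3 sketch of a self-disabling probe. It is an illustration only and not how JXInsight is implemented: the names Probe and metered, the probe naming (module.function, without the line number) and the average-time threshold are all assumptions made for this example. Only the figure of 1000 firings comes from the text above.

```python
import time
from functools import wraps

WARMUP_FIRINGS = 1000           # from the post: the decision is made after 1000 firings
HOTSPOT_THRESHOLD_SECS = 1e-4   # assumed cutoff; the real hotspot criterion is not described


class Probe:
    """Accumulates timings for one named probe and decides whether to stay enabled."""

    def __init__(self, name, warmup, threshold):
        self.name = name
        self.warmup = warmup
        self.threshold = threshold
        self.count = 0
        self.total = 0.0
        self.enabled = True

    def record(self, elapsed):
        self.count += 1
        self.total += elapsed
        # Once the warm-up window is complete, switch off probes whose
        # average cost is below the (assumed) hotspot threshold.
        if self.count == self.warmup and self.total / self.count < self.threshold:
            self.enabled = False


def metered(warmup=WARMUP_FIRINGS, threshold=HOTSPOT_THRESHOLD_SECS):
    """Decorator factory: wrap a function with a probe that can disable itself."""
    def decorate(func):
        probe = Probe("%s.%s" % (func.__module__, func.__name__), warmup, threshold)

        @wraps(func)
        def wrapper(*args, **kwargs):
            if not probe.enabled:
                # Cold spot: skip the clock reads entirely.
                return func(*args, **kwargs)
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                probe.record(time.perf_counter() - start)

        wrapper.probe = probe
        return wrapper
    return decorate


@metered()
def call():
    pass


for _ in range(2000):
    call()
print(call.probe.name, call.probe.count, call.probe.enabled)
```

Once a probe is disabled, the wrapper only pays for a single attribute check per call, which is the property that keeps the cost of non-hotspot calls low in a scheme like this.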

The following chart shows the performance of the same benchmark executed by the native C-based Python engine (CPython) along with the standard cProfile profiler. The overhead is significant.

Here is a chart comparing the overhead (%) of JXInsight with that of the Python cProfile.

I attempted to run the benchmark with the Python-based profile solution, but a single run() call took over 200 seconds compared to 2-4 seconds for all other reported test runs. Surprisingly, the Jython version of the Python profile solution executed in just under 120 seconds.
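For readers who want to try a comparison of this kind themselves, here is a rough Python 3 sketch that times a do-nothing function with no profiler, with cProfile and with the pure-Python profile module, and reports the overhead as a percentage of the baseline. The call and run functions below are hypothetical stand-ins and not the benchmark used for the charts, and the iteration count is arbitrary, so the numbers will not match the ones reported above.

```python
import cProfile
import profile
import time


def call():
    """Stand-in for a function that effectively does nothing."""
    pass


def run(iterations):
    """Invoke call() repeatedly so that per-call overhead dominates."""
    for _ in range(iterations):
        call()


def timed(func, *args):
    """Return the wall-clock seconds taken by func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


def overhead_percent(baseline, measured):
    """Overhead of measured relative to baseline, as a percentage."""
    return (measured - baseline) / baseline * 100.0


if __name__ == "__main__":
    n = 50000  # arbitrary; increase for more stable timings
    baseline = timed(run, n)
    with_cprofile = timed(cProfile.Profile().runcall, run, n)
    with_profile = timed(profile.Profile().runcall, run, n)
    print("baseline: %.4f s" % baseline)
    print("cProfile: %.4f s (%.0f%% overhead)"
          % (with_cprofile, overhead_percent(baseline, with_cprofile)))
    print("profile:  %.4f s (%.0f%% overhead)"
          % (with_profile, overhead_percent(baseline, with_profile)))
```

Both Profile classes expose runcall(func, *args), which profiles exactly one invocation, so the timing wraps the same work in all three cases.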
If you are interested in seeing more of what will be possible with the upcoming extension, check out the following images showing contextual labeling as well as multi-language metering (profiling).
www.jinspired.com/images/jxinsight.5.7.32.jython.to.python.p.1.gif
www.jinspired.com/images/jxinsight.5.7.32.jython.to.python.p.2.gif
Metering (Profiling) Tagged Execution Periods
September 21, 2009
One of the many cool and innovative features we have recently pushed out the door with our regular software updates is an enhancement to tagging that makes this unique feature extremely useful and versatile in the analysis of software under construction during development and test, or under management in production. Now, when you set a tag at either the global or thread context level, we not only aggregate resource metering for the tagged metering group but also do so for every other metering group updated during the tagged execution period. We do this by creating an extended @tag metering group whose name concatenates the name of the updated metering group with the value of the tag.
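As a rough illustration of the aggregation described above, here is a small Python sketch. The class names Meter and Metering, the group names and the exact "group@tag" naming format are assumptions made for this example and are not taken from our Open API.

```python
class Meter:
    """Running count, total, min, max and average for one metering group."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.min = None
        self.max = None

    def update(self, value):
        self.count += 1
        self.total += value
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)

    @property
    def average(self):
        return self.total / self.count if self.count else 0.0


class Metering:
    """Aggregates by group and, while a tag is set, by an extended group name."""

    def __init__(self):
        self.groups = {}
        self.tag = None  # a single global tag, for simplicity

    def _meter(self, name):
        return self.groups.setdefault(name, Meter())

    def record(self, group, value):
        # The plain group is always updated ...
        self._meter(group).update(value)
        # ... and, during a tagged period, so is the extended group.
        # The "group@tag" format is an assumption for illustration.
        if self.tag is not None:
            self._meter("%s@%s" % (group, self.tag)).update(value)


metering = Metering()
metering.record("com.acme.Dao.find", 4.0)    # untagged period
metering.tag = "checkout"
metering.record("com.acme.Dao.find", 10.0)   # tagged period
metering.record("com.acme.Cart.total", 2.0)
metering.tag = None

for name in sorted(metering.groups):
    m = metering.groups[name]
    print(name, m.count, m.total, m.min, m.max, m.average)
```

Because the plain and extended groups live in the same model, the tagged and overall figures for a probe can be compared side by side without resetting anything.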

Being able to inject tags remotely from within our management console or command line interface, or locally from within the actual managed runtime via our Open API, gives us the ability to answer two very broad questions:
1. What is the resource usage for a package/class/method during a particular tagged execution period?
2. How does the resource usage for a particular probe compare across tagged and non-tagged execution periods in terms of count, total, min, max and average?
Previously, the way to answer such questions at a global level across various profiling and application performance management (APM) tools was to reset all data collected prior to the execution period, specify a period filter on an event journal table or log, or, in our case, mark the current totals and then perform delta analysis following the completion of the metered execution. But this leads to the creation of multiple disconnected performance models that cannot be easily compared and reconciled. In addition, with those approaches it is not possible to narrow the analysis to one or more threads, or to have concurrent tagged execution periods with each thread having its own tag value, which is required when the tag values reflect some aspect of the execution context such as the web application user name, a request parameter or a session attribute.
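The difference between a global tag and a thread context tag can be sketched in Python with threading.local. Everything named here (set_global_tag, set_thread_tag, record, handle_request, the group name) is hypothetical, and letting the thread tag take precedence over the global tag is an assumption of this sketch rather than documented behavior.

```python
import threading
from collections import defaultdict

_global_tag = None
_thread_ctx = threading.local()   # holds a separate tag per thread
_lock = threading.Lock()
totals = defaultdict(lambda: [0, 0.0])  # name -> [count, total]


def set_global_tag(tag):
    global _global_tag
    _global_tag = tag


def set_thread_tag(tag):
    _thread_ctx.tag = tag


def current_tag():
    # Assumption: a thread-level tag, if set, wins over the global tag.
    return getattr(_thread_ctx, "tag", None) or _global_tag


def record(group, value):
    tag = current_tag()
    names = [group] if tag is None else [group, "%s@%s" % (group, tag)]
    with _lock:
        for name in names:
            entry = totals[name]
            entry[0] += 1
            entry[1] += value


def handle_request(user, cost):
    """Tag the executing thread with the user for the duration of the request."""
    set_thread_tag(user)
    try:
        record("app.Service.handle", cost)
    finally:
        set_thread_tag(None)


def demo():
    """Run two concurrent requests, each with its own thread-level tag."""
    requests = [("alice", 1.5), ("bob", 2.5)]
    threads = [threading.Thread(target=handle_request, args=r) for r in requests]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(totals)


if __name__ == "__main__":
    for name, (count, total) in sorted(demo().items()):
        print(name, count, total)
```

Each thread sees only its own tag, so the two requests are metered under separate extended groups even though they run concurrently and update the same shared group.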

Here are some concrete use cases where this capability might be much more appropriate during analysis across multiple application life cycle management phases.
1. During the testing of a web application interaction consisting of a series of steps and pages, a tag based on the step or page name is set globally, allowing the developer to break down the resource metering costs for a particular package/class/method by the tagged step (or page).
2. During unit testing, a tag is set at the thread context level based on the name of each @Test method executed, providing a code-coverage-like analysis of the directly and indirectly executed code per test method. This data can also be used in generating dependency analysis reports for such tests.
3. During a system failure (e.g. a database bounce), a global tag is injected identifying the failure period, allowing service level management to later break out such costs on a per activity/interaction basis.
4. In diagnosing a user-specific problem, a thread context level tag is set identifying the user or the particular interaction at the entry point of request processing, enabling resource metering to be reported from both an operational (code) perspective and a contextual (user marker) perspective, side by side.
5. In delivering elastic computing capabilities to an application, a tag is injected into a cluster of runtimes at each dynamic change (a capacity profile) in the computing capacity made available to the application in the cloud. The data can later be analyzed offline to determine why an increase in capacity is not being fully utilized, for example because of inherent scalability problems in the application's own code base. Likewise, one can easily determine which particular activities are impacted more than others by decreases in computing capacity due to elasticity policies.
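The unit testing use case (item 2 above) translates naturally to Python's unittest. The sketch below is only an analogy: it uses a module-level variable as a stand-in for a thread context tag, counts probe firings rather than metering resources, and every name in it (metered, TaggedTestCase, executed_by, parse, total) is made up for the example.

```python
import unittest
from collections import Counter

current_tag = None    # stand-in for a thread context tag
firings = Counter()   # probe name (and name@tag) -> number of firings


def metered(func):
    """Count each call under the function name and, if tagged, under name@tag."""
    name = func.__name__

    def wrapper(*args, **kwargs):
        firings[name] += 1
        if current_tag is not None:
            firings["%s@%s" % (name, current_tag)] += 1
        return func(*args, **kwargs)

    return wrapper


@metered
def parse(text):
    return text.split(",")


@metered
def total(text):
    return sum(int(item) for item in parse(text))


class TaggedTestCase(unittest.TestCase):
    """Sets the tag to the name of the running test method."""

    def setUp(self):
        global current_tag
        current_tag = self.id().rsplit(".", 1)[-1]

    def tearDown(self):
        global current_tag
        current_tag = None


class ParserTests(TaggedTestCase):
    def test_parse(self):
        self.assertEqual(parse("1,2"), ["1", "2"])

    def test_total(self):
        self.assertEqual(total("1,2"), 3)


def executed_by(test_name):
    """Names of metered functions that fired while test_name was the tag."""
    suffix = "@" + test_name
    return sorted(key[:-len(suffix)] for key in firings if key.endswith(suffix))


def run_suite():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParserTests)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    run_suite()
    for test in ("test_parse", "test_total"):
        print(test, executed_by(test))
```

Here test_total reaches parse only indirectly through total, yet parse still shows up under that test's tag, which is the kind of direct and indirect coverage view described in item 2.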