<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Profiling: Sampling versus Execution (Part 1)</title>
	<atom:link href="http://williamlouth.wordpress.com/2009/01/07/profiling-sampling-versus-execution-part-1/feed/" rel="self" type="application/rss+xml" />
	<link>http://williamlouth.wordpress.com/2009/01/07/profiling-sampling-versus-execution-part-1/</link>
	<description>The Art of API Design and Performance Engineering</description>
	<lastBuildDate>Wed, 28 Oct 2009 20:05:57 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: williamlouth</title>
		<link>http://williamlouth.wordpress.com/2009/01/07/profiling-sampling-versus-execution-part-1/#comment-188</link>
		<dc:creator>williamlouth</dc:creator>
		<pubDate>Tue, 20 Oct 2009 05:40:24 +0000</pubDate>
		<guid isPermaLink="false">http://williamlouth.wordpress.com/?p=284#comment-188</guid>
		<description>Yes, sampling, unlike instrumentation, has no overhead when not enabled, but how many applications in the enterprise are not actually managed &amp; monitored? That said, we have managed to drop the overhead so low that it is generally not noticeable at all - instrumentation can be created in such a way that when disabled the impact is negligible (e.g. DTrace).

Sample away if you are a Java desktop developer, but please do not use this continuously in production.</description>
		<content:encoded><![CDATA[<p>Yes, sampling, unlike instrumentation, has no overhead when not enabled, but how many applications in the enterprise are not actually managed &amp; monitored? That said, we have managed to drop the overhead so low that it is generally not noticeable at all &#8211; instrumentation can be created in such a way that when disabled the impact is negligible (e.g. DTrace).</p>
<p>Sample away if you are a Java desktop developer, but please do not use this continuously in production.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: williamlouth</title>
		<link>http://williamlouth.wordpress.com/2009/01/07/profiling-sampling-versus-execution-part-1/#comment-187</link>
		<dc:creator>williamlouth</dc:creator>
		<pubDate>Tue, 20 Oct 2009 05:36:27 +0000</pubDate>
		<guid isPermaLink="false">http://williamlouth.wordpress.com/?p=284#comment-187</guid>
		<description>&quot;Creating a stack trace extremely quickly can be done&quot;

600 ns per thread is not extremely quick in my book, or anyone&#039;s book for that matter, with regard to enterprise applications with demanding performance requirements.</description>
		<content:encoded><![CDATA[<p>&#8220;Creating a stack trace extremely quickly can be done&#8221;</p>
<p>600 ns per thread is not extremely quick in my book, or anyone&#8217;s book for that matter, with regard to enterprise applications with demanding performance requirements.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: williamlouth</title>
		<link>http://williamlouth.wordpress.com/2009/01/07/profiling-sampling-versus-execution-part-1/#comment-186</link>
		<dc:creator>williamlouth</dc:creator>
		<pubDate>Tue, 20 Oct 2009 05:34:57 +0000</pubDate>
		<guid isPermaLink="false">http://williamlouth.wordpress.com/?p=284#comment-186</guid>
		<description>Please read the rest of the blog entries on this site that demonstrate how we do this, which, by the way, is entirely in your hands in terms of its metering definition.</description>
		<content:encoded><![CDATA[<p>Please read the rest of the blog entries on this site that demonstrate how we do this, which, by the way, is entirely in your hands in terms of its metering definition.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: yuzzamatuzz</title>
		<link>http://williamlouth.wordpress.com/2009/01/07/profiling-sampling-versus-execution-part-1/#comment-182</link>
		<dc:creator>yuzzamatuzz</dc:creator>
		<pubDate>Mon, 19 Oct 2009 00:30:43 +0000</pubDate>
		<guid isPermaLink="false">http://williamlouth.wordpress.com/?p=284#comment-182</guid>
		<description>My point was just that the design point for stack walking in current JVMs was driven primarily by creating stack traces for exceptions, which don&#039;t happen very often (and so the performance of walking stacks wasn&#039;t given too much attention -- so you walk the stack when the exception occurs).  Creating a stack trace extremely quickly can be done, it just creates a higher overhead on application execution time when you don&#039;t need the result of the walk (no different than your instrumentation probes, except you seem to know which methods are &quot;important&quot; for tracing whereas the JVM wouldn&#039;t usually have that kind of information, I don&#039;t think).  But if you&#039;re constantly connected to a profiler or other performance tool, then perhaps it makes sense to invest in that kind of approach so you can get the trace without the current degree of overhead.

Similarly, there&#039;s no reason thread stacks cannot be walked asynchronously...you don&#039;t need to stop all the threads all at the same time.

So I&#039;m still curious how you identify what you suggest are the 10% of the methods fired that need to be actually metered?  Does the tool figure it out automatically, somehow, or does it rely on input from the user of the tool?  That part seems to me to be the key advantage of your approach, rather than the relative perceived inherent costs of doing things in currently engineered JVMs.</description>
		<content:encoded><![CDATA[<p>My point was just that the design point for stack walking in current JVMs was driven primarily by creating stack traces for exceptions, which don&#8217;t happen very often (and so the performance of walking stacks wasn&#8217;t given too much attention &#8212; so you walk the stack when the exception occurs).  Creating a stack trace extremely quickly can be done, it just creates a higher overhead on application execution time when you don&#8217;t need the result of the walk (no different than your instrumentation probes, except you seem to know which methods are &#8220;important&#8221; for tracing whereas the JVM wouldn&#8217;t usually have that kind of information, I don&#8217;t think).  But if you&#8217;re constantly connected to a profiler or other performance tool, then perhaps it makes sense to invest in that kind of approach so you can get the trace without the current degree of overhead.</p>
<p>Similarly, there&#8217;s no reason thread stacks cannot be walked asynchronously&#8230;you don&#8217;t need to stop all the threads all at the same time.</p>
<p>So I&#8217;m still curious how you identify what you suggest are the 10% of the methods fired that need to be actually metered?  Does the tool figure it out automatically, somehow, or does it rely on input from the user of the tool?  That part seems to me to be the key advantage of your approach, rather than the relative perceived inherent costs of doing things in currently engineered JVMs.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: williamlouth</title>
		<link>http://williamlouth.wordpress.com/2009/01/07/profiling-sampling-versus-execution-part-1/#comment-181</link>
		<dc:creator>williamlouth</dc:creator>
		<pubDate>Sun, 18 Oct 2009 14:25:44 +0000</pubDate>
		<guid isPermaLink="false">http://williamlouth.wordpress.com/?p=284#comment-181</guid>
		<description>I am actually going to correct myself here. Our native agent generates thread call stacks, java.lang.reflect.Method[], very efficiently and that takes 100-200 microseconds.

Doing it the standard (naive) way via a call to Thread.currentThread().getStackTrace() takes on average between 600-750 microseconds for a depth of 200, with occasional outliers on the order of 1-2 milliseconds due to the excessive object allocation nature of this particular call. So a stack depth of over 300 (unfortunately a standard these days) will take over 1 ms, which is equivalent to the cost of a distributed call. This is why the latest crop of straw man Java sampling profilers have a minimum interval of 100 ms. Now factor in the drop in throughput during this period due to thread suspension and the conclusion is pretty obvious even for &quot;peter the programmer&quot; who should probably be called &quot;peter the plumber&quot;.</description>
		<content:encoded><![CDATA[<p>I am actually going to correct myself here. Our native agent generates thread call stacks, java.lang.reflect.Method[], very efficiently and that takes 100-200 microseconds.</p>
<p>Doing it the standard (naive) way via a call to Thread.currentThread().getStackTrace() takes on average between 600-750 microseconds for a depth of 200, with occasional outliers on the order of 1-2 milliseconds due to the excessive object allocation nature of this particular call. So a stack depth of over 300 (unfortunately a standard these days) will take over 1 ms, which is equivalent to the cost of a distributed call. This is why the latest crop of straw man Java sampling profilers have a minimum interval of 100 ms. Now factor in the drop in throughput during this period due to thread suspension and the conclusion is pretty obvious even for &#8220;peter the programmer&#8221; who should probably be called &#8220;peter the plumber&#8221;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: williamlouth</title>
		<link>http://williamlouth.wordpress.com/2009/01/07/profiling-sampling-versus-execution-part-1/#comment-180</link>
		<dc:creator>williamlouth</dc:creator>
		<pubDate>Sun, 18 Oct 2009 08:37:32 +0000</pubDate>
		<guid isPermaLink="false">http://williamlouth.wordpress.com/?p=284#comment-180</guid>
		<description>By the way, on further reflection, I would not classify what our solution does as &quot;sampling&quot;. It is strategy based. Admittedly, of the 16 or so base metering strategies we support, 4 could be classified as sample based (time, frequency), but most are based on behavioral aspects: concurrent, entry point, warm-up, initial, exclude, include, hotspot, dynamic, checkpoint, delay, highcpu, busythread, busy, .....</description>
		<content:encoded><![CDATA[<p>By the way, on further reflection, I would not classify what our solution does as &#8220;sampling&#8221;. It is strategy based. Admittedly, of the 16 or so base metering strategies we support, 4 could be classified as sample based (time, frequency), but most are based on behavioral aspects: concurrent, entry point, warm-up, initial, exclude, include, hotspot, dynamic, checkpoint, delay, highcpu, busythread, busy, &#8230;..</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: williamlouth</title>
		<link>http://williamlouth.wordpress.com/2009/01/07/profiling-sampling-versus-execution-part-1/#comment-179</link>
		<dc:creator>williamlouth</dc:creator>
		<pubDate>Sun, 18 Oct 2009 08:32:13 +0000</pubDate>
		<guid isPermaLink="false">http://williamlouth.wordpress.com/?p=284#comment-179</guid>
		<description>I think you miss the reason why sampling was introduced &amp; used, ignoring the obvious problems with it. Sampling was used because (&lt;em&gt;primitive&lt;/em&gt;) execution based profiling/monitoring solutions incurred excessive overhead. The trade-off in sampling is loss in accuracy and reduced statistical data collection (averages, max, min, stddev, var,...). That is assuming we are only talking about desktop applications with a single thread of execution. Today, with a large number of concurrent threads executing within each JVM process and most making database/messaging calls at very deep call stack depths, the cost-benefit analysis of sampling is pretty dismal. Getting the call stack for a single thread with a depth of 100+ will cost at least 100 microseconds of CPU time before it is even processed by a tool. Now multiply that by 100 threads. Then multiply that by 2x-4x for call stack depths of between 300 and 400 and you can see the problem. Then factor in that most native samplers first suspend the execution of all threads before collecting the actual call stack (frames). Seems a no-brainer to me, at least when one takes into account that less than 10% of the methods fired actually need to be metered and their frequency percentage is even lower than that. A smart execution profiling/metering solution will out-perform a sampling solution both in terms of cost (&lt;em&gt;overhead&lt;/em&gt;) and benefit (&lt;em&gt;data collection&lt;/em&gt;).</description>
		<content:encoded><![CDATA[<p>I think you miss the reason why sampling was introduced &amp; used, ignoring the obvious problems with it. Sampling was used because (<em>primitive</em>) execution based profiling/monitoring solutions incurred excessive overhead. The trade-off in sampling is loss in accuracy and reduced statistical data collection (averages, max, min, stddev, var,&#8230;). That is assuming we are only talking about desktop applications with a single thread of execution. Today, with a large number of concurrent threads executing within each JVM process and most making database/messaging calls at very deep call stack depths, the cost-benefit analysis of sampling is pretty dismal. Getting the call stack for a single thread with a depth of 100+ will cost at least 100 microseconds of CPU time before it is even processed by a tool. Now multiply that by 100 threads. Then multiply that by 2x-4x for call stack depths of between 300 and 400 and you can see the problem. Then factor in that most native samplers first suspend the execution of all threads before collecting the actual call stack (frames). Seems a no-brainer to me, at least when one takes into account that less than 10% of the methods fired actually need to be metered and their frequency percentage is even lower than that. A smart execution profiling/metering solution will out-perform a sampling solution both in terms of cost (<em>overhead</em>) and benefit (<em>data collection</em>).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: williamlouth</title>
		<link>http://williamlouth.wordpress.com/2009/01/07/profiling-sampling-versus-execution-part-1/#comment-178</link>
		<dc:creator>williamlouth</dc:creator>
		<pubDate>Sun, 18 Oct 2009 08:17:05 +0000</pubDate>
		<guid isPermaLink="false">http://williamlouth.wordpress.com/?p=284#comment-178</guid>
		<description>Leaving out the context of any advice is dangerous and such disclaimers make what follows completely irrelevant to the subject (other than to highlight the complexity as you rightly point out) - just entertainment.</description>
		<content:encoded><![CDATA[<p>Leaving out the context of any advice is dangerous and such disclaimers make what follows completely irrelevant to the subject (other than to highlight the complexity as you rightly point out) &#8211; just entertainment.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: yuzzamatuzz</title>
		<link>http://williamlouth.wordpress.com/2009/01/07/profiling-sampling-versus-execution-part-1/#comment-177</link>
		<dc:creator>yuzzamatuzz</dc:creator>
		<pubDate>Sun, 18 Oct 2009 01:42:49 +0000</pubDate>
		<guid isPermaLink="false">http://williamlouth.wordpress.com/?p=284#comment-177</guid>
		<description>I think opening a session with the comment you quoted isn&#039;t necessarily a bad thing; it just reflects the complex nature of giving advice on performance tuning, especially when you don&#039;t have time to go into minute detail explaining the contexts where the advice is appropriate and where it&#039;s not.  Seems like an appropriate standard disclaimer to me.</description>
		<content:encoded><![CDATA[<p>I think opening a session with the comment you quoted isn&#8217;t necessarily a bad thing; it just reflects the complex nature of giving advice on performance tuning, especially when you don&#8217;t have time to go into minute detail explaining the contexts where the advice is appropriate and where it&#8217;s not.  Seems like an appropriate standard disclaimer to me.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: yuzzamatuzz</title>
		<link>http://williamlouth.wordpress.com/2009/01/07/profiling-sampling-versus-execution-part-1/#comment-176</link>
		<dc:creator>yuzzamatuzz</dc:creator>
		<pubDate>Sun, 18 Oct 2009 01:40:33 +0000</pubDate>
		<guid isPermaLink="false">http://williamlouth.wordpress.com/?p=284#comment-176</guid>
		<description>I don&#039;t see why call stack sampling *needs* to be inefficient.  The problem is that the JVM has little knowledge about how far up the stack things become &quot;interesting&quot; to a profiling user scenario.  Current stack walkers have likely received the performance improvement attention appropriate for the frequency of handling exceptions (after all, they&#039;re called &quot;exceptions&quot; for a reason).  Using that same code for a call stack profiler changes the equation (it&#039;s not used only in exceptional cases anymore) so it deserves more effort to improve its performance.  So that&#039;s just engineering effort needed.

Forgive me for not being familiar with the profiling tool: does it solve the &quot;what&#039;s interesting to look at&quot; question and is that a part of why it&#039;s effective, in your opinion?</description>
		<content:encoded><![CDATA[<p>I don&#8217;t see why call stack sampling *needs* to be inefficient.  The problem is that the JVM has little knowledge about how far up the stack things become &#8220;interesting&#8221; to a profiling user scenario.  Current stack walkers have likely received the performance improvement attention appropriate for the frequency of handling exceptions (after all, they&#8217;re called &#8220;exceptions&#8221; for a reason).  Using that same code for a call stack profiler changes the equation (it&#8217;s not used only in exceptional cases anymore) so it deserves more effort to improve its performance.  So that&#8217;s just engineering effort needed.</p>
<p>Forgive me for not being familiar with the profiling tool: does it solve the &#8220;what&#8217;s interesting to look at&#8221; question and is that a part of why it&#8217;s effective, in your opinion?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: williamlouth</title>
		<link>http://williamlouth.wordpress.com/2009/01/07/profiling-sampling-versus-execution-part-1/#comment-175</link>
		<dc:creator>williamlouth</dc:creator>
		<pubDate>Sat, 17 Oct 2009 17:49:52 +0000</pubDate>
		<guid isPermaLink="false">http://williamlouth.wordpress.com/?p=284#comment-175</guid>
		<description>Clearly, call stack sampling analysis does not scale, in terms of either the runtime or the offline analysis. Even performance engineers at Sun admit some of its problems, though they do seem to think the sadistic (statistical) nature of the collection &amp; analysis work is &quot;fun&quot;.

http://weblogs.java.net/blog/sdo/archive/2009/10/16/fun-jstack</description>
		<content:encoded><![CDATA[<p>Clearly, call stack sampling analysis does not scale, in terms of either the runtime or the offline analysis. Even performance engineers at Sun admit some of its problems, though they do seem to think the sadistic (statistical) nature of the collection &amp; analysis work is &#8220;fun&#8221;.</p>
<p><a href="http://weblogs.java.net/blog/sdo/archive/2009/10/16/fun-jstack" rel="nofollow">http://weblogs.java.net/blog/sdo/archive/2009/10/16/fun-jstack</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: williamlouth</title>
		<link>http://williamlouth.wordpress.com/2009/01/07/profiling-sampling-versus-execution-part-1/#comment-174</link>
		<dc:creator>williamlouth</dc:creator>
		<pubDate>Sat, 17 Oct 2009 17:46:27 +0000</pubDate>
		<guid isPermaLink="false">http://williamlouth.wordpress.com/?p=284#comment-174</guid>
		<description>Yes, sampling in the statistical sense (&lt;em&gt;though combined with one or more strategies&lt;/em&gt;) but not in the &quot;instrumentation&quot; &amp; &quot;collection technique&quot; sense, which is what the particular slide was comparing. I do not think I am muddying the waters at all; most software performance engineers would assume an &quot;instrumentation&quot; context.

Anyway I am more focused on the actual sampling data collection technique (please see part 2) which is call stack based.</description>
		<content:encoded><![CDATA[<p>Yes, sampling in the statistical sense (<em>though combined with one or more strategies</em>) but not in the &#8220;instrumentation&#8221; &amp; &#8220;collection technique&#8221; sense, which is what the particular slide was comparing. I do not think I am muddying the waters at all; most software performance engineers would assume an &#8220;instrumentation&#8221; context.</p>
<p>Anyway I am more focused on the actual sampling data collection technique (please see part 2) which is call stack based.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: yuzzamatuzz</title>
		<link>http://williamlouth.wordpress.com/2009/01/07/profiling-sampling-versus-execution-part-1/#comment-173</link>
		<dc:creator>yuzzamatuzz</dc:creator>
		<pubDate>Sat, 17 Oct 2009 17:05:37 +0000</pubDate>
		<guid isPermaLink="false">http://williamlouth.wordpress.com/?p=284#comment-173</guid>
		<description>&quot;I think the problem here is that the session speaker incorrectly assumes that other tools managing applications in production using some form of execution analysis (1) instrument every single method on the call stack, (2) measure every method invocation occurrence, and (3) have a relatively high overhead in the measuring invocations. This is certainly not the case today.&quot;

If #1 and #2 aren&#039;t true, doesn&#039;t that just mean that modern tools are also doing a form of sampling, just using different triggers for the samples (admittedly more intelligent ones than just &quot;N ms has elapsed&quot;)?  So far all I see is lower overhead on the probes and smarter choice of triggers (which are good things, no doubt).  But I think you&#039;re clouding the issue by making it an anti-sampling diatribe.  Even #3 is just a simple conclusion from #1 and #2 rather than an additional complaint.</description>
		<content:encoded><![CDATA[<p>&#8220;I think the problem here is that the session speaker incorrectly assumes that other tools managing applications in production using some form of execution analysis (1) instrument every single method on the call stack, (2) measure every method invocation occurrence, and (3) have a relatively high overhead in the measuring invocations. This is certainly not the case today.&#8221;</p>
<p>If #1 and #2 aren&#8217;t true, doesn&#8217;t that just mean that modern tools are also doing a form of sampling, just using different triggers for the samples (admittedly more intelligent ones than just &#8220;N ms has elapsed&#8221;)?  So far all I see is lower overhead on the probes and smarter choice of triggers (which are good things, no doubt).  But I think you&#8217;re clouding the issue by making it an anti-sampling diatribe.  Even #3 is just a simple conclusion from #1 and #2 rather than an additional complaint.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Profiling: Sampling versus Execution (Part 2) &#171; William Louth&#8217;s Weblog</title>
		<link>http://williamlouth.wordpress.com/2009/01/07/profiling-sampling-versus-execution-part-1/#comment-16</link>
		<dc:creator>Profiling: Sampling versus Execution (Part 2) &#171; William Louth&#8217;s Weblog</dc:creator>
		<pubDate>Fri, 16 Jan 2009 13:15:42 +0000</pubDate>
		<guid isPermaLink="false">http://williamlouth.wordpress.com/?p=284#comment-16</guid>
		<description>[...] entry I have constructed a simple benchmark class based on the following observations raised in part 1 related to enterprise Java applications in the wild (which I assume was the context when the health [...]</description>
		<content:encoded><![CDATA[<p>[...] entry I have constructed a simple benchmark class based on the following observations raised in part 1 related to enterprise Java applications in the wild (which I assume was the context when the health [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
