ABC for Cloud Computing
January 27, 2009
How Fast and at What Cost?
Whilst cloud computing offers many benefits in terms of dynamic provisioning and on-demand scaling, it presents challenges to IT management organizations in monitoring application performance and in controlling the processing costs that result from cloud computing service charges. Fortunately, there is a common solution to both challenges, one that makes both the operating cost and the quality of a delivered IT service transparent, measurable, and manageable. It allows one to explicitly model the relation between the performance of an IT service and its operating costs when delivered to consumers at particular levels of quality (performance): activity-based costing (ABC).
Activity-based costing consists of a service costing method and a resource consumption model that facilitates better decision-making by interpreting relationships between services delivered and their operational costs.
Advice: I strongly recommend that you read our Metering the Cloud article as well as some basic information on the mocked application and technologies before proceeding if you have not already done so.
Activity & Resource Mappings
An activity is represented as a named Probe with composite names representing hierarchical cost centers (Groups).
A resource is represented by either a Counter or Meter. The main difference between a Counter and a Meter is that changes to a Meter (which can in fact be mapped to a counter) are tracked at the activity level via a Metering associated with a Group (full or partial name).
Note: The mock application has been simplified significantly leaving out nested activities (probes), tracking of costs by context paths, and optimal metering strategies (cost versus overhead).
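The mapping described above can be illustrated with a small, self-contained sketch. It is not the vendor's API: the class names mirror the article's terms, but every method, field, and the counter name `s3.bytes.written` are assumptions made for illustration. A Probe snapshots its Meters when it begins and, when it ends, charges the change to each Group in its composite name.

```python
from collections import defaultdict


class Counter:
    """A monotonically increasing resource count (hypothetical, not the vendor API)."""

    def __init__(self, name):
        self.name = name
        self.value = 0

    def increment(self, amount=1):
        self.value += amount


class Metering:
    """Accumulated usage of one meter within one group (cost center)."""

    def __init__(self):
        self.count = 0  # number of probes fired and metered
        self.total = 0  # cumulative change in the meter


class Meter:
    """A named view over a counter whose changes are tracked per group."""

    def __init__(self, name, counter):
        self.name = name
        self.counter = counter
        self.meterings = defaultdict(Metering)  # group name -> Metering

    def read(self):
        return self.counter.value


class Probe:
    """An activity; end() charges each meter's delta to every enclosing group."""

    def __init__(self, name, meters):
        self.name = name
        self.meters = meters
        self._start = {}

    def begin(self):
        self._start = {m.name: m.read() for m in self.meters}
        return self

    def end(self):
        parts = self.name.split(".")
        groups = [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
        for meter in self.meters:
            delta = meter.read() - self._start[meter.name]
            for group in groups:
                metering = meter.meterings[group]
                metering.count += 1
                metering.total += delta


bytes_written = Counter("s3.bytes.written")  # assumed counter name
io_write = Meter("io.write.bytes", bytes_written)

probe = Probe("twitter.users.wlouth.status", [io_write]).begin()
bytes_written.increment(11)  # e.g. storing the 11 bytes of "hello world"
probe.end()

print(io_write.meterings["twitter.users.wlouth.status"].total)  # 11
print(io_write.meterings["twitter"].total)                      # 11
```

The same delta is visible at the leaf activity and at every cost center above it, which is the roll-up behaviour the Metering tables later in the article rely on.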
Within each of the mock cloud technology components I have inserted instrumentation for the purpose of tracking and costing of software activities (Probes) and resource usage (Meters & Counters). These lines are bookmarked in the left margin.
Note: The instrumentation inserted would normally be woven into the code base transparently at load-time using our aspect libraries.
Mock Twitter WebApp
The Twitter WebApp component mimics the handling of web requests dispatched by the Google Engine. The request handling code is the main entry point from which requests can be dispatched to two service points – status and timeline.
The status service point will update the latest status text associated with the specified user by writing the binary version of the text to an Amazon S3 Bucket keyed on the user’s name.
The timeline service point will list previous status updates for the specified user by reading all objects stored in an Amazon S3 Bucket keyed on the user’s name.
Tracking relevant activities (software services) is the foundation of a successful ABC model so this is the only component in the application that is instrumented with named Probes. A single Probe is fired and metered in the request handling code creating a dynamic composite named Probe with the following hierarchical cost center (metered groups):
twitter.users.${user}.${service}
In addition a Counter, representing a service charge, is incremented for each request delivered to the status service point. Charges are assigned to status updates by the user and not the viewing of such updates by the user and others.
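As a rough illustration of the naming scheme and the charging rule, the sketch below expands the composite name into its cost centers and applies a charge only to status requests. The helper names are hypothetical, and the charge of 10 units per status update is an assumption, chosen only because it is consistent with the 20 service charge units reported later for two updates.

```python
from string import Template

# Composite probe name pattern taken from the article.
NAME_PATTERN = Template("twitter.users.${user}.${service}")

# Assumption: 10 charge units per status update (not stated in the article).
CHARGE_PER_STATUS_UPDATE = 10


def probe_name(user, service):
    """Build the dynamic composite name for one request."""
    return NAME_PATTERN.substitute(user=user, service=service)


def cost_centers(name):
    """Return every group (cost center) the name belongs to, root first."""
    parts = name.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def service_charge(service):
    """Only status updates are charged; viewing a timeline is free."""
    return CHARGE_PER_STATUS_UPDATE if service == "status" else 0


name = probe_name("wlouth", "status")
print(name)                # twitter.users.wlouth.status
print(cost_centers(name))
# ['twitter', 'twitter.users', 'twitter.users.wlouth', 'twitter.users.wlouth.status']

charges = sum(service_charge(s) for s in ["status", "status", "timeline"])
print(charges)             # 20
```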

Mock Google Engine
The Google Engine component mimics a web request processing dispatcher. It reads a single line representing a URL from a text console and forwards both a Request and Response object to the registered Handler.
From the perspective of the Twitter application, the software execution of the Engine is modeled as a resource (with multiple cost drivers) and not as an activity, with Counters accumulating both the count of requests dispatched and the associated CPU usage.

Mock Google Engine Request
The Google Request component mimics the reading and parsing of a web request. A single Counter is maintained (across threads) by the Request component tracking the number of bytes read which is used by Google in its cloud computing billing rate plan.
Note: Counters are thread specific so their accumulation can easily be mapped to a metered resource (meter) which is in turn mapped to a metered activity (probe) executed by a request handling thread. This is a very important distinction with legacy system monitoring solutions working with metrics at the process level and oblivious to the causality of changes.
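The distinction drawn in this note can be sketched as follows. This is a minimal illustration and not the actual runtime: `ThreadCounter`, the counter name `engine.bytes`, and the `handle` function are all hypothetical. Each thread sees only its own accumulation, so the change observed around an activity belongs to that activity, while a process-level total is kept separately for the black-box view.

```python
import threading


class ThreadCounter:
    """Counter with a per-thread value plus a process-wide aggregate (illustrative)."""

    def __init__(self, name):
        self.name = name
        self._local = threading.local()
        self._lock = threading.Lock()
        self._process_total = 0

    def increment(self, amount=1):
        # Per-thread accumulation: visible only to the calling thread.
        self._local.value = getattr(self._local, "value", 0) + amount
        # Process-level aggregation: what a system monitoring tool would see.
        with self._lock:
            self._process_total += amount

    def thread_value(self):
        return getattr(self._local, "value", 0)

    def process_value(self):
        with self._lock:
            return self._process_total


bytes_io = ThreadCounter("engine.bytes")  # assumed counter name
per_activity = {}


def handle(activity, payload):
    """Meter one activity by the change in this thread's counter value."""
    before = bytes_io.thread_value()
    bytes_io.increment(len(payload))
    per_activity[activity] = bytes_io.thread_value() - before


threads = [
    threading.Thread(target=handle, args=("status", b"hello world")),
    threading.Thread(target=handle, args=("timeline", b"hi")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(per_activity.items()))  # [('status', 11), ('timeline', 2)]
print(bytes_io.process_value())      # 13
```

The process-level value (13) says nothing about which activity caused it; the per-thread deltas do.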

Mock Google Engine Response
The Google Response component mimics the writing of a web response back to the user. It also increments the same Counter used by the Google Request component recording the number of bytes written out.

Mock Amazon S3 Bucket
The Amazon Bucket component mimics the storage capabilities of the S3 service. Tweets that have been transformed into binary form are written to and read from a Bucket maintained on a per-user basis (ignoring the limits imposed by Amazon).
Unlike the Google Request and Response components, the Amazon Bucket maintains two distinct Counters tracking bytes transferred during read and write operations to and from the Bucket.
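A mock bucket along these lines might look like the sketch below. The class and its `put` and `read_all` methods are hypothetical stand-ins, not the article's code and not the real S3 API; the point is only that reads and writes feed two separate byte counts.

```python
class MockBucket:
    """In-memory stand-in for a per-user bucket (illustrative, not the S3 API)."""

    def __init__(self, key):
        self.key = key
        self._objects = []
        self.bytes_written = 0  # counter for write operations
        self.bytes_read = 0     # counter for read operations

    def put(self, data: bytes):
        """Store one object and count the bytes written."""
        self._objects.append(data)
        self.bytes_written += len(data)

    def read_all(self):
        """Return every stored object and count the bytes read."""
        objects = list(self._objects)
        self.bytes_read += sum(len(obj) for obj in objects)
        return objects


bucket = MockBucket("wlouth")
bucket.put("hello world".encode("utf-8"))
bucket.put("hello cloud".encode("utf-8"))
tweets = bucket.read_all()

print(bucket.bytes_written)  # 22
print(bucket.bytes_read)     # 22
```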

Test Scenario
The following series of console input lines have been executed with each of the resource metering model configurations depicted below.
in: service/status/user/wlouth/text/hello world
in: service/status/user/wlouth/text/hello cloud
in: service/timeline/user/wlouth
out: hello cloud
out: hello world
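For readers who want to replay these lines, here is a minimal sketch of how such console input could be parsed and dispatched. The parsing rule (alternating key/value path segments) and the in-memory store are assumptions inferred from the shape of the input lines; the mock application's actual code is not shown in this article. Text containing a `/` would break this simple parser.

```python
store = {}  # user name -> list of stored status updates (as bytes)


def parse(line):
    """Split 'key/value/key/value/...' into a dict of request parameters."""
    parts = line.split("/")
    return dict(zip(parts[0::2], parts[1::2]))


def dispatch(line):
    """Route a request to the status or timeline service point."""
    request = parse(line)
    user = request["user"]
    if request["service"] == "status":
        store.setdefault(user, []).append(request["text"].encode("utf-8"))
        return []
    if request["service"] == "timeline":
        # Most recent update first, matching the output order shown above.
        return [data.decode("utf-8") for data in reversed(store.get(user, []))]
    raise ValueError("unknown service: " + request["service"])


dispatch("service/status/user/wlouth/text/hello world")
dispatch("service/status/user/wlouth/text/hello cloud")
for text in dispatch("service/timeline/user/wlouth"):
    print("out:", text)
# out: hello cloud
# out: hello world
```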
To help with understanding the execution flow I have collected a probes tracking model based solely on code level instrumentation performed by our load-time bytecode weaving agent and some aspect extension libraries.
Note: In this partial ABC model every method execution is represented as an activity itself, which is not necessarily what we would want in the context of a large cloud application.
Resource Metering Models
Executing the test scenario (this time with no code level instrumentation) we get the following resource Metering table view that tracks the changes in Meter usage at the activity level (leaf node) and associated cost centers (root and inner nodes). This type of data is similar (to some degree) to what is found in transaction monitoring and hotspot (CPU) detection profiling tools.
Note: The Count is the number of Probes fired and metered. The Total is the cumulative change in the Meter during this period.

Here is a table listing the values of the inserted Counters at the process level. This is the type of data presented by system management tools that have no actual knowledge or tracking of the software activity, treating the process largely as a black box with a basic and flat metric surface.

What we really need initially is a way of combining these two perspectives especially as our monthly cloud computing billing is closely tied to increases in one or more of these resource counters updated by the cloud platform and cloud service components.
Well it is actually not so hard (at least when using our resource metering runtime) because Counters are in fact thread (activity) specific and only aggregated at the process level when the counting probes provider extension is enabled.
Let's create some Meters mapped to Counters and execute our test again with the following configuration.

Below is the new Metering table view captured following the test execution. With just an external configuration change we can now see which activities impact which Counters via the Meter mapping.
Additionally we have introduced a charge back mechanism by way of the twitter.service.charge Meter (more on this later).

Cost Metering Models
Now let's introduce a more explicit cost model by adding unit cost Meters based on the Counters, relabeling the mapped Meters to hide the underlying technologies and billable cloud services. This is done by appending the following properties to the configuration.

Below is the revised metering model after executing the test scenario. The model now includes a new unit cost based meter, io.cost, representing our storage costs. From the model we can see that the storage cost in delivering the status and timeline services (activities) to user wlouth is 66 (io.read.bytes*1 + io.write.bytes*2). This is higher than our service charge units (20) so we might need to revise things slightly in our next configuration.
Note: A good practice (of abstraction) in cost modeling is to create an internal unit charge scheme and associated unit currency.
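The storage cost figure quoted above can be checked with a few lines of arithmetic. The sketch assumes that each status text is stored as one byte per character (true for these ASCII strings in UTF-8) and that the single timeline request reads both stored updates back; the unit costs of 1 per byte read and 2 per byte written come from the formula in the text.

```python
READ_UNIT_COST = 1   # cost units per byte read (from io.read.bytes*1)
WRITE_UNIT_COST = 2  # cost units per byte written (from io.write.bytes*2)

updates = ["hello world", "hello cloud"]

# Two status requests write both texts; one timeline request reads both back.
write_bytes = sum(len(text.encode("utf-8")) for text in updates)
read_bytes = write_bytes

io_cost = read_bytes * READ_UNIT_COST + write_bytes * WRITE_UNIT_COST
service_charge_units = 20  # as reported in the article

print(write_bytes, read_bytes)         # 22 22
print(io_cost)                         # 66
print(io_cost - service_charge_units)  # 46
```

Under these assumptions the result matches the 66 units reported, leaving a shortfall of 46 units against the service charge.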

Our final configuration assigns a unit cost to our service charge Meter as well as relabeling the mapped Meter.
And here is the resulting model. Yes! The cloud application is making a profit (well, at least in terms of the storage cost).

An important benefit of our resource metering approach is that we now have a single model for both cost management and service management (performance & capacity management) that can, in addition, provide business insight into software service sales (in real-time and alongside operational costs).
This unified model allows performance engineers to much more easily factor in cost when making trade-offs (in terms of resource usage) in order to improve the response times and throughput of one or more software services. With each possible performance optimization (CPU vs IO) the cost of the change can be evaluated. Here is a table that combines the code centric and service centric views into a single model.

Naturally the model lends itself to being distributed (and disconnected) across multiple processing units and execution flows in the cloud whilst offering a user defined consolidated view from any business & service management workstation.

Summary
Cost management is important for companies using cloud computing as a platform for application delivery as it provides a framework for planning and controlling decisions related to such services in terms of service value and computing cost. It also serves to help cloud computing vendors to deliver resources in a cost effective manner whilst maximizing value. Finally, knowledge of costs is required to make intelligent decisions related to cost-justified service quality.
January 29, 2009 at 11:45 am
I just wanted to share that I think this is a great idea, and I wouldn’t be surprised to see tools like this pop up everywhere in cloud developers’ toolkits.
I wrote a followup article here:
http://www.cloudcomputingeconomics.com/2009/01/cloud-computing-cost-profiler.html.
Great work!
January 29, 2009 at 12:15 pm
Thanks Jon for the appreciation.
Though I have made this all look relatively simple I do hope others realize that this has been achieved by shielding the complexity of the runtime from the user which in the real-world also needs to handle nested activities, disjointed traceability (workflows/transactions) whilst striking a delicate balance between overhead (not just in the runtime) and coverage by way of metering strategies. Hopefully I can elaborate on this more after people have fully digested this initial cloud computing entry.
Thanks again.
January 29, 2009 at 12:29 pm
I wanted to stress that the ABC approach we are pioneering allows for multiple resource meter(ing) models as well as cost(ing) models to be consolidated into a single view and treated uniformly. We can apply the concepts of ABC to low-level runtime diagnostics as well as high-level business service management. The concepts and model(s) do not change, only the granularity of the activity, the modeling of a resource & cost (KPI), and the user perspective (developer, tester, system ops, or business executive).
Key to all of this is the context of the execution flow which is not present in any form whatsoever in a legacy (in terms of cloud computing) system management solution. As soon as cloud programming models become available and adopted, blackbox system management solutions will become obsolete (at least from outside of the cloud data center itself).
I do hope that this will inspire cloud service vendors to make available counters, charges, and billing information at the activity/interaction level (possibly piggybacked on remote request results) though in the end we need a standard.
For the actual cloud computing platform vendors they need to consider ways to allow customers to communicate the context (the code|service|business activity) of the resource consumption at various points in the execution flow to the underlying metering platform. This will become more and more important in the future when users look for different types of subscription packages that reflect their actual usage just like in other services industries. Software activity monitoring will take on a whole new meaning when the cloud is the foundation (revenue stream) for a business.
January 30, 2009 at 12:59 pm
http://bgracely-exft2009-wfumba.blogspot.com/2009/01/connect-dots-activity-based-costing-abc.html
“…. an interesting blurring between IT, Ops/Production and Accounting, especially when you start thinking about the business opportunities that happen when companies leverage the cloud and open development environments.”
January 30, 2009 at 2:36 pm
I would like to point out that we are not targeting specifically billing or a particular environment or customer type.
Our vision is that in the future everything we (users, companies) do will have a corresponding set of one or more software activities within the cloud itself (parallel universe). I am not talking about footprints (events) in the sand as in tweets, blogs, and facebook/myspace pages. I am actually talking about binary life streams flowing continuously, allowing one to understand aspects of the behavioral patterns of a user/group/department/unit/company within a particular environment context (job, processes, operations, services, objectives/goals). This allows a company to achieve unprecedented insight into the actual functioning of a subject. If this becomes a reality, then the actual execution of the software itself, and not its superfluous container constructs (processes, host), will represent the best means of effectively managing a business and its services. Today with our software I can easily recreate the scene from THX 1138 where the robot decides to discontinue the pursuit of THX because the costs exceed the budget, though I am not sure whether that is a good thing or a bad thing.
Coming back down to earth our approach and solution has a number of differentiators that throw some light on our immediate short-to-mid term vision.
* The runtime, tooling, and modeling can be used in and outside of the cloud and across all application life cycle phases (dev, test, prod). This I regard as extremely important especially when mature cloud computing programming models become available. Why disconnect the data & models across phases? Why have separate management domains (performance, cost management, operations) within a particular phase (production) using different disconnected data sets and tooling? Nothing happens without software executing. We just have different perspectives on the actual activity itself.
* The activity & resource metering can be applied to cloud applications that are subscription based (not transaction based). Here the solution can be used by both the user and platform vendor for resource management & capacity planning. “Are we planning on introducing or extending particular user services (activities) and do we need additional capacity to be added to our subscription?”
* Customers deploying applications to the cloud can embed their own activity and cost models unbeknownst to the cloud computing vendor, though with standardized markers/annotations (defined in the programming model) a cloud computing vendor can use such information to profile (in terms of resource usage) the activity and deliver a truly fine-grained resource scheduler and provision infrastructure.
* Our resource metering and costing models do not have to record every single activity occurrence (unless it is an actual tx billing engine). We have a unique approach to metering using chained metering strategies that allows us to outperform (100 times faster) every single performance execution profiling solution on the market. We can easily drop down to the finest granularity in activity metering whilst not perturbing (or costing) the application or user. We have used SPECjvm2008 to verify this.
* We can analyse the firing and metering of activity from different perspectives, collecting different analysis models depending on the user's objectives. For example, we can turn on activity tracking, which allows us to see which activities lead to other (nested) activities firing and how resource usage differs across these tracks and paths. Some of our extensions to the bare metal probes (activity) runtime are developer focused (trackings, billings), others are aimed at test (stack, inspections, asserts) and production (strategies, events). All are based on the same underlying runtime. These extensions are not possible unless the runtime is within the process itself.
* Our model(s) can be used across management domains: problem-, incident-, and service-management.
* We allow activities to represent abstract execution constructs and not just code execution or transaction entry points. Activities (probes) can have a composite name that merges the activity name with the context of the execution of the cost center hierarchy. Our naming hierarchy is unlimited – it does not presuppose that software execution points can be defined with just four name parts.
* The metering model can be accessed via our Open API (local) in real-time during the actual execution of the software itself, allowing more sophisticated billing (resource counters) to be developed. No post mortem analysis.
We are basically building on the core of cloud computing or any other environment, the execution of software, and augmenting the model with different runtime data to support analysis from multiple management domains.
March 23, 2009 at 11:24 am
[...] Jinspired Jinspired comes forward with a java tool set called JXInsight. JXInsight is a comprehensive enterprise Java performance monitoring, problem diagnostic, transaction analysis and application management solution available in the marketplace today. The transaction oriented monitoring solution, JXInsight, offers application insight into business transactions consisting of multiple resource transactional units of work across distributed systems recording resource consumption and response for transaction patterns and contextual paths. Jinspired links performance management with cost efficiency to create cost groups for billing in case of hosting providers. They track costs per user for used resources. In other words, they relate costs to demanded services. A detailed example of Activity-based costing can be found at http://williamlouth.wordpress.com/2009/01/27/abc-for-cloud-computing/ [...]