ABC for Cloud Computing
January 27, 2009
How Fast and at What Cost?
Whilst cloud computing offers many benefits in terms of dynamic provisioning and on-demand scaling, it presents challenges to IT management organizations in monitoring application performance and in controlling the processing costs that result from cloud computing service charges. Fortunately, there is a common solution to both challenges, one that makes the operating cost and the quality of a delivered IT service transparent, measurable and manageable. It allows one to explicitly model the relation between the performance of an IT service and its operating costs when delivered to consumers at particular levels of quality (performance): activity-based costing (ABC).
Activity-based costing consists of a service costing method and a resource consumption model that facilitates better decision-making by interpreting relationships between services delivered and their operational costs.
Advice: I strongly recommend that you read our Metering the Cloud article as well as some basic information on the mocked application and technologies before proceeding if you have not already done so.
Activity & Resource Mappings
An activity is represented as a named Probe with composite names representing hierarchical cost centers (Groups).
A resource is represented by either a Counter or a Meter. The main difference between the two is that changes to a Meter (which can in fact be mapped to a Counter) are tracked at the activity level via a Metering associated with a Group (identified by a full or partial name).
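The distinction can be sketched in a few lines of Python. This is only an illustration of the idea and not the product's API: the `counters` dictionary, the `probe` context manager and the resource name `s3.write.bytes` are all hypothetical names chosen for the example.

```python
from collections import defaultdict
from contextlib import contextmanager

# Counters: running totals per resource, with no notion of which activity caused a change.
counters = defaultdict(int)

# Metering: per (group, meter) statistics, i.e. resource usage tracked at the activity level.
metering = defaultdict(lambda: {"count": 0, "total": 0})


@contextmanager
def probe(group, meters):
    """Fire a named probe and attribute the change in each meter to its group."""
    before = {name: counters[name] for name in meters}  # snapshot on entry
    try:
        yield
    finally:
        for name in meters:
            entry = metering[(group, name)]
            entry["count"] += 1                              # probes fired and metered
            entry["total"] += counters[name] - before[name]  # change during the probe


# Hypothetical usage: one activity writes 11 bytes, then unrelated work writes 5 more.
with probe("twitter.users.wlouth.status", ["s3.write.bytes"]):
    counters["s3.write.bytes"] += 11
counters["s3.write.bytes"] += 5  # outside any probe: counted, but not metered

status_usage = metering[("twitter.users.wlouth.status", "s3.write.bytes")]
```

The Counter ends up at 16 for the whole process, while the Metering for the status activity records only the 11 bytes that changed while its Probe was active.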
Note: The mock application has been simplified significantly leaving out nested activities (probes), tracking of costs by context paths, and optimal metering strategies (cost versus overhead).
Within each of the mock cloud technology components I have inserted instrumentation for the purpose of tracking and costing of software activities (Probes) and resource usage (Meters & Counters). These lines are bookmarked in the left margin.
Note: The instrumentation inserted would normally be woven into the code base transparently at load-time using our aspect libraries.
Mock Twitter WebApp
The Twitter WebApp component mimics the handling of web requests dispatched by the Google Engine. The request handling code is the main entry point from which requests can be dispatched to two service points – status and timeline.
The status service point will update the latest status text associated with the specified user by writing the binary version of the text to an Amazon S3 Bucket keyed on the user’s name.
The timeline service point will list previous status updates for the specified user by reading all objects stored in an Amazon S3 Bucket keyed on the user’s name.
Tracking relevant activities (software services) is the foundation of a successful ABC model, so this is the only component in the application that is instrumented with named Probes. A single Probe is fired and metered in the request handling code, creating a dynamic, composite-named Probe with the following hierarchical cost centers (metered Groups):
twitter.users.${user}.${service}
In addition, a Counter representing a service charge is incremented for each request delivered to the status service point. Charges are assigned to status updates made by the user, not to the viewing of such updates by the user or others.
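To make the hierarchy concrete, here is a minimal Python sketch of how a composite Probe name could expand into its cost centers. The helper names `probe_name` and `cost_centers` are hypothetical, and the sketch assumes that every dotted prefix of the name forms a Group.

```python
from typing import List


def probe_name(user: str, service: str) -> str:
    """Build the composite probe name from the pattern twitter.users.${user}.${service}."""
    return f"twitter.users.{user}.{service}"


def cost_centers(name: str) -> List[str]:
    """Return every dotted prefix of the name, from the root group down to the leaf."""
    parts = name.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


centers = cost_centers(probe_name("wlouth", "status"))
# centers == ["twitter", "twitter.users", "twitter.users.wlouth",
#             "twitter.users.wlouth.status"]
```

Under this assumption a single status request by user wlouth contributes to four Groups, which is what allows usage to be rolled up from an individual service to the user and on to the whole application.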

Mock Google Engine
The Google Engine component mimics a web request processing dispatcher. It reads a single line representing a URL from a text console and forwards both a Request and Response object to the registered Handler.
From the perspective of the Twitter application, the software execution of the Engine is modeled as a resource (with multiple cost drivers) rather than as an activity, with Counters accumulating both the count of requests dispatched and the associated CPU usage.

Mock Google Engine Request
The Google Request component mimics the reading and parsing of a web request. A single Counter is maintained (across threads) by the Request component to track the number of bytes read, which Google uses in its cloud computing billing rate plan.
Note: Counters are thread specific, so their accumulation can easily be mapped to a metered resource (Meter), which is in turn mapped to a metered activity (Probe) executed by a request handling thread. This is a very important distinction from legacy system monitoring solutions, which work with metrics at the process level and are oblivious to the causality of changes.
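The idea of a thread-specific Counter that can still be summed at the process level can be sketched as follows. This is an illustrative Python model only, with a hypothetical `ThreadCounter` class and thread names; it is not how the metering runtime is implemented.

```python
import threading
from collections import defaultdict


class ThreadCounter:
    """A counter that keeps one running total per thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._by_thread = defaultdict(int)

    def increment(self, amount=1):
        # Attribute the change to whichever thread is executing the activity.
        with self._lock:
            self._by_thread[threading.current_thread().name] += amount

    def value_for(self, thread_name):
        with self._lock:
            return self._by_thread.get(thread_name, 0)

    def process_total(self):
        # The flat, process-level view that a system monitoring tool would see.
        with self._lock:
            return sum(self._by_thread.values())


bytes_read = ThreadCounter()


def handle_request(payload):
    bytes_read.increment(len(payload))


workers = [
    threading.Thread(target=handle_request, args=(b"hello world",), name="request-1"),
    threading.Thread(target=handle_request, args=(b"hi",), name="request-2"),
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

The process-level total is 13 bytes, but only the per-thread values (11 and 2) show which request handling thread, and therefore which activity, caused the change.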

Mock Google Engine Response
The Google Response component mimics the writing of a web response back to the user. It also increments the same Counter used by the Google Request component recording the number of bytes written out.

Mock Amazon S3 Bucket
The Amazon Bucket component mimics the storage capabilities of the S3 service. Tweets that have been transformed into binary form are written to and read from a Bucket maintained on a per-user basis (ignoring the limits imposed by Amazon).
Unlike the Google Request and Response components, the Amazon Bucket maintains two distinct Counters tracking the bytes transferred during read and write operations on the Bucket.

Test Scenario
The following series of console input lines was executed with each of the resource metering model configurations depicted below.
in: service/status/user/wlouth/text/hello world
in: service/status/user/wlouth/text/hello cloud
in: service/timeline/user/wlouth
out: hello cloud
out: hello world
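The scenario above can be replayed with a small Python stand-in for the mock components. Everything here is an assumption made for illustration: the in-memory `buckets`, the counter names, UTF-8 as the binary encoding, and the newest-first ordering of the timeline (inferred from the output lines).

```python
from collections import defaultdict

buckets = defaultdict(list)  # user name -> stored objects (bytes), oldest first
counters = defaultdict(int)  # hypothetical counter names, for illustration only


def parse(line):
    """Turn 'service/status/user/wlouth/text/...' into a dict of name/value pairs."""
    parts = line.split("/", 5)  # at most three pairs; the text itself may contain '/'
    return dict(zip(parts[0::2], parts[1::2]))


def handle(line):
    """Dispatch one console line to the status or timeline service point."""
    request = parse(line)
    user = request["user"]
    if request["service"] == "status":
        data = request["text"].encode("utf-8")  # binary version of the text
        buckets[user].append(data)
        counters["s3.write.bytes"] += len(data)
        return []
    if request["service"] == "timeline":
        objects = list(reversed(buckets[user]))  # newest first, as in the output above
        counters["s3.read.bytes"] += sum(len(obj) for obj in objects)
        return [obj.decode("utf-8") for obj in objects]
    raise ValueError("unknown service: " + request["service"])


output = []
for line in (
    "service/status/user/wlouth/text/hello world",
    "service/status/user/wlouth/text/hello cloud",
    "service/timeline/user/wlouth",
):
    output.extend(handle(line))
```

Under these assumptions the two status updates write 22 bytes in total, and the single timeline request reads the same 22 bytes back.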
To help with understanding the execution flow I have collected a probes tracking model based solely on code level instrumentation performed by our load-time bytecode weaving agent and some aspect extension libraries.
Note: In this partial ABC model every method execution is represented as an activity in itself, which is not necessarily what we would want in the context of a large cloud application.
Resource Metering Models
Executing the test scenario (this time with no code level instrumentation) we get the following resource Metering table view that tracks the changes in Meter usage at the activity level (leaf node) and associated cost centers (root and inner nodes). This type of data is similar (to some degree) to what is found in transaction monitoring and hotspot (CPU) detection profiling tools.
Note: The Count is the number of Probes fired and metered. The Total is the cumulative change in the Meter during this period.

Here is a table listing the values of the inserted Counters at the process level. This is the type of data presented by system management tools that have no actual knowledge or tracking of the software activity and treat the process largely as a black box with a basic and flat metric surface.

What we really need first is a way of combining these two perspectives, especially as our monthly cloud computing bill is closely tied to increases in one or more of the resource Counters updated by the cloud platform and cloud service components.
Well, it is actually not so hard (at least when using our resource metering runtime) because Counters are in fact thread (activity) specific and are only aggregated at the process level when the counting probes provider extension is enabled.
Let's create some Meters mapped to Counters and execute our test again with the following configuration.

Below is the new Metering table view captured following the test execution. With just an external configuration change we can now see which activities impact which Counters via the Meter mapping.
Additionally we have introduced a charge back mechanism by way of the twitter.service.charge Meter (more on this later).

Cost Metering Models
Now let's introduce a more explicit cost model by adding unit cost Meters based on the Counters, and by relabeling the mapped Meters to hide the underlying technologies and billable cloud services. We do this by appending the following properties to the configuration.

Below is the revised metering model after executing the test scenario. The model now includes a new unit-cost-based Meter, io.cost, representing our storage costs. From the model we can see that the storage cost of delivering the status and timeline services (activities) to user wlouth is 66 (io.read.bytes*1 + io.write.bytes*2). This is higher than our service charge units (20), so we might need to revise things slightly in our next configuration.
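The figure of 66 can be checked with a short calculation. The byte counts below are an assumption rather than values quoted from the metering table: they take each tweet to be stored as its 11-byte UTF-8 encoding, written once and read once, which is consistent with the reported total.

```python
# Assumed usage for the test scenario: two 11-byte tweets written, then both read back.
tweets = ["hello world", "hello cloud"]
write_bytes = sum(len(text.encode("utf-8")) for text in tweets)
read_bytes = write_bytes  # the single timeline request reads every stored object

# Unit costs taken from the formula io.read.bytes*1 + io.write.bytes*2.
unit_cost = {"io.read.bytes": 1, "io.write.bytes": 2}
usage = {"io.read.bytes": read_bytes, "io.write.bytes": write_bytes}

io_cost = sum(usage[meter] * unit_cost[meter] for meter in usage)

service_charge = 20  # service charge units reported for the same scenario
margin = service_charge - io_cost
```

With 22 bytes read and 22 bytes written, io.cost comes to 22*1 + 22*2 = 66 units, leaving a shortfall of 46 units against the 20 service charge units.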
Note: A good practice (of abstraction) in cost modeling is to create an internal unit charge scheme and associated unit currency.

Our final configuration assigns a unit cost to our service charge Meter as well as relabeling the mapped Meter.
And here is the resulting model. Yes! The cloud application is making a profit (well, at least in terms of the storage cost).

An important benefit of our resource metering approach is that we now have a single model for both cost management and service management (performance & capacity management), one that can also provide business insight into software service sales in real-time alongside operational costs.
This unified model allows performance engineers to much more easily factor in cost when making trade-offs (in terms of resource usage) in order to improve the response times and throughput of one or more software services. With each possible performance optimization (CPU vs IO) the cost of the change can be evaluated. Here is a table that combines the code centric and service centric views into a single model.

Naturally the model lends itself to being distributed (and disconnected) across multiple processing units and execution flows in the cloud whilst offering a user defined consolidated view from any business & service management workstation.

Summary
Cost management is important for companies using cloud computing as a platform for application delivery as it provides a framework for planning and controlling decisions related to such services in terms of service value and computing cost. It also serves to help cloud computing vendors to deliver resources in a cost effective manner whilst maximizing value. Finally, knowledge of costs is required to make intelligent decisions related to cost-justified service quality.