
Wednesday, July 17, 2013

IBM versus Amazon in the Cloud


By Bill Moran

IBM and Amazon have been battling it out over a cloud contract with the CIA. The CIA awarded the contract, valued in the press at $600 million, to Amazon. IBM protested this decision, and the GAO reviewed the protest. The GAO came down on IBM's side on several issues while rejecting IBM's claims on others, and it recommended that the CIA reopen the bidding. It is our understanding that the GAO does not have the power to force the agency to do so, although agencies do follow GAO recommendations in a great many cases. The GAO can, however, require the agency to review the situation and announce a decision within 60 days.

This caught our attention because the CIA was quoted as saying that the Amazon solution was technically superior to IBM's. We assumed that the evaluation team at the CIA was technically competent, and we hoped to learn something by reviewing their evaluation, or at least the part made public. The GAO evaluation of the whole situation is publicly available, and that document is the basis for our comments (parts of the GAO report were redacted before it was made public).[1]

Here is the evaluation summary as it appears in the GAO report:

| Factor | Amazon | IBM |
|---|---|---|
| Technical Management | | |
| --Technical Approach (Demo) | Very Good | Marginal |
| --Technical Approach (Written) | Exceptional | Very Good |
| --Service Level Agreements | Very Good | Satisfactory |
| --Management Approach | Satisfactory | Very Good |
| Past Performance (confidence) | High | Moderate |
| Security | Pass | Pass |
| Proposed Price | [deleted] | [deleted] |
| Evaluated Price | $148.06 million | $93.9 million |
| Guaranteed Minimum | [deleted] | [deleted] |
| Overall Proposal Risk | Low | High |

Figure 1 GAO Evaluation Summary

Let’s summarize what we learn from the evaluation, which was prepared by the CIA and partially redacted in the GAO’s published version. The proposals were evaluated in 10 categories. Amazon wins in 5 categories, while IBM wins in 2. They tie in Security. We are not given the figures for Proposed Price and Guaranteed Minimum, nor do we know the weight assigned to any category. We will comment further after examining the rest of the CIA’s evaluation as reported by the GAO.
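
For readers who want to check the tally, here is a minimal Python sketch that recounts the Figure 1 categories. The ratings are transcribed from Figure 1; the ordering of the adjectival ratings (Marginal below Satisfactory below Very Good below Exceptional), and the idea that a lower price and a lower risk count as a "win," are our assumptions, not something the GAO states.

```python
# Orderings below are assumptions; the GAO report does not publish a scoring scale.
ADJECTIVAL = {"Marginal": 1, "Satisfactory": 2, "Very Good": 3, "Exceptional": 4}
CONFIDENCE = {"Moderate": 1, "High": 2}
RISK = {"High": 1, "Low": 2}  # lower risk is treated as the better result

# (category, Amazon score, IBM score); None marks a redacted entry.
rows = [
    ("Technical Approach (Demo)", ADJECTIVAL["Very Good"], ADJECTIVAL["Marginal"]),
    ("Technical Approach (Written)", ADJECTIVAL["Exceptional"], ADJECTIVAL["Very Good"]),
    ("Service Level Agreements", ADJECTIVAL["Very Good"], ADJECTIVAL["Satisfactory"]),
    ("Management Approach", ADJECTIVAL["Satisfactory"], ADJECTIVAL["Very Good"]),
    ("Past Performance", CONFIDENCE["High"], CONFIDENCE["Moderate"]),
    ("Security", 1, 1),  # both rated Pass
    ("Proposed Price", None, None),
    ("Evaluated Price", -148.06, -93.9),  # negated so the lower price scores higher
    ("Guaranteed Minimum", None, None),
    ("Overall Proposal Risk", RISK["Low"], RISK["High"]),
]


def tally(rows):
    """Count category wins, ties, and redacted categories."""
    counts = {"Amazon": 0, "IBM": 0, "Tie": 0, "Unknown": 0}
    for _, amazon, ibm in rows:
        if amazon is None or ibm is None:
            counts["Unknown"] += 1
        elif amazon > ibm:
            counts["Amazon"] += 1
        elif ibm > amazon:
            counts["IBM"] += 1
        else:
            counts["Tie"] += 1
    return counts


print(tally(rows))  # {'Amazon': 5, 'IBM': 2, 'Tie': 1, 'Unknown': 2}
```

A simple count like this says nothing about the weights, which is exactly the information the redacted report withholds.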

 

More details on the requirements

The Request for Proposal (RFP) issued by the CIA defined what the Agency wanted the vendors to supply. The CIA wanted the chosen vendor to implement its public cloud on government premises. Two types of services are defined: 1) infrastructure as a service (IaaS) and 2) platform as a service (PaaS). For IaaS, the contractor would provide networking, storage, servers, virtualization, and cloud. The government would provide the operating system, middleware, runtime, data, and applications. For PaaS, the government would provide data and applications, while the contractor would provide everything else. In both cases, the RFP required the cloud software to support auto-scaling. This is a mechanism that allows the computing resources used by an application to be increased or decreased automatically at any time in response to the needs of that application. It is important to note that any application, particularly a customer-written one, needs this capability.
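
To make the idea of auto-scaling concrete, here is a minimal Python sketch of one common approach: resizing a pool of instances so that average CPU utilization moves toward a target. This is our own illustration, not a description of either vendor's cloud; the function name, the 60% target, and the pool limits are all hypothetical.

```python
def desired_instances(current, cpu_percent, target_percent=60, min_n=1, max_n=100):
    """Return the instance count that would bring average CPU near the target.

    All parameters are illustrative assumptions; a real auto-scaler would also
    consider cool-down periods, memory, queue depth, and so on.
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    # Integer ceiling of current * cpu_percent / target_percent.
    wanted = (current * cpu_percent + target_percent - 1) // target_percent
    # Clamp to the allowed pool size.
    return max(min_n, min(max_n, wanted))


print(desired_instances(4, 90))  # load is high: scale out from 4 to 6
print(desired_instances(4, 30))  # load is low: scale in from 4 to 2
```

The point for this procurement is that the decision is driven by measurements of the running application, so it has to work for any application the customer deploys, not only for software the vendor supplies.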

The RFP required the vendors to address six hypothetical scenarios. The GAO does not describe five of these in any detail, but it does give details on the one identified as number 5. The vendors supplied pricing based on the scenarios. They also had to provide the government with a catalog of services that included prices for all services. Scenario 5 was a PaaS scenario, which meant that the government would supply only the data and the applications, with the vendor supplying everything else. The Scenario 5 application was defined as data analysis. We quote portions of the description as provided by the GAO:

“The scenario centers around providing a hosting environment for applications which process vast amounts of information in parallel on large clusters (1000s of nodes) of commodity hardware in a reliable, fault tolerant manner…The solution to this scenario should automatically provision clusters of compute for the segmentation and processing of input datasets via the MapReduce framework where the vendor is responsible for the management of the OS and the MapReduce implementation. Assume a cluster large enough to process 100TB (terabytes) of raw input data. Assume a duty cycle of 100% for all virtual machines associated with this scenario.”

Apparently, Amazon and the government interpreted this scenario to mean that the application would run continuously throughout the year. IBM decided that the application would run only once per order[2], and its pricing reflected that reading. Before responding to the RFP, IBM asked the Agency for clarification on this point. The clarifications provided did not shed any light on the CIA’s requirements, so IBM’s final pricing reflected one execution per order. The GAO nevertheless rejected IBM’s protest about the ambiguity of the requirement. Apparently, the government rule is that if you respond to a requirement you consider ambiguous, you cannot protest later when the agency interprets the requirement differently than you did.

IBM also protested the way the CIA subsequently handled its pricing, and the GAO supported that protest. The CIA realized that IBM’s and Amazon’s prices were not comparable since they had priced different things, as described above. The agency then used the price catalog to adjust each company’s price in an attempt to put them on a level playing field. However, Amazon did not state how long a single run over the 100 TB would take. As a result, the CIA had no idea how many runs Amazon had actually priced per year. Therefore, it was impossible to make the IBM and Amazon prices comparable.
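
The comparability problem comes down to simple arithmetic, which the following Python sketch illustrates. At a 100% duty cycle, the number of back-to-back runs in a year depends entirely on how long one run takes, so a continuous-year price cannot be converted to a per-run price without that duration. The annual price and run durations below are hypothetical figures chosen for illustration; none of them come from the GAO report.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def runs_per_year(run_hours):
    """Back-to-back runs that fit in one year at a 100% duty cycle."""
    if run_hours <= 0:
        raise ValueError("run_hours must be positive")
    return HOURS_PER_YEAR / run_hours


def implied_price_per_run(annual_price, run_hours):
    """Per-run price implied by a continuous-year price, given a run duration."""
    return annual_price / runs_per_year(run_hours)


# Hypothetical annual price; the real figures are redacted.
annual_price = 1_000_000.0
for run_hours in (24, 120, 730):
    per_run = implied_price_per_run(annual_price, run_hours)
    print(f"{run_hours:>4} h per run -> {runs_per_year(run_hours):6.1f} runs/year, "
          f"{per_run:,.2f} per run")
```

The same annual price implies very different per-run prices depending on the assumed run duration, which is why the missing duration left the CIA's adjustment without a sound basis.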

IBM also protested an adjustment the CIA made to the contract after awarding it to Amazon. The RFP specified that the vendor would have to certify any software provided to the government under this contract as virus free. The key point here is the word “any.” Amazon indicated its intention to use Red Hat Linux and MySQL, so it wanted to be excused from the requirement to certify open source software. It asked the agency to require certification only of software that Amazon itself wrote. The CIA agreed to change the requirement for Amazon. IBM protested that this type of change could not be made for only one bidder, and that the government should have communicated the change to all bidders. The GAO agreed with IBM that reducing the requirements only for Amazon was a material change to the contract.

In another area, the issue of auto-scaling applications in the cloud came up. There is no doubt IBM agreed to provide this function as part of the cloud it intended to supply to the government. The issue seems to be how IBM addressed the question of whether its existing public cloud had this ability. The government says that IBM gave different answers to this question. It seems that at one point IBM said that only IBM-supplied applications had this ability, but not other applications. The agency said that IBM had not explained how it would modify its cloud to provide this capability, and that this omission created substantial doubt about the technical feasibility of IBM’s proposal. IBM protested the fact that this situation resulted in a minus for them. The GAO denied this protest on the grounds that the CIA’s action was reasonable under the circumstances.

The final IBM protest related to Amazon’s past performance. As was well documented in the media (NY Times, Wall Street Journal), there were significant Amazon cloud outages during the second half of 2012. These outages were due to problems with Amazon’s datacenter in northern Virginia. IBM said that the government should have taken these outages into account when evaluating Amazon. The CIA said it ignored the media reports because they did not provide enough information and, in particular, gave no information on the effect of the outages on Service Level Agreements (SLAs). The GAO supported the CIA and rejected IBM’s protest. The GAO included a statement from the Jet Propulsion Lab (located in California) saying that the Amazon outages (in Virginia) had no effect on it[3].

 

PNA analysis

In fairness, we must say that the GAO report, which may have been drafted and reviewed by a panel of lawyers, is relatively readable. Some ambiguities do remain and some information has been deleted. However, under the circumstances, the GAO does a reasonable job of laying the situation out.

Any evaluation must take into account the fact that we do not have access to several key documents: 1) the RFP itself, 2) the vendor responses to the RFP and 3) the detailed CIA response to the vendors. There were originally 5 companies competing for this contract: Amazon, IBM, Microsoft, AT&T and “X”[4]. It is unclear how the contestants were winnowed down to only IBM and Amazon. The GAO does mention that Microsoft and AT&T protested the first version of the RFP and that the agency rewrote it. This is an indication that the procurement was troubled from the start.

If we return to the evaluation summary in Figure 1, several things stand out. First, there is the large price discrepancy. We do not know the relation between the evaluated price shown and the price supplied by the vendors. We do know that the CIA adjusted the price in a way that prompted IBM to protest, so we can assume IBM’s price was increased relative to Amazon’s. Even after this adjustment, Amazon’s price remained almost 58% higher than IBM’s. In our view, a difference this large should require significant advantages elsewhere to overcome it.
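
The "almost 58%" figure follows directly from the two evaluated prices in Figure 1, as this short Python check shows. The function name is ours; the two prices are the ones reported by the GAO.

```python
def premium_percent(higher, lower):
    """Percentage by which `higher` exceeds `lower`."""
    return (higher - lower) / lower * 100


amazon_evaluated = 148.06  # $ million, from Figure 1
ibm_evaluated = 93.9       # $ million, from Figure 1

print(f"{premium_percent(amazon_evaluated, ibm_evaluated):.1f}%")  # 57.7%
```

In absolute terms, the gap between the two evaluated prices is about $54 million.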

Frankly, the only technical difference or superiority of Amazon actually mentioned is the auto-scaling situation. Since IBM committed to provide this function in their cloud, we do not see it as a major issue. Its presence or absence in IBM’s current cloud is a bit of a tempest in a teapot.

Unfortunately, it appears that IBM’s response to the RFP clearly gave the CIA an opportunity to mark them down for not addressing some issues very well. According to Figure 1, the CIA considered IBM’s demonstration to be far inferior to Amazon’s. They also thought IBM’s written proposal did not match Amazon’s.

One final comment on Figure 1 – we are surprised that the government’s premier intelligence agency would allow security to be evaluated on a pass/fail basis, rather than an in-depth evaluation of cloud security. This raises the question: Why was it not evaluated on a more granular basis?

While IBM did not put its best foot forward in its RFP response, the CIA also did a poor job, first issuing a poorly worded RFP and then fumbling the evaluation process. When IBM queried Scenario 5, the agency had a clear responsibility to clarify its meaning to ensure comparable price quotes from the respondents. They failed to do so. One can argue, fairly, that IBM could have protested more vigorously at that point. However, the CIA wrote the RFP. It was their responsibility to correct it.

In our opinion, the CIA showed bias in favor of Amazon. They brushed off Amazon’s outages too casually. Instead of saying they lacked enough data about the outages, they should have done their due diligence. They should have asked Amazon to supply the names and details of affected SLAs and customers. Finally, the agency should have treated both vendors in the same way, i.e., by requesting outage records and the SLAs affected. If this information was not requested in the RFP, it should have been, since it is critical to evaluating a contractor’s past performance in operating a public cloud. We believe the CIA’s rather feeble explanation of their response to this issue should not have been accepted by the GAO.

Finally, the way the CIA changed the requirements to exempt only Amazon from the responsibility of certifying all cloud software as virus free was clearly out of bounds. The GAO correctly sided with IBM on this issue.

 

Conclusion

We think that, given the delay this procurement has gone through and its cost[5], it would be prudent for the CIA to get some assistance in rebidding the contract. They clearly need it. For its part, IBM clearly needs to do a better job of writing and presenting its proposal. IBM should be careful not to give the agency any opportunity to magnify any error it might make; it does not need another issue like “auto-scaling” to bedevil it the next time around. Finally, IBM’s recent acquisition of SoftLayer Technologies should strengthen its hand considerably when responding to a new RFP. It is possible, though, that integrating SoftLayer technology into the response will present some challenges and require careful footwork to make everything fit smoothly together.


[1] See http://www.gao.gov/assets/660/655241.pdf
[2] We know this is confusing, but the different types of orders are not defined in the GAO report, nor is it made clear how many of these types there are.
[3] It is not clear why the GAO thought the JPL in California was likely to be affected by outages in Virginia.
[4] “X” is not identified by the GAO.
[5] The GAO has said IBM will be reimbursed for the cost of their protest.