
Tuesday, November 17, 2015

Do Oracle's latest SPARC comparisons reveal more than intended?

by Bill Moran and Rich Ptak

At Open World 2015, Oracle announced its latest version of the SPARC microprocessor. In this blog, we focus on Oracle's performance claims versus its competitors. We know that all vendors like to highlight their system’s performance advantages over competitors; Oracle is no different. Typically, such claims are based either on benchmarks tailored to a specific workload or on standardized benchmarks, which have more or less rigidly enforced guidelines. The Oracle announcement claims advantages based on standardized benchmarks. Oracle (like any vendor) makes every effort to make its systems look as good as possible. That is to be expected, and we found no evidence of cheating. We do think their results call for commentary.

A few words about benchmark testing. Some years ago, there was a benchmark expert named Jack, who held a PhD in mathematics. He wanted to bet $100 that he could write a benchmark proving any system better than any other system. It didn’t matter which system was faster or how different they were; he could ‘fix’ the winner. We didn’t doubt he could do it and didn’t bet. The point is that if one completely controls the benchmark, one controls the result. That is why industry-standard benchmarks, e.g. SPEC[1] and TPC[2], exist. However, some have more restrictions; for example, TPC requires results to be certified by an independent auditor and dictates how price/performance is calculated. This makes them very expensive and therefore less likely to be run. In between TPC and Jack’s creation, SPEC’s less onerous rules make a good compromise. Care still needs to be taken when interpreting results.

Benchmark 1 SPECjEnterprise:
Oracle’s first performance point is based on results from the SPECjEnterprise2010[3] test. Table 1 shows what the Oracle press release[4] presents; we added the last column.
                  
System Tested | Result | Benchmark | Status[5] | Date of Test
SPARC T7-1 | 25,818.85 | SPECjEnterprise2010 EjOPS | Unsecure | 10/23/2015
SPARC T7-1 | 25,093.06 | SPECjEnterprise2010 EjOPS | Secure | 10/23/2015
IBM Power S824 | 22,543.34 | SPECjEnterprise2010 EjOPS | Unsecure | 04/22/2014
IBM x3650 M5 | 19,282.14 | SPECjEnterprise2010 EjOPS | Unsecure | 02/18/2015
Table 1 SPECjEnterprise2010 results
Oracle did include the two best “IBM” results. However, the test date shows that the IBM Power result is 16 months old. Does this make any difference? We don’t know, but it is quite conceivable that if the test were run with a newer system, the results would be better. The IBM x3650 result is newer, but that product line was sold to Lenovo, making the comparison irrelevant.

Other points to consider when evaluating the data include:
  1. SPEC benchmarks have no rules controlling the calculation of price/performance, nor are system prices provided. Therefore, it is impossible to calculate price/performance from the published results. Comparing a $100K system with a $500K one makes no sense without knowing the relative costs (see the sketch after this list).
  2. For a generic benchmark like SPEC, it isn’t known how closely it reproduces or reflects real workload performance, so there is no guarantee that the advantages hold in production environments. A benchmark showing system A running faster than system B does not assure that A outperforms B on a real workload.
  3. The “Status” column refers to brand-new Oracle security features announced at Open World and described in the press release (footnote 5). Larry Ellison also discussed them in his Open World kickoff talk[6]. Oracle claims these new security features are low cost. The results include runs with the features turned on (secure) and turned off (unsecure). Somewhat arbitrarily, the IBM/Lenovo systems are labeled “unsecure”. It isn’t surprising that IBM hasn’t implemented security features Oracle just announced, but that is no indication the systems are insecure. We disagree with labeling them as such.
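To make point 1 concrete, here is a minimal sketch of how price/performance would be computed if prices were available. The EjOPS figures come from Table 1; the $100K and $500K prices are the purely illustrative numbers used above, not real quotes for either vendor.

```python
# Hypothetical price/performance calculation using the Table 1 EjOPS results.
# The dollar figures are the article's illustrative $100K/$500K numbers,
# NOT actual list prices for either system.
faster_ejops, faster_price = 25_818.85, 500_000   # benchmark winner, hypothetically $500K
slower_ejops, slower_price = 22_543.34, 100_000   # benchmark loser, hypothetically $100K

print(f"Faster system: ${faster_price / faster_ejops:,.2f} per EjOPS")  # ~$19.37
print(f"Slower system: ${slower_price / slower_ejops:,.2f} per EjOPS")  # ~$4.44
# With these made-up prices the raw-throughput loser wins comfortably on
# price/performance -- exactly the context that SPEC submissions leave out.
```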

One final observation: browsing the SPECjEnterprise benchmark results, one could conclude that Oracle’s performance has degraded over the past several years. Why? The most recent SPECjEnterprise2010 result in Table 1 is 25,818 EjOPS, but a submission dated March 26, 2013 has Oracle reporting 57,422 EjOPS! Conclusion: performance degraded by some 50%!! It doesn’t make sense to us, but that’s what happens when context is ignored and benchmark results are taken literally. We’ll leave it to Oracle to explain this one.
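For what it is worth, here is the naive arithmetic behind that tongue-in-cheek conclusion, a sketch using only the two published EjOPS numbers cited above.

```python
# The "performance degraded" reading, reproduced literally: two published
# EjOPS numbers compared with no regard for the systems behind them.
ejops_2015 = 25_818.85   # SPARC T7-1 submission, 10/23/2015 (Table 1)
ejops_2013 = 57_422.0    # Oracle submission dated March 26, 2013

drop = 1 - ejops_2015 / ejops_2013
print(f"Naive conclusion: performance fell by {drop:.0%}")   # ~55%
# The arithmetic is fine; the conclusion is not, because the two submissions
# describe different systems and configurations -- the missing context.
```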

Another Benchmark: Hadoop Performance

Table 2 shows another benchmark result from the Oracle press release.
System | Processors/Cores | Result | Status
Oracle SPARC T7-4 | 4 processors | 32.5 GB/min per chip | Unsecure
Oracle SPARC T7-4 | 4 processors | 29.1 GB/min per chip | Secure
IBM Power8 S822L | 8-node cluster, 3.5 GHz, 6-core | 7.5 GB/min per chip | Unsecure
Table 2 Hadoop Terasort Benchmark
The Hadoop Terasort benchmark accompanies the Apache Hadoop distribution[7]. An examination of the results reveals both good news and bad news for Oracle. The good news is that the result seems to show Oracle outperforming IBM by a factor of 4. But there is no date given for this result. Were both tests run at the same time? Or is the IBM result, once again, older? As discussed, it makes a difference. Other context data is missing. Without system costs, there is no way to judge how realistic the comparisons are. The results have a “gee whiz” factor but lack substance.
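For readers unfamiliar with the benchmark, the sketch below shows how a Terasort run is typically driven from the tools shipped with Apache Hadoop (teragen, terasort, teravalidate). The jar path and data size here are illustrative assumptions, not the configurations Oracle or IBM actually tested.

```python
# Minimal sketch of driving the Hadoop Terasort benchmark via the bundled
# examples jar. Assumes a working Hadoop cluster with `hadoop` on the PATH;
# the jar path and row count below are illustrative, not the vendors' setups.
import subprocess

EXAMPLES_JAR = "/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar"  # assumed location
ROWS = 10_000_000_000   # 10 billion 100-byte rows ~= 1 TB of input (published runs are often larger)

def hadoop(*args):
    """Run a hadoop examples-jar command and raise if it fails."""
    subprocess.run(["hadoop", "jar", EXAMPLES_JAR, *args], check=True)

hadoop("teragen", str(ROWS), "/bench/terasort-in")                    # generate the input data
hadoop("terasort", "/bench/terasort-in", "/bench/terasort-out")       # the timed sort itself
hadoop("teravalidate", "/bench/terasort-out", "/bench/terasort-val")  # verify the output ordering
```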

The bad news is a bit more subtle. Elsewhere, Oracle claims that implementing their security features has very low cost. This result raises some questions, as performance appears to degrade by about 10% with security turned on. Finally, the critique about labeling the IBM system unsecure still holds.
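The roughly 10% figure follows directly from the per-chip numbers in Table 2; a quick sanity check:

```python
# Security overhead implied by Oracle's own Table 2 figures (SPARC T7-4).
unsecure_gb_per_min = 32.5   # per chip, security features off
secure_gb_per_min = 29.1     # per chip, security features on

overhead = (unsecure_gb_per_min - secure_gb_per_min) / unsecure_gb_per_min
print(f"Throughput lost with security on: {overhead:.1%}")   # ~10.5%
```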

Another Benchmark: SAP performance

Perhaps the most useful commercial benchmark is the SAP benchmark. Oracle submitted a result for this benchmark as recently as last month (October). Table 3[8] shows results for the latest Oracle and IBM submissions.
Vendor | SAPS | System | OS | Date
Oracle | 168,600 | SPARC T7-2 | Solaris 11 | 10/23/15
IBM | 436,100 | E870 | AIX 7.1 | 10/3/14
Table 3 SAP Benchmark Results
SAPS is the key performance metric; that it is closely related to a real SAP workload[9] adds further credibility. We can’t claim this proves IBM does a better job than Oracle running all SAP workloads, but it is an additional data point. More data, as described earlier, would provide better context for a decision.
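A brief aside on the metric itself: per SAP's published definition, 100 SAPS corresponds to 2,000 fully business-processed order line items per hour in the Sales and Distribution (SD) benchmark, which is what ties the number to a real workload. The sketch below simply converts the Table 3 figures into that unit.

```python
# Convert Table 3 SAPS figures into SAP's underlying workload unit:
# 100 SAPS = 2,000 fully business-processed order line items per hour (SD benchmark).
LINE_ITEMS_PER_100_SAPS = 2_000

for vendor, saps in (("Oracle SPARC T7-2", 168_600), ("IBM E870", 436_100)):
    items_per_hour = saps / 100 * LINE_ITEMS_PER_100_SAPS
    print(f"{vendor}: {items_per_hour:,.0f} order line items per hour")
```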

One more comment: during his Open World keynote talk[10], Larry Ellison strongly emphasized that Oracle never sees SAP or IBM in competition for business in the cloud. He repeated this multiple times. The Oracle PR department needs to know about this. The Wall Street Journal of 11/5/2015 carried a front-page ad[11] by Oracle detailing performance advantages versus SAP (in the cloud). The claim is that the Oracle database runs twice as fast as SAP’s HANA in the cloud. (NOTE: the ad appears only in the print version of the WSJ.) If Larry is accurate about never seeing SAP as cloud competition, the ad wastes money.

However, Oracle has written a white paper[12] to document this benchmark. Note the legal disclaimer at the top of the white paper. Oracle claims in the document that SAP has tried to conceal HANA performance, so Oracle ran the benchmark to clear up the issue. We think this situation is a minor version of the “benchmark wars” of the past. Frankly, we have neither the time nor the space to sort the whole issue out here. However, it does reinforce our point about the care needed in interpreting benchmark results.

The Final Word

We’ve pointed out some concerns with Oracle’s claims, including some contradictory statements regarding their competitors and competition. In fairness, Oracle usually does just present a benchmark result and let the reader draw their own conclusions. (Okay, they do nudge the reader toward a conclusion.)

We’ve tried to present a bit more context around Oracle’s benchmark results. We’ve also pointed out that benchmark data must be treated with care. Clearly, benchmarks using real production workloads (or a subset) running on multiple systems, with configuration and cost details included, are the most credible. Other comparisons can be significantly cheaper but should be trusted less. Be wary of unsubstantiated, poorly documented claims, whatever the source. Better decisions will result.

One final word: we recently received (November 16, 2015) a press release that included product information and performance claims. It discusses OpenPOWER Foundation member activities with IBM’s Power Systems. It has great information, and it provides great examples germane to this piece. For instance, the last sub-paragraph describes an OpenPOWER server providing Louisiana State University a 7.5x to 9x performance increase over a competitor’s server doing genomics analysis. It is footnoted with server details and a link to an LSU white paper with additional details about the systems and benchmarking. We think you’ll appreciate the difference.



[5] Oracle rates just-announced ‘security’ features available only on their systems. Obviously, a 6-month-old IBM or Lenovo system wouldn’t include these features. See comments later in the text.
[10] You will find Ellison’s talk at https://www.oracle.com/openworld/on-demand/index.html
[11] Oracle provides a URL in the ad – corrected to the following: https://www.oracle.com/corporate/features/oracle-powers-sap.html. The copy of the WSJ ad appears on the right side of the page.

4 comments:

  1. What I find amusing is that the article tries to shoot holes in the Oracle benchmark claims, maybe because you’ve been provided all the points from IBM, but the arguments are quite weak.

    You make claims that the IBM results are old "However, the test date shows that the IBM Power result is 16 months old. Does this make any difference? We don’t know". Why didn’t you do any research to answer this? Because you know the answer!! Duh!

    If IBM had faster systems today, don't you think they would have published newer/faster results? That’s because IBM is shipping the same exact systems as they did last year, so the results listed are the current fastest from IBM and therefore are as good as new.

    And you compare a single chip SPARC T7-1 result to Oracle's previous 2x8-chip SPARC T5-8 result and state, oh, gee SPARC T7 is slower?: "Conclusion, performance degraded by some 50%!! It doesn’t make sense to us". Yes it does, but you'd rather make the statement and claim ignorance!

    You're comparing an apple to a bucket full of apples. Duh! If you looked closely at the two results, you'll see that a SPARC T7-1 with just a single SPARC M7 CPU chip performs about 3x faster than the SPARC T5 chip in the SPARC T5-8!

    And for your SAP comparison, why did you choose to show IBM's high end E870, with 8 x Power8 CPU chips, that has a list price around $1M and compare with the SPARC T7-2, which lists for around $80K?

    IBM's Power S824 also has SAP results, 115,870 SAPs, and with a list price of about $100K, is clearly inferior to the SPARC T7-2 which is ~50% faster and 20% lower cost?

    And finally, for the Hadoop comparison, would have been quite simple to pull up pricing from the web. The SPARC T7-4 standard configuration w/1TB RAM is about $183K and is listed here: https://shop.oracle.com/pls/ostore/f?p=dstore:5:554995537056::NO::P5_PROD_HIER_ID,P5_LPI:126825848267061387258361,126825849299541387268765:

    while the S822L pricing is here: http://www-03.ibm.com/systems/power/hardware/s812l-s822l/browse.html

    A Power S822L 24c/3GHz with a measly 96GB shows $18,183 IBM Web price. Upgrading to 512GB RAM, as IBM configured the benchmark, the price goes up to ~$41,015/system. Yeah, RAM is expensive.

    IBM ran 8 x Power S822L servers, so that’s 8 x $41K = $328,120. Add all the networking gear needed to connect all this together, and you're over $400K.

    And so a single SPARC T7-4 ran 10TB Terasort @ 141GB/min, whereas the IBM S822L solution did 30.1GB/min per node, 241 GB/min for 8 nodes. So calculating basic price/perf, SPARC T7-4 is at $1,299/GB/min vs S822 @ $1,777/GB/min, or 37% higher cost/performance.
    You can see details of IBM's Terasort benchmark here:
    https://www-950.ibm.com/events/wwe/grp/grp030.nsf/vLookupPDFs/Power%20Business%20Analytics%20Solution%20overview/$file/Power%20Business%20Analytics%20Solution%20overview.pdf

  2. and Bill Moran reply:

    “Unknown”, a critic of an earlier blog, is back. We think this is the same Unknown (based on style/tone/errors). Hiding behind anonymity allows him to conceal his biases. We do identify ourselves; we are not employees of any vendor. Last time, with some additional research, we refuted his claims. We won’t put the same effort into responding to this comment, but will address some of the claims.

    First of all, he asserts our comments are wrong “maybe because you have been provided all the points from IBM”. The blog entry was based on our research and review of published material. We did quote from both IBM and Oracle publicly available results but we had no assistance or suggestions from IBM.

    Displaying amazing ignorance of the benchmarking world, Unknown asserts that using old IBM results makes no difference. When running a benchmark, you assemble the best system available at that time. Running it a year later, many things change, each likely making a material difference in the results. For example, there may be a new OS version or the system may have upgraded microcode. New system fixes unavailable on the older system can alter performance. Additional time tuning may result in changes that improve the results. There are other possibilities, but the point is timing is significant. Stating “We don’t know” how much difference the time lapse makes is an accurate statement. We stand by it.
    Unknown asserts that “if IBM had faster system today don’t you think that they would have published newer/faster results”. Superficially this comment makes some sense. However, running benchmarks is expensive. At any given time, any vendor has a huge portfolio of potential benchmarks they can run. It is a business and economic decision picking what to run or update. No vendor can afford to redo all old benchmarks. Most sophisticated observers understand the limitations of an old benchmark. Unknown obviously doesn’t.

    Our tongue-in-cheek comment about Oracle’s performance degrading by 50% was meant to highlight the idiocy of taking results out of context. Apparently, it was too subtle for some. Hey Unknown, wake up! We were not serious.

    Unknown’s claim about the SAP benchmark is nonsense. We compared Oracle’s best result to IBM’s best. By Unknown’s own logic, Oracle must be presenting their best results. They selected which system to highlight, not us.

    Finally, Unknown’s claim that it is easy to find system pricing is nonsense. Sure, pricing is all over the web. However, a benchmark requires pricing that is accurate, comprehensive and detailed to guarantee a valid comparison. Anyone experienced in price/performance comparisons understands the difficulty.
    TPC benchmarkers expend significant effort to get valid pricing for the price/performance comparison reporting. There is a reason why SPEC and SAP benchmarks do not include pricing. Vendors have been known to leave out the pricing for essential software, under-provision infrastructure configurations or lower system list prices to show an advantage (recovering the difference through expensive mandatory maintenance contracts).
    TPC finally chose a five-year TCO to make valid pricing and cost comparisons. They also had an auditor for each benchmark. Unknown is either naïve or ignorant about pricing issues.
    We rest our case. We have spent enough time responding to Unknown.

  3. A few more words on benchmarking in general. There are clearly 2 types of benchmarks. One type is akin to an automotive race where the only thing that counts is the raw performance. SPEC benchmarks are of this type. The other type includes measurement of price/performance. Both types are valid. If you are going to include price/performance then steps need to be taken to guarantee that the pricing is arrived at in a defined way. Clearly, you want the price to be comparable among the systems that you are comparing. The methodology that TPC uses is instructive in this respect. We think that taking SPEC results (or other benchmarks of the first type) and trying to massage them by adding some estimated price is not likely to give a very accurate comparison. At best the result is useful only for a so called "back of the envelope" calculation. Most customer driven benchmarks normally need pricing information so the customer can compare the different systems that they are considering. However, the customer usually relies on the vendor to propose a system. As a part of the proposal the vendor will have to price the system that they hope the customer will order. This type of benchmark along with the TPC type has the disadvantage that they are expensive to complete.

  4. Adding to what Rich & Bill have said. This "Unknown" looks like the classic Oracle marketing troll who monitors all Oracle & competitive articles defending Oracle kit while spewing the same venom using wrong, misleading and unsubstantiated data to denigrate competitive solutions. You can go to any of the top technology sites to see this person's signature BS.

    Oracle continues to refer to the S824, S822, S822L as a 4 chip system despite it being a 2 socket - so sayeth IBM btw. They then compare these servers, which max out at 24 cores, against a 4 chip T7 which happens to be the T7-4. For those who aren't aware, each T7 chip consists of 32 cores. Setting aside that the T7/M7 chip is a socket consisting of 8 x quad-core chiplets that strains to feed memory & I/O, they disingenuously compare a 4 chip T7-4, which is 128 cores (yes, that is NOT a typo), against what they claim is a 4 chip S824, S822 or S822L; all are 2 socket servers totaling no more than 24 cores.

    Since Oracle charges most of their enterprise software by core, the strength of the core continues to be King. They are either incapable of engineering strong cores or purposefully do so to stretch as many Oracle licenses as possible when customers loyally buy their products (why they continue to serve this devil is another blog in itself).

    Oracle doesn't benchmark ExaData although it claims performance superiority (and customers still buy it - wow!). They cherry pick benchmarks for SPARC that are either obscure, in their own category or using their own software while not allowing competitors to run benchmarks on their kit using Oracle software. The latest joke by Oracle is that they bought a POWER8 server, ran HammerDB on it, and then tried to claim 5.5X & 3% performance improvements. I wrote on this Register article where the 5.5X is their T7-1 which they called a 1 chip vs 1 chip of a S824 which was actually a 12 core socket and 24 cores in total. The T7 stock frequency was 4.13 vs 3.52 for the S824. Yet, on a per core basis they only achieve a 3% performance increase. Since they also published a very limited description of the POWER8 S824 server details, it calls into question their entire test. Why wouldn't they allow IBM to run the benchmark with AIX and Oracle DB? Why not use any of the many supported options out there for Linux on Power from MariaDB, MySQL, PostgreSQL and 20+ more? Instead they chose HammerDB.

    Rich & Bill - your indignation is justified.
