Thursday, May 3, 2012

TPC benchmarks for OLTP databases

TPC stands for Transaction Processing Performance Council, which develops benchmarks and publishes results. The benchmarks test both software and hardware. There are two active benchmarks for testing hardware and DBMSs running OLTP applications: TPC-C and TPC-E. TPC-C is rather old but still active, while TPC-E is quite new. It is interesting that, among database engines, TPC-E results cover only MS SQL Server, while TPC-C is popular for Oracle and DB2. The latest version of MS SQL Server that appears in TPC-C results is 2005. This uneven participation of database vendors is a sign that something is wrong with both benchmarks. What are the problems with TPC-C and TPC-E? Let's look at TPC-C first.

TPC-C defines an application with a highly partitioned database, where each partition contains a small amount of data and is accessed by infrequent transactions. The TPC-C specification requires that only around 10 transactions per second are performed on a partition. Fewer than 20% of them access data from more than one partition. Is this a realistic example? On the TPC-C benchmark, high-performing DBMSs can hardly show much better performance than old DBMSs. I believe that TPC-C is dead.
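To make the workload shape concrete, here is a rough back-of-the-envelope sketch in Python. It takes the figures quoted above (about 10 transactions per second per partition, at most 20% crossing partitions) as inputs. The function name and the partition count are made up for illustration and are not part of the TPC-C specification.

```python
def partition_load(partitions, tps_per_partition=10.0, cross_fraction=0.2):
    """Estimate the load implied by a per-partition rate cap.

    All arguments are illustrative assumptions taken from the post,
    not values read from the TPC-C specification.
    """
    total_tps = partitions * tps_per_partition
    # Upper bound on transactions that touch more than one partition.
    cross_tps = total_tps * cross_fraction
    # Everything else stays inside a single partition.
    local_tps = total_tps - cross_tps
    return {"total": total_tps, "cross": cross_tps, "local": local_tps}


if __name__ == "__main__":
    # Hypothetical system with 1000 partitions.
    print(partition_load(1000))
```

Under these assumptions, total throughput grows only by adding partitions, and most of the work never leaves a single partition, so a faster engine has little room to stand out.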

Pay attention when vendors publish their own TPC-C results that are not reported on the TPC web site. This usually suggests that they ran an unofficial implementation of TPC-C, heavily modified to get high throughput. Such results cannot be compared with each other because the benchmarks differ too much. Moreover, it is not possible to run the same benchmark, e.g., a faked unofficial implementation of TPC-C, on different DBMSs and publish the results, because of the DeWitt Clause, which is included in the license agreements of Oracle, MS SQL Server and others.

The other OLTP benchmark, TPC-E, is much more complex than TPC-C and places a significant limitation on how an application and a DBMS are integrated. TPC-E requires that application simulators and backend databases communicate through modules supplied as code with the TPC-E specification. Thus TPC-E assumes only the traditional architecture, where a DBMS and application code run in separate processes or on separate servers. This makes it impossible to use the VMDBMS concept, where a DBMS and application code run in the same process to avoid unnecessary data movement and transformation between the DBMS and the application business logic.
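The difference between the two architectures can be shown with a toy Python sketch. It is not TPC-E code and not any real DBMS interface: the account data, the function names and the use of JSON as a stand-in for a connector layer are all assumptions made for illustration.

```python
import json

# Toy in-memory "database": account id -> balance (hypothetical data).
ACCOUNTS = {"a1": 100, "a2": 250}


def debit(store, account_id, amount):
    """Business logic plus data access, running next to the data."""
    if store[account_id] < amount:
        raise ValueError("insufficient funds")
    store[account_id] -= amount
    return store[account_id]


def debit_in_process(store, account_id, amount):
    # Same-process style: a plain function call, no copies or encoding.
    return debit(store, account_id, amount)


def debit_via_boundary(store, account_id, amount):
    # Separate-process style, simulated: request and reply are each
    # encoded to text and decoded again, as a connector layer would do.
    request = json.dumps({"account": account_id, "amount": amount})
    decoded = json.loads(request)
    balance = debit(store, decoded["account"], decoded["amount"])
    reply = json.dumps({"balance": balance})
    return json.loads(reply)["balance"]
```

Both functions produce the same balance, but the second one pays for two encode/decode round trips per transaction. That extra data movement and transformation is what the in-process approach tries to avoid, and what a mandatory connector layer makes unavoidable.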

Since only Microsoft runs the TPC-E benchmark, it seems that TPC-E will never become popular.

Among all TPC benchmarks, the most popular and lively one is TPC-H, which targets data warehouses and databases supporting OLAP-style applications.
