Thursday, May 3, 2012

TPC benchmarks for OLTP databases

TPC stands for Transaction Processing Performance Council, an organization that develops benchmarks and publishes their results. The benchmarks test both software and hardware. There are two active benchmarks for testing hardware and DBMSs running OLTP applications: TPC-C and TPC-E. TPC-C is rather old but still active, while TPC-E is quite new. Interestingly, the TPC-E results cover only one database engine, MS SQL Server, while TPC-C is popular with Oracle and DB2. The latest version of MS SQL Server that appears in the TPC-C results is 2005. Such uneven participation by database vendors is a sign that something is wrong with these benchmarks. What are the problems with TPC-C and TPC-E? Let's have a look at TPC-C first.

TPC-C defines an application with a highly partitioned database, where each partition contains a small amount of data and is accessed by infrequent transactions. The TPC-C specification allows only around 10 New-Order transactions per minute on a partition (the cap is 12.86 per warehouse). Fewer than 20% of the transactions access data from more than one partition. Is this a realistic example? Because the result is tied to the number of partitions rather than to the speed of the engine, a high-performing DBMS can hardly show much better performance than an old DBMS on the TPC-C benchmark. I believe that TPC-C is dead.
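The effect of a per-partition cap can be sketched with a little arithmetic: if each partition may run only a fixed number of transactions per unit of time, a higher result can only come from adding partitions, not from a faster engine. The function name and the sample numbers below are illustrative assumptions, not values taken from the TPC-C specification.

```python
import math


def partitions_needed(target_rate: float, rate_per_partition: float) -> int:
    """Smallest number of partitions that can reach target_rate when every
    partition is capped at rate_per_partition.

    Both rates must use the same time unit. The names are hypothetical and
    only illustrate the scaling argument.
    """
    if target_rate < 0 or rate_per_partition <= 0:
        raise ValueError("target_rate must be >= 0 and rate_per_partition > 0")
    return math.ceil(target_rate / rate_per_partition)


# Assumed sample values: a cap of 10 transactions per partition per time unit.
print(partitions_needed(1_000_000, 10))  # 100000
print(partitions_needed(25, 10))         # 3
```

Under such a cap, a result of one million transactions per time unit says more about how many partitions the hardware could hold than about how fast the DBMS executes a transaction.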

Pay attention when vendors publish their own TPC-C results that are not reported on the TPC web site. This usually means that they ran an unofficial implementation of TPC-C, heavily modified to get high throughput. Such results cannot be compared with each other, because the underlying benchmarks differ too much. Moreover, it is not possible to run the same benchmark, e.g., the same unofficial implementation of TPC-C, on different DBMSs and publish the results, because of the DeWitt Clause, which is included in the license agreements of Oracle, MS SQL Server and others.

The other OLTP benchmark, TPC-E, is much more complex than TPC-C and significantly limits how an application and a DBMS can be integrated. TPC-E requires that application simulators and backend databases communicate through modules supplied as code with the TPC-E specification. Thus TPC-E assumes only the traditional architecture, where the DBMS and the application code run in separate processes or on separate servers. This makes it impossible to use the VMDBMS concept, where the DBMS and the application code run in the same process to avoid unnecessary data movement and transformation between the DBMS and the application business logic.

Since only Microsoft runs the TPC-E benchmark, it seems that TPC-E will never become popular.

Among all TPC benchmarks, the most popular and active one is TPC-H, which targets data warehouses and databases supporting OLAP-style applications.

About the interview with Mike Stonebraker

An interview with Mike Stonebraker, a well-known database researcher and the founder of several database products, has been published.

Mike repeats his main thesis that one size does not fit all, which he backs by founding and developing different database products specialized for different kinds of applications. Examples of such products are the in-memory OLTP database VoltDB and the DW/BI database Vertica. Mike argues that current data, which is operated on by ACID transactions, should be managed by an OLTP engine such as VoltDB, while historical data should be moved into an analytical database system such as Vertica. This gives much better performance for the two different tasks, and I fully support him.
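The split between current and historical data can be sketched in a few lines. This is only an illustration of the idea: plain Python lists stand in for the OLTP engine and the analytical store, and the function name, the `order_date` field and the 90-day window are assumptions, not part of any VoltDB or Vertica API.

```python
from datetime import date, timedelta


def split_current_and_historical(rows, today, keep_days):
    """Return (current, historical) lists of rows.

    Rows newer than the cutoff stay with the OLTP engine; older rows are
    candidates for moving to the analytical database. Hypothetical helper.
    """
    cutoff = today - timedelta(days=keep_days)
    current = [row for row in rows if row["order_date"] >= cutoff]
    historical = [row for row in rows if row["order_date"] < cutoff]
    return current, historical


orders = [
    {"id": 1, "order_date": date(2012, 5, 1)},
    {"id": 2, "order_date": date(2011, 1, 15)},
]
current, historical = split_current_and_historical(
    orders, today=date(2012, 5, 3), keep_days=90
)
print([row["id"] for row in current])     # [1]
print([row["id"] for row in historical])  # [2]
```

In a real deployment the cutoff and the transfer mechanism depend on the application; the point is only that each engine ends up handling the workload it was designed for.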

For the benchmark comparison, Mike refers to the TPC-C benchmark and compares VoltDB numbers with those of legacy DBMSs. Unfortunately, this comparison is unfair. VoltDB runs a modified version of TPC-C that does not follow the TPC-C specification, and thus the results are not published on the TPC-C web page. The VoltDB implementation of the "TPC-C benchmark" is biased towards VoltDB, since VoltDB does not allow concurrency on the same database partition. Note that the original TPC-C is biased towards legacy databases and limits the benchmark result by the underlying hardware. (I hope to find time to write a small post about the problems with TPC benchmarks for OLTP databases.)

In general, Mike Stonebraker plays an important role in modern DBMS development. I highly recommend reading the interview, reading his papers and listening to his presentations.