TPC-H is a decision support benchmark. It consists of a suite of business-oriented ad-hoc queries and concurrent data modifications.

Pandas 2.0.0 RC0 and RC1 have been released. According to the release notes, the main new features in 2.0.0 focus on performance improvements.

The benchmark runs on my laptop:

```
MacBook Pro (16-inch, 2021)
Chip Apple M1 Max
Memory 32 GB
```

I used pandas 1.5.2 as the baseline and tested all four combinations of the new features:

```
2.0.0rc0
2.0.0rc0 + lazy copy
2.0.0rc0 + pyarrow dtype backend
2.0.0rc0 + lazy copy + pyarrow dtype backend
```

Here are the results. The numbers are running times in seconds; lower is better.

|                                              | round 1 | round 2 | round 3 | average |
|----------------------------------------------|---------|---------|---------|:-------:|
| 1.5.3                                        | 11.94   | 11.81   | 11.94   | 11.89   |
| 2.0.0rc0                                     | 17.38   | 17.50   | 17.25   | 17.37   |
| 2.0.0rc0 + lazy copy                         | 16.39   | 16.51   | 16.52   | 16.47   |
| 2.0.0rc0 + pyarrow dtype backend             | 51.89   | 52.55   | 52.60   | 52.34   |
| 2.0.0rc0 + lazy copy + pyarrow dtype backend | 53.51   | 53.92   | 54.19   | 53.87   |
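
To make the gap easier to read, the timings can be expressed relative to the baseline. The following is a minimal sketch that uses only the per-round numbers from the table above; the `timings` dictionary and the `slowdown` helper are names I made up for illustration and are not part of the benchmark code. Because the means are recomputed from the rounded per-round values, the last digit may differ slightly from the average column.

```python
from statistics import mean

# Per-round running times in seconds, copied from the results table.
timings = {
    "1.5.3": [11.94, 11.81, 11.94],
    "2.0.0rc0": [17.38, 17.50, 17.25],
    "2.0.0rc0 + lazy copy": [16.39, 16.51, 16.52],
    "2.0.0rc0 + pyarrow dtype backend": [51.89, 52.55, 52.60],
    "2.0.0rc0 + lazy copy + pyarrow dtype backend": [53.51, 53.92, 54.19],
}

# Mean running time of the baseline row.
baseline = mean(timings["1.5.3"])


def slowdown(label: str) -> float:
    """Return the mean running time of `label` divided by the baseline mean."""
    return mean(timings[label]) / baseline


for label, rounds in timings.items():
    # A factor above 1.0 means the configuration is slower than the baseline.
    print(f"{label:<46} {mean(rounds):6.2f} s  {slowdown(label):.2f}x")
```

Read this way, plain 2.0.0rc0 takes roughly 1.5 times as long as the baseline, and both pyarrow configurations take more than 4 times as long.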