This article compares the chain_linearize and chaincluster branches on the original 4.5-day TxGraph trace. The chain_linearize branch uses GenericClusterImpl for all clusters (including chain-shaped ones); the chaincluster branch uses ChainClusterImpl for chain-shaped clusters, yielding O(N) fast paths and lower memory.
Trace file: txgraph.trace.4.5days.final (original trace, ~552 MB)
The comparison script is available in the bitcoin/bitcoin chaincluster_time_mem_bench branch.
bash /path/to/compare_before_vs_chaincluster.sh chain_linearize chaincluster time
GetMainMemoryUsage is based on TxGraph’s internal Cluster::TotalMemoryUsage and measures TxGraph memory. See ChainCluster Memory Comparison: Three Methods for measurement details.
| Branch | Final (bytes) |
|---|---|
| chain_linearize | 1,273,368 |
| chaincluster | 1,232,424 |
Difference: chaincluster uses 40,944 bytes less (~3.2%).
| Entry point | chain_linearize Total (μs) | chaincluster Total (μs) |
|---|---|---|
| DoWork | 1,637,936 | 1,293,445 |
| CompareMainOrder | 1,600 | 1,553 |
| GetAncestors | 1 | 0 |
| GetDescendants | 8,459 | 8,405 |
| CountDistinctClusters | 471 | 2,467 |
| GetMainMemoryUsage | 151 | 154 |
| GetMainStagingDiagrams | 20,079 | 15,866 |
| IsOversized | 25,534 | 107,668 |
| StartStaging | 83 | 51 |
| AbortStaging | 0 | 0 |
| CommitStaging | 9,563 | 10,305 |
| TOTAL | 1,703,877 | 1,439,914 |
The table sums CPU time (μs) per entry point. Wall-clock time (actual elapsed time for the full replay, including I/O) is measured separately: chain_linearize 8 s, chaincluster 7 s, difference -1 s (-12%). CPU time difference: chaincluster is ~15.5% faster (TOTAL μs).
The chaincluster branch uses ChainClusterImpl for chain-shaped clusters (size≥2), which provides:
The main speedups appear on DoWork (-21.1%) and GetMainStagingDiagrams (-21.0%). IsOversized and CountDistinctClusters are slower in chaincluster due to different implementation trade-offs; the net effect on TOTAL is favorable.
On the original trace, chaincluster uses ~3.2% less TxGraph memory (GetMainMemoryUsage) than chain_linearize. The 889 chain clusters at peak benefit from ChainClusterImpl’s compact representation (~20 bytes/tx vs ~40 bytes/tx).
On the original 4.5-day trace, chaincluster is ~12% faster in wall-clock time and ~15.5% faster in CPU time than chain_linearize. GetMainMemoryUsage: chaincluster uses ~3.2% less TxGraph memory. ChainClusterImpl provides both time and memory benefits for chain-shaped clusters.
The above results were obtained on the test machine.
| Item | Value |
|---|---|
| CPU | Intel Core i5-13600KF (14 cores, 20 threads) |
| Memory | 32 GB |
| Item | Value |
|---|---|
| OS | Ubuntu 24.04 LTS (Noble Numbat) |
| Docker | 28.4.0 |
| Item | Value |
|---|---|
| Base OS | Debian 12 (bookworm) |
| GCC | 12.2.0 |
| CMake | 3.25.1 |