running query…

Filters from to direction

Date range from to applies to the daily chart & block production

SQL {{ p.modeBadge }} ⌘/Ctrl + ⏎ to run

Naive mode — no index / projection. ClickHouse scans the raw table; watch rows read explode versus the optimized query.

                    took{{ p.durText }}
                    
                      {{ sc.label }}{{ sc.val }}
                    
                      {{ p.cacheChipText }}

querying ClickHouse…

{{ p.error }}

The eth database streams the public Ethereum dataset from S3 into two core MergeTree tables, then fans every insert out to a set of materialized-view indexes and dictionaries. Each one re-sorts or pre-aggregates the same rows so a lookup that would scan billions touches only thousands. Engines are colour-coded below; CREATE scripts are at the bottom.

MergeTree AggregatingMergeTree ReplacingMergeTree MaterializedView Dictionary S3Queue

Storage & compression on disk vs uncompressed

Tables & how they connect ingest → core → derived

Ingestion

SOURCE AWS S3 · aws-public-blockchain Daily Parquet files, partitioned by date.

S3Queue → MV blocks_s3_queue → blocks_stream S3Queue pulls new files; the MV streams rows into blocks.

S3Queue → MV transactions_s3_queue → transactions_stream Streams rows into transactions.

materialized on insert↓

Core tables

blocksMergeTree

transactionsMergeTree

each insert fans out via materialized views↓

Derived indexes & dictionaries

transactions_hash_reverse_idxMergeTreefrom transactions

transactions_address_reverse_idxAggregatingMergeTreefrom transactions

transactions_reverse_block_idxReplacingMergeTreefrom blocks

block_hashes_dictDictionaryfrom reverse_block_idx

Two more dictionaries — active_addresses_last_30_days_from_dict / _to_dict — back the from_address_murmur / to_address_murmur MATERIALIZED columns, so only active wallets get a non-zero morton code in the sort key.

CREATE scripts live DDL from system.tables

The whole point of this dataset is schema design. The Ethereum chain is ~3 billion transactions, yet every lookup above returns in milliseconds. That isn't raw hardware — it's eight ClickHouse techniques working together. Here's each one, and the trick that makes it fly.

Morton (Z-order) primary key

eth.transactions

A row should be findable by either party — but only for the wallets where that pays off. mortonEncode interleaves the bits of two values into one sort key, co-locating rows that share a value in either position. The catch: it isn't fed the raw addresses but the *_address_murmur columns — which are non-zero only for active wallets. Every other address hashes to 0.

CREATE TABLE eth.transactions
(
    ...
    from_address        FixedString(20),
    -- murmur = the address hash ONLY for active wallets
    -- (in the dict: > 256 txns over the last 30 days); else 0
    from_address_murmur UInt64 MATERIALIZED if(dictHas(
        'active_addresses_last_30_days_from_dict',
        murmurHash3_64(from_address)), murmurHash3_64(from_address), 0),
    to_address          FixedString(20),
    to_address_murmur   UInt64 MATERIALIZED if(dictHas(
        'active_addresses_last_30_days_to_dict',
        murmurHash3_64(to_address)), murmurHash3_64(to_address), 0),
    ...
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(block_timestamp)
ORDER BY (mortonEncode(to_address_murmur, from_address_murmur),
          block_number, transaction_index)
SETTINGS index_granularity = 256

reading the definition

Both *_address_murmur columns are MATERIALIZED — computed on insert, non-zero only when the address is in the active dictionary; everyone else hashes to 0.

The ORDER BY sorts on mortonEncode(to, from), so an active address's rows cluster whether it's the sender or the receiver. PARTITION BY month and index_granularity 256 keep every scan bounded.

Live from eth.transactions ~32,345 active address/direction pairs clear the bar

1 · The long tail collapses to morton 0

SELECT from_address, to_address, from_address_murmur, to_address_murmur,
       mortonEncode(to_address_murmur, from_address_murmur) AS morton
FROM eth.transactions
ORDER BY morton, block_number, transaction_index
LIMIT 4              -- the very front of the table

from_address
to_address
from_murmur
to_murmur
morton
block
0xa1e4380a…63b4
0x5df9b879…e734
0
0
0
46147
0xbd08e0cd…51a2
0x5c12a8e4…68c8
0
0
0
46169
0x63ac545c…15dc
0xc93f2250…f0d8
0
0
0
46170
0x037dd056…e180
0x7e7ec15a…6090
0
0
0
46194

Inactive↔inactive transfers — almost the entire chain — all hash to 0, so they share morton 0 and sit together at the very front, ordered only by block_number (note the genesis-era blocks). They're located through the reverse index's block-number bitmap, never by morton.

2 · An active wallet clusters under one code

SELECT from_address, to_address, from_address_murmur, to_address_murmur,
       mortonEncode(to_address_murmur, from_address_murmur) AS morton
FROM eth.transactions
WHERE to_address_murmur != 0 AND from_address_murmur = 0
ORDER BY morton DESC, block_number DESC
LIMIT 4

from_address
to_address
from_murmur
to_murmur
morton
block
0x5194badb…8ae3
0xeaf9a34b…dc42
0
4828955614929454015
6148914690878948693
17316328
0x35524fde…9d69
0xeaf9a34b…dc42
0
4828955614929454015
6148914690878948693
17297664
0x6621b3ca…4c89
0xeaf9a34b…dc42
0
4828955614929454015
6148914690878948693
17246960
0x1813eddf…e577
0xeaf9a34b…dc42
0
4828955614929454015
6148914690878948693
17243401

This active recipient (to_murmur = 4828955614929454015) is paid by ordinary, inactive senders (from_murmur = 0). Because only the active side carries a hash, mortonEncode(to_murmur, 0) is identical for every one — all of its received transactions land on a single morton code, i.e. one contiguous run on disk the primary index seeks straight to. No scan.

The bargain: spend morton's interleave only on the ~32k wallets busy enough to be looked up by address, and let the billion-address long tail collapse to one code reached by bitmap. It's never about an address value being large or small — only active or not.

Footprint on disk how one address's rows land in part granules

-- granules = rows touched; ranges = their contiguous runs
SELECT
  _part,
  count() AS rows,
  uniqExact(intDiv(_part_offset,256)) AS granules,
  arrayCount(
    x -> x > 1,
    arrayDifference(arraySort(
      groupUniqArray(intDiv(_part_offset,256))
    ))
  ) + 1 AS ranges,
  max(_part_offset) - min(_part_offset) + 1 AS span
FROM eth.transactions
WHERE <this address>
GROUP BY _part

INACTIVE 0x288204d5…0bbf part 202103

93

rows

17

granules

5

ranges (seeks)

6,984

span (rows)

granules #7598–#7625 · 28 wide · 5 runs

Both murmurs are 0 → morton 0. Its 93 rows scatter across 17 granules in 5 separate ranges (span 6,984 ≫ 93) — the reader makes 5 seeks via the block bitmap.

ACTIVE 0xeaf9a34b…dc42 part 202304

155

rows

2

granules

1

ranges (seeks)

155

span (rows)

granules #101290–#101291 · 2 wide · 1 run

to_murmur ≠ 0 → one morton code. 155 rows packed into 2 adjacent granules forming 1 contiguous range (span = 155, zero gaps) — a single seek.

Granules are the rows touched; ranges are the separate seeks to reach them. Active: 1 range. Inactive: 5 ranges. Fewer, fatter reads — that's the payoff morton buys for the wallets that get looked up.

Reverse-index materialized views

*_reverse_idx

The table is sorted by morton code, so finding a transaction by hash would otherwise be a full scan. A materialized view inverts that lookup, mapping each hash to its block. (The address → day rollup that powers wallet activity is covered in the Window section.)

-- 1. hash → block : a flat lookup table
CREATE MATERIALIZED VIEW transactions_hash_reverse_idx_stream
TO transactions_hash_reverse_idx AS
SELECT murmurHash3_32(hash) AS hash, block_number
FROM eth.transactions;

-- using it: resolve the block from the index, then read just that block
SELECT * FROM eth.transactions
WHERE block_number IN (
        SELECT block_number FROM transactions_hash_reverse_idx
        WHERE hash = murmurHash3_32(unhex(substring('0x63f7…20c59a', 3)))
      )
  AND hash = unhex(substring('0x63f7…20c59a', 3));

~17,000× fewer rows to find a transaction by hash — 202K read via the index vs 3.5B scanned naively, condition cache off. Turn it on and the index lookup drops to ~120K.

Hash index

A flat map of murmurHash3_32(hash) → block_number, sorted by the hash. Resolve the block first, then read just that one block instead of the whole chain.

ALSO

A projection can be the reverse index

No separate table needed: a projection index is a lightweight projection that stores only the row order for an expression — internally SELECT _part_offset ORDER BY murmurHash3_32(hash). ClickHouse uses it to resolve WHERE hash = … on its own — a reverse index rebuilt automatically on every insert, with no MV and no subquery to write.

-- a projection INDEX: stores just the permutation for an expression
-- (ClickHouse projection-index feature, TYPE basic = a normal index)
ALTER TABLE eth.transactions
  ADD PROJECTION order_by_hash INDEX murmurHash3_32(hash) TYPE basic;

-- the planner uses it automatically; no subquery, no join
SELECT * FROM eth.transactions
WHERE hash = unhex(substring('0x63f7…20c59a', 3));

~930× fewer rows — 3.8M through the projection vs 3.5B for a full scan with optimize_use_projections = 0, measured with the condition cache off so it doesn't skew the count. Turn the condition cache on and the same lookup drops to ~7K.

Versus the MV approach, there's nothing to query explicitly — the projection index lives in the table and the planner picks it when the WHERE filters on hash (matched via murmurHash3_32(hash) under the hood).

See it live in the Transaction tab: Projection uses order_by_hash; Naive runs the same query with projections off and scans the whole chain.

Windowed last-N over the aggregating index

transactions_address_reverse_idx

The reverse index is an AggregatingMergeTree grouped by (address, direction, day), keeping per day a transaction count, a SET of morton codes, and a BITMAP of block numbers. To return an address's most recent N transactions, walk its day-groups newest-first, keep a running total with a window function, and stop the moment the newer groups already cover N — then probe the main table by just those codes & blocks. No fixed cap, so it can never come up short.

WITH idx AS (
  SELECT morton, block_number,
         sum(transactions_count) OVER (ORDER BY block_date DESC
              ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS newer
  FROM eth.transactions_address_reverse_idx
  WHERE address = murmurHash3_64(unhex('…'))
  QUALIFY newer < 25          -- keep groups until 25 is covered
)
SELECT ... FROM eth.transactions AS t
WHERE mortonEncode(t.to_address_murmur, t.from_address_murmur)
        IN (SELECT arrayJoin(morton) FROM idx)
   OR  block_number IN (SELECT arrayJoin(bitmapToArray(block_number)) FROM idx)
ORDER BY block_number DESC, transaction_index DESC
LIMIT 25

For this address (143 txns over 49 days), the last 25 are covered by the 9 newest day-groups — 29 candidate rows. The other 40 days, including a 48-tx day just below the cut, are never read. On a long-lived address it's a handful of recent groups instead of its entire history.

walk newest→oldest, keep while Σ of newer groups < 25

date

tx

Σ newer

2025-12-20

2

0

✓

2025-12-19

4

2

✓

2025-12-17

4

6

✓

2025-12-16

5

10

✓

2025-12-14

2

15

✓

2025-12-13

2

17

✓

2025-12-12

1

19

✓

2025-12-11

2

20

✓

2025-12-10

7

22

✓

Σ newer ≥ 25 → stop walking

2025-12-09

48

29

✗

… 39 older day-groups — never read

✓ read✗ never touched

Projections (a pre-aggregated rollup)

PROJECTION tx_per_day

A projection is a second copy of the table stored inside it — it can re-sort the rows or, as here, pre-aggregate them with a GROUP BY. The aggregation does the heavy lifting: tx_per_day collapses 383M transactions into 365 daily rows, so a year's count reads 365 rows instead of re-scanning the chain. ClickHouse swaps it in automatically when a query's GROUP BY matches; flip any Analytics panel to Naive (optimize_use_projections = 0) to force the raw path.

-- transactions per day across all of 2023
SELECT toDate(block_timestamp) AS day, count()
FROM eth.transactions
WHERE toDate(block_timestamp) BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY day
SETTINGS optimize_use_projections = 1   -- then 0

projection

365 rows · <1 MB

raw table

383M rows · 1.8 GB

~1,000,000× less data read — same 365 daily rows

Both paths return the same 365 rows. The projection reads one pre-grouped row per day; the raw path re-scans every transaction in the year — 383 million rows, 1.8 GB.

PROJECTION tx_per_day (
  SELECT toDate(block_timestamp) AS date,
         count() AS total_count,
         countIf(receipt_status) AS completed_count
  GROUP BY date )

It's stored and merged alongside the table, so it's always in sync — no manual refresh. The planner picks it transparently whenever a query's GROUP BY matches.

Bloom skip index on a sort-key column

address_bloom

Skip indexes aren't only for non-key columns. The reverse index is already ORDER BY (address, …), yet it also carries a bloom on address. The sparse primary index narrows to candidate granules but can't prove a value is absent — the bloom can, with no false negatives, so ClickHouse drops granules the primary key would otherwise read.

-- one address, run twice: skip index ON vs OFF
SELECT count()
FROM eth.transactions_address_reverse_idx
WHERE address = murmurHash3_64(unhex('eaf9a34b…dc42'))
SETTINGS use_skip_indexes = 1   -- then 0

index ON

49 parts · 49 granules
12,544 rows

index OFF

264 parts · 264 granules
67,524 rows

5.4× fewer rows read — same 212 rows returned

Both runs return the same 212 rows. The index is partitioned by month: with the bloom off, the primary key reads one candidate granule in every one of the 264 parts; with it on, the 215 parts where this address never appears are skipped — only 49 are touched.

INDEX address_bloom address
  TYPE bloom_filter(0.0001)
  GRANULARITY 100

a per-part bloom; accent = the bits X hashes to

bit →

X needs

·

·

1

·

·

·

1

·

·

1

·

·

✗

✓

✗

✓

✗

✗

✗

✗

                        rows = parts in ORDER BY ↓ · columns = filter bits →
                        ✓ every X-bit set → read
                        0 one missing → skip, no read
                      

A bit can collide (set by another address) but never un-set — so a part is never wrongly skipped. Only that 0.0001 collision rate costs a wasted read.

ALSO

A sparse index only orders by the first key

ClickHouse's primary index is sparse — one mark per granule, in full sort-key order. A filter on the leading key bisects straight to a handful of granules. But a later key is only sorted within each run of the one before it, so filtering on it alone gives the sparse index nothing to bisect and degrades to a near-full scan. block_number is the case in point — blocks arrive in order, but the table is ORDER BY (mortonEncode(…), block_number, …), so morton scatters the blocks across granules. A minmax + bloom_filter skip index on block_number puts the pruning back.

WHERE morton = …

leading key · 1–2 granules

WHERE block_number = N

no skip · full scan

+ minmax / bloom skip

granules pruned

A non-leading-key filter without a skip index scans the table; the skip index restores granule pruning. Bar width ≈ rows read.

INDEX block_number_minmax block_number
  TYPE minmax GRANULARITY 2
INDEX block_number_bloom  block_number
  TYPE bloom_filter(0.001) GRANULARITY 15

minmax keeps the [min, max] block per granule-group and drops any whose range can't hold N; the bloom confirms presence for the survivors. So a filter on a non-leading key prunes the table again — the mechanism behind the Block tab's rows-read counter.

Query condition cache

ClickHouse 25.3+ · use_query_condition_cache

Different from the query result cache, which memorises the whole answer (and can go stale): the condition cache records, for each granule, a single bit — did any row match this WHERE? A repeat query with the same predicate skips every 0-bit granule, even on a column with no index, projection, or sort-key help — and it still applies when the rest of the query changes, as long as the filter is the same.

-- a selective filter on a non-indexed column, run twice
SELECT count()                       -- 87 transfers over 30,000 ETH
FROM eth.transactions
WHERE block_timestamp >= '2023-06-01' AND block_timestamp < '2023-07-01'
  AND value > 30000e18
SETTINGS use_query_condition_cache = 1

1st run (miss)

31.6M rows · 362 MB

repeat (hit)

3.2M rows · 37 MB

9.8× fewer rows on the repeat — same 87 rows returned

The first run pays the scan and records the bitmap; every later query with that filter rides it. It can't go stale — parts are immutable, so a recorded bit stays valid; only brand-new parts are still unknown.

one bit per granule: “any row where value > 30000 ETH?”

                        1 a match → read
                        0 none → skip on repeat
                      

One bit per (filter, granule) — ~nothing in RAM (100 MB covers trillions of rows). Not kept across restarts; SYSTEM CLEAR QUERY CONDITION CACHE resets it.

Query result cache & paging prefetch

use_query_cache · LIMIT over-fetch

Toggle query cache in the header. The first run of a deterministic query is stored, so an identical re-run returns straight from memory — rows read ≈ 0, answered in microseconds. The listings lean on this with one deliberate trick: they over-fetch.

-- the table shows 10 rows per page, but every listing query asks for 100
SELECT … FROM eth.transactions
WHERE …
ORDER BY …
LIMIT 100;   -- one deterministic batch — identical text on every page flip

one query → 100 rows → 10 pages page 1 of 10

▮ shown now · ▯ already in memory — flipping pages slices the buffer, 0 network until page 11.

1 query serves 10 pages. Crossing the 100-row batch rounds the next fetch up to the next multiple of 100 — a cache-friendly shape the result cache reuses.

Why over-fetch?

Paging by the displayed 10 rows would issue a new LIMIT … OFFSET … per page — every one a cache miss. Fetching 100 at once makes pages 1–10 one cached query; the client just shows a different 10-row slice.

Deterministic = cacheable

The result cache only reuses an answer when the query text and settings match exactly. A fixed LIMIT 100 — not a per-page offset — keeps the text identical, so a re-run or a return to page 1 reads ≈ 0 rows.

Column data types & codecs

the smallest type wins

The biggest compression lever isn't a codec — it's the data type. The S3 feed arrives as wide String and Int64 columns; narrowing each to the smallest type that fits the data cut the transactions table nearly in half — before touching a single codec.

30.4 GiB → 16.2 GiB ~50% smaller from data types alone — measured on a 39.2 M-row month of eth.transactions.

                        ColumnDefault → optimal typeCompressed
                      

                        hashString → FixedString(32)2.43G → 1.17G
                      

                        block_hashString → LowCardinality24.7M → 8.1M
                      

                        input 86%String → unhex String24.8G → 13.2G
                      

                        from_addressString → FixedString(20)1.23G → 588M
                      

                        to_addressString → FixedString(20)845M → 424M
                      

                        block_numberInt64 → UInt322.88M → 1.59M
                      

                        transaction_indexInt64 → UInt169.82M → 2.72M
                      

                        receipt_statusInt64 → UInt85.40M → 2.51M
                      

Hex strings → raw bytes

A 66-char hex String becomes FixedString(32): drop the 0x, unhex 64→32 bytes, and lose String's length byte — >2× smaller. Hashes are pure entropy, so codecs can't shrink them; the type is the only lever (hash even uses CODEC(NONE)).

Wide ints → narrow

Int64 wastes 4–7 bytes per value. block_number → UInt32 (130+ yrs of headroom), transaction_index → UInt16 (~1,500 tx/block), receipt_status & transaction_type → UInt8 (2 and 4 values).

LowCardinality for repeats

block_hash repeats for every transaction in a block — thousands of identical values. Dictionary-encoding drops uncompressed size 1.17 GiB → 80 MiB, and a GROUP BY on it from 263 MiB → 8 MiB peak memory.

CODECS

Apply only where they pay

Once the types are right, a codec is the second lever — and not a free one. ClickHouse processes data uncompressed, so a heavier codec is paid back as CPU on every read. Reach for it only on the largest columns, where the byte savings dwarf the decode cost.

                          StrategySizeColumn reads
                        

                          Optimal types · LZ4 defaultbaselinefast ✓
                        

                          + codecs on 6 biggest cols−10%≤10% slower
                        

                          ZSTD on every column−5%3–19× slower
                        

measured vs optimal types + default LZ4, single-thread per-column scans.

The ZSTD-everywhere tax

Per-column reads balloon: from_address 0.86s → 2.65s (3×), transaction_index 0.05s → 0.99s (19×). A whole-table scan is ~1.8× slower and writes ~2× slower.

what gets a codec

input                     CODEC(ZSTD(3))
from_address, to_address  CODEC(ZSTD(3))
nonce, *_gas_used         CODEC(T64, LZ4)
hash                      CODEC(NONE) -- random

{{ viewTitle }}