ClickHouse Testing Guide: Testcontainers, Batch Insert Testing & Aggregation Correctness

ClickHouse Testing Guide: Testcontainers, Batch Insert Testing & Aggregation Correctness

ClickHouse is the analytics database of choice for teams that need sub-second queries over billions of rows. Its columnar storage, vectorized execution, and MergeTree engine deliver extraordinary performance—but they also introduce testing challenges unique to columnar OLAP systems. This guide covers how to test ClickHouse applications effectively.

ClickHouse Testing Challenges

ClickHouse behaves differently from OLTP databases in ways that directly affect testing:

  • Eventual merge: MergeTree tables merge data parts asynchronously—immediately after insert, data may exist in multiple parts that haven't merged yet
  • No true transactions: ClickHouse offers limited transaction support; tests can't rely on rollback
  • Asynchronous inserts: async_insert mode buffers writes—data may not be immediately queryable
  • ReplacingMergeTree deduplication: Duplicate row removal happens at merge time, not immediately
  • Materialized views: Triggered at insert time, their state may diverge from the source table

Setting Up ClickHouse with Testcontainers

<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>clickhouse</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>com.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <classifier>all</classifier>
</dependency>
@Testcontainers
class ClickHouseIntegrationTest {

    @Container
    static ClickHouseContainer clickhouse =
        new ClickHouseContainer(DockerImageName.parse("clickhouse/clickhouse-server:24.1"))
            .withUsername("test")
            .withPassword("test")
            .withDatabaseName("test_db");

    static ClickHouseDataSource dataSource;
    static Connection connection;

    @BeforeAll
    static void setup() throws Exception {
        dataSource = new ClickHouseDataSource(
            clickhouse.getJdbcUrl(),
            new Properties() {{
                setProperty("user", "test");
                setProperty("password", "test");
            }}
        );
        connection = dataSource.getConnection();
    }

    @AfterAll
    static void tearDown() throws Exception {
        if (connection != null) connection.close();
    }

    @BeforeEach
    void prepareSchema() throws Exception {
        try (Statement stmt = connection.createStatement()) {
            stmt.execute("DROP TABLE IF EXISTS test_db.events");
            stmt.execute("""
                CREATE TABLE test_db.events (
                    event_date Date,
                    event_time DateTime,
                    user_id UInt64,
                    event_type LowCardinality(String),
                    properties String,
                    amount Decimal(18, 2)
                ) ENGINE = MergeTree()
                ORDER BY (event_date, user_id, event_time)
                PARTITION BY toYYYYMM(event_date)
            """);
        }
    }
}

Batch Insert Testing

Batch inserts are the recommended insertion pattern in ClickHouse—individual row inserts are expensive due to per-insert overhead.

Testing Batch Insert Correctness

@Test
void shouldInsertBatchAndRetrieveCorrectCount() throws Exception {
    int batchSize = 10_000;

    try (PreparedStatement ps = connection.prepareStatement(
        "INSERT INTO test_db.events " +
        "(event_date, event_time, user_id, event_type, properties, amount) " +
        "VALUES (?, ?, ?, ?, ?, ?)"
    )) {
        LocalDate today = LocalDate.now();

        for (int i = 0; i < batchSize; i++) {
            ps.setDate(1, Date.valueOf(today));
            ps.setTimestamp(2, Timestamp.from(Instant.now().minusSeconds(i)));
            ps.setLong(3, (long) (i % 1000));  // 1000 distinct users
            ps.setString(4, i % 2 == 0 ? "purchase" : "view");
            ps.setString(5, "{\"session\": \"" + i + "\"}");
            ps.setBigDecimal(6, BigDecimal.valueOf(i * 9.99));
            ps.addBatch();

            if (i % 1000 == 0) {
                ps.executeBatch();
            }
        }
        ps.executeBatch(); // Flush remaining
    }

    // Force merge to ensure all data is in a single part
    try (Statement stmt = connection.createStatement()) {
        stmt.execute("OPTIMIZE TABLE test_db.events FINAL");
    }

    try (Statement stmt = connection.createStatement()) {
        ResultSet rs = stmt.executeQuery("SELECT count() FROM test_db.events");
        rs.next();
        assertEquals(batchSize, rs.getLong(1));
    }
}

Testing Data Integrity After Batch Insert

@Test
void shouldPreserveDataIntegrityInBatchInsert() throws Exception {
    List<Event> events = List.of(
        new Event(LocalDate.now(), 1001L, "purchase", new BigDecimal("29.99")),
        new Event(LocalDate.now(), 1002L, "purchase", new BigDecimal("149.50")),
        new Event(LocalDate.now(), 1001L, "refund", new BigDecimal("-29.99")),
        new Event(LocalDate.now(), 1003L, "purchase", new BigDecimal("9.99"))
    );

    try (PreparedStatement ps = connection.prepareStatement(
        "INSERT INTO test_db.events " +
        "(event_date, event_time, user_id, event_type, properties, amount) " +
        "VALUES (?, ?, ?, ?, '{}', ?)"
    )) {
        for (Event e : events) {
            ps.setDate(1, Date.valueOf(e.eventDate()));
            ps.setTimestamp(2, Timestamp.from(Instant.now()));
            ps.setLong(3, e.userId());
            ps.setString(4, e.eventType());
            ps.setBigDecimal(5, e.amount());
            ps.addBatch();
        }
        ps.executeBatch();
    }

    // Verify row count
    try (Statement stmt = connection.createStatement()) {
        ResultSet rs = stmt.executeQuery("SELECT count() FROM test_db.events");
        rs.next();
        assertEquals(4, rs.getLong(1));
    }

    // Verify specific row
    try (PreparedStatement ps = connection.prepareStatement(
        "SELECT amount FROM test_db.events WHERE user_id = ? AND event_type = 'refund'"
    )) {
        ps.setLong(1, 1001L);
        ResultSet rs = ps.executeQuery();
        assertTrue(rs.next());
        assertEquals(new BigDecimal("-29.99"), rs.getBigDecimal(1));
    }
}

Aggregation Correctness Testing

Aggregation accuracy is the core contract of an analytics database—verify it rigorously.

Testing COUNT, SUM, and AVG

@Test
void shouldCalculateCorrectAggregates() throws Exception {
    // Insert known data
    insertEvents(List.of(
        new Event(LocalDate.of(2024, 1, 1), 100L, "purchase", new BigDecimal("100.00")),
        new Event(LocalDate.of(2024, 1, 1), 101L, "purchase", new BigDecimal("200.00")),
        new Event(LocalDate.of(2024, 1, 2), 100L, "purchase", new BigDecimal("300.00")),
        new Event(LocalDate.of(2024, 1, 2), 102L, "view", new BigDecimal("0.00")),
        new Event(LocalDate.of(2024, 1, 3), 101L, "purchase", new BigDecimal("150.00"))
    ));

    try (Statement stmt = connection.createStatement()) {
        // Count by event type
        ResultSet purchaseCount = stmt.executeQuery(
            "SELECT count() FROM test_db.events WHERE event_type = 'purchase'"
        );
        purchaseCount.next();
        assertEquals(4L, purchaseCount.getLong(1));

        // Sum of purchase amounts
        ResultSet total = stmt.executeQuery(
            "SELECT sum(amount) FROM test_db.events WHERE event_type = 'purchase'"
        );
        total.next();
        assertEquals(new BigDecimal("750.00"), total.getBigDecimal(1));

        // Average purchase amount
        ResultSet avg = stmt.executeQuery(
            "SELECT avg(amount) FROM test_db.events WHERE event_type = 'purchase'"
        );
        avg.next();
        assertEquals(187.5, avg.getDouble(1), 0.01);

        // Distinct user count
        ResultSet uniqueUsers = stmt.executeQuery(
            "SELECT uniq(user_id) FROM test_db.events"
        );
        uniqueUsers.next();
        assertEquals(3L, uniqueUsers.getLong(1));
    }
}

Testing Time-Series Aggregations

@Test
void shouldAggregateEventsByDay() throws Exception {
    LocalDate start = LocalDate.of(2024, 1, 1);

    // Insert 10 events per day for 7 days
    List<Event> events = new ArrayList<>();
    for (int day = 0; day < 7; day++) {
        for (int i = 0; i < 10; i++) {
            events.add(new Event(
                start.plusDays(day),
                (long) (i + 1),
                "view",
                BigDecimal.ZERO
            ));
        }
    }
    insertEvents(events);

    try (Statement stmt = connection.createStatement()) {
        ResultSet rs = stmt.executeQuery(
            "SELECT toDate(event_date) as day, count() as event_count " +
            "FROM test_db.events " +
            "GROUP BY day " +
            "ORDER BY day"
        );

        int dayCount = 0;
        while (rs.next()) {
            assertEquals(10L, rs.getLong("event_count"),
                "Day " + rs.getDate("day") + " should have 10 events");
            dayCount++;
        }
        assertEquals(7, dayCount, "Should have 7 days of data");
    }
}

Testing Window Functions

@Test
void shouldCalculateRunningTotalWithWindowFunction() throws Exception {
    insertEvents(List.of(
        new Event(LocalDate.of(2024, 1, 1), 1L, "purchase", new BigDecimal("100.00")),
        new Event(LocalDate.of(2024, 1, 2), 1L, "purchase", new BigDecimal("50.00")),
        new Event(LocalDate.of(2024, 1, 3), 1L, "purchase", new BigDecimal("75.00"))
    ));

    try (Statement stmt = connection.createStatement()) {
        ResultSet rs = stmt.executeQuery(
            "SELECT event_date, amount, " +
            "sum(amount) OVER (PARTITION BY user_id ORDER BY event_date) as running_total " +
            "FROM test_db.events " +
            "WHERE user_id = 1 " +
            "ORDER BY event_date"
        );

        BigDecimal[] expectedTotals = {
            new BigDecimal("100.00"),
            new BigDecimal("150.00"),
            new BigDecimal("225.00")
        };

        int i = 0;
        while (rs.next()) {
            assertEquals(expectedTotals[i].doubleValue(),
                rs.getBigDecimal("running_total").doubleValue(), 0.01);
            i++;
        }
        assertEquals(3, i);
    }
}

MergeTree Engine Tests

Testing ReplacingMergeTree Deduplication

ReplacingMergeTree removes duplicates during merge—but not immediately:

@Test
void shouldDeduplcateWithReplacingMergeTree() throws Exception {
    try (Statement stmt = connection.createStatement()) {
        stmt.execute("DROP TABLE IF EXISTS test_db.user_states");
        stmt.execute("""
            CREATE TABLE test_db.user_states (
                user_id UInt64,
                status LowCardinality(String),
                updated_at DateTime,
                version UInt64
            ) ENGINE = ReplacingMergeTree(version)
            ORDER BY user_id
        """);
    }

    // Insert multiple versions of user state
    try (PreparedStatement ps = connection.prepareStatement(
        "INSERT INTO test_db.user_states (user_id, status, updated_at, version) VALUES (?, ?, ?, ?)"
    )) {
        long userId = 42L;
        Timestamp now = Timestamp.from(Instant.now());

        ps.setLong(1, userId); ps.setString(2, "inactive");
        ps.setTimestamp(3, now); ps.setLong(4, 1L);
        ps.addBatch();

        ps.setLong(1, userId); ps.setString(2, "active");
        ps.setTimestamp(3, now); ps.setLong(4, 2L);
        ps.addBatch();

        ps.setLong(1, userId); ps.setString(2, "premium");
        ps.setTimestamp(3, now); ps.setLong(4, 3L);
        ps.addBatch();

        ps.executeBatch();
    }

    // Force merge/deduplication
    try (Statement stmt = connection.createStatement()) {
        stmt.execute("OPTIMIZE TABLE test_db.user_states FINAL");
    }

    // After dedup, only the latest version should remain
    try (Statement stmt = connection.createStatement()) {
        ResultSet rs = stmt.executeQuery(
            "SELECT status FROM test_db.user_states FINAL WHERE user_id = 42"
        );
        assertTrue(rs.next());
        assertEquals("premium", rs.getString("status"), "Latest version should win");
        assertFalse(rs.next(), "Only one row should exist after deduplication");
    }
}

Testing Partition Pruning

@Test
void shouldPrunePartitionsForDateRangeQueries() throws Exception {
    // Insert data across 3 months
    List<Event> events = new ArrayList<>();
    for (int month = 1; month <= 3; month++) {
        for (int day = 1; day <= 28; day++) {
            events.add(new Event(
                LocalDate.of(2024, month, day),
                1L, "view", BigDecimal.ZERO
            ));
        }
    }
    insertEvents(events);

    // Query should only scan January partition
    try (Statement stmt = connection.createStatement()) {
        // Enable query logging to verify partition pruning in production
        ResultSet rs = stmt.executeQuery(
            "SELECT count() FROM test_db.events " +
            "WHERE event_date >= '2024-01-01' AND event_date < '2024-02-01'"
        );
        rs.next();
        assertEquals(28L, rs.getLong(1));
    }
}

Testing Materialized Views

@Test
void shouldUpdateMaterializedViewOnInsert() throws Exception {
    try (Statement stmt = connection.createStatement()) {
        // Create source and materialized view
        stmt.execute("DROP TABLE IF EXISTS test_db.daily_stats");
        stmt.execute("""
            CREATE MATERIALIZED VIEW IF NOT EXISTS test_db.daily_purchase_stats
            ENGINE = SummingMergeTree()
            ORDER BY (event_date, event_type)
            AS SELECT
                event_date,
                event_type,
                count() as event_count,
                sum(amount) as total_amount
            FROM test_db.events
            WHERE event_type = 'purchase'
            GROUP BY event_date, event_type
        """);
    }

    // Insert purchase events
    insertEvents(List.of(
        new Event(LocalDate.now(), 1L, "purchase", new BigDecimal("50.00")),
        new Event(LocalDate.now(), 2L, "purchase", new BigDecimal("75.00")),
        new Event(LocalDate.now(), 3L, "view", new BigDecimal("0.00"))
    ));

    // Verify materialized view was populated (may need OPTIMIZE for SummingMergeTree)
    try (Statement stmt = connection.createStatement()) {
        stmt.execute("OPTIMIZE TABLE test_db.daily_purchase_stats FINAL");

        ResultSet rs = stmt.executeQuery(
            "SELECT event_count, total_amount " +
            "FROM test_db.daily_purchase_stats FINAL " +
            "WHERE event_date = today()"
        );

        assertTrue(rs.next());
        assertEquals(2L, rs.getLong("event_count"));
        assertEquals(new BigDecimal("125.00"), rs.getBigDecimal("total_amount"));
    }
}

Performance Testing

@Test
void shouldQueryMillionRowsUnderSLA() throws Exception {
    int rowCount = 1_000_000;

    // Bulk insert using ClickHouse's native format via VALUES or CSV
    try (Statement stmt = connection.createStatement()) {
        // Generate data using ClickHouse's built-in functions (fast)
        stmt.execute("""
            INSERT INTO test_db.events
            SELECT
                today() - intDiv(number, 1000) as event_date,
                now() - number as event_time,
                number % 10000 as user_id,
                ['purchase', 'view', 'click'][1 + number % 3] as event_type,
                '{}' as properties,
                rand() % 100 + 0.99 as amount
            FROM numbers(%d)
        """.formatted(rowCount));
    }

    // Test aggregation query performance
    long start = System.currentTimeMillis();
    try (Statement stmt = connection.createStatement()) {
        ResultSet rs = stmt.executeQuery("""
            SELECT
                event_type,
                count() as cnt,
                sum(amount) as revenue,
                avg(amount) as avg_amount,
                uniqExact(user_id) as unique_users
            FROM test_db.events
            GROUP BY event_type
            ORDER BY revenue DESC
        """);

        int rowsFetched = 0;
        while (rs.next()) rowsFetched++;
        assertEquals(3, rowsFetched); // 3 event types
    }
    long duration = System.currentTimeMillis() - start;

    // Analytics query on 1M rows should be under 2 seconds
    assertThat(duration)
        .as("Aggregation query over 1M rows should complete under 2s")
        .isLessThan(2000L);
}

ClickHouse with HelpMeTest

Analytics dashboards powered by ClickHouse need end-to-end testing of the user-facing layer:

*** Test Cases ***
Revenue Dashboard Loads Within SLA
    As    Admin
    Go To    https://app.example.com/dashboard/revenue
    Wait Until Element Is Visible    [data-testid=revenue-chart]    timeout=5s
    Check For Visual Flaws
    Element Should Contain    [data-testid=total-revenue]    $

Daily Active Users Chart Shows Correct Range
    As    Admin
    Go To    https://app.example.com/dashboard/users
    Select Date Range    Last 30 Days
    Wait Until Element Is Visible    [data-testid=dau-chart]
    Element Count Should Equal    [data-testid=chart-bar]    30

HelpMeTest validates that your ClickHouse-backed analytics actually renders correctly in the browser—catching issues like query timeouts surfaced as empty charts, or incorrect date formatting.

Common Pitfalls

1. Not using FINAL with ReplacingMergeTree Queries on ReplacingMergeTree tables without FINAL may return duplicate rows before merge. In tests, use OPTIMIZE TABLE ... FINAL + query with FINAL.

2. Forgetting ClickHouse timestamps are DateTime (second precision) DateTime is second-precision; use DateTime64(3) for millisecond precision. Tests inserting high-frequency events may collapse to the same second.

3. Ignoring async_insert in production config If production uses async_insert=1, writes aren't synchronous. Tests against a container default won't catch async_insert-related issues unless you enable it in your test config.

4. Testing with trivial data sizes ClickHouse optimizations like vectorized execution and late materialization only activate at scale. Performance tests with 100 rows don't reflect production behavior.

Summary

ClickHouse testing requires understanding its columnar, analytics-first architecture:

  • Testcontainers for reproducible, real ClickHouse instances
  • Batch insert tests to verify data ingestion pipelines
  • Aggregation correctness tests for COUNT, SUM, AVG, and window functions
  • MergeTree behavior tests including deduplication and partition pruning
  • Materialized view tests for pre-aggregation consistency
  • Performance tests at realistic data scales (1M+ rows)

The analytics layer is often where subtle bugs hide—test it with the same rigor as application code.

Read more