ClickHouse Testing Guide: Testcontainers, Batch Insert Testing & Aggregation Correctness
ClickHouse is the analytics database of choice for teams that need sub-second queries over billions of rows. Its columnar storage, vectorized execution, and MergeTree engine deliver extraordinary performance—but they also introduce testing challenges unique to columnar OLAP systems. This guide covers how to test ClickHouse applications effectively.
ClickHouse Testing Challenges
ClickHouse behaves differently from OLTP databases in ways that directly affect testing:
- Eventual merge: MergeTree tables merge data parts asynchronously—immediately after insert, data may exist in multiple parts that haven't merged yet
- No true transactions: ClickHouse offers limited transaction support; tests can't rely on rollback
- Asynchronous inserts:
async_insertmode buffers writes—data may not be immediately queryable - ReplacingMergeTree deduplication: Duplicate row removal happens at merge time, not immediately
- Materialized views: Triggered at insert time, their state may diverge from the source table
Setting Up ClickHouse with Testcontainers
<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>clickhouse</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.clickhouse</groupId>
<artifactId>clickhouse-jdbc</artifactId>
<classifier>all</classifier>
</dependency>@Testcontainers
class ClickHouseIntegrationTest {
@Container
static ClickHouseContainer clickhouse =
new ClickHouseContainer(DockerImageName.parse("clickhouse/clickhouse-server:24.1"))
.withUsername("test")
.withPassword("test")
.withDatabaseName("test_db");
static ClickHouseDataSource dataSource;
static Connection connection;
@BeforeAll
static void setup() throws Exception {
dataSource = new ClickHouseDataSource(
clickhouse.getJdbcUrl(),
new Properties() {{
setProperty("user", "test");
setProperty("password", "test");
}}
);
connection = dataSource.getConnection();
}
@AfterAll
static void tearDown() throws Exception {
if (connection != null) connection.close();
}
@BeforeEach
void prepareSchema() throws Exception {
try (Statement stmt = connection.createStatement()) {
stmt.execute("DROP TABLE IF EXISTS test_db.events");
stmt.execute("""
CREATE TABLE test_db.events (
event_date Date,
event_time DateTime,
user_id UInt64,
event_type LowCardinality(String),
properties String,
amount Decimal(18, 2)
) ENGINE = MergeTree()
ORDER BY (event_date, user_id, event_time)
PARTITION BY toYYYYMM(event_date)
""");
}
}
}Batch Insert Testing
Batch inserts are the recommended insertion pattern in ClickHouse—individual row inserts are expensive due to per-insert overhead.
Testing Batch Insert Correctness
@Test
void shouldInsertBatchAndRetrieveCorrectCount() throws Exception {
int batchSize = 10_000;
try (PreparedStatement ps = connection.prepareStatement(
"INSERT INTO test_db.events " +
"(event_date, event_time, user_id, event_type, properties, amount) " +
"VALUES (?, ?, ?, ?, ?, ?)"
)) {
LocalDate today = LocalDate.now();
for (int i = 0; i < batchSize; i++) {
ps.setDate(1, Date.valueOf(today));
ps.setTimestamp(2, Timestamp.from(Instant.now().minusSeconds(i)));
ps.setLong(3, (long) (i % 1000)); // 1000 distinct users
ps.setString(4, i % 2 == 0 ? "purchase" : "view");
ps.setString(5, "{\"session\": \"" + i + "\"}");
ps.setBigDecimal(6, BigDecimal.valueOf(i * 9.99));
ps.addBatch();
if (i % 1000 == 0) {
ps.executeBatch();
}
}
ps.executeBatch(); // Flush remaining
}
// Force merge to ensure all data is in a single part
try (Statement stmt = connection.createStatement()) {
stmt.execute("OPTIMIZE TABLE test_db.events FINAL");
}
try (Statement stmt = connection.createStatement()) {
ResultSet rs = stmt.executeQuery("SELECT count() FROM test_db.events");
rs.next();
assertEquals(batchSize, rs.getLong(1));
}
}Testing Data Integrity After Batch Insert
@Test
void shouldPreserveDataIntegrityInBatchInsert() throws Exception {
List<Event> events = List.of(
new Event(LocalDate.now(), 1001L, "purchase", new BigDecimal("29.99")),
new Event(LocalDate.now(), 1002L, "purchase", new BigDecimal("149.50")),
new Event(LocalDate.now(), 1001L, "refund", new BigDecimal("-29.99")),
new Event(LocalDate.now(), 1003L, "purchase", new BigDecimal("9.99"))
);
try (PreparedStatement ps = connection.prepareStatement(
"INSERT INTO test_db.events " +
"(event_date, event_time, user_id, event_type, properties, amount) " +
"VALUES (?, ?, ?, ?, '{}', ?)"
)) {
for (Event e : events) {
ps.setDate(1, Date.valueOf(e.eventDate()));
ps.setTimestamp(2, Timestamp.from(Instant.now()));
ps.setLong(3, e.userId());
ps.setString(4, e.eventType());
ps.setBigDecimal(5, e.amount());
ps.addBatch();
}
ps.executeBatch();
}
// Verify row count
try (Statement stmt = connection.createStatement()) {
ResultSet rs = stmt.executeQuery("SELECT count() FROM test_db.events");
rs.next();
assertEquals(4, rs.getLong(1));
}
// Verify specific row
try (PreparedStatement ps = connection.prepareStatement(
"SELECT amount FROM test_db.events WHERE user_id = ? AND event_type = 'refund'"
)) {
ps.setLong(1, 1001L);
ResultSet rs = ps.executeQuery();
assertTrue(rs.next());
assertEquals(new BigDecimal("-29.99"), rs.getBigDecimal(1));
}
}Aggregation Correctness Testing
Aggregation accuracy is the core contract of an analytics database—verify it rigorously.
Testing COUNT, SUM, and AVG
@Test
void shouldCalculateCorrectAggregates() throws Exception {
// Insert known data
insertEvents(List.of(
new Event(LocalDate.of(2024, 1, 1), 100L, "purchase", new BigDecimal("100.00")),
new Event(LocalDate.of(2024, 1, 1), 101L, "purchase", new BigDecimal("200.00")),
new Event(LocalDate.of(2024, 1, 2), 100L, "purchase", new BigDecimal("300.00")),
new Event(LocalDate.of(2024, 1, 2), 102L, "view", new BigDecimal("0.00")),
new Event(LocalDate.of(2024, 1, 3), 101L, "purchase", new BigDecimal("150.00"))
));
try (Statement stmt = connection.createStatement()) {
// Count by event type
ResultSet purchaseCount = stmt.executeQuery(
"SELECT count() FROM test_db.events WHERE event_type = 'purchase'"
);
purchaseCount.next();
assertEquals(4L, purchaseCount.getLong(1));
// Sum of purchase amounts
ResultSet total = stmt.executeQuery(
"SELECT sum(amount) FROM test_db.events WHERE event_type = 'purchase'"
);
total.next();
assertEquals(new BigDecimal("750.00"), total.getBigDecimal(1));
// Average purchase amount
ResultSet avg = stmt.executeQuery(
"SELECT avg(amount) FROM test_db.events WHERE event_type = 'purchase'"
);
avg.next();
assertEquals(187.5, avg.getDouble(1), 0.01);
// Distinct user count
ResultSet uniqueUsers = stmt.executeQuery(
"SELECT uniq(user_id) FROM test_db.events"
);
uniqueUsers.next();
assertEquals(3L, uniqueUsers.getLong(1));
}
}Testing Time-Series Aggregations
@Test
void shouldAggregateEventsByDay() throws Exception {
LocalDate start = LocalDate.of(2024, 1, 1);
// Insert 10 events per day for 7 days
List<Event> events = new ArrayList<>();
for (int day = 0; day < 7; day++) {
for (int i = 0; i < 10; i++) {
events.add(new Event(
start.plusDays(day),
(long) (i + 1),
"view",
BigDecimal.ZERO
));
}
}
insertEvents(events);
try (Statement stmt = connection.createStatement()) {
ResultSet rs = stmt.executeQuery(
"SELECT toDate(event_date) as day, count() as event_count " +
"FROM test_db.events " +
"GROUP BY day " +
"ORDER BY day"
);
int dayCount = 0;
while (rs.next()) {
assertEquals(10L, rs.getLong("event_count"),
"Day " + rs.getDate("day") + " should have 10 events");
dayCount++;
}
assertEquals(7, dayCount, "Should have 7 days of data");
}
}Testing Window Functions
@Test
void shouldCalculateRunningTotalWithWindowFunction() throws Exception {
insertEvents(List.of(
new Event(LocalDate.of(2024, 1, 1), 1L, "purchase", new BigDecimal("100.00")),
new Event(LocalDate.of(2024, 1, 2), 1L, "purchase", new BigDecimal("50.00")),
new Event(LocalDate.of(2024, 1, 3), 1L, "purchase", new BigDecimal("75.00"))
));
try (Statement stmt = connection.createStatement()) {
ResultSet rs = stmt.executeQuery(
"SELECT event_date, amount, " +
"sum(amount) OVER (PARTITION BY user_id ORDER BY event_date) as running_total " +
"FROM test_db.events " +
"WHERE user_id = 1 " +
"ORDER BY event_date"
);
BigDecimal[] expectedTotals = {
new BigDecimal("100.00"),
new BigDecimal("150.00"),
new BigDecimal("225.00")
};
int i = 0;
while (rs.next()) {
assertEquals(expectedTotals[i].doubleValue(),
rs.getBigDecimal("running_total").doubleValue(), 0.01);
i++;
}
assertEquals(3, i);
}
}MergeTree Engine Tests
Testing ReplacingMergeTree Deduplication
ReplacingMergeTree removes duplicates during merge—but not immediately:
@Test
void shouldDeduplcateWithReplacingMergeTree() throws Exception {
try (Statement stmt = connection.createStatement()) {
stmt.execute("DROP TABLE IF EXISTS test_db.user_states");
stmt.execute("""
CREATE TABLE test_db.user_states (
user_id UInt64,
status LowCardinality(String),
updated_at DateTime,
version UInt64
) ENGINE = ReplacingMergeTree(version)
ORDER BY user_id
""");
}
// Insert multiple versions of user state
try (PreparedStatement ps = connection.prepareStatement(
"INSERT INTO test_db.user_states (user_id, status, updated_at, version) VALUES (?, ?, ?, ?)"
)) {
long userId = 42L;
Timestamp now = Timestamp.from(Instant.now());
ps.setLong(1, userId); ps.setString(2, "inactive");
ps.setTimestamp(3, now); ps.setLong(4, 1L);
ps.addBatch();
ps.setLong(1, userId); ps.setString(2, "active");
ps.setTimestamp(3, now); ps.setLong(4, 2L);
ps.addBatch();
ps.setLong(1, userId); ps.setString(2, "premium");
ps.setTimestamp(3, now); ps.setLong(4, 3L);
ps.addBatch();
ps.executeBatch();
}
// Force merge/deduplication
try (Statement stmt = connection.createStatement()) {
stmt.execute("OPTIMIZE TABLE test_db.user_states FINAL");
}
// After dedup, only the latest version should remain
try (Statement stmt = connection.createStatement()) {
ResultSet rs = stmt.executeQuery(
"SELECT status FROM test_db.user_states FINAL WHERE user_id = 42"
);
assertTrue(rs.next());
assertEquals("premium", rs.getString("status"), "Latest version should win");
assertFalse(rs.next(), "Only one row should exist after deduplication");
}
}Testing Partition Pruning
@Test
void shouldPrunePartitionsForDateRangeQueries() throws Exception {
// Insert data across 3 months
List<Event> events = new ArrayList<>();
for (int month = 1; month <= 3; month++) {
for (int day = 1; day <= 28; day++) {
events.add(new Event(
LocalDate.of(2024, month, day),
1L, "view", BigDecimal.ZERO
));
}
}
insertEvents(events);
// Query should only scan January partition
try (Statement stmt = connection.createStatement()) {
// Enable query logging to verify partition pruning in production
ResultSet rs = stmt.executeQuery(
"SELECT count() FROM test_db.events " +
"WHERE event_date >= '2024-01-01' AND event_date < '2024-02-01'"
);
rs.next();
assertEquals(28L, rs.getLong(1));
}
}Testing Materialized Views
@Test
void shouldUpdateMaterializedViewOnInsert() throws Exception {
try (Statement stmt = connection.createStatement()) {
// Create source and materialized view
stmt.execute("DROP TABLE IF EXISTS test_db.daily_stats");
stmt.execute("""
CREATE MATERIALIZED VIEW IF NOT EXISTS test_db.daily_purchase_stats
ENGINE = SummingMergeTree()
ORDER BY (event_date, event_type)
AS SELECT
event_date,
event_type,
count() as event_count,
sum(amount) as total_amount
FROM test_db.events
WHERE event_type = 'purchase'
GROUP BY event_date, event_type
""");
}
// Insert purchase events
insertEvents(List.of(
new Event(LocalDate.now(), 1L, "purchase", new BigDecimal("50.00")),
new Event(LocalDate.now(), 2L, "purchase", new BigDecimal("75.00")),
new Event(LocalDate.now(), 3L, "view", new BigDecimal("0.00"))
));
// Verify materialized view was populated (may need OPTIMIZE for SummingMergeTree)
try (Statement stmt = connection.createStatement()) {
stmt.execute("OPTIMIZE TABLE test_db.daily_purchase_stats FINAL");
ResultSet rs = stmt.executeQuery(
"SELECT event_count, total_amount " +
"FROM test_db.daily_purchase_stats FINAL " +
"WHERE event_date = today()"
);
assertTrue(rs.next());
assertEquals(2L, rs.getLong("event_count"));
assertEquals(new BigDecimal("125.00"), rs.getBigDecimal("total_amount"));
}
}Performance Testing
@Test
void shouldQueryMillionRowsUnderSLA() throws Exception {
int rowCount = 1_000_000;
// Bulk insert using ClickHouse's native format via VALUES or CSV
try (Statement stmt = connection.createStatement()) {
// Generate data using ClickHouse's built-in functions (fast)
stmt.execute("""
INSERT INTO test_db.events
SELECT
today() - intDiv(number, 1000) as event_date,
now() - number as event_time,
number % 10000 as user_id,
['purchase', 'view', 'click'][1 + number % 3] as event_type,
'{}' as properties,
rand() % 100 + 0.99 as amount
FROM numbers(%d)
""".formatted(rowCount));
}
// Test aggregation query performance
long start = System.currentTimeMillis();
try (Statement stmt = connection.createStatement()) {
ResultSet rs = stmt.executeQuery("""
SELECT
event_type,
count() as cnt,
sum(amount) as revenue,
avg(amount) as avg_amount,
uniqExact(user_id) as unique_users
FROM test_db.events
GROUP BY event_type
ORDER BY revenue DESC
""");
int rowsFetched = 0;
while (rs.next()) rowsFetched++;
assertEquals(3, rowsFetched); // 3 event types
}
long duration = System.currentTimeMillis() - start;
// Analytics query on 1M rows should be under 2 seconds
assertThat(duration)
.as("Aggregation query over 1M rows should complete under 2s")
.isLessThan(2000L);
}ClickHouse with HelpMeTest
Analytics dashboards powered by ClickHouse need end-to-end testing of the user-facing layer:
*** Test Cases ***
Revenue Dashboard Loads Within SLA
As Admin
Go To https://app.example.com/dashboard/revenue
Wait Until Element Is Visible [data-testid=revenue-chart] timeout=5s
Check For Visual Flaws
Element Should Contain [data-testid=total-revenue] $
Daily Active Users Chart Shows Correct Range
As Admin
Go To https://app.example.com/dashboard/users
Select Date Range Last 30 Days
Wait Until Element Is Visible [data-testid=dau-chart]
Element Count Should Equal [data-testid=chart-bar] 30HelpMeTest validates that your ClickHouse-backed analytics actually renders correctly in the browser—catching issues like query timeouts surfaced as empty charts, or incorrect date formatting.
Common Pitfalls
1. Not using FINAL with ReplacingMergeTree Queries on ReplacingMergeTree tables without FINAL may return duplicate rows before merge. In tests, use OPTIMIZE TABLE ... FINAL + query with FINAL.
2. Forgetting ClickHouse timestamps are DateTime (second precision) DateTime is second-precision; use DateTime64(3) for millisecond precision. Tests inserting high-frequency events may collapse to the same second.
3. Ignoring async_insert in production config If production uses async_insert=1, writes aren't synchronous. Tests against a container default won't catch async_insert-related issues unless you enable it in your test config.
4. Testing with trivial data sizes ClickHouse optimizations like vectorized execution and late materialization only activate at scale. Performance tests with 100 rows don't reflect production behavior.
Summary
ClickHouse testing requires understanding its columnar, analytics-first architecture:
- Testcontainers for reproducible, real ClickHouse instances
- Batch insert tests to verify data ingestion pipelines
- Aggregation correctness tests for COUNT, SUM, AVG, and window functions
- MergeTree behavior tests including deduplication and partition pruning
- Materialized view tests for pre-aggregation consistency
- Performance tests at realistic data scales (1M+ rows)
The analytics layer is often where subtle bugs hide—test it with the same rigor as application code.