How to Migrate From Any Data Source to ClickHouse: Complete Step-by-Step Guide
Modern analytics teams want fast queries, scalable architecture, and a pipeline they can trust in production. ClickHouse delivers all of that, but only when your migration strategy is built cleanly, end to end.
This article walks you through exactly how to migrate from any source system into ClickHouse Cloud, using verified, officially supported patterns. No shortcuts, no missing steps, no guesswork.
You’ll also see a full practical implementation using:
- clickhouse-local for data ingestion
- dbt-clickhouse for data modeling
- Databrain for data analytics, data visualization, and embedding
By the end, you’ll have a production-ready pipeline that runs automatically and scales effortlessly.
Understanding the Migration Landscape
Before you migrate a single row, you need to confront the most important question:
Where is your data coming from?
Speed, cost, reliability, and long-term maintenance burden all depend on this first step.
Identify Your Data Source
Your data will typically fall into one of four buckets:
- Relational databases
- Data warehouses
- Cloud object storage
- Streaming systems
This single step determines which ingestion method you should use.
1. When Your Data Source Can Export to Object Storage
If your source can write data into:
- AWS S3
- Google Cloud Storage (GCS)
- Azure Blob Storage
Then migration becomes extremely simple.
Use table functions (s3, gcs, azureBlobStorage)
These allow you to bulk-load structured files directly into ClickHouse Cloud.
This is ideal when:
- Your system already writes data into S3/GCS
- You export from a warehouse into Parquet
- You want a fast, cost-efficient bulk ingest path
This method is widely used and officially supported.
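For example, loading exported Parquet files from S3 into an existing table is a single INSERT ... SELECT. The bucket path, credentials, and table name below are placeholders:
```sql
-- Bulk-load Parquet files from S3 into an existing ClickHouse table.
INSERT INTO analytics.events
SELECT *
FROM s3(
    'https://my-bucket.s3.amazonaws.com/exports/events/*.parquet',
    '<aws_access_key_id>',
    '<aws_secret_access_key>',
    'Parquet'
);
```
The gcs and azureBlobStorage table functions follow the same pattern for their respective clouds.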
2. When Your Data Source Is a Database (MySQL, PostgreSQL, MongoDB, etc.)
If your data doesn’t already live in object storage, you have two clean, reliable ways to move it into ClickHouse:
A. Use clickhouse-local as your ETL host
- Reads directly from the source DB
- Writes directly into ClickHouse Cloud
- Zero CSVs, zero temporary files, no staging layer needed
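A minimal sketch of this pattern, using the postgresql() table function as the source (hosts, credentials, and table names are placeholders; the same idea works with the mysql() function, as shown later in this guide):
```sql
-- Run inside clickhouse-local: read from the source DB, write to ClickHouse Cloud.
INSERT INTO FUNCTION remoteSecure(
    'your-instance.clickhouse.cloud:9440',
    'default.orders',
    'default',
    '<clickhouse_password>'
)
SELECT *
FROM postgresql('source-db-host:5432', 'appdb', 'orders', 'app_user', '<db_password>');
```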
B. Use JDBC/ODBC bridge integrations
For systems already using connectors, these work smoothly.
Optional Two-Step Method
If preferred:
- Export data from the source into S3/GCS
- Use table functions to load into ClickHouse
All three methods are officially documented and reliable at scale.
3. When Your Data Source Is a Data Warehouse
Different warehouses require different approaches.
BigQuery
- Export → GCS
- Load using gcs table function
- Supports bulk loads + incremental updates/CDC
Snowflake
- Export to object storage
- Load into ClickHouse
- Snowflake doesn’t prescribe a single method; exporting to object storage is the standard approach
Amazon Redshift
Three common migration options:
- UNLOAD → S3 → ClickHouse
- Push from Redshift
- Pull from ClickHouse
All three approaches are documented and valid.
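As an illustration of the first option, UNLOAD writes Parquet files to S3 and the s3 table function reads them back in ClickHouse. The IAM role, bucket, and table names below are placeholders:
```sql
-- 1) In Redshift: export a table to S3 as Parquet.
UNLOAD ('SELECT * FROM public.orders')
TO 's3://my-bucket/redshift-export/orders_'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-unload-role'
FORMAT AS PARQUET;

-- 2) In ClickHouse Cloud: bulk-load the exported files
--    (add AWS credential arguments if the bucket is private).
INSERT INTO analytics.orders
SELECT *
FROM s3('https://my-bucket.s3.amazonaws.com/redshift-export/orders_*', 'Parquet');
```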
4. When Direct Integration Isn’t Available
When your source system doesn’t have a native connector, use a generic ETL/ELT tool:
- Airbyte
- dbt (for transformations)
- clickhouse-local (for ingestion)
Pipeline flow:
Source → ETL Tool → Object Storage / clickhouse-local → ClickHouse Cloud
This works reliably for less-common data sources.
Real-Time, Streaming, and CDC-Based Migrations
When you need fresh data (inserts, updates, and deletes moving into ClickHouse with minimal delay), real-time pipelines are the pattern to follow. Common building blocks include:
- Change Data Capture (CDC) tools
- Third-party ingestion systems
- Streaming pipelines
ClickHouse handles all major ingestion modes:
- Bulk ingestion
- Periodic updates
- Real-time replication/CDC (as demonstrated in the BigQuery migration documentation)
If your workload needs freshness, this is the right pattern.
Schema, Networking, and Security Requirements
Regardless of the migration method you choose, you must:
Align schemas
Column names must match, and source data types must map cleanly to their ClickHouse equivalents.
Ensure network access
ClickHouse Cloud must allow your source IPs.
Add them to the IP Access List.
Choose optimized formats
If using object storage, prefer:
- Parquet
- Avro
Both formats accelerate bulk ingestion.
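One quick way to validate both the schema and the export format before a bulk load is to compare what ClickHouse infers from the files with the destination table (bucket path and table name are placeholders):
```sql
-- Inspect the schema ClickHouse infers from the exported Parquet files...
DESCRIBE TABLE s3('https://my-bucket.s3.amazonaws.com/exports/events/*.parquet', 'Parquet');

-- ...and compare it with the destination table.
DESCRIBE TABLE analytics.events;
```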
Practical Migration: MySQL → ClickHouse → dbt → Databrain
This section walks through the actual implementation, step by step.
Step 1: Inspect Data in MySQL Workbench
Before migrating anything, validate what you have. Run basic row-count checks and simple validation queries to confirm table completeness:
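For example, a few quick checks in MySQL (table names follow the schema listed below):
```sql
-- Row counts per table, to compare against ClickHouse after migration.
SELECT 'accounts' AS table_name, COUNT(*) AS row_count FROM accounts
UNION ALL SELECT 'users', COUNT(*) FROM users
UNION ALL SELECT 'subscriptions', COUNT(*) FROM subscriptions
UNION ALL SELECT 'usage_events', COUNT(*) FROM usage_events;

-- Spot-check for obviously broken rows, e.g. missing timestamps.
SELECT COUNT(*) FROM usage_events WHERE event_time IS NULL;
```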
Table structures:
Understanding your schema upfront is critical; mismatched types cause the majority of migration failures.
accounts
- id BIGINT
- name VARCHAR
- created_at DATETIME
users
- id BIGINT
- account_id BIGINT
- email VARCHAR
- created_at DATETIME
subscriptions
- id BIGINT
- account_id BIGINT
- plan VARCHAR
- status VARCHAR
- started_at DATETIME
- ended_at DATETIME
usage_events
- id BIGINT
- account_id BIGINT
- event_type VARCHAR
- event_time DATETIME
- metadata JSON
No export is needed: clickhouse-local can connect directly to MySQL.
Step 2: Install clickhouse-local
Install clickhouse-local for your operating system (Windows/Linux or macOS). This gives you a lightweight, local ClickHouse runner capable of querying and ingesting data without deploying a full server.
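A typical install uses the official install script; on macOS, Homebrew also works. This is one reasonable setup, not the only option:
```bash
# Linux (and Windows via WSL): download the clickhouse binary with the official script.
curl https://clickhouse.com/ | sh

# macOS: the same script works, or use Homebrew.
brew install --cask clickhouse

# Verify that clickhouse-local runs (drop the ./ if clickhouse is on your PATH).
./clickhouse local --query "SELECT 1"
```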
Step 3: Create Destination Tables in ClickHouse Cloud
Create the destination tables in ClickHouse Cloud:
- accounts
- users
- subscriptions
- usage_events
Mirror your MySQL schema, mapping each column to its ClickHouse equivalent; the DDL types must line up precisely. This avoids ingestion errors and type coercion issues during migration.
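For instance, the accounts table from Step 1 might map to ClickHouse like this (the raw database name, engine, and ORDER BY key are reasonable defaults, not requirements):
```sql
CREATE DATABASE IF NOT EXISTS raw;

-- MySQL BIGINT -> Int64, VARCHAR -> String, DATETIME -> DateTime.
CREATE TABLE raw.accounts
(
    id         Int64,
    name       String,
    created_at DateTime
)
ENGINE = MergeTree
ORDER BY id;

-- Repeat the pattern for users, subscriptions, and usage_events
-- (usage_events.metadata can be stored as String or JSON).
```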
Step 4: Migrate Each Table Using clickhouse-local
clickhouse-local handles both ingestion and transfer:
- It reads from MySQL using the mysql() table function
- It writes into ClickHouse Cloud using remoteSecure()
Run one migration command per table (a sketch for the accounts table follows the list below):
- accounts
- users
- subscriptions
- usage_events
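A minimal sketch of the accounts command, assuming the raw database from Step 3 and placeholder hosts and credentials:
```bash
./clickhouse local --query "
INSERT INTO FUNCTION remoteSecure(
    'your-instance.clickhouse.cloud:9440',
    'raw.accounts',
    'default',
    '<clickhouse_password>'
)
SELECT id, name, created_at
FROM mysql('mysql-host:3306', 'appdb', 'accounts', 'app_user', '<mysql_password>');
"
```
The commands for users, subscriptions, and usage_events differ only in the column list and table names.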
This completes a direct MySQL → ClickHouse migration with:
- No CSV files
- No temporary storage
- No staging infrastructure
The cleanest possible pipeline.
Step 5: Set Up dbt-clickhouse
Install dbt with ClickHouse support:
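The adapter ships as the dbt-clickhouse package, which pulls in dbt-core:
```bash
pip install dbt-clickhouse
```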
Initialize your dbt project:
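The project name here is a placeholder:
```bash
dbt init clickhouse_analytics
```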
Finally, configure your profiles.yml so dbt can connect to ClickHouse Cloud without authentication issues.
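A minimal profile sketch, assuming the project name above and ClickHouse Cloud's HTTPS port; host, user, password, and schema are placeholders:
```yaml
# ~/.dbt/profiles.yml
clickhouse_analytics:
  target: prod
  outputs:
    prod:
      type: clickhouse
      host: your-instance.clickhouse.cloud
      port: 8443
      user: default
      password: "<clickhouse_password>"
      secure: true        # TLS is required for ClickHouse Cloud
      schema: analytics   # database where dbt builds its models
```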
Step 6: Declare dbt Sources
Define your raw tables in src.yml, mapping them directly to the ClickHouse tables you created earlier.
This step tells dbt where your ingested MySQL data now lives.
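A sketch of src.yml, assuming the raw database used in Step 3:
```yaml
# models/src.yml
version: 2

sources:
  - name: raw
    schema: raw          # the ClickHouse database holding the ingested MySQL tables
    tables:
      - name: accounts
      - name: users
      - name: subscriptions
      - name: usage_events
```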
Step 7: Build dbt Models
Create your transformation models:
- fct_account_profile: joins each account to its active subscription.
- fct_daily_usage: an incremental model that summarizes usage events per account per day.
Sketches of both models follow below.
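Minimal sketches of the two models, assuming the raw source declared above; the column logic is illustrative rather than the exact SQL from the original project:
```sql
-- models/fct_account_profile.sql
SELECT
    a.id         AS account_id,
    a.name       AS account_name,
    a.created_at AS account_created_at,
    s.plan,
    s.status,
    s.started_at
FROM {{ source('raw', 'accounts') }} AS a
LEFT JOIN {{ source('raw', 'subscriptions') }} AS s
    ON s.account_id = a.id AND s.status = 'active'
```
```sql
-- models/fct_daily_usage.sql
{{ config(materialized='incremental', order_by='(account_id, event_date)') }}

SELECT
    account_id,
    toDate(event_time) AS event_date,
    event_type,
    count() AS event_count
FROM {{ source('raw', 'usage_events') }}
{% if is_incremental() %}
WHERE toDate(event_time) > (SELECT max(event_date) FROM {{ this }})
{% endif %}
GROUP BY account_id, event_date, event_type
```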
Step 8: Run dbt
Execute your models:
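With the profile in place, verify the connection and build everything:
```bash
dbt debug
dbt run
```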
This builds materialized models inside ClickHouse Cloud using the transformations defined earlier.
Step 9: Connect ClickHouse Cloud to Databrain
To visualize your transformed data, connect ClickHouse Cloud as a datasource in Databrain:
- Add New Datasource
- Choose ClickHouse Cloud
- Enter host, port, SSL, credentials
- Sync schema
Your dbt models now appear:
- fct_account_profile
- fct_daily_usage
This step covers the quickest way to connect a dev ClickHouse cluster to Databrain. If you need the full connector setup (network access, SSL/TLS, permissions, and best practices for multi-tenant SaaS), check out our overview of the Databrain–ClickHouse integration.
Step 10: Analyze Data in Databrain and Build Dashboards
Now that your models are live, you can explore your data inside Databrain and build dashboards.
Account Overview
- Active accounts
- Plan distribution
- Monthly account creation
- Subscription statuses

Usage Analysis
- Daily activity
- Most frequent event types
- Growth patterns
- Usage spikes/drops

Step 11: Embed Databrain Dashboard
Once your dashboards are built, you can embed them directly into your application using Databrain’s embedding framework.
Helpful resources:
- Embedding tutorial to seamlessly embed a Databrain dashboard into your application
- Book a Demo to see Databrain in action
- Explore Databrain Docs for detailed implementation guides
- Try Databrain with your own data
Production Data Lifecycle
Once everything is deployed, your system operates as a fully automated, real-time data pipeline. The flow looks like this:
- Your source database continues to update
- clickhouse-local pulls incremental changes
- dbt-clickhouse regenerates your models
- Databrain dashboards refresh automatically
- Embedded analytics inside your application update instantly
Every layer stays in sync without manual intervention.
Final Architecture
Your complete, end-to-end pipeline looks like this:
MySQL → clickhouse-local → ClickHouse Cloud → dbt → Databrain → Embedded Analytics
Clean. Fast. Scalable. Production-ready.
This is the foundation for a long-term, future-proof analytics stack.
FAQs
Q1. What is ClickHouse used for?
Ans. ClickHouse is a high-performance columnar database built for analytics, real-time dashboards, event data, and large-scale reporting. It can process billions of rows per second with sub-second latency.
Q2. What is the best way to migrate data into ClickHouse?
Ans. The most common migration paths include exporting data to Parquet, using Kafka or streaming connectors, loading batch files via clickhouse-client, or using ETL/ELT tools like Airbyte, Fivetran, Debezium, or dbt. The best method depends on the size, format, and update frequency of your data.
Q3. How do I ensure schema consistency when migrating to ClickHouse?
Ans. Best practices include:
- matching data types from source to destination,
- pre-creating tables,
- validating with DESCRIBE TABLE, and
- testing ingestion on small samples first.

