What is ClickHouse?
ClickHouse is a high-performance, open-source columnar database management system optimized for online analytical processing (OLAP). It handles large volumes of data at high speed, making it well suited for real-time analytics and big data environments. Unlike traditional row-based databases, ClickHouse's columnar storage structure processes and analyzes data in columns rather than rows, improving query speed for analytical workloads.
Available under the Apache-2.0 license, it has over 35K stars and 1500 contributors on GitHub. The repository can be found at https://github.com/ClickHouse/ClickHouse.
ClickHouse is commonly used in production and enterprise environments and provides several robust options for managing backup and restore of user databases. We’ll cover the most important options and how to implement them.
Overview of ClickHouse backup methods
ClickHouse provides multiple methods for creating and managing backups, each offering flexibility in terms of storage location, compression, encryption, and incremental backups. Here’s an overview of the main backup methods available:
- Local disk backup: Backups can be stored on a local disk by configuring a dedicated disk location in ClickHouse's configuration file. The `BACKUP` command allows full or incremental backups of tables or entire databases, saving them to a specified local path. This is achieved by adding a configuration file that specifies the path and permissions for the backup location. Backups can then be created using `BACKUP TABLE` or `BACKUP DATABASE` commands with the specified disk.
- S3 and Azure Blob Storage: ClickHouse supports remote backups to S3-compatible storage and Azure Blob Storage, which is useful for distributed environments and ensures that data is safely stored off-site. Configuring these backups involves specifying endpoint URLs, access keys, and other required credentials in the `BACKUP` command. ClickHouse can also perform incremental backups on remote storage by referencing a base backup, which is beneficial for large datasets.
- Incremental backups: Incremental backups store only the changes since the last backup, reducing storage costs and backup time for large datasets. This is done by specifying the base backup file when initiating a new backup. However, both the base and incremental backups are required during a restore.
- Compressed and encrypted backups: ClickHouse supports custom compression levels and methods, such as `lzma` and `gzip`, which can reduce the backup size on disk. Password protection is also available for disk backups, providing an additional layer of security.
- Partition-level backups: ClickHouse enables users to back up or restore selected table partitions instead of entire tables, allowing more control over data recovery. This is beneficial in scenarios where only parts of the data need to be restored.
- File system snapshots and third-party tools: ClickHouse can also leverage filesystem snapshots (e.g., ZFS) for creating backups or use third-party tools like `clickhouse-backup`. These alternatives offer various levels of integration with the underlying storage system, with features for managing snapshots outside of ClickHouse's native commands.
Each backup type is configurable with additional settings, such as synchronous or asynchronous operations and concurrent backup restrictions.
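To make the native commands concrete, here is a minimal sketch issued through clickhouse-client. It assumes a backup destination named `backups` has already been declared in the server configuration and uses a hypothetical table `default.events`:

```bash
# Full backup of one table to a preconfigured backup disk
clickhouse-client --query \
  "BACKUP TABLE default.events TO Disk('backups', 'events_full.zip')"

# Incremental backup that references the full backup as its base
clickhouse-client --query \
  "BACKUP TABLE default.events TO Disk('backups', 'events_incr.zip')
   SETTINGS base_backup = Disk('backups', 'events_full.zip')"

# Restore from the incremental backup (the base backup must still exist)
clickhouse-client --query \
  "RESTORE TABLE default.events FROM Disk('backups', 'events_incr.zip')"
```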
Quick tutorial: How to set up ClickHouse backups
Setting up backups in ClickHouse can be done using the `clickhouse-backup` utility, which provides a way to create local and remote backups while managing storage effectively. This tutorial walks through installing, configuring, and scheduling backups with `clickhouse-backup`.
Step 1: Install clickhouse-backup utility
To install the `clickhouse-backup` utility, run the following Bash script. This script downloads the specified version of `clickhouse-backup`, extracts it, and moves it to `/usr/bin` for easy access.
```bash
#!/bin/bash
OS=linux
ARCH=amd64
CLICKHOUSE_BACKUP_VERSION=2.6.2
CLICKHOUSE_BACKUP_ARCHIVE=clickhouse-backup-$OS-$ARCH.tar.gz

sudo apt-get update
wget https://github.com/Altinity/clickhouse-backup/releases/download/v$CLICKHOUSE_BACKUP_VERSION/$CLICKHOUSE_BACKUP_ARCHIVE
tar -zxvf $CLICKHOUSE_BACKUP_ARCHIVE
rm $CLICKHOUSE_BACKUP_ARCHIVE
sudo mv build/$OS/$ARCH/clickhouse-backup /usr/bin/clickhouse-backup
rm -r build
```
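Assuming the download and extraction succeed, you can confirm the binary is on your PATH by printing its version:

```bash
clickhouse-backup --version
```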
Step 2: Configure clickhouse-backup
Once installed, configure `clickhouse-backup` by editing the configuration file `/etc/clickhouse-backup/config.yml`. Here's an example configuration:
```yaml
general:
  remote_storage: sftp
  backups_to_keep_local: 8   # Keep the last 8 backups locally
  backups_to_keep_remote: 0  # Remote server manages cleanup

clickhouse:
  username: my_username
  password: my_password
  host: my_host
  port: 9000
  disk_mapping: {}
  skip_tables:
    - system.*
    - INFORMATION_SCHEMA.*

sftp:
  address: my_host
  username: my_username
  password: my_password
  port: 22
  path: "clickhouse-backups/{shard}"
```
If your SFTP server uses a PEM key, you can use the following configuration for the SFTP section (only that portion of the file is shown).
```yaml
sftp:
  enabled: true
  endpoint: sftp.example.com
  port: 22
  username: your_sftp_user
  private_key: /path/to/your/private_key.pem
  passphrase: your_passphrase  # Optional, only if your private key is encrypted
  path: /remote/backup/path
  compression_format: tar
  compression_level: 9
```
This configuration specifies details for both ClickHouse and the remote storage, such as credentials, backup location, and tables to exclude.
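With the configuration in place, a quick sanity check can confirm that `clickhouse-backup` can reach both ClickHouse and the remote storage:

```bash
# List the tables clickhouse-backup can see (verifies the ClickHouse connection)
clickhouse-backup tables

# List existing local and remote backups (also verifies remote connectivity)
clickhouse-backup list
```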
Step 3: Create a backup script
The following script enables full and incremental backups to either local or remote storage, depending on the arguments passed. Save this script as `/etc/default/clickhouse-backup-run.sh`:
```bash
#!/bin/bash

# First argument: backup type
if [[ $1 == "full" ]]; then
    IS_FULL=true
elif [[ $1 == "incremental" ]]; then
    IS_FULL=false
else
    echo "Specify 'full' or 'incremental'"
    exit 1
fi

# Second argument: backup destination
if [[ $2 == "local" ]]; then
    CREATE_COMMAND="create"
elif [[ $2 == "remote" ]]; then
    CREATE_COMMAND="create_remote"
else
    echo "Specify 'local' or 'remote'"
    exit 1
fi

DATETIME=$(date -u +%Y-%m-%dT%H-%M-%S)
BACKUP_NAME_FULL="auto_full_$DATETIME"
BACKUP_NAME_INCREMENTAL="auto_incremental_$DATETIME"

if [[ $IS_FULL == true ]]; then
    echo "Starting full backup"
    clickhouse-backup $CREATE_COMMAND $BACKUP_NAME_FULL
else
    # Use the most recent automated remote backup as the base for the diff
    PREV_BACKUP=$(clickhouse-backup list remote | grep -E '^auto_' | tail -n 1 | cut -d " " -f 1)
    echo "Starting incremental backup"
    clickhouse-backup $CREATE_COMMAND --diff-from-remote=$PREV_BACKUP $BACKUP_NAME_INCREMENTAL
fi
```
This script uses parameters to control whether the backup is full or incremental and whether it is stored locally or remotely. It checks the backup type and initiates the `clickhouse-backup` utility with the appropriate command.
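For example, to run a first full backup to remote storage by hand before scheduling anything:

```bash
# Make the script executable, then trigger a full remote backup
chmod +x /etc/default/clickhouse-backup-run.sh
/etc/default/clickhouse-backup-run.sh full remote
```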
Step 4: Schedule backups with cron
To automate backups, schedule the script with cron. The example below sets up daily local backups and weekly remote backups for each shard:
- Daily local backup (for each replica): Create a cron job in `/etc/cron.d/clickhouse-backup`:

```bash
10 5 * * * root /bin/bash /etc/default/clickhouse-backup-run.sh full local
```

- Weekly full backup and daily incremental backup to remote storage (for one replica per shard): Configure the following cron jobs:

```bash
# Weekly full remote backup
0 5 * * 6 root /bin/bash /etc/default/clickhouse-backup-run.sh full remote

# Daily incremental remote backup
0 5 * * 0-5 root /bin/bash /etc/default/clickhouse-backup-run.sh incremental remote
```
These cron jobs handle regular backups with minimal manual intervention, ensuring data safety and easy recovery.
Note: You can also use the `crontab -e` command to set up cron scripts. It allows you to use a text editor like nano or vim in Linux.
Step 5: Secure and manage backup storage
To secure backups, consider moving them from accessible directories to a protected storage area. This script moves backups from an FTP-accessible directory to a secured directory:
```bash
#!/bin/bash
BACKUPS_DIR=/mount/ftp/clickhouse-backups
BACKUPS_HIDDEN_DIR=/mount/clickhouse-backups-hidden

# Iterate over each shard
for SHARD in "$BACKUPS_DIR"/shard-*; do
    SHARD_NAME=$(basename "$SHARD")
    BACKUPS=($(ls "$SHARD" | grep -E '^auto_' | sort -t_ -k3))

    # Drop the last entry (most recent backup) so it stays in the FTP directory
    unset 'BACKUPS[${#BACKUPS[@]}-1]'

    # Move the older backups to the hidden directory
    for BACKUP in "${BACKUPS[@]}"; do
        mkdir -p "$BACKUPS_HIDDEN_DIR/$SHARD_NAME"
        mv "$SHARD/$BACKUP" "$BACKUPS_HIDDEN_DIR/$SHARD_NAME/$BACKUP"
    done
done
```
Step 6: Clean up old backups
This cleanup script keeps only the latest 20 backups in the hidden directory:
```bash
#!/bin/bash
BACKUPS_HIDDEN_DIR=/mount/clickhouse-backups-hidden
BACKUPS_TO_KEEP=20

for SHARD in "$BACKUPS_HIDDEN_DIR"/shard-*; do
    BACKUPS=($(ls "$SHARD" | grep -E '^auto_' | sort -t_ -k3))

    # Delete the oldest backups beyond the retention limit
    if [[ ${#BACKUPS[@]} -gt $BACKUPS_TO_KEEP ]]; then
        for BACKUP in "${BACKUPS[@]:0:$((${#BACKUPS[@]} - $BACKUPS_TO_KEEP))}"; do
            rm -r "$SHARD/$BACKUP"
        done
    fi
done
```
With this setup, the ClickHouse backups are automated, secured, and regularly cleaned up, ensuring efficient use of storage space and reliable disaster recovery options.
Related content: Read our guide to ClickHouse tutorial
Tips from the expert
Suresh Vasanthakumar
Site Reliability Engineer
Suresh is a seasoned database engineer with over a decade of experience in designing, deploying and optimizing high-performance distributed systems. Specializing in in-memory data stores, Suresh has deep expertise in managing Redis and Valkey clusters for enterprise-scale applications.
In my experience, here are tips that can help you better manage and optimize ClickHouse backups:
- Use differential backups for complex data requirements: Consider differential backups in addition to full and incremental ones, particularly for complex, large datasets. Differential backups capture data since the last full backup, balancing storage efficiency with shorter recovery times compared to relying solely on incremental backups.
- Implement redundancy in backup storage: Distribute backups across multiple storage locations or providers. For example, you could store backups on both S3 and Azure Blob, or on different geographical regions within the same provider, increasing resilience in case of provider or regional outages.
- Automate snapshot backups for faster rollbacks: Use file system snapshots (like those offered by ZFS or LVM) alongside ClickHouse’s native backup tools. This allows for almost instantaneous rollbacks, making it useful for environments requiring frequent backups or quick rollbacks, such as in staging or test environments.
- Leverage partition-level backups to isolate critical data: For frequently accessed or regulatory-sensitive data, use partition-level backups. This approach allows you to isolate and restore only the most critical portions, reducing recovery time for specific datasets and enhancing data compliance.
- Configure pre- and post-backup scripts for data consistency: Implement scripts that prepare the system for backups and validate their completion. For example, pause ingestion processes or run final checkpoints before backups, then verify data consistency post-backup to ensure clean and reliable data snapshots (see the sketch after this list).
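As a hedged illustration of the last tip, the sketch below wraps `clickhouse-backup` with a pre-backup pause and a post-backup verification. The ingestion service name `my-ingest.service` is a hypothetical placeholder for whatever feeds your cluster:

```bash
#!/bin/bash
set -e

BACKUP_NAME="auto_full_$(date -u +%Y-%m-%dT%H-%M-%S)"

# Pre-backup: pause the (hypothetical) ingestion service so no writes land mid-backup
sudo systemctl stop my-ingest.service
# Always restart ingestion on exit, even if the backup fails
trap 'sudo systemctl start my-ingest.service' EXIT

# Flush buffered system logs so recent entries are captured too
clickhouse-client --query "SYSTEM FLUSH LOGS"

clickhouse-backup create_remote "$BACKUP_NAME"

# Post-backup: verify the backup actually landed on remote storage
clickhouse-backup list remote | grep -q "$BACKUP_NAME"
echo "Backup $BACKUP_NAME verified on remote storage"
```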
Best practices for ClickHouse backups
Here are some best practices to consider when implementing backups in ClickHouse.
Regular backup scheduling
A consistent backup schedule is essential to protect data against unexpected loss. The frequency of backups should align with the data's volatility and business requirements. For example, in environments where data changes frequently, daily full backups during low-traffic periods can minimize system impact while ensuring data is consistently protected.
Tools like `clickhouse-backup` can automate this process, reducing the risk of human error and ensuring backups are performed reliably. Regular scheduling also supports compliance with data retention policies and regulatory requirements. By maintaining a predictable backup routine, organizations can ensure data is available for restoration within acceptable timeframes.
Incremental backups
To optimize storage usage and reduce the time required for backup operations, implement incremental backups that capture only the changes since the last full backup. This approach is particularly beneficial for large datasets, as it minimizes the amount of data processed during each backup cycle.
However, it’s important to ensure that both base and incremental backups are available and properly managed, as they are required together for a complete restore. Implementing incremental backups requires planning and verification to manage dependencies between backup sets. A clear retention policy for incremental backups helps prevent excessive accumulation of backup files.
Secure backup storage
Storing backups in secure, off-site locations helps protect against hardware failures, data center incidents, or other disasters. ClickHouse supports remote backups to S3-compatible storage and Azure Blob Storage, enabling off-site storage. Implementing encryption for backups adds an additional layer of protection, keeping sensitive data secure if backup media is compromised.
In addition to encryption, access controls should be enforced to restrict backup access to authorized personnel only. Regular audits of backup storage environments can help identify and mitigate potential security vulnerabilities. Diversifying storage locations across geographic regions can provide additional resilience against regional outages or disasters.
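As an illustration of off-site storage, ClickHouse's native `BACKUP` command can target S3-compatible storage directly. The bucket URL and credentials below are placeholders:

```bash
# Back up a whole database to S3-compatible storage (placeholder endpoint and keys)
clickhouse-client --query \
  "BACKUP DATABASE my_db TO S3('https://my-backup-bucket.s3.amazonaws.com/backups/my_db_full',
   'ACCESS_KEY_ID', 'SECRET_ACCESS_KEY')"
```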
Backup verification
Regularly testing backup restoration processes helps confirm data integrity and the reliability of the backup strategy. It ensures that backups are functional and can be restored promptly when needed, reducing downtime and potential data loss. Establishing a routine for backup verification helps identify and address issues proactively.
Automated testing of backup restorations can simplify the verification process and provide timely feedback on backup health. Documenting restoration procedures and maintaining up-to-date recovery plans are also essential components of a comprehensive backup strategy.
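A minimal sketch of such an automated restore test, reusing the hypothetical `default.events` table and `backups` disk from earlier and assuming the table is idle during the test: restore into a scratch database and compare row counts.

```bash
#!/bin/bash
set -e

# Restore the backed-up table into a scratch database
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS restore_test"
clickhouse-client --query \
  "RESTORE TABLE default.events AS restore_test.events FROM Disk('backups', 'events_full.zip')"

# Compare row counts between the live table and the restored copy
ORIG=$(clickhouse-client --query "SELECT count() FROM default.events")
REST=$(clickhouse-client --query "SELECT count() FROM restore_test.events")

if [[ "$ORIG" == "$REST" ]]; then
    echo "Restore test passed ($REST rows)"
else
    echo "Row count mismatch: $ORIG live vs $REST restored" >&2
    exit 1
fi

clickhouse-client --query "DROP DATABASE restore_test"
```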
Exclude non-essential data
To conserve storage space and streamline the backup process, exclude system tables and other non-critical data from backups. Configuring the backup tool to omit tables like `system.*` and `INFORMATION_SCHEMA.*` ensures that only essential data is backed up, reducing the size and complexity of backup files.
Regularly reviewing and updating the list of excluded data is important to adapt to changing data environments. As new tables or databases are introduced, assessing their criticality ensures that backup policies remain aligned with business priorities.
Monitor backup processes
Implementing monitoring for backup operations allows failures or issues to be detected and addressed promptly. Setting up alerts for backup completion statuses and errors helps maintain the reliability of the backup system. Monitoring tools can provide insights into backup performance, duration, and success rates, enabling continuous improvement of the backup process.
Regular analysis of monitoring data can reveal trends or recurring issues that may require attention. For example, increasing backup durations might indicate growing data volumes or performance bottlenecks.
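One lightweight approach, sketched below on top of the tutorial's backup script, is a wrapper that posts an alert on failure. The webhook URL is a placeholder for whatever alerting endpoint you use:

```bash
#!/bin/bash
# Wrapper that reports backup failures to an alerting webhook (placeholder URL)
ALERT_URL="https://alerts.example.com/hook"

if ! /bin/bash /etc/default/clickhouse-backup-run.sh full remote; then
    curl -s -X POST --data "ClickHouse backup failed on $(hostname) at $(date -u)" "$ALERT_URL"
    exit 1
fi

echo "Backup completed on $(hostname) at $(date -u)"
```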
Reliable data backups and restoration with Instaclustr for ClickHouse
Instaclustr for ClickHouse offers an outstanding solution for organizations seeking a powerful, fully-managed ClickHouse experience. A standout feature of the service is the robust backup system, designed to protect critical data and ensure peace of mind, no matter the scale of operations.
Instaclustr’s automated backup capabilities include incremental backups that capture data precisely when it is needed. These seamless backups are designed to minimize operational impact, allowing teams to focus on leveraging ClickHouse’s lightning-fast analytic queries without disruption. Whether analyzing large datasets or running real-time reports, Instaclustr’s backup solutions guarantee data is securely stored and easily retrievable when required.
Additionally, Instaclustr ensures effortless restoration, whether rolling back to a specific point in time or recovering from the unexpected. Instaclustr's approach not only reinforces business continuity but also aligns with best practices for data protection and compliance. Plus, with the constant support of Instaclustr's expert team, organizations always have guidance to optimize ClickHouse environments.
By streamlining backup processes and ensuring top-tier reliability, Instaclustr for ClickHouse empowers businesses with data resilience to make confident decisions supported by secure, accessible, and well-managed data solutions.
For more information: