ClickHouse Cluster Backup and Restore
This document describes the backup and restore services available and enabled by default with all Instaclustr for ClickHouse clusters.
Backups
Backups taken for ClickHouse clusters comprise a combination of two types of backup. Which type is taken depends on whether earlier backups exist and, if so, when they were taken.
- Full backups: A full backup includes all the data on a ClickHouse cluster, including customer data, authentication details and table schema. A single copy of replicated data is included.
- Incremental backups: An incremental backup is based on the most recent full backup and covers the delta since that full backup was taken. The two backups together allow restoring a cluster to the time the last incremental backup was taken.
All backup snapshots are uploaded to an external storage repository (e.g., an S3 storage bucket for an AWS cluster). For RIIA customers, the retention period for Instaclustr’s S3 bucket is 7 days. For RIYOA customers, the retention policy on the backup bucket in their own accounts must be set to at least 7 days so that the incremental backups remain valid.
Automated Backups
All ClickHouse clusters are configured to perform automated backups. The schedule of automated backups depends on the SLA tier of the cluster, as shown in the table below.
SLA Tier | Full Backup Schedule | Incremental Backup Schedule |
Developer | Once every 6 days | Once every 24 hours |
Production | Once every 6 days | Once every 6 hours |
User Triggered Backups
Users can manually trigger backups using the Console or the Instaclustr API. The system automatically decides whether such a backup is full or incremental, depending on whether a full backup already exists in the remote repository and when the previous backups were taken. Assuming you have already provisioned a ClickHouse cluster following the steps in Creating a ClickHouse Cluster, the steps below show how to trigger a manual backup.
Using Instaclustr Console
- Choose your cluster from the left panel and navigate to the Backup and Restore tab
- Go to the Backup tab on the page and click the “Start Backup” button to start a backup. If no backup exists from the past 7 days, the backup taken will be a full backup.
- The backup usually takes some time, depending on how much data needs to be backed up, and you can safely navigate away from the page once you have triggered the operation. The page displays the progress of the backup for each node while the operation is ongoing. Once the backups are done, their “Status” is shown as “completed” on the page.
Using Instaclustr API
- A designated API endpoint is available for triggering ClickHouse backups manually. Send a POST request, authenticated with your API keys, to the endpoint below:
https://api.instaclustr.com/cluster-management/v2/operations/applications/clickhouse/clusters/v2/<cluster-id>/trigger-backup/v2/
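The request above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not a definitive client: the function names are hypothetical, the use of HTTP Basic authentication with an account username and API key is an assumption, and the empty request body reflects only that no payload is shown for this endpoint.

```python
import base64
import urllib.request

API_BASE = "https://api.instaclustr.com"


def build_trigger_backup_request(cluster_id: str, username: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers a backup."""
    url = (
        f"{API_BASE}/cluster-management/v2/operations/applications/"
        f"clickhouse/clusters/v2/{cluster_id}/trigger-backup/v2/"
    )
    # Assumption: HTTP Basic auth with the account username and API key.
    token = base64.b64encode(f"{username}:{api_key}".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}
    # No payload is documented for this endpoint, so an empty body is sent.
    return urllib.request.Request(url, data=b"", headers=headers, method="POST")


def trigger_backup(cluster_id: str, username: str, api_key: str) -> int:
    """Send the request and return the HTTP status code of the response."""
    request = build_trigger_backup_request(cluster_id, username, api_key)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `trigger_backup("<cluster-id>", "<username>", "<api-key>")` with real values would send the request; building the request separately makes it easy to inspect the URL and headers first.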
Restores
ClickHouse cluster backups can be used to restore data from a point in time, via the Console or the Instaclustr API. Data is always restored to a new cluster.
Using Instaclustr Console
- Choose your cluster from the left panel and navigate to the Backup and Restore tab
- Select the Restore tab on the page and you should see a list of available backups for the nodes. Choose a point in time in the Local Time input field and click the Start Restore button to restore from the nearest backup before that time to a new cluster. For example, if a restore from 17:00:00 10th Oct 2024 AEST is requested and no backup is available at that exact time, the previous available backup, for example one from 15:29:17 10th Oct 2024 AEST, will be used for the restore.
- When you click “Start Restore”, the system starts provisioning a new cluster with a setup identical to the current cluster (in terms of number of shards, replicas, ClickHouse Keepers, etc.) and then restores the specified backup to it.
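The way a requested point in time maps to a backup can be illustrated with a short Python sketch. It is only an illustration of the rule described above, not the service's implementation: the function name is hypothetical, the backup timestamps other than 15:29:17 are made up, and treating a backup taken at exactly the requested time as eligible is an assumption.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional, Sequence


def select_backup(backup_times: Sequence[datetime], point_in_time: datetime) -> Optional[datetime]:
    """Return the latest backup taken at or before point_in_time, or None."""
    ordered = sorted(backup_times)
    # bisect_right gives the count of backups at or before the requested time.
    index = bisect_right(ordered, point_in_time)
    return ordered[index - 1] if index > 0 else None


# Illustrative backup times (local time); only 15:29:17 comes from the text.
backups = [
    datetime(2024, 10, 10, 9, 30, 2),
    datetime(2024, 10, 10, 15, 29, 17),
    datetime(2024, 10, 10, 21, 31, 40),
]
chosen = select_backup(backups, datetime(2024, 10, 10, 17, 0, 0))
print(chosen)  # 2024-10-10 15:29:17
```

If the requested time is earlier than every available backup, the sketch returns `None`, since there is no earlier backup to restore from.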
Using Instaclustr API
- As with backups, a designated API endpoint is available for restoring a cluster to a point in time for which a backup is available. Send a POST request with the payload below to the endpoint to start the process:
- Endpoint:
https://api.instaclustr.com/cluster-management/v2/operations/applications/clickhouse/restore/v2/
- Payload:
{ "clusterId": "<cluster-id>", "pointInTime": null }
- Once the request is accepted, you will be able to view details of the restored cluster in your Instaclustr Console.
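The restore request can be sketched in Python in the same way. This is a hedged sketch rather than a reference client: the function name is hypothetical, HTTP Basic authentication with a username and API key is an assumption, and because the text shows only `null` for `pointInTime`, the sketch passes through whatever value the caller supplies without assuming a format.

```python
import base64
import json
import urllib.request

RESTORE_URL = (
    "https://api.instaclustr.com/cluster-management/v2/operations/"
    "applications/clickhouse/restore/v2/"
)


def build_restore_request(cluster_id, username, api_key, point_in_time=None):
    """Build (but do not send) the POST request that starts a restore."""
    # Payload fields as shown in the text; None is serialized as JSON null.
    payload = {"clusterId": cluster_id, "pointInTime": point_in_time}
    body = json.dumps(payload).encode("utf-8")
    # Assumption: HTTP Basic auth with the account username and API key.
    token = base64.b64encode(f"{username}:{api_key}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(RESTORE_URL, data=body, headers=headers, method="POST")


def start_restore(cluster_id, username, api_key, point_in_time=None) -> int:
    """Send the request and return the HTTP status code of the response."""
    request = build_restore_request(cluster_id, username, api_key, point_in_time)
    with urllib.request.urlopen(request) as response:
        return response.status
```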