Configure a ClickHouse Cluster for Tiered Storage

Note: This feature is currently available for clusters provisioned with AWS and Azure providers only.

Get your AWS S3 bucket ready

To use tiered storage with a cluster provisioned in AWS, you must configure an S3 bucket that meets the following conditions:

- It must be in the same region as the cluster.
- Ideally, it should be in the same account in which the cluster is provisioned. If it is in a different account, it must have a bucket policy that allows access by the IAM role of the Cluster Data Centre (which is created after the cluster has been provisioned). Check relevant documentation here.
- It must not have any of the following features enabled:
  - Versioning
  - Object locking
  - Lifecycle rules
  - Intelligent-tiering archiving
  - Server-side encryption using AWS Key Management Service (KMS)

Here are the steps to set up an S3 bucket:

1. Log in to your AWS account.
2. Go to S3 > Create a bucket, and choose the same region as the intended cluster.
3. Give the bucket a name.
4. Leave Bucket Versioning disabled.
5. For encryption, choose Amazon S3 managed keys; this is the only option currently supported.
6. In Advanced settings, make sure Object Lock is disabled.
7. Create the bucket.
8. Create a ClickHouse cluster with Tiered Storage enabled, following the guides in Creating a ClickHouse Cluster.

Basic usage

Use of remote storage is governed by two pre-configured storage policies:

- ic_tiered: This policy initially stores data on the local (hot) volume and starts pushing data to remote storage once 60% of the local disk is used. During read operations, data is pulled from remote storage and cached on the local disk. The cache can grow up to 20% of the size of the local data disk.
  Important: On clusters where Tiered Storage has been successfully enabled and configured, ic_tiered is the default policy, so it does not need to be set explicitly at table creation time. If a table needs to be stored on local disk only, set storage_policy to 'default'. Specifying 'default' is not required if the cluster does not have tiered storage enabled.
- ic_remote_with_cache: This policy forces the entire table to be stored on remote storage. ClickHouse may still cache data on the local disk during read operations, in the same way as described above. Depending on your read pattern and workload, this policy may increase read latency, so use it only when it suits your specific scenario.
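To see how these policies are laid out on a given cluster, the ClickHouse system tables can be queried. The following is a minimal sketch, assuming your user can read the standard system.storage_policies and system.disks tables; the volume and disk names returned depend on how the cluster was provisioned.

```sql
-- List the volumes and disks behind each storage policy.
SELECT policy_name, volume_name, disks, move_factor
FROM system.storage_policies
ORDER BY policy_name, volume_priority;

-- Show how full each disk is. Remote (object storage) disks may report
-- placeholder sizes, so the percentage is mainly meaningful for local disks.
SELECT
    name,
    formatReadableSize(total_space) AS total,
    formatReadableSize(free_space) AS free,
    round(100 * (1 - free_space / total_space), 1) AS used_percent
FROM system.disks;
```

Comparing used_percent for the local disk against the 60% threshold gives a rough idea of whether ic_tiered is likely to have started moving data to remote storage.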

The following example shows the general structure of how these policies can be specified at table creation time:

CREATE TABLE my_table (data Int32) 
ENGINE = ReplicatedMergeTree() 
ORDER BY data 
SETTINGS storage_policy = '<storage-policy-name>';
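As a concrete illustration of the template above, the following sketch creates one table that stays on local disk only and one that is stored entirely on remote storage. The table and column names are hypothetical; the policy names are the ones described in this section.

```sql
-- Local disk only: opt out of the ic_tiered default.
CREATE TABLE local_only_events (id UInt64, ts DateTime)
ENGINE = ReplicatedMergeTree()
ORDER BY (ts, id)
SETTINGS storage_policy = 'default';

-- Entire table on remote storage, with reads cached on local disk.
CREATE TABLE archived_events (id UInt64, ts DateTime)
ENGINE = ReplicatedMergeTree()
ORDER BY (ts, id)
SETTINGS storage_policy = 'ic_remote_with_cache';
```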

The tiered storage feature can also be used in combination with Table TTLs to enable movement of data from local to remote storage based on the age of data. Let’s take a look at the following table creation example:

CREATE TABLE my_table_with_ttl ( 
    d DateTime, 
    n UInt32 
) ENGINE = MergeTree 
ORDER BY d 
PARTITION BY toYYYYMMDDhhmmss(d) 
TTL d + INTERVAL 1 WEEK TO VOLUME 'remote', 
    d + INTERVAL 5 WEEK DELETE; 
-- Note: storage_policy defaults to 'ic_tiered'

With the above table definition in place, data is moved from the local volume to remote storage once it is 1 week old, and is then deleted from remote storage once it is 5 weeks old.
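One way to check that the TTL rules are taking effect is to look at which disk each active data part currently lives on. This is a sketch that assumes the standard system.parts table and the table name from the example above; the disk names reported vary by cluster.

```sql
-- Show where each active part of the example table is stored.
SELECT partition, name, disk_name, formatReadableSize(bytes_on_disk) AS size
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'my_table_with_ttl'
  AND active
ORDER BY partition, name;
```

Parts older than one week would be expected to report a disk belonging to the 'remote' volume. TTL moves run in the background, so a part that has only just passed the threshold may still appear on the local disk for a while.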

What’s important to remember

- It is highly recommended that the bucket or storage account designated as remote storage is not used for any other purpose, as data that is accidentally deleted or modified may not be recoverable.
- Deleting a cluster will not automatically delete data stored in remote storage.

Questions

Please contact [email protected] for any further inquiries.
