Domain Integration

By default, ClickHouse clusters have security restrictions that prevent egress traffic. A cluster can be integrated with public domains to enable access to them.

Once a domain is integrated, the cluster can access that domain directly, or use ClickHouse table engines and functions whose access path is domain-based (e.g., URL, AzureBlobStorage, AzureQueue, S3, S3Queue) to read and write data at that domain. Examples for both scenarios are provided later on the page.

Clusters on the NetApp Instaclustr managed platform are secured through egress firewall rules to protect against data exfiltration. Integrating with a Domain adds a whitelist rule to the firewall that enables access to it. Consider the security risk before enabling a Domain integration.

How To Enable

The following steps explain how to integrate a ClickHouse cluster with a Domain.

  1. Select the “Integrations” option in the console. The page shows existing integrations.
  2. Select “Add New Integration” to configure a new integration.
  3. For the type, select “Domain”, then specify the domain to integrate with.
  4. Press “Add” to configure the integration.
  5. The Integrations table now shows the newly configured integration. An integration can be deleted by pressing the “Delete” button, which disables access to the domain.

Once domain integration is enabled, you can use certain domain-based ClickHouse table engines. Below are a few examples.

How To Use ClickHouse URL Table Engine

ClickHouse’s URL table engine provides a robust mechanism for working with large datasets stored on the web. By leveraging this engine, you can efficiently manage and query your data directly from ClickHouse. Brief usage examples are included below.

For detailed information, refer to the official documentation:

URL Table Engine

The URL table engine allows you to create tables that read from and write to online data, in a range of formats.

Creating a URL Table

To create a table using the URL engine, you need to specify the URL and the format of the data. Here is an example:
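A minimal sketch of a table backed by the URL engine, assuming the domain example.com has already been added as a Domain integration. The table name, columns, and file path are hypothetical placeholders.

```sql
-- Hypothetical endpoint: the host (example.com) must match an integrated Domain.
CREATE TABLE url_engine_table
(
    word  String,
    value UInt64
)
ENGINE = URL('https://example.com/data/words.csv', CSV);
```

The second engine argument is the data format. CSV is used here only for illustration.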

Loading Data

Load data into the table by inserting data directly:
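A short sketch of an insert, assuming a hypothetical URL engine table named url_engine_table with columns (word String, value UInt64). ClickHouse turns an INSERT on a URL table into an HTTP POST request, so the remote endpoint has to accept POST for this to succeed.

```sql
-- Sent to the remote server as a POST request in the table's format.
INSERT INTO url_engine_table VALUES ('hello', 1), ('world', 2);
```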

Querying Data

Query data from the URL table as you would with any other table:
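For instance, assuming the same hypothetical url_engine_table with columns (word String, value UInt64), a SELECT is issued to the remote server as an HTTP GET request:

```sql
-- The file behind the URL is fetched and parsed at query time.
SELECT word, value
FROM url_engine_table
WHERE value > 1
ORDER BY value;
```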

AzureBlobStorage Table Engine 

The AzureBlobStorage table engine provides an integration with the Azure Blob Storage ecosystem, allowing you to create tables that read from and write to data in an Azure Blob Storage account, in a range of formats.

Creating an AzureBlobStorage Table 

To create a table using the AzureBlobStorage engine, you need to specify the storage account endpoint, a Shared Access Signature (SAS), and the format of the data. Here is an example from the ClickHouse GitHub documentation:
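The sketch below is modelled on the engine signature in the ClickHouse documentation and is not a verbatim copy of it. The table name, container name, and blob path are hypothetical, and `<account>` and `<sas_token>` are placeholders to replace with your own values. The storage account host (`<account>.blob.core.windows.net`) is assumed to be the integrated Domain.

```sql
CREATE TABLE azure_blob_table
(
    key  UInt64,
    data String
)
ENGINE = AzureBlobStorage(
    -- Connection string carrying the endpoint and the SAS token (placeholders)
    'BlobEndpoint=https://<account>.blob.core.windows.net/;SharedAccessSignature=<sas_token>',
    'example-container',   -- container name (hypothetical)
    'example_table.csv',   -- blob path inside the container (hypothetical)
    'CSV'                  -- data format
);
```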

Loading Data 

Load data into the table by inserting data directly: 
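As an illustration, assuming a hypothetical AzureBlobStorage table named azure_blob_table with columns (key UInt64, data String), and a SAS token that grants write permission:

```sql
-- Rows are written to the blob path configured on the table.
INSERT INTO azure_blob_table VALUES (1, 'a'), (2, 'b'), (3, 'c');
```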

Querying Data 

Query data from the AzureBlobStorage table as you would with any other table: 
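For example, with the same hypothetical azure_blob_table:

```sql
-- Reads the blob(s) matching the table's blob path at query time.
SELECT key, data
FROM azure_blob_table
ORDER BY key;
```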

AzureQueue Table Engine 

The AzureQueue table engine provides an integration with the Azure Blob Storage ecosystem, allowing streaming data import. 

Creating an AzureQueue Table 

Similar to creating an AzureBlobStorage table, an AzureQueue table can be created as follows (examples taken from the ClickHouse GitHub documentation):
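A sketch using an account key, adapted from the pattern in the ClickHouse documentation rather than copied from it. The table and container names are hypothetical, and `<account>` and `<account_key>` are placeholders. The connection string follows the standard Azure Storage format.

```sql
CREATE TABLE azure_queue_table
(
    key  UInt64,
    data String
)
ENGINE = AzureQueue(
    'DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<account_key>;EndpointSuffix=core.windows.net',
    'example-container',   -- container name (hypothetical)
    '*',                   -- path pattern: pick up every blob in the container
    'CSV'                  -- data format
)
SETTINGS mode = 'unordered';
```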

As an alternative to using an account key for access, you can build a connection string from a SAS token generated at the storage account level with the desired permissions:
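A sketch of the SAS variant, assuming the standard Azure Storage connection string form of `BlobEndpoint` plus `SharedAccessSignature`. The table and container names are hypothetical, and `<account>` and `<sas_token>` are placeholders. Check the required SAS permissions against your own storage account setup.

```sql
CREATE TABLE azure_queue_sas_table
(
    key  UInt64,
    data String
)
ENGINE = AzureQueue(
    -- SAS-based connection string instead of AccountName/AccountKey
    'BlobEndpoint=https://<account>.blob.core.windows.net/;SharedAccessSignature=<sas_token>',
    'example-container',   -- container name (hypothetical)
    '*',                   -- path pattern
    'CSV'                  -- data format
)
SETTINGS mode = 'unordered';
```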

Unlike the AzureBlobStorage table engine, the AzureQueue table engine is used for streaming data, so SELECT queries are not particularly useful because each file is read only once. It is more practical to create real-time threads using materialized views as follows:
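A minimal sketch of that pattern, assuming a hypothetical AzureQueue table named azure_queue_table with columns (key UInt64, data String). The target table and view names (stats, consumer) are illustrative.

```sql
-- Durable destination table for the streamed rows.
CREATE TABLE stats
(
    key  UInt64,
    data String
)
ENGINE = MergeTree()
ORDER BY key;

-- The materialized view consumes new files from the queue table
-- in the background and writes their rows into stats.
CREATE MATERIALIZED VIEW consumer TO stats AS
SELECT key, data
FROM azure_queue_table;

-- Query the destination table, not the queue table.
SELECT * FROM stats ORDER BY key;
```

Because the rows land in a MergeTree table, they stay queryable after the queue engine has marked the source files as processed.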