Add Spark TableProvider API Documentation and Databricks Integration Guide + Variant datatype support #5124
Merged
Commits (12)
- 69dd54e: Add comprehensive TableProvider API documentation with Databricks int… (ShimonSte)
- c9fd8a8: Fix MDX compilation error: close TabItem and Tabs tags in Databricks … (ShimonSte)
- fcb684e: Fix markdown linting: add explicit anchor IDs to headings (ShimonSte)
- 02d4777: Add Spark connector documentation improvements (ShimonSte)
- 3075793: Fix markdown linting errors (ShimonSte)
- 46b1740: removed data lakes by mistake (ShimonSte)
- 2c49e62: fixed linting (ShimonSte)
- ca6e9d5: Add Databricks integration documentation and screenshots (ShimonSte)
- 46f4253: Add Configuring ClickHouse Options section and partition overwrite li… (ShimonSte)
- 63d2dc9: Add partition overwrite limitation note to Catalog API Write data sec… (ShimonSte)
- 206ae1e: Update docs/integrations/data-ingestion/apache-spark/databricks.md (ShimonSte)
- f81743a: Update Databricks installation instructions (ShimonSte)
docs/integrations/data-ingestion/apache-spark/databricks.md (311 additions, 0 deletions)
---
sidebar_label: 'Databricks'
sidebar_position: 3
slug: /integrations/data-ingestion/apache-spark/databricks
description: 'Integrate ClickHouse with Databricks'
keywords: ['clickhouse', 'databricks', 'spark', 'unity catalog', 'data']
title: 'Integrating ClickHouse with Databricks'
doc_type: 'guide'
---

import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported';

# Integrating ClickHouse with Databricks

<ClickHouseSupportedBadge/>

The ClickHouse Spark connector works seamlessly with Databricks. This guide covers Databricks-specific setup, installation, and usage patterns.

## API Selection for Databricks {#api-selection}

By default, Databricks uses Unity Catalog, which blocks Spark catalog registration. In this case, you **must** use the **TableProvider API** (format-based access).

However, if you disable Unity Catalog by creating a cluster with **No isolation shared** access mode, you can use the **Catalog API** instead. The Catalog API provides centralized configuration and native Spark SQL integration; a registration sketch follows the table below.

| Unity Catalog Status | Recommended API | Notes |
|---------------------|------------------|-------|
| **Enabled** (default) | TableProvider API (format-based) | Unity Catalog blocks Spark catalog registration |
| **Disabled** (No isolation shared) | Catalog API | Requires cluster with "No isolation shared" access mode |
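
If you are on a cluster with Unity Catalog disabled and want the Catalog API, the connector has to be registered as a Spark catalog before it is first used. The sketch below shows the idea in Python, assuming the `spark.sql.catalog.*` keys and the `com.clickhouse.spark.ClickHouseCatalog` class described in the Spark Native Connector guide; the host, credentials, and table names are placeholders. On Databricks these keys normally go into the cluster's Spark config; setting them on the session conf before the catalog is first referenced typically works as well, since catalogs are resolved lazily.

```python
# Minimal sketch: register ClickHouse as a Spark catalog named "clickhouse".
# The configuration keys and catalog class are taken from the Spark Native
# Connector guide; host and credentials below are placeholders.
spark.conf.set("spark.sql.catalog.clickhouse", "com.clickhouse.spark.ClickHouseCatalog")
spark.conf.set("spark.sql.catalog.clickhouse.host", "your-clickhouse-cloud-host.clickhouse.cloud")
spark.conf.set("spark.sql.catalog.clickhouse.protocol", "https")
spark.conf.set("spark.sql.catalog.clickhouse.http_port", "8443")
spark.conf.set("spark.sql.catalog.clickhouse.user", "default")
spark.conf.set("spark.sql.catalog.clickhouse.password", dbutils.secrets.get(scope="clickhouse", key="password"))
spark.conf.set("spark.sql.catalog.clickhouse.database", "default")
spark.conf.set("spark.sql.catalog.clickhouse.option.ssl", "true")

# Tables are then addressable as clickhouse.<database>.<table> in Spark SQL.
spark.sql("SELECT * FROM clickhouse.default.events LIMIT 10").show()
```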

## Installation on Databricks {#installation}

### Option 1: Upload JAR via Databricks UI {#installation-ui}

1. Build or [download](https://repo1.maven.org/maven2/com/clickhouse/spark/) the runtime JAR:

   ```bash
   clickhouse-spark-runtime-{{ spark_binary_version }}_{{ scala_binary_version }}-{{ stable_version }}.jar
   ```

2. Upload the JAR to your Databricks workspace:
   - Go to **Workspace** → Navigate to your desired folder
   - Click **Upload** → Select the JAR file
   - The JAR will be stored in your workspace

3. Install the library on your cluster:
   - Go to **Compute** → Select your cluster
   - Click the **Libraries** tab
   - Click **Install New**
   - Select **DBFS** or **Workspace** → Navigate to the uploaded JAR file
   - Click **Install**

<Image img={require('@site/static/images/integrations/data-ingestion/apache-spark/databricks/databricks-libraries-tab.png')} alt="Databricks Libraries tab" />

<Image img={require('@site/static/images/integrations/data-ingestion/apache-spark/databricks/databricks-install-from-volume.png')} alt="Installing library from workspace volume" />

4. Restart the cluster to load the library

### Option 2: Install via Databricks CLI {#installation-cli}

```bash
# Upload JAR to DBFS
databricks fs cp clickhouse-spark-runtime-{{ spark_binary_version }}_{{ scala_binary_version }}-{{ stable_version }}.jar \
  dbfs:/FileStore/jars/

# Install on cluster
databricks libraries install \
  --cluster-id <your-cluster-id> \
  --jar dbfs:/FileStore/jars/clickhouse-spark-runtime-{{ spark_binary_version }}_{{ scala_binary_version }}-{{ stable_version }}.jar
```

### Option 3: Maven Coordinates (Recommended) {#installation-maven}

1. Navigate to your Databricks workspace:
   - Go to **Compute** → Select your cluster
   - Click the **Libraries** tab
   - Click **Install New**
   - Select the **Maven** tab

2. Add the Maven coordinates:

   ```text
   com.clickhouse.spark:clickhouse-spark-runtime-{{ spark_binary_version }}_{{ scala_binary_version }}:{{ stable_version }}
   ```

<Image img={require('@site/static/images/integrations/data-ingestion/apache-spark/databricks/databricks-maven-tab.png')} alt="Databricks Maven libraries configuration" />

3. Click **Install** and restart the cluster to load the library

## Using TableProvider API {#tableprovider-api}

When Unity Catalog is enabled (default), you **must** use the TableProvider API (format-based access) because Unity Catalog blocks Spark catalog registration. If you've disabled Unity Catalog by using a cluster with "No isolation shared" access mode, you can use the [Catalog API](/docs/integrations/data-ingestion/apache-spark/spark-native-connector#register-the-catalog-required) instead.

### Reading Data {#reading-data-table-provider}

<Tabs groupId="databricks_usage">
<TabItem value="Python" label="Python" default>

```python
# Read from ClickHouse using TableProvider API
df = spark.read \
    .format("clickhouse") \
    .option("host", "your-clickhouse-cloud-host.clickhouse.cloud") \
    .option("protocol", "https") \
    .option("http_port", "8443") \
    .option("database", "default") \
    .option("table", "events") \
    .option("user", "default") \
    .option("password", dbutils.secrets.get(scope="clickhouse", key="password")) \
    .option("ssl", "true") \
    .load()

# Schema is automatically inferred
df.display()
```

</TabItem>
<TabItem value="Scala" label="Scala">

```scala
val df = spark.read
  .format("clickhouse")
  .option("host", "your-clickhouse-cloud-host.clickhouse.cloud")
  .option("protocol", "https")
  .option("http_port", "8443")
  .option("database", "default")
  .option("table", "events")
  .option("user", "default")
  .option("password", dbutils.secrets.get(scope="clickhouse", key="password"))
  .option("ssl", "true")
  .load()

df.show()
```

</TabItem>
</Tabs>
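
Because the Catalog API is unavailable under Unity Catalog, a ClickHouse table loaded through the TableProvider API is not directly addressable from Spark SQL. A common workaround is to register the DataFrame as a temporary view. The sketch below uses standard Spark APIs; the view and column names are purely illustrative.

```python
# Expose the DataFrame read via the TableProvider API to Spark SQL as a temp view.
df.createOrReplaceTempView("clickhouse_events")

# Query it like any other table; "user_id" is an illustrative column name.
top_users = spark.sql("""
    SELECT user_id, count(*) AS event_count
    FROM clickhouse_events
    GROUP BY user_id
    ORDER BY event_count DESC
    LIMIT 10
""")
top_users.show()
```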

### Writing Data {#writing-data-unity}

<Tabs groupId="databricks_usage">
<TabItem value="Python" label="Python" default>

```python
# Write to ClickHouse - the table will be created automatically if it doesn't exist.
# "order_by" is required when the connector creates a new table; on ClickHouse Cloud,
# "settings.allow_nullable_key" is required if the ORDER BY key contains nullable columns.
df.write \
    .format("clickhouse") \
    .option("host", "your-clickhouse-cloud-host.clickhouse.cloud") \
    .option("protocol", "https") \
    .option("http_port", "8443") \
    .option("database", "default") \
    .option("table", "events_copy") \
    .option("user", "default") \
    .option("password", dbutils.secrets.get(scope="clickhouse", key="password")) \
    .option("ssl", "true") \
    .option("order_by", "id") \
    .option("settings.allow_nullable_key", "1") \
    .mode("append") \
    .save()
```

</TabItem>
<TabItem value="Scala" label="Scala">

```scala
df.write
  .format("clickhouse")
  .option("host", "your-clickhouse-cloud-host.clickhouse.cloud")
  .option("protocol", "https")
  .option("http_port", "8443")
  .option("database", "default")
  .option("table", "events_copy")
  .option("user", "default")
  .option("password", dbutils.secrets.get(scope="clickhouse", key="password"))
  .option("ssl", "true")
  .option("order_by", "id") // Required: specify ORDER BY when creating a new table
  .option("settings.allow_nullable_key", "1") // Required for ClickHouse Cloud if ORDER BY has nullable columns
  .mode("append")
  .save()
```

</TabItem>
</Tabs>

:::note
This example assumes preconfigured secret scopes in Databricks. For setup instructions, see the Databricks [Secret management documentation](https://docs.databricks.com/aws/en/security/secrets/).
:::

## Databricks-Specific Considerations {#considerations}

### Secret Management {#secret-management}

Use Databricks secret scopes to securely store ClickHouse credentials:

```python
# Access secrets
password = dbutils.secrets.get(scope="clickhouse", key="password")
```

For setup instructions, see the Databricks [Secret management documentation](https://docs.databricks.com/aws/en/security/secrets/).
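
To confirm from a notebook that the scope and key are visible before wiring them into connector options, a small check with the standard `dbutils.secrets` utilities is enough; the scope and key names below are simply the ones used throughout this guide.

```python
# List the secret scopes visible to this workspace user.
scope_names = [s.name for s in dbutils.secrets.listScopes()]
if "clickhouse" not in scope_names:
    raise ValueError("Secret scope 'clickhouse' not found - create it first (see the link above)")

# List the keys in the scope; secret values are redacted in notebook output.
print([k.key for k in dbutils.secrets.list("clickhouse")])

password = dbutils.secrets.get(scope="clickhouse", key="password")
```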

<!-- TODO: Add screenshot of Databricks secret scopes configuration -->

### ClickHouse Cloud Connection {#clickhouse-cloud}

When connecting to ClickHouse Cloud from Databricks:

1. Use **HTTPS protocol** (`protocol: https`, `http_port: 8443`)
2. Enable **SSL** (`ssl: true`)
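
Rather than repeating these connection settings in every read and write, you can keep them in one dictionary and reuse it. A minimal sketch; the host name and secret scope are placeholders carried over from the examples above.

```python
# Shared ClickHouse Cloud connection options, reused across reads and writes.
clickhouse_options = {
    "host": "your-clickhouse-cloud-host.clickhouse.cloud",
    "protocol": "https",
    "http_port": "8443",
    "database": "default",
    "user": "default",
    "password": dbutils.secrets.get(scope="clickhouse", key="password"),
    "ssl": "true",
}

df = (
    spark.read.format("clickhouse")
    .options(**clickhouse_options)
    .option("table", "events")
    .load()
)
```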

## Examples {#examples}

### Complete Workflow Example {#workflow-example}

<Tabs groupId="databricks_usage">
<TabItem value="Python" label="Python" default>

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Initialize Spark with ClickHouse connector
spark = SparkSession.builder \
    .config("spark.jars.packages", "com.clickhouse.spark:clickhouse-spark-runtime-3.4_2.12:0.9.0") \
    .getOrCreate()

# Read from ClickHouse
df = spark.read \
    .format("clickhouse") \
    .option("host", "your-host.clickhouse.cloud") \
    .option("protocol", "https") \
    .option("http_port", "8443") \
    .option("database", "default") \
    .option("table", "source_table") \
    .option("user", "default") \
    .option("password", dbutils.secrets.get(scope="clickhouse", key="password")) \
    .option("ssl", "true") \
    .load()

# Transform data
transformed_df = df.filter(col("status") == "active")

# Write to ClickHouse
transformed_df.write \
    .format("clickhouse") \
    .option("host", "your-host.clickhouse.cloud") \
    .option("protocol", "https") \
    .option("http_port", "8443") \
    .option("database", "default") \
    .option("table", "target_table") \
    .option("user", "default") \
    .option("password", dbutils.secrets.get(scope="clickhouse", key="password")) \
    .option("ssl", "true") \
    .option("order_by", "id") \
    .mode("append") \
    .save()
```

</TabItem>
<TabItem value="Scala" label="Scala">

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Initialize Spark with ClickHouse connector
val spark = SparkSession.builder
  .config("spark.jars.packages", "com.clickhouse.spark:clickhouse-spark-runtime-3.4_2.12:0.9.0")
  .getOrCreate()

// Read from ClickHouse
val df = spark.read
  .format("clickhouse")
  .option("host", "your-host.clickhouse.cloud")
  .option("protocol", "https")
  .option("http_port", "8443")
  .option("database", "default")
  .option("table", "source_table")
  .option("user", "default")
  .option("password", dbutils.secrets.get(scope="clickhouse", key="password"))
  .option("ssl", "true")
  .load()

// Transform data
val transformedDF = df.filter(col("status") === "active")

// Write to ClickHouse
transformedDF.write
  .format("clickhouse")
  .option("host", "your-host.clickhouse.cloud")
  .option("protocol", "https")
  .option("http_port", "8443")
  .option("database", "default")
  .option("table", "target_table")
  .option("user", "default")
  .option("password", dbutils.secrets.get(scope="clickhouse", key="password"))
  .option("ssl", "true")
  .option("order_by", "id")
  .mode("append")
  .save()
```

</TabItem>
</Tabs>

## Related Documentation {#related}

- [Spark Native Connector Guide](/docs/integrations/data-ingestion/apache-spark/spark-native-connector) - Complete connector documentation
- [TableProvider API Documentation](/docs/integrations/data-ingestion/apache-spark/spark-native-connector#using-the-tableprovider-api-format-based-access) - Format-based access details
- [Catalog API Documentation](/docs/integrations/data-ingestion/apache-spark/spark-native-connector#register-the-catalog-required) - Catalog-based access details