---
title: Google BigQuery
meta_description: "Stream realtime event data from Ably into Google BigQuery using the Firehose BigQuery rule. Configure the rule and analyze your data efficiently."
---

Stream events published to Ably directly into a table in "BigQuery":https://cloud.google.com/bigquery for analytical or archival purposes. General use cases include:

* Realtime analytics on message data.
* Centralized storage for raw event data, enabling downstream processing.
* Historical auditing of messages.

To stream data from Ably into BigQuery, you need to create a BigQuery "rule":#rule.

<aside data-type='note'>
<p>Ably's BigQuery integration for "Firehose":/docs/integrations/streaming is in alpha status.</p>
</aside>

h2(#rule). Create a BigQuery rule

A rule defines what data gets sent, where it goes, and how it's authenticated. For example, you can improve query performance by configuring a rule to stream data from a specific channel and write it into a "partitioned":https://cloud.google.com/bigquery/docs/partitioned-tables table.

h3(#dashboard). Create a rule using the Ably dashboard

The following steps create a BigQuery rule using the Ably dashboard:

* Click *New integration rule*.
50
29
* Select *Firehose*.
51
30
* Choose *BigQuery* from the list of available Firehose integrations.
* "Configure":#configure the rule settings. Then, click *Create*.

h3(#api-rule). Create a rule using the Ably Control API

The following steps create a BigQuery rule using the Control API:

* Use the "rules":/docs/control-api#examples-rules endpoint to specify the following parameters:
** @ruleType@: Set this to "bigquery" to define the rule as a BigQuery integration.
** @destinationTable@: Specify the BigQuery table where the data will be stored.
** @serviceAccountCredentials@: Provide the necessary GCP service account JSON key to authenticate and authorize data insertion.
** @channelFilter@ (optional): Use a regular expression to apply the rule to specific channels.
** @format@ (optional): Define the data format based on how you want messages to be structured.
* Make an HTTP request to the Control API to create the rule.
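
As an illustrative sketch, the body of a request to the Control API's rules endpoint might look like the following. Since the BigQuery rule is in alpha, treat the exact @target@ field layout as an assumption based on the parameters above:

```[json]
{
  "ruleType": "bigquery",
  "requestMode": "single",
  "source": {
    "channelFilter": "^sensor:.*",
    "type": "channel.message"
  },
  "target": {
    "destinationTable": "project_id.dataset_id.table_id",
    "serviceAccountCredentials": "<contents of the service account JSON key file>",
    "format": "json"
  }
}
```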

h2(#configure). Configure BigQuery

Using the Google Cloud "Console":https://cloud.google.com/bigquery/docs/bigquery-web-ui, configure the required BigQuery resources, permissions, and authentication to allow Ably to write data securely to BigQuery.

The following steps configure BigQuery using the Google Cloud Console:

* Create or select a *BigQuery dataset* in the Google Cloud Console.
* Create a *BigQuery table* in that dataset.
** Use the "JSON schema":#schema.
** For large datasets, partition the table by ingestion time, with daily partitioning recommended for optimal performance.
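
As a minimal sketch, the following DDL creates an ingestion-time partitioned table with daily granularity (the default). The columns shown are illustrative placeholders; define the real columns using the "JSON schema":#schema:

```[sql]
-- Sketch: create an ingestion-time partitioned table (daily granularity).
-- Column names are illustrative placeholders, not the required schema.
CREATE TABLE `project_id.dataset_id.table_id` (
  id STRING,
  channel STRING,
  data BYTES
)
PARTITION BY _PARTITIONDATE;
```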

The following steps set up permissions and authentication using the Google Cloud Console:

* Create a Google Cloud Platform (GCP) "service account":https://cloud.google.com/iam/docs/service-accounts-create with the minimal required BigQuery permissions.
* Grant the service account table-level access to the specific table with the following permissions:
** @bigquery.tables.get@: to read table metadata.
** @bigquery.tables.updateData@: to insert records.
* Generate and securely store the *JSON key file* for the service account.
** Ably requires this key file to authenticate and write data to your table.
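
A minimal sketch of the service account steps using the @gcloud@ CLI, assuming a project @PROJECT_ID@ and a hypothetical account name @ably-bigquery-writer@; grant the account table-level access in the BigQuery console afterwards:

```[sh]
# Create a dedicated service account for Ably's BigQuery writes.
gcloud iam service-accounts create ably-bigquery-writer \
  --display-name="Ably BigQuery writer"

# Generate a JSON key file for the service account and store it securely;
# Ably needs its contents to authenticate.
gcloud iam service-accounts keys create ably-key.json \
  --iam-account=ably-bigquery-writer@PROJECT_ID.iam.gserviceaccount.com
```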

h3(#settings). BigQuery configuration options

The following explains the BigQuery configuration options:

|_. Section |_. Purpose |
| *Source* | Defines the type of event(s) for delivery. |
| *Channel filter* | A regular expression to filter which channels to capture. Only events on channels matching this regex are streamed into BigQuery. |
| *Table* | The full destination table path in BigQuery, typically in the format @project_id.dataset_id.table_id@. |
| *Service account key* | A JSON key file Ably uses to authenticate with Google Cloud. You must upload or provide the contents of this key file. |
| *Partitioning* | _(Optional)_ The table must be created with the desired partitioning settings in BigQuery before creating the rule in Ably. |
| *Advanced settings* | Any additional configuration or custom fields relevant to your BigQuery setup (for future enhancements). |

h2(#schema). JSON Schema

To store and structure message data in BigQuery, you need a schema that defines the expected fields to help ensure consistency. The following is an example JSON schema for a BigQuery table:

```[json]
{
  ...
}
```

h2(#queries). Direct queries

In Ably-managed BigQuery tables, message payloads are stored in the @data@ column as raw JSON. The following example query converts the @data@ column from @BYTES@ to @STRING@, parses it into a JSON object, and filters results by channel name:

```[sql]
SELECT
  PARSE_JSON(CAST(data AS STRING)) AS parsed_data
FROM project_id.dataset_id.table_id
WHERE channel = "my-channel"
```
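
To extract individual fields instead of the whole payload, @JSON_VALUE@ can be applied to the same cast. A sketch, assuming the payload contains a top-level @status@ field:

```[sql]
-- Extract a single (assumed) "status" field from each message payload.
SELECT
  channel,
  JSON_VALUE(CAST(data AS STRING), '$.status') AS status
FROM project_id.dataset_id.table_id
WHERE channel = "my-channel"
```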

h2(#etl). Extract, Transform, Load (ETL)

ETL is recommended for large-scale analytics to structure, deduplicate, and optimize data for querying. Since parsing JSON at query time can be costly for large datasets, pre-process and store structured fields in a secondary table instead. Convert raw data (JSON or BYTES), remove duplicates, and write it into an optimized table for better performance:

* Convert data from raw @BYTES@ or JSON into structured columns, for example geospatial fields or numeric data types, for detailed analysis.
* Write transformed records to a new table optimized for query performance, as sketched below.
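
The following is a minimal ETL sketch, assuming an @id@ column for deduplication and a payload carrying @sensor_id@ and @reading@ fields (all assumptions about your schema):

```[sql]
-- Sketch: deduplicate by message id and flatten assumed payload fields
-- into typed columns in a secondary, query-optimized table.
CREATE OR REPLACE TABLE `project_id.dataset_id.table_structured` AS
SELECT * EXCEPT (row_num)
FROM (
  SELECT
    id,
    channel,
    JSON_VALUE(CAST(data AS STRING), '$.sensor_id') AS sensor_id,
    SAFE_CAST(JSON_VALUE(CAST(data AS STRING), '$.reading') AS FLOAT64) AS reading,
    ROW_NUMBER() OVER (PARTITION BY id) AS row_num
  FROM `project_id.dataset_id.table_id`
)
WHERE row_num = 1;
```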