
Commit a2cfa41

fixup! EDU-1502: Adds bigQuery page
1 parent 54b2537 commit a2cfa41


content/bigquery.textile

Lines changed: 50 additions & 50 deletions
@@ -1,46 +1,25 @@
 ---
-title: BigQuery rule
+title: Google BigQuery
 meta_description: "Stream realtime event data from Ably into Google BigQuery using the Firehose BigQuery rule. Configure, and analyze your data efficiently."
 ---

-Stream events published to Ably directly into a table in "BigQuery":https://cloud.google.com/bigquery?utm_source=google&utm_medium=cpc&utm_campaign=emea-es-all-en-dr-bkws-all-all-trial-e-gcp-1707574&utm_content=text-ad-none-any-dev_c-cre_574561258287-adgp_Hybrid+%7C+BKWS+-+EXA+%7C+Txt+-+Data+Analytics+-+BigQuery+-+v1-kwid_43700072692462237-kwd-12297987241-userloc_1005419&utm_term=kw_big+query-net_g-plac_&&gad_source=1&gclid=Cj0KCQiAwtu9BhC8ARIsAI9JHanslQbN6f8Ho6rvEvozknlBMbqaea0s6ILK-VA9YpQhRr_IUrVz6rYaAtXeEALw_wcB&gclsrc=aw.ds&hl=en for analytical or archival purposes. General use cases include:
+Stream events published to Ably directly into a table in "BigQuery":https://cloud.google.com/bigquery for analytical or archival purposes. General use cases include:

 * Realtime analytics on message data.
 * Centralized storage for raw event data, enabling downstream processing.
 * Historical auditing of messages.

+To stream data from Ably into BigQuery, you need to create a BigQuery "rule":#rule.
+
 <aside data-type='note'>
-<p>Ably's BigQuery integration rule for "Firehose":/integrations/streaming is in development status.</p>
+<p>Ably's BigQuery integration for "Firehose":/docs/integrations/streaming is in alpha status.</p>
 </aside>

-h3(#create-rule). Create a BigQuery rule
-
-Set up the necessary BigQuery resources, permissions, and authentication to enable Ably to securely write data to a BigQuery table:
-
-* Create or select a BigQuery dataset in the Google Cloud Console.
-* Create a BigQuery table in that dataset:
-** Use the "JSON schema":#schema.
-** For large datasets, partition the table by ingestion time, with daily partitioning recommended for optimal performance.
-* Create a Google Cloud Platform (GCP) "service account":https://cloud.google.com/iam/docs/service-accounts-create?utm_source=chatgpt.com with the minimal required BigQuery permissions.
-* Grant the service account table-level access control to allow access to the specific table.
-** @bigquery.tables.get@: to read table metadata.
-** @bigquery.tables.updateData@: to insert records.
-* Generate and securely store the JSON key file for the service account.
-** Ably requires this key file to authenticate and write data to your table.
+h2(#rule). Create a BigQuery rule

-h3(#settings). BigQuery rule settings
+A rule defines what data gets sent, where it goes, and how it's authenticated. For example, you can improve query performance by configuring a rule to stream data from a specific channel and write it into a "partitioned":https://cloud.google.com/bigquery/docs/partitioned-tables table.

-The following explains the components of the BigQuery rule settings:
-
-|_. Section |_. Purpose |
-| *Source* | Defines the type of event(s) for delivery. |
-| *Channel filter* | A regular expression to filter which channels to capture. Only events on channels matching this regex are streamed into BigQuery. |
-| *Table* | The full destination table path in BigQuery, typically in the format @project_id.dataset_id.table_id@. |
-| *Service account Key* | A JSON key file Ably uses to authenticate with Google Cloud. You must upload or provide the contents of this key file. |
-| *Partitioning* | _(Optional)_ The table must be created with the desired partitioning settings in BigQuery before making the rule in Ably. |
-| *Advanced settings* | Any additional configuration or custom fields relevant to your BigQuery setup (for future enhancements). |
-
-h4(#dashboard). Create a BigQuery rule in the dashboard
+h3(#dashboard). Create a rule using the Ably dashboard

 The following steps create a BigQuery rule using the Ably dashboard:

@@ -49,23 +28,55 @@ The following steps to create a BigQuery rule using the Ably dashboard:
 * Click *New integration rule*.
 * Select *Firehose*.
 * Choose *BigQuery* from the list of available Firehose integrations.
-* Configure the rule settings as described below.Then, click *Create*.
+* "Configure":#configure the rule settings. Then, click *Create*.

-h4(#api-rule). Create a BigQuery rule using the Control API
+h3(#api-rule). Create a rule using the Ably Control API

-The following steps to create a BigQuery rule using the "Control API:":https://ably.com/docs/api#control-api
+The following steps create a BigQuery rule using the Control API:

-* Using the required "rules":/control-api#examples-rules to specify the following parameters:
+* Use the "rules":/docs/control-api#examples-rules endpoint to specify the following parameters:
 ** @ruleType@: Set this to @bigquery@ to define the rule as a BigQuery integration.
 ** @destinationTable@: Specify the BigQuery table where the data will be stored.
 ** @serviceAccountCredentials@: Provide the necessary GCP service account JSON key to authenticate and authorize data insertion.
 ** @channelFilter@ (optional): Use a regular expression to apply the rule to specific channels.
 ** @format@ (optional): Define the data format based on how you want messages to be structured.
 * Make an HTTP request to the Control API to create the rule, as sketched below.
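
For illustration, a request body for this rule might look like the following sketch. Only the parameters listed above come from this page; the @requestMode@ value, the @source@ object, and the exact nesting of the @target@ fields are assumptions based on the general Control API rule format, so check the Control API reference for the definitive shape of the alpha BigQuery rule:

```[json]
{
  "ruleType": "bigquery",
  "requestMode": "single",
  "source": {
    "channelFilter": "^my-channel.*",
    "type": "channel.message"
  },
  "target": {
    "destinationTable": "project_id.dataset_id.table_id",
    "serviceAccountCredentials": "<contents of the service account JSON key file>",
    "format": "json"
  }
}
```

Sending this body in a @POST@ request to the Control API rules endpoint for your app creates the rule.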

-h3(#schema). JSON Schema
+h2(#configure). Configure BigQuery
+
+Using the Google Cloud "Console":https://cloud.google.com/bigquery/docs/bigquery-web-ui, configure the required BigQuery resources, permissions, and authentication to allow Ably to write data securely to BigQuery.
+
+The following steps configure BigQuery using the Google Cloud Console:
+
+* Create or select a *BigQuery dataset* in the Google Cloud Console.
+* Create a *BigQuery table* in that dataset:
+** Use the "JSON schema":#schema.
+** For large datasets, partition the table by ingestion time, with daily partitioning recommended for optimal performance. An example DDL statement follows this list.
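
As an illustration of the table-creation step, the following DDL sketch creates a table with daily, ingestion-time partitioning. The project, dataset, and table names are placeholders, and only the @channel@ and @data@ columns are shown because the query example later on this page references them; take the full column list from the "JSON schema":#schema.

```[sql]
-- Ingestion-time partitioned table with daily partitions.
-- Illustrative column list only; add the remaining columns from the JSON schema.
CREATE TABLE `project_id.dataset_id.table_id`
(
  channel STRING,
  data BYTES
)
PARTITION BY _PARTITIONDATE;
```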

-You can run queries directly against the Ably-managed BigQuery table. For example, if the message payloads are stored as raw JSON in the data column, you can parse them using the following query:
+The following steps set up permissions and authentication using the Google Cloud Console:
+
+* Create a Google Cloud Platform (GCP) "service account":https://cloud.google.com/iam/docs/service-accounts-create with the minimal required BigQuery permissions.
+* Grant the service account table-level access to the specific table, as in the sketch after this list:
+** @bigquery.tables.get@: to read table metadata.
+** @bigquery.tables.updateData@: to insert records.
+* Generate and securely store the *JSON key file* for the service account.
+** Ably requires this key file to authenticate and write data to your table.
+
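
For example, access can be granted with a BigQuery @GRANT@ statement. The predefined @roles/bigquery.dataEditor@ role used here covers both permissions above but is broader than the minimum, so a custom role limited to @bigquery.tables.get@ and @bigquery.tables.updateData@ is the tighter choice; the table path and service account name are placeholders:

```[sql]
-- Table-level grant; substitute your own role, table, and service account.
GRANT `roles/bigquery.dataEditor`
ON TABLE `project_id.dataset_id.table_id`
TO "serviceAccount:ably-firehose@project_id.iam.gserviceaccount.com";
```
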
+h3(#settings). BigQuery configuration options
+
+The following explains the BigQuery configuration options:
+
+|_. Section |_. Purpose |
+| *Source* | Defines the type of event(s) for delivery. |
+| *Channel filter* | A regular expression to filter which channels to capture. Only events on channels matching this regex are streamed into BigQuery. |
+| *Table* | The full destination table path in BigQuery, typically in the format @project_id.dataset_id.table_id@. |
+| *Service account key* | A JSON key file Ably uses to authenticate with Google Cloud. You must upload or provide the contents of this key file. |
+| *Partitioning* | _(Optional)_ The table must be created with the desired partitioning settings in BigQuery before creating the rule in Ably. |
+| *Advanced settings* | Any additional configuration or custom fields relevant to your BigQuery setup (for future enhancements). |
+
+h2(#schema). JSON schema
+
+To store and structure message data in BigQuery, you need a schema that defines the expected fields and helps ensure consistency. The following is an example JSON schema for a BigQuery table:

 ```[json]
 {
@@ -76,9 +87,9 @@ You can run queries directly against the Ably-managed BigQuery table. For exampl
 }
 ```

-h3(#queries). Direct queries
+h2(#queries). Direct queries

-Run queries directly against the Ably-managed table. For instance, to parse JSON payloads stored in @data@:
+In Ably-managed BigQuery tables, message payloads are stored in the @data@ column as raw JSON. The following example query converts the @data@ column from @BYTES@ to @STRING@, parses it into a JSON object, and filters results by channel name:

 ```[sql]
 SELECT
@@ -87,20 +98,9 @@ FROM project_id.dataset_id.table_id
 WHERE channel = "my-channel"
 ```

-The following explains the components of the query:
-
-|_. Query Function |_. Purpose |
-| @CAST(data AS STRING)@ | Converts the data column from BYTES (if applicable) into a STRING format. |
-| @PARSE_JSON(…)@ | Parses the string into a structured JSON object for easier querying. |
-| @WHERE channel = “my-channel”@ | Filters results to retrieve messages only from a specific Ably channel. |
-
-<aside data-type='note'>
-<p>Parsing JSON at query time can be computationally expensive for large datasets. If your queries need frequent JSON parsing, consider pre-processing and storing structured fields in a secondary table using an ETL pipeline for better performance.</p>
-</aside>
-
-h4(#etl). Extract, Transform, Load (ETL)
+h2(#etl). Extract, Transform, Load (ETL)

-ETL is recommended for large-scale analytics and performance optimization, ensuring data is structured, deduplicated, and efficiently stored for querying. Transform raw data (JSON or BYTES) into a more structured format, remove duplicates, and write it into a secondary table optimized for analytics:
+ETL is recommended for large-scale analytics to structure, deduplicate, and optimize data for querying. Since parsing JSON at query time can be costly for large datasets, pre-process and store structured fields in a secondary table instead. Convert raw data (JSON or BYTES), remove duplicates, and write it into an optimized table for better performance:

 * Convert raw data (@BYTES@ or JSON) into structured columns, such as geospatial fields or numeric types, for detailed analysis.
 * Write transformed records to a new optimized table tailored for query performance, as in the sketch below.
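
The following sketch shows what such an ETL step might look like as a one-off or scheduled BigQuery query. The @id@ column used for deduplication and the @eventType@ and @reading@ payload fields are hypothetical; replace them with the columns defined by your "JSON schema":#schema and the fields your messages actually carry.

```[sql]
-- Illustrative ETL pass: parse the raw payload into typed columns and deduplicate.
CREATE OR REPLACE TABLE `project_id.dataset_id.table_id_structured` AS
SELECT
  channel,
  JSON_VALUE(CAST(data AS STRING), '$.eventType') AS event_type,
  SAFE_CAST(JSON_VALUE(CAST(data AS STRING), '$.reading') AS FLOAT64) AS reading
FROM (
  -- Keep a single row per (hypothetical) message id.
  SELECT *, ROW_NUMBER() OVER (PARTITION BY id) AS row_num
  FROM `project_id.dataset_id.table_id`
)
WHERE row_num = 1;
```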
