Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partitioning by day no longer works for DATE column #1162

Open
bnaul opened this issue Jan 24, 2025 · 2 comments · May be fixed by #1179
Open

Partitioning by day no longer works for DATE column #1162

bnaul opened this issue Jan 24, 2025 · 2 comments · May be fixed by #1179
Assignees
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API.

Comments

@bnaul
Copy link
Contributor

bnaul commented Jan 24, 2025

It appears my fix from #1074 was undone by #1113. In 1.12.0 the following works:

import datetime
import sqlalchemy

import sqlalchemy_bigquery
from google.cloud.bigquery import TimePartitioning

!bq rm -f tmp.test_table_create
engine = sqlalchemy.create_engine(f"bigquery:///tmp", echo=True)
meta = sqlalchemy.MetaData()
table = sqlalchemy.Table(
    f"tmp.test_table_create",
    meta,
    sqlalchemy.Column("integer_c", sqlalchemy.Integer, doc="column description"),
    sqlalchemy.Column("float_c", sqlalchemy.Float),
    sqlalchemy.Column("decimal_c", sqlalchemy.DECIMAL),
    sqlalchemy.Column("string_c", sqlalchemy.String),
    sqlalchemy.Column("text_c", sqlalchemy.Text),
    sqlalchemy.Column("boolean_c", sqlalchemy.Boolean),
    sqlalchemy.Column("timestamp_c", sqlalchemy.TIMESTAMP),
    sqlalchemy.Column("datetime_c", sqlalchemy.DATETIME),
    sqlalchemy.Column("date_c", sqlalchemy.DATE),
    sqlalchemy.Column("time_c", sqlalchemy.TIME),
    sqlalchemy.Column("binary_c", sqlalchemy.BINARY),
    bigquery_description="test table description",
    bigquery_friendly_name="test table name",
    bigquery_expiration_timestamp=datetime.datetime(2183, 3, 26, 8, 30, 0),
    bigquery_time_partitioning=TimePartitioning(
        field="date_c",
        expiration_ms=1000 * 60 * 60 * 24 * 30,  # 30 days
        type_="DAY"
    ),
    bigquery_require_partition_filter=True,
    bigquery_default_rounding_mode="ROUND_HALF_EVEN",
    bigquery_clustering_fields=["integer_c", "decimal_c"],
)
table.create(engine)

whereas in 1.12.1 it fails. From the description of #1113 it seems this omission was intentional, but I don't see why it's necessary to remove support for this (extremely fundamental and ubiquitous) type of partitioning: the old special casing for DATE seems like it would apply just as well here and IMO should be restored. Same goes for the change #1116 which was similarly not addressed.

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. label Jan 24, 2025
@chalmerlowe
Copy link
Collaborator

@bnaul: apologize for the delay and any confusion.

I suspect that I fell afoul of a discrepancy in the "documentation" and ended up going with the most restrictive. Let me talk to some folks to figure what is supposed to be the approved and supported API behavior.

Context

I may not have fully recognized the discrepancy when I originally approved PR #1074.

I got more insight when I was rooting through all the internals to enable the follow-on PR ##1113

What discrepancy? We are provided different guidance in the error message returned as a DatabaseError (which implies DATE_TRUNC should only handle month/year) and the guidance found in the online BigQuery Documentation. See below (which implies DATE_TRUNC can handle 'day', 'month', 'year'):

DatabaseError

DatabaseError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 

PARTITION BY expression must be

_PARTITIONDATE,
DATE(_PARTITIONTIME),
DATE(<timestamp_column>),
DATE(<datetime_column>),
DATETIME_TRUNC(<datetime_column>, DAY/HOUR/MONTH/YEAR),
a DATE column,
TIMESTAMP_TRUNC(<timestamp_column>, DAY/HOUR/MONTH/YEAR),
DATE_TRUNC(<date_column>, MONTH/YEAR),
or RANGE_BUCKET(<int64_column>, GENERATE_ARRAY(<int64_value>, <int64_value>[, <int64_value>]))

Where I saw the above: The description from Issue 1056 where the above error is quoted.

BigQuery Documentation

Time-unit column partitioning
You can partition a table on a DATE,TIMESTAMP, or DATETIME column in the table. When you write data to the table, BigQuery automatically puts the data into the correct partition, based on the values in the column.

For TIMESTAMP and DATETIME columns, the partitions can have either hourly, daily, monthly, or yearly granularity. For DATE columns, the partitions can have daily, monthly, or yearly granularity. Partitions boundaries are based on UTC time.

Source: https://cloud.google.com/bigquery/docs/partitioned-tables#date_timestamp_partitioned_tables

@chalmerlowe chalmerlowe linked a pull request Mar 19, 2025 that will close this issue
@chalmerlowe
Copy link
Collaborator

@bnaul

I am wondering whether this PR solves the problem you mention here.
#1184

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants