Partitioned model invalid SQL CREATE TABLE syntax generated when using the db_tablespace option #256

Open
tatuylonen opened this issue Dec 16, 2024 · 1 comment

Comments

@tatuylonen

Invalid SQL CREATE TABLE syntax is generated when the db_tablespace option is used with PostgresPartitionedModel. For example, the following model produces incorrect SQL:

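from django.db import models
from psqlextra.models import PostgresPartitionedModel
from psqlextra.types import PostgresPartitioningMethod

class ATestModel(PostgresPartitionedModel):
    id = models.AutoField(primary_key=True)

    class Meta:
        db_tablespace = "some_ts"

    class PartitioningMeta:
        method = PostgresPartitioningMethod.HASH
        key = ["id"]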

The resulting SQL (as reported by manage.py sqlmigrate) is:

CREATE TABLE "app_atestmodel" ("id" integer NOT NULL USING INDEX TABLESPACE "some_ts" GENERATED BY DEFAULT AS IDENTITY) TABLESPACE "some_ts, PRIMARY KEY ("id")) PARTITION BY HASH ("id");

However, the PostgreSQL CREATE TABLE syntax requires PRIMARY KEY or UNIQUE to come before index_parameters (i.e., the USING INDEX TABLESPACE clause); see https://www.postgresql.org/docs/current/sql-createtable.html. As a result, manage.py migrate raises an exception:

django.db.utils.ProgrammingError: syntax error at or near "USING"
LINE 1: ...TABLE "app_atestmodel" ("id" integer NOT NULL USING INDE...

The bug appears to be in backend/schema.py, function create_partitioned_model, where it calls sql.replace to remove PRIMARY KEY from the SQL statement and inserts a different PRIMARY KEY clause at the end. This corrupts the SQL statement syntax.
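The corruption can be reproduced with plain string operations. The following is only a minimal sketch of the behaviour described above, not the library's actual code: the function name mangle_create_table, the input statement, and the detail that the new PRIMARY KEY clause is spliced in by dropping the statement's last character are all assumptions, chosen because together they reproduce the reported output.

```python
def mangle_create_table(sql: str, key_sql: str, method: str) -> str:
    """Hypothetical model of the rewrite described in this issue."""
    # Remove the column-level PRIMARY KEY. Any index_parameters that
    # followed it (USING INDEX TABLESPACE ...) are now left dangling.
    sql = sql.replace(" PRIMARY KEY", "")
    # Assumption: the last character is taken to be the closing ")" of
    # the column list. With db_tablespace set it is really the closing
    # quote of the tablespace name.
    sql = sql[:-1] + ", PRIMARY KEY (%s))" % key_sql
    # Append PARTITION BY at the very end of the statement.
    return sql + " PARTITION BY %s (%s)" % (method, key_sql)


# Assumed shape of the statement Django emits before the rewrite.
django_sql = (
    'CREATE TABLE "app_atestmodel" ("id" integer NOT NULL PRIMARY KEY '
    'USING INDEX TABLESPACE "some_ts" GENERATED BY DEFAULT AS IDENTITY) '
    'TABLESPACE "some_ts"'
)

print(mangle_create_table(django_sql, '"id"', "HASH") + ";")
```

Under these assumptions the printed statement matches the sqlmigrate output quoted above, including the unterminated "some_ts quote.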

Since table partitioning would typically be used with large databases and/or large tables, the partitioned tables would frequently be the very tables for which one might want to specify a separate tablespace. In any case, being able to specify tablespaces is important for placing some tables on SSD and some very large tables on HDD. It is also important to be able to specify tablespaces for indexes (in this example, the model tablespace is automatically propagated to the indexes).

@tatuylonen
Author

The generated SQL is also incorrect in a second way once the above is corrected (I simply commented out the lines that mangle the PRIMARY KEY clauses, as I only wanted to partition by the primary key). create_partitioned_model (backend/schema.py) also appends the PARTITION BY clause to the end of the SQL statement. This produces incorrect SQL when the table has a db_tablespace option, because Django has already added a TABLESPACE clause at the end of the statement, and the PostgreSQL CREATE TABLE syntax requires the PARTITION BY clause to precede the TABLESPACE clause.

A quick fix to this second problem is to replace the sql += self.sql_partition % ... line by the following code:

        # part_by is assumed to hold the rendered clause (self.sql_partition % ...)
        idx = sql.rfind(")")
        assert idx > 0
        sql = sql[:idx + 1] + part_by + sql[idx + 1:]

i.e., to insert the PARTITION BY clause after the last closing parenthesis in the statement generated by django. However, this is a kludge and there could be additional options that could disrupt this; a better approach would be to count parentheses and insert PARTITION BY after the initial parenthesis is closed.
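The parenthesis-counting approach could look roughly like the sketch below. It is an illustration only: insert_partition_by is a hypothetical helper, not part of the library, and it assumes the first unquoted "(" in the statement opens the column list. Quoted identifiers and string literals are skipped so that parentheses inside them are not counted.

```python
def insert_partition_by(sql: str, part_by: str) -> str:
    """Insert part_by right after the parenthesis closing the column list."""
    depth = 0
    quote = None  # the quote character we are currently inside, if any
    for i, ch in enumerate(sql):
        if quote is not None:
            # Inside "identifier" or 'literal': ignore everything until
            # the matching quote. A doubled quote closes and immediately
            # reopens, which keeps the state correct.
            if ch == quote:
                quote = None
            continue
        if ch in ('"', "'"):
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # This ")" closes the first top-level "(".
                return sql[: i + 1] + part_by + sql[i + 1 :]
    raise ValueError("no balanced top-level parenthesis found")


sql = (
    'CREATE TABLE "t" ("id" integer NOT NULL, PRIMARY KEY ("id")) '
    'WITH (fillfactor=70) TABLESPACE "some_ts"'
)
print(insert_partition_by(sql, ' PARTITION BY HASH ("id")'))
```

The WITH (...) clause in the example is there only to show a case where sql.rfind(")") would pick the wrong position; whether Django ever emits such a clause is not something this sketch assumes.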
