Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add indexes #113

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Add indexes #113

wants to merge 5 commits into from

Conversation

knyghty
Copy link
Contributor

@knyghty knyghty commented Oct 13, 2024

Fixes #15

See the ticket for all the initial discussion, but this seems the best bang for the buck at the moment, but would be interested to test with different indexes and data.

@knyghty
Copy link
Contributor Author

knyghty commented Oct 13, 2024

The error for MySQL seems quite clear, we need to add max_length to the queue_name if we want to index it. But I'm not sure of a reasonable length. I suspect if we're doing that we might as well make it a CharField?

The sqlite error is also clear, I'm just not sure how to handle it other than with a custom migration, which seems a bit... annoying. I should probably also test if a partial index would work well, maybe we could avoid that or only use that for sqlite.

@RealOrangeOne
Copy link
Owner

64 characters for a queue name ought to be enough for anyone (famous last words, I know...).

The SQLite error is very weird, I agree. It reads like an SQL syntax error. If you slowly remove the use of NULLS LAST, can you tell which column is causing it?

@knyghty
Copy link
Contributor Author

knyghty commented Oct 25, 2024

I fixed the MySQL issue, though now it looks like there is an issue on some Django versions but not others. I'm not sure if this is just flaky?

The SQLite problem is tricker, it doesn't support using nulls last in indexes: https://www.sqlite.org/lang_createindex.html#nulls_first_and_nulls_last

I tried running a version of the query (and of the index) that I thought would work for ordering but it actually doesn't get used at all on SQLite for some reason. I suspect it's something to do with how the database is stored on disk but no idea really.

So it seems somewhat reasonable at this point to simply not create this index on SQLite. Unfortunately, this seems a bit tricky (especially on train wifi). I will tinker, possibly ask around, and see what I can figure out. I suspect that this solution probably wouldn't fly in core Django though. Possibly a database feature could be added, but it's unclear to me exactly how you'd use it to conditionally create an index. Maybe you have some ideas around this? Otherwise I will poke around.

@knyghty
Copy link
Contributor Author

knyghty commented Oct 28, 2024

I'm also a bit confused by one thing - @RealOrangeOne is there any particular reason NULLS LAST is used? It's not clear to me at first glance.

@RealOrangeOne
Copy link
Owner

The reason to use nulls_last is because tasks without a run_after should be considered after a task with a populated run_after.

Given 3 tasks, this is the order they'd execute:

  1. run_after = timezone.now() - relativedelta(days=1) (yesterday)
  2. run_after = None
  3. run_after = timezone.now() + relativedelta(days=1) (tomorrow)

@jonatron
Copy link
Contributor

I had a look at this. Firstly, should run_after be ascending not descending?
I can't get Postgres / MySQL to do an index scan with the following:
).filter(models.Q(run_after=None) | models.Q(run_after__lte=timezone.now()))
A workaround would be to make run_after not null, instead using a fixed low date like 0001-01-01 or datetime(1, 1, 1).
Then that filter can be changed to:
).filter(run_after__lte=timezone.now())
Then it should do an index scan without a sort, with an index like:

models.Index(
    F("status"),
    F("priority").desc(),
    F("run_after").asc(),
    name="django_task_ordering_idx",
),

I've just been testing by copying some code from test_run_after_priority into a new management command to add a large amount of data.

@RealOrangeOne RealOrangeOne added the database-backend Issues relating to the database backend label Feb 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
database-backend Issues relating to the database backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimise DB worker queries
3 participants