-
Notifications
You must be signed in to change notification settings - Fork 366
Add searchable field column to handle full text search #8544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add searchable field column to handle full text search #8544
Conversation
treeherder/model/models.py
Outdated
] | ||
|
||
def save(self, *args, **kwargs): | ||
self.search_vector = SearchVector( | ||
self.revision, self.author, Substr(self.comments, 1, 100000), config="english" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple of questions:
- would it be possible to add the search_vector column to the push model instead of the commit model? This would save us a join.
- I believe we can reduce the value
100000
, we are really interested in the first line. Actually I wonder if we could just get the first line (that is, substr to the first "\n")?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we would still need a join to access comments
if we move up to the push model
treeherder/webapp/api/push.py
Outdated
) | ||
.filter( | ||
search=SearchQuery(search_param, config="english"), | ||
Commit.objects.filter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can probably switch to the code that you tried before and that avoids the subquery.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry what code did I try before? you mean the one without the search vector field?
field=django.contrib.postgres.search.SearchVectorField( | ||
blank=True, null=True | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we populate it for all existing commits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes. I only did that locally by running
from django.contrib.postgres.search import SearchVector
from treeherder.model.models import Commit
Commit.objects.update(
search_vector=SearchVector('comments', 'author', 'revision')
)
I'll have to create a migration to update existing data
@@ -1,8 +1,7 @@ | |||
# Generated by Django 5.1.5 on 2025-02-27 18:06 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The migration has been applied, a new migration is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rolled back to the old migration 0037_bugjob...
then deleted my 0038_commit_search..
migration. so when I ran makemigration
and another one got generated, I renamed it to 0038_commit_search..
and applied migration. Would that still be a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A migration 0039 landed which depends on the old 0038. Please use a new migration 0040.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay got it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have created a new migration for this
7cbf478
to
7114688
Compare
@@ -0,0 +1,20 @@ | |||
# Generated by Django 5.1.5 on 2025-03-05 23:46 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fold this migration into the 0040 one.
# Test search by comments | ||
resp = client.get( | ||
reverse("push-list", kwargs={"project": test_repository.name}) + "?search=bug" | ||
) | ||
assert resp.status_code == 200 | ||
|
||
results = resp.json()["results"] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debug code to remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea missed that :)
class Migration(migrations.Migration): | ||
|
||
replaces = [ | ||
("model", "0040_remove_commit_search_vector_idx_commit_search_vector_and_more"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two migrations don't exist (outside your local database if it hasn't been reset). replaces
might even break the migration of the ones mentioned here are missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay will take replaces
out
tests/webapp/api/test_push_api.py
Outdated
# Test search by comments | ||
resp = client.get( | ||
reverse("push-list", kwargs={"project": test_repository.name}) + "?search=bug" | ||
) | ||
assert resp.status_code == 200 | ||
|
||
results = resp.json()["results"] | ||
|
||
print(list(Commit.objects.values("revision", "author", "comments", "search_vector"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Substr
for comments
should still be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added Substr
to the migration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Archaeopteryx when you have the time, can you review the PR? I've updated the recommended changes and rebased branch
2f4baa3
to
d08e40d
Compare
@@ -0,0 +1,39 @@ | |||
# Generated by Django 5.1.5 on 2025-03-06 14:49 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file needs a small rebase, another migration has landed.
…earch vector column and update existing fields
d08e40d
to
2c3e665
Compare
This PR is an alternative approach for handling full-text search using SearchVector.
Changes:
Added a
search_vector
field to theCommit
model:search_vector = SearchVectorField(null=True, blank=True)
Now when we run a query, instead of creating the SearchVector dynamically, the search now runs against the already populated
search_vector field
:The
search_vector
is updated when a new commit is addedThis can also be done with Postgres Triggers(but I didn't want to tamper with the db) also it's easier to debug if django handles it.
Endpoint:
http://localhost:8000/api/project/try/push/?search=1906541
Result: Returns results matching the query across relevant fields such as
bug_numbers
,summary
,author
, andrevisions
.When I print
filtered_commits.query
I getand when I run
EXPLAIN ANALYZE
on the query, I getVideo of
http://localhost:8000/api/project/try/push/?search=1906541
results on browser17.01.2025_07.48.28_REC.mp4