Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Searching the catalog by reader is slow #155

Closed
garethsime opened this issue Jan 17, 2023 · 2 comments · Fixed by #157
Closed

Searching the catalog by reader is slow #155

garethsime opened this issue Jan 17, 2023 · 2 comments · Fixed by #157

Comments

@garethsime
Copy link
Contributor

Hi! I'm currently running the latest stuff from librivox-ansible locally, which is using librivox-catalog 2d270e76f8bfbb0e9f4e75a68c06d3e81ae2a799 from 30th December.

When I run the catalog locally and do a reader-only search, it takes about 16s to respond. I noticed #127, which looks like it solves this problem, but it's still slow for me both locally and when I run the same queries against the production site.

Expectation

Searching the catalog by just reader returns results reasonably quickly.

Actual

Searching the catalog by just reader returns results after about 16 seconds.

What I've tried so far

It looks like it takes about 16s to respond locally - it takes about 8 seconds per call to Librivox_search#advanced_title_search, of which there are two (one for the first page of results, and one to decide how many pages there are in total).

I noticed that the vast majority of the time is spent on fetching projects by section author when building $reader_clause:

$reader_clause = '';
if (!empty($params['reader']))
{
  ...
  $section_project_ids = $this->_get_projects_by_section_author($params['author']);
  ...
}

But that variable (and $section_reader_clause) is never used, so it's sucking up a lot of time doing not much. (It looks like it might've been copy/pasted from building the author clause.) When I remove that call, the query loads in ~320ms rather than 16s.

Including an author in the search also gives good performance since $params['author'] isn't empty then, so we don't try and do big joins and string operations on every single author in the database.

I'm running all of this locally using Docker rather than VirtualBox, but the numbers seem comparable to what I get if I do the same queries against the production site. If you agree that the $section_project_ids variable isn't being used and is sucking up a lot of time, then I'm happy to raise a PR to remove it 🙂

Thanks!

@notartom
Copy link
Member

Hello!

But that variable (and $section_reader_clause) is never used, so it's sucking up a lot of time doing not much. (It looks like it might've been copy/pasted from building the author clause.) When I remove that call, the query loads in ~320ms rather than 16s.

I'm not sure that's true, looking further down at the code I see the following:

                WHERE 1
                ' . $status . '
                ' . $project_type . '
                ' . $title . '
                ' . $author_clause . '
                ' . $reader_clause . '                                                                                
                ' . $language_clause . ' ';

at [1].

On the other hand, $section_reader_clause does appear completely unused, so that looks OK to drop.

I'll happily review any PR that comes in :)

[1] https://github.com/LibriVox/librivox-catalog/blob/master/application/libraries/Librivox_search.php#L203

@garethsime
Copy link
Contributor Author

Awesome, thanks. My description wasn't very clear sorry, I was trying to say that $section_project_ids was the expensive one that isn't used, not that we should remove the whole reader clause :)

garethsime added a commit to garethsime/librivox-catalog that referenced this issue Feb 9, 2023
Fixes LibriVox#155.

Calling `$this->_get_projects_by_section_author($params['author']);`
without an author ID is really expensive, and we just throw the result
away anyway. I also removed a few other unused variables while I was
there, though the performance impact of those were negligible.
garethsime added a commit to garethsime/librivox-catalog that referenced this issue Feb 9, 2023
Fixes LibriVox#155.

Calling `$this->_get_projects_by_section_author($params['author']);`
without an author ID is really expensive, and we just throw the result
away anyway. I also removed a few other unused variables while I was
there, though the performance impact of those was negligible.
notartom pushed a commit that referenced this issue Feb 11, 2023
Fixes #155.

Calling `$this->_get_projects_by_section_author($params['author']);`
without an author ID is really expensive, and we just throw the result
away anyway. I also removed a few other unused variables while I was
there, though the performance impact of those was negligible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants