Finding librivox recordings for a project gutenberg book #200

Closed
doug-wade opened this issue Mar 2, 2024 · 3 comments

doug-wade commented Mar 2, 2024

I'm working on a tool that finds the top x books from Project Gutenberg matching a search term or topic, and I'm having trouble with missed matches (false negatives): books that appear in both the Project Gutenberg catalog and the LibriVox catalog, but for which a search by title returns no match. For example, when I run the tool for all shelves that contain the substring children, the first result is "A Christmas Carol in Prose; Being a Ghost Story of Christmas by Dickens, Charles (https://www.gutenberg.org/ebooks/46.html.images)". However, when I search LibriVox for this title, I don't get any results, I think because it's listed as "A Christmas Carol" rather than "A Christmas Carol in Prose; Being a Ghost Story of Christmas".

I would like to request a new feature, a new search param in the url called projectgutenbergid. I would be able to make a request like:

curl "https://librivox.org/api/feed/audiobooks/?projectgutenbergid=46&format=json&limit=1"

And get a response like

{
  "books": [
    {
      "id": "140",
      "title": "Christmas Carol",
      "description": "A classic tale of what comes to those whose hearts are hard. In a series of ghostly visits, Scrooge visits his happy past, sees the difficulties of the present, views a bleak future, and in the end amends his mean ways. (Summary written by Kristen McQuillin)",
      "url_text_source": "https://www.gutenberg.org/etext/46",
      "language": "English",
      "copyright_year": "1843",
      "num_sections": "5",
      "url_rss": "https://librivox.org/rss/140",
      "url_zip_file": "https://www.archive.org/download/A_Christmas_Carol/A_Christmas_Carol_64kb_mp3.zip",
      "url_project": "https://en.wikipedia.org/wiki/A_Christmas_Carol",
      "url_librivox": "https://librivox.org/a-christmas-carol-by-charles-dickens/",
      "url_other": "",
      "totaltime": "3:14:29",
      "totaltimesecs": 11669,
      "projectgutenbergid": "46",
      "authors": [
        {
          "id": "91",
          "first_name": "Charles",
          "last_name": "Dickens",
          "dob": "1812",
          "dod": "1870"
        }
      ]
    }
  ]
}

I was looking at the LibriVox recording details page (for example this one), and I see that the "links" section has an "online text" link that points to Project Gutenberg. If I understand correctly, that means the database already has the data to support such an option, though it might not be 100% accurate, since the LibriVox folks may have linked to a different version of the online text. (edit: also, it's already in the api as url_text_source 🤣)

If the project is willing to support this feature, I'd be interested in contributing.

@redrun45
Collaborator

redrun45 commented Mar 2, 2024

Well, we don't have much in terms of supporting features, and I can't say how much of a priority it would be for other volunteers, but let's talk.

Just to be sure this angle is covered: I see your edit. Is there any chance you could reasonably parse and reconstruct the url_text_source to search by? To the best of my (limited) knowledge, Gutenberg IDs don't exist as separate objects in the database, so they would need to be parsed out on one end or the other. 😄

I'll note that some of our database entries point to the actual text in one of the various formats Gutenberg makes available, along these lines:
https://www.gutenberg.org/cache/epub/26090/pg26090-images.html

...but most of them are supposed to link to the book's overview page, like so:
https://www.gutenberg.org/ebooks/46
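
For what it's worth, a minimal sketch of the client-side parsing idea, assuming the only URL shapes that matter are the ones quoted in this thread (/ebooks/N, /etext/N, and /cache/epub/N/...). The helper names gutenberg_id and books_for_gutenberg_id are made up for illustration and are not part of the LibriVox API; other URL forms may exist in the database and would need their own patterns.

```python
import re
from typing import Optional

# Assumption: only the URL shapes seen in this thread are handled.
_GUTENBERG_ID = re.compile(
    r"gutenberg\.org/(?:ebooks|etext)/(\d+)"  # /ebooks/46, /etext/46, /ebooks/46.html.images
    r"|gutenberg\.org/cache/epub/(\d+)/"      # /cache/epub/26090/pg26090-images.html
)


def gutenberg_id(url_text_source: str) -> Optional[int]:
    """Return the Gutenberg ID found in a url_text_source value, or None."""
    match = _GUTENBERG_ID.search(url_text_source or "")
    if match is None:
        return None
    # Exactly one of the two capture groups is set, depending on the URL shape.
    return int(match.group(1) or match.group(2))


def books_for_gutenberg_id(books, wanted_id: int):
    """Filter book records (dicts shaped like the API's "books" entries)."""
    return [
        book for book in books
        if gutenberg_id(book.get("url_text_source", "")) == wanted_id
    ]
```

Comparing parsed integers rather than substrings avoids confusing ID 46 with, say, 460. Entries whose url_text_source points somewhere other than Gutenberg simply yield None and are skipped.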

@redrun45
Collaborator

@doug-wade - wanted to see if you still need anything from this end, or if that url_text_source is going to do the trick for you. 😃

@redrun45
Collaborator

Closing this issue for now. If you didn't get what you needed, do come back and say so. 😉
