
scholarly.fill() does not completely fill an author publication entry, although it does completely fill a publication search snippet, so I am not able to get a BibTeX citation with scholarly.bibtex() from an author publication entry #557

Open
wants to merge 2 commits into develop

Conversation

@ssdv1 ssdv1 commented Nov 27, 2024

Fixed issue 556.
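For reference, a minimal sketch of the flow this fixes, using the public scholarly API (the author query below is only a placeholder):

```python
from scholarly import scholarly

# Placeholder author query, used only for illustration.
author = scholarly.fill(next(scholarly.search_author("A Einstein")))
pub = author['publications'][0]

# Before this fix, fill() on an author publication entry did not retrieve
# everything that bibtex() needs, so the last call below raised an error.
pub = scholarly.fill(pub)
print(scholarly.bibtex(pub))
```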

Checklist

  • Check that the base branch is set to develop and not main.
  • Ensure that the documentation will be consistent with the code upon merging.
  • Add a line or a few lines that test the newly added features.
  • Ensure that unit tests pass.
    If you don't have a premium proxy, some of the tests will be skipped.
    The tests that are run should pass without raising
    MaxTriesExceededException or other exceptions.

@mindeye33

Tested this and it works. Please accept the pull request :D

@arunkannawadi arunkannawadi (Collaborator) left a comment

Thanks @ssdv1 for the PR and @mindeye33 for testing this! We definitely need more testers. My only hesitation to accept this as-is is that it adds two extra scraper calls to the Scholar database, which has stronger anti-scraping policies. It increases the number of API calls for those who use a premium proxy and increases the chances of failure (which are already quite high) for users with cheaper proxies (like FreeProxy). I'd be happy if we could do this only when the user explicitly asks to fill in this information.

@mindeye33

Thanks for your consideration and time. I was actually considering removing that comment because of exactly what you said: it increases the number of calls and is thus more likely to be blocked by Google's aggressive anti-bot policy. My hack to get this to work is to add long waits before and after the added blocks. This way it tends to scrape a few more papers before it gets blocked.
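Roughly, the hack looks like this (just a sketch of the idea, not the exact code I ran; the wait bounds are arbitrary):

```python
import random
import time

from scholarly import scholarly

def fill_with_pause(pub, low_s=30, high_s=90):
    # Long randomized pauses before and after the publication fill (which,
    # with this PR, makes the extra Scholar requests) so the scraper
    # survives a few more papers before getting blocked.
    time.sleep(random.uniform(low_s, high_s))
    filled = scholarly.fill(pub)
    time.sleep(random.uniform(low_s, high_s))
    return filled
```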

More details on what I wanted to do and how it went: I wanted to scrape my personal Google Scholar profile for BibTeX entries so that they are ordered the same way they show up in Google Scholar when sorted by year. My profile has 20 publications, and running scholarly with this PR kept getting blocked to death by Google. I made a Python script that saves intermediate runs to a JSON file to keep progress before it gets blocked, and used a paid VPN service to switch proxies repeatedly while saving progress. It took me a week to finally retrieve the BibTeX for those 20 publications, but yeah, in hindsight it wasn't worth it lol. Although this PR was the most helpful part in getting the BibTeX, it was also the main cause of getting blocked. Also, FreeProxy was not helping at all and made getting blocked much faster, so the paid VPN subscription is what saved the day for me.
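The save-progress part, roughly (the file name and keying by title are placeholders, and the proxy/VPN switching between runs is left out):

```python
import json
import os

from scholarly import scholarly

PROGRESS_FILE = "bibtex_progress.json"  # placeholder checkpoint file

def scrape_author_bibtex(author_name):
    # Reload whatever was already scraped before the last block.
    done = {}
    if os.path.exists(PROGRESS_FILE):
        with open(PROGRESS_FILE) as f:
            done = json.load(f)

    author = scholarly.fill(next(scholarly.search_author(author_name)))
    for pub in author["publications"]:
        title = pub["bib"]["title"]
        if title in done:  # already retrieved in an earlier run
            continue
        done[title] = scholarly.bibtex(scholarly.fill(pub))
        with open(PROGRESS_FILE, "w") as f:  # checkpoint after every entry
            json.dump(done, f, indent=2)
    return done
```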

@ssdv1 ssdv1 (Author) commented Apr 28, 2025

Hey @mindeye33, thanks for testing my code. Unfortunately those API calls had to be added in order to get the BibTeX; it looks like there is no way around it. @arunkannawadi I can refactor the code to do this only if the user wants to, but I don't know how you want me to go about it, since the documentation would also need to change. One option is to accept an argument when filling the entry. Since we are worried about API calls, a better option is to change the bibtex function instead of fill, so that the extra API calls are made only when the user wants the BibTeX and calls that function. This is the better option because, without this change, the bibtex function throws an error anyway, prompting the user to fill the author publication entry.
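Roughly what I have in mind, shown at the user level (a sketch of the idea only, with a made-up helper name, not the actual scholarly.bibtex() implementation):

```python
from scholarly import scholarly

def bibtex_on_demand(pub):
    # Fill the author publication entry only at the point where the
    # BibTeX is actually requested, so the extra Scholar calls happen
    # on demand instead of inside every fill().
    if not pub.get("filled"):
        pub = scholarly.fill(pub)
    return scholarly.bibtex(pub)
```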
