Some websites load their text using JavaScript, so it isn't in the HTML source #3
@Ankit-Gupta18 Can you take a shot at evaluating the feasibility of using a headless browser on the server to load ref pages? Process:
Possible optimisations / concurrency handling (for future iterations - NOT needed for first pass):
@enterprisey @siddharthvp
I'm not sure what's unclear. The intent is to use a headless browser to load the page and examine its content. This mimics a human opening the page in a browser, and ensures that any JavaScript the page uses gets run (which does not happen with a …

I suggest using the Playwright library: https://playwright.dev/docs/library. First try running it locally. The browser can also be launched in headful mode for debugging, so you can see what's going on (by passing …
Try the example code from https://playwright.dev/docs/library.
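The headless-browser approach suggested above could look roughly like this. It's a minimal sketch, assuming Python and the `playwright` package (the linked docs cover the Node.js library, but the Python package exposes equivalent calls), and assuming a quote counts as found when it appears in the page's visible text after collapsing whitespace and ignoring case. The function names (`normalize`, `quote_in_text`, `fetch_rendered_text`, `quote_on_page`) are hypothetical, not part of this project.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase.

    Assumption: matching should ignore case and line breaks in the rendered page.
    """
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_in_text(quote: str, page_text: str) -> bool:
    """Return True if the normalized quote occurs in the normalized page text."""
    return normalize(quote) in normalize(page_text)


def fetch_rendered_text(url: str, headless: bool = True, timeout_ms: int = 30_000) -> str:
    """Load `url` in Chromium so the page's JavaScript runs, then return the
    visible text of <body>.

    Requires `pip install playwright` followed by `playwright install chromium`.
    """
    # Imported lazily so the pure helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        try:
            page = browser.new_page()
            # Wait for the network to go idle so script-loaded text is present.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.inner_text("body")
        finally:
            browser.close()


def quote_on_page(url: str, quote: str) -> bool:
    """Hypothetical check: is `quote` present in the JavaScript-rendered page?"""
    return quote_in_text(quote, fetch_rendered_text(url))
```

For debugging, `fetch_rendered_text(url, headless=False)` opens a visible browser window. Waiting for `networkidle` is only one option: pages that keep polling may never go idle, in which case waiting for a specific selector (e.g. with `page.wait_for_selector`) may work better.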
I'd like to work on this issue; please assign it to me.
Hello, I am interested in contributing to open-source projects and would love to participate in any that you have available. I have experience in web development and am eager to learn and grow my skills by working on these projects. Please let me know if there are any opportunities for me to get involved; I would greatly appreciate it. Thank you!
Steps:
Expected:
The quote is accepted (because that text is on the page), and the next step is shown.
Actual:
The quote is rejected (because the text is not in the HTML source, but is instead loaded by JavaScript).
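The rejection described under *Actual* can be reproduced without a browser. This is a minimal sketch, assuming the current check compares the quote against text extracted from the raw HTML (the thread doesn't show how the check is actually implemented); `VisibleTextParser` and `static_text` are hypothetical names. The sample page fills in its quote from a script, so a static parse never sees it.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes from raw HTML, skipping <script> and <style> contents.

    Only the HTML source is parsed; no JavaScript is executed.
    """

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we're not inside a script/style element.
        if not self._skip_depth:
            self.parts.append(data)


def static_text(html: str) -> str:
    """Return the text a non-rendering fetch would see in the HTML source."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# A page whose quote is filled in by a script at load time.
page = (
    "<body><p>Intro</p><div id='q'></div>"
    "<script>document.getElementById('q').textContent = 'The quick brown fox';</script>"
    "</body>"
)
print("quick brown fox" in static_text(page))  # False: the text exists only inside the script
```

Loading the same page in a headless browser, as proposed in the comments, would run the script and make "The quick brown fox" part of the visible text.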